OpenAI transcribed over a million hours of YouTube videos to train its LLMs, Google engaged in same practice
To obtain more reputable English-language text, OpenAI researchers in 2021 created a speech recognition tool called Whisper, The New York Times reports. It was designed to transcribe audio from YouTube videos, giving the company a trove of data to train its LLMs.
Read Entire Article
https://www.techspot.com/news/102536-openai-transcribed-over-million-hours-youtube-videos-train.html?utm_source=dlvr.it&utm_medium=blogger