How To Extract the Transcript for Captions on YouTube Videos

Posted about 2 years ago.

Few people I know find satisfaction in transcription. At most, it’s a necessary pain.

For those transcribing all or bits of YouTube videos, the pain doubles, because rewinding or fast-forwarding in precise increments is difficult.

There’s a way around this, though: grab the transcript that the uploader created, or the one that Google’s captioning system auto-generates.

While the auto-generated captions are often hit and – mostly – miss, they usually give a general sense of what’s being said. This is helpful if you don’t want to sit through a 45-minute talk to see if there’s something of note in it.

More important is that the FCC timeline for US broadcasters to provide better captions is fast approaching. Here are the dates by which Internet clips previously shown on TV must be captioned (via the FCC):

  1. January 1, 2016, where the video clip contains a single excerpt of a captioned TV program with the same video and audio that was shown on TV (“straight lift” clips).
  2. January 1, 2017, where a single file contains multiple straight lift video clips (“montages”).
  3. July 1, 2017, for video clips of live and near-live TV programming (such as news or sporting events).

Point being, transcripts from broadcast programs will soon be pretty clean and the screencast above is a way that you can quickly grab them from clips posted to the likes of YouTube.

While I gloss over the XML file that you end up with when getting captions, note that each caption is timestamped with a “start” attribute, given in seconds, which is important for longer videos. The “dur” attribute is how long the caption stays on screen. For example, the following starts approximately 193 seconds into the video, or (maths!) at 3 minutes and 13 seconds:

    
     <text start="193.939" dur="5.58">
     really caught between two forces in the Assad regime you were arrested a number
     </text>
     <text start="199.519" dur="6.8">
     of times I think by the Assad regime in 2012 and then you have the ISIS forces
     </text>
    
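If you'd rather not do the maths by hand, here is a minimal sketch that pulls the captions out of XML shaped like the snippet above and prints each one with a minutes:seconds timestamp. The function names `formatTimestamp` and `parseCaptions` are my own, and the regular expression assumes the flat `<text start="…" dur="…">` layout shown here; it is not a general XML parser.

```javascript
// Convert a caption "start" value (seconds, possibly fractional) to m:ss.
function formatTimestamp(seconds) {
  const whole = Math.floor(Number(seconds));
  const minutes = Math.floor(whole / 60);
  const secs = String(whole % 60).padStart(2, "0");
  return `${minutes}:${secs}`;
}

// Pull start, duration and text out of each <text> element.
// Assumes the simple, un-nested structure shown in the sample above.
function parseCaptions(xml) {
  const pattern = /<text start="([\d.]+)" dur="([\d.]+)">([\s\S]*?)<\/text>/g;
  const captions = [];
  for (const match of xml.matchAll(pattern)) {
    captions.push({
      start: Number(match[1]),
      dur: Number(match[2]),
      text: match[3].trim(),
    });
  }
  return captions;
}

// The two captions from the example above.
const sample = `
<text start="193.939" dur="5.58">
really caught between two forces in the Assad regime you were arrested a number
</text>
<text start="199.519" dur="6.8">
of times I think by the Assad regime in 2012 and then you have the ISIS forces
</text>`;

for (const caption of parseCaptions(sample)) {
  console.log(`[${formatTimestamp(caption.start)}] ${caption.text}`);
}
```

The fractional part of “start” is simply dropped, so 193.939 becomes 3:13 rather than being rounded up.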

For those who want a cheat sheet or don’t feel like sitting through the video, here’s an image of the steps you take to get the transcript. If a PDF is more your thing, here’s the same in that format.

Finally, as I was writing this, I learned that you can simply enter the following JavaScript into the Console of the developer tools in your browser.

    
     // kind=asr asks for the auto-generated track; the check skips pages with no transcript URL
     if (yt.config_.TTS_URL) window.location.href = yt.config_.TTS_URL + "&kind=asr&fmt=srv1&lang=en";
     

On a Mac, you enter it here:

  1. Chrome: View > Developer > JavaScript Console
  2. Firefox: Tools > Web Developer > Web Console
  3. Safari: Develop > Show Web Inspector

Note that the JavaScript above ends with lang=en, meaning the language is English. If the transcript is in a different language, replace “en” with the two-letter code of the language you want to retrieve. For example, “ar” for Arabic or “es” for Spanish.
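
If you switch languages often, a small helper saves retyping the query string. This is only a sketch: `buildTranscriptUrl` is a name I made up, it appends the same parameters as the one-liner above, and the example.com address is a placeholder rather than a real transcript URL.

```javascript
// Append the caption parameters from the one-liner to a base transcript URL.
// baseUrl is assumed to already contain a query string, as TTS_URL does above.
function buildTranscriptUrl(baseUrl, lang = "en") {
  return `${baseUrl}&kind=asr&fmt=srv1&lang=${encodeURIComponent(lang)}`;
}

// Placeholder base URL for illustration only.
const exampleBase = "https://example.com/api/timedtext?v=VIDEO_ID";
console.log(buildTranscriptUrl(exampleBase, "es"));

// In the browser Console you would pass the page's own value instead:
// window.location.href = buildTranscriptUrl(yt.config_.TTS_URL, "ar");
```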

Here’s a list of language codes to help you out.

Happy extracting.