Formatting a VTT Caption File into a Transcript with Sublime Text and Multiple Regular Expressions

on February 8, 2018

Do you ever need to use a text editor to apply regular expressions to files? If so, this post is for you! If not, you may wanna skip it and just return to it if you ever need it.

Ever since I started working with data, I’ve occasionally had to format raw text files for a variety of reasons. Sometimes it’s a one-time import to a database system, or a one-time analysis. Often it’s not directly related to a database at all, but it’s part of a business process.

I first learned about regular expressions when formatting a large data file: I needed to remove line endings, and regular expressions were the answer.

Throughout my career, those regular expressions have come in handy. I need them just often enough that I can usually get the job done after a bunch of missteps and cursing.

These days, I use a series of regular expressions to format caption files for my training videos

Here’s my workflow. I use a Mac, and I name the online providers I use in the steps below.

  • I create and edit videos, then upload them to my host (currently Vimeo)
  • After the videos are processed and available, I put in an order with my caption provider (currently Rev.com, one of Vimeo’s recommended partners for captions) to generate captions for those videos and automatically upload the captions to the video
  • After the captions are done, I review the captions in the editor provided by Rev.com. Some of the captions are great, and some of the captions are… hilariously bad, I suppose depending on who did them. I make corrections in the editor, then press an upload button to update the captions in the video

All of that is human-brain-thinky work, no regular expressions needed.

Here’s the thing: I also want a transcript of the corrected work

Rev.com doesn’t make it easy to download a transcript after my edits. I can download a transcript file from the orders page, but it’s before all the fixes, and I’m not going to do the whole editing process twice.

I figured out a workaround to download the edited captions quickly, then process the text so it’s ready to become a transcript.

  • Click ‘Go to video’ on the pop-up in Rev.com
  • Click the ‘Download’ button on Vimeo, then ‘Download captions and subtitles’. It downloads as a .vtt file (but with a .vtt.txt set of extensions for some reason)
  • Open the caption file in Sublime Text 3 on my Mac
  • Open the Command Palette (Command + Shift + P on a Mac)
  • Select my custom formatter (a sequence I set up with RegReplace to apply a series of regular expressions) and hit enter

Voila, my text file is no longer formatted as a VTT file, it is now a block of text!
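If you’d like to see the idea outside of Sublime Text, here’s a minimal sketch of the same kind of pipeline in Python. The patterns are illustrative assumptions about a typical VTT file (a WEBVTT header, optional numeric cue identifiers, timing lines) and are not the exact RegReplace rules I use; the function name vtt_to_text and the sample captions are made up for the example.

```python
import re

# Assumed rules, applied in order. Illustrative only, not the post's actual replacements.
VTT_RULES = [
    # Drop the "WEBVTT" header line at the very start of the file.
    (re.compile(r"\AWEBVTT[^\n]*\n"), ""),
    # Drop each timing line, plus a numeric cue identifier directly above it if present.
    (re.compile(
        r"^(?:\d+[ \t]*\n)?"
        r"(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}[ \t]+-->[ \t]+"
        r"(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}[^\n]*\n?",
        re.MULTILINE), ""),
    # Collapse every run of whitespace (including line breaks) into one space.
    (re.compile(r"\s+"), " "),
]


def vtt_to_text(vtt: str) -> str:
    """Turn VTT caption text into a single block of plain text."""
    # Normalize Windows line endings and strip a leading byte order mark.
    text = vtt.replace("\r\n", "\n").lstrip("\ufeff")
    for pattern, replacement in VTT_RULES:
        text = pattern.sub(replacement, text)
    return text.strip()


sample = """WEBVTT

1
00:00:00.000 --> 00:00:03.500
Welcome to the course.

2
00:00:03.500 --> 00:00:07.000
Today we talk about
indexes.
"""

print(vtt_to_text(sample))
```

The order matters: the timing lines have to go before the whitespace collapse, because the timing pattern relies on line starts to find them.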

From here, I (or an editor, once I get to the point of passing this whole process off to someone else) can add paragraphs and headings to make this a pretty transcript to go below the video.

The only part of this which takes a while is correcting the goofs in the captions. (Although many of the goofs are funny, they’re pretty distracting if you use the captions or try to read the transcript.)

Need to set up something similar? Here are the resources I use

  • Download Sublime Text 3 and get comfy with it
  • Install the RegReplace Sublime Text 3 plugin
  • Follow the RegReplace User Guide to do two things:
    1. Define your replacements in the reg_replace_rules.sublime-settings file.
      • To open this from the Sublime Text 3 menus, go to: Sublime Text -> Preferences -> Package Settings -> RegReplace -> Rules - User
      • The replacements I’m using are here
    2. Set up a sequence to call the replacements. I chose to create an entry for the Command Palette, because that’s one of the few things I know how to use in Sublime Text. This basically means editing the Default.sublime-commands file.
      • There is probably some way to access this in the menus but… I have no idea. My file is at /Users/kendralittle/Library/Application Support/Sublime Text 3/Packages/User
      • Swap in your username unless you are also a user named kendra little
      • The contents of my Default.sublime-commands file are here
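The two-part setup (rules defined by name in one place, then a sequence that calls those names in order) can be sketched in Python as a rough analogy. This is not RegReplace’s file format, and the rule names, patterns, and the run_sequence helper are all hypothetical.

```python
import re

# Part 1: named rules, each a find pattern and a replacement (hypothetical examples).
RULES = {
    "strip_trailing_spaces": {"find": r"[ \t]+$", "replace": ""},
    "join_lines": {"find": r"\n+", "replace": " "},
}

# Part 2: a sequence is just an ordered list of rule names.
SEQUENCE = ["strip_trailing_spaces", "join_lines"]


def run_sequence(text, sequence, rules):
    """Apply each named rule to the text, in the order the sequence lists them."""
    for name in sequence:
        rule = rules[name]  # a KeyError here means the sequence names an undefined rule
        text = re.sub(rule["find"], rule["replace"], text, flags=re.MULTILINE)
    return text


print(run_sequence("one  \ntwo\t\n\nthree", SEQUENCE, RULES))
```

Keeping the rules separate from the sequence means the same rule can be reused in more than one sequence without copying the pattern.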

The good news is that once you do this, everything is pretty darn fast and you don’t have to worry about it much.

And it’s sure a heck of a lot faster than editing the VTT files manually.