Say What? Easily Transcribing Audio and Video Files Using the Google Cloud
We’re producing more audio and video than ever. Video is a great way to communicate, but its utility is limited when it comes to searching and analyzing that content.
In this stream, we’re going to build another tiny cloud project. This time, it’s a quick command-line tool that takes an audio or video file, sends it to the Google Cloud, and gets a detailed text transcription back.
(* Also super useful for subtitles!)
You can watch the live stream on demand 👆 and there’s more detail below…
The “pre-work” done for this stream was a lot more than the last one.
I’ve worked with this API before, but the last time was during the beta for v2. Referencing my old code, I brought forward a few utility functions: an easy way to upload files to Google Cloud Storage, a way to convert the results from the Google Cloud Speech-to-Text API into something a bit more usable, and a way to convert those results into a .srt file for use as video subtitles.
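To give a feel for that last utility, here’s a minimal sketch of the .srt step. It isn’t the code from the stream. It assumes the transcription results have already been boiled down to simple (start_seconds, end_seconds, text) tuples, and the names srt_timestamp and to_srt are made up for illustration.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Build SRT text from (start_seconds, end_seconds, text) tuples.

    The tuple shape is an assumption for this sketch, not the format
    the Speech-to-Text API returns directly.
    """
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        # Each cue: a counter, a time range, the caption text, then a blank line.
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_srt([(0.0, 1.5, "Say what?"), (1.5, 3.0, "Transcribing audio.")]))
```

Each cue in a .srt file is just a counter, a time range, and the caption text, so most of the real work is grouping the transcribed words into sensible chunks before a step like this.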
Critically, I also brought forward the hard-fought lessons on audio file formats. The Speech-to-Text API is a bit picky when it comes to the file formats it accepts for transcription.
The code for the stream is available at https://github.com/marknca/tiny-cloud-projects
The documentation for the RecognitionConfig object
Synchronous and asynchronous calls for the API
During this stream, I ran the code through a Jupyter notebook instead of flipping back and forth between my code editor and the command line. I think this worked really well, as I could also integrate more detailed comments and documentation with Markdown blocks.
Reviewing the video, it definitely looks like a streamlined viewer experience. What did you think?
Let me know on Twitter, where I’m @marknca.