Most journalists find transcribing interviews frustrating, and it’s not hard to see why. It is time-consuming and unwieldy when you have to type out five interviews for a single story on deadline. The good news is that there are apps and software that can transcribe these interviews automatically, in minutes. These tools use artificial intelligence, automatic speech recognition and speech-to-text technologies to reproduce the words.
They work on two models. One lets you upload a recorded clip (audio or video) to the platform, which emails you the transcribed document when it’s ready. The other lets you record the interview in the app and transcribes it in real time, much like a dictation-taker.
For this week’s newsletter, I am listing digital tools that journalists in my network use to transcribe interviews or to generate subtitles for video assignments. I have incorporated their feedback along with reviews I scouted online. Plus, I signed up for these platforms to find out how accurately they interpret Indian accents and whether they are worth the time and money if you decide to buy one of their subscription plans.
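If you want to put a number on accuracy for your own recordings, here is a minimal Python sketch of one way to do it. It assumes you have hand-corrected a short stretch of the transcript to use as a reference. The function name `word_accuracy` is made up for illustration, and this is a rough word-matching estimate, not necessarily how the percentages in this newsletter were arrived at.

```python
from difflib import SequenceMatcher


def word_accuracy(reference: str, transcript: str) -> float:
    """Return the share of reference words that the transcript got right, in order.

    Hypothetical helper: a rough estimate, not a formal word-error-rate metric.
    """
    ref_words = reference.lower().split()
    hyp_words = transcript.lower().split()
    if not ref_words:
        raise ValueError("reference text must contain at least one word")

    # Align the two word lists and count the words that line up.
    matcher = SequenceMatcher(a=ref_words, b=hyp_words, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref_words)


if __name__ == "__main__":
    # Made-up sentences, built around the kind of slips described in this newsletter.
    reference = "the cashew farmers have many thought processes"
    transcript = "the cash you farmers have maniatty thought processes"
    print(f"{word_accuracy(reference, transcript):.0%}")
```

Run it on a minute or two of audio per tool rather than a whole interview; that is usually enough to see which tool copes best with your speakers.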
OTTER.AI: This was a crowd favourite and it’s my go-to transcription tool as well. On the free or ‘basic’ plan, you can record and transcribe up to 10 hours of interviews per month in real time. However, you can import only three files of not more than 40 minutes each, which means you may have to clip your files first. If you buy the ‘pro’ plan, you can record for 100 hours, import unlimited files of up to four hours each and save 200 new words in its vocabulary every month.
And what about accuracy? For speech in an Indian accent, it was about 80% accurate. For American or British accents, it is likely to be more accurate. The quality and clarity of the audio make a big difference too.
Pros: It identifies different voices well and provides a timestamp at the start of a sentence or a paragraph. The interface is easy to use, with an excellent text editor and smooth playback; simply place the cursor on a word you want to hear again. It transcribes Google Meet calls on the go, but you need a paid plan to do the same with Zoom sessions.
Cons: The transcription time for a 34-minute clip was about 10 minutes. While the website says the pro plan costs Rs 600, the app quotes Rs 1,100; not sure what’s going on there.
Available on Android, iOS, and web.
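To plan the clipping for Otter’s free plan, you can work out the cut points before opening an audio editor. The Python sketch below is only that, a sketch: `clip_ranges` is a hypothetical helper, and the 40-minute default simply mirrors the import limit mentioned above.

```python
def clip_ranges(total_minutes: float, max_clip_minutes: float = 40) -> list[tuple[float, float]]:
    """Split a recording into consecutive (start, end) ranges, in minutes.

    Hypothetical helper; the 40-minute default is the free-plan import limit
    quoted in this newsletter, not a value read from Otter itself.
    """
    if total_minutes <= 0 or max_clip_minutes <= 0:
        raise ValueError("durations must be positive")

    ranges = []
    start = 0
    while start < total_minutes:
        # Each clip runs for the maximum length or until the recording ends.
        end = min(start + max_clip_minutes, total_minutes)
        ranges.append((start, end))
        start = end
    return ranges


if __name__ == "__main__":
    for start, end in clip_ranges(95):
        print(f"clip from minute {start} to minute {end}")
```

A 95-minute interview comes out as three clips, which would use up all three free imports in one go.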
GOOGLE LIVE TRANSCRIBE: This came as a surprise. The transcription accuracy was almost 95% in English, 98% in Hindi and 90% in Tamil. Yes, it supports seven regional languages—from Hindi and Urdu to Bengali, Marathi, Tamil and Telugu. The accuracy of the app can be attributed to the fact that it was designed primarily as an accessibility tool for the deaf.
It’s a speech-to-text app. So, you can either record your interview on it or play the file from an external speaker. Make sure the source of the sound is not more than two feet away in either case.
Pros: The accuracy is high and so is the speed. It doesn’t ‘hang’ or skip words, which is what I noticed with Google Docs when I used its speech-to-text option. The interface is simple, and it is free to use.
Cons: It doesn’t have a text editor or a playback feature, which means you need to export the copy to clean it up. You can’t save the file in the app, so you have to highlight the text and send it over Gmail, for instance. The text auto-deletes after three days.
Available on Android.
GOOGLE PINPOINT: It’s a multi-purpose tool designed by Google for journalists, to help them screen, transcribe and study investigation material. I want to highlight my experience here. I recorded an audio file in my own voice and submitted it for transcription in UK English first and then in US English. There was quite a difference. It interpreted US English far better, with an accuracy of about 85%.
Pros: The turnaround is quick; the transcription was ready in two minutes. The copy has timestamps and playback buttons. You can upload unlimited files for free. It has a simple interface and supports a few regional Indian languages such as Marathi, Tamil, Malayalam, and Kannada.
Cons: The transcribed copy is exported as a PDF, so editing it might be tricky.
Available on web.
TRINT: Like any AI-powered audio transcription tool out there, it spares you the grunt work and a lot of time. But given the hype around it, I was a bit disappointed with the transcription results, where simple, common words like ‘cashew’ became ‘cash you’ and ‘many’ ended up as ‘maniatty’. I did two tests and I would put its transcription accuracy at 70%.
Pros: It transcribes pretty fast and identifies different speakers. You can choose the playback speed from 0.2x to 2.0x, and the text editor has lots of features (find/replace, highlight, strikethrough). You can upload large files (of up to 3 hours and/or 3GB) and generate subtitles for Adobe Premiere® Pro by adding an extension, and the paid version transcribes Zoom calls too.
Cons: Files took a long time to upload, longer than on all the other platforms I tried. Free use is limited to three files of any size and to seven days, after which the platform deducts the fee for the plan you have chosen. It didn’t process a .wav file even though it claims to support that format.
Available on iOS and web.
DESCRIPT: This one’s handy if you are working on a project with multiple collaborators; it is an all-in-one tool for screen recording, transcription, editing and publishing. As with Trint, I had heard and read good reviews of it, but I didn’t find my transcription up to the mark. ‘Aggression’ became ‘griffin’ and ‘part of’ became ‘butterfly’. I must mention here that I ran the same set of files and interviews on all these tools.
Pros: From uploading to transcribing, this is a real time-saver. It displays the timecode for every word you click, and a dictionary pops up when you double-click a word. You can insert a comment anywhere for your collaborator and choose a playback speed between 0.5x and 3x to clean up funny errors.
Cons: The interface is dull but functional. It does not support .wav files. During the trial period, only three hours of material are allowed, after which you can choose to pay USD 12 per month or more, depending on the plan you sign up for.
Available as a desktop app.
TEMI: It lives up to the legacy of its creators, who are a team of PhD speech scientists and MIT engineers. The transcription accuracy was above 90% but the remaining 10% was hilarious. Cue ‘dive processes we’ll go’ instead of ‘thought processes’ and ‘sewing’ for ‘showing’.
Pros: No sign-up was required on the platform. The turnaround time from importing a 34-minute-long file to receiving the transcribed copy over email was a mere four minutes. The interface is as clean as Otter’s. It detects different speakers. If it can’t process a word or a bunch of words, it types out [inaudible] instead, which is much better than silently skipping them.
Cons: A lot of words showed up highlighted in brown and I don’t know what purpose the highlighting serves. Unlike Otter, where you can start the playback by just dropping the cursor on a word, here you need to start and stop playback manually with the play button. It offers one-time free use for a file of up to 45 minutes, but after that, every transcription costs USD 0.25 per minute.
Available on Android, iOS and web.
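Per-minute pricing adds up quickly, so it helps to estimate the bill before uploading a batch. The Python sketch below uses the Temi figures quoted above: USD 0.25 per minute and one free file of up to 45 minutes. The function name `temi_cost_usd` is hypothetical, and the sketch assumes billing by the exact minute, which may not match how the platform rounds.

```python
def temi_cost_usd(
    file_minutes: list[float],
    rate_per_minute: float = 0.25,
    free_file_limit: float = 45,
    free_file_used: bool = False,
) -> float:
    """Estimate the cost of transcribing a batch of files.

    Hypothetical helper; the rate and the free-file rule are taken from this
    newsletter and are assumptions about the actual billing.
    """
    total = 0.0
    for minutes in file_minutes:
        # The first file that fits within the free limit costs nothing.
        if not free_file_used and minutes <= free_file_limit:
            free_file_used = True
            continue
        total += minutes * rate_per_minute
    return round(total, 2)


if __name__ == "__main__":
    # Five interviews of 34 minutes each for a single story.
    print(temi_cost_usd([34] * 5))
    print(temi_cost_usd([34] * 5, free_file_used=True))
```

For five 34-minute interviews, that works out to USD 34 if the free file is still available and USD 42.50 if it is not.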
SONIX: It’s one of the “industry-leading, speech-to-text algorithms” but looking at what it did with my audio files, I would give it an accuracy score of 85% to 90%. I mean, how would you feel if “three-four days travelling” became “California darling”?
Pros: Quick to process and quick to transcribe, it supported the .wav file I was having trouble with on platforms like Trint and Descript. The text editor is feature-packed and can be customised. The playback begins at whichever word you place the cursor on. I haven’t tried this feature but you can also align a transcript with its audio. You can transcribe in Hindi too.
Cons: Signing up is tedious and you need to enter credit card details before you can activate the trial period. You’ll have to pay USD 5-10 after using up three hours of free transcription service.
TL;DR: Which audio transcription tool would I vote for? I liked Otter and Google Live Transcribe as they are hassle-free and produce good-enough copies for me to clean up and use later. But your experience and expectations could be different from mine. So, try these apps for free before deciding to buy a paid plan.
A friendly reminder: No matter what you decide, remember these are just tools to speed up your workflow. Don’t forget to listen to the interviewee carefully and make mental and physical notes because if the recording fails, what will you transcribe? Tweet us your pick @inoldnews.