YouTube's Automatic Captioning Just Got Even Smarter

24 March 2017, 15:28 | Updated: 17 July 2017, 12:25

We the Unicorns

By Josh Lee

YouTube's automatic captioning can now recognise laughter, applause and music in your videos thanks to some nifty artificial intelligence.

YouTube's automatic captioning has grown in sophistication since its introduction in 2009. Now, with a billion videos making use of the service, the platform has upped the ante by introducing automatic sound effect captioning.

Automatic captioning can now identify laughter, applause and music, according to YouTube's blog.


These changes will help provide important context clues for users who are hard of hearing. Deaf YouTubers have long campaigned for YouTube and Creators to improve captioning, which - despite its many plus-points - is still pretty gaffe-prone.

According to the blog post, sound effect captioning is just the start of YouTube's drive to make captions even more accessible. The company hopes that it "will spur further work and discussion in the community around improving captions using not only automatic techniques, but also around ways to make creator-generated and community-contributed caption tracks richer."

The three sounds were chosen because what they convey in a caption track is relatively unambiguous.

YouTube explained: “While the sound space is obviously far richer and provides even more contextually relevant information than these three classes, the semantic information conveyed by these sound effects in the caption track is relatively unambiguous, as opposed to sounds like [RING] which raises the question of ‘what was it that rang – a bell, an alarm, a phone?’”

You can find out more about the science behind this development on YouTube's blog. Or you can see it in action in this interview with Janelle Monae and Pharrell Williams: