A.I. to subtitle Netflix content!
Sajid Ali -
Online streaming platforms have broken language barriers and expanded their global reach. That was possible largely because of the subtitles attached to a film or series. For example, subtitles helped the Academy Award-winning “Parasite” become a global phenomenon. Audiences these days are comfortable watching a show that is not in their native language, and OTT platforms have played a large part in that. But one challenge every OTT platform faces is making the translations accurate.
Netflix is well known for its rigorous process of scrutinizing and discarding inaccurate subtitles. A group of ML researchers at Netflix has now introduced a method to overcome this hurdle. They call it “Automatic Pre-Processing”, or APP, a conveniently catchy acronym. The researchers claim that translations produced with APP will be closer to natural usage in the target language.
Translating from English into a low-resource language is quite challenging in the Black-Box MT (BBMT) setting, where the translation system can only be used as-is. The method introduced by the Netflix researchers aims to improve translation quality in this setting through Automatic Pre-Processing (APP) of the source text using sentence simplification.
Let’s look at an example. If the sentence “The Vice President should feel free to jump in” is translated into Hindi using Google Translate, the result reads as “Vice President should feel free to jump inside”. The system was unable to correctly translate the idiomatic, non-compositional phrase “jump in”.
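To picture where pre-processing sits, here is a minimal sketch of the "simplify first, then translate" idea. Everything in it is an assumption made for illustration: `toy_simplify` is a tiny phrase table standing in for the learned simplifier, and `fake_black_box_translate` is a placeholder for a real translation service that we can call but not modify. Only the “jump in” to “take part” rewrite comes from the researchers' example.

```python
from typing import Callable

# Hypothetical phrase table standing in for the learned simplifier (APP).
# The real system is a trained model; this toy only knows one rewrite.
TOY_SIMPLIFICATIONS = {"jump in": "take part"}


def toy_simplify(sentence: str) -> str:
    """Replace known idiomatic phrases with simpler wording."""
    for phrase, simpler in TOY_SIMPLIFICATIONS.items():
        sentence = sentence.replace(phrase, simpler)
    return sentence


def fake_black_box_translate(sentence: str) -> str:
    """Placeholder for a black-box MT system (assumption, not a real API).

    It only tags the input so the data flow is visible.
    """
    return f"<hi> {sentence}"


def translate_with_app(
    sentence: str,
    simplify: Callable[[str], str],
    black_box_translate: Callable[[str], str],
) -> str:
    """Simplify the source sentence, then hand it to the black-box MT system."""
    return black_box_translate(simplify(sentence))


source = "The Vice President should feel free to jump in"
print(translate_with_app(source, toy_simplify, fake_black_box_translate))
```

The point of the design is that the translation system itself is never touched; only its input changes.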
That looks like an issue, though not a grave one. Machine translation (MT) systems for such languages are usually trained on smaller datasets, which produces results that drift from the context. The challenging part is handling phrases, idioms, and complex wording. To be precise, a back-translated sentence reads differently from the natural source sentence. With that in mind, the researchers build on the observation that translating back-translations is easier than translating the original source sentences.
APP builds on the observation that human reference translations become simpler when they are back-translated into the original language (e.g. “jump in” is simplified to “take part”). This observation led the Netflix team to two notions:
Back-translating the human reference translations to the source language gives a simplified version of the original source.
This simplification can be learned by training a sequence-to-sequence (S2S) model.
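The two notions above suggest how training data for the simplifier could be assembled: pair each original source sentence with the back-translation of its human reference translation. The sketch below is a rough illustration under stated assumptions, not the researchers' code. The function `build_simplification_pairs`, the toy lookup table, and the placeholder reference strings are all hypothetical; a real setup would call an actual MT system for the back-translation step.

```python
from typing import Callable, List, Tuple


def build_simplification_pairs(
    sources: List[str],
    references: List[str],
    back_translate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Pair each source sentence with the back-translated reference.

    Each pair is (original source, simplified source), which is the kind of
    input/target data a sequence-to-sequence simplifier could be trained on.
    """
    if len(sources) != len(references):
        raise ValueError("sources and references must be aligned one-to-one")
    return [(src, back_translate(ref)) for src, ref in zip(sources, references)]


# Toy back-translator (assumption): a lookup table instead of a real MT call.
# The keys are placeholders for human reference translations.
TOY_BACK_TRANSLATIONS = {
    "<reference translation 1>": "The Vice President should feel free to take part",
}


def toy_back_translate(reference: str) -> str:
    return TOY_BACK_TRANSLATIONS[reference]


pairs = build_simplification_pairs(
    ["The Vice President should feel free to jump in"],
    ["<reference translation 1>"],
    toy_back_translate,
)
for original, simplified in pairs:
    print(f"{original!r} -> {simplified!r}")
```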
The approach was evaluated on the GIGS, WikiLarge and Open Subtitles datasets. The GIGS dataset comes from subtitles appearing on 12,301 TV shows and movies from a subscription VOD provider.
The researchers used the Transformer architecture through the tensor2tensor library. All experiments used this architecture with 6 blocks in both the encoder and the decoder, running on 4 NVIDIA V100 GPUs.
Netflix’s current focus is on simplifying English-language subtitles, but the model could in principle be used for other languages too.
Netflix states that errors in subtitles may be subtle rather than critical, but they can still hurt user engagement. There are times when Netflix rejects subtitles that are grammatically correct but fall short on simple phrases and colloquialisms.
With AI widely regarded as the future, we can’t ignore the possibility that it will eventually take over almost every field of work. But translating from a native language into English will be a harder task and may require thorough research. The current model may succeed because its focus is mainly on English source text. Even so, translating multiword expressions and phrases will remain tricky.
Everyone has experienced the results of regional translations. So do you think using AI to subtitle content on OTT platforms will do justice to the original intent? Please let us know in the comments below.