Over 330,000 videos are currently available through the Europeana website, and this number will grow in the future. Video is a rich medium, but as a combination of different components - images and sound - it can be difficult to make it accessible. One way to address this is to add subtitles and translations to the videos, something which the Europeana Subtitled project worked on. This has included developing resources which offer cultural heritage professionals guidance about subtitling and translating audiovisual collections.
To create the Europeana Subtitled AI pipeline, the project created and integrated two systems: one for automatic speech recognition (ASR) and one for machine translation (MT). The first transcribes audio into the corresponding text and formats it as closed captions. The second translates the transcribed text from a local EU language into English and formats it as subtitles.
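To illustrate what "formatting into closed captions" can involve, here is a minimal Python sketch that turns timed segments of recognised speech into SubRip (SRT) cues. The choice of SRT, the `Segment` structure and the function names are assumptions made for illustration only; the source does not say which caption format or data structures the project's pipeline uses.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A timed piece of recognised speech (hypothetical structure)."""
    start: float  # start time in seconds
    end: float    # end time in seconds
    text: str


def format_timestamp(seconds: float) -> str:
    """Render a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Number each segment and join the cues into one SRT document."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{format_timestamp(seg.start)} --> {format_timestamp(seg.end)}\n"
            f"{seg.text}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(cues)
```

In a setup like this, the same function could serve both stages: the ASR output would yield captions in the original language, and the translated text, kept on the same timings, would yield English subtitles.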
The subtitling pipeline allows data providers to generate automatic captions and English subtitles for thousands of videos in seven European languages: German, Greek, Spanish, Italian, Dutch, Romanian and Slovenian. Approximately 250 hours of video were post-edited by professionals, and the results were compared using two metrics commonly used for the evaluation of machine translation: Translation Edit Rate (TER) and the Bilingual Evaluation Understudy (BLEU).
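As a rough illustration of the idea behind TER, the Python sketch below counts the word-level insertions, deletions and substitutions needed to turn a machine output into a reference text, and divides by the reference length. This is a simplification: full TER also counts block shifts of word sequences as single edits, which this sketch omits. It also assumes that the post-edited text serves as the reference, which the source does not state explicitly. The function names are hypothetical.

```python
def word_edit_distance(hypothesis: list[str], reference: list[str]) -> int:
    """Minimum number of word insertions, deletions and substitutions."""
    # prev[j] holds the distance between the hypothesis words seen so far
    # and the first j reference words.
    prev = list(range(len(reference) + 1))
    for i, hyp_word in enumerate(hypothesis, start=1):
        curr = [i]
        for j, ref_word in enumerate(reference, start=1):
            cost = 0 if hyp_word == ref_word else 1
            curr.append(min(
                prev[j] + 1,         # drop a hypothesis word
                curr[j - 1] + 1,     # add a missing reference word
                prev[j - 1] + cost,  # keep or substitute a word
            ))
        prev = curr
    return prev[-1]


def simplified_ter(hypothesis: str, reference: str) -> float:
    """Edits per reference word; lower means less post-editing needed."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(hypothesis.split(), ref_words) / len(ref_words)
```

For example, `simplified_ter("the cat sat on mat", "the cat sat on the mat")` needs one inserted word against a six-word reference, giving roughly 0.17. BLEU works in the opposite direction: it rewards word sequences that the machine output shares with the reference, so higher scores are better.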
The pipeline that automatically produces the subtitles was developed by the project’s technical partners, FBK and Translated, who adapted and optimised the ASR and MT engines. It is an independent third-party service, currently integrated with Europeana via the Europeana APIs. The pipeline will be maintained for at least three years after the end of the project and will be developed further within the AI4Europeana project.
To explain the workings of the Europeana Subtitled AI pipeline, project partners created the Europeana Subtitled training suite. It consists of video tutorials and guidelines that outline the impact of automatic speech recognition and machine translation technologies, and the use of the developed tools, for the cultural heritage sector. The training suite was used for the instructor-led training sessions in October and November 2022.
The resources below, developed by the project, help cultural heritage professionals and aggregators to understand:
The steps you need to take to enrich audiovisual collections with subtitles and translation
Which variables to take into account to determine the usability of audiovisual collections, from training AI to publication with subtitles and translations
Where to find more information in case you want to start working on subtitles and translation for audiovisual collections
An introduction to Automatic Speech Recognition and Machine Translation
Subtitling has been a manual process since its introduction in the early 1900s. Today, computers can interpret the audio and guess what is being said through Automatic Speech Recognition (ASR). This transforms speech into text, which can then be used for Machine Translation (MT). To an outsider this might appear easy, but generating ASR and MT output is very complex: so complex that humans are still needed to improve and validate the groundwork that machines can do. In the video below, researchers Mauro Cettolo and Matteo Negri explain what happens in the black box of ASR and MT. You can also explore the slides.