Europeana Subtitled AI pipeline and training suite

Adding subtitles and translations can improve the accessibility and visibility of audiovisual heritage collections. The Europeana Subtitled project created an AI pipeline and a training suite to support this work. Explore them below.

Over 330,000 videos are currently available through the Europeana website, and this number will grow in the future. Video is a rich medium, but because it combines images and sound, it can be difficult to make accessible. One way to address this is to add subtitles and translations to videos, which is what the Europeana Subtitled project set out to do. This work included developing resources that offer cultural heritage professionals guidance on subtitling and translating audiovisual collections.

AI Pipeline

To create the Europeana Subtitled AI pipeline, the project developed and integrated two systems: one for automatic speech recognition (ASR) and one for machine translation (MT). The ASR system transcribes the audio into text and formats it as closed captions. The MT system translates the transcribed text from the original EU language into English and formats it as subtitles.
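The caption-formatting step can be illustrated with a minimal sketch. It assumes the ASR output is a list of timed segments (start and end in seconds plus text) and that captions are written in the SubRip (SRT) format, wrapped at 42 characters per line. The `Segment` structure, the function names and the 42-character limit are illustrative assumptions, not details of the project's pipeline.

```python
from __future__ import annotations

import textwrap
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical timed ASR segment: start/end in seconds plus transcript text."""
    start: float
    end: float
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment], max_chars: int = 42) -> str:
    """Render segments as numbered SRT cues, wrapping text at max_chars per line.

    Assumption: each segment fits in one cue; a real captioning system
    would also split long segments across several cues.
    """
    cues = []
    for index, seg in enumerate(segments, start=1):
        lines = textwrap.wrap(seg.text, width=max_chars)
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            + "\n".join(lines)
        )
    # SRT separates cues with a blank line.
    return "\n\n".join(cues) + "\n"


# Usage with two toy segments (invented for illustration):
example = [
    Segment(0.0, 2.5, "Welcome to the archive."),
    Segment(2.5, 6.04, "This recording is part of the collection."),
]
print(to_srt(example))
```

The same formatter could serve both outputs: once for the captions in the original language and once for the English subtitles, since translation keeps the segment timings.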

The subtitling pipeline allows data providers to generate automatic captions and English subtitles for thousands of videos in seven European languages: German, Greek, Spanish, Italian, Dutch, Romanian and Slovenian. Approximately 250 hours of video were post-edited by professionals, and the automatic output was compared with the post-edited versions using two metrics commonly used to evaluate machine translation: Translation Edit Rate (TER) and BiLingual Evaluation Understudy (BLEU).
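To make the two metrics concrete, here is a minimal Python sketch that compares a machine output (the hypothesis) with its post-edited version (the reference). Lower TER means fewer edits were needed; higher BLEU means the output is closer to the reference. It is a simplification, not the project's evaluation code: `simple_ter` counts only word insertions, deletions and substitutions, whereas real TER also counts block shifts, and `sentence_bleu` uses a single reference, whitespace tokenisation and no smoothing. For reportable scores, an established tool such as sacreBLEU is the safer choice.

```python
import math
from collections import Counter


def edit_distance(hyp: list[str], ref: list[str]) -> int:
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete h from the hypothesis
                curr[j - 1] + 1,         # insert r into the hypothesis
                prev[j - 1] + (h != r),  # substitute (free if words match)
            )
        prev = curr
    return prev[-1]


def simple_ter(hypothesis: str, reference: str) -> float:
    """Edits needed to turn the hypothesis into the (non-empty) reference,
    divided by the reference length. Simplified: no block shifts."""
    hyp, ref = hypothesis.split(), reference.split()
    return edit_distance(hyp, ref) / len(ref)


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis: str, reference: str, max_n: int = 4) -> float:
    """Single-reference BLEU without smoothing: geometric mean of clipped
    n-gram precisions (n = 1..max_n) multiplied by a brevity penalty."""
    hyp, ref = hypothesis.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        total = sum(hyp_counts.values())
        # Clip each hypothesis n-gram count by its count in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        if total == 0 or matches == 0:
            return 0.0  # any zero precision makes unsmoothed BLEU zero
        log_precisions.append(math.log(matches / total))
    # Penalise hypotheses shorter than the reference.
    brevity_penalty = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


# Toy example: the post-editor inserted one missing word.
machine = "the cat sat on mat"
post_edited = "the cat sat on the mat"
print(round(simple_ter(machine, post_edited), 3))     # → 0.167
print(round(sentence_bleu(machine, post_edited), 3))  # → 0.579
```

In practice, both metrics are computed over a whole test set rather than a single sentence; corpus-level BLEU pools n-gram counts across all segments before taking the geometric mean.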

The pipeline that automatically produces the subtitles was developed by the project's technical partners, FBK and Translated, who adapted and optimised the ASR and MT engines. It runs as an independent third-party service and is currently integrated with Europeana via the Europeana APIs. The pipeline will be maintained for at least three years after the end of the project and will be developed further within the AI4Europeana project.

Training suite

To explain how the Europeana Subtitled AI pipeline works, project partners created the Europeana Subtitled training suite. It consists of video tutorials and guidelines that outline the impact of automatic speech recognition and machine translation technologies on the cultural heritage sector and explain how to use the tools the project developed. The training suite was used for instructor-led training sessions in October and November 2022.

The resources below, developed by the project, help cultural heritage professionals and aggregators to understand:

The steps you need to take to enrich audiovisual collections with subtitles and translations
Which variables to take into account to determine the usability of audiovisual collections, from training AI to publication with subtitles and translations
Where to find more information if you want to start working on subtitles and translations for audiovisual collections

An introduction to Automatic Speech Recognition and Machine Translation

Subtitling has been a manual process since its introduction in the early 1900s. Today, computers can interpret the audio and infer what is being said through Automatic Speech Recognition (ASR). ASR transforms speech into text, which can then be used for Machine Translation (MT). To an outsider this might seem a simple process, but generating ASR and MT output is very complex: so complex that humans are still needed to improve and validate what the machines produce. In the video below, researchers Mauro Cettolo and Matteo Negri explain what happens inside the black box of ASR and MT. You can also explore the slides.

The video above discusses the different types of video involved in ASR and MT. These are:

Videos to train ASR and MT systems. These are videos that have already been transcribed and translated.
Videos to train the systems on domain-specific topics, for example medical jargon, legal phrasing or even particular speaker accents.
Videos to be processed by the trained systems, which add ASR captions and MT subtitles to them.

Several other variables determine whether audiovisual content is eligible and usable for each of these purposes. They are discussed in the videos below.

Aggregation to Europeana: content and metadata of audiovisual heritage

In the video below, Ilektra Osmani of the Europeana Foundation explains the technical requirements that audiovisual collections must meet to be ingested into Europeana and used for ASR and MT. You can also explore the slides.

Find out more about the Europeana Publishing Framework and supported MIME types.

Copyright considerations in subtitling activities

The usability of your audiovisual collection can be limited by different kinds of rights. In the video below, Ariadna Matas of the Europeana Foundation explains the rights you need to take into account. You can also explore the slides.

Find out more about Copyright when sharing data with Europeana.

Content of the video

A final determining factor to consider when subtitling or translating videos is the content itself. Consider whether the audio quality of the video is good enough. For example:

How clearly can individual voices be distinguished from other sounds?
How clearly do speakers articulate their words? Do they have an accent that makes it harder to understand what they are saying?
Is the video from the same period as the data used to train the systems, or do people in it speak differently from the speakers in the training dataset?

Whether a video can be used for ASR and MT may need to be decided case by case. For some collections, the decision can be made at the collection level, for example for a series of news reports by the same reporter.

Making improvements

To improve and validate the automatically generated subtitles and closed captions, the project partners created the Subbit! platform, which lets cultural heritage professionals and the general public post-edit the machine-generated enrichments through online editing campaigns. Find out more.