
Posted on Tuesday February 15, 2022

Updated on Tuesday February 15, 2022

Heritage Metadata Automatic Translation system

The Heritage Metadata Automatic Translation system is a set of APIs for language detection and translation of text data fields submitted directly or via Open Archives Initiative (OAI) feeds. The system was developed by Pangeanic under the Europeana Generic Services project Europeana XX: Century of Change.

Title:
A screenshot from the API Swagger page, which shows available endpoints for translation of the text segments or OAI feed as well as language detection.
Creator:
Pangeanic
Date:
2021

About

The Heritage Metadata Automatic Translation system (HM ATS) is a machine translation system which supports translation of Open Archives Initiative (OAI) and Europeana Data Model (EDM) records from European languages (including Spanish, Catalan, French, German, Dutch, Italian, Greek, Swedish, Czech and Polish) to English. The system is available via an API. It includes automatic detection of the source language when it is not known from the data.

Alongside the translation and language detection results, the system provides a confidence level which indicates how accurate the results are. The system also features a validation framework, allowing users to import translations and validate them one by one. Users can propose better translations and rate the quality of the automatic result. Finally, user feedback can be exported and used to improve the quality of the automatic translation.
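As an illustration of how a client might act on the confidence level, the following is a minimal sketch and not the HM ATS API itself. The `TranslationResult` structure, its field names, the 0.0 to 1.0 confidence scale, the 0.8 threshold and the sample records are all assumptions made up for this example; the actual response format is defined by the API documentation.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class TranslationResult:
    """Hypothetical shape of one translated metadata field (not the real HM ATS schema)."""
    source_text: str
    detected_language: str  # assumed to be a language code such as "es"
    translation: str
    confidence: float       # assumed to lie between 0.0 and 1.0


def split_by_confidence(
    results: Iterable[TranslationResult], threshold: float = 0.8
) -> Tuple[List[TranslationResult], List[TranslationResult]]:
    """Return (accepted, needs_review) based on an assumed confidence threshold."""
    accepted: List[TranslationResult] = []
    needs_review: List[TranslationResult] = []
    for result in results:
        if result.confidence >= threshold:
            accepted.append(result)
        else:
            needs_review.append(result)
    return accepted, needs_review


# Invented sample data, for illustration only.
sample = [
    TranslationResult("Retrato de una mujer", "es", "Portrait of a woman", 0.93),
    TranslationResult("Brief aan de burgemeester", "nl", "Letter to the mayor", 0.61),
]
accepted, needs_review = split_by_confidence(sample)
```

In a workflow like this, the low-confidence records could be the ones imported into the validation framework first, so that manual review effort goes where the automatic result is least certain.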

Benefits

The system:

  • Supports automatic translation of metadata text fields from European languages to English.

  • Ensures that translation of metadata supplied as an OAI feed occurs instantly without interrupting the process that is already running (‘on-the-fly’).

  • Can be used to support data enrichment and improve the discoverability of textual metadata. 

  • Provides users with an overview of translation results and improves the engines using their feedback.

Technical information 

HM ATS can be used by cultural heritage institutions and aggregators as a standalone service. It was developed in Python using the Sanic framework. At its core are translation engines, which are state-of-the-art AI models fine-tuned for Machine Translation tasks. The validation platform is built on top of the open-source solution Label Studio.

This tool applies the automatic translations in the aggregators’ environment. These are then submitted back to Europeana. In order to distinguish the enrichments produced by the tool from the original metadata supplied by the data partner, a special EDM profile was created.

All the Machine Translation engines built were released publicly as Docker images on Docker Hub under the Apache 2.0 licence and shared via ELRC-Share. HM ATS code is available in the public repository and the API service. Documentation can be found here. The validation system is available here.

Use the tool

The platform is maintained by Pangeanic and the Jewish Heritage Network. It will also be developed further under the Europeana Translate project. If you would like to use the system or learn more about the validation platform, please contact info@jhn.ngo or a.raginsky@pangeanic.com.

Tutorials for validation system usage are available here.
