About
The Heritage Metadata Automatic Translation system (HM ATS) is a machine translation system that translates Open Archives Initiative (OAI) and Europeana Data Model (EDM) records from European languages (including Spanish, Catalan, French, German, Dutch, Italian, Greek, Swedish, Czech and Polish) into English. The system is available via an API and automatically detects the source language when it is not known from the data.
Alongside the translation and language detection results, the system provides a confidence level that indicates how accurate the results are likely to be. The system also features a validation framework, which lets users import translations and validate them one by one. Users can propose better translations and rate the quality of the automatic result. Finally, user feedback can be exported and used to improve the quality of the automatic translation.
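As a rough illustration of how a client might act on the confidence level, the sketch below splits results into those accepted as-is and those sent for manual validation. The response shape (`text`, `source_language`, `translation`, `confidence`), the sample payload and the 0.8 threshold are all assumptions made for this example; they are not taken from the HM ATS API documentation.

```python
import json
from dataclasses import dataclass


@dataclass
class TranslationResult:
    """One translated field; the attribute names are hypothetical."""
    text: str
    source_language: str
    translation: str
    confidence: float


def parse_results(payload: str) -> list[TranslationResult]:
    """Parse a JSON array of results in the assumed response shape."""
    return [TranslationResult(**item) for item in json.loads(payload)]


def split_by_confidence(results, threshold=0.8):
    """Separate results to accept from those to route to manual validation."""
    accepted = [r for r in results if r.confidence >= threshold]
    to_review = [r for r in results if r.confidence < threshold]
    return accepted, to_review


# Invented sample payload, for illustration only.
SAMPLE = json.dumps([
    {"text": "Libro de oraciones", "source_language": "es",
     "translation": "Prayer book", "confidence": 0.93},
    {"text": "Rękopis", "source_language": "pl",
     "translation": "Manuscript", "confidence": 0.61},
])

accepted, to_review = split_by_confidence(parse_results(SAMPLE))
```

In a workflow like this, only the low-confidence items would need to go through the validation framework, which keeps the manual effort focused on the results most likely to be wrong.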
Benefits
The system:
Supports automatic translation of metadata text fields from European languages to English.
Ensures that metadata supplied as an OAI field is translated instantly (‘on the fly’), without interrupting the process that is already running.
Can be used to support data enrichment and improve the discoverability of textual metadata.
Provides users with an overview of translation results and improves the engines using their feedback.
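One way to picture 'on-the-fly' translation is as a step that enriches records while they stream through an existing pipeline, rather than as a separate batch job. The following is a minimal sketch under stated assumptions: the record layout, the field names (`title`, `description`), the `_en` suffix and the `fake_translate` stand-in are hypothetical and do not reflect the actual HM ATS implementation.

```python
from typing import Callable, Iterable, Iterator


def translate_on_the_fly(
    records: Iterable[dict],
    translate: Callable[[str], str],
    fields: tuple = ("title", "description"),
) -> Iterator[dict]:
    """Yield each record with English versions of the chosen fields added."""
    for record in records:
        enriched = dict(record)  # copy so the incoming record is not modified
        for name in fields:
            value = record.get(name)
            if value:  # skip missing or empty fields
                enriched[f"{name}_en"] = translate(value)
        yield enriched


# Stand-in for a call to the translation service (hypothetical).
_DEMO = {"Carte ancienne": "Old map"}


def fake_translate(text: str) -> str:
    return _DEMO.get(text, text)


records = [{"id": "1", "title": "Carte ancienne"}, {"id": "2", "title": ""}]
enriched = list(translate_on_the_fly(records, fake_translate))
```

Because the function is a generator, each record is passed on as soon as it has been translated, so the surrounding process keeps running without waiting for the whole set.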
Technical information
HM ATS can be used by cultural heritage institutions and aggregators as a standalone service. It was developed in Python using the Sanic framework. At its core are translation engines which are state-of-the-art AI models fine-tuned for Machine Translation tasks. The validation platform is built on top of the open-source solution called Label Studio.
The tool applies the automatic translations in the aggregators’ environment, and the translations are then submitted to Europeana. A special EDM profile was created to distinguish enrichments produced by the tool from the original metadata supplied by the data partner.
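The idea of keeping provenance separate can be sketched generically as follows. This is not the EDM profile itself, whose details are not described here; the class names, the `provenance` labels and the sample record are hypothetical and only show the principle that a machine translation is added next to the original value rather than replacing it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FieldValue:
    """A metadata value tagged with its language and origin (hypothetical model)."""
    text: str
    language: str
    provenance: str  # "data_partner" or "machine_translation"


@dataclass
class Record:
    identifier: str
    titles: list = field(default_factory=list)


def add_machine_translation(record: Record, translated_text: str) -> None:
    """Append the English translation without touching the original values."""
    record.titles.append(
        FieldValue(translated_text, "en", "machine_translation")
    )


# Invented sample record, for illustration only.
record = Record("item-1", [FieldValue("Alte Synagoge", "de", "data_partner")])
add_machine_translation(record, "Old synagogue")

# The original value supplied by the data partner can still be told apart.
originals = [v for v in record.titles if v.provenance == "data_partner"]
```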
All the Machine Translation engines that were built have been released publicly as Docker images on Docker Hub under the Apache 2.0 licence and shared via ELRC-Share. HM ATS code is available in the public repository and the API service. Documentation can be found here. The validation system is available here.
Use the tool
The platform is maintained by Pangeanic and the Jewish Heritage Network. It will also be developed further under the Europeana Translate project. If you would like to use the system or learn more about the validation platform, please contact info@jhn.ngo or a.raginsky@pangeanic.com.
Tutorials for validation system usage are available here.