Help build multilingual systems for digital cultural heritage

What we are doing

As a next step in extending Europeana's multilingual reach, we have put together a technical discussion paper with several proposals for improving the multilingual aspects of the Europeana Collections portal. The paper presents our current approach to the search, browse and display of the various types of translatable data Europeana holds (object metadata, textual objects, editorial content, user interface) and proposes new ways to develop these aspects.

For example, it suggests the use of a multilingual knowledge graph, built on top of existing linked data sources including multilingual vocabularies available in our domain, to enhance the multilingual performance of our metadata-based search engine. Automatic translation (to English) could be used for the metadata that is not covered by the knowledge graph, and systematically for full-text content (such as newspapers) that is harder to align with a knowledge graph.
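To make the idea concrete, here is a minimal sketch of how a metadata value might be enriched: first by looking it up in a multilingual knowledge graph, and only then by falling back to automatic translation into English. Everything in it is an assumption made for illustration: the tiny in-memory `KNOWLEDGE_GRAPH`, the concept identifiers, the function names and the `translate_to_english` callback are hypothetical, and the sketch does not reflect the design in the discussion paper or the eTranslation API.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical toy knowledge graph: concept ID -> preferred label per language.
# A real graph would be built from linked data sources and vocabularies.
KNOWLEDGE_GRAPH: Dict[str, Dict[str, str]] = {
    "concept:painting": {"en": "painting", "fr": "peinture", "de": "Gemälde"},
    "concept:newspaper": {"en": "newspaper", "fr": "journal", "de": "Zeitung"},
}


def find_concept(value: str) -> Optional[str]:
    """Return the ID of the concept that has `value` as a label in any language."""
    needle = value.casefold()
    for concept_id, labels in KNOWLEDGE_GRAPH.items():
        if any(label.casefold() == needle for label in labels.values()):
            return concept_id
    return None


def enrich_metadata_value(
    value: str, translate_to_english: Callable[[str], str]
) -> Dict[str, object]:
    """Attach multilingual labels to a metadata value.

    Values covered by the knowledge graph receive every known label;
    all other values fall back to an automatic English translation.
    """
    concept_id = find_concept(value)
    if concept_id is not None:
        labels: List[str] = sorted(set(KNOWLEDGE_GRAPH[concept_id].values()))
        return {"source": "knowledge_graph", "labels": labels}
    # Not covered by the graph: keep the original and add the translation.
    return {
        "source": "machine_translation",
        "labels": [value, translate_to_english(value)],
    }


def stub_translator(text: str) -> str:
    """Stand-in for a real translation service (assumption, not a real API)."""
    return {"krant": "newspaper"}.get(text, text)


if __name__ == "__main__":
    print(enrich_metadata_value("Peinture", stub_translator))
    print(enrich_metadata_value("krant", stub_translator))
```

In this sketch, a search index could store the returned labels alongside the original value, so that a query in any covered language matches the same record. Full-text content would skip the lookup and go straight to translation.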

We also continue to explore the opportunities offered by new technologies such as the European Commission's eTranslation automatic translation service, and to examine the challenges that Europeana and the wider cultural heritage sector will face in building multilingual systems that benefit our users and stakeholders. You can read more about our approach to multilingualism in a previous Pro post.

How can you help?

Everyone in the Europeana community, and beyond, can contribute to building multilingual systems for digital cultural heritage. We would like to invite you to contribute in two ways.

We have made our technical discussion paper on improving the multilingual aspects of the Europeana Collections portal open to all, and we invite you to send us your feedback on these proposals by January 15. Please either comment directly in the document or email us to share your thoughts.

You can also help by sharing any data related to cultural heritage that contains natural language text or metadata with the European Commission’s eTranslation service. eTranslation is based on artificial intelligence, and its quality improves greatly when it is trained on suitable data. To date, however, cultural heritage is under-represented in the training resources, so the service is less well equipped to handle the specific aspects of cultural heritage data. To help redress the balance, cultural heritage institutions are invited to contribute their own data to a training pool.

Any dataset is welcome, though multilingual data are of course highly prized. You can share your data through the ELRC-SHARE platform; don’t forget to indicate that your dataset is relevant to the Europeana Digital Service Infrastructure! If you have questions, you can also contact us, and we will be happy to help and put you in touch with the relevant people.