The language you speak shouldn’t be a barrier to finding what you want on Europeana Collections, but right now it might be. Find out what we’re doing to put that right.
Europeana Collections contains material from galleries, libraries, archives and museums in all 28 EU member countries - and more. You can navigate the website in 27 languages, and it’s easy to search for items described in your own language. But things get more complicated when you want to see items that match your search but are described in a different language.
In total, 37 languages are used to describe the collections. However, more than half of all the material (57%) uses one of just five languages - English, German, Dutch, Norwegian or French.
Making an item described in one language turn up in the results or related material when searched for in another language is not easy. We know we have a long way to go, but making positive changes in this area is one of our priorities.
We want people to find what they’re looking for - even if they’re not using the language that their target item is described in.
We want to increase the chances that searching for something in one language brings up results that match your criteria in another language.
Automatic translation is getting better, but it isn’t foolproof, as you’ll have seen if you’ve ever used an online tool like Google Translate. In the case of Europeana Collections, there are added complications. We’re not concentrating on translating one specific language into another. We’re working with collections described in 37 languages and trying to match them to search terms that could come in any language. What’s more, metadata isn’t like natural language with full sentences and predictable grammar; it’s often presented in short phrases or even single words, so the context required for an accurate translation is hard to find. Adding another layer of complexity, the terms used can be very specific - they might look like a common term but have a different meaning when used to describe digital cultural material.
Automated processes can only work when they are fed the correct and appropriate information. That sounds obvious, but as we’ve seen in the earlier posts in this series, the information provided to Europeana varies greatly in its depth and quality.
In order for anything to be translated, we need to know what language the original element is provided in. Our systems will not guess. So each element (like the title and description) needs a language marker. That’s another layer of information that cultural heritage institutions need to provide.
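As a rough illustration of why the marker matters, the sketch below checks a simplified record for elements that have no language tag. The record layout, the field names and the untagged_elements helper are assumptions made for this example; they are not Europeana's actual data model or code.

```python
# Hypothetical, simplified record: each element holds (language tag, text) pairs.
# A tag of None means the provider did not say which language the text is in.
record = {
    "title": [("nl", "Het melkmeisje"), ("en", "The Milkmaid")],
    "description": [(None, "Olieverf op doek")],
}


def untagged_elements(record):
    """Return the names of elements with at least one value lacking a language tag."""
    return [
        name
        for name, values in record.items()
        if any(lang is None for lang, _text in values)
    ]


print(untagged_elements(record))  # prints ['description']
```

Here the title could be passed to a translation step because its language is declared, while the description would first need a tag from the providing institution.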
What we’re doing
This year, we revised our quality standard, the Europeana Publishing Framework, to include standards for metadata in addition to the existing standards for content. Now, it encourages the people who work on metadata to translate elements like titles into multiple languages, and to include context like place names - which are themselves multilingual - from contextual vocabularies (see item below). The Framework also encourages the use of those all-important language tags to show which language is being used. This takes the guesswork out and means that more automatic linking and translation processes can be implemented.
As well as using expertise from within the Europeana Foundation and Network Association, we rely on the work of others to improve multilingualism on Europeana Collections. In the past year, we carried out a pilot project with the eTranslation team - another European Union-funded DSI project. We're now building on that pilot with further experiments, with a view to using the project’s automatic translation capabilities for Europeana.
When a phrase is given the right context, it’s much easier to translate it. We continue to use metadata enrichment to provide more context for the material you find on Europeana Collections. Our efforts here include the use of ‘contextual vocabularies’, especially those available as Linked Open Data. These datasets give us additional details like multilingual labels, translations of key concepts, or different variants of names for people and places. This makes it easier for people to search for and find items on Europeana Collections. Vocabularies can be used either by data providers or by Europeana as part of various (semi-)automatic metadata enrichment processes.
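To make the idea of multilingual labels concrete, here is a minimal sketch of how a vocabulary entry could widen a search term into its equivalents in other languages. The VOCABULARY data, the concept identifier and the expand_query function are invented for illustration and do not describe how Europeana's enrichment or search actually works.

```python
# Hypothetical vocabulary: one concept with labels in several languages.
VOCABULARY = {
    "concept/windmill": {
        "en": "windmill",
        "nl": "windmolen",
        "de": "Windmühle",
        "fr": "moulin à vent",
    },
}


def expand_query(term, vocabulary):
    """Return every label of any concept that has `term` as one of its labels."""
    needle = term.casefold()
    expanded = set()
    for labels in vocabulary.values():
        # Compare case-insensitively so "Windmolen" matches "windmolen".
        if any(label.casefold() == needle for label in labels.values()):
            expanded.update(labels.values())
    # Fall back to the original term when no concept matches.
    return sorted(expanded) or [term]


print(expand_query("windmolen", VOCABULARY))
```

With this toy data, a search for the Dutch term also yields the English, German and French labels, so items described in those languages could be matched as well.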
Europe is multilingual. We need to be too. We’re thankful to our partners and friends for helping us translate important elements like the Europeana Publishing Framework and the rights statements Europeana uses (the information that tells you what you can do with an item you find on Europeana, e.g. is it in copyright or in the public domain?) into more languages. So far, the rights statements have seven translations with six more on the way.
The Europeana Collections website is available in 27 languages, and this year, we released a new exhibition - Heritage at Risk - in seven languages.
What to look out for…
Making Europeana more multilingual is a priority and the subject of a two-day event this October under Finland’s presidency of the Council of the EU. The event will see the Europeana Foundation and the Finnish Ministry of Education and Culture focussing on needs, expectations and ways forward for multilingualism in digital cultural heritage.
You can help too. With our partners, we run ‘Transcribathon’ events which invite anyone to join in (either online at home, or at a physical event) and type up the contents of text documents, which are often handwritten, so that they can be more easily accessed, searched and machine-translated. In the last year, five transcribathons were organised in cooperation with heritage institutions throughout Europe (Germany, Italy, Belgium, Austria, Romania) and almost 3,000 documents related to the First World War were transcribed.
Find out more
Find out more about our automatic enrichments or the European Union’s eTranslation activity.
And if you’re interested in the fine details of the language element of the Europeana Publishing Framework, you can also see the Europeana Publishing Guide, which details exactly what is required when submitting data to Europeana.
EuropeanaTech has also worked on multilingualism: see, for example, the Best Practices for Multilingual Access and the various presentations on tackling language issues at the last EuropeanaTech conference.
And help make Europeana more multilingual by joining in at Transcribathon.eu. There you’ll find tutorials in English, French and German to help you get started, as well as information about our next events.