Cultural heritage institutions re-ingesting enriched metadata
Guest post by Sam Alloing, Inés Matres, Nathalie Poot and Antoine Isaac
To foster the re-use of content in Europeana, Europeana Inside recently launched a pilot that uses enrichments made by Europeana for the content provided within the framework of the project. Semantic and multilingual enrichment is a process by which content providers’ metadata is automatically linked to external resources such as multilingual vocabularies and gazetteers, giving the data more context and pulling in additional information such as labels in other languages and alternative spellings. With this pilot, Europeana Inside tested a new functionality introduced in the Collections Management Systems (CMS) of several museums in Europe via the Europeana Connection Kit (ECK). In this post, we present the outcomes of the pilot.
Content re-ingestion is an opportunity introduced and tested by Europeana Inside: providers get metadata about their objects back after it has been processed by the Europeana Connection Kit (ECK) and Europeana. It is the latest evolution of the ECK. This functionality connects the aggregator with the CMS of the participating cultural institutions, so that the ECK can be used to retrieve, select and re-ingest content from Europeana back into the collections’ source with minimal intervention.
Enrichment tracking and re-ingestion
In May, Europeana Inside described the types of information within a metadata record that Europeana can enrich (agents, concepts, and geographical and time coverage). Enriched records can be found directly in the Europeana portal as follows:
- Go to the Europeana portal and click right below ‘all providers’
- Select the collection for which you want to check enrichments (the collections are listed under their aggregating project / facet)
- Once on the collection grid, click on one of the objects.
- The collection number provided by Europeana is located in the URL after '/record/'.
- Enter the desired query in Europeana, replacing 000000 with the collection number:
- europeana_collectionName: 000000* AND edm_agent:*
- europeana_collectionName: 000000* AND edm_place:*
- europeana_collectionName: 000000* AND edm_timespan:*
- europeana_collectionName: 000000* AND skos_concept:*
Query that retrieves enriched records directly on the Europeana.eu portal
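As an illustration, the four portal queries listed above can be generated for any collection number with a few lines of Python. This is only a sketch: the function name `build_enrichment_queries` and the constant `ENRICHMENT_FIELDS` are our own, and the query text simply reproduces the pattern shown in the list, without calling any Europeana service.

```python
from urllib.parse import quote

# Field names taken from the query list in this post; one per enrichment type.
ENRICHMENT_FIELDS = ("edm_agent", "edm_place", "edm_timespan", "skos_concept")


def build_enrichment_queries(collection_number: str) -> dict[str, str]:
    """Return one portal query per enrichment field (hypothetical helper)."""
    prefix = collection_number.strip()
    if not prefix:
        raise ValueError("collection_number must not be empty")
    # Same pattern as in the post: collection prefix with wildcard, AND any
    # value in the enrichment field.
    return {
        field: f"europeana_collectionName: {prefix}* AND {field}:*"
        for field in ENRICHMENT_FIELDS
    }


if __name__ == "__main__":
    for field, query in build_enrichment_queries("000000").items():
        print(f"{field}: {query}")
        # Percent-encoded form, in case the query is pasted into a URL.
        print(f"  encoded: {quote(query)}")
```

The collection number `000000` is the placeholder used in the list; in practice it would be the number found in the record URL after '/record/'.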
The ‘enrich and return’ functionality developed by Europeana Inside uses the API. The API contains more precise enrichment information than the portal display, which mixes providers’ enrichments with Europeana’s. ECK makes it possible for cultural institutions to retrieve the enrichments in their own CMS, thus fostering the re-use of content in Europeana.
Not all cultural heritage institutions that publish data on Europeana are familiar with the enrichment process. Guidelines about the enrichments should be made generally available, like the EDM guidelines. Content partners can gain more insight into how and in which fields enrichment takes place by following the work of the enrichment task force. Understanding the enrichment process would not only make the enrichments easier to evaluate, but would also help content partners improve their metadata in such a way as to increase enrichment success.
Though the pilot has been regarded as a promising test, institutions have one concern: if they provide correct metadata and incorrect data is displayed, they want a workflow to exist for getting the errors fixed. Cultural institutions expect amending or reporting inaccuracies to be as easy as sharing their collections.
After the pilot, the following participating institutions are able to consult enriched records from within their own collection system: the National Gallery-Alexandros Soutzos Museum (GR), Benaki Museum (GR), Royal Museums of Art and History (BE), Stiftelsen Lansmuseet Vasternorrland (SE), Petőfi Literary Museum and Hungarian National Museum (HU). These museums adopted the ‘enrich and return’ functionality in their CMS during the project and were therefore able to evaluate it. In the future, other participants of Europeana Inside and other cultural heritage institutions will have this functionality available by updating their collections’ software.
Evaluation of enrichments
In collaboration with Europeana, a survey to evaluate the enrichments was drawn up, and twelve content partners from the Europeana Inside Consortium analysed and evaluated the enrichments. Here is what was asked:
- Identify the fields from the original metadata that have been enriched (agents, places, time periods, concepts)
- Provide feedback on the quality of the enrichments. Are they accurate? Explain why or why not.
- Which of the enriched fields do you consider to be the most useful?
- How do you plan to re-use the enriched data?
- What do you consider to be the main advantage of the enriched metadata?
Of the fields that Europeana can enrich, the multilingual enrichments provided by Geonames for geographical information and by GEMET and DBpedia for concepts were identified as the most valuable. Some institutions stated that their public could be very interested in using enriched agent information, provided it is accurate.
Cultural institutions see overcoming language barriers as one added value of the enrichments. The most probable use of multilingual enrichments is to improve the front-end search functionality on their own websites. Another practical use is to combine enrichments with the original metadata, in order to supply enriched records to future aggregation processes (e.g. to other portals). But for this to happen, enrichments have to prove reliable.
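To make the idea of combining enrichments with the original metadata more concrete, here is a minimal sketch of a merge that keeps track of where each value came from, so that an institution could still tell its own data apart from Europeana’s when passing records on. The record layout, the `FieldValue` class and the `merge_record` function are assumptions made for this example; they do not describe how the ECK or any particular CMS stores data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldValue:
    value: str
    source: str  # "provider" for original metadata, "europeana" for enrichments


def merge_record(
    original: dict[str, list[str]],
    enrichments: dict[str, list[str]],
) -> dict[str, list[FieldValue]]:
    """Combine original values and enrichments, tagging each with its source."""
    merged: dict[str, list[FieldValue]] = {
        field: [FieldValue(v, "provider") for v in values]
        for field, values in original.items()
    }
    for field, values in enrichments.items():
        seen = {fv.value for fv in merged.get(field, [])}
        for v in values:
            # Skip enrichment values that merely repeat what is already there.
            if v not in seen:
                merged.setdefault(field, []).append(FieldValue(v, "europeana"))
                seen.add(v)
    return merged


if __name__ == "__main__":
    # Hypothetical record: field names and values are illustrative only.
    original = {"subject": ["trumpet"]}
    enrichments = {"subject": ["trumpet", "trompette"], "place": ["Peru"]}
    for field, values in merge_record(original, enrichments).items():
        print(field, [(fv.value, fv.source) for fv in values])
```

Keeping the source next to each value means unreliable enrichments can later be filtered out or corrected without touching the provider’s own metadata.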
Here are some patterns for inaccuracies or other issues relating to semantic enrichment:
- Places: Europeana finds wrong matches for certain geographical coverage information, especially when the names of localities or towns are ambiguous (e.g. http://bit.ly/spk-ethno2 this trumpet was found in Trujillo, Peru, but in the enrichments list, we find Trujillo in the Spanish province of Extremadura).
- Some content providers noticed that Europeana provides a quality score for completeness, but no documentation or criteria for this ranking.
- Agents: Even though the Europeana API keeps track of which information comes from the source and distinguishes it from the enrichments, in the display on the Europeana.eu portal some information provided by the original metadata is displayed as enrichment (e.g. http://bit.ly/spk-ethno1 the original metadata record had information about the contributor). Cultural institutions take the display issue very seriously; in fact, the capacity of ECK to preview the records in Europeana before providing metadata has been evaluated as most useful.
For Europeana, this metadata re-ingestion study is important feedback for its enrichment work. It confirms some of the findings of an earlier EuropeanaTech Task Force on Multilingual and Semantic Enrichment Strategy. The task force had analysed controlled vocabularies, collections, and metadata fields on the Europeana portal in order ‘for the metadata enrichments to enfold their whole potential and act as facilitators of multilingual access’ (here is the full report). The Task Force had already led to better documentation of the enrichment process at Europeana and to an assessment of some of its successes and shortcomings. The Europeana Inside study further demonstrates data providers’ interest in the enrichment, giving Europeana more motivation to measure the quality of the current enrichment rules and to enhance them. It also hints that many data providers are willing to spend resources on making enrichments better. In the past, some enrichment issues have been fixed after individual requests, and the Europeana team was happy to do so. This could be further generalised.
Data aggregators will very likely have a great role to play here. As they are the experts in their domain, they could even develop their own enrichment processes, alleviating the need for Europeana to do so at a more general level. They are also in the best position to help distil the application scenarios on the provider side. This would in turn trigger more motivation, and hopefully more resources, to enhance the quality of metadata at the level of Europeana’s entire network.
Metadata of a record provided to Europeana by the Petőfi Literary Museum in their CMS (Qulto)
The same object after returning the enrichments from Europeana on concepts by GEMET and geographical information by Geonames