Europeana semantic enrichment

Find out how Europeana performs semantic enrichment and how you can enrich your metadata with linked open vocabularies.

Automatic semantic enrichment at Europeana

Europeana enriches its data providers’ metadata by automatically linking text strings found in the metadata to controlled terms from Linked Open datasets or vocabularies. This process of “augmenting” the source metadata with additional terms is called semantic enrichment.

The enrichment process can be summarised in two main steps:

Matching: the metadata of Europeana cultural heritage (CH) objects is matched against external semantic data, which results in links between these objects and resources from external datasets. The example below shows an object that was automatically enriched with the concept “Costume” from the DBpedia dataset.
Augmentation: the created links point to additional data, such as translated labels or broader concepts. In this example, the record is supplemented with all the translated labels of the DBpedia concept, as well as with a link to the broader DBpedia concept “Fashion” and all its translated labels.

For instance, Europeana enriches places with GeoNames, while person names and concepts are enriched with DBpedia.
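The two steps can be illustrated with a minimal Python sketch. It is not Europeana's enrichment code: it assumes tiny in-memory stand-ins for DBpedia and GeoNames (the URIs and labels are illustrative only), a hypothetical mapping from metadata fields to target vocabularies that mirrors the rule above (places to GeoNames, people and concepts to DBpedia), and plain case-insensitive exact label matching, which is much simpler than a production matcher.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Concept:
    """A vocabulary entry: URI, labels keyed by language, optional broader concept."""
    uri: str
    labels: dict[str, str]
    broader: Optional[str] = None


# Hypothetical toy vocabularies standing in for DBpedia and GeoNames.
# The URIs are illustrative only; the labels are not fetched from the real datasets.
DBPEDIA = {
    c.uri: c
    for c in [
        Concept("http://dbpedia.org/resource/Fashion", {"en": "Fashion", "fr": "Mode"}),
        Concept("http://dbpedia.org/resource/Costume",
                {"en": "Costume", "fr": "Costume", "de": "Kostüm"},
                broader="http://dbpedia.org/resource/Fashion"),
    ]
}
GEONAMES = {
    c.uri: c
    for c in [Concept("http://sws.geonames.org/2988507/", {"en": "Paris", "fr": "Paris"})]
}

# Hypothetical routing: which target vocabulary is used for which metadata field.
TARGETS = {"dcterms:spatial": GEONAMES, "dc:creator": DBPEDIA, "dc:subject": DBPEDIA}


def match(text: str, vocabulary: dict[str, Concept]) -> list[str]:
    """Step 1 (matching): URIs of concepts having a label equal to `text`, ignoring case."""
    needle = text.strip().casefold()
    return [uri for uri, concept in vocabulary.items()
            if any(label.casefold() == needle for label in concept.labels.values())]


def enrich(record: dict[str, str]) -> list[dict]:
    """Step 2 (augmentation): add each matched concept's labels and its broader concepts."""
    enrichments = []
    for field_name, value in record.items():
        vocabulary = TARGETS.get(field_name)
        if vocabulary is None:
            continue  # no enrichment target configured for this field
        for uri in match(value, vocabulary):
            seen = set()
            concept = vocabulary.get(uri)
            # Walk up the chain of broader concepts, guarding against cycles.
            while concept is not None and concept.uri not in seen:
                seen.add(concept.uri)
                enrichments.append({"field": field_name, "uri": concept.uri,
                                    "labels": dict(concept.labels)})
                concept = vocabulary.get(concept.broader) if concept.broader else None
    return enrichments
```

For a record such as `{"dc:subject": "costume", "dcterms:spatial": "Paris"}`, the sketch links the subject to “Costume”, adds the broader concept “Fashion” with its labels, and links the place to the GeoNames entry for Paris.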

For more details, refer to the Europeana Semantic Enrichment Framework.

Example of a Europeana record semantically enriched (or contextualised) with concept terms from DBpedia. A man building a wig on to the head of a woman on a kind of scaffolding; another woman wearing a tall wig looks on, Wellcome Trust: http://www.europeana.eu/portal/record/9200105/BibliographicResource_3000006114081.html

Help Europeana semantic enrichment by enriching your own metadata

The Europeana Data Model (EDM) supports contextual resources, the so-called ‘semantic layer’, including concepts from ‘value vocabularies’ such as thesauri, authority lists and classifications, coming either from the network of Europeana's providers or from third-party data sources. Data providers are therefore strongly encouraged to include links to open, multilingual vocabularies in the metadata they send to Europeana, following the EDM recommendations for metadata on contextual resources.
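As a rough illustration of what such a link looks like, the Python sketch below uses the standard library's ElementTree to emit an EDM fragment in which the object's `dc:subject` points to a vocabulary concept by URI, and the concept itself is described as a `skos:Concept` contextual resource with multilingual labels. It is a minimal sketch, not a valid submission: a complete EDM record needs further classes (for example an `ore:Aggregation`), the choice of `dc:subject` as the linking property is an assumption for this example, and the function name `build_record` and the example.org object URI are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "edm": "http://www.europeana.eu/schemas/edm/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Register prefixes so the serialised XML uses rdf:, dc:, edm:, skos:.
for prefix, namespace in NS.items():
    ET.register_namespace(prefix, namespace)


def q(prefix: str, local: str) -> str:
    """Build an ElementTree qualified name such as {namespace}local."""
    return f"{{{NS[prefix]}}}{local}"


def build_record(cho_uri: str, title: str, concept_uri: str, pref_labels: dict) -> str:
    """Hypothetical helper: an EDM fragment linking an object to a vocabulary concept."""
    root = ET.Element(q("rdf", "RDF"))
    cho = ET.SubElement(root, q("edm", "ProvidedCHO"), {q("rdf", "about"): cho_uri})
    ET.SubElement(cho, q("dc", "title")).text = title
    # Link to the concept by URI rather than by a free-text string.
    ET.SubElement(cho, q("dc", "subject"), {q("rdf", "resource"): concept_uri})
    # Contextual resource: the concept itself, with one prefLabel per language.
    concept = ET.SubElement(root, q("skos", "Concept"), {q("rdf", "about"): concept_uri})
    for lang, label in pref_labels.items():
        ET.SubElement(concept, q("skos", "prefLabel"), {XML_LANG: lang}).text = label
    return ET.tostring(root, encoding="unicode")
```

Calling `build_record("http://example.org/object/1", "Wig scaffolding", "http://dbpedia.org/resource/Costume", {"en": "Costume", "de": "Kostüm"})` returns the RDF/XML as a string, with the labels carrying `xml:lang` attributes.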

Europeana has developed a small tool that ‘dereferences’ URIs, i.e. it fetches all the multilingual and semantic data published as Linked Open Data for vocabulary concepts and other contextual resources on third-party services. Europeana currently dereferences several vocabularies from internationally established initiatives and from more specific projects, which you can use as well. The vocabulary mappings to EDM and the configuration files used for dereferencing are available on GitHub. If you would like your own Linked Open Data vocabulary to be dereferenced, please mention it to your Europeana contact.
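The general idea of dereferencing can be sketched in a few lines of Python. This is not Europeana's tool: it assumes the remote service answers HTTP content negotiation with RDF/XML encoded in UTF-8 and describes its concepts with SKOS (`skos:prefLabel`, `skos:altLabel`, `skos:broader`). Vocabularies that use other properties need their own mapping, which is the role of the per-vocabulary mapping files mentioned above. The function names are hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def fetch_rdf(uri: str, timeout: float = 10.0) -> str:
    """Dereference `uri`, asking for RDF/XML through HTTP content negotiation."""
    request = urllib.request.Request(uri, headers={"Accept": "application/rdf+xml"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Assumes a UTF-8 response body; real services may differ.
        return response.read().decode("utf-8")


def extract_contextual_data(rdf_xml: str, uri: str) -> dict:
    """Collect multilingual SKOS labels and broader links for the resource `uri`."""
    root = ET.fromstring(rdf_xml)
    labels: dict = {}
    broader: list = []
    for node in root.iter():
        if node.get(f"{{{RDF}}}about") != uri:
            continue  # only keep statements about the requested resource
        for child in node:
            if child.tag in (f"{{{SKOS}}}prefLabel", f"{{{SKOS}}}altLabel"):
                # Group label text by its xml:lang value ("" if untagged).
                labels.setdefault(child.get(XML_LANG, ""), []).append(child.text)
            elif child.tag == f"{{{SKOS}}}broader":
                broader.append(child.get(f"{{{RDF}}}resource"))
    return {"labels": labels, "broader": broader}
```

A typical call would be `extract_contextual_data(fetch_rdf(uri), uri)`. Keeping the network call separate from the parsing makes the extraction step easy to test on saved RDF/XML.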

Selecting target datasets for automatic semantic enrichment

Selecting the datasets to enrich against is a crucial step in improving the quality of the enrichment and of the overall metadata. We recommend the following steps:

Analyse the source data: a good knowledge of the source data in terms of topic coverage, gaps and quality issues is necessary before selecting an enrichment target.
Identify the enrichment requirements: the enricher should define the expected results before performing an enrichment. For instance, an enrichment could be performed to improve the overall quality of a dataset; in this case, the quality issues to be fixed should be identified beforehand.
Find datasets available on the Web: we recommend selecting datasets published on the Web. Several inventories are available to help enrichers find enrichment targets.
Select the enrichment targets: before selecting a target, the enricher has to evaluate potential targets. We have identified the following criteria for evaluating targets against the source data:
- Availability and Access: We recommend selecting targets available on the Web and compliant with the Linked Data recipes. These targets should be properly documented and usable under an open licence.
- Granularity and Coverage: the enricher should select targets that have the same coverage as the source data or that complement it. Coverage of several languages is highly desirable.
- Quality: the enricher should pay attention to the quality of the target in terms of semantics and data modelling (see section 2.4.4. of this EuropeanaTech Task Force document).
- Connectivity: we recommend selecting well-connected targets with incoming and outgoing equivalence links to other targets.
- Size
Test the selected target on a sample of source data: once a target is selected, it should be tested on a sample of data before being applied to the whole dataset. A test shows whether the target really covers the source data and whether it introduces semantic ambiguities.
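The sample test in the last step can be made concrete with a small Python sketch. It assumes the candidate target has been loaded as a hypothetical mapping from concept URI to labels, and uses simple case-insensitive exact matching. It reports coverage (share of sample values that match at least one concept), ambiguity (share of matched values that hit more than one concept, such as “Paris” in France versus “Paris” in Texas) and the unmatched values, which point to gaps. The metric names are illustrative; no acceptance thresholds are implied.

```python
from collections import defaultdict


def build_index(vocabulary: dict) -> dict:
    """Map each normalised label to the set of concept URIs that carry it."""
    index = defaultdict(set)
    for uri, labels in vocabulary.items():
        for label in labels:
            index[label.strip().casefold()].add(uri)
    return index


def evaluate_target(sample_values: list, vocabulary: dict) -> dict:
    """Measure how well a candidate target covers a sample of source values."""
    index = build_index(vocabulary)
    matched, ambiguous, unmatched = 0, 0, []
    for value in sample_values:
        # .get() does not insert missing keys into the defaultdict.
        uris = index.get(value.strip().casefold(), set())
        if not uris:
            unmatched.append(value)  # a gap in the target's coverage
            continue
        matched += 1
        if len(uris) > 1:
            ambiguous += 1  # the value could refer to several concepts
    total = len(sample_values)
    return {
        "coverage": matched / total if total else 0.0,
        "ambiguity": ambiguous / matched if matched else 0.0,
        "unmatched": unmatched,
    }
```

With a toy target containing two concepts labelled “Paris” and one labelled “Lyon”, the sample `["Paris", "lyon", "Atlantis", "Lyon"]` gives a coverage of 0.75, an ambiguity of one in three matched values, and `["Atlantis"]` as unmatched.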

More details and examples of target datasets and vocabularies can be found in this EuropeanaTech Task Force document.