About
The lack of rich, descriptive metadata can affect the searchability and usability of digital content from museums, libraries and archives, including on Europeana. SAGE is an online system that seeks to address this issue by allowing the manipulation of metadata in the Europeana Data Model (EDM) and other formats and models (including CSV, XML, JSON and RDF), so that it can be automatically enriched through external services that employ state-of-the-art AI and Semantic Web technologies.
The system can produce enrichments in the form of URIs linked to selected metadata fields, and can also harvest additional information from external sources such as Wikidata. The enrichments can then be manually validated through an integrated validation sub-system that supports bulk validation through text grouping and text frequency sorting. This means that the user only has to validate each distinct text once, and the decision is applied to all records that contain it. Texts that appear in more records are given priority, which helps when a complete validation of a dataset is not feasible.
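The grouping and frequency sorting described above can be illustrated with a minimal Python sketch. It is not SAGE's actual implementation: the function names, the sample records and the normalisation rule (trimming whitespace and lower-casing) are all assumptions made for illustration.

```python
from collections import defaultdict


def normalise(text):
    # Assumed normalisation rule: trim whitespace and ignore case.
    return text.strip().lower()


def group_by_text(annotations):
    """Group record IDs by the normalised text value that was annotated."""
    groups = defaultdict(set)
    for record_id, text in annotations:
        groups[normalise(text)].add(record_id)
    return groups


def validation_queue(annotations):
    """Return (text, record_count) pairs, most widely used texts first."""
    groups = group_by_text(annotations)
    return sorted(
        ((text, len(ids)) for text, ids in groups.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )


def apply_decision(annotations, text, accepted):
    """Propagate one validation decision to every record sharing the text."""
    key = normalise(text)
    return {rid: accepted for rid, t in annotations if normalise(t) == key}


# Hypothetical sample data: (record ID, text of the annotated field).
sample = [
    ("rec1", "Porcelain"),
    ("rec2", "porcelain"),
    ("rec3", "Silk"),
    ("rec4", "porcelain "),
]

queue = validation_queue(sample)
decisions = apply_decision(sample, "porcelain", True)
```

Here the validator would see "porcelain" first because it covers three records, and a single decision on it is copied to all three.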
Via this tool, all relevant keywords can be translated into various languages through the respective Wikidata and Getty links, in order to create a multilingual vocabulary to fit the project’s needs.
SAGE was used in the Europeana CEF Telecom project Pagode-Europeana China to semantically enrich more than 20,000 records automatically. It will also be used in the Europeana CEF Telecom project CRAFTED to analyse metadata fields and text extracted from AI content analysis tools in order to identify and remove uncertainty from named entities. The ultimate aim is to enrich more than 100,000 records and enable user validation and assessment of automatically extracted entities.
Benefits
The platform allows cultural heritage institutions to:
Integrate different types of data from multiple sources into a single Resource Description Framework (RDF) record or collection.
Improve searchability and indexing.
Get a clear overview of the enrichment results through a validation procedure in which a person reviews them.
Technical information
The system transforms data to the Resource Description Framework (RDF) and stores them in a Virtuoso triple store, using SPARQL to retrieve and manipulate them. The external annotator services that are used for the enrichment of the metadata employ state-of-the-art technologies like BERT (an attention-based transformer deep neural network), lemmatisation, and named entity recognition and disambiguation techniques.
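As a rough illustration of the transformation step, the sketch below turns CSV rows into RDF triples serialised as N-Triples and defines a SPARQL query that could be run against them once loaded into a triple store. It is not SAGE's mapping: the `http://example.org/record/` namespace, the column names and the choice of Dublin Core properties are assumptions, and a real pipeline would more likely use an RDF library and a full EDM mapping.

```python
import csv
import io

DC = "http://purl.org/dc/elements/1.1/"
BASE = "http://example.org/record/"  # hypothetical namespace for record IRIs


def escape_literal(value):
    """Escape the characters that N-Triples requires inside a literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def csv_to_ntriples(csv_text):
    """Map each CSV row to one triple per non-empty 'title'/'subject' cell.

    Assumes the 'id' column holds values that are safe to append to an IRI.
    """
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{row['id']}>"
        for column in ("title", "subject"):
            value = (row.get(column) or "").strip()
            if value:
                lines.append(
                    f'{subject} <{DC}{column}> "{escape_literal(value)}" .'
                )
    return lines


# A SPARQL query that would list every record with its subject once the
# triples are loaded into a store; it is only defined here, not executed.
SUBJECT_QUERY = """
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?record ?subject
WHERE { ?record dc:subject ?subject }
ORDER BY ?subject
"""

# Hypothetical sample input.
csv_text = "id,title,subject\n1,Blue vase,porcelain\n2,Robe,silk\n"
triples = csv_to_ntriples(csv_text)
```

The resulting lines could be loaded into any store that accepts N-Triples, and the query sent to that store's SPARQL endpoint.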
SAGE is an open-source platform under the Apache Licence 2.0. More details and a link to the source code will be made publicly available upon completion of the Europeana XX: Century of Change project.
Use the platform
If you want to try out SAGE or learn more about the platform, contact the SAGE team (sage@ails.ece.ntua.gr) and sign up to the platform.