
Posted on Tuesday February 15, 2022

Updated on Monday November 6, 2023

SAGE

SAGE is a web-based tool for generating, enriching and publishing content. The tool was developed by the National Technical University of Athens (NTUA) under the Europeana Generic Services project Europeana XX: Century of Change and enhanced during the CRAFTED project.

Image: A screenshot from the validation progress tab indicating the percentage of enrichments that have been manually validated. (Creator: SAGE, 2021)

About

The lack of rich, descriptive metadata can reduce the searchability and usability of digital content from museums, libraries and archives, including the content published on Europeana.eu. SAGE is an online system that seeks to address this issue by allowing users to manipulate metadata in the Europeana Data Model (EDM) and in other models and formats (including CSV, XML, JSON and RDF), so that it can be enriched automatically using state-of-the-art AI and Semantic Web technologies.
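To give a rough idea of what bringing tabular metadata into RDF involves, here is a minimal Python sketch that turns CSV rows into N-Triples statements using only the standard library. The base URI `http://example.org/record/`, the `id`/`title`/`subject` columns and their mapping to Dublin Core properties are hypothetical; the source does not describe SAGE's own import mappings.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical base URI and column-to-property mapping (assumptions, not SAGE's own).
BASE = "http://example.org/record/"
COLUMN_TO_PREDICATE = {
    "title": "http://purl.org/dc/elements/1.1/title",
    "subject": "http://purl.org/dc/elements/1.1/subject",
}


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def csv_to_ntriples(csv_text: str) -> list[str]:
    """Turn each CSV row into N-Triples statements about one record resource."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{quote(row['id'].strip(), safe='')}>"
        for column, predicate in COLUMN_TO_PREDICATE.items():
            value = (row.get(column) or "").strip()
            if value:  # skip empty cells instead of emitting empty literals
                triples.append(f'{subject} <{predicate}> "{escape_literal(value)}" .')
    return triples


sample = 'id,title,subject\n1,Silk robe,textiles\n2,"Porcelain vase, blue",\n'
for triple in csv_to_ntriples(sample):
    print(triple)
```

Each row becomes one resource identified by its `id`, and empty cells are skipped rather than turned into empty literals. Mapping into EDM itself would need a richer scheme than this flat column-to-property table.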

Enrichments

The SAGE tool offers a suite of services for analysing textual data in order to link it with terms from various linked data sources. SAGE supports several types of annotators designed to cover different use cases: they can apply pattern-matching rules to link text with terms from controlled vocabularies uploaded by the user (SKOS vocabularies), connect to general-purpose named entity recognition and disambiguation services, or issue SPARQL queries to endpoints such as Wikidata. The annotators use state-of-the-art NLP libraries, such as the Stanford NLP Group's Stanza library, to analyse textual content in different languages, and can connect to external services.
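The pattern-matching annotator can be approximated, under simplifying assumptions, by matching vocabulary labels against text. In the sketch below the `VOCABULARY` dictionary stands in for an uploaded SKOS vocabulary (each concept URI mapped to its preferred and alternative labels), and all URIs are hypothetical. Unlike the annotators described above, it does no lemmatisation or language-specific analysis: it only performs case-insensitive, whole-word regular-expression matching.

```python
import re
from dataclasses import dataclass

# Hypothetical vocabulary: concept URI -> labels (like skos:prefLabel / skos:altLabel).
VOCABULARY = {
    "http://example.org/vocab/porcelain": ["porcelain", "china ware"],
    "http://example.org/vocab/silk": ["silk"],
}


@dataclass(frozen=True)
class Annotation:
    start: int    # character offset where the match begins
    end: int      # character offset just after the match
    text: str     # the matched text as it appears in the input
    concept: str  # URI of the vocabulary concept it links to


def build_patterns(vocabulary):
    """Compile one case-insensitive, whole-word regex per label."""
    patterns = []
    for concept, labels in vocabulary.items():
        for label in labels:
            # \b on both sides keeps "silk" from matching inside "silkworm"
            pattern = re.compile(r"\b" + re.escape(label) + r"\b", re.IGNORECASE)
            patterns.append((pattern, concept))
    return patterns


def annotate(text, patterns):
    """Return every label occurrence in the text, ordered by position."""
    found = []
    for pattern, concept in patterns:
        for match in pattern.finditer(text):
            found.append(Annotation(match.start(), match.end(), match.group(0), concept))
    return sorted(found, key=lambda a: (a.start, a.end))


patterns = build_patterns(VOCABULARY)
for ann in annotate("A Silk robe and a porcelain bowl; silkworm cocoons.", patterns):
    print(ann)
```

Here "Silk" is linked to the silk concept while "silkworm" is not, because the pattern requires a word boundary on both sides of the label.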

The SAGE tool supports the import of metadata records in EDM via the MINT aggregation tool and can also be used to produce annotations by analysing other forms of textual data, such as the outputs of Optical Character Recognition.

SAGE was used in the Europeana CEF Telecom project Pagode-Europeana China to semantically enrich more than 20,000 records automatically. It was also used in the WEAVE project to produce more than 9,000 high-quality enrichments (9,621 accepted enrichments, a 91.2% acceptance rate by human validators) on more than 3,800 records.

Validation environment

The tool also has a validation environment in which users can review and validate automatic enrichments. It supports bulk validation through text grouping and text frequency sorting: the user validates each distinct text only once and the decision is applied to all records that contain it, while texts that appear in more records are shown first, so that the most widespread enrichments are covered even when validating a complete dataset is not feasible.
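The grouping and frequency sorting described above can be sketched in a few lines of Python. The annotation tuples are made up, and grouping by the pair of matched text and concept URI (rather than by text alone) is an assumption, chosen so that the same text linked to two different concepts is validated separately.

```python
from collections import defaultdict

# Made-up automatic annotations: (record_id, matched_text, concept_uri).
annotations = [
    ("r1", "silk", "http://example.org/vocab/silk"),
    ("r2", "silk", "http://example.org/vocab/silk"),
    ("r3", "silk", "http://example.org/vocab/silk"),
    ("r2", "china", "http://example.org/vocab/porcelain"),
    ("r4", "china", "http://example.org/vocab/porcelain"),
    ("r5", "vase", "http://example.org/vocab/vase"),
]


def group_for_validation(annotations):
    """Group annotations by (text, concept); most frequent groups come first."""
    groups = defaultdict(set)
    for record_id, text, concept in annotations:
        groups[(text, concept)].add(record_id)
    # Sort by number of records (descending), then alphabetically for stable output.
    return sorted(groups.items(), key=lambda item: (-len(item[1]), item[0]))


def apply_decision(key, record_ids, accepted, decisions):
    """Propagate one validator decision to every record in the group."""
    text, concept = key
    for record_id in record_ids:
        decisions[(record_id, text, concept)] = accepted
    return decisions


queue = group_for_validation(annotations)
decisions = {}
# A validator accepts only the top group; that single decision covers three records.
key, record_ids = queue[0]
apply_decision(key, record_ids, True, decisions)
print(key[0], sorted(record_ids), len(decisions))
```

If validation stops after the first group, the records carrying the most frequent text are already covered, which is the point of sorting by frequency.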

The validation environment was equipped with a number of new features during the CRAFTED project. These include, among others, setting up validation campaigns, ranking annotations for validation in several ways, letting validators specify the target field of an annotation, and recognising and handling separately URIs that are already included in a metadata field.
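One possible way to treat URIs that already appear in a record, assumed here because the source does not say how SAGE handles them, is to set those suggestions aside so that validators only review links that would add something new. The record layout, field names and URIs below are hypothetical.

```python
# Hypothetical record: field name -> list of values (some values are already URIs).
record = {
    "dc:subject": ["silk", "http://example.org/vocab/silk"],
    "dc:title": ["Silk robe with porcelain buttons"],
}

# Hypothetical automatic suggestions: (source_field, matched_text, concept_uri).
suggestions = [
    ("dc:title", "Silk", "http://example.org/vocab/silk"),
    ("dc:title", "porcelain", "http://example.org/vocab/porcelain"),
]


def split_by_existing_uris(record, suggestions):
    """Separate suggestions whose concept URI already appears in any field of the record."""
    existing = {
        value
        for values in record.values()
        for value in values
        if value.startswith(("http://", "https://"))
    }
    already_present = [s for s in suggestions if s[2] in existing]
    to_validate = [s for s in suggestions if s[2] not in existing]
    return already_present, to_validate


present, pending = split_by_existing_uris(record, suggestions)
print("already in record:", present)
print("needs validation:", pending)
```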

The annotations added by SAGE can be delivered to Europeana as enrichments embedded in the EDM record via the MINT aggregation tool.

Benefits

The platform allows cultural heritage institutions to improve the quality and visibility of their collections at large scale via the following steps:

  • Integrate different types of data from multiple sources into a single Resource Description Framework (RDF) collection

  • Automatically analyse multilingual metadata and create semantic enrichments that connect to relevant open data vocabularies

  • Review and validate the automatic enrichments through a validation environment

  • Export the moderated enriched metadata to Europeana via the MINT aggregation platform, so as to make their collections more searchable and reusable

Technical information 

The system transforms data into the Resource Description Framework (RDF) and stores it in a Virtuoso triple store, using SPARQL to retrieve and manipulate it. The external annotator services used to enrich the metadata employ state-of-the-art techniques for lemmatisation, pattern matching against controlled vocabularies, and named entity recognition and disambiguation.
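Retrieving data from a triple store such as Virtuoso usually means sending a SPARQL query to its HTTP endpoint. The sketch below builds a SPARQL Protocol GET request for JSON results with Python's standard library. The endpoint `http://localhost:8890/sparql` is Virtuoso's default local address, used here as an assumption, and the query is a hypothetical example; SAGE's actual endpoint and graph layout are not described in the source.

```python
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import Request

# Hypothetical endpoint: Virtuoso's default local SPARQL URL.
ENDPOINT = "http://localhost:8890/sparql"

# Hypothetical query: list records and their dc:subject values.
QUERY = """
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?record ?subject
WHERE { ?record dc:subject ?subject }
LIMIT 10
"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.get_method(), urlparse(request.full_url).netloc)

# To run it against a live endpoint:
#   import json, urllib.request
#   with urllib.request.urlopen(request) as response:
#       rows = json.load(response)["results"]["bindings"]
```

The same builder should work with any endpoint implementing the SPARQL 1.1 Protocol, such as the Wikidata query service mentioned above, although public services may impose extra requirements such as a descriptive User-Agent header.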

SAGE is an open-source platform released under the Apache License 2.0. More details and links to the source code will be made publicly available upon completion of the Europeana XX: Century of Change project. The platform was further enhanced during the WEAVE and CRAFTED projects.


Use the platform

If you want to try out SAGE or learn more about the platform, contact the SAGE team by email and sign up to the platform.
