New Datasets Available for Linked Open Data Initiatives
Two weeks ago, Europeana opened up its huge cultural dataset for re-use under the Creative Commons Zero Universal Public Domain Dedication. Now we can announce that datasets reflecting this change are available as 'data dumps' for re-use in Linked Open Data initiatives.
Downloading the datasets
Here's what developers and technical staff need to know about downloading and using the datasets. (Non-techies, please feel free to share this blog with your technical teams!)
The datasets are available to download at data.europeana.eu/download/2.0/
An overview of the datasets is available in a spreadsheet. It is possible to preview a dataset in the Europeana portal by using the first numbers of its name followed by a wildcard: e.g. 'europeana_collectionName: 08602*' or 'europeana_collectionName: 03486*'
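As a rough illustration of the wildcard preview described above, here is a minimal Python sketch that turns a dataset name into the corresponding portal search string. The function name `preview_query` and the sample file name are assumptions made for this example, not part of any Europeana tooling; only the `europeana_collectionName` field and the trailing `*` come from the text above.

```python
import re
from urllib.parse import quote


def preview_query(dataset_name: str) -> str:
    """Build a portal search string from the leading digits of a dataset name."""
    match = re.match(r"\d+", dataset_name)
    if match is None:
        raise ValueError(f"no leading digits in {dataset_name!r}")
    # The trailing '*' is the wildcard: it matches any remaining characters.
    return f"europeana_collectionName: {match.group(0)}*"


# Hypothetical dataset file name, used for illustration only.
query = preview_query("08602_example_dataset.nt")
encoded = quote(query)  # percent-encoded form, in case the query goes into a URL

print(query)
print(encoded)
```

The resulting string can be pasted into the portal's search box; the percent-encoded form is only needed if you place the query in a URL yourself.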
Screenshot of the dataset download page.
You'll see that the 'datasets' folder has two sub-folders, 'nt' and 'rdf'. These contain the files for each individual dataset, expressed in the N-Triples and RDF/XML syntaxes for RDF respectively. In both formats, the data model used is the Europeana Data Model (EDM).
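To give a feel for what the files in the 'nt' folder look like, here is a minimal Python sketch that reads N-Triples lines and counts how often each predicate occurs. It is a simplified reader built on a regular expression, not a full N-Triples parser, and the sample triples use made-up `example.org` URIs rather than real Europeana data. For serious work, a dedicated RDF library such as rdflib is likely a better choice.

```python
import re
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Simplified pattern: subject (IRI or blank node), predicate (IRI),
# object (anything up to the final '.'). Assumes one triple per line.
TRIPLE = re.compile(r"^(<[^>]*>|_:\S+)\s+(<[^>]*>)\s+(.+?)\s*\.\s*$")


def parse_ntriples(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (subject, predicate, object) tuples from N-Triples lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = TRIPLE.match(line)
        if match is None:
            raise ValueError(f"not a valid triple: {line!r}")
        yield match.group(1), match.group(2), match.group(3)


# Made-up sample data, for illustration only.
sample = [
    '<http://example.org/item/1> <http://purl.org/dc/elements/1.1/title> "A title"@en .',
    '<http://example.org/item/1> <http://purl.org/dc/elements/1.1/creator> "Someone" .',
    '<http://example.org/item/2> <http://purl.org/dc/elements/1.1/title> "Another title" .',
]

counts = Counter(predicate for _, predicate, _ in parse_ntriples(sample))
for predicate, n in counts.most_common():
    print(n, predicate)
```

Because N-Triples puts one triple on each line, a downloaded file can be passed to `parse_ntriples` as an open file object and processed line by line without loading it all into memory.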
The 'links' folder contains links to other Linked Data sources. These links are the results of the semantic enrichment done by Europeana. Co-reference links to Linked Data services maintained by Europeana partners (e.g. SOCH, the Swedish cultural heritage aggregator) are also provided in this folder.
We're also planning to offer access to these data sources via a SPARQL endpoint. This work has just begun, and we'll make an announcement when it is ready.
In parallel with making Europeana metadata available for Linked Open Data initiatives, we're developing a new REST-style API. This new API will allow querying of the Europeana repository, with responses in JSON format and the returned metadata modelled in EDM.
Future versions of the Europeana portal will be developed using the same back-end as this new API, so the API will always keep pace with our own portal's search capabilities. We hope to make this new API available in late autumn. Please contact us if you want access to our current API or want to receive news as soon as the new API is available.
API - Application Programming Interface - a specification intended to be used as an interface by software components to communicate with each other.
Linked Open Data - In computing, linked data describes a method of publishing structured data so that it can be interlinked and become more useful. It builds upon standard Web technologies such as HTTP and URIs, but rather than using them to serve web pages for human readers, it extends them to share information in a way that can be read automatically by computers. This enables data from different sources to be connected and queried. View Europeana's Linked Open Data page.
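N-Triples - A line-based, plain-text format for storing and transmitting RDF data, with one statement (subject, predicate, object) per line.
RDF - Resource Description Framework - a data model for describing web resources as statements made up of a subject, a predicate and an object.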
REST - REpresentational State Transfer - a style of software architecture for distributed systems such as the World Wide Web. REST has emerged as a predominant Web service design model.
SPARQL - An RDF query language for databases, able to retrieve and manipulate data stored in Resource Description Framework format.
Wildcard - In computer (software) technology, a wildcard character can be used to substitute for any other character or characters in a string.
XML - Extensible Markup Language - defines a set of rules for encoding documents in a format that is both human-readable and machine-readable.