Harvesting and Downloads

The Europeana API suite provides a wide range of APIs tailored to specific needs. Besides our main APIs for searching and retrieving metadata about objects, we also offer other methods for downloading and harvesting metadata from Europeana.eu that are better suited to large-scale use. On this page, you can explore the two solutions available.

If you need the metadata only once, or are accessing it for the first time, we suggest our Downloads solution, which lets you download object metadata from our FTP server as pre-generated zip files.

If you want to stay up to date as metadata changes, or if you already use harvesting software, we recommend our Harvesting solution, which uses the OAI-PMH service. Both the ZIP downloads and the OAI-PMH service deliver metadata as XML, which is suitable for data processing, especially in digital cultural heritage research. For researchers used to working with semantic frameworks and tools such as Jena and SPARQL, we also offer zip files for download formatted in Turtle.

Before starting to use either of these options, please read our Introduction page on how data is structured into Records and Datasets, the API Terms of Use and the Usage Guidelines for metadata.

Discover how others have used Europeana's data

Exploring new resources in CLARIN’s Virtual Language Observatory

Since 2017, CLARIN and Europeana have worked together to increase the number of cultural heritage objects available for quick and easy discovery as well as processing by humanities and social sciences scholars. In this post, we take a look at the new resources integrated into CLARIN’s Virtual Language Observatory.

CLARIN and Europeana make discovery and processing quick and easy for 135,000 cultural heritage objects

In 2017, CLARIN carried out a pilot exploring the possibilities of integrating Europeana Collections’ material into its infrastructure and thus opening up new possibilities for the discovery and linguistic processing of textual cultural heritage content for a social sciences and humanities research audience. This integration is now entering a new stage, offering improved quality and increased processing potential.

Introducing our image classification pilot

With lowered barriers to access and the development of new practices for Artificial Intelligence (AI), it’s no surprise that AI-related activities in the cultural heritage sector are increasing, a topic in focus this month on Europeana Pro. In this post, we share work taking place at the Europeana Foundation to create an image classification pilot which uses computer vision algorithms to improve metadata in our records.

Downloads

To foster the reuse of the metadata published in Europeana, we offer zip files containing the metadata of all objects in Europeana's repository, ready for bulk download. These files are generated every Sunday evening using our harvesting solution, which keeps the data as up to date as possible and also confirms that our harvesting service is working as expected.

FTP listing and file structure

All the files are available on our FTP server at ftp://download.europeana.eu/dataset/. You can connect with an FTP client such as FileZilla, by adding the server as a shared network location, or from the command prompt. On Linux, you can mirror the XML directory with the command: wget -m ftp://download.europeana.eu/dataset/XML

Login details for the FTP server:

Host: ftp://download.europeana.eu/dataset/
User: anonymous
Password: [leave blank]
Port: 21
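
As an alternative to a graphical client, the login details above can be used from a script. The following is a minimal sketch using Python's standard ftplib module; the helper names dataset_path and download_dataset are hypothetical, and it assumes the server accepts ftplib's default anonymous login, which sends a placeholder password rather than a truly blank one.

```python
import os
from ftplib import FTP

FTP_HOST = "download.europeana.eu"  # host taken from the login details above
FTP_ROOT = "/dataset"               # root directory of the metadata dumps


def dataset_path(dataset_id: str, fmt: str = "XML") -> str:
    """Build the server-side path of one archive, e.g. /dataset/XML/2021672.zip."""
    if fmt not in ("XML", "TTL"):
        raise ValueError(f"unsupported format: {fmt!r}")
    return f"{FTP_ROOT}/{fmt}/{dataset_id}.zip"


def download_dataset(dataset_id: str, fmt: str = "XML", dest_dir: str = ".") -> str:
    """Download one archive over anonymous FTP and return the local file path."""
    remote = dataset_path(dataset_id, fmt)
    local = os.path.join(dest_dir, os.path.basename(remote))
    with FTP(FTP_HOST) as ftp:  # port 21 is ftplib's default
        ftp.login()             # no arguments means anonymous login
        with open(local, "wb") as fh:
            ftp.retrbinary(f"RETR {remote}", fh.write)
    return local


# Example call (performs a real network transfer, so it is left commented out):
# download_dataset("2021672", "XML")
```

The path builder is kept separate from the transfer so that the same naming logic can be reused with any other FTP client.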

The structure in the FTP server is organised in the following way:

  • A directory for each available format. Currently only two formats are available: XML for the RDF/XML format and TTL for Turtle.

  • Each directory lists a zip file for each Dataset in Europeana, where the name of the file is the dataset identifier (e.g. 2021672.zip). Next to each zip file is its MD5 checksum file, with the file extension .md5sum (e.g. 2021672.zip.md5sum), which can be used to validate the file after download.

  • Inside each zip file there is one file for each Europeana metadata record, named after the local identifier of the Record in Europeana.
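
The checksum validation mentioned in the list above could be scripted as follows. This is a sketch under one assumption that the text does not confirm: that the .md5sum file follows the common md5sum layout, where the hexadecimal digest is the first whitespace-separated token. The function names are illustrative.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_md5(md5sum_path):
    """Read the digest from a .md5sum file.

    Assumption: the first whitespace-separated token is the hex digest.
    """
    text = Path(md5sum_path).read_text().strip()
    return text.split()[0].lower()


def verify_download(zip_path):
    """Return True if <zip_path> matches the digest in <zip_path>.md5sum."""
    return md5_of_file(zip_path) == expected_md5(f"{zip_path}.md5sum")
```

Calling verify_download("2021672.zip") expects 2021672.zip.md5sum to sit in the same directory as the archive.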

Example

The data for the Girl with a Pearl Earring from the Mauritshuis, encoded in the RDF/XML format, is available at the URL ftp://download.europeana.eu/dataset/XML/2021672.zip . To find out which dataset a record belongs to, you can check the URL of the record (for the Girl with a Pearl Earring, the Europeana item URL is https://www.europeana.eu/nl/item/2021672/resource_document_mauritshuis_670 ), or you can find the dataset name next to the field 'Collection Name' in the 'More Metadata' tab on the item page.

Requesting the URL ftp://download.europeana.eu/dataset/XML/2021672.zip returns a ZIP file with the metadata for all the objects in the dataset with the dataset number '2021672'. Unzipping the ZIP file gives you one XML file for every digital cultural heritage object, named after the ID of that object. The ID of the “Girl with a Pearl Earring” is 'resource_document_mauritshuis_670', so its metadata is in the XML file named "resource_document_mauritshuis_670.xml".
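
The lookup described in this example can be sketched in Python. The helper names split_item_url and read_record are hypothetical; the code assumes item URLs follow the pattern shown above (language code, then "item", then dataset number, then local identifier) and matches on the base file name in case the archive nests its files in a folder.

```python
import zipfile
from urllib.parse import urlparse


def split_item_url(item_url):
    """Return (dataset_id, local_id) extracted from a Europeana item URL."""
    parts = [p for p in urlparse(item_url).path.split("/") if p]
    # Expected shape: [<language>, "item", <dataset id>, <local id>]
    idx = parts.index("item")
    return parts[idx + 1], parts[idx + 2]


def read_record(zip_path, local_id, extension=".xml"):
    """Return the raw bytes of one record file inside a dataset archive."""
    wanted = local_id + extension
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            # Compare base names so a leading folder inside the archive is tolerated
            if name.rsplit("/", 1)[-1] == wanted:
                return archive.read(name)
    raise KeyError(f"{wanted} not found in archive")


# Example, assuming 2021672.zip has already been downloaded:
# dataset_id, local_id = split_item_url(
#     "https://www.europeana.eu/nl/item/2021672/resource_document_mauritshuis_670")
# xml_bytes = read_record(f"{dataset_id}.zip", local_id)
```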

Harvesting (OAI-PMH)

The Europeana OAI-PMH Service offers a way to collect large amounts of Europeana data from our repository through a protocol named OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting, presently in v2.0). This service allows you to harvest all Europeana metadata, or a selection by dataset or by date of creation/modification, so that it can be integrated into other services or applications.

You can learn more about the harvesting protocol on the Open Archives Initiative (OAI) website and also by reading the OAI for beginners tutorial from the Open Archives Forum.

Available requests

Below you can find the available requests. The base URL for all requests is https://api.europeana.eu/oai/record/. All requests return XML, so you need an XML-aware browser or viewing application to display the responses.

List of available requests defined by the OAI-PMH protocol:
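
For orientation, OAI-PMH v2.0 itself defines six request verbs: Identify, ListMetadataFormats, ListSets, ListIdentifiers, ListRecords and GetRecord. The sketch below builds request URLs for them from the base URL given above. The helper oai_url is hypothetical, and the metadata prefix "edm" in the usage comments is an assumption; a ListMetadataFormats request would show which prefixes the service really accepts.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.europeana.eu/oai/record/"  # base URL as stated above

# The six verbs defined by the OAI-PMH v2.0 protocol
OAI_VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}


def oai_url(verb, **params):
    """Build an OAI-PMH request URL.

    'from' is a reserved word in Python, so callers pass it as from_;
    the trailing underscore is stripped before encoding.
    """
    if verb not in OAI_VERBS:
        raise ValueError(f"not an OAI-PMH verb: {verb!r}")
    query = {"verb": verb}
    for key, value in params.items():
        if value is not None:
            query[key.rstrip("_")] = value
    return f"{BASE_URL}?{urlencode(query)}"


# Usage sketches (the "edm" prefix is an assumption, see the note above):
# oai_url("Identify")
# oai_url("ListRecords", metadataPrefix="edm", set="2022608_Ag_NO_ELocal_DiMu")
# oai_url("ListIdentifiers", metadataPrefix="edm", from_="2020-10-01")
```

The set and from/until arguments are the protocol's standard means of selective harvesting, which corresponds to the selection by dataset or by date described earlier.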

Structure and Format of the Data

The records in the OAI-PMH service are grouped into Datasets and are available as EDM RDF/XML. An example of a dataset ID that is accepted by the OAI-PMH service is 2022608_Ag_NO_ELocal_DiMu. The records are identified by their URIs. An example of such an identifier is http://data.europeana.eu/item/2022608/AAK_AAKS_2007_02_0206. To learn more about data.europeana.eu and its resources, please see the EDM definitions on the Introduction page.
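
Harvesting one dataset means paging through responses, because OAI-PMH splits long lists and returns a resumptionToken for the next page. The following sketch collects the record identifiers of a dataset. The function names are illustrative, the "edm" metadata prefix is an assumption, and the fetch step is passed in as a callable so that the paging logic can be tried without network access.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.europeana.eu/oai/record/"  # base URL as stated above
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}  # OAI-PMH 2.0 namespace


def http_fetch(params, base_url=BASE_URL):
    """Send one OAI-PMH request and return the response body as bytes."""
    with urlopen(f"{base_url}?{urlencode(params)}") as response:
        return response.read()


def parse_list_page(xml_text):
    """Return (record identifiers, resumption token or None) for one page."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text for el in root.findall(".//oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # An absent or empty resumptionToken element marks the last page
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


def harvest_identifiers(fetch, dataset_id, metadata_prefix="edm"):
    """Yield every record identifier in a dataset, following resumption tokens."""
    params = {
        "verb": "ListIdentifiers",
        "metadataPrefix": metadata_prefix,  # assumed prefix, not confirmed here
        "set": dataset_id,
    }
    while True:
        identifiers, token = parse_list_page(fetch(params))
        yield from identifiers
        if token is None:
            break
        # Per the protocol, follow-up requests carry only verb and resumptionToken
        params = {"verb": "ListIdentifiers", "resumptionToken": token}


# Example (real network traffic, so left commented out):
# for uri in harvest_identifiers(http_fetch, "2022608_Ag_NO_ELocal_DiMu"):
#     print(uri)
```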

Known limitations

Europeana currently does not maintain a registry of deleted records. We therefore recommend that you re-harvest or download the entire collection at least every six months to ensure your copy of the Europeana repository is up to date.
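
Because deletions are not announced, one way to detect them after a full re-harvest is to compare the record identifiers of the old and new copies. This is a minimal sketch of that comparison; diff_snapshots is a hypothetical helper, not part of any Europeana tooling.

```python
def diff_snapshots(previous_ids, current_ids):
    """Compare two full harvests and report which identifiers changed.

    Identifiers present before but missing now were presumably deleted
    from the repository and can be removed from the local copy.
    """
    previous, current = set(previous_ids), set(current_ids)
    return {
        "removed": sorted(previous - current),
        "added": sorted(current - previous),
    }
```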

Roadmap and Changelog

We deploy new versions of the service primarily to fix outstanding issues or introduce new features. The current version of the OAI-PMH Service is 0.8 Beta (2020-10). To see the changes made for this version and all previous releases, see the API changelog in the project's GitHub repository.
