Metadata and Content Ingestion for the Europeana Cloud Project

In the Europeana Cloud project we aimed to ingest a great variety of data with special relevance to scholars in the Humanities and Social Sciences, i.e. the core target audience of Europeana Research. In a previous blog post, I highlighted some examples of datasets that we ingested in this context. In addition, the project ingested a wide range of materials that enhance the existing Europeana dataset and reflect Europe’s linguistic and cultural diversity.

Valuable scholarly metadata (over 2.4 million items in total) was sourced from institutions as diverse as the Croatian Academy of Arts and Sciences, the Hungarian University of Debrecen, and the Bavarian Library Consortium. Languages featured include Czech, Dutch, German, English, French, Italian, Latin, Russian, Greek and Hungarian. The datasets cover a great variety of materials, including digitised maps, manuscripts, incunabula, archival materials, pamphlets, playbills, dissertations and journals, as well as visual materials such as portraits, architectural drawings, photographs, images of plaster casts, films and video. Topics covered include (in no particular order) political studies, economics, law, philology, linguistics, psychology, education, history, Judaic studies, philosophy, religion, theatre studies, history of fencing, folklore, architecture, geography, literature, Egyptology and medieval history.

Bust (1920): Mitropolitul Andrei Saguna (1808-1873), from the Biblioteca Facultatii de Teologie "Andrei Saguna" din Sibiu. Public Domain

The new content, with solid contributions from Belgium, Croatia, Finland and Romania, filled some of the lacunae in the existing dataset, while the addition of audio and video files from the middle of the 20th century helped to address the 20th-century black hole in Europeana Collections.

The project aggregated metadata for digital objects (including, of course, the all-important link to each object), while a second task in the Ingestion Work Package handled the ingestion of actual digital content. This content consisted of newspaper material and content contributed by project partners.

Several associate partners of the Europeana Newspapers project had not been able to aggregate their full-text/digitised newspapers under the umbrella of that project. The Europeana Cloud project subsequently stepped in to load the data for some of these libraries.

The datasets for the National Libraries of Belgium and Iceland were thus processed. The dataset of the Royal Library of Belgium (KBR) resulted in 17,129 metadata records and 135,330 thumbnails. As the KBR image server is not fully supported by The European Library portal, the images are not currently visible in the newspaper browser that was developed for the Europeana Newspapers project. In addition, 302,172 records for full-text issues from the National and University Library of Iceland (BOK) have been processed. Both datasets will be migrated to the Cloud infrastructure and will then be available via the Europeana Cloud API in the same way as the newspaper collections that were part of the Europeana Newspapers project.

The newspaper collection of the National Library of Wales became part of the Research and Development work published in Europeana Cloud D4.4, Recommendations for enhancing EDM to represent digital content. Together with the National Library of Wales, the Europeana Research and Development team explored improved Europeana data interoperability with IIIF. Using IIIF to present this newspaper data holds great promise for a much better user experience.

The Poznań Supercomputing and Networking Centre (PSNC) has ingested its 1.8 million thumbnails into the Europeana Cloud infrastructure. With this migration, Europeana Cloud has become the repository from which the images are drawn when a result set is displayed in the PSNC search interface.

The Open University is responsible for COnnecting REpositories (CORE). The mission of CORE is to aggregate all open access research outputs from repositories and journals worldwide and make them available to the public. Five million digital objects were added to the Europeana Cloud infrastructure, each in up to five representations: an OAI-PMH XML representation of the original record; an enriched JSON representation of that metadata; a PDF file of the article (where available); the extracted text (if the PDF was available); and a thumbnail preview of the article in PNG format (if the PDF was available).
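
For readers who want to reason about which representations a given CORE object carries, the dependency described above can be sketched in a few lines of Python. This is only an illustration: the representation labels and the function name are hypothetical and are not Europeana Cloud or CORE identifiers. The only thing taken from the description is the rule that the two metadata representations are always present, while the extracted text and the thumbnail exist only when a PDF does.

```python
from typing import List

# Illustrative labels only; these are NOT official Europeana Cloud names.
ALWAYS_PRESENT = ["oai-pmh-xml", "enriched-json"]
PDF_DEPENDENT = ["pdf", "extracted-text", "png-thumbnail"]


def expected_representations(has_pdf: bool) -> List[str]:
    """List the representations an object should carry.

    The two metadata representations are always expected; the PDF,
    the extracted text and the thumbnail are expected only when a
    PDF of the article is available.
    """
    representations = list(ALWAYS_PRESENT)
    if has_pdf:
        representations.extend(PDF_DEPENDENT)
    return representations


if __name__ == "__main__":
    print(expected_representations(has_pdf=True))
    print(expected_representations(has_pdf=False))
```

Under these assumptions, an object without a PDF carries two representations and an object with a PDF carries all five.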

Both the metadata and the content ingested in the course of this project are stored in Europeana Cloud, with the aim of creating a supportive environment for innovative exploration and analysis of Europe’s digitised content. Europeana Cloud is not an archive; rather, it supports active storage of, and direct access to, European cultural heritage content for manipulation and reuse. Europeana Cloud data may be accessed and downloaded using the Europeana Cloud API.

Oliva, Franciscus, n.d. Portolan Charts, (s.l.): (s.n.) [Marseille, 1650].
Image by the University of Edinburgh. Public Domain