Discover how the J-Ark project connects data aggregation with preservation

About J-Ark

J-Ark (European Jewish Community Archive) is a Generic Services project funded under CEF Telecom eArchiving. The project is developing long-term preservation solutions for Jewish heritage archives based on the European eArchiving standards, and has integrated them into the aggregation flow of Judaica, a Europeana thematic aggregator for Jewish heritage operated by the Jewish Heritage Network.

The project is unique in using software components from three different initiatives funded by the European Commission (eArchiving, Europeana, eTranslation) to build a comprehensive end-to-end solution for small and mid-size cultural heritage institutions, an important group of stakeholders for European infrastructures.

Enhanced aggregation workflow

J-Ark has brought together the open-source, eArchiving-compliant, long-term preservation solution RODA (provided by KEEP SOLUTIONS, Portugal) and services for machine translation and anonymisation of heritage content (provided by Pangeanic, Spain) into a new, integrated solution for Judaica. As a result, the aggregator now offers a service that ingests digital objects and metadata through an orchestrated workflow. The workflow starts with a web-based or file-based pipeline for submitting digital objects as E-ARK SIPs (Submission Information Packages, the eArchiving specification for submitting an archival package). Metadata can be added manually through the user-friendly CMS offered by the aggregator or uploaded as a spreadsheet file.
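To make the idea of a submission package more concrete, the sketch below creates an empty folder skeleton loosely following the layout described in the E-ARK Common Specification for Information Packages (CSIP): a root METS.xml file plus metadata, representations, schemas and documentation folders. It is a simplified illustration under that assumption, not the J-Ark, Judaica or RODA submission tooling. The function name `make_sip_skeleton` and the package and representation names are hypothetical, and the sketch does not generate valid METS or any metadata content.

```python
from pathlib import Path
import tempfile


def make_sip_skeleton(root: Path, package_id: str, rep_name: str = "rep1") -> Path:
    """Create an empty folder skeleton loosely modelled on the E-ARK CSIP layout.

    Hypothetical helper: it only creates directories and an empty placeholder
    for METS.xml; a real SIP also needs valid METS and metadata files.
    """
    pkg = root / package_id
    for sub in (
        "metadata/descriptive",         # e.g. descriptive metadata such as EDM
        "metadata/preservation",        # e.g. preservation metadata
        f"representations/{rep_name}/data",  # the digital objects themselves
        "schemas",                      # schemas used by the metadata files
        "documentation",                # any accompanying documentation
    ):
        (pkg / sub).mkdir(parents=True, exist_ok=True)
    # Empty placeholder for the root METS.xml describing the whole package.
    (pkg / "METS.xml").touch()
    return pkg


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pkg = make_sip_skeleton(Path(tmp), "example-package")
        for path in sorted(pkg.rglob("*")):
            print(path.relative_to(pkg))
```

In practice, packages are built and validated with dedicated E-ARK tooling; the point here is only to show that a SIP is a structured bundle of objects and metadata rather than a loose set of files.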

The project also explored a direct integration with a CMS that stores the original metadata, in order to test and showcase the potential issues and possibilities of such an approach. We built a custom integration with dLibra, a digital library system in active use in Poland, which allows cultural heritage institutions using dLibra to upload a collection of objects from dLibra to the archival service in a few clicks.

After digital objects and metadata are submitted, the metadata is automatically translated into five European languages (English, French, Spanish, German and Italian) and anonymised to address the privacy requirements of preserved archival collections. A representation of the metadata in the Europeana Data Model (EDM), a Linked Data description format, is stored in RODA, the preservation system, which has now added EDM to its list of supported formats.

The project has been piloted on the data sets of two partners: Brama Grodzka - Teatr NN, a Jewish heritage centre in Lublin, Poland, and the Jewish Community of Lithuania. During hackathons, the content partners explored different workflows for submitting metadata and digital content, and the project then implemented these workflows.

These pilots highlighted two fundamental aspects of integrating a preservation solution into an existing digital environment. First, the solution needs to fit harmoniously with the current workflows (such as content publishing and metadata production) and systems (CMS, digital asset management) around digital objects and metadata. Second, it is necessary to find the right strategy for deciding how the complexity and specifics of the source data structure are reflected at the preservation end. For example, harvesting the Brama Grodzka - Teatr NN data (stored in dLibra) required a custom harvester to map a complex hierarchical data structure to a flatter EDM representation.
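The trade-off between a hierarchical source and a flat target can be illustrated with a short sketch. It assumes a simple, hypothetical tree of records (a collection containing series and objects) and flattens it into one record per node, keeping the hierarchy only as a parent reference and a readable path. The `Node` class, the `flatten` function and the field names are illustrative assumptions; `isPartOf` is only loosely named after the `dcterms:isPartOf` property that EDM uses, and this is not the J-Ark harvester or the dLibra data model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical hierarchical record, e.g. a collection, series or object."""
    identifier: str
    title: str
    children: list[Node] = field(default_factory=list)


def flatten(node: Node, parent: Node | None = None,
            path: tuple[str, ...] = ()) -> list[dict[str, str | None]]:
    """Flatten a tree into one flat record per node (pre-order traversal).

    The nesting is not kept structurally; it survives only as a reference
    to the parent identifier and as a human-readable path of titles.
    """
    current_path = path + (node.title,)
    records = [{
        "id": node.identifier,
        "title": node.title,
        "isPartOf": parent.identifier if parent else None,
        "path": " / ".join(current_path),
    }]
    for child in node.children:
        records.extend(flatten(child, node, current_path))
    return records


if __name__ == "__main__":
    collection = Node("c1", "Collection", [
        Node("s1", "Series A", [Node("o1", "Photograph 1"), Node("o2", "Letter 2")]),
        Node("o3", "Map 3"),
    ])
    for record in flatten(collection):
        print(record)
```

Keeping a parent reference preserves enough context to reconstruct the hierarchy later, while each record remains simple enough for a flat target format; deciding which source-specific details to drop is exactly the strategic choice described above.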

If you are a cultural heritage institution interested in a long-term preservation solution compatible with aggregation flows, please get in touch.

Future perspectives on preserving data within data spaces

Over the course of the J-Ark project, it has become clear that connecting the different services of European digital infrastructures will be a core requirement of the next generation of European data ecosystems: the data spaces. Developing the common European data spaces is a new flagship initiative of the European Commission that aims to support the growth of the digital economy in strategic sectors and domains of public interest. Interoperable data spaces cover domains from manufacturing and health to energy and agriculture, and ensure that public and private sector organisations, as well as research institutions, can make data available and exchange it in a trustworthy and secure manner.

The project’s experience of connecting digital preservation to the aggregation flow highlights the importance of considering the long-term preservation of cultural heritage assets in the broader perspective of data space design. It also raises some bigger questions about the future of digital preservation and how the common European data spaces might address them. How long is ‘long-term’ preservation: 10, 50 or maybe 300 years? How do we make it truly inclusive by preserving not only mainstream collections, usually safeguarded by institutions well equipped with preservation means (or at least an established Content Management System for objects and metadata), but also the archives of small organisations, communities, and even individuals? How do we save on costs by looking at the ecosystem as a whole rather than at specific use cases?

If you are interested in exploring some of these questions, the project has organised a free online event on Wednesday 22nd February 2023, and we’d love to see you there. The event will include updates from the project, the European Commission and the Europeana Foundation and a round-table that looks at the future of digital preservation and the common European data spaces. Find out more and book your space below.

Preservation of Digital Heritage in the data space