Facilitating archival research on the study of Greece in the 1940s
Digital curation is becoming a familiar concept to many in the cultural heritage sector, but do we realise its potential in making digital cultural heritage reusable for research? In this post, Agiatis Benardou discusses the work of the APOLLONIS Task Force, and the processes it has followed to help researchers access disparate archives to study the 1940s in Greece.
In the context of the Greek Infrastructure for Digital Arts, Humanities and Language Research and Innovation, APOLLONIS, a designated Task Force led by ATHENA R.C., is working on identifying and supporting researchers’ needs when accessing disparate archives. It focuses on archival material from the decade of the 1940s, a turbulent period in Greek history due to its significant events (WWII, Occupation, Opposition, Liberation, Civil War), and has assembled digitised historical archives from various providers to shed light on different historical aspects of these events.
The Task Force has two main aims:
- To design and develop a joint repository for metadata and indexes for people, organisations, places, times, topics and events, to allow people to search content across different archives
- To define digital curation activities and workflows, so that the work of developing the repository can be replicated and the enriched content can be further analysed and processed.
The Task Force, whose work is still underway, includes members from ATHENA R.C. (the co-ordinator of APOLLONIS), the Academy of Athens, FORTH, the Institute of Communications and Computer Systems/NTUA, and the Athens School of Fine Arts.
Bringing archives together - the process
The activities of the Task Force are interdisciplinary and varied. We are recording our workflows at every stage, from bringing resources together to offering them to researchers in a new form. They include the initial curation of the digitised archives, ingestion, joint indexing of the data, generation of semantic graph representations and, finally, their publication. Below, we detail the processes the Task Force went through to achieve this.
After we acquired the source materials, we investigated their structure and content in order to map different archive metadata onto a common metadata schema, enabling joint indexing and establishing semantic links in archival content. The common metadata schema is an enriched version of the Europeana Data Model (EDM).
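As a rough illustration of what such a mapping can look like, here is a minimal Python sketch. It is not the Task Force's actual code: the archive names and source field names are hypothetical, and the target properties are a small selection of those commonly used in EDM records (dc:title, dc:creator, dcterms:created, dcterms:spatial, edm:dataProvider).

```python
# Hypothetical per-archive field mappings onto shared, EDM-style property names.
# The source field names ("titlos", "Title", ...) are invented for illustration.
FIELD_MAPS = {
    "archive_a": {
        "titlos": "dc:title",
        "syntaktis": "dc:creator",
        "imerominia": "dcterms:created",
        "topos": "dcterms:spatial",
    },
    "archive_b": {
        "Title": "dc:title",
        "Author": "dc:creator",
        "Date": "dcterms:created",
        "Place": "dcterms:spatial",
    },
}


def to_common_schema(record, archive):
    """Rename a record's fields to the common schema.

    Returns (common, unmapped): the mapped record, plus any source fields
    that have no mapping yet, so that nothing is silently dropped.
    Values are assumed to be strings.
    """
    mapping = FIELD_MAPS[archive]
    common = {"edm:dataProvider": archive}
    unmapped = {}
    for field, value in record.items():
        target = mapping.get(field)
        if target is None:
            unmapped[field] = value
        else:
            common[target] = value.strip()
    return common, unmapped
```

Returning the unmapped fields separately makes it easy to review what a new archive contains before deciding how to extend the mapping.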
The next step was data cleaning, where ‘dirty’ data that included typographical errors and invalid or incorrect values were corrected. We then enhanced the datasets by identifying additional information and annotating it in the records. This information mostly relates to people, places, armed units, dates and recurrent topics, so we benefitted from Natural Language Processing (NLP) techniques to identify it. Challenges addressed include different content formats and schemas, variations in vocabularies and terminologies, inconsistencies in standardisation of content within the same collection and across collections, as well as spelling and typographical errors, mixed use of Greek and Latin characters, abbreviations, and declensions.
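To give a flavour of one of these challenges, the sketch below shows a simple way to normalise names so that accented, unaccented and mixed-script spellings compare as equal. It is an illustrative assumption rather than the pipeline we used: it relies only on Python's standard unicodedata module, the lookalike table is a hand-picked subset, and it does not attempt to handle abbreviations or declensions.

```python
import unicodedata

# Latin capitals that are visually identical to Greek capitals, mapped to
# their Greek counterparts (A, B, E, Z, H, I, K, M, N, O, P, T, Y, X).
LATIN_TO_GREEK = str.maketrans(
    "ABEZHIKMNOPTYX",
    "\u0391\u0392\u0395\u0396\u0397\u0399\u039a\u039c\u039d\u039f\u03a1\u03a4\u03a5\u03a7",
)


def has_greek(token):
    """True if the token contains at least one Greek-script character."""
    return any(
        "\u0370" <= ch <= "\u03ff" or "\u1f00" <= ch <= "\u1fff" for ch in token
    )


def normalise_token(token):
    # Decompose accented letters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFD", token)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    upper = stripped.upper()
    # Only repair lookalikes in tokens that are at least partly Greek,
    # so genuinely Latin-script words are left alone.
    if has_greek(upper):
        upper = upper.translate(LATIN_TO_GREEK)
    return upper


def normalise(text):
    """Normalise every whitespace-separated token in a string."""
    return " ".join(normalise_token(t) for t in text.split())
```

A normalised form like this is best stored alongside the original value for matching purposes, not in place of it, so that the source spelling is preserved.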
The resulting files were expressed in XML and aggregated. This structure leaves room for further enrichment by researchers familiar with the topic, but our primary goal is to support complex research queries. To this end, the data were finally organised by linking records from different sources.
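One simple way to think about this linking step is as a joint index from each annotated entity to the records that mention it, whichever archive they come from. The Python sketch below illustrates the idea under stated assumptions: the record structure, the archive names and the entity identifiers are all invented for the example and do not reflect the actual repository.

```python
from collections import defaultdict


def build_joint_index(records):
    """Map each entity identifier to the set of (archive, record id) pairs
    in which it is annotated."""
    index = defaultdict(set)
    for rec in records:
        for entity in rec["entities"]:
            index[entity].add((rec["archive"], rec["id"]))
    return index


def linked_across_archives(index):
    """Keep only the entities that appear in more than one archive."""
    return {
        entity: sorted(refs)
        for entity, refs in index.items()
        if len({archive for archive, _ in refs}) > 1
    }


# Invented sample records, for illustration only.
sample_records = [
    {"archive": "archive_a", "id": "a1", "entities": ["place:Athens", "person:P1"]},
    {"archive": "archive_b", "id": "b7", "entities": ["place:Athens"]},
    {"archive": "archive_b", "id": "b9", "entities": ["person:P2"]},
]

joint_index = build_joint_index(sample_records)
cross_links = linked_across_archives(joint_index)
```

Here only "place:Athens" ends up in cross_links, because it is the only entity annotated in both sample archives. A query for that entity can then be answered from a single index instead of searching each archive separately.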
The Task Force’s immediate plans include full-scale ingestion and indexing of the material from a number of archives to produce a corresponding semantic graph. The incorporation of new archives would be the natural continuation of our work, and further collaborations would be welcome.
Improved archives to support research
This ongoing work will improve the current user experience by facilitating access to content in new and innovative ways, in addition to addressing preservation issues. Researchers who use APOLLONIS will not have to search across six different archives or face issues related to the chronology of the items or to the ways in which they were recorded.
When our work is finalised, researchers will be able to access different archives and enriched resources simultaneously. They will also be able to use curation and content analysis workflows developed as part of the project. The project therefore demonstrates how digital curation can be an intermediary step towards offering useful resources to researchers, and how researchers can collaborate with cultural heritage institutions to enrich their resources.
Find out more
The APOLLONIS Task Force is one of the projects on WWII digital resources that will be discussed in a webinar organised by ATHENA R.C. on 10 September 2020 in the framework of its collaboration with Europeana Research 2018-2020. Explore the programme and register!