Datasheets for digital cultural heritage Working Group
This Working Group, set up within the Europeana Research Community and EuropeanaTech Community, works to adapt the concept of datasheets for the cultural heritage sector.
A datasheet is a standardised publication format for documenting a dataset, providing the context needed for reusing the data. In 2018, Timnit Gebru et al. introduced Datasheets for Datasets to the machine learning community. A datasheet encourages the creators of a dataset to reflect carefully on the provenance of the data. For potential users of a dataset, the datasheet provides the information needed to make informed decisions about using the data.
The study says, ‘Datasheets for datasets have the potential to increase transparency and accountability within the machine learning community, mitigate unwanted societal biases in machine learning models, facilitate greater reproducibility of machine learning results, and help researchers and practitioners to select more appropriate datasets for their chosen tasks.’
Cultural heritage data differs from contemporary, industrial data in a number of ways. Most importantly, digital cultural heritage collections are rarely created with the intention of being used as data. Most often they originate from non-digital objects that have been created for very different (cultural) purposes and were digitised at a later stage. They can be copyright protected, tend to be complex and heterogeneous, and might grow over time or contain sensitive content. All of this needs to be communicated to potential users. To facilitate communication between cultural institutions managing digital collections and all those interested in the reuse of cultural heritage datasets for academic and research purposes, datasheets for digital cultural heritage need to reflect these characteristics.
The Europeana Research Community and the EuropeanaTech Community have made this a collaborative endeavour, by launching a call for input to their communities in September 2022 and setting up this group of experts who are currently working on the topic. This group is composed of cultural heritage professionals, technical experts, and researchers working in academia. The findings will be published in scientific articles and presented at conferences.
The expert group has developed the first version of datasheets and discussed methodology and recommendations in Alkemade, H., Claeyssens, S., Colavizza, G., Freire, N., Lehmann, J., Neudecker, C., Osti, G. and van Strien, D., 2023. Datasheets for Digital Cultural Heritage Datasets, Journal of Open Humanities Data, 9 (1), p.17. DOI: 10.5334/johd.124
Expert group members presented the work on datasheets at several conferences:
J. Lehmann at the EuropeanaTech Conference, The Hague, 10-12 October 2023;
S. Claeyssens & J. Lehmann at Collections as Data. Collaborating across data spaces for cultural heritage and open science, Brussels, 19-20 February 2024;
S. Claeyssens at Unlocking 3D Cultural Heritage: FAIR and more, online, 6 March 2024;
S. Claeyssens at DH Benelux 2024, Leuven, 5-7 June 2024;
S. Claeyssens at the CLARIAH Conference, Leiden, 13 June 2024;
A. Irollo at the DARIAH Annual Event, Lisbon, 18-20 June 2024;
G. Osti at the 2024 DPASSH Conference: Collections as Data / Data as Collections, Limerick, 27-28 June 2024.
Contextualising Collections with 'Datasheets for Digital Cultural Heritage Datasets', A Conversation with Steven Claeyssens and Beth Kanazook, Digital Repository of Ireland - Blog, 21 May 2024
Get in touch if you want to share your thoughts and experiences on the topic, or would like us to present our work to your institution, by writing to [email protected]!