Enrichment policy: stakeholder survey report

The survey received 166 complete responses. A majority of respondents identified themselves as heritage professionals (58.96%), followed by researchers (32.08%), educators (27.61%), members of an aggregation service (21.64%) and culture enthusiasts (20.89%). They engage with enrichments by producing and reusing them, organising enrichment activities and developing enrichment tools.

Data quality

An overwhelming majority (76.87%) of respondents perceived enrichments as very important for discoverability. Some argue this is because objects need to be discovered before anything else. Respondents also value the role of enrichments in correcting potential issues with the source data, updating it against contemporary views and improving data quality in general (60.45%), enhancing accessibility (69.4%), adding context to the object, including links to other objects (62.69%), and offering more multilingual metadata (59.7%), which improves engagement.

Overall, respondents found correctness important. Some incorrectness is tolerated, depending on the (re)use scenario (incorrectness is less accepted in research especially), the type of enrichment and the definition of correctness (e.g. whether an incorrect enrichment is completely wrong or merely imprecise). Some respondents added that incorrectness could be mitigated through further correction activities and stated that they prefer incorrect enrichments over no enrichment at all.

Finally, some respondents highlighted that enrichments can increase the reuse potential of data and user engagement, while others mentioned the negative impact of enrichments that add incorrect data. For example, when enrichments are used to improve discoverability, incorrectness could reduce the extent and relevance of the results obtained.

Transparency

About half of the respondents (51.9%) indicated that being able to distinguish between authoritative data provided by cultural heritage institutions (CHIs) and enriched data is of great importance to them. This is also a factor in respondents’ tolerance for incorrectness. For example, some respondents would be more lenient towards incorrectness if enriched data were clearly separated from data originating from cultural heritage institutions. One respondent mentioned that this separation also lessens the negative impact of incorrect enrichments on the reputation of the cultural heritage institutions that created the authoritative data.

A large majority of the respondents (81.25%) would like to have information about whether the enrichment is machine and/or human-generated, followed by whether and to what degree the enrichment’s accuracy has been validated (78.13%). Some respondents would welcome even greater insight into the enrichment process (e.g. when, under which conditions and for what purpose it was performed), the actors involved in the creation of enrichments (e.g. the professional role of the person who created it, details and settings of the software or tool that was used, confidence levels exposed by the tool) and how the validation was performed. Access to this information would help them estimate the reliability of enrichments and assess potential biases or limitations of enriched data, and would thereby inform their decisions about whether and for which purposes they can reuse enrichments.

Approximately one-third of the respondents (35.88%) indicated that if they contributed enrichments (e.g. by participating in a crowdsourcing campaign) they would like their name acknowledged. In a separate question, slightly fewer respondents (32.81%) indicated they would like to see the name of the person who created the enrichment (if human-generated). However, in additional comments, respondents indicated they are more interested in understanding the expertise, professional role or background of the person than in their name, as this would help them judge the trustworthiness of enrichments.

Reusability and copyright

Respondents mentioned several reuse scenarios when asked for which purposes they use enrichments. Their answers can be broadly categorised into three groups that coincide with their professional roles. Content holders use enrichments to improve their collections and offer a better experience to their audience, developers use them to improve and/or offer new features for their software applications, and other stakeholders use them as sources for research and education.

A large majority of the respondents (74.99%) agreed or strongly agreed that ‘data resulting from enrichment efforts should have no new copyright protection’ and that ‘no new rights should be applied to enrichments, whether they are machine or human generated’. A few respondents questioned whether enrichments are original enough to justify copyright protection. Many commented that even if there were situations in which enrichments attracted additional copyright protection, such protection should be removed in favour of being able to reuse enriched data and share the knowledge, fostering research, innovation and progress.

Fewer than one-fifth (17.9%) disagreed or strongly disagreed with the statement, with no profile of respondents standing out as disagreeing more than others. Among those who disagreed, some clarified that they would like to ensure attribution, sustain a business model, be compensated for their effort, or have copyright protection recognised, but would not be against openly licensing the results.

Collaboration and support

When asked which types of partnerships would be helpful to them, some respondents said they would like to extend and refine existing relationships and engage with communities, while others are seeking out new types of relationships to improve their workflows and take advantage of advances in research more quickly. In particular, CHIs and aggregators look for technical partnerships that can help them better address their needs, while technical partners seek out CHIs for their domain expertise and for concrete cases on which to apply and tailor their tools.

The most common reason for seeking out partnerships among CHIs and some aggregators is a lack of resources: more specifically, financial limitations, a lack of technical expertise and of dedicated or free software, the training required to use the tools, and the time and/or staff needed to invest in enrichment activities, including validation.

All respondents would welcome knowledge sharing between different actors in enrichment activities, in particular the sharing of (new) technologies and standards, business cases, workflows, methodologies, policies, case studies, good practices, and the output of enrichment activities.

Participation, diversity and inclusion

A large majority (87.81%) of the respondents indicated that contributions from individuals (i.e. crowdsourcing) to enrichment efforts are either valuable or very valuable. When asked about their concerns regarding crowdsourcing in cultural heritage, they expressed unease about potential privacy issues related to publishing personal or sensitive data and about possible biases and divergence of opinion among contributing authors, and raised the need for content moderation. For many, pre-defining the scope and guidelines of activities and implementing quality control mechanisms for the resulting information are key to harnessing the benefits of crowdsourcing, for example by carefully matching the right expertise with the scope and/or target of the enrichment activity.

When asked how to ensure the participation of minority communities in enrichment activities, respondents replied that we should understand and reduce barriers to participation. As strategies for overcoming barriers they listed making sure enrichment tools, software and platforms are user-friendly and accessible for users with different needs, providing adequate training and support, closing any gaps in expertise through upskilling and catering to different language needs. To actively improve and foster engagement with minorities, we should approach them in a meaningful and targeted way, for example directly at in-person events or through relevant non-governmental organisations (NGOs), and by organising workshops, social media campaigns and focus groups. On top of that, respondents suggested building a two-way relationship in which we learn from each other instead of “us” designing for “them”, making sure collaboration is long-term, and offering compensation in financial or other forms, for example by giving appreciation and recognition for invested time and effort.

Environmental sustainability

More than half of the respondents (53.33%) indicated they have not considered the environmental aspect of enrichment activities.

Some (20.83%) believe that the impact could be mitigated when creating enrichment software and tools, by opting for less energy-demanding methods, using sustainable energy, equipment and labour, and optimising data storage.

Other respondents (18.33%) indicated that environmental impact could be minimised through careful selection of enrichment techniques, questioning which methods are more energy-efficient: human or machine-based. For example, a few respondents mentioned that it is necessary to consider whether the quality of AI-generated enrichments justifies the higher energy consumption needed to train AI models.

Some also added that it is necessary to define clear enrichment goals and to coordinate enrichment activities in order to minimise duplication of effort, and some went further and noted that we should not look at enrichment activities in isolation from other data management activities.

More information

For more information on the results of the survey, you can download the report below, which provides more detailed responses.

Next steps

The results of the survey will be used to draft the enrichment policy. In the coming weeks, we will reach out to relevant stakeholders to review the draft. The policy will be made available online in the coming months.