Artificial Intelligence and copyright in the cultural heritage sector: views from Creative Commons

Developments in artificial intelligence (AI) present a host of exciting opportunities for GLAMs (galleries, libraries, archives and museums) in the digital world. These range from the development of models or algorithms perfected through data processing, to mining, analysing and enriching datasets with new metadata. While these opportunities are likely to propel GLAMs forward through their digital transformation, they also raise questions in the area of copyright, especially when it comes to using GLAMs’ digital collections to train AI and the treatment of AI-generated outputs under copyright law.

At Creative Commons (CC), we are currently reflecting on some of these issues, and in this post we share our perspective on three key points: the use of GLAM collections for AI training; the copyright or public domain status of AI-generated content; and the barriers beyond copyright to opening up and sharing GLAM collections in light of the lack of clarity surrounding AI.

Use of GLAM collections as input for AI training

CC fully supports GLAMs in using the massive amounts of data in their digital collections for AI training purposes (including machine learning) in order to fulfil their public interest missions. Legally, there remains significant uncertainty as to whether copyright limitations and exceptions allow the use of copyright-protected content for AI training. This uncertainty is likely to have a chilling effect on GLAMs wishing to take advantage of AI technologies. This is one reason why, at CC, we argue that the use of copyright works to train AI should be considered non-infringing by default. As for CC-licensed content, where copyright permission is required to train AI systems, the licenses grant that permission under different terms and conditions depending on the particular CC license. A flowchart helps visualise whether the licenses are triggered and, if so, what conditions may apply.

No copyright in AI ‘creative’ content

AI has been seen to generate ‘creative’ content through processes such as Markov chains and artificial neural networks like GPT-3 (Generative Pre-trained Transformer 3, a deep learning model that can produce text). Such content might very well become part of GLAMs’ collections as it starts to gain appreciation as a new form of ‘creative’ expression. Likewise, the content generated by GLAMs using AI technology (like enriched datasets) is likely to become abundant as more and more institutions explore the opportunities offered by AI.

While the copyright status of such content is unclear under existing law, CC is of the firm view that there should be no copyright in AI-generated content and that it should be in the public domain. Public domain material can be widely accessed, used and reused by GLAMs in fulfilment of their public interest missions, as well as by the general public. We recently stated that we all benefit when knowledge, culture, and history are made accessible and shareable. That’s why, in line with the principles proclaimed in the Europeana Public Domain Charter, we must continue to advocate for open access to knowledge and culture and resist further enclosures of our shared public domain.

Barriers beyond copyright

Beyond copyright, several obstacles to sharing and using GLAM collections, notably those related to ethics, privacy and data protection, need to be assessed to bring clarity to the rapidly evolving role that AI is playing in the GLAM sector. If you are interested in joining the conversation on AI and openly licensed content with policy experts from all over the world, become a member of the CC Copyright Platform by joining our CC Policy Mailing List.