Discover insights from the Copyright office hours on artificial intelligence

Can works and data be mined to train an algorithm without asking rightsholders for permission?

Our session began with an overview of the new legal safeguards that make text and data mining (TDM) possible without asking rightsholders for permission. These safeguards were introduced in 2019 by Articles 3 and 4 of the Copyright in the Digital Single Market (CDSM) Directive. Article 2(2) of the Directive defines TDM as an automated analytical technique for analysing text and data in digital form, and its recitals describe it as the ‘automated computational analysis of information in digital form, such as text, sounds, images or data’. As such, TDM is a method that allows researchers, among others, to analyse and extract large amounts of data, including from copyright-protected works. This technique is an integral part of training algorithms and AI models.

The Directive allows TDM activities carried out by ‘research organisations and cultural heritage institutions’ for the purposes of scientific research. These institutions can mine copyright-protected works, or extract data from a database, to which they have lawful access without needing the rightsholders’ permission, and rightsholders cannot oppose such uses, as long as the mining is carried out for scientific research.

The Directive also allows individuals and institutions beyond ‘research organisations and cultural heritage institutions’ to make reproductions and extractions of works to which they have lawful access for TDM, regardless of the purpose behind the TDM activities (commercial purposes are therefore also covered). However, there is a catch: this broader provision applies only on the condition that the rightsholders have not expressly reserved their rights, in a machine-readable way, and thereby excluded their works from the scope of the exception. Where rights have been reserved in this way, undertaking TDM without the permission of the rightsholder would constitute infringement of their copyright.

There are various initiatives looking into the technical means for rightsholders to reserve their rights in line with the second safeguard mentioned above, such as the initiative by a W3C Community Group, and Have I Been Trained? by Spawning.ai. Participants in our event identified a significant challenge: it is difficult to ensure that those conducting or deploying TDM actually apply such standards and respect the reservations expressed through them.
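To make the idea of a machine-readable reservation more concrete, here is a minimal sketch in Python of how a TDM tool might read such a signal from the HTTP response headers of a resource. It assumes a header named `tdm-reservation` with the values `1` (rights reserved) and `0` (rights not reserved), as we understand the draft protocol discussed by the W3C Community Group; the function name is hypothetical, the header name and values should be checked against the current version of that protocol, and the result is only a technical signal, not a legal assessment.

```python
from typing import Mapping, Optional


def tdm_rights_reserved(headers: Mapping[str, str]) -> Optional[bool]:
    """Interpret an assumed 'tdm-reservation' HTTP response header.

    Returns True if the header signals that TDM rights are reserved,
    False if it signals that they are not reserved, and None if the
    header is missing or carries a value this sketch does not recognise.
    """
    # HTTP header names are case-insensitive, so normalise them first.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    value = normalized.get("tdm-reservation")  # assumed header name
    if value == "1":
        return True
    if value == "0":
        return False
    return None


# Example with hand-written headers (no network access needed).
print(tdm_rights_reserved({"TDM-Reservation": "1"}))
print(tdm_rights_reserved({"Content-Type": "text/html"}))
```

A result of `None` only means that this particular signal was not found. Rightsholders may have reserved their rights through other means, so a tool would still need to check those before relying on the broader provision.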

How do Creative Commons licences interact with the new legal safeguards for TDM?

The role of the Creative Commons (CC) licences in facilitating the mining of works to which they apply was also discussed in our session.

CC licences are, in essence, contracts that permit certain uses of copyrighted works. However, they do not sit above the legal safeguards that the law provides in the form of exceptions and limitations to copyright. This means they cannot be used to further restrict those exceptions and limitations, nor can they be understood as a means to do so. Such a practice would also be contrary to the purpose of CC licences. Therefore, in the context of the broader provision defining the legal safeguard for TDM in the EU, CC states very clearly that the terms of CC licences cannot be interpreted as, or function as, a rights reservation (opt-out) by rightsholders from this broader TDM safeguard.

Furthermore, in line with the reasoning above, CC licences cannot be used to override certain exceptions and limitations to copyright either. Specifically, the law forbids any contractual override of the legal safeguard provided to research organisations and cultural heritage institutions for TDM. So where a use falls under this safeguard, the terms of CC licences will not apply.

In the EU, can an AI-generated or AI-supported output be subject to copyright protection?

Copyright protection, as a human-centric system, is granted only when there is (human) originality, understood as ‘an author’s own intellectual creation’. Nothing excludes the possibility of an AI-assisted output being a copyright-protected work if the human author was able to make creative decisions that can be detected in the output, so that the output is an original expression of the author.

Generally, typing a prompt wouldn’t meet the above requirements. Therefore, it can be said that many AI-assisted or AI-generated outputs are not subject to copyright. However, for now there is only doctrinal debate and no case law from the Court of Justice of the EU that shows how to measure or identify originality and human involvement in AI-generated or AI-assisted outputs, or who exactly would own the copyright in a purely AI-generated work, if such works are deemed protectable by copyright.

Join us in the upcoming sessions

Did you enjoy this post? Then join our upcoming Copyright and Policy Office hours! You can see an overview of all upcoming sessions here. The next event on 19 July will focus on open access, copyright and ethics with discussions on the intersection between the lack of copyright limitations and ethical reuse.