Artificial Intelligence and Copyright | Centro de Autonomía Digital

November 11, 2024

First, a disclaimer: in this article I will use “artificial intelligence” and “AI” to refer to certain machine learning algorithms, in the sense the terms are commonly used these days. Personally, I don’t feel that the term AI is appropriate for these algorithms, and I would much rather use more specific terms. In the interest of legibility, though, I will keep using “AI” in this post.

The last two years have seen a huge surge of interest in the use of artificial intelligence, especially in the context of machine learning algorithms such as Large Language Models (LLMs) and generative algorithms. We have covered these kinds of algorithms in previous articles on this blog. What is interesting about them is that, because of the amount of data they have been able to collect, they can generate material in a large number of contexts. This includes source code in programming languages. In this article I would like to talk about some implications for copyright when using generated material.

In the world of free software and open source, it is common to receive code contributions from a number of programmers. Often the relationship to these contributors is loose, although some organizations do use contracts or agreements to manage these contributions. In many cases, it is not clear who actually wrote the code that was contributed. In companies that produce proprietary software or services, this works differently. An employer will usually include language in employment contracts ensuring that the rights to code written by an employee belong to the company or organization.

But what does all of this have to do with AI-generated content, and specifically source code? The thing that most programmers don’t really think about is that copyright is fundamental to how we manage legal rights to source code. The whole open source and free software movement is based on the idea of licenses, such as the BSD license or the GPL. These specify what other people can and can’t do with source code. In a proprietary setting, the license is used to restrict what can be done with the source code, for example so that it can’t be legally distributed. Of course, this also means that a breach can lead to lawsuits.

The connection between copyright and a license is quite simple. The person who owns the copyright for something can decide the legal terms under which it may be used. Those terms are the license. So, no matter whether we are talking about a private company or a free software project, the license is assigned by the person who holds the copyright for the source code. In practice, this is often done implicitly, for example by submitting code that gets merged into a project with a specific license. But the assignment of a license is still there. This becomes more visible if a project wants to change its license: doing so involves contacting every contributor to that project and ensuring you have their permission to change the license. In a private setting, this is easier, since the employment contract usually gives the company full rights to change the license whenever it wants.
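To make the relicensing step concrete, here is a minimal sketch of the bookkeeping involved: given a project's commit history and the set of contributors who have already agreed, it lists who still needs to be contacted. The `Commit` record, the `employee` flag and the function name are hypothetical and chosen only for illustration; a real project would pull authorship from its version control system and would need legal advice on what counts as valid consent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """One commit record; a hypothetical shape, not tied to any real tool."""
    author_email: str
    # Assumption: True means an employment contract already assigns the rights.
    employee: bool = False


def contributors_needing_consent(commits, consents):
    """Return the sorted emails of authors whose permission is still missing.

    commits:  iterable of Commit records for the whole project history
    consents: set of emails that have explicitly agreed to the new license
    """
    agreed = {email.lower() for email in consents}
    pending = set()
    for commit in commits:
        if commit.employee:
            # Covered by contract in this sketch, so no individual consent needed.
            continue
        email = commit.author_email.lower()
        if email not in agreed:
            pending.add(email)
    return sorted(pending)


history = [
    Commit("ana@example.org"),
    Commit("Ben@example.org"),
    Commit("ana@example.org"),
    Commit("staff@example.com", employee=True),
]
print(contributors_needing_consent(history, {"ana@example.org"}))
```

Even in this toy form, the list of pending names grows with every outside contributor, which is why license changes are so much harder for community projects than for companies.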

So where does this connect with AI algorithms? Simply this: content generated by an algorithm does not have copyright. Since the algorithm is not considered a human creator, nothing generated by it can be copyrighted, by anyone. So even if you designed the prompt used to generate source code or other material, that does not give you the copyright.

The implication is only one step away. Since AI-generated content does not have copyright, it also cannot be licensed. It is then unclear what terms apply to content that has no license at all. Some observers have noted that having no license is the same as being in the public domain, meaning anyone can do anything with it. But others believe that public domain is also a type of license, and from that perspective you can’t do anything with the content, since you have no license that gives you rights to use it. No matter which direction you go, it becomes problematic.

For this reason, I would be very hesitant to accept code contributions made using AI. In an open source project, it would mean that parts of the code base must have a different license (or, more specifically, no license). Licensing the contribution under the license of the project is not correct and could be considered legally dubious. Of course, contributing AI-generated code under false pretenses, claiming it is code you wrote yourself, can lead to other types of problems.

In the context of a private company, the legal department should be very careful with these situations, since mixing AI-generated code in with proprietary code could potentially invalidate the proprietary license.

No matter how you view this, it is a serious problem, and project managers need to be careful to avoid legal trouble in these situations. Until the situation is clarified, my perspective is that it is too risky to use AI-generated source code.