One programmer is suing GitHub, Microsoft, and OpenAI over artificial intelligence (AI) technology designed to generate computer code.
In late June 2022, Microsoft unveiled a new type of artificial intelligence technology that could automatically generate computer code.
Known as Copilot, this tool was developed to hasten the work of professional programmers. As they typed away on their computers, the AI would suggest ready-made blocks of computer code they could immediately add to their own.
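By way of illustration, a Copilot-style suggestion might look something like the sketch below. The function, its docstring, and the suggested body are all invented for this example; they are not taken from the product itself.

```python
# A programmer types a signature and a comment; the assistant
# proposes a ready-made body that can be accepted with a keystroke.
def is_palindrome(text: str) -> bool:
    """Return True if `text` reads the same forwards and backwards."""
    # --- suggested completion (illustrative, not from Copilot) ---
    # Ignore punctuation, spaces, and case before comparing.
    cleaned = "".join(ch.lower() for ch in text if ch.isalnum())
    return cleaned == cleaned[::-1]
```

The suggestion arrives as ordinary editable text, which is why critics note that such completions can closely resemble code the system saw during training.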
Many programmers loved the new tool, or were at least intrigued by it. However, Matthew Butterick, a programmer, writer, designer, and lawyer from Los Angeles, was not thrilled. Earlier this month, he and a team of lawyers filed a lawsuit seeking class-action status against Microsoft and the other high-profile firms that designed and distributed Copilot.
Just like most of the other cutting-edge artificial intelligence technologies, Copilot developed its automated skills by analyzing massive amounts of data. Notably, it relied heavily on billions of lines of computer code published on the internet.
Mr. Butterick, 52, likened this method of skill development to piracy, since the system never acknowledges its debt to existing work. His lawsuit alleges that Microsoft and its partners violated the legal rights of millions of programmers who spent years writing the original code.
The suit is thought to be the first legal attack on a design technique known as “A.I. training,” a means of building artificial intelligence that is expected to remake the tech sector. In recent years, many writers, artists, pundits, and privacy activists have complained that firms are training their AI systems on data that does not belong to them.
The lawsuit has echoes in the technology sector's past few decades. In the 1990s and into the 2000s, Microsoft fought the rise of open-source software, calling it an existential threat to the future of its business. As the importance of open source grew, Microsoft embraced it and even acquired GitHub, a home for open-source programmers and a place where they create and store their code.
Nearly every new generation of technology – even online search engines – has faced similar legal challenges. Bradley J. Hulbert, an intellectual property lawyer who specializes in this increasingly important area of the law, stated:
“Often, there is no statute or case law that covers it.”
The suit is now part of a groundswell of concern over artificial intelligence. Writers, artists, composers, and other creative types worry that firms and researchers are using their work to develop new technology without their consent and without offering any compensation. Firms train a wide variety of systems this way, including speech recognition systems like Alexa and Siri, art generators, and driverless cars.
Copilot is powered by technology designed by OpenAI, an artificial intelligence lab in San Francisco that is backed by a billion dollars in funding from Microsoft. OpenAI is leading the increasingly widespread effort to train artificial intelligence technologies using digital data.
After GitHub and Microsoft released Copilot, Nat Friedman, GitHub’s chief executive, tweeted that using existing code to train the new system was a “fair use” of the material under copyright law, an argument often made by the firms and researchers who design these systems. But no court case has yet tested this argument.
Mr. Butterick said in an interview:
“The ambitions of Microsoft and OpenAI go way beyond GitHub and Copilot. They want to train on any data anywhere, for free, without consent, forever.”
In 2020, OpenAI released a system known as GPT-3. Researchers trained it on huge amounts of digital text, including Wikipedia articles, thousands of books, chat logs, and lots of other data posted to the internet.
By analyzing patterns in all that text, the system learned to predict the next word in a sequence. When somebody typed a few words into this “large language model,” it could complete the thought with entire paragraphs of text. In that way, the system could write speeches, Twitter posts, news articles, and poems.
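The core idea of next-word prediction can be sketched, in drastically simplified form, with a toy bigram model: count which word most often follows each word, then predict the most frequent follower. Real large language models like GPT-3 use neural networks with billions of parameters; everything below, including the tiny corpus, is an illustrative assumption.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus: str) -> dict:
    """Count, for each word, which words follow it and how often."""
    words = corpus.split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following

def predict_next(model: dict, word: str):
    """Return the most frequent follower of `word`, or None if unseen."""
    if word not in model:
        return None
    return model[word].most_common(1)[0][0]

# Toy corpus; "cat" follows "the" twice, "mat" once.
model = train_bigrams("the cat sat on the mat and the cat slept")
print(predict_next(model, "the"))  # prints "cat"
```

Scaled up from word counts over one sentence to learned statistics over much of the public internet, the same predict-what-comes-next principle lets such systems emit fluent paragraphs, and, as described below, working code.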
Much to the surprise of the team that designed the system, it could even write computer programs, having learned from an untold number of programs posted to the internet.
OpenAI went a step further, training a new system, called Codex, on a collection of data stocked mainly with code. The lab later acknowledged in a research paper detailing the technology that some of the code came from GitHub, a popular programming service owned and operated by Microsoft.
Codex became the underlying technology for Copilot, which Microsoft distributed to programmers through GitHub. After being tested with a smaller number of programmers for about a year, Copilot rolled out to all coders on GitHub in July.
Today, the code Copilot produces is mostly simple; it may be useful for larger projects, but it must be massaged, augmented, and vetted, according to many programmers who have used the technology. Some find it useful only when they are learning to code or trying to master a new language.
But Mr. Butterick worries that Copilot will end up destroying the global community of programmers who have built the code at the heart of most modern technologies. Days after the system was released, he wrote a blog post titled: “This Copilot Is Stupid and Wants to Kill Me.”
Mr. Butterick identifies as an open-source programmer, part of the community of programmers who openly share their code with the world. Over the last three decades, open-source software has helped drive the development of most of the technologies that consumers use daily, including web browsers and mobile apps.
Although open-source software is designed to be shared freely among coders and companies, this sharing is governed by licenses designed to ensure that it is used in ways that benefit the wider community of programmers. Mr. Butterick believes that Copilot has violated these licenses and, as it continues to improve, will make open-source coders obsolete.
After complaining publicly about the issue for months, he filed the suit with a handful of other lawyers. The lawsuit is still in its early stages and has not yet been granted class-action status by the court.
To the surprise of many legal experts, Mr. Butterick’s suit does not accuse GitHub, Microsoft, and OpenAI of copyright infringement. It takes a different approach, arguing that the firms violated GitHub’s terms of service and privacy policies while also running afoul of a federal law that requires companies to display copyright information when they make use of material.
Mr. Butterick and another lawyer behind the suit, Joe Saveri, said the litigation could eventually tackle the copyright issue as well.
A GitHub spokesman declined to comment on the specifics of the suit, but said in an email that the company has been:
“Committed to innovating responsibly with Copilot from the start, and will continue to evolve the product to best serve developers across the globe.”
OpenAI and Microsoft declined to comment on this lawsuit.
Under current law, most experts believe that training an artificial intelligence system on copyrighted material is legal. But doing so could become illegal if the system ends up producing material that is substantially similar to the data it was trained on.
Some Copilot users have said that it generates code that seems identical – or nearly identical – to existing programs, an observation that could become an integral component of Mr. Butterick’s case and others.
Pam Samuelson, a professor at the University of California, Berkeley, who specializes in intellectual property and its role in modern technology, said that legal thinkers and regulators briefly explored these issues in the 1980s, before the technology existed. Now, she said, a legal assessment is needed.
Dr. Samuelson said:
“It is not a toy problem anymore.”
Artificial intelligence, and the data that trains it, should be regulated.