The Legal Labyrinth: Navigating the Implications of Artificial Intelligence

TULJ
Feb 28, 2024

Milan Narayan

Edited by Sandi Perez and Colin Crawford

“Artificial Intelligence” has been shrouded in mystery since the term became mainstream. That enigma has naturally placed AI at the vanguard of evolving technology as a whole, and its inscrutability has split public opinion toward extremes. Some laud it as the symbol of an efficient, utilitarian society, while others believe widespread adoption of AI portends a dystopian future devoid of humanity or ethics. As with any emerging technology, it is crucial to understand its drawbacks and limitations, especially in legal contexts where there is much ambiguity about which applications of AI are legitimate and permissible. Currently, AI faces a myriad of legal challenges, including alleged violations of copyright and other intellectual property laws as well as privacy laws. Understanding how these problems unfold is critical to the advancement and protection of personal liberties in an increasingly digitized world.

Understanding AI — A Primer

Although the term “Artificial Intelligence” has existed since 1956, the concept of artificially intelligent beings became familiar through science fiction in the early 1900s, for example through the Tin Man in The Wizard of Oz [1]. In 1950, Alan Turing published “Computing Machinery and Intelligence,” in which he discussed a form of artificial intelligence that largely resembles what we are familiar with today. To gauge a machine’s intelligence, Turing proposed a test, which he called the “imitation game”: could the machine converse with a human through typed messages without the human recognizing that it was a machine [2]? This test may be a rudimentary way of assessing a machine’s ability to “think,” but it has proven to be a notable benchmark given the rise of chatbots that humans can converse with in real time.

Defining artificial intelligence precisely matters because many advanced technological features are wrongly referred to as AI. The US government classifies a system as artificial intelligence if it meets any one of five criteria, most of which measure the system against human capabilities such as perception, cognition, learning, and action:

“Any artificial system that performs tasks under varying and unpredictable circumstances without significant human oversight, or that can learn from experience and improve performance when exposed to data sets.
An artificial system developed in computer software, physical hardware, or other context that solves tasks requiring human-like perception, cognition, planning, learning, communication, or physical action.
An artificial system designed to think or act like a human, including cognitive architectures and neural networks.
A set of techniques, including machine learning, that is designed to approximate a cognitive task.
An artificial system designed to act rationally, including an intelligent software agent or embodied robot that achieves goals using perception, planning, reasoning, learning, communicating, decision making, and acting” [3].

While the U.S. government’s criteria encapsulate the broad capabilities of AI, its practical applications take many forms. AI is permeating daily life, from students using applications like ChatGPT to help with assignments to corporations using the image generator DALL-E to assist with marketing campaigns. This type of AI is called generative AI, and the public release of ChatGPT, initially built on the GPT-3.5 model, thrust generative AI into the limelight. Generative AI differs from traditional forms of AI, which are often described as “discriminative” AI.

Generative and discriminative AI models differ in how they are trained. Discriminative AI models are trained on labeled data [4]. Labeled data, used in machine learning (a subset of AI), is data in which each piece, such as an image, text, or sound clip, is tagged with a label that identifies or categorizes it. The label tells the model what the data represents or what category it belongs to.

Imagine you are an art student tasked with understanding the differences between prominent painters. You are given images of different paintings, each labeled with the corresponding artist. This allows you to better recognize trends, patterns, and differences in the paintings of certain artists compared to others, because you can easily associate the features of each painting with its artist label. Learning from labeled data in this way is called “supervised learning”; discriminative models are thus best suited to identifying or classifying things.
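To make supervised learning concrete, here is a minimal sketch of a discriminative classifier in the spirit of the art-student example. It assumes each painting has already been reduced to two invented numeric features (hypothetically, “brushstroke visibility” and “color warmth”). The paintings, feature values, and the helper names `train_centroids` and `classify` are all illustrative, and a simple nearest-centroid rule stands in for the far larger models used in practice.

```python
from math import dist

# Hypothetical labeled training data: (feature vector, artist label).
# Invented features: (brushstroke visibility, color warmth), each on a 0-1 scale.
LABELED_PAINTINGS = [
    ((0.90, 0.60), "Monet"),
    ((0.80, 0.70), "Monet"),
    ((0.85, 0.65), "Monet"),
    ((0.20, 0.30), "Vermeer"),
    ((0.10, 0.40), "Vermeer"),
    ((0.15, 0.35), "Vermeer"),
]


def train_centroids(examples):
    """Supervised step: average the feature vectors that share each label."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in acc) for label, acc in sums.items()}


def classify(centroids, features):
    """Discriminative step: pick the label whose average painting is closest."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


centroids = train_centroids(LABELED_PAINTINGS)
print(classify(centroids, (0.75, 0.55)))  # → Monet
print(classify(centroids, (0.25, 0.30)))  # → Vermeer
```

The labels do all the work here: without them, the model would have no artist categories to associate features with.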

Generative AI models, on the other hand, are trained on both labeled and unlabeled data. Like discriminative models, they use supervised learning to recognize features and trends in labeled data [5]. The use of unlabeled data facilitates creative freedom, since the model is not confined to the categories that human-supplied labels define. This freedom is what makes generative models better equipped to produce new content. Training on unlabeled data, known as unsupervised (or, in many modern systems, self-supervised) learning, is the engine behind many of the most common AI tools today, such as ChatGPT, DALL-E, Google Bard, and AlphaCode.
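A drastically simplified illustration of learning from unlabeled data is a bigram text model: it reads raw sentences with no tags, records which word tends to follow which, and then samples new word sequences from those statistics. Tools like ChatGPT use large neural networks rather than word-pair counts, so this is only a sketch of the general idea; the corpus and the helper names `train_bigrams` and `generate` are hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical unlabeled training text: raw sentences with no category tags.
CORPUS = [
    "the court heard the case",
    "the court ruled on the motion",
    "the judge heard the motion",
]


def train_bigrams(sentences):
    """Record, for every word, the words observed right after it (no labels needed)."""
    followers = defaultdict(list)
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        for current, nxt in zip(words, words[1:]):
            followers[current].append(nxt)
    return followers


def generate(followers, rng, max_words=12):
    """Sample a new sentence one word at a time from the learned word-pair statistics."""
    word, output = "<s>", []
    while len(output) < max_words:
        word = rng.choice(followers[word])
        if word == "</s>":
            break
        output.append(word)
    return " ".join(output)


model = train_bigrams(CORPUS)
rng = random.Random(0)  # fixed seed so the samples are reproducible
for _ in range(3):
    print(generate(model, rng))
```

Because samples recombine word pairs from different sentences, the model can emit sentences such as “the judge heard the case” that never appear in its training text, which is the sense in which it generates rather than classifies.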

Legal Implications of Traditional AI

Traditional AI is ubiquitous. Its uses range from diagnosing conditions from X-ray images to piloting self-driving cars. One of the most common applications of traditional AI is in hiring, where, historically, humans made decisions about applicants. Human decision-makers were subject to implicit biases and prejudices, but, as Nicol Turner Lee, a senior fellow at the Brookings Institution, notes, their decisions were clearly “governed by federal, state, and local laws that regulated the decision-making processes in terms of fairness, transparency, and equity,” such as federal equal employment opportunity law [6]. Artificial intelligence treads in murkier water, however, because the US does not have laws that clearly establish fair practices for AI-influenced hiring. Any AI model relies on its algorithms, which in turn depend on the data the model was trained on. These algorithms are susceptible to significant bias, reflecting either the biases of their creators or those embedded in their training data. As Turner Lee explains, “bias in algorithms can emanate from unrepresentative or incomplete training data or the reliance on flawed information that reflects historical inequalities,” which highlights the importance of up-to-date algorithms that reflect modern trends and demographics.
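One common way auditors quantify hiring bias of this kind is to compare selection rates across groups. US federal enforcement agencies use a “four-fifths” rule of thumb from the Uniform Guidelines on Employee Selection Procedures: a group whose selection rate is below 80% of the highest group’s rate is generally regarded as showing evidence of adverse impact. The sketch below applies that ratio to invented screening numbers for a hypothetical AI resume filter. It is a screening heuristic, not a legal determination, and the group names and counts are assumptions.

```python
# Hypothetical outcomes from an AI resume filter: group -> (advanced, applied).
OUTCOMES = {
    "group_a": (48, 100),
    "group_b": (30, 100),
}

FOUR_FIFTHS = 0.8  # threshold from the four-fifths rule of thumb


def impact_ratios(outcomes):
    """Each group's selection rate divided by the highest group's selection rate."""
    rates = {group: advanced / applied for group, (advanced, applied) in outcomes.items()}
    best = max(rates.values())
    return {group: rate / best for group, rate in rates.items()}


for group, ratio in impact_ratios(OUTCOMES).items():
    flag = "possible adverse impact" if ratio < FOUR_FIFTHS else "within rule of thumb"
    print(f"{group}: impact ratio {ratio:.3f} ({flag})")
```

Here group_b advances at 30% versus 48% for group_a, a ratio of 0.625, so the filter would be flagged for closer review.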

Oversights in the design of algorithms can also magnify biases unintentionally. In 2018, Amazon reportedly scrapped an internal AI recruiting tool due in part to apparent gender bias [7]. The model was trained on resumes submitted to Amazon over the previous ten years, most of which came from men, reflecting male dominance across the tech industry. The model learned to favor patterns found in that male-dominated pool, and in doing so it penalized words and phrases associated with women, such as resumes containing the word “women’s” (as in “women’s chess club captain”) and graduates of two all-women’s colleges. Although Amazon reportedly never used the tool to evaluate candidates, many other organizations likely use similar tools in their hiring. Amazon’s long history with AI before developing this tool raises further concerns, because it suggests that even the most experienced and well-versed corporations are not immune to clear bias.
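The mechanism can be illustrated with a deliberately tiny, hypothetical model. Assume a scorer that weights each resume word by how much more often it appears in hired resumes than in rejected ones (a smoothed log-ratio, used here as a simple stand-in; Amazon’s actual method was never published). In the invented history below, few hired resumes mention “women’s,” so the word ends up with a negative weight even though it says nothing about job performance. All data and helper names are illustrative.

```python
from collections import Counter
from math import log

# Invented historical data: (resume keywords, was the applicant hired?).
HISTORY = [
    ({"python", "captain", "chess"}, True),
    ({"python", "aws"}, True),
    ({"java", "aws", "captain"}, True),
    ({"python", "women's", "chess", "captain"}, False),
    ({"java", "women's"}, False),
    ({"aws", "java"}, False),
]


def learn_weights(history):
    """Weight each word by the smoothed log-ratio of its frequency in hired vs. rejected resumes."""
    hired, rejected = Counter(), Counter()
    n_hired = sum(1 for _, was_hired in history if was_hired)
    n_rejected = len(history) - n_hired
    for words, was_hired in history:
        (hired if was_hired else rejected).update(words)
    vocab = set(hired) | set(rejected)
    return {
        w: log((hired[w] + 1) / (n_hired + 2)) - log((rejected[w] + 1) / (n_rejected + 2))
        for w in vocab
    }


def score(weights, words):
    """Sum the learned weights of a resume's words (unseen words count as zero)."""
    return sum(weights.get(w, 0.0) for w in words)


weights = learn_weights(HISTORY)
print(round(weights["women's"], 2))  # negative: the word is penalized
# Identical resumes, except one mentions "women's": the second scores lower.
print(score(weights, {"python", "captain"}) > score(weights, {"python", "captain", "women's"}))
```

Nothing in the code mentions gender; the penalty emerges entirely from the skewed historical labels, which is why such bias can go unnoticed.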

Law enforcement has increasingly relied on discriminative AI models at various stages of the criminal justice process [8]. Biases in these algorithms carry severe consequences for those affected; for some, they can mean the difference between probation and incarceration. Violence risk assessments, one type of discriminative AI model, play significant roles in some jurisdictions because of their influence on sentencing. Washington, D.C.’s Public Defender Service (PDS) found that these assessments are even more influential in juvenile proceedings because young people are more likely to exhibit volatile behavior before sentencing. Youth labeled “high-risk” by these assessments may be “sent to a psychiatric hospital or a secured detention facility, separating them from their family and drastically changing the course of their life” [9]. Even though these assessments are not necessarily compulsory, many defendants agree to take them as part of a plea agreement, which can increase their chance of probation as opposed to incarceration. If the assessment produces a “high” risk rating, however, the juvenile may no longer be afforded probation as a sentencing option.

Naturally, this calls into question the criteria the risk assessments use. D.C.’s Public Defender Service discovered that one of the risk factors was “parental criminality,” which can unfairly penalize young people of color for the effects of unjust and racist laws applied to their parents’ generation, such as those of the War on Drugs. Another was “community disorganization,” which accounts for gang activity and violence in the neighborhood where the youth lives. Even though nearby gang activity does not necessarily indicate that a particular youth poses a high risk, the assessment treats it as if it does. Further inquiry by PDS found “only two studies of its efficacy, neither of which made the case for the system’s validity; one was 20 years old and the other was an unreviewed, unpublished master’s thesis,” meaning many children’s lives were upended on the basis of a tool with almost no rigorous validation. As a result of PDS’s findings, the court ruled that the risk assessment’s results did not meet the reliability standard for expert evidence set out in Daubert v. Merrell Dow Pharmaceuticals (1993).
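To see how factors outside a young person’s control can move a risk label, consider a deliberately simplified additive score. The two factor names echo those PDS flagged, but the point values, the other factors, and the “high-risk” cutoff are invented, and real instruments are scored and interpreted differently. This is a sketch of the concern, not a model of any actual assessment.

```python
# Hypothetical additive risk instrument: factor -> points if present.
# Point values and the cutoff are invented for illustration only.
FACTOR_POINTS = {
    "prior_violent_offense": 3,
    "school_suspension": 1,
    "parental_criminality": 2,       # reflects a parent's history, not the youth's conduct
    "community_disorganization": 2,  # reflects the neighborhood, not the youth's conduct
}
HIGH_RISK_CUTOFF = 5


def risk_label(present_factors):
    """Sum the points for the factors present and compare the total to the cutoff."""
    total = sum(FACTOR_POINTS[f] for f in present_factors)
    return total, ("high" if total >= HIGH_RISK_CUTOFF else "moderate/low")


# Two youths with identical personal histories, differing only in family and neighborhood.
youth_a = {"prior_violent_offense", "school_suspension"}
youth_b = youth_a | {"parental_criminality", "community_disorganization"}
print(risk_label(youth_a))  # → (4, 'moderate/low')
print(risk_label(youth_b))  # → (8, 'high')
```

The two youths behaved identically, yet only one crosses the cutoff, and under the process described above that label could determine whether probation remains available.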

Even when AI-influenced pre-sentencing investigation reports and risk assessments are challenged, it tends to be difficult to effect any change. A prime example of the opacity of these assessments comes from State v. Loomis (2016). In 2013, Eric Loomis was charged with five criminal counts related to a drive-by shooting in Wisconsin [10]. Loomis admitted to driving the car involved but denied participating in the shooting, and he pleaded guilty to two of the less severe counts. Before sentencing, the Wisconsin Department of Corrections prepared a pre-sentencing investigation (PSI) report, which included a risk assessment. This assessment drew on the defendant’s interview, his criminal file, and other publicly available information about him to estimate his risk of recidivism. The algorithm behind it was developed by a third-party vendor under contract with the state of Wisconsin. As a result, its methodology “[was] a trade secret [and] only the estimates of recidivism risk [were] reported to the court,” meaning the risk factors and their weights were locked in an algorithmic black box, hidden from both the defendant and the court [10]. The assessment rated Loomis as high risk for recidivism, and the court referenced that score when deciding to give him a maximum sentence. Because Loomis and his counsel could not examine the methodology behind the score, they argued that it unfairly influenced his sentence.

Loomis filed a motion for post-conviction relief, arguing that the use of the assessment violated his right to due process. He argued that the assessment’s reference class was too broad, since it drew on “data relevant only to particular groups” and thus did not allow for individualized sentencing. He further argued that the assessment took his gender into account, so that the court’s sentence discriminated against him on the basis of gender. The case reached the Wisconsin Supreme Court, which rejected Loomis’s challenge and upheld his sentence. The court wrote that Loomis had not presented sufficient evidence that gender played a role in his sentence, and that the assessment was not determinative, since courts retain ample discretion over how much weight to give it. Loomis’s unsuccessful challenge highlights the secrecy surrounding many of the algorithms that drive decision-making in criminal law enforcement. Outsourcing these algorithms to third parties insulates them from scrutiny as “trade secrets,” which prevents a defendant from fully understanding how a court arrived at their sentence. Furthermore, if these algorithms prove successful in the aggregate at reducing recidivism, balancing individual liberties with community safety becomes immensely challenging.
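Loomis’s reference-class objection can be illustrated with a hypothetical estimate driven purely by group statistics. Below, each defendant’s estimate is simply the invented historical reoffense rate of the group they are sorted into, so two people with very different individual records receive the same number, and the number changes with gender alone. The groups, rates, and the helper `group_based_estimate` are assumptions; the actual methodology in Loomis remained a trade secret, so this illustrates the argument, not that tool.

```python
# Invented historical reoffense rates for coarse groups (the "reference class").
GROUP_BASE_RATES = {
    ("male", "under_30"): 0.42,
    ("male", "30_plus"): 0.28,
    ("female", "under_30"): 0.30,
    ("female", "30_plus"): 0.18,
}


def group_based_estimate(defendant):
    """Return the base rate of the defendant's group, ignoring every individual detail."""
    return GROUP_BASE_RATES[(defendant["sex"], defendant["age_band"])]


d1 = {"sex": "male", "age_band": "under_30", "prior_convictions": 0, "employed": True}
d2 = {"sex": "male", "age_band": "under_30", "prior_convictions": 4, "employed": False}
d3 = dict(d1, sex="female")  # same record as d1, different gender
print(group_based_estimate(d1), group_based_estimate(d2))  # → 0.42 0.42
print(group_based_estimate(d3))  # → 0.3
```

Real instruments blend group data with individual factors, but the more an estimate leans on group membership, the less individualized the resulting sentence can be.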

Generative AI and Intellectual Property and Copyright

The arrival of generative AI has placed heavy strain on the creators of published works. Generative AI models can easily write poems in Snoop Dogg’s voice or produce artwork in Monet’s distinctive style. These models require extensive training to develop the ability to mimic the techniques of different creators, which raises concerns about whether their training data was fairly and ethically sourced. Discriminative AI models can also be trained on copyrighted data, but generative models rely heavily on unlabeled data, which is far more readily available online and seldom used by discriminative models; as a result, copyright challenges primarily affect generative AI [11].

The notable case Andersen v. Stability AI (2023) addresses the use of copyrighted training data. In Andersen, the plaintiffs filed a putative class action alleging that a Stability AI software program was trained on some of their works, which allowed it to produce outputs in a style similar to theirs [12]. The presiding judge did not decide whether training on copyrighted works is lawful; instead, he dismissed most of the plaintiffs’ claims, in part on a technicality. The U.S. Copyright Office states that a work “is under copyright protection the moment it is created and fixed in a tangible form that it is perceptible either directly or with the aid of a machine or device” [13]. Despite that protection, the creator of a U.S. work generally must register it with the Copyright Office before bringing a copyright infringement lawsuit. Because many of the works in this case were not registered, the judge ruled that the plaintiffs who owned them could not pursue infringement claims under the Copyright Act, though one plaintiff’s claim over her registered works was allowed to proceed.
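The two-step rule described above (protection attaches once a work is fixed in tangible form, but suing over a U.S. work generally requires registration first) can be expressed as a small decision function. This is a simplified sketch, not legal advice: it ignores exceptions such as suing after a registration application has been refused, and the `Work` type and `copyright_status` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Work:
    title: str
    fixed_in_tangible_form: bool  # e.g., saved as a file, printed, or recorded
    registered: bool              # registered with the U.S. Copyright Office
    us_work: bool = True          # the registration prerequisite applies to U.S. works


def copyright_status(work: Work) -> tuple[bool, bool]:
    """Return (protected, may_sue_for_infringement) under the simplified rule."""
    protected = work.fixed_in_tangible_form
    may_sue = protected and (work.registered or not work.us_work)
    return protected, may_sue


unregistered = Work("Sketchbook page", fixed_in_tangible_form=True, registered=False)
registered = Work("Published comic", fixed_in_tangible_form=True, registered=True)
print(copyright_status(unregistered))  # → (True, False)
print(copyright_status(registered))    # → (True, True)
```

The first result captures the position of the unregistered plaintiffs in Andersen: their works were protected, but they could not yet sue for infringement.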

In another case from late 2023, Authors Guild v. OpenAI (2023), several authors alleged that OpenAI trained its large language models on their published works without permission and profits from them through its subscription plan [14]. Although OpenAI, the developer of ChatGPT, does not publicly disclose the materials it trains its models on, the authors found that ChatGPT could reproduce minor details from their works that were not publicly available online. For instance, Douglas Preston, one of the plaintiffs, found that ChatGPT described an inconsequential minor character from one of his novels at remarkable length, which led him to conclude that ChatGPT must have “read” his entire novel [15]. Authors Guild members have reportedly already been harmed by generative AI platforms: in its official complaint, the Guild noted that one member who writes marketing content lost 75% of their work after clients switched to generative AI. The Guild further argues that ChatGPT’s summaries of copyrighted texts displace a market in which authors could license their works, for a fee, to platforms or individuals producing such summaries. Because OpenAI uses the works without licensing them, the authors cannot monetize this use, adding financial pressure on writers who already earn a median income of just $20,000 per year [16].
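Informal probes like Preston’s are one way to detect memorization. A more systematic measure of whether a model’s output reproduces a source text is to count how many of the output’s word n-grams (runs of n consecutive words) also appear in the source. The strings below are invented and the `ngram_overlap` helper is hypothetical. High overlap suggests verbatim copying, whereas a detailed paraphrase of a minor character, as in Preston’s case, would need other evidence.

```python
def ngrams(text, n):
    """Return the set of n-word sequences in a lowercased, whitespace-split text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def ngram_overlap(output, source, n=4):
    """Fraction of the output's n-grams that also occur somewhere in the source."""
    out = ngrams(output, n)
    if not out:
        return 0.0
    return len(out & ngrams(source, n)) / len(out)


# Invented texts for illustration.
SOURCE = "the old keeper lit the lamp at dusk and watched the harbor until dawn"
COPIED = "lit the lamp at dusk and watched the harbor"
FRESH = "a lighthouse keeper spends long nights looking after ships"
print(ngram_overlap(COPIED, SOURCE))  # → 1.0
print(ngram_overlap(FRESH, SOURCE))   # → 0.0
```

Using four-word runs rather than single words keeps common vocabulary like “the” or “keeper” from registering as copying.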

OpenAI has defended its training methods by citing Authors Guild v. Google Inc. (2015), a case addressing fair use of copyrighted works [16]. Google developed the Library Project and the Google Books Project, both of which created digital copies of copyrighted books. Both projects were intended to let users search for snippets of text within a book, or to find relevant books given a snippet of text. In Authors Guild (2015), the plaintiffs argued that Google infringed their copyrights by digitizing and reproducing their works without their consent. Copying copyrighted works without permission can nonetheless be lawful as “fair use,” which courts evaluate using the four factors set out in 17 U.S.C. § 107:

“The purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
The nature of the copyrighted work;
The amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
The effect of the use upon the potential market for or value of the copyrighted work” [17].

The U.S. Court of Appeals for the Second Circuit found that Google’s copying qualified as fair use under these factors, primarily because Google was not creating a “significant market substitute for the protected aspects of the originals” [17]. OpenAI has invoked this precedent in Authors Guild v. OpenAI, but there is a key difference: Google did not seek to produce new content based on the copyrighted works, whereas OpenAI uses copyrighted materials to produce similar content. In the case of the marketing writer mentioned above, the Guild alleges that ChatGPT became a substantial market substitute, costing the writer 75% of their work [17]. This distinction also mirrors the difference between generative and discriminative AI models: discriminative models identify and classify, whereas generative models create. Authors Guild v. OpenAI is ongoing, and its outcome could shape how courts approach AI-generated content and fair use.

Liability and AI

As AI becomes more integrated into daily activities, legal experts have raised concerns about how questions of liability and ownership apply to artificial intelligence. Stephen Thaler sought to register a piece of art created by his AI system with the Copyright Office. In 2022, the Office denied the registration, and Thaler sued. A federal court sided with the Copyright Office, writing that “human authorship is an essential part of a valid copyright claim” [18]. Proponents of copyright protection for AI-assisted works have argued that the 1884 case Burrow-Giles Lithographic Co. v. Sarony established that a photographer can hold copyright in a photograph by making creative decisions about the image, such as lighting or composition [19]. On this view, an AI tool such as an image generator is akin to a camera, since the user provides specific directions to generate an intended output. The Copyright Office has pushed back on this analogy, contending that claiming copyright in an AI tool’s output is more like hiring an artist, giving them general directions for a painting, and then claiming the rights to the artist’s work.

If someone uses artificial intelligence lawfully, but the AI produces an output with harmful legal consequences, who bears responsibility? Consider the following hypothetical case.

A pharmaceutical company uses generative AI to help design unique compounds to treat novel diseases. The AI model draws on its training data, a set of publicly available compounds, molecular compositions, and chemical structures, to create new, untested compounds that it predicts will be effective against certain novel diseases. One of the compounds the model generates shows promise in laboratory testing and clinical trials. After intensive scrutiny, the FDA approves the drug for release to the market. The company publicly lauds the AI tool and credits it with developing the drug.

After the drug’s release, doctors and pharmacists determine that a small portion of patients taking the drug carry a rare mutation that causes extremely severe adverse reactions. These reactions were not discovered during lab testing or clinical trials, nor flagged by the AI during its generation process. The injured patients and their families sue the drug company for failing to warn of potential side effects, along with other product liability claims. Should the company be held responsible for the drug’s harmful side effects even though none of its employees designed the compound? Was it the responsibility of the lab testers and regulators to find this side effect? Are the creators of the AI liable, even though the AI’s operations are hidden in a figurative black box? Is the AI itself to blame, and if so, how could it be held accountable in a court of law? As of late 2023, US courts have yet to establish a concrete method for adjudicating this type of case. US lawmakers should propose and debate legislation that tackles this issue, because with generative AI here to stay, situations like this one are all but inevitable.
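Part of why such a reaction could escape detection is simple arithmetic. If a mutation occurs in a fraction p of patients and a trial enrolls n independent participants, the chance that the trial includes no carrier at all is (1 - p)^n. The figures below (1 carrier per 10,000 people, 3,000 participants) are hypothetical, and real trials do not enroll purely random samples, so this sketch only conveys the order of magnitude.

```python
from math import ceil, log


def prob_no_carrier(p, n):
    """Chance that none of n independent participants carries a trait with prevalence p."""
    return (1 - p) ** n


def participants_needed(p, confidence=0.95):
    """Smallest n giving at least `confidence` probability of enrolling one or more carriers."""
    return ceil(log(1 - confidence) / log(1 - p))


# Hypothetical figures: the mutation occurs in 1 of 10,000 people; the trial enrolls 3,000.
p, n = 1 / 10_000, 3_000
print(f"{prob_no_carrier(p, n):.2f}")  # → 0.74
print(participants_needed(p))          # roughly 30,000 participants
```

Under these assumptions, a trial would most likely include no carriers at all, and even a 95% chance of enrolling a single one would require about ten times as many participants.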

Concluding Thoughts

The rapid adoption of generative AI by the general public over the last couple of years portends remarkable progress for the industry as a whole. But without prompt guidance from lawmakers, the current lack of clarity from federal courts regarding copyright law and generative AI will lead to further equivocation about ethical applications of generative AI. In turn, conversations about generative AI may prompt closer examination of the ethics of some discriminative AI practices. Both discriminative and generative AI have shown staggering promise in producing a more creative, efficient, and innovative world, but misuse raises serious concerns about how embedded AI should be in our everyday routines. For instance, as the Authors Guild’s complaint suggests, the unbridled proliferation of AI could cause a significant decline in employment in creative industries. It remains to be seen how AI will be regulated, but legislators’ primary focus should be protecting the civil liberties and individual rights of both users of AI and those affected by its outputs and recommendations.

[1] Rockwell Anyoha, The History of Artificial Intelligence, Science in the News (Aug. 28, 2017), sitn.hms.harvard.edu/flash/2017/history-artificial-intelligence.

[2] Graham Oppy & David Dowe, The Turing Test, The Stanford Encyclopedia of Philosophy (Oct. 4, 2021), https://plato.stanford.edu/archives/win2021/entries/turing-test/.

[3]

[4] Anand Mahurkar, Why Discriminative AI Will Continue to Dominate Enterprise AI Adoption in a World Flooded With Discussions on Generative AI, Fast Company (2023), www.fastcompany.com/90927119/why-discriminative-ai-will-continue-to-dominate-enterprise-ai-adoption-in-a-world-flooded-with-discussions-on-generative-ai.

[5] Hiren Dhaduk, How Does Generative AI Work: A Deep Dive Into Generative AI Models, Simform (Jul. 18, 2023), www.simform.com/blog/how-does-generative-ai-work.

[6] Nicol Turner Lee, Algorithmic Bias Detection and Mitigation: Best Practices and Policies to Reduce Consumer Harms, Brookings (Jun. 27, 2023), www.brookings.edu/articles/algorithmic-bias-detection-and-mitigation-best-practices-and-policies-to-reduce-consumer-harms.

[7] James Vincent, Amazon Reportedly Scraps Internal AI Recruiting Tool That Was Biased Against Women, The Verge (Oct. 10, 2018), www.theverge.com/2018/10/10/17958784/ai-recruiting-tool-bias-amazon-report.

[8] Khari Johnson, Know It All: AI and Police Surveillance, NPR (Feb. 23, 2023), www.npr.org/2023/02/23/1159084476/know-it-all-ai-and-police-surveillance#:~:text=becoming%20a%20reality.-,Today%2C%20artificial%20intelligence%20is%20being%20used%20by%20law%20enforcement%20for,continued%20racial%20profiling%20in%20policing.

[9] Blase Kearney, Litigating Algorithms: Challenging Government Use of Algorithmic Decision Systems, AI Now Institute (Sept. 2018), ainowinstitute.org/wp-content/uploads/2023/04/litigatingalgorithms.pdf.

[10] State v. Loomis, 881 N.W.2d 749 (Wis. 2016).

[11] State v. Loomis, 130 Harv. L. Rev. 1530 (2017).

[12] Colin Wei, Understanding Deep Learning Algorithms That Leverage Unlabeled Data, Part 1: Self-training, SAIL Blog (Feb. 24, 2022), ai.stanford.edu/blog/understanding-self-training.

[13] Andersen v. Stability AI Ltd., No. 23-cv-00201-WHO (N.D. Cal. Oct. 30, 2023).

[14] Copyright in General, U.S. Copyright Office, www.copyright.gov/help/faq/faq-general.html#:~:text=No.,infringement%20of%20a%20U.S.%20work.

[15] Authors Guild v. OpenAI Inc., 1:23-cv-08292-SHS (S.D.N.Y. Feb. 6, 2024).

[16] Alexandra Alter, Franzen, Grisham and Other Prominent Authors Sue OpenAI, New York Times (Sept. 20, 2023), www.nytimes.com/2023/09/20/books/authors-openai-lawsuit-chatgpt-copyright.html.

[17] Authors Guild v. Google, Inc., 13-4829-cv (2d Cir. Oct. 16, 2015).

[18] 17 U.S.C. § 107, Limitations on Exclusive Rights: Fair Use, uscode.house.gov/view.xhtml?req=granuleid:USC-prelim-title17-section107&num=0&edition=prelim.

[19] Christopher Zipoli, Generative Artificial Intelligence and Copyright Law, Congressional Research Service (Sept. 29, 2023), crsreports.congress.gov/product/pdf/LSB/LSB10922.

[20] Library of Congress, Zarya of the Dawn (Registration # VAu001480196), Thomson Reuters (Feb. 21, 2023), fingfx.thomsonreuters.com/gfx/legaldocs/klpygnkyrpg/AI%20COPYRIGHT%20decision.pdf#page=20.