Walking the tightrope: As generative AI meets EU regulation, pragmatism is likely

2024 is likely to see a balance between regulatory compliance and fostering innovation in generative AI; clear regulation enables enterprises and vendors to proceed with certainty.

Paul Lee

United Kingdom

Lucia Lucchini

United Kingdom

Michelle Seng Ah Lee

United Kingdom

The importance of well-crafted rules cannot be overstated when it comes to unlocking the potential of any market. In the case of generative AI, the absence of clear regulatory conditions may cause vendors, enterprise customers, and end users to hesitate. The European Union (EU), however, is expected to set the stage for global regulation of generative AI in 2024, not only influencing its own markets but also serving as a template for other regions.

In 2024, two EU regulations are expected to help shape the growth of the generative AI market in the region and further afield: the General Data Protection Regulation (GDPR),1 which has been applicable since 2018, and the upcoming EU AI Act (AIA), expected to be agreed to in early 2024. As generative AI opens up debates on how to manage individual consent, rectification, erasure, bias mitigation, and copyright usage, the industry’s trajectory could be shaped by how organizations and regulators view, enforce, and manage these areas of contention.

Despite potential challenges, collaboration in the form of open and transparent conversations between industry and regulators is likely to result in a pragmatic approach that balances regulatory compliance with fostering innovation in generative AI. This would continue the pattern set in 2023, which saw interventions by regulators in the European Union and other markets: vendors adjusted their approach to meet regulators’ requests, and regulators enabled innovation.2 If the industry addresses the concerns raised by EU regulations in 2024 while promoting the benefits of the core technologies, the generative AI market should continue to evolve productively.

Existing and drafted EU regulations are likely to influence generative AI globally

This prediction focuses on EU regulation of generative AI, as it is likely to be among the first agreed-to regulation with global impact.3 In recent years, there has been a clear “Brussels effect,”4 with EU regulations having global ramifications, and we expect a similar dynamic for EU regulation covering generative AI.5 This extraterritorial reach is likely to play out in several ways:

  1. EU regulation is directly applicable to vendors operating from any market that sell into, or target users in, EU countries. Organizations that are noncompliant are likely to be subject to material fines.
  2. Other markets may use EU regulation as a template. EU regulation has, for example, influenced India’s Digital Personal Data Protection Act, 20236 and equivalent regulations in Brazil and California. The AIA may have informed a US Senate bipartisan blueprint for AI that includes elements such as the licensing of high-risk applications like face recognition and public disclosure of training data used in foundation models.7
  3. Multinationals that comply within the European Union may apply their AI governance (including that specific to generative AI) globally, achieving a more standardized approach aligned with leading practices and grounded in compliance with EU regulations.

The majority of EU regulation pertaining to generative AI should become relatively clear by the first quarter of 2024.

In 2024, the direction of European regulation on generative AI is likely to become far clearer. At this time, the industry should have sight of the agreed text of the AIA, which complements the GDPR.8 All companies looking to offer or deploy generative AI solutions should monitor developments in the AIA while also maintaining compliance with the GDPR.

The process for final agreement is in three stages; at the time of writing, two had been completed, with the third and final stage pending the outcome of the “trilogue” among the European Commission, the EU Parliament, and the Council.

  • The Council of the European Union finalized its position in December 2022,9 at which point generative AI had only just reached mainstream awareness following the launch of ChatGPT.
  • The EU Parliament finalized its position in June 2023, and this included specific regulation for generative AI.10 This prediction’s references to the AIA reflect mostly the status as of this point in time.
  • The final version of the AIA, expected in early 2024, may include variations to the Parliamentary position. There will then be a further two years before the AIA is applicable.

Several terms are specific to the European Union’s regulation of generative AI, and it is important to define them. The critical components and types of players that the European Union has defined for the purposes of regulating generative AI within the AIA are:

  • Foundation models (FMs): These are AI models that are trained on data at scale, such as OpenAI’s GPT models or Google’s PaLM 2.11 These models can be applied to a wide range of tasks and, as such, differ from narrow AI models. Per the European Union’s definition, a foundation model could be used in a GPAI or in other more specific AI models.12
  • General purpose AI (GPAI): The EU regulation defines this as an AI system13 that is designed to perform “generally applicable functions” and does not have an “intended purpose.”14 It can be used in a “plurality of contexts” and in a “plurality of other AI systems.” Core capabilities of a GPAI would include recognition (e.g., of images or speech), generation (currently most commonly of text or images), pattern detection, and translation.
  • Generative AI: This is defined as an AI system created specifically to generate outputs in a range of formats. The best-known generative AI applications include ChatGPT, Snap AI, Google Bard, and Microsoft’s M365 Copilot.

There are two types of entities that will be in scope:

  • Providers: A person, public authority, agency, or other body that develops or commissions an AI system with a view to making it publicly available, whether for a charge or for free.
  • Deployers: A person, public authority, agency, or any other body using an AI system under its authority. In some contexts, a deployer may be considered a provider. This would be the case if the deployer uses an AI system for a high-risk application.

This prediction focuses first on the GDPR, whose obligations are known, and then on the AIA, whose shape is forming but not yet finalized.

Generative AI and the European Union’s GDPR

Generative AI is expected to need to comply with the GDPR’s requirements on the processing of personal data. The GDPR, which came into effect in May 2018,15 defines the rights of “data subjects”—individuals who can be identified from the personal data being processed.

A fundamental tenet of EU regulation is that any use of individuals’ personal data must rest on an applicable legal ground, with the lawfulness of processing maintained for each processing activity.16

This requirement may seem to clash with the core approach of generative AI, which is based on foundation models. Each model is trained on massive quantities of raw data—the more the better. A large proportion of this data—the exact share varying by model—may require consent under some interpretations of EU law. The largest foundation models may have been trained on petabytes (a petabyte is one million gigabytes, or GB) of data.17 Earlier models, including GPT-3, were trained on 570 GB of data.18 Generative AI applications in any medium—text, image, code, or other—create content using the knowledge within each foundation model.

Given the vast number of people whose data may have been used, obtaining individual consent, where required, becomes a complex exercise. Furthermore, as each foundation model supports an effectively infinite number and range of applications, requesting permissions for each additional purpose is even more unrealistic.

However, obtaining individual consent might not be mandatory. “Legitimate interest” may prove to be a sufficient lawful basis to permit training of the foundation models that drive generative AI.19 A legitimate interest exists when there is a compelling reason for the processing and processing the data is the only way to achieve the desired outcome.20 Regulators will likely want to see that organizations have conducted the appropriate evaluations to ensure that claimed legitimate interests are balanced against individuals’ rights and freedoms.

Furthermore, obtaining individual permissions may well be considered a “disproportionate effort.” An acceptable middle way may be mass-market communication. This was one of the steps the Italian regulator, the Garante, requested of OpenAI in April 2023 as a condition of reinstating its service.21 The Garante placed an obligation on the data controller (the entity responsible for the foundation model) to launch an awareness campaign in broadcast and online media, informing users that personal data may have been used and explaining how such data could be deleted via an online tool.

Regulators might view positively that the intent of training is specifically to create better inferential capability that can then be deployed in generative applications (such as OpenAI’s ChatGPT, Stability.ai’s DreamStudio, or Adobe’s Firefly).

The European Data Protection Board may provide more clarity on the issue of consent, among other contentious areas, in 2024.22

The GDPR tenets of rectification, erasure, and the right to be forgotten are applicable to the foundation models that underpin generative AI

The GDPR includes a suite of rights with regard to personal data. If data is incorrect, an individual can ask for it to be corrected. If a data subject no longer wants their personal data to be associated with or processed by an organization, they can ask for it to be deleted. These rights have been well known since the GDPR came into force, and addressing such requests may cost organizations thousands of dollars.

The foundation models that underpin generative AI are trained on myriad websites that may contain errors. Training is a single event during which errors can be absorbed into the model. Updating the model to reflect rectifications or other changes could be done most accurately by retraining it, but retraining implies substantial cost and time.23

The likely approach to satisfying this requirement is to use negative feedback loops to fine-tune the model.24 If an original data point is determined to be wrong, the weighting applied to the erroneous data point can be changed to minimize the likelihood of its reappearing. Feedback loops are imperfect but may be considered appropriate. That said, it is not certain how this approach would work in the case of class action challenges, which may require large swathes of data to be deleted.
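
To illustrate how such a feedback loop could work in practice, the sketch below shows one possible way of down-weighting a contested data point during fine-tuning. It is a minimal illustration only: the names and the weighting scheme are assumptions, not any vendor’s documented method.

```python
# Illustrative sketch only -- not any vendor's actual pipeline. It shows the
# idea above: handling a rectification request by re-weighting flagged
# examples in a fine-tuning pass rather than retraining from scratch.
from dataclasses import dataclass

@dataclass
class TrainingExample:
    text: str
    weight: float = 1.0      # relative contribution to the fine-tuning loss
    flagged: bool = False    # set when a rectification request matches

def apply_rectification(dataset, predicate):
    """Suppress examples matched by a rectification request.

    A weight of 0.0 removes the example's gradient contribution; a small
    negative weight (where the training loop supports it) actively pushes
    the model away from reproducing the content.
    """
    for ex in dataset:
        if predicate(ex.text):
            ex.flagged = True
            ex.weight = 0.0  # or e.g. -0.1 as an "unlearning" signal
    return dataset

def weighted_loss(per_example_losses, dataset):
    # Scale each example's loss by its weight before averaging, so that
    # suppressed examples no longer reinforce the erroneous data point.
    weighted = [l * ex.weight for l, ex in zip(per_example_losses, dataset)]
    active = sum(1 for ex in dataset if ex.weight != 0.0) or 1
    return sum(weighted) / active
```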

Data minimization and statistical accuracy

The idea of data minimization is that the collection of personal information should be limited to what is strictly relevant and necessary to achieve a specific task, and that the data should be deleted as soon as the task is complete.25 This principle may seem at odds with foundation models, whose efficacy is related to the volume of data they are trained on, with more generally being better.

However, the principle of data minimization may still be compatible with generative AI if data is de-personalized, for example, via pseudonymization (swapping personal identifiers with placeholder data, which reduces, but does not eliminate, data protection risks) or anonymization (deleting identifiers, so the data is no longer “personal”).26 Using these approaches, the volume of training data can be maintained, although full anonymization may be challenging. Organizations should have an appropriate framework in place for determining what data is necessary, and be able to explain and justify that determination to regulators.
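
A minimal, hypothetical sketch of pseudonymization follows; the regex, salt, and placeholder format are illustrative assumptions. Direct identifiers are swapped for stable placeholder tokens before text enters a training corpus, and the salt (the mapping secret) remains the residual re-identification risk.

```python
# Minimal sketch of pseudonymization as described above: direct identifiers
# are swapped for placeholder tokens before text enters a training corpus.
# If the salt leaks, data can be re-identified, which is why pseudonymized
# data is still "personal" under the GDPR.
import hashlib
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def pseudonymize(text: str, salt: str) -> str:
    """Replace each e-mail address with a stable, salted placeholder."""
    def _replace(match):
        digest = hashlib.sha256((salt + match.group()).encode()).hexdigest()[:8]
        return f"<PERSON_{digest}>"
    return EMAIL.sub(_replace, text)

# Anonymization, by contrast, would delete the identifier outright (no salt,
# no mapping), taking the record out of the GDPR's scope -- but, as noted
# above, achieving that fully for free text is hard.
print(pseudonymize("Contact jane.doe@example.com for details", salt="2024"))
# -> "Contact <PERSON_...> for details"
```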

The size of foundation models is linked to statistical accuracy, an element of proposed EU regulation included in the AIA.27 In an AI context, accuracy refers to the quality of outputs generated: with a foundation model, the greater the volume of good training data, the more accurate the results should be.28

The next part of the prediction considers the possible impact of the AIA on generative AI.

AIA obligations on foundation models, per the EU Parliament’s agreement

As mentioned earlier, the EU Parliament finalized its position in June 2023, and this included specific regulation for generative AI. The final version of the AIA, expected in early 2024, may include variations to the Parliamentary position.

The Parliamentary agreement included the following elements:

  • Foundation models should be registered in an EU database.
  • Models should be tested extensively to have appropriate levels of predictability, interpretability, corrigibility, safety, and cybersecurity for the entirety of the model’s expected life cycle.
  • Design, testing, and analysis should identify and reduce risks during the model’s development.
  • Datasets used in training models should have sufficient data governance standards. Data sources should be assessed for data quality and bias.
  • Energy usage should be minimized and monitorable across the model’s life cycle.
  • Extensive, accessible technical documentation should be available to downstream providers, to enable their compliance. This documentation should be available for a decade from commercial launch.
  • A quality management system should ensure and document compliance.

Additionally, providers of FMs used in generative AI systems and providers who specialize an FM into a generative AI system should:

  • Comply with additional transparency obligations, including the specific labeling of outputs as AI-generated (a minimal sketch of such labeling follows this list).
  • Ensure safeguards against the generation of outputs that breach EU law.
  • Document and publish summaries of training data that is protected by copyright.
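
One minimal way the labeling obligation could be met is to attach machine-readable provenance to every generated output. The sketch below is illustrative only; the field names are assumptions, not terms drawn from the AIA text.

```python
# Hedged sketch of the transparency obligation above: every generated output
# is wrapped with machine-readable provenance before it leaves the system.
import json
from datetime import datetime, timezone

def label_output(content: str, model_id: str) -> str:
    record = {
        "content": content,
        "provenance": {
            "ai_generated": True,   # explicit AIA-style disclosure
            "model": model_id,
            "generated_at": datetime.now(timezone.utc).isoformat(),
        },
    }
    return json.dumps(record)
```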

Bias may be mitigable

The AIA aims to minimize bias within AI systems, including the suppression of human bias. Foundation models may have been trained on biased content—for example, text biased with respect to gender, race, or sexual orientation.

Training data is also likely to include language biases, with most content written in English, and further biases resulting from the preponderance of ingested content from writers of a specific gender, ethnicity, social class, degree of education, and income group.29 Foundation models trained on historically biased data could, therefore, generate content that repeats or even accentuates those biases.

Regulators are likely to require that biases be mitigated via any of a variety of techniques, including reweighting or the inclusion of synthetic data that can balance out bias.30 Data controllers—which could be both the AI developer and the AI deployer—are likely to be asked to document “traceability,” explaining the steps taken.31
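
As an illustration of the reweighting technique mentioned above, the sketch below assigns inverse-frequency weights so that each group in the training data contributes equally to the loss. It is a simplified example; real mitigation pipelines combine several such techniques.

```python
# Illustrative reweighting sketch: examples from under-represented groups get
# proportionally larger training weights so each group contributes equally
# to the loss.
from collections import Counter

def balanced_weights(groups):
    """Return one weight per example: inverse of its group's frequency."""
    counts = Counter(groups)
    n_groups = len(counts)
    total = len(groups)
    # Each group ends up with the same aggregate weight (total / n_groups).
    return [total / (n_groups * counts[g]) for g in groups]

weights = balanced_weights(["en", "en", "en", "de"])
# -> [0.666..., 0.666..., 0.666..., 2.0]; each group now sums to 2.0
```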

Copyright: More clarification on permitted behaviors required

In 2024, further clarity is likely to be needed regarding the use of copyright content.32

Existing EU law may permit usage of copyright data for training, specifically “instances of text and data mining that do not involve acts of reproduction or where the reproductions made fall under the mandatory exception for temporary acts of reproduction.”33 The AIA draft requires that copyright works used for training be listed.

Via the Digital Single Market Directive,34 the EU recently introduced exceptions permitting text and data mining for scientific research and for lawful commercial use, although content owners retain a right to “opt out” of the commercial exception. Content owners, including several media companies, have exercised that right to opt their data out of AI training.35 As of April 2023, more than a billion items had been removed from a training set for the Stable Diffusion v3 model.36
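
In practice, one common way content owners express this opt-out is by disallowing AI-training crawlers in robots.txt. The sketch below uses Python’s standard robotparser to check two real, documented crawler user agents (OpenAI’s GPTBot and Google-Extended); the site URL is an illustrative placeholder.

```python
# Sketch of how a training pipeline could honor an expressed opt-out before
# ingesting a site's content. GPTBot and Google-Extended are real, documented
# user agents; example.com is a placeholder.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

for agent in ("GPTBot", "Google-Extended"):
    allowed = rp.can_fetch(agent, "https://example.com/articles/")
    print(f"{agent}: {'may crawl' if allowed else 'opted out of training'}")
```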

The largest foundation models may be categorized as systemic

The AIA focuses on risk assessment of each application. This runs counter to the general-purpose nature of foundation models.

However, there may be a distinction between systemic foundation models (SFMs)—those whose impact represents a systemic risk—and others, following an approach used in the EU Digital Services Act to categorize types of online platforms and search engines.37 Designation as an SFM is likely to be made according to the quantity of computing resources required to train the model, the type and cost of training inputs used, and its likely market impact. SFMs are likely to face a greater degree of due diligence obligations.38
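
As a purely illustrative sketch of a compute-based designation test: training compute can be approximated by the common 6 × parameters × tokens rule of thumb and compared against a threshold. The threshold value below is an assumption for illustration only; the AIA’s actual criteria were not final at the time of writing.

```python
# Hedged sketch of a compute-based SFM designation test. The 6*N*D estimate
# of training FLOPs (parameters x tokens) is a standard rule of thumb; the
# threshold value here is purely illustrative.
ILLUSTRATIVE_THRESHOLD_FLOPS = 1e25

def training_flops(n_parameters: float, n_training_tokens: float) -> float:
    """Approximate training compute: ~6 FLOPs per parameter per token."""
    return 6 * n_parameters * n_training_tokens

def is_systemic(n_parameters: float, n_training_tokens: float) -> bool:
    return training_flops(n_parameters, n_training_tokens) >= ILLUSTRATIVE_THRESHOLD_FLOPS

# A 175B-parameter model trained on 300B tokens:
print(f"{training_flops(175e9, 300e9):.2e} FLOPs")  # -> ~3.15e+23
print(is_systemic(175e9, 300e9))                    # -> False at this threshold
```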

Another possible outcome is that the AIA may establish some baseline requirements applicable to all foundation models (e.g., around transparency and technical documentation), with additional requirements if foundation models are used for high-risk use cases.

The bottom line

European regulation matters: it is likely to have both extraterritorial and regional impacts. At first glance, several existing principles of EU regulation that apply to digital services may seem to present major obstacles to the growth of the generative AI market; indeed, some commentators expected generative AI to be incompatible with EU rules.

How generative AI will shape up in the years ahead, and what impact it could have, is still unknown. It may be several years before the scale and nature of its impact is certain. In 2024 and beyond, vendors and regulators are likely to collaborate to attain an outcome that works for consumers, enterprises, vendors, and society in general. Governments are acutely aware of the importance of enabling innovation in generative AI—for example, via regulatory sandboxes.39

In 2024, as generative AI applications evolve and the resulting legal challenges become clearer, the direction of the regulatory response may become more evident. Generative AI is likely to remain an emerging sector this year, which can make it hard for regulation to be explicit at this stage. Core questions will likely remain, such as how responsibility is divided between providers and deployers of generative AI when they are separate entities.

Endnotes

  1. European Union (EU), Regulation (EU) 2016/679 (General Data Protection Regulation), April 27, 2016.

  2. European Commission (EC), “First regulatory sandbox on Artificial Intelligence presented,” last updated January 30, 2023; Spanish Ministry of Finance, “Approved statute of the Agencia Española de Supervisión de la Inteligencia Artificial (AESIA),” August 22, 2023. 

  3. Anna Gamvros, Edward Yau, and Steven Chong, “China finalises its Generative AI Regulation,” Norton Rose Fulbright, July 25, 2023. 

  4. Charlotte Siegmann and Markus Anderljung, The Brussels Effect and artificial intelligence: How EU regulation will impact the global AI market, Centre for the Governance of AI, 2021.

  5. Tatjana Evas, European framework on ethical aspects of artificial intelligence, robotics and related technologies, European Parliamentary Research Service (EPRS), 2020. 

  6. India Ministry of Law and Justice, The Digital Personal Data Protection Act, 2023, Gazette of India, August 11, 2023; Raktima Roy and Gabriela Zanfir-Fortuna, “The Digital Personal Data Protection Act of India, explained,” Future of Privacy Forum, August 15, 2023. 

  7. Khari Johnson, “Senators want ChatGPT-level AI to require a government license,” Wired, September 9, 2023.

  8. European Parliament News, “EU AI Act: First regulation on artificial intelligence,” last updated June 14, 2023. 

  9. Spanish Presidency of the Council of the European Union, “The EU pioneers regulation of artificial intelligence,” October 22, 2023.

  10. European Parliament News, “EU AI Act: First regulation on artificial intelligence.” 

  11. Rick Merritt, “What are foundation models?,” Nvidia Blog, March 13, 2023; Jacob Devlin et al., “BERT: Pre-training of deep bidirectional transformers for language understanding,” Cornell University Arxiv, last updated May 24, 2019; Google AI, “AI across Google: PaLM 2,” accessed November 20, 2023. 

  12. Elliot Jones, “Explainer: What is a foundation model?,” Ada Lovelace Institute, July 17, 2023.

  13. Council of the European Union, “Proposal for a Regulation of the European Parliament and of the Council laying down harmonised rules on artificial intelligence (Artificial Intelligence Act) and amending certain Union legislative acts,” November 29, 2021; Carlos Ignacio Gutierrez, Anthony Aguirre, and Risto Uuk, “The European Union could rethink its definition of General Purpose AI Systems (GPAIS),” OECD.AI, November 7, 2022. 

  14. Council of the European Union, “Proposal for a Regulation of the European Parliament and of the Council laying down harmonised rules on artificial intelligence (Artificial Intelligence Act) and amending certain Union legislative acts,” May 13, 2022. 

  15. Council of the European Union, “The General Data Protection Regulation,” last updated September 1, 2022. 

  16. Norwegian Consumer Council, Ghost in the machine: Addressing the consumer harms of generative AI, June 2023.

  17. Kim Martineau, “IBM and NASA team up to spur new discoveries about our planet,” IBM Blog, February 1, 2023. 

  18. Merritt, “What are foundation models?”

  19. Pablo Rodrigo Trigo Kramcsak, “Can legitimate interest be an appropriate lawful basis for processing Artificial Intelligence training datasets?,” Computer Law & Security Review 48 (April 2023). 

  20. Information Commissioner’s Office (ICO), “Legitimate interests,” accessed November 20, 2023.

  21. Italian Data Protection Authority, “The Guarantor for the Protection of Personal Data,” March 30, 2023. 

  22. European Data Protection Board (EDPB), “EDPB resolves dispute on transfers by Meta and creates task force on ChatGPT,” press release, April 13, 2023. 

  23. Will Knight, “OpenAI’s CEO says the age of giant AI models is already over,” Wired, April 17, 2023. 

  24. Haziqa Sajid, “The AI feedback loop: Maintaining model production quality in the age of AI-generated content,” Unite.ai, July 25, 2023.

  25. European Data Protection Supervisor, “Data minimization,” accessed November 20, 2023.

  26. Ireland Data Protection Commission, “Apply anonymity and pseudonymity,” accessed November 20, 2023; ICO, “Chapter 3: Pseudonymisation,” Draft Anonymisation, Pseudonymisation and Privacy Enhancing Technologies Guidance, February 2022.

  27. European Parliament, Artificial Intelligence Act, accessed November 20, 2023. 

  28. Rishi Bommasani et al., On the opportunities and risks of foundation models, Center for Research on Foundation Models (CRFM), Stanford Institute for Human-Centered Artificial Intelligence (HAI), Stanford University, July 12, 2022. 

  29. Aldo Lamberti, “Tackling bias in large ML models: The role of synthetic data,” Syntheticus, July 31, 2023. 

  30. Syntheticus, Synthetic data 101: What is it, how it works, and what it’s used for, accessed November 20, 2023.

  31. Suhas Maddali, “How to address data bias in machine learning,” Towards Data Science, July 27, 2022. 

  32. Atsuki Mizuguchi, “Legal issues in generative AI under Japanese law,” Nishimura & Asahi, July 11, 2023.

  33. European Parliament, “Directive (EU) 2019/790 of the European Parliament and of the Council of 17 April 2019 on copyright and related rights in the Digital Single Market and amending Directives 96/9/EC and 2001/29/EC,” Official Journal of the European Union, May 17, 2019.

  34. Ibid.

  35. João Pedro Quintais, “Generative AI, copyright and the AI Act,” Kluwer Copyright Blog, May 9, 2023. 

  36. Kyle Wiggers, “Spawning lays out plans for letting creators opt out of generative AI training,” TechCrunch+, May 3, 2023. 

  37. European Union, The Digital Services Act (DSA), accessed November 20, 2023.

  38. J. Scott Marcus, “Adapting the European Union AI Act to deal with generative artificial intelligence,” Bruegel, July 19, 2023.

  39. EC, “First regulatory sandbox on Artificial Intelligence presented,” June 27, 2022; Spanish Ministry of Finance, “Approved statute of the Agencia Española de Supervisión de la Inteligencia Artificial (AESIA).”


Acknowledgments

The authors would like to thank Nick Seeber, Lukas Kruger, Suchitra Nair, Ben Stanton, Robert MacDougall, Joanne Conway, and Isabel Parker for their contributions to this article.

Cover image by: Manya Kuzemchenko