Generative artificial intelligence represents a fundamental paradigm shift in how machines create and process information, moving from systems designed to analyze and predict data toward systems capable of generating entirely novel content across multiple modalities. Generative AI is a system of algorithms or computer processes that can create novel output in text, images or other media based on user prompts. This technology has emerged as one of the most transformative developments in computer science, attracting unprecedented investment and attention from enterprises, governments, and consumers alike. In 2024, generative AI attracted $33.9 billion in global private investment—an 18.7% increase from 2023—and by 2025, enterprises spent $37 billion on generative AI solutions, representing a 3.2-fold year-over-year increase from 2024’s $11.5 billion investment. This explosive growth reflects not merely enthusiasm for a novel technology but rather genuine recognition that generative AI addresses fundamental business needs and opens entirely new possibilities for innovation. Understanding generative AI requires examining its technical foundations, its profound differences from traditional AI systems, its expanding applications across virtually every industry, and the critical challenges of ensuring its responsible development and deployment.
Foundational Concepts and Definitions of Generative AI
The term “generative AI” requires precise definition to distinguish it from the broader artificial intelligence landscape that has evolved over decades. Generative AI is artificial intelligence that can create original content in response to a user’s prompt or request. This definition captures the essential characteristic that differentiates generative AI from other approaches: the capacity to produce novel outputs rather than merely classifying, predicting, or analyzing existing data. Traditional AI systems have focused on reactive tasks—identifying patterns in data, making predictions based on historical information, or classifying inputs into predefined categories. In contrast, generative AI is fundamentally proactive, capable of initiating and creating content that did not previously exist. These systems learn patterns and structures from vast training datasets and then employ those learned representations to synthesize new content that is statistically similar to, but distinct from, the original training data.
The distinction between generative and discriminative AI represents a critical conceptual divide in artificial intelligence. Generative models learn the underlying distribution of the input data: by modeling the joint probability distribution of the data and its labels, they can generate content that resembles the training data, enabling creative data synthesis and augmentation. Because they capture the full probability distribution of their training data, generative models can not only classify inputs but also produce entirely new samples that reflect the learned distribution. In contrast, discriminative AI models learn the conditional probability of a label given an input, focusing on the boundary between classes rather than modeling the entire data distribution. Discriminative models excel at accurate class prediction by explicitly learning decision boundaries, but they lack the inherent capability to generate new content. This is more than a technical distinction; it reflects fundamentally different approaches to machine learning itself.
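The difference is easiest to see in code. The sketch below is a minimal illustration, assuming scikit-learn and NumPy are installed and using an invented toy dataset: a generative Gaussian Naive Bayes classifier models how each class produces features and can therefore synthesize a new sample, while a discriminative logistic regression learns only the decision boundary between classes.

```python
# Illustrative sketch: generative vs. discriminative models on the same toy data.
# Assumes scikit-learn and NumPy are installed; dataset and names are made up.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

X, y = make_blobs(n_samples=200, centers=2, random_state=0)

# Generative: models P(x | y) and P(y), so it can both classify and sample new data.
gen = GaussianNB().fit(X, y)
rng = np.random.default_rng(0)
class_0_mean, class_0_var = gen.theta_[0], gen.var_[0]
synthetic_point = rng.normal(class_0_mean, np.sqrt(class_0_var))  # new sample resembling class 0

# Discriminative: models P(y | x) only -- it can classify but has no notion of sampling data.
disc = LogisticRegression().fit(X, y)

print("Generative model synthesized:", synthetic_point)
print("Both models classify it:", gen.predict([synthetic_point]), disc.predict([synthetic_point]))
```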
The emergence of large language models (LLMs) as the public face of generative AI has shaped popular understanding of the technology, though generative AI extends far beyond text generation. Large language models are probabilistic systems that attempt to predict word sequences. These models, including GPT-3, GPT-4, Claude, and others, function as advanced statistical systems that learn to predict the most likely next word in a sequence based on preceding words and context. However, generative AI encompasses multiple modalities and approaches beyond language alone. Generative models can produce images from text descriptions, generate audio and video, create code, synthesize 3D models, and generate data across numerous other domains. This breadth of capability reflects the underlying flexibility of generative approaches—once trained, these systems can be applied to various tasks within their domain through careful prompt engineering and task specification.
The concept of “foundation models” has become central to contemporary generative AI discussion, representing a new paradigm in machine learning. Foundation models (FMs) are large deep learning neural networks that have changed the way data scientists approach machine learning (ML). Rather than develop artificial intelligence from scratch, data scientists use a foundation model as a starting point to develop ML models that power new applications more quickly and cost-effectively. Foundation models are trained on broad, unlabeled datasets and demonstrate remarkable versatility—the same model can be adapted for text generation, image understanding, code completion, and numerous other tasks through relatively modest additional training. A unique feature of foundation models is their adaptability. These models can perform a wide range of disparate tasks with a high degree of accuracy based on input prompts. This adaptability represents a dramatic departure from earlier machine learning paradigms, where separate specialized models were typically required for each task.
Technical Architecture and Mechanisms Enabling Generative AI
Understanding how generative AI functions requires examining the neural network architectures and training methodologies that enable these systems to learn and generate content. The foundation of modern generative AI rests on developments in deep learning, specifically neural networks with multiple layers that automatically extract hierarchical features from data. Neural networks are made up of layers of nodes: an input layer, one or more hidden layers, and an output layer. Each node is an artificial neuron that connects to nodes in the next layer, and each connection carries a weight while each node has a threshold value. The depth and complexity of these networks—hence the term “deep learning”—allows them to capture increasingly abstract and sophisticated patterns in data. Through training on vast datasets, these neural networks learn internal representations that capture the essential structure and characteristics of the training data.
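As a toy illustration of such stacked layers, the following sketch builds a small feedforward network in PyTorch; the layer sizes and random input are arbitrary choices made purely for illustration.

```python
# Toy deep neural network: input layer -> hidden layers -> output layer.
# Layer sizes are arbitrary; assumes PyTorch is installed.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(16, 64),   # input layer -> first hidden layer (weights and biases per node)
    nn.ReLU(),           # nonlinearity acts like a soft "threshold" on each neuron
    nn.Linear(64, 64),   # second hidden layer learns more abstract features
    nn.ReLU(),
    nn.Linear(64, 10),   # output layer
)

x = torch.randn(1, 16)   # one example with 16 input features
logits = model(x)        # forward pass through every layer
print(logits.shape)      # torch.Size([1, 10])
```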
The transformer architecture represents the most significant breakthrough enabling contemporary generative AI systems, fundamentally changing how language models process and generate information. Transformers, introduced by Google researchers in the landmark 2017 paper “Attention Is All You Need,” combined the encoder-decoder architecture with a text-processing mechanism called attention to change how language models were trained. The transformer architecture solves a critical problem that plagued earlier approaches: the inability to effectively model long-range dependencies in sequences of data. At the heart of the architecture is self-attention: each word “knows about” all the other words in the passage and how they are related. This mechanism allows each token (word fragment) in a sequence to directly attend to and establish relationships with all other tokens in that sequence simultaneously, rather than processing information sequentially as in earlier recurrent neural network approaches.
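A minimal version of that self-attention computation fits in a few lines. The sketch below follows the scaled dot-product formulation from the transformer paper, with toy dimensions and random tensors standing in for real token embeddings and learned weights.

```python
# Minimal scaled dot-product self-attention over a toy sequence of token embeddings.
# Dimensions and inputs are illustrative; assumes PyTorch is installed.
import math
import torch

seq_len, d_model = 5, 8                 # 5 tokens, 8-dimensional embeddings (toy sizes)
x = torch.randn(seq_len, d_model)       # stand-in for real token embeddings

W_q = torch.randn(d_model, d_model)     # learned projection matrices in a real model
W_k = torch.randn(d_model, d_model)
W_v = torch.randn(d_model, d_model)

Q, K, V = x @ W_q, x @ W_k, x @ W_v

# Every token attends to every other token in a single step.
scores = Q @ K.T / math.sqrt(d_model)   # (seq_len, seq_len) pairwise relevance
weights = torch.softmax(scores, dim=-1) # each row sums to 1
output = weights @ V                    # weighted mix of all tokens' values

print(weights.shape, output.shape)      # torch.Size([5, 5]) torch.Size([5, 8])
```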
The transformer architecture comprises several key components that work together to enable sophisticated language understanding and generation. Encoders compress a dataset into a dense representation, arranging similar data points closer together in an abstract space. Decoders sample from this space to create something new while preserving the dataset’s most important features. Encoders take raw input data and transform it into numerical representations called embeddings that capture the semantic meaning and relationships within that data. These embeddings arrange related concepts in nearby locations in high-dimensional mathematical space, allowing the model to understand relationships between words and concepts. Decoders then use these representations, along with previously generated outputs, to successively predict and generate each subsequent token in a response.
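The idea that embeddings place related concepts near one another can be made concrete with a small cosine-similarity check; the vectors below are invented toy values rather than embeddings from a trained model.

```python
# Toy illustration of embeddings as points in space: related concepts end up closer together.
# The vectors are invented for illustration, not taken from a trained model.
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

embeddings = {
    "king":   np.array([0.9, 0.8, 0.1]),
    "queen":  np.array([0.85, 0.75, 0.2]),
    "banana": np.array([0.1, 0.2, 0.95]),
}

print(cosine_similarity(embeddings["king"], embeddings["queen"]))   # high: related concepts
print(cosine_similarity(embeddings["king"], embeddings["banana"]))  # lower: unrelated concepts
```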
The process through which transformers generate text begins with tokenization and embedding of input data, followed by iterative prediction through multiple layers of neural computation. The model builds its output from the user’s input: the prompt is first parsed into a form of data the model can process, and the model then draws on matching patterns from its training, combining them to build the final output. When a user provides a prompt, the model first converts this text into tokens—small pieces of text like individual words or subword fragments—and represents each token as a high-dimensional numerical vector called an embedding. These embeddings are then passed through multiple transformer layers that apply self-attention mechanisms, allowing each token to establish relationships with all other tokens in the input sequence. The model then generates output one token at a time, with each new token predicted from all previous tokens plus the learned patterns from the training data.
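That token-by-token loop can be observed directly with an off-the-shelf model. The sketch below assumes the Hugging Face transformers library and the small public gpt2 checkpoint (chosen only because it is small and freely available) and greedily appends one predicted token at a time.

```python
# Token-by-token generation with a small causal language model.
# Assumes the Hugging Face transformers library and the public "gpt2" checkpoint;
# any causal LM would behave analogously.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer("Generative AI systems learn to", return_tensors="pt").input_ids

for _ in range(10):                                  # generate 10 new tokens
    logits = model(input_ids).logits                 # scores for every vocabulary token
    next_id = logits[0, -1].argmax()                 # greedy choice: most likely next token
    input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0]))
```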
Different configurations of transformer architectures support different generative tasks, leading to different families of language models with distinct capabilities. Language transformers fall into three main categories: encoder-only models, decoder-only models, and encoder-decoder models. Encoder-only models like BERT are primarily designed for understanding tasks where the model must bidirectionally process context to extract meaning and answer questions about text, making them less suitable for open-ended generation. Decoder-only models like the GPT family of models are trained to predict the next word without an encoded representation. GPT-style models operate by predicting each subsequent word based only on preceding words, making them naturally suited for open-ended text generation tasks where the model must autonomously decide what content to produce. Encoder-decoder models combine capabilities of both approaches, understanding full bidirectional context while also maintaining the ability to generate novel sequences, making them well-suited for tasks like translation where both understanding and generation are essential.
The training of generative models involves exposing them to enormous datasets and allowing them to learn statistical relationships through processes like self-supervised learning. Through fill-in-the-blank guessing games, the encoder learns how words and sentences relate to each other, building up a powerful representation of language without anyone having to label parts of speech and other grammatical features. During pre-training, language models are exposed to billions or trillions of tokens from internet text, books, code repositories, and other sources. The model learns by attempting to predict masked or obscured portions of text, gradually building sophisticated understanding of language structure and meaning. This self-supervised approach is powerful because it requires no manual labeling—the training signal comes directly from the structure of language itself.
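That fill-in-the-blank objective is exactly what masked language models such as BERT are trained on, and it can be probed in a few lines; the sketch assumes the Hugging Face transformers library and the public bert-base-uncased checkpoint.

```python
# Probing the fill-in-the-blank objective a masked language model was trained on.
# Assumes the Hugging Face transformers library and the public "bert-base-uncased" checkpoint.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# The model predicts the hidden word purely from surrounding context --
# the same self-supervised signal it saw billions of times during pre-training.
for prediction in fill_mask("Generative AI learns the structure of [MASK] from raw text."):
    print(f"{prediction['token_str']!r}  score={prediction['score']:.3f}")
```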
Once pre-trained, generative models can be further refined through additional training techniques that improve their performance on specific tasks and align their outputs with human preferences. Instruction-tuning, introduced with Google’s FLAN series of models, has enabled generative models to move beyond simple tasks to assist in a more interactive, generalized way. Instruction-tuning exposes models to examples pairing instructions with appropriate responses, teaching models to interpret and follow diverse types of requests. Furthermore, alignment refers to the idea that we can shape a generative model’s responses so that they better align with what we want to see. Reinforcement learning from human feedback (RLHF) is an alignment method popularized by OpenAI that gives models like ChatGPT their uncannily human-like conversational abilities. Through RLHF, human evaluators rate different model responses for quality, and these ratings guide further model training through reinforcement learning techniques. This approach produces systems like ChatGPT that generate text that is not merely statistically probable given training data but also aligned with human preferences and values.
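One small but central piece of the RLHF pipeline is the reward model, which is commonly trained on human preference pairs with a pairwise ranking loss. The sketch below shows that loss in isolation, with placeholder reward scores standing in for the outputs of a real neural reward model.

```python
# The pairwise ranking loss commonly used to train an RLHF reward model.
# Reward scores here are placeholders; in practice they come from a neural reward model
# scoring a chosen (human-preferred) and a rejected response to the same prompt.
import torch
import torch.nn.functional as F

reward_chosen = torch.tensor([1.7, 0.4, 2.1])     # placeholder scores for preferred responses
reward_rejected = torch.tensor([0.9, 0.6, -0.3])  # placeholder scores for dispreferred responses

# Bradley-Terry style objective: push the chosen response's reward above the rejected one's.
loss = -F.logsigmoid(reward_chosen - reward_rejected).mean()
print(loss.item())
```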

Foundation Models and Large Language Models Driving Generative AI Innovation
The rapid evolution of foundation models represents one of the most striking technological developments in recent years, with successive generations achieving increasingly sophisticated capabilities while simultaneously growing vastly larger in scale. GPT-3, at 175 billion parameters, was the largest language model of its kind when OpenAI released it in 2020. Other massive models, including Google’s PaLM (540 billion parameters) and the open-access BLOOM (176 billion parameters), have since joined the scene. These enormous models demonstrate that increasing scale brings dramatic improvements in language understanding and generation capabilities. However, contemporary research reveals that scale alone does not determine capability; architecture innovations, training data quality, and training methodology also profoundly influence model performance. For example, BERT, one of the first bidirectional foundation models, was released in 2018 and trained with 340 million parameters on a 16 GB dataset. In 2023, only five years later, OpenAI released GPT-4; the company has not disclosed its parameter count or training dataset size, but both are widely believed to be orders of magnitude larger. This extraordinary growth in model scale and training data reflects intense competition and investment in developing increasingly capable systems.
Major foundation models available today exhibit different architectural approaches and capabilities that make them suitable for different applications and use cases. BERT is a bidirectional model that analyzes the context of a complete sequence and then makes a prediction. It was trained on a plain-text corpus and Wikipedia using 3.3 billion tokens (words) and 340 million parameters. BERT’s bidirectional approach makes it particularly effective for understanding tasks where context from both directions around a word is valuable. GPT models, by contrast, employ unidirectional (left-to-right) generation, making them naturally suited for tasks requiring text generation. The popular ChatGPT chatbot, launched in late 2022, is based on GPT-3.5, and GPT-4, released in March 2023, passed the Uniform Bar Examination with a score of 297, around the 90th percentile of test takers. GPT-4’s performance on complex professional exams demonstrates the sophistication these models have achieved in reasoning and knowledge application.
Beyond text-focused language models, foundation models have expanded into multimodal systems that combine understanding and generation across text, images, audio, and video. Multimodal generative AI refers to a class of systems capable of understanding and generating content in several media, whether text, images, audio, or video. Multimodal models operate by encoding different types of data into compatible numerical representations that allow the model to reason across modalities. For instance, a multimodal model might accept a text description and generate an image, or analyze an image and generate descriptive text. DALL-E 3, a text-to-image model from OpenAI, stands out for its ability to generate intricate outputs from complex prompts that effectively reflect emotions. These systems require training on datasets containing paired examples of different modalities—such as images with accompanying captions—so the model learns to associate concepts across different types of data representation.
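As one illustration of driving a text-to-image model programmatically, the sketch below calls the OpenAI Images API with the dall-e-3 model; it assumes the openai Python package (v1 or later) is installed and an API key is configured, and the prompt text is invented for illustration.

```python
# Illustrative text-to-image request against the OpenAI Images API.
# Assumes the `openai` Python package (v1+) is installed and OPENAI_API_KEY is set;
# the prompt is invented for illustration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.images.generate(
    model="dall-e-3",
    prompt="A watercolor illustration of a lighthouse at dusk, soft warm palette",
    size="1024x1024",
    n=1,
)

print(response.data[0].url)  # URL of the generated image
```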
Video generation represents an emerging frontier in generative AI, with models demonstrating remarkable capability to create coherent, realistic videos from text descriptions. Sora, OpenAI’s text-to-video model, can generate videos up to a minute long while maintaining visual quality and adherence to the user’s prompt. Sora demonstrates that foundation models can be extended to temporal domains, generating sequences of frames that maintain visual coherence and accurately represent physical processes described in text prompts. Sora is a diffusion model, which generates a video by starting from one that looks like static noise and gradually transforming it by removing the noise over many steps. This approach differs from autoregressive text generation; diffusion models instead iteratively refine noisy outputs toward coherent final products.
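The description of removing noise over many steps corresponds to a simple loop structure, caricatured below. This is a purely conceptual toy: the hypothetical predict_noise function stands in for the trained neural network a real diffusion model would run at every step, and no claim is made about the exact update rule such models use.

```python
# Conceptual toy of a reverse diffusion loop: start from pure noise and repeatedly
# subtract an estimate of that noise. `predict_noise` is a hypothetical placeholder
# for the trained denoising network a real image or video diffusion model would use.
import numpy as np

rng = np.random.default_rng(0)
num_steps = 50
sample = rng.standard_normal((8, 8))      # toy "frame" that starts as pure static

def predict_noise(x, step):
    # Hypothetical stand-in: a real model runs a large neural network at every step.
    return x * (step / num_steps)

for step in reversed(range(1, num_steps + 1)):
    estimated_noise = predict_noise(sample, step)
    sample = sample - estimated_noise / num_steps   # remove a little of the estimated noise

# After many small refinement steps the array has been gradually transformed;
# in a real model this trajectory ends at a coherent image or video frame.
print(np.abs(sample).mean())
```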
The economic landscape of foundation models reflects enormous concentration of both capabilities and investment among a small number of leading organizations. Enterprise spending on generative AI in 2025 shows clear market leaders in model provision and clear trends in how organizations deploy these technologies. The largest share, $19 billion, went to user-facing products and software that leverage underlying AI models, known as the application layer. This represents more than 6% of the entire software market, all achieved within three years of ChatGPT’s launch. Application layer products—chat interfaces, content generation tools, customer service systems—represent the most immediate consumer-facing uses of foundation models. Behind these applications lie foundation model APIs and model training infrastructure: foundation model APIs ($12.5 billion) power the intelligence behind AI applications, while model training infrastructure ($4.0 billion) enables frontier labs and enterprises to train and adapt models. The dominance of a handful of organizations—OpenAI, Anthropic, Google, Meta—in developing and operating these large models creates significant concentration of power over a technology that is increasingly fundamental to business operations.
Diverse Applications of Generative AI Across Industries
The breadth of applications for generative AI has expanded dramatically as the technology has matured and as organizations have discovered new use cases. Generative artificial intelligence has applications in diverse industries such as health care, manufacturing, software development, financial services, media and entertainment, and advertising and marketing. Rather than representing isolated innovations, these applications reflect generative AI becoming woven into the operational fabric of organizations across sectors. Healthcare represents one of the most dynamic sectors for AI adoption. Generative AI can augment medical images like X-rays or MRIs, synthesize images, reconstruct images, or create reports about images. This technology can even generate new images to demonstrate how a disease may progress over time. By enhancing existing medical images or generating synthetic examples, generative AI helps doctors visualize pathologies and supports diagnostic accuracy. Additionally, AI applications support personalized medicine by analyzing patient data to recommend tailored treatment protocols.
The financial services industry has rapidly integrated generative AI into operations, particularly for fraud detection, algorithmic trading, and risk management. In the finance sector, generative AI is crucial in fraud detection and risk management: AI models analyze transaction data in real time to identify fraudulent activities. Real-time fraud detection protects institutions from losses while providing security to customers. Algorithmic trading, driven by AI, has become a dominant force in financial markets; AI algorithms can analyze large volumes of data and make split-second trading decisions. These algorithmic systems make investment decisions at speeds and with information-processing capabilities far beyond human capability. The legal profession has emerged as an early and enthusiastic adopter of generative AI for automating document review and legal research. Generative AI is being used by law firms to automate laborious processes like contract review: AI-powered platforms can analyze complex legal documents, extract relevant information, and flag potential risks. Automated contract analysis reduces the time attorneys must spend on routine tasks, freeing them to focus on strategic advice and complex negotiation.
Software development has emerged as one of the dominant use cases for generative AI within enterprise settings. Departmental AI spending hit $7.3 billion in 2025, up 4.1x year over year. Although coding captures more than half of departmental AI spend at $4 billion, the technology is gaining traction across many enterprise departments. This represents genuine cost savings—generative AI dramatically accelerates development by generating code, explaining existing codebases, and automating routine programming tasks. Beyond traditional departmental applications, vertical-specific AI solutions have emerged for industries with specialized requirements. Vertical AI solutions captured $3.5 billion in 2025, nearly 3x the $1.2 billion invested in 2024. When segmented by industry, healthcare alone captures nearly half of all vertical AI spend—approximately $1.5 billion, more than tripling from $450 million the year prior. Healthcare’s dominant share in vertical AI spending reflects the sector’s recognition that AI can drive significant value through applications tailored to healthcare’s specific workflows, regulatory environment, and data characteristics.
Marketing and advertising professionals have adopted generative AI to dramatically accelerate content creation. Generative AI can help marketing professionals create consistent, on-brand text and images to use in marketing campaigns. Rather than employing human designers for every marketing asset, organizations use generative AI to rapidly prototype and generate marketing materials. Gartner predicts that marketing professionals will use generative AI to create 30 percent of outbound marketing materials by 2025. This represents a fundamental shift in how marketing departments operate, with AI becoming as essential to the marketing technology stack as email and social media platforms. Real estate, product design, and other visual design domains benefit from generative AI’s ability to create photorealistic visualizations. Real estate visualization, product design and digital twin manufacturing, virtual tourism and simulation training represent key applications where generative AI produces images and 3D models from descriptions, allowing designers and marketers to rapidly iterate on concepts before committing to production.
Manufacturing and supply chain operations have integrated generative AI for optimization and predictive analytics. Using generative AI, engineers and project managers can work through the design process much faster by generating design ideas and asking the AI to assess ideas based on the constraints of the project. Rather than hand-drafting designs, engineers can rapidly generate multiple design variants and have AI systems evaluate them against project constraints. Predictive maintenance represents another critical manufacturing application. Maintenance professionals can use generative AI to track the performance of heavy equipment based on historical data, potentially alerting them to trouble before the machine malfunctions. By analyzing equipment performance patterns, AI systems predict maintenance needs before failures occur, reducing costly downtime and extending equipment lifetime.
The expansion of AI applications has generated corresponding growth in AI-related infrastructure and tools designed to help organizations optimize model performance. Prompt engineering has emerged as a critical skill enabling users to achieve superior results from generative AI systems. Prompt engineering is the process of guiding a generative AI solution toward desired outputs: even though generative AI attempts to mimic humans, it requires detailed instructions to create high-quality and relevant output. Effective prompts specify the task clearly, provide relevant context, establish the desired output format, and sometimes include examples of desired responses. Advanced prompting techniques have been developed to enhance model reasoning and output quality. Tree-of-thought prompting, complexity-based prompting, generated knowledge prompting, and directional-stimulus prompting represent sophisticated approaches to structuring prompts to guide models toward high-quality outputs across diverse tasks.
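In practice, a well-engineered prompt bundles the task, context, output format, and an example into a single request. The sketch below shows one such structure as a chat-style message list sent through the OpenAI Python client; the model name, instructions, and example content are illustrative assumptions rather than recommendations.

```python
# A structured prompt: explicit task, context, output format, and a worked example.
# Assumes the `openai` package (v1+) and an API key; model name and content are illustrative.
from openai import OpenAI

client = OpenAI()

messages = [
    {"role": "system",
     "content": "You are a support assistant. Answer in exactly three bullet points, "
                "plain language, no jargon."},
    {"role": "user",
     "content": "Example question: How do I reset my password?\n"
                "Example answer:\n- Open Settings\n- Choose 'Security'\n- Click 'Reset password'"},
    {"role": "user",
     "content": "Context: the customer uses the mobile app, version 4.2.\n"
                "Task: explain how to enable two-factor authentication."},
]

response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(response.choices[0].message.content)
```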

Capabilities, Limitations, and Critical Challenges of Generative AI
While generative AI has demonstrated remarkable capabilities, the technology exhibits significant limitations that constrain its applicability and reliability for certain tasks and domains. One of the most persistent and consequential limitations is the tendency of generative models to produce outputs containing false or misleading information with apparent confidence. Generative AI models can produce false or misleading content, also known as hallucinations. These hallucinations often sound confident and authoritative, increasing the risk that users will trust them as reliable sources of information. The term “hallucination” captures the unsettling phenomenon where AI systems generate plausible-sounding but entirely fabricated information, citations, quotes, and facts. A notable legal case illustrates the practical dangers: a New York attorney representing a client’s injury claim relied on ChatGPT to conduct his legal research, and the federal judge overseeing the suit noted that the cited opinions contained internal citations and quotes that were nonexistent. This example powerfully demonstrates that sophisticated language and apparent confidence provide no guarantee of accuracy.
The fundamental architecture of generative models renders them inherently susceptible to producing hallucinations because of how they function. Generative AI systems are not fact databases; they statistically model how words tend to appear together based on their training data. The model’s primary objective during generation is predicting the statistically most likely next token given previous tokens and learned patterns, not verifying factual accuracy. Generative AI models function like advanced autocomplete tools: They’re designed to predict the next word or sequence based on observed patterns. Their goal is to generate plausible content, not to verify its truth. That means any accuracy in their outputs is often coincidental. Because the model learned from vast amounts of internet text that contains both accurate and inaccurate information, it can seamlessly reproduce errors, misconceptions, and biases present in training data.
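This autocomplete behavior can be made visible by inspecting the probabilities a model assigns to candidate next tokens. The sketch below does so with the small public gpt2 checkpoint via the transformers library, chosen only because it is freely available; any causal language model would behave analogously.

```python
# Inspecting next-token probabilities: the model ranks plausible continuations,
# it does not look anything up. Assumes transformers and the public "gpt2" checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The capital of France is", return_tensors="pt")
logits = model(**inputs).logits[0, -1]            # scores for the next token only
probs = torch.softmax(logits, dim=-1)

top = torch.topk(probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id)):>10}  p={prob.item():.3f}")
```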
Training data quality and composition directly influence the biases and limitations evident in model outputs. Generative AI models are trained on vast amounts of internet data. This data, while rich in information, contains both accurate and inaccurate content, as well as societal and cultural biases. The internet data upon which these models are trained reflects real-world biases—stereotypes about gender, race, nationality, and other demographic characteristics are prevalent in text and images online. When models learn from this data, they inevitably absorb and can reproduce these biases. Generative AI tools can exhibit bias, and bias can happen at different stages of development. Bias in generative AI is not a new issue, but rather a continuation of problems within machine learning and algorithmic system development. Bias emerges not just from training data but also from architectural choices about how models are trained and evaluated, from decisions about what data to include or exclude, and from the values embedded in evaluation criteria.
Real-world examples demonstrate tangible harms from biased generative AI outputs. A 2023 analysis of more than 5,000 images created with the generative AI tool Stable Diffusion found that it simultaneously amplifies both gender and racial stereotypes. Image generators trained on internet photographs learn and reproduce stereotypes present in those images. When asked to generate images of “a CEO” or “a nurse,” models are more likely to generate images reflecting gender stereotypes because historical data contains more images of male CEOs and female nurses. These biases extend beyond mere stereotypes to have concrete harms. For instance, adding biased generative AI to “virtual sketch artist” software used by police departments could “put already over-targeted populations at an even increased risk of harm ranging from physical injury to unlawful imprisonment.” When biased AI systems are deployed for consequential decisions like criminal investigation, the harms can be severe.
Generative AI systems exhibit inconsistency and unpredictability in their outputs: identical inputs can produce markedly different responses, and this happens more frequently than with many other classes of AI system. When provided the same input, outputs can vary significantly due to the statistical nature of the models. This variation, while sometimes beneficial for creative applications where diversity is desired, poses challenges for applications requiring reliable, reproducible behavior. The inconsistency of generative AI output can pose problems when there is a need to rely on predictable, repeatable behavior; however, by using appropriate prompt engineering (discussed earlier), it is possible to enforce a degree of consistency in responses. Users can reduce inconsistency through careful prompt engineering that specifies output formats and constraints, but some inherent variability remains due to the probabilistic nature of model outputs.
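That run-to-run variability comes from sampling, and it can be demonstrated, and switched off, directly in a generation call. The sketch below again assumes the transformers library and the public gpt2 checkpoint, with sampling parameters chosen purely for illustration.

```python
# Sampling makes outputs vary between runs; greedy decoding makes them repeatable.
# Assumes transformers and the public "gpt2" checkpoint; parameters are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("Our quarterly report shows", return_tensors="pt")

# Two sampled runs with the same input will usually differ.
for run in range(2):
    out = model.generate(**inputs, do_sample=True, temperature=0.9, top_p=0.95,
                         max_new_tokens=15, pad_token_id=tokenizer.eos_token_id)
    print("sampled:", tokenizer.decode(out[0], skip_special_tokens=True))

# Greedy decoding (no sampling) returns the same continuation every time.
out = model.generate(**inputs, do_sample=False, max_new_tokens=15,
                     pad_token_id=tokenizer.eos_token_id)
print("greedy :", tokenizer.decode(out[0], skip_special_tokens=True))
```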
The rapid emergence of new AI-generated content as part of training data for subsequent models creates a dangerous feedback loop where errors and biases can compound over generations. As generative AI output becomes more common, it is likely to form a bigger part of the training data used by generative AI models, and such systems will progressively base more and more of their output on their own prior outputs rather than on new creative content produced by humans. As AI-generated content pollutes training datasets, models trained on this data inherit errors and biases from previous AI generations, potentially amplifying distortions. Where training data is flawed, for example through hallucination or bias, the output can become more restricted, less reliable, and less reflective of the real world. This represents a potential trajectory where successive generations of AI models become increasingly divorced from reality as errors accumulate through multiple generations of training.
Information accuracy challenges extend to models’ ability to handle current events and up-to-date information. The output of generative AI may not reflect the real world, and generative AI models can struggle in fast-paced environments where up-to-date knowledge is critical, because the weighting of training data will invariably be toward older and potentially outdated information. Models trained on historical data cannot know about events occurring after their training data cutoff; for example, GPT-4’s developers explicitly state that the model generally lacks knowledge of events after September 2021. Applications requiring current information—like news analysis, market commentary, or scientific research—are inherently limited when relying solely on static, pre-trained models.
Addressing hallucinations and accuracy challenges requires multiple complementary approaches, including retrieval-augmented generation (RAG), which connects models to external knowledge sources. Retrieval-Augmented Generation (RAG) is the process of optimizing the output of a large language model, so it references an authoritative knowledge base outside of its training data sources before generating a response. RAG systems enhance model accuracy by retrieving relevant information from external sources before generating responses, providing factual grounding for outputs. However, RAG itself can propagate biases if knowledge bases are biased. The very same techniques we might use to reduce hallucination – RAG, fine-tuning, knowledge graphs – are dependent on data, data that can easily be biased to reinforce specific views; one person’s truth is after all another’s hallucination. This underscores that technical solutions to AI limitations cannot be separated from governance, curation, and ongoing human oversight.
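A minimal retrieval-augmented generation loop has two steps: retrieve the most relevant passages, then fold them into the prompt so the model answers from supplied context. The sketch below keeps everything local and tiny, using TF-IDF similarity from scikit-learn for retrieval, a made-up knowledge base, and a hypothetical generate() placeholder standing in for whichever language model is actually called.

```python
# Minimal retrieval-augmented generation (RAG) sketch: retrieve, then ground the prompt.
# Retrieval uses TF-IDF from scikit-learn; `generate` is a hypothetical stand-in for a real LLM call.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

knowledge_base = [
    "The return window for all products is 30 days from delivery.",
    "Premium subscribers receive free shipping on orders over $25.",
    "Support is available by chat from 9am to 6pm Eastern, Monday through Friday.",
]

question = "How long do customers have to return an item?"

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(knowledge_base)
query_vector = vectorizer.transform([question])

scores = cosine_similarity(query_vector, doc_vectors)[0]
best_passage = knowledge_base[scores.argmax()]        # most relevant passage

prompt = (
    "Answer using only the context below. If the context is insufficient, say so.\n"
    f"Context: {best_passage}\n"
    f"Question: {question}"
)

def generate(prompt_text):
    # Hypothetical placeholder for a call to an actual language model.
    return f"[model response grounded in: {prompt_text.splitlines()[1]}]"

print(generate(prompt))
```

Production RAG systems typically swap the TF-IDF step for dense embeddings and a vector database, but the retrieve-then-ground structure is the same.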
Ethical Considerations, Risks, and Responsible AI Development
The rapid deployment of generative AI at scale has raised profound ethical concerns spanning bias, misinformation, copyright infringement, privacy violations, environmental impact, and labor exploitation. These concerns are not peripheral to the technology but central to questions about whether and how generative AI should be deployed in society. The development and training of generative AI relies on human labor in ways that are often invisible to end users but raise serious ethical questions. Generative AI training and improvement depends upon human labor in multiple ways: these systems are built on massive datasets of human-made works scraped from the internet, and training and improving the models requires people (data workers) to annotate data and to review and rate outputs. Thousands of workers globally perform tasks like labeling images, writing captions, and rating AI-generated content to train and improve models. Datasets and outputs can include depictions of violence, self-harm, abuse, and other traumatizing content. Companies employ people as contract or gig workers to describe, rate, and review such materials, and these positions are low-wage, high-pressure, and precarious. These data workers often labor in exploitative conditions for minimal compensation while exposure to harmful content takes a psychological toll.
Copyright and intellectual property concerns have become central to legal and ethical debates surrounding generative AI. Generative AI foundation models use a large amount of data, some of which may be subject to intellectual property (IP) protection. New content produced by generative AI could breach IP laws and this may not always be visible to the user. Training models like DALL-E and GPT required ingesting enormous quantities of copyrighted works scraped from the internet without artist or author permission or compensation. Generative AI models are trained on large datasets crawled from the web, including artworks, performances, books, essays, and other materials created by humans. These creators were not notified that their work was being ingested and were not provided an opportunity to refuse. This raises profound questions about consent, fair compensation, and creative artists’ right to control how their work is used.
Legal challenges to generative AI companies regarding copyright infringement have multiplied as artists, authors, and media companies seek to establish that unauthorized use of their work for model training constitutes infringement. Multiple lawsuits have been initiated against generative AI companies over copyrighted data used for model training, alleging that the defendants infringed copyright by ingesting works for AI models that are subsequently capable of generating outputs that mimic, compete with, or reproduce those works. The question of fair use—whether use of copyrighted material for model training falls under the fair use exemption to copyright law—remains contested in the courts. Courts evaluating fair use in the AI context weigh the traditional four factors, as illustrated by the recent case Thomson Reuters v. Ross Intelligence, which offers pivotal insight into how future courts may approach copyrighted works used in AI training. As courts address these questions, they will establish precedents determining whether artists and authors have rights to control and be compensated for use of their work in AI training.
Privacy concerns pervade generative AI systems in multiple dimensions—from training data collection to model outputs to user interactions with AI systems. For example, generative AI tools trained with data scraped from the internet may memorize personal information about people, as well as relational data about their family and friends. Models can inadvertently memorize and reproduce personal information from training data, including sensitive information never intended for public sharing. When people interact with generative AI platforms, there is also a risk that others will use their data, and AI tools themselves, for malicious purposes: bad actors are already using AI voice cloning to impersonate people and then extort them over the phone, and harvested personal data helps enable spear-phishing and fraud attacks. Furthermore, although some generative AI tools allow users to set their own data retention policies, many platforms collect user prompts and other user data, presumably for use as training data.
The environmental costs of training and operating generative AI systems represent a critical but often overlooked dimension of the technology’s impact. The massive computational resources required to train and operate large models generate substantial energy consumption and carbon emissions. Training a large model like GPT-3 has been estimated to produce 626,000 pounds of carbon dioxide, equivalent to approximately 300 round-trip flights between New York and San Francisco. This represents the carbon footprint of training a single model version; as companies release new and larger models, each training run requires comparable resource expenditure. By 2030, data centers are predicted to emit triple the amount of CO2 annually that they would have without the boom in AI development; the predicted 2.5 billion tonnes of greenhouse gas emissions equates to roughly 40% of the United States’ current annual emissions. The projected AI-related carbon emissions by 2030 would rival the current annual carbon emissions of major nations.
Beyond carbon emissions, data centers require enormous water consumption for cooling, straining local water resources. Google’s data centers used around 5 billion gallons of fresh water for cooling in 2022, a 20% increase from 2021, and water usage from AI is estimated to reach somewhere around 1.7 trillion gallons by 2027. This intensive water use places strain on communities where data centers are located, particularly in water-stressed regions. Furthermore, the rapid growth of AI infrastructure demands exponential increases in rare earth minerals and semiconductors used in computing equipment. Generative AI also harms the environment through e-waste: one study found that e-waste from generative AI will grow at a rapid pace, reaching 16 million tons of cumulative waste by 2030.
Responsible AI frameworks have emerged to address these concerns and guide ethical development and deployment of generative AI systems. Organizations that use AI ethically follow five key principles: fairness, transparency, accountability, privacy, and security. Fairness requires addressing bias in training data and model behavior to ensure AI systems treat different demographic groups equitably. Transparency demands that organizations clearly communicate when and how AI is being used, what data informs decisions, and what limitations AI systems have. Accountability requires establishing clear responsibility for AI system outcomes and harms. Privacy protection involves handling personal data ethically and following applicable data protection regulations. Security encompasses protecting AI systems and training data from unauthorized access and adversarial attacks.
Implementing responsible AI requires institutional mechanisms with enforcement power. A governance mechanism tends to be more valuable than an AI framework. When there’s no group or individual directly responsible for enforcing policies or ensuring they’re being followed, organizations can easily slide into unethical or irresponsible AI behaviors. Organizations establishing responsible AI governance typically designate governance bodies—boards, councils, or appointed individuals—with authority to oversee AI development, implementation, and deployment. These governance mechanisms must have “teeth”—real consequences for violations—or they become mere compliance theater. Furthermore, governance must evolve continuously as technology and societal understanding of AI impacts changes. Moreover, since AI is developing so rapidly, a policy from as little as six months ago could be inadequate.
Regulatory frameworks at governmental and international levels are rapidly developing to establish rules for AI development and deployment. The AI Act is a European regulation on artificial intelligence (AI), the first comprehensive regulation on AI by a major regulator anywhere. The Act assigns applications of AI to three risk categories. First, applications and systems that create an unacceptable risk, such as government-run social scoring of the type used in China, are banned. Second, high-risk applications, such as a CV-scanning tool that ranks job applicants, are subject to specific legal requirements. Third, applications that are not explicitly banned or listed as high-risk are largely left unregulated. The EU AI Act represents the first major regulatory framework attempting to establish enforceable standards for AI development and deployment. Other jurisdictions are developing their own AI regulations, though considerable variation exists in regulatory approach and stringency. In 2024, U.S. federal agencies introduced 59 AI-related regulations, more than double the number in 2023, issued by twice as many agencies as the year before. This proliferation of regulations reflects growing governmental attention to AI governance, though fragmented regulatory approaches create challenges for global organizations attempting to comply with multiple jurisdictions.

Emerging Frontiers: Multimodal AI and Agentic Systems
Generative AI continues evolving beyond text-based language models toward increasingly sophisticated multimodal systems capable of processing and generating multiple types of data simultaneously, and toward agentic systems capable of autonomous planning and action. Multimodal models represent the next frontier in generative AI development, integrating understanding and generation across text, images, audio, video, and potentially other modalities. Multimodal generative models function by amalgamating different neural networks trained on various types of data; a single system may combine, for instance, convolutional neural networks for images, recurrent neural networks for text, and transformers for understanding context and generating coherent outputs across the modalities. These systems learn associations between different modalities during training on datasets containing paired examples—images with captions, videos with transcripts, and so on. Once trained, they can reason across modalities, accepting input in one form and generating output in another.
The technical implementation of multimodal reasoning requires encoding all input types into compatible numerical representations that the model can process jointly. Inputs from heterogeneous sources are first processed so that each type of data—text, images, or audio—is converted into an abstract numerical representation the model can work with. Text is converted into embeddings using transformer-based approaches like GPT or BERT; images are processed through convolutional neural networks that extract visual features; audio is converted into spectrograms or other feature representations. Aligning information across these different data types is essential for modality coherence, and it requires deep contextual knowledge and intricate training methodologies. The model must learn that the word “sunset” relates to images showing sunsets, that spoken descriptions align with corresponding scenes, and that visual and auditory information combine coherently.
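Cross-modal alignment of this kind is visible in CLIP-style models, which embed text and images into a shared space. The sketch below assumes the transformers and Pillow libraries and the public openai/clip-vit-base-patch32 checkpoint; the image filename is an illustrative placeholder.

```python
# Aligning text and image representations in a shared embedding space with a CLIP-style model.
# Assumes transformers, Pillow, and the public "openai/clip-vit-base-patch32" checkpoint;
# the image path is an illustrative placeholder.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("sunset.jpg")                      # placeholder image file
captions = ["a photo of a sunset", "a photo of a cat", "a diagram of a circuit"]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# Higher score means the caption and the image land closer together in the shared space.
probs = outputs.logits_per_image.softmax(dim=-1)
for caption, prob in zip(captions, probs[0]):
    print(f"{caption:30s} {prob.item():.3f}")
```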
Contemporary multimodal AI systems have demonstrated remarkable capabilities including generating videos from text descriptions, analyzing images and generating accompanying text, interpreting documents with mixed text and images, and accepting voice input while generating both spoken and visual responses. OpenAI’s GPT-4o integrates speech recognition, vision, and text generation, allowing users to hold natural spoken conversations with real-time visual analysis of objects or documents. Multimodal systems like GPT-4o enable new interaction modalities where users can interleave voice, images, and text in natural conversation while the AI reasons across all these inputs. Image generation has become increasingly sophisticated, with models like DALL-E 3 and Midjourney producing photorealistic images accurately reflecting complex text descriptions. Text-to-video generation, exemplified by OpenAI’s Sora, represents an emerging capability where models generate coherent, cinematically sound video sequences maintaining character consistency across scenes.
Beyond the current generation of multimodal systems lies the emerging concept of agentic AI—AI systems capable of autonomous planning, action, and adaptation to changing circumstances. Autonomous AI agents are intelligent systems designed to work independently, performing tasks such as campaign planning and client collaboration without constant human oversight. They leverage machine learning to adapt and execute complex strategies, enhancing efficiency and creativity in marketing initiatives. Rather than requiring users to specify each step of a task, agentic systems can understand high-level objectives and autonomously determine appropriate sequences of actions to achieve them. These agents integrate large language models with tools, reasoning capabilities, and memory, enabling them to plan multi-step workflows and adapt when circumstances change.
The emergence of agentic AI reflects foundational improvements in language model capabilities that make autonomous operation more feasible. “The big thing about agents is that they have the ability to plan. They have the ability to reason, to use tools and perform tasks, and they need to do it at speed and scale.” Contemporary large language models can engage in multi-step reasoning, use external tools to gather information or take actions, and maintain memory of previous interactions. Better, faster, smaller models, chain-of-thought (COT) training, increased context windows, and function calling represent four developments enabling agentic capabilities in current-generation models. Smaller models have become increasingly capable, reducing the computational requirements for autonomous agents. Chain-of-thought training teaches models to reason through complex problems step by step rather than jumping to conclusions. Larger context windows allow agents to maintain longer histories of interactions and reasoning chains. Function calling enables models to invoke external tools and APIs to gather information or execute actions.
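The plan-act-observe loop behind such agents can be sketched without committing to any particular framework. In the toy below, a hypothetical llm_decide function stands in for the language model's reasoning step, and the two tools are invented functions; a real agent would prompt an LLM, parse its chosen action, and guard the loop far more carefully.

```python
# Toy agent loop: the model repeatedly decides which tool to call, observes the result,
# and stops when it has enough to answer. `llm_decide` is a hypothetical stand-in
# for a real language-model call; the tools are invented for illustration.

def search_docs(query: str) -> str:
    return "Policy: invoices are due within 30 days."       # toy tool

def calculate(expression: str) -> str:
    return str(eval(expression, {"__builtins__": {}}, {}))  # toy tool; avoid eval in production

TOOLS = {"search_docs": search_docs, "calculate": calculate}

def llm_decide(goal: str, history: list) -> dict:
    # Hypothetical planner: a real agent would prompt an LLM here and parse its reply.
    if not history:
        return {"action": "search_docs", "input": goal}
    return {"action": "finish", "input": history[-1]}

def run_agent(goal: str) -> str:
    history = []
    for _ in range(5):                                   # hard cap to avoid runaway loops
        decision = llm_decide(goal, history)
        if decision["action"] == "finish":
            return decision["input"]
        observation = TOOLS[decision["action"]](decision["input"])
        history.append(observation)                      # feed observation back into planning
    return "stopped: step limit reached"

print(run_agent("When are invoices due?"))
```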
Enterprise adoption of agentic AI is expected to accelerate dramatically as organizations recognize value in AI systems that can autonomously handle complex workflows. However, agentic systems raise distinct governance and accountability challenges compared to simpler generative AI applications. These systems must be rigorously stress-tested in sandbox environments to avoid cascading failures. Designing mechanisms for rollback actions and ensuring audit logs are integral to making these agents viable in high-stakes industries. Autonomous systems operating without direct human oversight create new failure modes where errors can cascade and compound. If an agent makes an incorrect decision that initiates a series of actions, the consequences can rapidly spiral beyond human ability to intervene.
Looking forward, expert consensus suggests agentic AI will likely evolve toward increasingly sophisticated multi-agent orchestration where specialized AI agents collaborate under coordination of higher-level orchestrator models. The “new normal” envisioned by this narrative sees teams of AI agents corralled under orchestrator uber-models that manage the overall project workflow. Enterprises will use AI orchestration to coordinate multiple agents and other machine learning (ML) models working in tandem and using specific expertise to complete tasks. This architecture would allow specialized AI agents optimized for particular functions—research, planning, execution, verification—to collaborate under oversight of orchestration systems that manage their interactions. However, as individual agents become more capable, the trajectory may eventually shift toward single highly-capable agents handling end-to-end workflows. As those individual agents get more capable, you’re going to switch toward saying, ‘I’ve got this agent that can do everything end-to-end.’ The ultimate architecture for autonomous AI systems remains uncertain and will likely evolve as the technology matures and organizations gain experience deploying agents in practice.
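A bare-bones version of that orchestration pattern is simply a router that hands sub-tasks to specialized agents and assembles their results. In the sketch below, plain functions stand in for what would each be an LLM-backed agent in practice, so the example shows only the coordination structure.

```python
# Bare-bones multi-agent orchestration: an orchestrator splits a job into sub-tasks
# and routes each to a specialized agent. The agents here are plain functions standing
# in for what would each be an LLM-backed agent in a real deployment.

def research_agent(task: str) -> str:
    return f"findings for '{task}'"

def writing_agent(task: str, material: str) -> str:
    return f"draft report on '{task}' using {material}"

def review_agent(draft: str) -> str:
    return f"approved: {draft}"

AGENTS = {"research": research_agent, "write": writing_agent, "review": review_agent}

def orchestrate(project: str) -> str:
    findings = AGENTS["research"](project)           # step 1: specialized research agent
    draft = AGENTS["write"](project, findings)       # step 2: writing agent builds on findings
    return AGENTS["review"](draft)                   # step 3: verification agent signs off

print(orchestrate("Q3 churn analysis"))
```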
Gen AI: From Understanding to Innovation
Generative AI represents a fundamental transformation in artificial intelligence, shifting from systems designed to analyze existing data toward systems capable of creating novel content across multiple modalities. From its technical foundations in transformer architectures and self-supervised learning, to its proliferation across industries from healthcare to legal services to creative fields, generative AI has become a technology influencing virtually every sector of human activity. The technology exhibits remarkable capabilities—generating human-like text, photorealistic images, coherent videos, and increasingly sophisticated reasoning—while simultaneously displaying significant limitations including hallucination, bias, and opaque decision-making processes.
The path forward for generative AI development must prioritize responsible innovation that balances enthusiasm for technological advancement with serious attention to governance, equity, and ethical implications. Organizations deploying generative AI should establish robust governance frameworks with enforcement mechanisms ensuring fairness, transparency, accountability, privacy, and security. Regulatory frameworks must evolve to establish minimum standards for AI development and deployment while remaining sufficiently flexible to accommodate rapid technological change. Investment in addressing hallucinations, bias, and other technical limitations should receive priority comparable to efforts to increase model capabilities. Environmental impacts must be measured, disclosed, and mitigated through investments in energy efficiency and renewable energy infrastructure. Labor practices surrounding AI training and development—particularly for data workers—must improve dramatically to meet ethical standards of fair compensation and working conditions.
The emergence of multimodal AI and agentic systems signals that generative AI is still in early development stages despite its already transformative impact. More sophisticated AI systems capable of autonomous action, reasoning across modalities, and adapting to complex real-world environments are rapidly approaching deployment. These developments make governance, accountability, and responsible development practices even more critical. Rather than viewing generative AI governance as an impediment to innovation, organizations and societies should recognize that sustainable long-term deployment of AI technologies depends on building public trust through demonstrable commitment to ethical development and deployment practices.
Ultimately, generative AI is a tool—powerful, increasingly sophisticated, and with consequences that ripple through society in ways both visible and invisible. Like all consequential technologies, its ultimate impact depends not simply on technical capabilities but on decisions about how to develop, deploy, and govern it. As generative AI becomes increasingly central to how humans work, create, learn, and make decisions, ensuring that its development and deployment reflects human values, protects vulnerable populations, and contributes to broad human flourishing becomes an essential challenge. Meeting this challenge requires sustained commitment from technologists, business leaders, policymakers, and society as a whole to keeping human agency, dignity, and wellbeing at the center of generative AI development and deployment.