While conversational AIs like ChatGPT engage us in dialogue, chat-based LLMs only scratch the surface of what generative AI (GAI) can do. Just as ChatGPT generates text, other GAI models generate images, video, audio, and more. Whether it’s artwork, songs, poetry, or slides for your next presentation, the generative AI applications behind all of these use cases rely on many of the same underlying algorithms, architectures, and techniques.
This article covers the basics of generative AI and the approaches commonly used to build these technologies, which are rapidly becoming an important part of the way we live and work.
What makes AI generative?
Many of the most common AI applications are what we would call discriminative. When Google uses trained models to return relevant images for your search, or when your bank stops someone from using your credit card for a fraudulent transaction, it is using discriminative AI.
Discriminative AI: Understanding the Basics
Discriminative AI is a type of artificial intelligence optimized to make decisions based on existing data. It usually operates in the realm of classification and prediction, where its main task is to determine which category or class a particular piece of data belongs to.
Most discriminative AI relies on supervised learning, a technique that trains a model on labeled data. You can think of labeled data as a set of input examples paired with corresponding output labels. In image classification, for instance, the input is an image, and the output label indicates what the image depicts, such as “cat” or “dog.”
During training, a discriminative AI model learns to identify patterns and features by evaluating the input data and correlating it with the output labels. From these, the model builds decision boundaries (or decision functions) that act as rules for classifying new, unseen data into the correct categories. This decision-making process is the key aspect of discriminative AI, and it is what allows such models to perform tasks like image recognition, spam detection, and sentiment analysis reliably and efficiently.
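To make the idea of a learned decision boundary concrete, here is a minimal Python sketch of one classic discriminative model, logistic regression, trained with simple per-example gradient descent. The two-feature “cat” and “dog” data points are made up for illustration; real image classifiers learn from pixels with far larger models, but the principle of fitting a boundary to labeled examples is the same.

```python
import math

# Toy labeled data: (feature_1, feature_2) -> label (1 = "cat", 0 = "dog").
# The features and labels are invented purely for illustration.
training_data = [
    ((2.0, 3.0), 1), ((3.0, 3.5), 1), ((2.5, 4.0), 1),
    ((-2.0, -1.0), 0), ((-3.0, -2.5), 0), ((-1.5, -2.0), 0),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(data, lr=0.1, epochs=200):
    """Learn weights w and bias b so that sigmoid(w·x + b) approximates p(label = 1 | x)."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in data:
            p = sigmoid(w[0] * x1 + w[1] * x2 + b)
            error = p - y  # gradient of the log loss with respect to the logit
            w[0] -= lr * error * x1
            w[1] -= lr * error * x2
            b -= lr * error
    return w, b

def classify(x, w, b):
    """The decision boundary is the line w·x + b = 0; each side gets a different label."""
    return "cat" if w[0] * x[0] + w[1] * x[1] + b > 0 else "dog"

w, b = train_logistic_regression(training_data)
print(classify((2.2, 2.8), w, b))    # unseen point near the "cat" examples -> cat
print(classify((-2.5, -1.5), w, b))  # unseen point near the "dog" examples -> dog
```

The model never learns what a cat “looks like” in general; it only learns where to draw the line between the labeled classes, which is exactly the discriminative goal.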
For example, when Google’s image search analyzes the content of millions of images on the internet, it uses discriminative AI to classify and rank them, and the images people click on for a given query may also serve as feedback for improving those rankings. Similarly, when your bank’s fraud detection system flags a suspicious transaction as likely fraudulent, it is leveraging discriminative AI to make that decision.
Generative AI: A Different Paradigm
Generative AI, in contrast, is about creating content. While discriminative AI focuses on decision-making and categorization, generative AI focuses on producing new data that resembles the data it saw during training.
Generative AI models, such as Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and Transformers, are designed to learn the underlying probability distribution of the data they’ve been exposed to. This knowledge allows them to generate entirely new data samples that mimic the characteristics they identified in the training data.
For example, a generative AI model trained on a dataset of human faces can create photorealistic faces of people who have never existed. Similarly, in natural language processing, generative AI models like ChatGPT can produce natural-sounding, relevant text responses to a given input, even though they’ve never seen that exact text before.
The key difference between discriminative and generative AI lies in their goals. Discriminative AI aims to make accurate classifications and predictions, while generative AI aims to generate new content that reflects the characteristics of its training data.
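A deliberately tiny illustration of “learn the distribution, then sample from it”: the Python sketch below fits an independent Gaussian (a mean and a standard deviation) to each feature of a small, made-up dataset of heights and weights, then draws brand-new samples. This is not how GANs, VAEs, or Transformers work internally; it only shows the two-step shape they share.

```python
import random
import statistics

# Hypothetical training data: (height in cm, weight in kg) for a few people.
training_data = [(170.0, 65.0), (182.0, 80.0), (165.0, 58.0), (175.0, 72.0), (160.0, 54.0)]

def fit_gaussian(data):
    """'Training': estimate the mean and standard deviation of each feature."""
    columns = list(zip(*data))
    return [(statistics.mean(col), statistics.stdev(col)) for col in columns]

def generate(params, n, seed=None):
    """'Generation': draw brand-new samples from the learned distribution."""
    rng = random.Random(seed)
    return [tuple(rng.gauss(mu, sigma) for mu, sigma in params) for _ in range(n)]

params = fit_gaussian(training_data)
for height, weight in generate(params, 3, seed=0):
    print(f"height={height:.1f} cm, weight={weight:.1f} kg")
```

Because each feature is modeled independently, this sketch ignores correlations such as taller people tending to weigh more. Capturing that kind of structure, across thousands or millions of dimensions like the pixels of an image, is what the neural generative models named above are built for.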
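In probability terms, this difference is often written as follows. Here x is an input (an image, a sentence), y is a label, and θ stands for the model’s learned parameters (notation introduced for illustration): a discriminative model estimates p(y | x) and picks the most likely label, while a generative model learns p(x), or the joint p(x, y), and samples new data from it.

```latex
\begin{aligned}
\text{Discriminative:}\quad & \hat{y} = \arg\max_{y}\; p_\theta(y \mid x) \\
\text{Generative:}\quad & x_{\text{new}} \sim p_\theta(x) \quad \text{or} \quad (x_{\text{new}}, y_{\text{new}}) \sim p_\theta(x, y)
\end{aligned}
```

A generative model of p(x, y) can also classify, via Bayes’ rule p(y | x) = p(x | y) p(y) / p(x), but its distinguishing ability is producing new x.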
Now that we’ve explored the fundamentals of discriminative and generative AI and how they differ, let’s dig deeper into the enabling technologies that make generative AI possible.
Key Technologies in Generative AI
Generative AI relies on several key technologies and concepts to enable content creation across many domains. Understanding these foundational pieces is essential if you want a solid grasp of the capabilities and applications of generative models.
Large Language Models (LLMs)
Large Language Models, such as OpenAI’s GPT-3 and GPT-4, are prime examples of generative AI. These models represent major advances in natural language understanding and generation, achieved by learning from massive text corpora drawn largely from publicly available sources. LLMs have been foundational in enabling generative AI to understand and generate human-like language, driving new applications such as chatbots, automated content creation tools, and language translation services.
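LLMs generate text autoregressively: they predict a probability distribution over the next token, sample one, append it, and repeat. The Python sketch below shows that loop with a toy bigram model (word-pair counts from a tiny, made-up corpus) standing in for the neural network; real LLMs use transformers over subword tokens and condition on far more than the previous word.

```python
import random
from collections import Counter, defaultdict

# A tiny hypothetical corpus; real LLMs train on vastly larger text collections.
corpus = "the cat sat on the mat . the dog sat on the rug . the cat saw the dog ."

def train_bigram_model(text):
    """Count how often each word follows each other word."""
    words = text.split()
    counts = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        counts[current][nxt] += 1
    return counts

def generate(model, start, max_tokens=10, seed=None):
    """Autoregressive generation: repeatedly sample the next word given the last one."""
    rng = random.Random(seed)
    output = [start]
    for _ in range(max_tokens):
        followers = model.get(output[-1])
        if not followers:
            break  # no known continuation for this word
        words, weights = zip(*followers.items())
        output.append(rng.choices(words, weights=weights, k=1)[0])
        if output[-1] == ".":
            break  # stop at the end of a sentence
    return " ".join(output)

model = train_bigram_model(corpus)
print(generate(model, "the", seed=42))
```

Every word pair in the output appeared somewhere in the corpus, yet the full sentence may be one the corpus never contained, which is the same sense in which an LLM produces text it has never seen.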
Transformers
Transformers are one of the central technologies that power generative AI. This architecture is used nearly everywhere in modern natural language processing and generative AI applications. Its attention mechanism lets every part of an input weigh its relevance to every other part, so transformers can capture context, identify key relationships between distant words, and represent the essential meaning of a piece of text. They are essential components that enable tasks like language translation, summarization, and text generation.
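The core operation behind the attention mechanism is scaled dot-product attention, introduced with the original Transformer architecture: softmax(Q·Kᵀ / √d) · V, where each query vector is compared with every key vector and the resulting weights mix the value vectors. The pure-Python sketch below computes it for three made-up 2-dimensional token vectors; real models first project the inputs with learned weight matrices, use many attention heads, and run on optimized tensor libraries.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """For each query, mix the value vectors weighted by softmax(q·k / sqrt(d))."""
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Three hypothetical token vectors used directly as queries, keys, and values (self-attention).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = scaled_dot_product_attention(tokens, tokens, tokens)
for row in weights:
    print([round(w, 2) for w in row])
```

Each printed row shows how strongly one token attends to each of the three tokens; the rows sum to 1, and each output vector is the corresponding weighted blend of the inputs.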
GANs (Generative Adversarial Networks)
Generative Adversarial Networks, or GANs, are one of the most widely used frameworks in generative modeling. A GAN consists of two neural networks, the generator and the discriminator, locked in an ongoing adversarial training process: the discriminator learns to tell real training examples from generated ones, while the generator tries to outfox it by producing content good enough to pass as authentic. GANs powered many of the first systems able to generate lifelike images and video, making them one of the most important techniques in creative and artistic applications.
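In the original GAN formulation, the discriminator D and generator G play a minimax game over the objective E[log D(x)] + E[log(1 − D(G(z)))], where x is a real sample and G(z) is generated from random noise z. In practice the generator is often trained to maximize log D(G(z)) instead, which gives stronger gradients early in training. The Python sketch below computes just these two losses from hypothetical discriminator outputs; a real GAN would backpropagate them through two neural networks, alternating between updating D and G.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: the discriminator wants D(real) -> 1 and D(fake) -> 0."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return -(real_term + fake_term)

def generator_loss(d_fake):
    """Non-saturating generator loss: the generator wants D(fake) -> 1."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)

# Hypothetical discriminator outputs: the probability D assigns to "this sample is real".
print(discriminator_loss([0.5, 0.5], [0.5, 0.5]))  # ≈ 1.386 (2·ln 2): D cannot tell real from fake
print(generator_loss([0.5, 0.5]))                  # ≈ 0.693 (ln 2)
print(generator_loss([0.9, 0.95]))                 # smaller: the fakes are fooling D
```

The two losses pull in opposite directions on the same quantity, D(G(z)), which is what makes the training adversarial.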
Embedding Models
Embedding models play a fundamental role in generative AI by representing data as vectors in a lower-dimensional space, where similar items end up close together. This facilitates similarity comparisons and content understanding. Embeddings are also a key component of patterns such as Retrieval Augmented Generation (RAG), which lets developers augment LLMs with relevant context retrieved from other sources so that they produce better, more relevant results. Used this way, embeddings can often improve the quality and coherence of generated content.
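The retrieval step of RAG can be sketched as: embed the query, score each document by cosine similarity, and prepend the best match to the prompt. In the Python sketch below, the `embed` function is a hypothetical stand-in that builds a bag-of-words vector; a real system would call a learned embedding model (and usually a vector database) and would send the assembled `prompt` to an LLM.

```python
import math
from collections import Counter

# Hypothetical knowledge base that a RAG system could draw context from.
documents = [
    "GANs pair a generator network with a discriminator network.",
    "VAEs encode data into a latent space and decode samples from it.",
    "Transformers use attention to relate every token to every other token.",
]

def embed(text):
    """Stand-in for a real embedding model: a sparse bag-of-words vector.
    Learned embedding models return dense vectors that capture meaning, not just shared words."""
    return Counter(word.strip(".,?!").lower() for word in text.split())

def cosine_similarity(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, docs, k=1):
    """Rank documents by similarity to the query and keep the top k."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine_similarity(q, embed(d)), reverse=True)
    return ranked[:k]

query = "How does a discriminator work in GANs?"
context = retrieve(query, documents)
# The retrieved text is prepended to the prompt that would be sent to the LLM.
prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: " + query
print(prompt)
```

Swapping the bag-of-words `embed` for a learned model is what lets retrieval match on meaning, so that a query about “image forgers” could still find the GAN document even without shared words.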
Variational Autoencoders (VAEs)
Variational Autoencoders (VAEs) are a type of generative model that can create new content such as images, video, or text. A VAE learns from many examples of existing data using two parts: an encoder compresses each example into a simplified representation, called a latent space, that keeps only its most essential elements, and a decoder learns to reconstruct the original example from that representation. Once trained, the VAE generates new examples by sampling points from the latent space (not from the training data itself) and decoding them. The result is entirely new content that shares the essential characteristics the VAE identified during training.
VAEs are relatively easy to train compared with GANs and can generate data in a variety of modalities. Because the latent space is continuous, they also make it possible to control how different new samples are from the examples they were trained on. VAEs can be used in many ways, such as creating new art, generating synthetic data for scientists to study, or even translating languages.
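Two details make VAEs work: the reparameterization trick, which samples a latent vector as z = μ + σ·ε with ε drawn from a standard normal so training can backpropagate through μ and σ, and a KL-divergence term in the loss that keeps the latent space close to a standard normal prior. The Python sketch below shows both, plus generation by sampling z from the prior. The encoder outputs and the linear `decode` function are hypothetical stand-ins for trained neural networks, and the `temperature` scale factor is one simple way (an assumption here, not the only one) to control how far samples stray from typical outputs.

```python
import math
import random

def reparameterize(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, 1).
    The randomness lives in eps, so gradients can flow through mu and sigma during training."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def kl_divergence(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)) summed over latent dimensions.
    Added to the reconstruction loss, it keeps the latent space close to a standard normal prior."""
    return -0.5 * sum(1 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))

def decode(z):
    """Hypothetical stand-in for a trained decoder network: a fixed linear map from 2 to 3 values."""
    weights = [[0.5, -0.2], [0.1, 0.9], [-0.3, 0.4]]
    return [sum(w * zi for w, zi in zip(row, z)) for row in weights]

rng = random.Random(0)

# Training-time view: hypothetical encoder outputs (mean and log-variance) for one example.
mu, log_var = [0.3, -1.2], [-0.5, 0.1]
z_train = reparameterize(mu, log_var, rng)
print("KL term:", round(kl_divergence(mu, log_var), 3))

# Generation: sample z from the prior N(0, I) and decode it. A temperature below 1 tends to keep
# samples near the center of the latent space (more typical outputs); above 1, more unusual ones.
temperature = 0.5
z = [temperature * rng.gauss(0.0, 1.0) for _ in range(2)]
print("Generated:", decode(z))
```

Note that generation never touches the training data: it only needs the prior and the decoder.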
Challenges and Limitations
Perhaps the most pressing challenge is the ethical use of generative AI. The ability to generate convincingly real deepfake videos and manipulate textual content raises a host of ethical dilemmas. From the spread of misinformation to invasions of privacy and other potential harms, these concerns demand careful consideration. Balancing the creative potential of generative AI against the need for responsible and ethical use is a paramount challenge for the field.
Another significant hurdle that generative AI faces is the issue of bias and fairness. These AI models, when trained on vast datasets, can inadvertently absorb biases present in the training data. The result is content generation that may reflect and perpetuate biases related to gender, race, and political ideology.
While major improvements have been made in generating realistic-looking content, the quality and coherence of generative AI output is still sometimes a bit off. Text or images produced by these models can lack context or exhibit unexpected features, like people with extra fingers. Achieving consistent quality and coherent details remains a technical challenge that the AI community is actively working on.
Lastly, the resource intensiveness of generative AI poses concerns. Training and fine-tuning AI models requires significant computational resources, including power-hungry GPUs and extensive datasets. This requirement can be a barrier for individual developers or smaller organizations eager to explore generative AI, potentially limiting its democratization, and it raises environmental concerns because these technologies consume large amounts of electricity.
Generative AI is ushering in a new era of computer-generated and computer-assisted creativity. From generating art, written content, and music to assisting with data augmentation, the applications of this technology are staggering. As generative AI continues to achieve new things, it’s essential to recognize and address the challenges related to ethics, bias, and quality.
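A back-of-the-envelope calculation shows why. Memory for model state is roughly the parameter count times the bytes stored per parameter. The Python sketch below uses a hypothetical 7-billion-parameter model; the 16-bytes-per-parameter training figure is a commonly cited rule of thumb for mixed-precision training with the Adam optimizer, and both numbers ignore activations and framework overhead, so real requirements are higher.

```python
def estimate_memory_gb(num_params, bytes_per_param):
    """Rough memory needed for model state alone (ignores activations and framework overhead)."""
    return num_params * bytes_per_param / 1e9

params = 7e9  # a hypothetical 7-billion-parameter model

# Inference: 16-bit (2-byte) weights only.
print(f"Inference weights: ~{estimate_memory_gb(params, 2):.0f} GB")  # ~14 GB
# Mixed-precision training with Adam, rule of thumb ~16 bytes per parameter: 16-bit weights and
# gradients (2 + 2) plus 32-bit master weights and two Adam moment estimates (4 + 4 + 4).
print(f"Training state:    ~{estimate_memory_gb(params, 16):.0f} GB")  # ~112 GB
```

By this estimate, the training state alone exceeds the 80 GB of memory on a single high-end data-center GPU, which is why training runs are typically spread across many GPUs.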
Getting started with generative AI requires curiosity, dedication, and a willingness to learn. By getting familiar with the foundational concepts, exploring open-source tools, and participating in your local AI community, you can start to leverage this technology to change the way you work and to build new applications for your customers.
In a world where your creative potential is limited only by your imagination, generative AI offers a new frontier of human-AI collaboration. As you embark on your generative AI endeavors, remember that the future of creativity lies at your fingertips, waiting to be unlocked.