If you’re spending much time reading about generative AI (GAI), you’ve likely come across the term AI agent. The term has been around for several decades, but it’s reached almost buzzword status since the public launch of ChatGPT and the resulting surge in interest around GAI. You’ll often hear it used interchangeably with chatbot, but the idea of an AI agent is much broader and applies to many more types of AI applications than just chat.
The computer science definition of an AI agent generally describes a computer program with characteristics such as the ability to perceive its environment, make decisions, and take specific actions. The word agent in this context is meant to convey that the program has agency. In a sense, it has a mind of its own and can make its own decisions based on the information it gathers as it perceives its environment.
While this definition is accurate, it’s a bit broader than what you’re likely to encounter in generative AI conversations today. Technically, things like recommendation engines, a Roomba vacuum, and the turn-by-turn directions you receive from Google Maps can all be accurately described as AI agents. However, when looking through the lens of generative AI in 2023, the term generally describes a wrapper on top of an AI model capable of creating new things such as text, images, and video. In essence, agent often describes the application that gives an end user a friendly way of interacting with an underlying model to generate whatever it is that particular model generates.

For example, ChatGPT is often referred to as an agent. It provides a wrapper on top of an AI model, in this case a large language model (GPT-3.5 or GPT-4 as of this writing), and the agent accepts text input from users and produces a text response. Likewise, tools like the DALL-E web interface can be described as an agent wrapping the DALL-E transformer model. Users describe the image they would like to generate, and the DALL-E agent generates an image using the underlying model.
Why are AI agents important?
AI agents are important because they serve as the interface between complex machine learning models and end users. They can simplify many user tasks and provide a more personalized experience. They can also enable people to do things they couldn’t otherwise do on their own, like allowing someone who isn’t all that artistic to create AI-generated artwork, or someone who struggles with writing to craft effective written communications.
For businesses, AI agents can improve efficiency by automating customer service, performing data analysis, and handling mundane tasks so employees can focus on higher-value activities.
Agent Architectures and Frameworks
When you hear the term agent architecture in non-academic conversations today, it typically refers to the application architecture used to build AI agent applications. Frameworks like LangChain, LlamaIndex, and Superagent are often described as agent frameworks, and they are used to build AI agent architectures. You can also think of these as application frameworks that happen to specialize in applications that interface with generative AI models.
This area is changing rapidly, and there is no universally agreed-upon bill of materials for an agent architecture. However, there are common patterns and components that frequently appear in these agent frameworks.

A typical agent architecture will provide a way of accepting user input. This could be via a web UI, voice interface, API, or command line interface, among others. The agent architecture typically has a mechanism to identify and retrieve related context that would be helpful to the large language model (LLM) when generating the output content.
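Putting those pieces together, the basic request flow might look something like the following minimal Python sketch. The retrieve_context and generate_response functions here are hypothetical placeholders standing in for the retrieval and model-call steps discussed in the rest of this section:

def retrieve_context(user_input):
    # Hypothetical placeholder: look up related documents or examples
    # that would help the model (the retrieval step is described below).
    return ""

def generate_response(prompt):
    # Hypothetical placeholder: call the underlying LLM, for example via
    # the OpenAI API or a framework like LangChain.
    return "..."

# A command line interface is one of the simplest ways to accept user input.
user_input = input("You: ")
context = retrieve_context(user_input)
response = generate_response(context + "\n" + user_input)
print("Agent: " + response)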
To further help the AI model respond in the way the application intends, agent architectures often provide prompt templating. A prompt is the input to the AI model. If the model is a large language model like GPT-4, a prompt might be the text “write a country song about how my truck broke down and now I can’t go fishing”. However, when building agent applications, we often need to provide much more context and direction to the AI model.
For example, let’s say we’re building an agent to identify keywords in a block of input text. Rather than just asking the LLM, we can give it examples to help it better understand what we’re asking it to do. In this case, the prompt template could look something like this one, which is taken from the OpenAI documentation:
Extract keywords from the corresponding texts below.
Text 1: Stripe provides APIs that web developers can use to integrate payment processing into their websites and mobile applications.
Keywords 1: Stripe, payment processing, APIs, web developers, websites, mobile applications
##
Text 2: OpenAI has trained cutting-edge language models that are very good at understanding and generating text. Our API provides access to these models and can be used to solve virtually any task that involves processing language.
Keywords 2: OpenAI, language models, text processing, API.
##
Text 3: {text}
Keywords 3:
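In application code, a prompt template like this is just a parameterized string. As one illustration (and only one of many ways to do this), LangChain’s PromptTemplate can, as of this writing, declare the {text} placeholder and fill it at request time; the sample input text below is hypothetical:

from langchain.prompts import PromptTemplate

# The same keyword-extraction template shown above, with {text} as a placeholder.
template = """Extract keywords from the corresponding texts below.
Text 1: Stripe provides APIs that web developers can use to integrate payment processing into their websites and mobile applications.
Keywords 1: Stripe, payment processing, APIs, web developers, websites, mobile applications
##
Text 2: OpenAI has trained cutting-edge language models that are very good at understanding and generating text. Our API provides access to these models and can be used to solve virtually any task that involves processing language.
Keywords 2: OpenAI, language models, text processing, API.
##
Text 3: {text}
Keywords 3:"""

prompt = PromptTemplate(input_variables=["text"], template=template)

# Fill the placeholder with the user's input text before calling the model.
filled = prompt.format(text="LangChain helps developers build applications on top of LLMs.")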
However, instead of giving generic examples, we may have a set of known text blocks and keywords that we can retrieve to improve the results we get from the AI model. In that case, our prompt template may look something like this:
Extract keywords from the corresponding texts below.
{examples}
##
Text 3: {text}
Keywords 3:
Our agent framework would then leverage a combination of code and/or configuration to perform a retrieval step. That retrieval step might involve an API call to an external data service, a SQL query against a database, a text search using a search platform like OpenSearch, or an approximate nearest-neighbor search using a vector database like Astra DB from DataStax or Chroma.
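As a concrete illustration, here’s a minimal sketch of that retrieval step using Chroma, one of the vector databases mentioned above. The collection name and input text are hypothetical, and the collection is assumed to have been previously loaded with known text/keyword examples:

import chromadb

# The same template shown above, with a placeholder for retrieved examples.
template = """Extract keywords from the corresponding texts below.
{examples}
##
Text 3: {text}
Keywords 3:"""

user_text = "Astra DB is a serverless database built on Apache Cassandra."  # hypothetical input

client = chromadb.Client()  # in-memory client; a real application would use a persistent store
collection = client.get_or_create_collection(name="keyword_examples")  # hypothetical collection
# Assumes the collection was previously populated with known text/keyword pairs.

# Retrieve the stored examples most similar to the user's input text.
results = collection.query(query_texts=[user_text], n_results=2)

# Join the retrieved examples and fill in the prompt template.
examples = "\n##\n".join(results["documents"][0])
prompt = template.format(examples=examples, text=user_text)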
Conclusion
AI agents are a great tool for building applications powered by generative AI. With the emergence of frameworks like LangChain and LlamaIndex, it’s becoming easier to combine fledgling patterns like prompt templating and retrieval with other techniques to create experiences that provide accurate results and make GAI capabilities accessible to a broader set of users.