AI agents, the next step in the evolution of GenAI

September 10, 2024

3 minute read

Written by Philip D'Souza, Senior Machine Learning Engineer

In a previous article, I discussed three approaches to unlocking more value out of generative AI (GenAI). Here I want to go further and discuss the emergence of AI agents (or LLM agents). This is a shift towards compound AI systems that can work independently to tackle more complex tasks and problems. Experts predict that AI agents "will likely be the best way to maximize AI results in the future, and might be one of the most impactful trends in AI in 2024".

When you use a popular LLM tool such as ChatGPT, you can give it a question or prompt such as "Tell me about X" or "Write me a draft email about Y". As long as the required knowledge is in its training data, it will provide an answer or solution.

To tackle more complex or involved tasks, GenAI needs to behave more like you, as a human, would. For example, you would first stop to consider the task and think of a plan to address it, perhaps with a multi-step approach. This plan might include consulting some external resources - such as searching the web or looking at a specialist online database. You would assess your progress at each stage, check whether you were going in the right direction, and adjust your strategy as needed. If you got stuck, you would rewind and do things a little differently until you had come up with a satisfactory solution or answer.

What are the key resources an AI agent needs?

AI agents operate similarly, often as part of agentic workflow implementations where they function in the background, automating tasks and enhancing productivity. But to do this, they need to combine the following attributes or resources:

Reasoning: In an AI agent, the model becomes the "brain" that is in charge of the control logic that decides how to perform a task or solve a problem. It will reason and break down the solution into several steps. Popular techniques that an AI agent can use to break down tasks include Chain of Thought (CoT) and Tree of Thoughts (ToT) reasoning. These can be categorized as single-path reasoning and multi-path reasoning, respectively.
Acting: The model will have access to external tools and resources, allowing it to act. For example, it could be given the ability to search the web or specific public databases, or be given access to a calculator. It might have access to another LLM (for example, a specialist model that translates between languages). The LLM itself can define when and how to use these tools to solve a problem.
Feedback: The agent needs a way to check its progress and adjust course. A popular method for this feedback mechanism is ReAct, which combines reasoning and acting to enable an LLM to solve complex tasks by "interleaving between a series of steps (repeated N times): Thought, Action, and Observation". Using the ReAct framework, the LLM receives feedback from the environment in the form of observations, which allows it to refine and adjust its actions, enhancing completion rates. Other types of feedback include human-in-the-loop and model feedback. Frameworks like AutoGen and CrewAI build automated feedback loops into agent interactions.
Memory: The AI agent needs to be able to access memory. This allows the AI agent to maintain context over longer interactions, which is crucial for tasks requiring continuity and learning from past interactions. For example, the model can work out a plan for tackling a problem at the start - and it needs access to memory to recall this plan while it goes through the different steps.
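
To make these four pieces concrete, here is a minimal sketch of a ReAct-style loop in Python. It is an illustration under loud assumptions, not how any particular framework implements it: `scripted_model` is a hypothetical stand-in that returns hard-coded steps where a real LLM call would go, and `calculator`, `Memory` and `run_agent` are names invented for this example.

```python
import operator
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Acting: a toy tool the agent can call (hypothetical, for illustration only).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def calculator(expression: str) -> str:
    """Evaluate a simple 'a op b' expression such as '3 * 20'."""
    left, symbol, right = expression.split()
    return str(OPS[symbol](float(left), float(right)))

# Memory: the record of (thought, action, observation) steps taken so far.
@dataclass
class Memory:
    steps: List[Tuple[str, str, str]] = field(default_factory=list)

# Reasoning: a scripted stand-in for the LLM. A real agent would prompt a model
# with the question plus the memory and parse its reply instead.
def scripted_model(question: str, memory: Memory) -> Tuple[str, str, str]:
    if not memory.steps:
        return ("First work out the ticket cost.", "calculator", "3 * 20")
    if len(memory.steps) == 1:
        subtotal = memory.steps[-1][2]  # observation from the previous step
        return ("Now add the booking fee.", "calculator", f"{subtotal} + 5")
    return ("I have the total.", "finish", memory.steps[-1][2])

def run_agent(question: str,
              model: Callable[[str, Memory], Tuple[str, str, str]],
              tools: Dict[str, Callable[[str], str]],
              max_steps: int = 5) -> Tuple[str, Memory]:
    """Interleave Thought -> Action -> Observation until the model finishes."""
    memory = Memory()
    for _ in range(max_steps):
        thought, action, action_input = model(question, memory)
        if action == "finish":
            return action_input, memory
        # Feedback: the tool result becomes the observation for the next step.
        observation = tools[action](action_input)
        memory.steps.append((thought, action, observation))
    return "no answer within step budget", memory

answer, memory = run_agent(
    "What do 3 tickets at 20 each cost, plus a booking fee of 5?",
    scripted_model,
    {"calculator": calculator},
)
print(answer)
for thought, action, observation in memory.steps:
    print(f"Thought: {thought} | Action: {action} | Observation: {observation}")
```

The `max_steps` budget is a design choice worth noting: because the model decides when to stop, a loop like this usually needs a hard limit so that a confused agent cannot run forever.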

How would an AI agent solve a problem?

This diagram illustrates how an AI Travel Agent might plan a weekend trip. The agent uses several tools:

Calendar API to check the user's available dates.
Weather Forecast API to get weather information for potential destinations.
Travel Booking API to search for flights and hotels.

The AI agent processes information from these tools to create an itinerary. It then presents the trip plan to the user and can refine it based on feedback. This example demonstrates how AI agents can integrate multiple data sources and tools to solve complex tasks efficiently.
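
The flow described above can be sketched in a few lines of Python. Everything here is assumed for illustration: the three functions are hypothetical stubs standing in for the Calendar, Weather Forecast and Travel Booking APIs, the dates, forecasts and prices are placeholder values, and the selection logic is fixed code where a real agent would let the LLM decide which tool to call next.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical stubs for the three tools. Real versions would call external APIs.
def calendar_free_weekends() -> List[str]:
    """Calendar API stub: weekends on which the user is available."""
    return ["2024-10-05", "2024-10-12"]

def weather_forecast(destination: str, date: str) -> str:
    """Weather Forecast API stub: placeholder forecasts, not real data."""
    forecasts = {
        ("Lisbon", "2024-10-05"): "rain",
        ("Lisbon", "2024-10-12"): "sunny",
        ("Oslo", "2024-10-05"): "cloudy",
        ("Oslo", "2024-10-12"): "rain",
    }
    return forecasts.get((destination, date), "unknown")

def search_trip(destination: str, date: str) -> Dict[str, object]:
    """Travel Booking API stub: placeholder flight-plus-hotel prices."""
    prices = {"Lisbon": 320, "Oslo": 410}
    return {"destination": destination, "date": date, "price": prices[destination]}

def plan_weekend(destinations: Sequence[str],
                 acceptable_weather: Sequence[str] = ("sunny",),
                 budget: Optional[int] = None) -> Optional[Dict[str, object]]:
    """Combine the three tools and return the first trip that fits, or None."""
    for date in calendar_free_weekends():
        for destination in destinations:
            if weather_forecast(destination, date) not in acceptable_weather:
                continue
            offer = search_trip(destination, date)
            if budget is None or offer["price"] <= budget:
                return offer
    return None

# First proposal: sunny weather only.
first_plan = plan_weekend(["Lisbon", "Oslo"])
print(first_plan)

# Refinement after user feedback: cloudy weather is fine too.
refined_plan = plan_weekend(["Lisbon", "Oslo"], acceptable_weather=("sunny", "cloudy"))
print(refined_plan)
```

The second call shows the refinement step: the user's feedback changes the constraints, and the agent re-runs the same tools to produce a different itinerary.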

Travel agent booking API process

Med-PaLM 2, developed by Google, exemplifies AI agent potential in healthcare. This advanced system analyzes patient data alongside comprehensive medical databases.

It generates diagnostic suggestions and treatment recommendations intended to improve diagnostic accuracy. By processing vast amounts of medical information, Med-PaLM 2 assists healthcare professionals in making more informed decisions, potentially improving patient outcomes.

Med-PaLM 2 AI agent flowchart

AI agent-powered financial analysis tools, like those used by major institutions, showcase AI's impact on finance. These systems process enormous volumes of financial data, market trends, and historical records. They generate portfolio optimization strategies and risk assessments to support investment decision-making. By analyzing complex financial patterns, AI agents assist financial professionals in making more informed choices, potentially enhancing investment performance and risk management.

AI financial analysis system flowchart

In the corporate realm, there is already plenty of enthusiasm surrounding agentic AI like this. A Capgemini survey of 1,100 executives at large enterprises suggests 10% already use AI agents, while more than half plan to use them in the next year. 82% plan to integrate them within the next three years.

AI agents as collaborative partners

AI agents represent the cutting edge of GenAI, pushing beyond simple query-response models into realms where they function as collaborative entities. By integrating reasoning, acting, feedback, and memory, these agents are not just tools but are evolving into collaborative partners in various sectors with the ability to interact with each other to solve problems, as seen in advancements like LangGraph and LlamaIndex.

The stage is set for a future where they work alongside humans, enhancing our capabilities, as demonstrated by projects like SWE-agent, which tries to help software developers automatically fix development issues using GPT-4, or another LLM of the user's choice.

Beyond healthcare applications like Med-PaLM 2 and financial analysis, the emergence of specialized AI agents such as the AI-Scientist marks a new frontier in scientific research. These agents are designed to conduct research autonomously, from hypothesis formulation to data analysis, potentially leading to breakthroughs by reducing human error and accelerating discovery cycles.

As we move forward, the integration of AI agents into more aspects of life and work promises not only increased efficiencies but also the potential for groundbreaking innovations across industries.

Taking things even further, California-based start-up Altera shares a vision of a future in which AI agents will one day become an integral part of human civilization, collaborating with each other and the rest of us. The company is running Project Sid, a set of simulations on a Minecraft server populated entirely by 1,000+ autonomous AI agents. The agents are given free rein to build Minecraft worlds together, with virtual societies complete with their own governmental institutions, economy, culture and religion.

This blog was originally published on the IBM Community.
