Maturity of Agentic AI – How do we calibrate the capabilities?
- Neena Sathi
- Mar 27

Introduction
Generative AI is re-inventing itself every month, and yet most ideas have roots that persist over time. Ever since AI was introduced, we envisioned three capabilities that traditional programming could not achieve:
It uses knowledge and provides reasoning capabilities. Unlike traditional coding, where requirements and design are inputs to the coding task, AI can design its own algorithms.
It learns from experience and can improve its performance over time. By collecting feedback, AI can crowd-source related ideas and decide how to use that feedback to drive improvements.
It can work with incomplete information and exposes its working process as it deals with incomplete and inconsistent information. You can use its level of confidence and its explanations to build trust in its behavior.
While AI offers the same overall vision, each advancement reduces the maintenance effort and enhances AI’s ability to self-manage and self-learn.
Agentic AI represents a significant leap forward, promising more sophisticated and autonomous systems capable of learning, adapting, and evolving. Its impact is being felt across various sectors, from healthcare to finance, and it is reshaping how businesses operate, making them more efficient and responsive. The concept of multi-agentic frameworks was first introduced in the 1980s in the Distributed AI literature. AI researchers began to understand the limitations of a super-intelligent single agent, and the field of distributed AI started to uncover the potential of organizational intelligence.
In this article, we will explore agentic model components and introduce you to a maturity model for calibrating their AI capabilities.
What are Agentic Models, and How Do They Differ from LLMs?
Star Wars introduces us to a variety of robots. The two most famous are C-3PO and R2-D2. C-3PO knows many languages and can converse with you but has limited strategy or orchestration capability. R2-D2, on the other hand, works independently and can execute a plan without much direction. Using ChatGPT, our conversation with an LLM is like a conversation with C-3PO. It works at your command, answers any question you may have, and can often hallucinate if you do not provide proper direction or knowledge. A typical assistant has a chat memory to retain session context and an LLM to provide a body of knowledge, and it works with your prompts to produce a response using both.
An agentic solution is like R2-D2. Agentic models are advanced AI systems designed to function autonomously, performing tasks without constant human intervention. Unlike traditional Large Language Models (LLMs), which primarily process and generate language based on vast datasets, Agentic models have a higher degree of independence and decision-making ability. They can perceive their environment, make choices, and learn from their experiences, continually improving their performance over time. While LLMs excel in generating human-like text responses, Agentic models extend their capabilities beyond text generation, incorporating complex decision-making and adaptive behaviors.
Let’s take two examples of agents to illustrate how agents, like R2-D2, independently perform actions with minimal intervention.
OpenAI Operator: Sam Altman and his R&D team demonstrated how they use OpenAI Operator to make restaurant reservations on your behalf. You provide a goal and your preferences. The Operator will work with OpenTable and other websites to perform actions on your behalf.
Video: https://www.youtube.com/live/CSE77wAdDLg
Manus AI: In this video Yichao ‘Peak’ Ji, co-founder of Manus demonstrates Manus AI performing a series of complex tasks, such as resume analysis, stock analysis and website development.
Maturity Model of Agentic Models
These models, like Star Wars robots, come in different shapes and sizes. The Manus AI demonstration refers to the GAIA benchmark[1] and its performance against that benchmark. The GAIA benchmark was created to measure the performance of LLMs on reasoning tasks. While it provides a reasonable measure of general AI for a public LLM, agentic architectures are expected to address business problem solving in a collaborative setting, where the focus is not on providing a good answer but on representing the best interests of the user organizations.
In the 1980s, we did a fair amount of work developing distributed AI, and much of the effort went into understanding the diversity of goals in a multi-agentic world. In the 1986 Workshop on Distributed AI, we proposed measuring multi-agentic performance along a series of dimensions that captured the multi-dimensional nature of organizational problem solving.[2]

Let’s leverage the classic distributed AI research work to create a maturity model that addresses organizational complexity.
Level 1: Single Agent with simple flows
These agents respond to stimuli but do not retain information or learn from their experiences. They are simple and limited in scope. For example, we can create an agent that uses a series of steps to perform technical and fundamental research on stocks and generate relative ratings.
A typical architecture for a reactive agent includes a prompt template, chat memory, an LLM, chat input, and chat output. Typically, a prompt engineer provides a series of steps and asks the reactive agent to execute them and return the results to the user. These agents have no tools and no ability to switch workflows based on input. The technical and fundamental analysis is very limited and relies entirely on the directions provided in the prompt.
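A minimal Python sketch of this level 1 architecture may help. The `fake_llm` function is a placeholder for a real model call (any LLM API would do); the prompt template, stock-research steps, and class names are illustrative, not from any specific product.

```python
# Level 1 reactive agent sketch: a prompt template, chat memory, and one
# model call. No tools, no workflow switching, no learning.

PROMPT_TEMPLATE = (
    "You are a stock research assistant. Follow these steps:\n"
    "1. Summarize recent price action.\n"
    "2. Summarize fundamentals.\n"
    "3. Output a relative rating.\n\n"
    "Conversation so far:\n{history}\n\nUser: {question}"
)

def fake_llm(prompt: str) -> str:
    # Placeholder for a real LLM call; returns a canned response.
    return "Rating: HOLD (based on the steps in the prompt)"

class ReactiveAgent:
    def __init__(self):
        self.chat_memory: list[str] = []  # retains session context only

    def ask(self, question: str) -> str:
        prompt = PROMPT_TEMPLATE.format(
            history="\n".join(self.chat_memory), question=question)
        answer = fake_llm(prompt)
        # Memory is appended to but never analyzed: no learning occurs.
        self.chat_memory += [f"User: {question}", f"Agent: {answer}"]
        return answer

agent = ReactiveAgent()
print(agent.ask("Rate ACME stock"))
```

The agent’s behavior is entirely determined by the steps written into the prompt, which is exactly the limitation described above.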

Level 2: Multi-Agents with sequencing
What if you have a couple of agents, each capable of doing a task? Maybe we have one agent that performs fundamental analysis, a second that performs technical analysis, and a third that creates a report using the output of the first two. Here we are creating an organization of simple agents, where the work is divided across the agents based on their skills and fused together by a supervisor. This is akin to a simple factory model where each worker operates on the output of the previous worker.
Typically, level 2 agents have more than one agent, each with its prompt template, and LLM. They share a common chat memory and work under user direction to perform their assigned tasks.
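A sketch of the level 2 factory model, with each agent’s LLM stubbed out as a plain function and a dictionary standing in for the shared chat memory. The analysis strings and agent names are invented for illustration.

```python
# Level 2 sequencing sketch: three single-task agents share one memory and
# run in a fixed order under a supervisor.

def fundamental_agent(memory):
    # Stand-in for an agent with its own prompt template and LLM.
    memory["fundamentals"] = "P/E below sector average"

def technical_agent(memory):
    memory["technicals"] = "price above 200-day moving average"

def report_agent(memory):
    # Fuses the output of the previous two agents into a report.
    memory["report"] = (f"Fundamentals: {memory['fundamentals']}; "
                        f"Technicals: {memory['technicals']}")

def supervisor(memory):
    # Fixed, factory-style sequencing over the shared chat memory.
    for agent in (fundamental_agent, technical_agent, report_agent):
        agent(memory)
    return memory["report"]

shared_memory = {}
print(supervisor(shared_memory))
```

Note that the sequence is hard-wired: the supervisor cannot reorder or skip agents based on results, which is what separates this level from levels 3 and 4.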

Level 3: Single or Multi-Agents with Tools
Most of the hard work in programming lies in building interfaces between systems. A supervisory function may call a number of other functions, supplying a set of arguments to each called function and parsing its responses. What if a set of agents could provide the same capabilities using agentic workflows? Going back to the investment example, I now have an agent that can use Yahoo Finance as a tool to collect stock performance data and another tool for conducting technical analysis. A second agent can do fundamental analysis using the company’s P&L and balance sheet, and a third agent can create a report and explain the results.

Compared to level 2, this architecture is now a lot more flexible and can be used to effectively introduce new charting techniques or other agents – say for new analysis, peer comparisons, etc.
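A sketch of level 3 tool use. Here `get_quote` is a hypothetical stand-in for a real data source such as the Yahoo Finance API, and a simple keyword check stands in for the LLM’s tool-selection decision; all names and numbers are invented for illustration.

```python
# Level 3 sketch: an agent with a tool registry. New tools (new charting
# techniques, peer comparisons, etc.) can be added by registering them.

def get_quote(ticker: str) -> dict:
    # Hypothetical stand-in for a market-data call.
    return {"ticker": ticker, "price": 101.5, "ma_200": 95.0}

def technical_analysis(ticker: str) -> str:
    q = get_quote(ticker)
    trend = "uptrend" if q["price"] > q["ma_200"] else "downtrend"
    return f"{ticker}: {trend}"

TOOLS = {"quote": get_quote, "technical": technical_analysis}

def tool_agent(request: str, ticker: str):
    # A real agent would let the LLM choose the tool and its arguments;
    # a keyword check stands in for that decision here.
    tool = "technical" if "trend" in request else "quote"
    return TOOLS[tool](ticker)

print(tool_agent("what is the trend?", "ACME"))
```

The flexibility claimed above shows up in the registry: adding a peer-comparison tool means adding one entry to `TOOLS`, not rewriting the agent.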
Level 4: Multi-Agentic Solutions with Orchestration
The orchestrator agent can create its own orchestration. It directs other agents to perform specific tasks. These agents can analyze their actions and refine their plans accordingly. Compared to our level 3 agents, they are better at providing dynamic instructions to other agents and tools.

For example, an orchestrator can build a portfolio of stocks and direct other agents to conduct technical and fundamental analysis to select stocks for the portfolio.
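A sketch of that level 4 example. The worker agent is stubbed with fixed scores, and the tickers, thresholds, and scores are all invented; the point is the shape of the loop: the orchestrator builds its own plan, delegates, reviews results, and stops when its goal is met. The hard iteration cap guards against the runaway loops that can exhaust token budgets at this level.

```python
# Level 4 orchestration sketch: the orchestrator creates and refines its
# own plan rather than following a fixed sequence.

def analyze(ticker: str) -> float:
    # Stand-in worker agent returning a combined technical/fundamental score.
    scores = {"ACME": 0.8, "GLOBEX": 0.4, "INITECH": 0.7}
    return scores.get(ticker, 0.0)

def orchestrator(candidates, portfolio_size=2, max_steps=10):
    plan = list(candidates)          # the orchestrator builds its own plan
    portfolio = {}
    for _ in range(max_steps):       # cap prevents uncontrolled looping
        if not plan or len(portfolio) >= portfolio_size:
            break                    # goal met or nothing left to analyze
        ticker = plan.pop(0)
        score = analyze(ticker)      # delegate to a worker agent
        if score >= 0.5:             # refine the plan based on results
            portfolio[ticker] = score
    return portfolio

print(orchestrator(["ACME", "GLOBEX", "INITECH"]))
```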
Level 5: Multi-Agents with Goals and Constraints
Agentic collaboration and negotiation is an advanced capability. The agents have a certain authority to make decisions that impact other agents, based on hard and soft constraints. In a typical constraint-directed negotiation, the agents guard their goals and constraints but make concessions to improve collaborative performance.
A level 5 multi-agent system provides an investment advisor with comprehensive advice that weighs complex factors and can balance many objectives in a complex portfolio.
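A toy sketch of constraint-directed negotiation between two agents: a risk agent with a hard constraint (portfolio risk at most 0.6) and a return agent whose goal is to maximize return. The return agent opens with an all-stock allocation and concedes in small steps until the risk agent’s constraint is satisfied. All risk and return figures are invented for illustration.

```python
# Level 5 negotiation sketch: each agent guards its constraint but makes
# concessions to reach a jointly acceptable allocation.

def blend(w_stock: float):
    """Risk and return of a two-asset portfolio (stocks vs. bonds)."""
    risk = w_stock * 0.8 + (1 - w_stock) * 0.2   # illustrative risk levels
    ret = w_stock * 0.09 + (1 - w_stock) * 0.03  # illustrative returns
    return risk, ret

def negotiate():
    # Return agent opens at 100% stocks and concedes 5 points per round.
    for pct in range(100, -1, -5):
        w = pct / 100
        risk, ret = blend(w)
        if risk <= 0.6:              # risk agent's hard constraint
            return w, round(risk, 2), round(ret, 3)
    return None                      # no agreement possible

print(negotiate())
```

The outcome is a compromise: neither agent gets its opening position, but the hard constraint is never violated, which mirrors the constraint-directed negotiation described above.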

What is the maturity of current agentic offerings?
Assistants using a single LLM (for example, classic ChatGPT or Copilot) are level 1 agents. Using LangChain constructs, a number of providers are offering level 2 and level 3 capabilities. While it is hard to analyze closed architectures, judging from observed behavior we have yet to see fully developed level 4 and 5 agents. In a level 4 architecture, uncontrolled infinite loops can be detrimental to your token limits and consumption. To fully function as an autonomous agent in a complex organizational situation, level 5 capability is essential.
So, what can you do with agentic architectures today? The current state of the art supports developing and deploying level 2 and level 3 agentic models. They offer significant value and can be built with the tools and technologies available today.
How Can You Learn More?
There are several courses available on Agentic models. One example is “Generative AI Mastery: Build Intelligent Orchestrators”, offered by the Applied AI Institute—a premium institution specializing in AI and Generative AI education. This course is an intensive, four-week program designed for AI professionals eager to design, build, and deploy sophisticated AI agentic solutions that revolutionize business processes.
[1] GAIA: A Benchmark for General AI Assistants, Grégoire Mialon, Clémentine Fourrier, Craig Swift, Thomas Wolf, Yann LeCun, Thomas Scialom, https://arxiv.org/pdf/2311.12983
[2] 1986 Workshop on Distributed AI, N. S. Sridharan, AI Magazine, Volume 8, Number 3 (1987) (© AAAI), https://onlinelibrary.wiley.com/doi/epdf/10.1609/aimag.v8i3.603