What are AI agents really?
Let's do a deep dive into what AI agents are, and their building blocks
We as humans have always been trying to make our computers do more and more. In the last 40 years alone, we’ve gone orders of magnitude further in the field of computing than in the rest of our 10,000+ year history. We have developed cheaper storage and better, faster compute every year. We optimized our supply chains to get hardware around the globe faster. We developed the cloud and made it super convenient for people to build applications and software. We developed GPUs and figured out ways to use them to perform computations orders of magnitude faster.
All of this came together to realize a dream that computer scientists have had ever since the conception of computers. What if computers could learn behavior without explicit programming? You know, instead of "do this when x happens", what if the machine learned on its own to decide what should be done? What if it could mimic how the human brain works? That would be wonderful, wouldn’t it?
We developed machine learning algorithms and the idea of an artificial neural network before the 2000s, but we did not have the computation, storage and global infrastructure to make them economically viable. Until we did.
Agents are what we’ve always wanted our computers to be: computers “thinking” on their own and also being autonomous, which implies that they will act on their own. The “act on their own” part is what really differentiates an agent from what we’ve been doing with AI so far. This is the main bit. An agent can act based on various inputs that it takes from its environment. It can use a series of tools to do complex tasks.
Let me show you some examples.
Firebase Studio
This right here is Firebase Studio, an excellent implementation of agentic AI. You tell the agent the kind of application you want to build and the features that you need. The agent then goes ahead and figures out the frontend, the backend and the various building blocks required to build your application. It then starts to put together code using Large Language Models (like GPT and Gemini) and checks whether the code actually works. If the code works, it knows that the job is done and waits for you to give it more instructions.
Figma, a UI/UX prototyping tool, arrived several years ago and made it very simple to see how an application would look and feel before development even begins. It significantly changed the industry over the years. Now, imagine being able to prototype an entire application in minutes, and not just its mockup. Getting from ideas to products is going to be so much faster with tools like this in the market. And this is far from being the only tool that’s changing the way we’ve thought about software development.
VS Code Agent
VS Code is one of the best and most popular free code editors available to software engineers today. It integrates super well with another tool called GitHub Copilot to give you a fantastic “agent mode” experience. Let’s say you want to build an application in TypeScript but do not know its nuances. Maybe you know a bit of TypeScript, but a lot of work is usually required to set up your project and get even the basics working on your machine. There might be software dependencies and other things that are required before you can see a UI render for your application.
Similar to Firebase Studio, VS Code agent mode will figure out the commands and run them for you. It will observe the output of those commands and gradually make its way towards giving you a fully functional project. That is great news for developers and PMs who want to prototype things but do not possess deep expertise in the hundreds of frameworks that are often required to make an application. And even for a seasoned developer, the productivity gains are immense. Being able to prototype ideas quickly, without having to spend a ton of time purely typing, does increase the speed of writing code significantly.
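To make the "run a command, observe the output, try again" idea concrete, here is a minimal Python sketch. It is not how VS Code or Firebase Studio are implemented; the function names and the two candidate commands are made up for illustration, and a real agent would ask an LLM to propose the next command based on the previous output instead of walking a fixed list.

```python
import subprocess
import sys


def run_command(args):
    """Run one command and return (succeeded, combined stdout and stderr)."""
    result = subprocess.run(args, capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr


def run_until_success(candidates):
    """Try candidate commands in order and stop at the first one that succeeds."""
    history = []
    for args in candidates:
        ok, output = run_command(args)
        history.append((args, ok, output))  # what the agent "observed"
        if ok:
            break
    return history


# Hypothetical candidates: the first fails on purpose, the second works.
candidates = [
    [sys.executable, "-c", "import sys; sys.exit('missing dependency')"],
    [sys.executable, "-c", "print('project ready')"],
]
history = run_until_success(candidates)
for args, ok, output in history:
    print("ok" if ok else "failed", "->", output.strip())
```

The important part is the history list: every attempt and its output is kept, which is what lets an agent decide what to try next.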
What can agents do?
Here are some things that I’ve seen agents do with the right integrations:
Search the internet
Send emails
Crawl & scrape the internet for information
Send messages
Write custom code to solve problems of basically any nature
Talk to other agents
If you provide an agent with the right set of tools, it’ll be able to accomplish many things that were considered impossible a couple of years ago. For example, here’s an implementation by Manus AI (a state-of-the-art agentic AI solution) that gauges public sentiment on a topic by going through conversations happening on Twitter, YouTube and other social media platforms.
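One common way to think about "giving an agent tools" is a registry that maps tool names to plain functions. The sketch below is only an illustration of that idea: the registry, the decorator and both tools are hypothetical stand-ins, and the tool bodies return canned strings instead of calling a real search or email service.

```python
from typing import Callable, Dict

# Hypothetical registry: tool name -> function that takes and returns text.
TOOLS: Dict[str, Callable[[str], str]] = {}


def tool(name: str):
    """Decorator that registers a function as a tool under the given name."""
    def register(func: Callable[[str], str]) -> Callable[[str], str]:
        TOOLS[name] = func
        return func
    return register


@tool("search")
def search(query: str) -> str:
    # Stand-in: a real tool would call a search API here.
    return f"results for '{query}'"


@tool("send_email")
def send_email(body: str) -> str:
    # Stand-in: a real tool would hand the message to an email service.
    return f"email queued: {body}"


def use_tool(name: str, argument: str) -> str:
    """Look up a tool by name and call it, reporting unknown names."""
    if name not in TOOLS:
        return f"unknown tool: {name}"
    return TOOLS[name](argument)


print(use_tool("search", "ai agents"))
print(use_tool("fly", "to the moon"))
```

Adding a new capability then just means registering one more function, which is roughly why the list of things agents can do keeps growing.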
What is needed to build agents?
AI agents are systems designed to autonomously perceive their environment, make decisions, and take actions to achieve specific goals. Some of the main building blocks of AI agents are:
Perception
AI agents gather information from their environment through sensors in physical robots (e.g., cameras, LIDAR) or through APIs and data streams in software agents (e.g., websites, databases, user interactions).
For example:
A chatbot sees user text as input.
A trading bot reads real-time market data.
A social sentiment analysis agent crawls and scrapes websites.
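In software, perception often boils down to turning very different inputs into one common shape that the rest of the agent can work with. Here is a small sketch of that, assuming a made-up Observation type and a made-up JSON format for the market data.

```python
import json
from dataclasses import dataclass


@dataclass
class Observation:
    """A normalized piece of input, whatever its original source was."""
    source: str
    content: str


def perceive_chat(text: str) -> Observation:
    """A chatbot's 'sensor': the raw user text, lightly cleaned."""
    return Observation(source="chat", content=text.strip())


def perceive_market(payload: str) -> Observation:
    """A trading bot's 'sensor': a JSON tick (format assumed for this sketch)."""
    tick = json.loads(payload)
    return Observation(source="market", content=f"{tick['symbol']} at {tick['price']}")


print(perceive_chat("  book me a meeting  "))
print(perceive_market('{"symbol": "ABC", "price": 10.5}'))
```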
Reasoning and Decision Making
Once data is received, agents analyze the input, maintain a state or memory of previous interactions, and use rules, models, or AI techniques to decide what to do next. These techniques include rule-based logic, machine learning models (e.g., classification, reinforcement learning), planning algorithms (e.g., A* search, Monte Carlo Tree Search), and Large Language Models for high-level reasoning and planning. Thanks to rapid recent advances in LLMs, which keep getting cheaper and supporting larger context windows (more context tokens), agents have become better at solving complex problems.
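The simplest of these techniques, rule-based logic with a bit of state, fits in a few lines. The sketch below is a toy trading policy that compares each new price to the average of the last few prices. The class name, the window size and the 2% thresholds are arbitrary choices for illustration only.

```python
from collections import deque


class TradingPolicy:
    """Toy rule-based policy that remembers the last few prices it has seen."""

    def __init__(self, window: int = 3):
        self.prices = deque(maxlen=window)

    def decide(self, price: float) -> str:
        """Return 'buy', 'sell' or 'hold' for the newly observed price."""
        # Not enough history yet: remember the price and wait.
        if len(self.prices) < self.prices.maxlen:
            self.prices.append(price)
            return "hold"
        average = sum(self.prices) / len(self.prices)
        self.prices.append(price)  # the oldest price drops out automatically
        if price < average * 0.98:
            return "buy"
        if price > average * 1.02:
            return "sell"
        return "hold"


policy = TradingPolicy(window=3)
decisions = [policy.decide(p) for p in [100, 100, 100, 95, 110, 101]]
print(decisions)  # ['hold', 'hold', 'hold', 'buy', 'sell', 'hold']
```

An LLM-based agent replaces the hand-written rules in decide with a model call, but the shape stays the same: state plus new input goes in, a decision comes out.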
Action
After deciding what to do, agents perform an action via actuators (robots) or function calls (software). These actions affect the environment, completing the perception-action loop.
Some examples are:
An AI assistant books a meeting based on everyone’s availability and sends out a pre-read on the meeting’s topic.
An LLM-based agent queries a database based on a user’s question and formats a report.
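For software agents, an action is usually a structured "function call" that the decision step produces and a small piece of code executes. Below is a minimal sketch of the meeting example, assuming the call arrives as JSON with a name and arguments. The JSON shape, the action table and the attendee data are all invented for this illustration.

```python
import json


def book_meeting(availability):
    """Return the earliest slot that every attendee has free, or None."""
    if not availability:
        return None
    common = set.intersection(*(set(slots) for slots in availability.values()))
    # Zero-padded "HH:MM" strings sort correctly as plain text.
    return min(common) if common else None


# Hypothetical table of actions the agent is allowed to take.
ACTIONS = {"book_meeting": book_meeting}


def execute(call_json: str):
    """Parse a JSON function call and run the matching action."""
    call = json.loads(call_json)
    action = ACTIONS[call["name"]]
    return action(**call["arguments"])


call_json = json.dumps({
    "name": "book_meeting",
    "arguments": {
        "availability": {
            "ana": ["10:00", "14:00"],
            "ben": ["14:00", "16:00"],
            "chen": ["09:00", "14:00", "16:00"],
        }
    },
})
slot = execute(call_json)
print(slot)  # 14:00
```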
Environment
The world the agent interacts with can be:
Physical like real-world robotics and IoT devices
Digital like web browsers, APIs, games and simulations
Abstract like mathematical problems and data science tasks
An environment allows an agent to interact and learn. It acts as a source of feedback that the agent can use to experiment iteratively. It allows agents to learn from both successful and failed experiments, much like a baby learns from interacting with their environment while growing up.
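Here is a tiny sketch of that feedback loop, using an "abstract" environment: a number-guessing game. The environment and agent below are invented for illustration. The agent acts (guesses), the environment responds, and the agent narrows its search based on that feedback until it succeeds.

```python
class GuessEnvironment:
    """A toy environment that holds a secret number and gives feedback on guesses."""

    def __init__(self, secret: int):
        self.secret = secret

    def step(self, guess: int) -> str:
        if guess < self.secret:
            return "higher"
        if guess > self.secret:
            return "lower"
        return "correct"


def run_agent(env: GuessEnvironment, low: int = 1, high: int = 100, max_steps: int = 20):
    """Guess the midpoint of the remaining range until the environment says 'correct'."""
    guesses = []
    for _ in range(max_steps):
        guess = (low + high) // 2
        guesses.append(guess)
        feedback = env.step(guess)  # act, then observe the result
        if feedback == "correct":
            break
        if feedback == "higher":
            low = guess + 1
        else:
            high = guess - 1
    return guesses


print(run_agent(GuessEnvironment(secret=37)))  # [50, 25, 37]
```

Both of the failed guesses were useful: each one told the agent which half of the range to drop.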
Memory (State) Management
Memory management within agents allows the agent to keep track of past interactions. This is why you’re able to have a natural conversation with ChatGPT these days. If your first question is “What are the available MacBooks in the market?”, followed by “What is the price of the latest one?”, you know that the agent is going to find the price of the latest MacBook. Memory also allows the agent to make intermediate decisions that might be important for solving your problem better. This is very similar to how we sometimes make a to-do list before solving a complicated multi-step problem.
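A very simple form of memory is a rolling window of recent conversation turns that gets sent along with every new question. The sketch below assumes a made-up ConversationMemory class and a plain-text prompt format. Real chat products are more sophisticated, but the idea of "past turns travel with the new question" is the same.

```python
from collections import deque


class ConversationMemory:
    """Keeps the most recent turns so follow-up questions have context."""

    def __init__(self, max_turns: int = 4):
        self.turns = deque(maxlen=max_turns)  # oldest turns fall off the front

    def add(self, role: str, text: str) -> None:
        self.turns.append((role, text))

    def as_prompt(self, new_question: str) -> str:
        """Build the text a model would see: remembered turns, then the new question."""
        lines = [f"{role}: {text}" for role, text in self.turns]
        lines.append(f"user: {new_question}")
        return "\n".join(lines)


memory = ConversationMemory(max_turns=4)
memory.add("user", "What are the available MacBooks in the market?")
memory.add("assistant", "Here is a list of current MacBook models.")
prompt = memory.as_prompt("What is the price of the latest one?")
print(prompt)
```

Because the earlier turns are part of the prompt, "the latest one" can be resolved to the latest MacBook.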
Frameworks & Tools
For building advanced agents, there are multiple frameworks out there already. Some of these frameworks help you build LLM agents, while others are components within agents that let you perform tasks easily. Here are some of them:
LangChain / LlamaIndex
Autogen / BabyAGI
Haystack, CrewAI, MetaGPT
Langflow
LLMs/APIs (OpenAI, Claude, Mistral, local models)
I’m planning to cover these frameworks in my blog soon. With that, thank you so much for reading, and I hope you have a wonderful day!