From Brain to Action: How AI Agents Think, Talk, and Work Together
AI agents are systems that mimic human cognition by combining perception, reasoning, memory, and action in a continuous loop. Unlike traditional chatbots, they can take real-world actions through tool calling, where structured functions (APIs, databases, workflows) are invoked to produce grounded results. Together, MCP (tools), A2A (agents), and ADK (framework) form the foundation of scalable, production-grade multi-agent AI systems.
A walkthrough of AI agents, tool calling, MCP, A2A, and ADK: the full stack behind intelligent, multi-agent systems.
Table of Contents
- How AI agents mimic human cognition
- Tool calling (giving agents hands)
- MCP (the USB-C for AI tools)
- A2A (agents talking to agents)
- ADK (building it all together)
- The big picture: where these pieces fit
A few years ago, "AI" meant a chatbot that answered questions from a fixed knowledge base. Today, it means systems that can browse the web, run code, query databases, file tickets, and coordinate with other AI agents, all autonomously. The leap is enormous. But how does it actually work?
I recently went through a presentation that laid out the entire stack clearly, from the cognitive underpinnings of agents all the way to Google's new protocols for multi-agent coordination. This is my attempt to translate that into a narrative you can actually understand and use.
1. How AI Agents Mimic Human Cognition
Before any protocol or framework, you have to ask: what even is an AI agent? The simplest answer is that an agent is a system that perceives its environment, reasons about it, and takes action. That's not far from how psychologists describe human cognition.
Human cognition involves six intertwined processes: attention, language, learning, memory, perception, and thought. An AI agent maps onto these in surprisingly direct ways:
| Cognitive Process | Agent Equivalent |
|---|---|
| Perception | Multimodal input (text, audio, video, images fed through preprocessing) |
| Thought & Reasoning | The LLM itself (reasoning, planning, deciding which tool to call next) |
| Memory | Databases (vector stores, SQL, NoSQL) |
| Learning | Fine-tuning on domain-specific data; accumulating context via a knowledge base |
| Attention | Context window management and relevance ranking |
| Language | The LLM's core capability — understanding and generating natural language |
The "action" part is what separates an agent from a plain language model. A model generates text. An agent uses that text to do something like call an API, write a file, trigger a workflow. Cognition without action is just thinking out loud.
"An AI agent is not just a smarter chatbot. It's a cognitive loop — perceive, reason, act, observe — running continuously until a goal is reached."
The Evolution of Agent Types
Agents didn't arrive fully formed. They evolved from simple reflex systems toward collaborative multi-agent architectures:
Reflex → Memory → Reasoning & Tool Use → Multi-Agent
(Rules) (State) (ReAct + Tools) (A2A Comms)
- Simple Reflex Agents: pure pattern matching ("if X then Y"). No memory, no planning.
- Model-Based Agents: maintain an internal world model and track state over time.
- ReAct Tool-Using Agents: the real breakthrough. They interleave chain-of-thought reasoning with tool calls in a tight loop.
- Multi-Agent Systems: teams of specialized agents coordinating via protocols like A2A. Powerful, but complex.
All of these sit on top of a shared foundation, the MCP and A2A stack: context, tools, and agent-to-agent communication.
2. Tool Calling (Giving Agents Hands)
An LLM by itself is like a brilliant person locked in a room with no phone, no computer, and no way to check facts or take actions. Tool calling changes that. It gives the agent a set of verified, structured capabilities it can invoke on demand.
The mechanism is elegant. A tool is just a function:
def add(a: int, b: int) -> int:
    """Adds two integers together"""
    return a + b
But it's wrapped in a rich JSON schema that tells the LLM exactly what it does, what arguments it takes, and what it returns:
{
  "type": "function",
  "function": {
    "name": "add",
    "description": "Adds two integers together",
    "strict": true,
    "parameters": {
      "type": "object",
      "required": ["a", "b"],
      "properties": {
        "a": { "type": "integer", "description": "The first integer to add" },
        "b": { "type": "integer", "description": "The second integer to add" }
      },
      "additionalProperties": false
    }
  }
}
The LLM never runs the code itself. It outputs a structured request, and a separate execution layer handles the actual call.
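To make the relationship between function and schema concrete, here is a minimal sketch that derives such a schema from a Python function's signature and docstring. This is illustrative only: real SDKs handle many more types, nested structures, and per-parameter descriptions.

```python
import inspect
import json

# Maps Python annotations to JSON Schema types (a deliberately tiny subset).
PY_TO_JSON = {int: "integer", float: "number", str: "string", bool: "boolean"}

def tool_schema(fn) -> dict:
    """Derive a tool schema like the one above from a function's
    signature and docstring. A simplified sketch: no nested types,
    no per-parameter descriptions."""
    params = inspect.signature(fn).parameters
    props = {name: {"type": PY_TO_JSON[p.annotation]} for name, p in params.items()}
    return {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": inspect.getdoc(fn) or "",
            "strict": True,
            "parameters": {
                "type": "object",
                "required": list(props),
                "properties": props,
                "additionalProperties": False,
            },
        },
    }

def add(a: int, b: int) -> int:
    """Adds two integers together"""
    return a + b

print(json.dumps(tool_schema(add), indent=2))
```

Generating schemas from signatures keeps the code and its description from drifting apart, which matters because the LLM only ever sees the schema.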
The Tool Calling Loop
- You provide tool definitions as part of the LLM's system context.
- The user sends a message. The LLM reads it alongside the tool list and decides which tool(s) to call and with what arguments.
- The tool execution layer runs the function and returns the result.
- The LLM receives the result and uses it to generate a final, grounded response.
This is how an agent goes from "I think the answer is X" to "I queried your database and the answer is definitively X." The tool is the bridge between language and reality.
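The loop above can be sketched in a few lines of plain Python. The add tool and the execution layer are real code; the LLM's output is simulated here as the structured request it would emit in step 2.

```python
import json

def add(a: int, b: int) -> int:
    """Adds two integers together"""
    return a + b

# The tool registry the execution layer dispatches against.
TOOLS = {"add": add}

def execute_tool_call(tool_call: dict) -> str:
    """Step 3 of the loop: run the function the LLM asked for and
    return the result as text for the model to read in step 4."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    result = TOOLS[name](**args)
    return json.dumps({"name": name, "result": result})

# Simulated LLM output (step 2): a structured request, not executed code.
llm_tool_call = {
    "type": "function",
    "function": {"name": "add", "arguments": '{"a": 2, "b": 40}'},
}

print(execute_tool_call(llm_tool_call))  # {"name": "add", "result": 42}
```

Note the separation of concerns: the model only produces the request dict, and the dispatch table decides what actually runs, which is where you enforce permissions and validation.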
A Concrete Example
Imagine an agent with access to get_customer_info and get_company_info tools. When a user says: "Tell me about the customer fdouetteau and their company", the LLM outputs:
{
  "role": "assistant",
  "tool_calls": [{
    "type": "function",
    "function": {
      "name": "get_customer_info",
      "arguments": "{\"customer_id\": \"fdouetteau\"}"
    }
  }]
}
The tool runs, returns customer data, and the LLM uses that grounded information to respond. No hallucination. No guesswork.
3. MCP (The USB-C Moment for AI Tools)
Once you have multiple agents and multiple tools, you face a proliferation problem. Every new tool needs a custom integration with every agent that wants to use it. That's N × M integrations — a maintenance nightmare.
The Model Context Protocol (MCP) is Anthropic's answer to this. Think of it as USB-C for AI: a single, standardized interface that any agent (client) can use to talk to any tool or data source (server), regardless of who built them.
What MCP Standardizes
- Communication protocol: JSON-RPC 2.0 over stdio, HTTP, or WebSocket
- Capability discovery: tools/list, an agent asks "what can you do?"
- Tool execution: tools/call, an agent says "do this, with these args"
- Resource access: resources/list and resources/read for reading data
- Reusable prompts: prompts/list and prompts/get for prompt templates
The Standard MCP Loop
1. Discover (tools/list request to MCP server)
2. Select (LLM chooses the right tool from the response)
3. Invoke (tools/call with structured arguments)
4. Respond (MCP server returns structured output)
The MCP Request/Response Flow (Step by Step)
Step 1: List tool request:
{ "jsonrpc": "2.0", "id": 1, "method": "tools/list" }
Step 2: List tool response: The server returns all available tool schemas with their names, descriptions, and parameter definitions.
Step 3: User message to LLM:
User: "Create high priority ticket: Fix NULL product rates"
Step 4: LLM tool selection response:
{
  "role": "assistant",
  "tool_calls": [{
    "name": "create_ticket",
    "arguments": { "title": "Fix NULL product rates", "priority": "high" }
  }]
}
Step 5: JSON-RPC request to MCP server:
{
  "jsonrpc": "2.0", "id": 2, "method": "tools/call",
  "params": { "name": "create_ticket", "arguments": { "title": "Fix NULL product rates", "priority": "high" } }
}
Step 6: JSON-RPC response from MCP server:
{
  "jsonrpc": "2.0", "id": 2,
  "result": {
    "content": [{ "type": "text", "text": "Ticket #456 created successfully: Fix NULL product rates (Priority: high)" }]
  }
}
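A thin helper makes the shape of these JSON-RPC envelopes concrete. This is a sketch of the client side only, assuming the request format shown in the steps above; real MCP SDKs also manage transport, sessions, and error handling.

```python
import itertools
import json

_ids = itertools.count(1)  # monotonically increasing JSON-RPC request ids

def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request envelope as MCP uses it."""
    req = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        req["params"] = params
    return req

# Step 1: discovery -- ask the server what tools it offers.
list_req = jsonrpc_request("tools/list")

# Step 5: invocation -- call the tool the LLM selected, with its arguments.
call_req = jsonrpc_request("tools/call", {
    "name": "create_ticket",
    "arguments": {"title": "Fix NULL product rates", "priority": "high"},
})

print(json.dumps(list_req))  # {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
```

Every interaction with any MCP server reduces to envelopes like these, which is exactly why one client implementation can talk to arbitrary servers.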
The real power of MCP is decoupling. You can swap out, upgrade, or extend your tool infrastructure without touching your agent code. The agent just asks "what's available?" every time. The ecosystem becomes plug and play.
4. A2A (When Agents Need to Talk to Each Other)
MCP solves agent-to-tool communication. But what about agent-to-agent? What happens when your system is too complex for a single agent, when you need a team of specialists rather than one generalist?
Single agents have real ceilings:
- Context window limits
- Tool overload
- Poor specialization
- Hard to scale
- Difficult to debug and control
- No parallel execution
The answer is decomposition — break the problem into specialized agents and let them collaborate. But collaboration requires a protocol.
Google's Agent-to-Agent (A2A) protocol is exactly that: an open standard that turns agents into interoperable services that can discover each other, delegate tasks, and return structured results.
The AgentCard (Every Agent's Business Card)
Discovery in A2A works via an AgentCard, a JSON document served at /.well-known/agent.json. It declares the agent's name, description, skills, supported protocols, auth requirements, and endpoints:
{
  "name": "ClickUp Ticketing Agent",
  "description": "Creates/manages ClickUp tickets",
  "version": "1.0.0",
  "skills": [{
    "id": "manage_tickets",
    "name": "Ticket Management",
    "description": "CRUD operations for ClickUp tickets"
  }],
  "capabilities": { "streaming": false },
  "auth": [{ "type": "api_key", "in": "header" }]
}
Any other agent or orchestrator can fetch this card and immediately know how to work with that agent, with no custom integration needed.
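As a sketch, the consuming side of discovery is just JSON parsing. The summarize_card helper below is hypothetical; a real client would first GET the card over HTTP from the well-known URL.

```python
import json

# The AgentCard from above, embedded as a string for the sketch; a real
# client would fetch it from the agent's /.well-known/agent.json URL.
CARD_JSON = """
{
  "name": "ClickUp Ticketing Agent",
  "description": "Creates/manages ClickUp tickets",
  "version": "1.0.0",
  "skills": [{
    "id": "manage_tickets",
    "name": "Ticket Management",
    "description": "CRUD operations for ClickUp tickets"
  }],
  "capabilities": { "streaming": false },
  "auth": [{ "type": "api_key", "in": "header" }]
}
"""

def summarize_card(card: dict) -> dict:
    """Pull out the fields an orchestrator needs to route work
    (hypothetical helper, not part of the A2A spec)."""
    return {
        "name": card["name"],
        "skills": [skill["id"] for skill in card.get("skills", [])],
        "streaming": card.get("capabilities", {}).get("streaming", False),
    }

print(summarize_card(json.loads(CARD_JSON)))
```

An orchestrator can build a routing table this way: fetch each specialist's card once, index by skill id, and delegate tasks by skill rather than by hard-coded endpoint.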
The Standard A2A Loop
- Discover: Fetch the target agent's AgentCard. Parse its capabilities and endpoint.
- Initiate: Call tasks/execute with a structured task payload (task ID, messages, context).
- Process: The target agent runs, using its own LLM and tools (via MCP) internally. No client-visible protocol.
- Monitor: Either poll with tasks/get or subscribe to a real-time stream via SSE.
- Complete: Receive the final result as structured artifacts: text, data, files, or typed output parts.
Tasks can also be cancelled mid-execution with tasks/cancel. The entire lifecycle is explicit and observable.
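The Monitor step can be sketched as a simple polling loop. The get_task function below is a stand-in for an HTTP GET against the tasks endpoint, returning canned responses so the sketch runs offline.

```python
import time

# Canned responses standing in for GET /v1/tasks/:taskId (illustrative).
RESPONSES = iter([
    {"id": "task-123", "state": "working"},
    {"id": "task-123", "state": "working"},
    {"id": "task-123", "state": "completed"},
])

def get_task(task_id: str) -> dict:
    """Stand-in for the real tasks/get HTTP call."""
    return next(RESPONSES)

def poll_until_done(task_id: str, interval: float = 0.0, max_polls: int = 10) -> dict:
    """The Monitor step: poll until the task reaches a terminal state."""
    for _ in range(max_polls):
        task = get_task(task_id)
        if task["state"] in ("completed", "failed", "canceled"):
            return task
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} still running after {max_polls} polls")

result = poll_until_done("task-123")
print(result)  # {'id': 'task-123', 'state': 'completed'}
```

In production you would use the SSE subscription instead of polling where latency matters, and add backoff between polls; the max_polls budget is what keeps a stuck task observable rather than hanging the client.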
Core A2A Endpoints
| Endpoint | Purpose |
|---|---|
| GET /a2a/:agentId/.well-known/agent-card.json | Retrieve agent metadata, skills, protocols |
| POST /a2a/:agentId/v1/message:send | Send a message and wait for complete response |
| POST /a2a/:agentId/v1/message:stream | Send a message and stream responses via SSE |
| GET /a2a/:agentId/v1/tasks/:taskId | Poll status and results of a task |
| POST /a2a/:agentId/v1/tasks/:taskId:subscribe | Subscribe to updates for an existing task |
| POST /a2a/:agentId/v1/tasks/:taskId:cancel | Cancel an active task |
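For illustration, a message:send request body might look like the following sketch. The field names follow the general A2A message shape (role plus typed parts) but are simplified; the protocol specification is the authoritative schema.

```python
import json

# Hypothetical request body for POST /a2a/:agentId/v1/message:send.
# Simplified for illustration; consult the A2A spec for exact fields.
payload = {
    "message": {
        "role": "user",
        "parts": [
            {"type": "text", "text": "Create high priority ticket: Fix NULL product rates"}
        ],
    }
}

print(json.dumps(payload, indent=2))
```

The typed parts array is what lets a single message carry text, structured data, and file references side by side instead of forcing everything into a prose string.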
"MCP connects agents to tools. A2A connects agents to agents. Together, they form the connective tissue of a real multi-agent ecosystem."
5. ADK (Building It All Together)
Knowing the protocols is one thing. Building a production system with them is another. Google's Agent Development Kit (ADK) is the framework that handles the scaffolding — letting you focus on agent logic rather than infrastructure plumbing.
ADK provides modular building blocks organized as a clean four-layer stack:
┌──────────────────────────────────────────┐
│ 4. A2AStarletteApplication (HTTP) │ /.well-known/agent.json + /a2a
├──────────────────────────────────────────┤
│ 3. A2aAgentExecutor (Protocol Glue) │ tasks/execute, getTask, SSE
├──────────────────────────────────────────┤
│ 2. Runner (Execution Engine) │ LLM calls, tool execution, state
├──────────────────────────────────────────┤
│ 1. Agent (Intelligence Core) │ Model + Tools + Instruction
└──────────────────────────────────────────┘
Layer Breakdown
Agent Layer: The intelligence core. Defines the model, tools, and instruction prompt. Single source of truth for what the agent can do.
import os

from google.adk.agents import Agent
from google.adk.models.lite_llm import LiteLlm

# system_prompt, sql_agent, and ticketing_agent are defined elsewhere.
root_agent = Agent(
    name="orchestrator",
    model=LiteLlm(model=os.getenv("LLM_PROVIDER", "openai/gpt-4o")),
    instruction=system_prompt,
    tools=[sql_agent, ticketing_agent],
)
Runner Layer: Orchestrates the agent execution lifecycle. Manages short- and long-term memory and handles multi-turn conversations.
from google.adk.artifacts import InMemoryArtifactService
from google.adk.memory import InMemoryMemoryService
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService

runner = Runner(
    app_name="sql_agent_runner",
    agent=root_agent,
    artifact_service=InMemoryArtifactService(),
    memory_service=InMemoryMemoryService(),
    session_service=InMemorySessionService(),
)
Executor Layer: Protocol glue. Translates A2A JSON-RPC to runner calls. Handles tasks/execute --> runner.run_async(), status polling, and event queuing for streaming updates.
Server Layer: Exposes standard A2A endpoints. Supports both polling and SSE. The AgentCard advertises capabilities to the outside world.
Three Types of ADK Agents
ADK provides three base types to cover different use cases:
LLM Agents: The "brains." They leverage large language models to understand natural language, reason through problems, and decide the next course of action.
Workflow Agents: The "managers." They orchestrate how tasks get done but don't do the work themselves. Three patterns:
- SequentialAgent: An assembly line. Runs sub-agents one after another in order. Output of one feeds into the next. Perfect for multi-step pipelines: fetch data --> clean data --> analyze data --> summarize findings.
- ParallelAgent: A team working simultaneously. Runs all sub-agents concurrently. Ideal for independent tasks like calling three different APIs to gather information at the same time.
- LoopAgent: A while loop. Repeatedly executes sub-agents until a condition is met or max iterations are reached. Useful for polling an API for a status update or retrying an operation until it succeeds.
Custom Agents: The "specialists." When you need full control or logic that doesn't fit the above, you inherit from BaseAgent and write your own Python.
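The three workflow patterns map onto familiar control flow. The sketch below is plain Python, not the ADK API: simple functions stand in for sub-agents, but the shape of each pattern is the same one the workflow agents implement.

```python
from concurrent.futures import ThreadPoolExecutor

def sequential(steps, data):
    """SequentialAgent pattern: output of one step feeds the next."""
    for step in steps:
        data = step(data)
    return data

def parallel(tasks, data):
    """ParallelAgent pattern: independent sub-tasks run concurrently."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda task: task(data), tasks))

def loop(step, data, done, max_iters=10):
    """LoopAgent pattern: repeat until a condition is met or the
    iteration budget runs out."""
    for _ in range(max_iters):
        data = step(data)
        if done(data):
            break
    return data

print(sequential([lambda x: x + 1, lambda x: x * 2], 3))  # 8
```

Seen this way, the value of workflow agents is that they lift ordinary control flow (pipelines, fan-out, retry loops) to the level of whole agents, with state and memory handled by the Runner underneath.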
ADK vs MCP vs A2A (How They Relate)
| Layer | Role |
|---|---|
| ADK | Build & run agents |
| MCP | Connect agents to tools |
| A2A | Connect agents to agents |
These aren't competing technologies; they're complementary layers. ADK is the construction framework. MCP and A2A are the communication standards it implements.
ADK 2.0
ADK 2.0 introduces more powerful orchestration capabilities:
- Graph-based workflows: build deterministic agent workflows with explicit control over task routing and execution order.
- Collaborative agents: coordinator agents directing multiple specialized subagents working in parallel.
- Dynamic workflows: code-based branching logic for complex decision trees and iterative loops that go beyond simple sequential or parallel patterns.
The Big Picture: Where These Pieces Fit
Here's the mental model to take away:
| Layer | Technology | What it does |
|---|---|---|
| Build & run | ADK | Framework for defining agents, tools, memory, and execution |
| Agent <--> Tool | MCP | Standardized interface between agents and external tools/data |
| Agent <--> Agent | A2A | Standardized protocol for inter-agent task delegation and results |
A real workflow might look like this: a user asks a question --> a coordinator agent (built with ADK) uses A2A to discover and delegate to a specialist data-agent --> that agent uses MCP to query a live database --> returns structured results back up the chain --> the user sees a clean, grounded answer.
The whole machine is invisible. The output just feels like magic.
Sources: Dataiku LLM Agentic Tools · Google A2A Blog · GeeksforGeeks MCP · ADK Cloud Blog