Implementing A2A at Enterprise Scale: What the Spec Doesn't Tell You
Shell's first Agent-to-Agent implementation on Azure AI Foundry — the protocol gaps, identity challenges, and what actually breaks when autonomous agents try to talk to each other.
Google published the Agent2Agent (A2A) protocol spec. It’s clean. It makes sense on paper. Then you try to implement it inside a large enterprise and realise the spec was written by people who’ve never had to explain to a security team why an AI agent needs its own identity.
This is a build log from Shell’s first A2A implementation on Azure AI Foundry. I’m not going to talk about NDA details — I’m going to talk about the gaps nobody writes about.
What A2A actually is
The short version: A2A is a protocol that lets autonomous agents communicate with each other without a human in the loop. Agent A can delegate a task to Agent B, Agent B can respond with results, and neither needs a human to broker the handshake.
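To make that concrete, here is roughly what a delegation looks like on the wire. This is a minimal sketch based on my reading of the published spec (JSON-RPC 2.0, a tasks/send method, agent cards served at /.well-known/agent.json); the endpoint, task text, and IDs are all illustrative, not from our deployment.

```python
import uuid
import requests

AGENT_B = "https://agent-b.internal.example.com"  # hypothetical endpoint

# Discover Agent B's capabilities from its agent card.
card = requests.get(f"{AGENT_B}/.well-known/agent.json", timeout=10).json()

# Delegate a task: Agent A sends a message, Agent B owns it from here.
response = requests.post(
    card["url"],
    json={
        "jsonrpc": "2.0",
        "id": str(uuid.uuid4()),
        "method": "tasks/send",
        "params": {
            "id": str(uuid.uuid4()),  # task id
            "message": {
                "role": "user",
                "parts": [{"type": "text", "text": "Rotate the staging DB secret."}],
            },
        },
    },
    timeout=30,
).json()

# The returned Task carries a status; no human brokered any of this.
print(response["result"]["status"]["state"])  # e.g. "submitted" or "working"
```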
In theory, this is the foundation for real multi-agent pipelines. In practice, the moment you try to implement this at enterprise scale, you hit three problems the spec glosses over.
Problem 1: Whose identity does the agent use?
This is the one that stopped us cold for two weeks.
When an agent calls an internal API — rotates a secret, triggers a deployment, reads a log file — the enterprise security model asks: who is doing this? The answer in most OAuth/OIDC implementations is “a service principal.” But a service principal is static. It doesn’t capture which agent is acting, on whose behalf, or for what task.
A2A assumes you’ve solved identity. It doesn’t tell you how.
What we ended up building was a layered approach using Microsoft Entra ID workload identities — one identity per agent role, scoped tightly to what that agent actually needs. The orchestrator holds a higher-privilege identity; sub-agents hold minimal-scope identities. When an agent delegates a task via A2A, the receiving agent uses its own credential, not the caller’s.
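A minimal sketch of what this looks like from inside a sub-agent, assuming Entra ID workload identity is injected into the agent's environment (AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_FEDERATED_TOKEN_FILE) and using a made-up API scope:

```python
from azure.identity import WorkloadIdentityCredential

# Each agent process holds exactly one credential: its own role identity.
# WorkloadIdentityCredential picks up the injected environment variables.
credential = WorkloadIdentityCredential()

# The receiving agent acquires tokens under its own minimal scope.
# "api://secrets-agent/.default" is a hypothetical scope for illustration.
token = credential.get_token("api://secrets-agent/.default")
```

The point is that the receiving agent never forwards the caller's token downstream; it presents its own, tightly scoped one.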
This sounds obvious in retrospect. But most A2A examples online show everything running under a single service account, which in a real enterprise would get your implementation flagged in the first security review.
Problem 2: The A2A handshake fails silently in enterprise network topologies
A2A uses HTTP-based agent cards and task endpoints. In a cloud-native startup, this works fine. In an enterprise with private endpoints, VNet integration, and network policies, agents can’t discover each other’s endpoints without explicit configuration.
We had to build a lightweight agent registry — essentially a lookup table mapping agent names to their internal endpoints, stored in Azure Key Vault and refreshed on a TTL. Not elegant. Not in the spec. But necessary.
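A stripped-down sketch of that registry, assuming endpoints are stored as Key Vault secrets under a hypothetical agent-endpoint-<name> naming convention; the vault URL and TTL are illustrative:

```python
import time
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

VAULT_URL = "https://agents-registry.vault.azure.net"  # hypothetical vault
TTL_SECONDS = 300

_client = SecretClient(vault_url=VAULT_URL, credential=DefaultAzureCredential())
_cache: dict[str, tuple[float, str]] = {}  # agent name -> (fetched_at, endpoint)

def resolve_agent(name: str) -> str:
    """Map an agent name to its internal endpoint, refreshed on a TTL."""
    now = time.monotonic()
    cached = _cache.get(name)
    if cached and now - cached[0] < TTL_SECONDS:
        return cached[1]
    endpoint = _client.get_secret(f"agent-endpoint-{name}").value
    _cache[name] = (now, endpoint)
    return endpoint
```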
Problem 3: What happens when a sub-agent fails mid-task?
The spec defines task states (submitted, working, completed, failed). What it doesn’t define is what the orchestrator should do when a sub-agent goes silent mid-task. Not failed — silent. No response, no timeout, no error.
We handle this with a dead-letter pattern: tasks that haven’t received a status update within N seconds get re-queued with exponential backoff. After three retries, the orchestrator marks the task failed and surfaces it for human review.
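In sketch form, with an in-memory dict standing in for the orchestrator's actual task store and illustrative thresholds:

```python
import time

STALE_AFTER = 60    # N seconds without a status update
MAX_RETRIES = 3
BASE_BACKOFF = 5    # seconds; doubles on each retry

def sweep(tasks: dict, requeue, dead_letter):
    """Re-queue silent tasks with exponential backoff; escalate after 3 tries."""
    now = time.monotonic()
    for task_id, task in tasks.items():
        if task["state"] != "working":
            continue
        if now - task["last_update"] < STALE_AFTER:
            continue  # still within the silence window
        if task["retries"] >= MAX_RETRIES:
            task["state"] = "failed"
            dead_letter(task_id)  # surface for human review
            continue
        task["retries"] += 1
        delay = BASE_BACKOFF * (2 ** (task["retries"] - 1))
        requeue(task_id, delay=delay)  # exponential backoff
        task["last_update"] = now
```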
This is standard distributed systems thinking. But if you’re coming from an LLM-first background and have never built a message queue system, it’s not obvious.
What works well
Despite the gaps, A2A is genuinely useful. The protocol is simple enough to implement in a weekend. The agent card format is a clean abstraction. And once you’ve solved identity and discovery, the orchestration layer becomes surprisingly readable — you’re writing task descriptions, not RPC calls.
The real value is that A2A forces you to think about agents as services rather than prompts. Each agent has a contract. It takes inputs. It produces outputs. It can fail. That framing makes the whole system more maintainable.
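One way to make that contract explicit, as a hypothetical Python type rather than anything from the A2A SDK or our codebase:

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class TaskResult:
    state: str          # "completed" or "failed"
    output: str | None
    error: str | None = None

class Agent(Protocol):
    """An agent is a service: typed input, typed output, explicit failure."""
    name: str

    def handle(self, task_description: str) -> TaskResult: ...
```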
Where this goes next
The thing I keep coming back to is that A2A is a protocol for communication, not trust. It tells agents how to talk to each other. It doesn’t tell them whether they should.
Non-human identity, attestation, and fine-grained authorisation for autonomous agents are the next unsolved problems. I’m writing a separate post on that.
If you’re working on A2A implementations or enterprise agent identity, I’d genuinely like to compare notes — reach out on LinkedIn.