From ROS to LangGraph: What Drone Autonomy Taught Me About AI Agents
The parallels between building autonomous UAVs and building LLM agent pipelines are striking. State machines, sensor loops, and failure recovery — the problems are older than the hype.
I spent the early part of my career building autonomous drones. ROS, Gazebo, PX4, perception pipelines, state machines. Then I moved to enterprise AI and started building LLM agent pipelines.
About six months in, I noticed something: I’d solved most of these problems before.
The framing is completely different. The vocabulary is different. But the underlying engineering challenges — state management, failure handling, sensor fusion, decision loops — are remarkably similar. Here’s the thread.
The perception-action loop never changes
In robotics, the core loop is: sense → process → act → sense again. A drone takes a reading from its IMU and barometer, fuses those signals, decides on a motor command, executes it, and immediately starts sensing again to correct for what changed.
In an LLM agent pipeline, the loop is: observe → reason → act → observe again. The agent reads from its context (memory, tool results, instructions), reasons over that input, calls a tool or produces output, and loops back with the updated context.
Strip the jargon and it’s the same architecture. ReAct, the reasoning pattern behind most modern agent frameworks, is just the sense-process-act loop with a language model as the processor.
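Here’s that loop as a minimal Python sketch. The llm callable and the tools registry are hypothetical stand-ins, not any particular framework’s API:

```python
def run_agent(task: str, llm, tools: dict, max_steps: int = 10) -> str:
    context = [f"Task: {task}"]                   # observe: initial context
    for _ in range(max_steps):
        decision = llm(context)                   # reason: the model is the processor
        if "answer" in decision:
            return decision["answer"]             # terminal state: we're done
        result = tools[decision["tool"]](decision["input"])  # act: execute the tool
        context.append(f"Observation: {result}")  # observe again with updated context
    return "step budget exhausted"                # bounded loop, like a watchdog timer
```

Swap llm for an IMU read and tools for motor commands and you have the drone loop again.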
State machines are state machines
Autonomous drones are almost always built around a hierarchical state machine. At the top level: IDLE → TAKEOFF → MISSION → LANDING → IDLE. Within MISSION, sub-states for WAYPOINT_NAV, OBSTACLE_AVOIDANCE, RETURN_TO_HOME. Each state has entry conditions, exit conditions, and transition guards.
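As a toy illustration (a real flight stack like PX4 is far richer), the top level might look like this:

```python
from enum import Enum, auto

class Mode(Enum):
    IDLE = auto()
    TAKEOFF = auto()
    MISSION = auto()   # in a real stack this nests sub-states like WAYPOINT_NAV
    LANDING = auto()

# Transition guards: each state lists the states it may hand off to.
TRANSITIONS = {
    Mode.IDLE:    {Mode.TAKEOFF},
    Mode.TAKEOFF: {Mode.MISSION, Mode.LANDING},   # abort straight to landing
    Mode.MISSION: {Mode.LANDING},
    Mode.LANDING: {Mode.IDLE},
}

def transition(current: Mode, target: Mode) -> Mode:
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.name} -> {target.name}")
    return target
```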
When I first started building LangGraph pipelines, the graph structure felt familiar. It was. It’s the same hierarchical FSM — the states are nodes, the transition guards are LLM decisions or conditional edges.
The difference is that in robotics, the state machine is explicit. You draw it before you write a line of code. In LLM agent systems, most teams build it implicitly, as tangled chains of prompt templates, and then act surprised when it’s hard to debug.
LangGraph makes the structure explicit again. That’s most of its value.
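Here’s a hedged sketch of that same shape as a LangGraph graph. The StateGraph and conditional-edge calls are LangGraph’s API; the node logic is a stub I’m inventing for illustration:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    task: str
    result: str
    needs_tool: bool

def plan(state: AgentState) -> AgentState:
    # Stub: a real node would call the model here to decide the next step.
    return {**state, "needs_tool": not state.get("result")}

def call_tool(state: AgentState) -> AgentState:
    # Stub: a real node would execute the requested tool.
    return {**state, "result": "tool output"}

def route(state: AgentState) -> str:
    # The transition guard: an explicit function, not a tangle of prompts.
    return "call_tool" if state["needs_tool"] else "finish"

graph = StateGraph(AgentState)
graph.add_node("plan", plan)
graph.add_node("call_tool", call_tool)
graph.set_entry_point("plan")
graph.add_conditional_edges("plan", route, {"call_tool": "call_tool", "finish": END})
graph.add_edge("call_tool", "plan")   # loop back and reassess: sense again
app = graph.compile()
```

You can draw this graph before writing a line of node logic, exactly as you would the drone FSM.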
Failure handling is the real work
In drone autonomy, “happy path” development takes about 20% of the effort. The other 80% is: what happens when the GPS signal drops? What if a motor stutters? What if obstacle avoidance gives a false positive at 20 metres?
You build fallback states. You define safe behaviours for every failure mode you can think of, then test in simulation until you find the ones you didn’t think of.
LLM agent pipelines have the same problem, and most teams are still in the happy-path phase. What happens when a tool call times out? What if the model hallucinates a function call that doesn’t exist? What if the context window fills up mid-task?
These aren’t hypothetical. They’re production failure modes. The teams that handle them well are the ones that came in with a systems engineering mindset — define your failure states before you deploy, not after your first incident.
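One pattern that carries over directly from fallback states, sketched under my own naming (the registry and status strings aren’t from any framework): make every tool call return either a result or a named failure the pipeline can route on.

```python
import concurrent.futures

KNOWN_TOOLS = {"search": lambda q: f"results for {q}"}   # stub registry
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

def safe_tool_call(name: str, arg: str, timeout_s: float = 10.0) -> dict:
    """Return a result or a *named* failure the caller can route on."""
    if name not in KNOWN_TOOLS:
        # The model hallucinated a tool: a nameable, recoverable failure mode.
        return {"status": "unknown_tool", "tool": name}
    future = _pool.submit(KNOWN_TOOLS[name], arg)
    try:
        return {"status": "ok", "result": future.result(timeout=timeout_s)}
    except concurrent.futures.TimeoutError:
        return {"status": "timeout", "tool": name}   # route to a fallback state
    except Exception as exc:
        return {"status": "error", "tool": name, "detail": str(exc)}
```

Every status string is a state your graph has to have an edge for. If you can’t name the edge, you haven’t handled the failure.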
The sensor fusion problem maps to context management
Drones don’t have one source of truth. They have GPS (noisy at low altitude), barometric pressure (drifts with weather), inertial measurement (accumulates error over time), computer vision (fails in low light). Sensor fusion combines unreliable signals into a reliable state estimate.
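The simplest fusion technique, a complementary filter, fits in two lines: trust the fast-but-drifting signal in the short term and the slow-but-stable one in the long term. The signal names and the 0.98 weight below are illustrative:

```python
def fuse_altitude(alt_est: float, vz_imu: float, alt_baro: float,
                  dt: float, alpha: float = 0.98) -> float:
    predicted = alt_est + vz_imu * dt                  # IMU: responsive but drifts
    return alpha * predicted + (1 - alpha) * alt_baro  # baro: slow but stable
```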
LLM agents have the same problem. Memory retrieval, tool outputs, conversation history, system instructions — each source is partially unreliable. A retrieved memory might be stale. A tool output might be malformed. The conversation history might contain contradictions.
Context management in agent systems is sensor fusion with text. RAG, memory hierarchies, context summarisation — these are all sensor fusion techniques by another name.
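To make that concrete, here’s a sketch with trust weights and a staleness decay I’m inventing for illustration: score each snippet like a sensor reading, then assemble the highest-scoring ones that fit the budget.

```python
from dataclasses import dataclass

# Trust weights per source: the textual analogue of a sensor noise model.
TRUST = {"system": 1.0, "tool_output": 0.8, "chat_history": 0.6, "memory": 0.5}

@dataclass
class Snippet:
    source: str     # which "sensor" produced this
    text: str
    age_s: float    # seconds since it was produced

def assemble_context(snippets: list[Snippet], budget_chars: int) -> str:
    def score(s: Snippet) -> float:
        staleness = 1.0 / (1.0 + s.age_s / 3600)   # decay over hours
        return TRUST.get(s.source, 0.3) * staleness
    picked, used = [], 0
    for s in sorted(snippets, key=score, reverse=True):
        if used + len(s.text) <= budget_chars:
            picked.append(s.text)
            used += len(s.text)
    return "\n\n".join(picked)
```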
What robotics gets right that LLM agent teams often miss
Simulation before deployment. In robotics, you don’t put untested code on a flying vehicle. You simulate. Hundreds of hours. You inject failure modes deliberately and watch what happens. Most LLM agent teams test by sending live traffic at the system and hoping.
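Failure injection translates almost verbatim. A sketch, not any framework’s API: wrap each tool so the test harness can force timeouts and malformed outputs on demand, then assert the agent reaches a safe terminal state.

```python
import random

def flaky(tool, timeout_rate: float = 0.1, garbage_rate: float = 0.1, seed=None):
    """Wrap a tool so tests can inject the failure modes we worry about."""
    rng = random.Random(seed)
    def wrapped(*args, **kwargs):
        roll = rng.random()
        if roll < timeout_rate:
            raise TimeoutError("injected timeout")   # forced failure mode
        if roll < timeout_rate + garbage_rate:
            return "<<malformed output>>"            # forced bad payload
        return tool(*args, **kwargs)
    return wrapped

# Run the eval suite against flaky tools and assert the agent ends in a
# safe terminal state instead of crashing or looping forever.
search = flaky(lambda q: f"results for {q}", seed=42)
```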
Explicit system architecture docs. Every ROS system has a node graph. Every node declares its published and subscribed topics, so you can look at the architecture diagram and understand the data flow immediately. LLM agent systems are often undocumented: who calls what, what the inputs and outputs are. Most teams don’t have any of this written down.
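One lightweight way to get the node graph back (component names invented for illustration): declare every component’s published and subscribed channels in one place, and derive the data flow from it.

```python
AGENT_GRAPH = {
    "planner":     {"subscribes": ["user_request"],          "publishes": ["plan"]},
    "tool_runner": {"subscribes": ["plan"],                  "publishes": ["tool_result"]},
    "summarizer":  {"subscribes": ["tool_result", "memory"], "publishes": ["final_answer"]},
}

def data_flow(graph: dict) -> list[str]:
    """Derive publisher -> subscriber edges, the way rqt_graph does for ROS."""
    edges = []
    for src, io_src in graph.items():
        for dst, io_dst in graph.items():
            shared = set(io_src["publishes"]) & set(io_dst["subscribes"])
            edges += [f"{src} --{topic}--> {dst}" for topic in sorted(shared)]
    return edges
```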
The sim-to-real gap is real. In robotics, you eventually have to test on real hardware. The equivalent in agent systems is testing against real production APIs, real user inputs, and real data at scale. Many teams skip this because it’s expensive — then they’re surprised when eval performance doesn’t match production.
The thread
UAVs, autonomous systems, AI agents. I used to think these were separate career paths. They’re the same path with different actuators.
If you’re coming from a robotics background and feeling like you’re starting over in AI — you’re not. Your intuitions about state, failure, and feedback loops are directly applicable. The vocabulary will catch up.
Always happy to talk autonomous systems — aerial or otherwise. Find me on LinkedIn or GitHub.