AI群体如何重新定义软件工程 How Swarms of AI Agents Are Redefining Software Engineering —— LangChain

RR1335

106人浏览 · 2026-05-08 05:00:00

RR1335 · 2026-05-08 05:00:00 发布

Agentic Engineering: How Swarms of AI Agents Are Redefining Software Engineering

代理工程：AI群体如何重新定义软件工程

https://www.langchain.com/blog/agentic-engineering-redefining-software-engineering

Key Takeaways

What is agentic engineering? Agentic engineering is a multi-agent coordination model where AI agents act as digital team members — each with defined roles, shared memory, and a common observability layer — to move software through the full delivery pipeline, not just generate code faster.

What results can multi-agent systems produce in software delivery? In a pilot of 20+ debugging workflows, coordinated agent execution produced a 93% reduction in time-to-root-cause compared to historical baselines, with over 200 engineering hours saved across 512 sessions in a single month. Development workflows showed a 65% reduction in execution time, with the biggest gains coming from compressing downstream testing — not code generation.

How does agentic engineering differ from AI coding agents like Codex or Claude? AI coding agents excel at translating intent into code within a single user-driven session. Agentic engineering operates at a higher level of abstraction: it's a control plane that orchestrates cross-team workflows, maintains long-term memory across agents, and manages state and traceability across the full software delivery lifecycle. The two aren't competing — coding agents like Codex can run inside the worker agents as reasoning and code-generation engines.

关键要点
什么是代理工程？代理工程是一种多智能体协作模型，AI代理作为数字化团队成员——每个代理都有明确的角色、共享记忆和共同的观测层——将软件贯穿整个交付流程，而不仅仅是更快地生成代码。

多智能体系统在软件交付中能产生什么结果？在20多个调试工作流程的试点中，协调代理执行将根本原因定位时间减少了93%，仅一个月内就在512次会话中节省了超过200个工程小时。开发工作流程的执行时间减少了65%，最大的收益来自压缩下游测试——而非代码生成。

代理工程与Codex或Claude等AI编码代理有何不同？AI编码代理擅长在单次用户驱动的会话中将意图转化为代码。代理工程则在更高层次的抽象上运作：它是一个控制平面，协调跨团队工作流程，维护代理间的长期记忆，并管理整个软件交付生命周期中的状态和可追溯性。二者并非竞争关系——像Codex这样的编码代理可以作为推理和代码生成引擎运行于工作代理内部。

Software development has entered a new phase—one where intelligent agents operate not as isolated tools, but as coordinated entities mirroring real-world teams. As AI adoption accelerates, the focus has shifted from what is possible to what works in practice. Every software engineering stage—requirements, design, development, security, testing, deployment, and operations is amenable to partial automation at the very least, and may even support full end-to-end orchestration when agents collaborate cross-functionally. This goal then shifts from “How do we write code faster?” to “How do we move software through the system faster and safely?” Through experimentation with multiple agentic frameworks, we have identified practical patterns that deliver real, measurable impact.

This blog describes an agentic engineering system designed to transition from task level execution to system-level collaboration. We propose a reference architecture and a pilot evaluation of a multi-agent coordinated framework implemented using LangChain’s suite of tools—including LangSmith and LangGraph. What this system is not is a “better coding AI,” or a “better task assistant”. This architecture is designed to function as a control plane for multi-agent coordination, focusing on end-to-end software delivery.

软件开发已进入新阶段——智能体不再作为孤立工具运行，而是像现实团队那样协同合作。随着AI应用加速，焦点已从"技术可能性"转向"实践可行性"。软件工程的每个环节——需求、设计、开发、安全、测试、部署和运维——至少可实现部分自动化，当智能体跨职能协作时，甚至能支持完整的端到端编排。目标也随之从"如何更快写代码"转变为"如何更安全高效地推进软件交付"。通过多智能体框架实验，我们已识别出能产生可量化效果的实际模式。

本文描述了一个旨在从任务级执行转向系统级协作的智能体工程系统。我们提出参考架构，并对采用LangChain工具套件（包括LangSmith和LangGraph）实现的多智能体协调框架进行试点评估。该系统并非"更好的编码AI"或"更优任务助手"，而是专为多智能体协调设计的控制平面架构，聚焦端到端软件交付。

Agentic Engineering to Mirror Real-world Engineering

代理工程以映射现实世界工程

Our core insight is simple:

“The biggest step change doesn’t come from better tools alone. It comes from systems that mirror real-world teams.”

At the core of the agentic engineering is a collaborative system of intelligent agents designed to mirror how engineering teams plan, execute, and deliver software. Rather than treating AI as a collection of isolated assistants, this framework models agents as team members— each with defined responsibilities, shared context, and accountability— coordinated through a lightweight but powerful leadership layer.

This system provides a native control plane for multi-agent coordination with the ability to:

Execute long-lived workflows
Retain agent memory that is shareable between teams
Chain different types of workflows together that can move across team boundaries
Facilitate knowledge sharing to onboard new team members to agentic workflows
Global observability into workflows executed agentically for traceability and auditability

智能体工程的核心是一个协作式智能体系统，其设计灵感源于工程团队规划、执行和交付软件的方式。该框架摒弃将AI视为孤立助手集合的传统思路，而是将智能体建模为团队成员——每个角色都具备明确职责、共享上下文和问责机制——通过轻量级但强大的领导层进行协调。

该系统为多智能体协作提供了原生控制平面，具备以下能力：

执行长期运行的工作流保留可在团队间共享的智能体记忆将不同类型的工作流串联起来，实现跨团队边界流转促进知识共享，帮助新成员快速融入智能体工作流提供全局可观测性，实现对智能体工作流的追踪与审计

Architecture

At a high level, the system is a loosely coupled system of agents that can either operate as independent entities or as an entity within a swarm of agents. Our system consists of the two complementary roles that can be adapted to scale:

Worker Agents – These agents function as digital counterparts to individual contributors on an engineering team. They operate autonomously within well-defined boundaries, planning and executing tasks based on engineering intent—such as development, testing, debugging, or operational. Depending on team maturity and complexity, a deployment may involve a single worker agent or a dynamically coordinated swarm of worker agents.

A Worker Agent is capable of:

Interpreting user intent and translating it into an executable plan using a reasoning model.
Gathering the required context from systems of record such as source repositories, issue trackers, and internal knowledge bases such as logs.
Executing workflows through tools, coding agents, or custom/sub agents.
Validating outcomes to ensure correctness and completeness.
Reporting plans, actions, and results to a leadership layer to ensure transparency, accountability, and traceability.

Worker Agents are intentionally loosely coupled, enabling them to scale horizontally, adapt to new workflows, and—when necessary—delegate tasks to other agents in the swarm.

Leader Agent – These agents act as the digital analogue of a project leader. They coordinate, govern, and provide shared capabilities and visibility across a swarm of agents. The Leader Agent provide:

A shared prompt and workflow library that standardizes best practices and dramatically lowers onboarding friction.
A common tool gateway that exposes approved capabilities to worker agents in a consistent and secure manner.
Long-term memory for the swarm, enabling learning and continuous improvement over time.
Global observability into agent activity, decisions, and outcomes, providing insight into system behavior and performance.
Orchestration to determine when and how agents act, not just what they produce.
By separating execution from coordination, the framework preserves autonomy at the edges while maintaining coherence at scale.

The diagram below shows a reference architecture for the agentic engineering system. All our worker agents communicate via the A2A protocol. However, the work agents may also interact with agents that do not support A2A via an MCP wrapper. Engineers interacting with the system express an intent through their preferred interface—such as an IDE or CLIs, or an external trigger via a GitHub or Jira action. In this system, workflows are customizable to meet the productivity needs of the teams.

After evaluating multiple agentic frameworks, we selected LangChain’s framework for this study based on how they map to production requirements for agentic engineering. It is an execution model for stateful, collaborative, and governable agent systems, which makes it suitable for orchestrating AI systems that mirror real-world engineering teams. We use LangMem abstractions to store long-term state, and use LangSmith to log execution traces, enabling end-to-end traceability, telemetry, and a system-wide view of agentic workflows and outcomes.

Macro Architectural View

Below is a reference diagram for how these agentic systems can span cross team boundaries. Agent leaders can collaborate with leaders of other teams. For instance, a product requirement coming from the product management team can be routed by the engineering team leader to the right worker agent (swarm) for planning and extracting requirements.

Reference Technical Implementation with LangChain

This implementation incorporates and thus evaluates three core abstractions provided by the suite of LangChain’s framework – LangGraph (for controllable agent orchestration), LangSmith (for agent observability & evals), and LangMem (a library that helps agents learn and improve via long-term memory). LangGraph’s core abstraction—a graph of stateful nodes—enables the construction of custom workflows based on a plan generated by the agent. The evaluation focused on the following technical characteristics for transitioning agentic engineering from experimental environments to stable, production-ready operating models.:

State management and checkpointing capabilities that persist across steps, agents, and retries.
Provision for audit trails to track who decided what, when, and why, supporting post-hoc analysis and continuous improvement.
Interface compatibility with external systems of record and MCP-style tool gateways.

Deterministic execution model that ensures that agents are performing authorized actions to reduce operational risk.
Interoperability across different agentic communication protocols and with agents built using other frameworks.

Using LangGraph assisted agentic execution

We explored several scenarios that involved agent to agent communication such as debugging technical issues with worker agents spanning across different teams and leveraging AI coding agents like codex or claude to collaborate with the worker agent for development. We detail an example of the latter scenario in the diagram below. The diagram depicts the interaction between the AI coding agent and the worker agent where the autonomous logic resides. The autonomous logic within the worker agent follows a logical four-stage progression applicable to most agentic workflows. This use case demonstrates how worker agents can be used to retrieve context that extend beyond context from source code, to notify other agents, and to trace agentic activity.

Intent Analysis: Upon entering an engineering intent in the IDE as natural language, the request is sent to a worker agent. The agent’s workflow, in this case, is orchestrated using LangGraph to analyze the intent and retrieve relevant context via MCP tools.
Planning and Notification: Once the context is established, the agent generates a structured, multi-step plan (Step 1 through Step N in the diagram). The plan is notified to the engineers via a communication channel (e.g., Slack, Teams, or Webex).
Execution and Tracking: The plan is then executed one step at a time in collaboration with the AI coding agent in the IDE. The agent leverages LangGraph’s checkpointing and state tracking mechanism to track execution state.
Validation & Closure: In the final step, once execution is complete, the worker agent closes the loop by validating that the plan executed matches the state of execution checkpointed in memory. The results are communicated to engineers as notification in their communication channel and saved in LangMem as long term state.

Given how the AI coding agent did not support native a2a capabilities, we built an MCP adapter tool that routed requests from AI coding agent to the worker agent. This approach thus makes the system IDE-agnostic.

Findings & Observations from Pilot Study

To evaluate the practical impact of agentic engineering, we applied this framework to real-world development, testing, and debugging workflows. Rather than optimizing individual tasks, we measured improvements in throughput without loss of quality when agents collaborated, selecting workflows that required coordination between at least two agents. To curate our baseline for our development and debug workflows, we conducted a bootcamp where our engineering teams huddled together to curate a list of use cases and computed the time it took to complete these workflows if they were to execute them without agents, based on historical evidence. We report numbers conservatively, in reality the gains maybe more.

We evaluated several debugging workflows that involve cross-team triage and root-cause analysis, with independent quality assessment by our QE team. Using time-to-root-cause as the primary metric, a pilot of 20+ workflows showed an overall 93% reduction relative to historical debug times. Several cross-team investigations completed in under five minutes of coordinated agent execution, with no measurable loss of quality as confirmed by an independent QE assessment. From a total of 512 debug sessions generated by 70 unique users in a span of a month, we computed over 200 man hours saved by leveraging our cross collaborative agentic workflows.

For development-focused workflows, the setup paired an IDE-based AI coding agent with our worker agent. Though this is not required, a key advantage of this was the system’s ability to retrieve project-specific context from our backend services, enabling more informed code generation and test plan generation. We also tested by shifting the planning responsibilities to the worker agent while maintaining long-term state in LangMem, allowing prior workflows to be indexed and reused. This significantly reduced onboarding overhead for repeat tasks.

Across 15+ development workflows, we observed over 65% reduction in execution time compared to historical baselines even with the worker agent in the equation. Importantly, the primary gains were not limited to faster code generation—which AI coding agents already perform well—but from compressing downstream workflows for functional testing after PR merge through coordinated agent execution. PR review process itself became the bottleneck introduced by human-in-the-loop.

How This System Differs from AI Coding Agents

There are several new capabilities offered by AI coding agents like Codex and Claude that augments software development. However, these agents operate at a fundamentally different level of abstraction than the agentic engineering system described here.

Codex-class models are often embedded within Worker Agents or as a component in the workflow as reasoning or code-generation engines.
While AI coding agents excel at translating intent into code, refactoring, explaining or debugging code in the context of a repository, they operate within a bounded user-driven interaction loop and is limited in its ability to orchestrate cross-team workflows. In contrast, this agentic engineering system is explicitly designed to behave like a loosely coupled engineering team that moves across developer and team boundaries.
AI coding agents and its subagents can perform parallel functions extremely well. The system introduced in this blog post is an explicit control plane for orchestrating end-to-end agentic engineering to move software quickly in the software engineering pipeline, for which we leveraged LangChain’s framework.

Conclusion

Agentic engineering represents a fundamental shift in how software is built by reorganizing work around AI systems that behave like real engineering teams and by leveraging what they can do well. Collectively, our study suggest that the primary impact of agentic engineering is not incremental task acceleration, but a structural shift in how software moves through the organization—compressing coordination overhead, reducing cross-team latency, sharing context, and redefining where human attention is most valuable. Frameworks like LangGraph make this operating model practical by treating collaboration, memory, and observability as first-class concerns. The benefit of the agentic engineering framework is the noticeable ease of ramp up into the software delivery pipeline with minimal setup required by engineers. Once the agents are configured, multiple teams can leverage the worker agent to fetch context from tools, both internal and external. The result is not faster code generation, but a more resilient, scalable, and fundamentally different way of delivering software.

---