Technical Architecture Design
Complete technical architecture of Agent MCP Container — from frontend to LLM
1. Problem and Background
This technical architecture is designed to solve two core problems:
- Skill Reuse — Each SKILL.md defines not just a prompt, but a complete skill workflow (phase breakdown, input constraints, output specifications, knowledge retrieval strategy). The same skill can be invoked by different AI Agent tools and reused across different projects within the same tool. The MCP protocol shields client differences, enabling skills to be "written once, used everywhere".
- Skill Organization Assetization — As the 15 skills across BA/SA/PM Agents (with more Agents to come) progressively mature, they are no longer just "AI conversation assistants" but form a manageable knowledge asset repository. When a new project starts, you can directly load existing skill packs instead of writing prompts from scratch each time. Skill version management, hot loading, and per-Agent isolation allow this asset to continuously accumulate and evolve.
2. Overall Architecture
Agent MCP Container is an intelligent agent container system based on the MCP (Model Context Protocol). It encapsulates the expert capabilities of three roles — BA (Business Analyst), SA (Solution Architect), and PM (Project Manager) — as independent MCP Servers, exposing Tools and Resources externally via the Streamable HTTP transport protocol.
The system adopts a "Shared Module + Multi-Agent Service + Skill Definition" layered architecture. Each Agent runs as an independent process on its own port, exposed uniformly through an Nginx reverse proxy. Clients (OpenClaw / Cursor / Qoder and other AI Agent tools) connect via the MCP protocol to initiate skill invocations.
OpenClaw / Cursor / Qoder
(MCP Streamable HTTP)"] end subgraph EDGE["Access Layer"] NGINX["Nginx Reverse Proxy
mcp-en.smartmoves.com.cn"] end subgraph AGENTS["Agent Service Layer (Independent Processes)"] BA["BA Server
Business Analyst
6 Requirements Analysis Skills"] SA["SA Server
Solution Architect
4 Architecture Design Skills"] PM["PM Server
Project Manager
5 Project Management Skills"] end subgraph SHARED["Shared Module shared/"] AC["AgentCore Engine
Four-Layer Prompt Assembly
Streaming LLM Calls"] SR["SkillRegistry
Skill Registration Center
SKILL.md Parsing & Schema Generation"] SM["SessionManager
Redis Session Management
Distributed Lock / History Compression"] end subgraph INFRA["Infrastructure"] REDIS["Redis
Session Storage"] QD["Qdrant Vector Database
Knowledge Retrieval"] LLM["LLM API
(OpenAI-compatible)"] EMBED["Embedding Model
bge-small-en-v1.5"] end C --> NGINX NGINX --> BA NGINX --> SA NGINX --> PM BA & SA & PM --> AC BA & SA & PM --> SR BA & SA & PM --> SM AC --> REDIS AC --> QD AC --> LLM SM --> REDIS QD --> EMBED
3. Full Request Processing Chain
When a user triggers an Agent skill in an AI Agent tool, the complete request processing chain is as follows:
4. MCP Server Layer
Each Agent is an independent FastMCP instance, responsible for:
- Dynamic Skill Registration: Scans the skills/{agent}/ directory at startup and registers each SKILL.md as an MCP Tool via SkillRegistry
- Protocol Message Handling: Responds to MCP protocol messages such as Initialize, tools/list, tools/call
- Streaming Response: LLM output is pushed to clients in real-time via SSE (Server-Sent Events)
- Session Management: Each request is associated with a unique Conversation ID (CCID); clients pass this ID back in subsequent calls to continue the session context
- Health Checks: Each Agent exposes a /health endpoint to view service status, active session count, and loaded skill list
5. AgentCore Engine
AgentCore is the core execution engine for each Agent, employing a unified four-layer prompt assembly mechanism:
| Layer | Content | Source |
|---|---|---|
| Role Layer | Agent identity definition, behavioral constraints, responsibility boundaries | Agent definition files (ba-master/pm-master/sa-master.md) |
| Skill Layer | The current skill's complete SKILL.md: execution steps, constraint rules, phased workflow | SKILL.md parsed by SkillRegistry |
| Knowledge Layer | Related domain knowledge retrieved from the engineering knowledge base (project management methodologies/business rules/architecture patterns, etc.) | QdrantRetriever vector retrieval |
| History Layer | Conversation history summary of the current session (auto-compressed when threshold is exceeded) | SessionManager |
LLM calls use a streaming output + exponential backoff retry strategy. Streaming tokens are pushed to clients one by one via SSE.
6. Skill Registry (SkillRegistry)
SkillRegistry scans SKILL.md files under skills/{agent}/, parses YAML Frontmatter and body content, and dynamically generates MCP Tool JSON Schemas. Agents automatically load all skills at startup — no manual configuration needed.
- Parameter Mapping: Parameters defined in SKILL.md are automatically mapped to MCP Tool inputSchema
- Multi-Agent Isolation: Each Agent only loads skills from its own directory; BA skills are not visible to SA, and vice versa
- Hot-Loading Support: After modifying a SKILL.md, simply restart the corresponding Agent for changes to take effect
7. Session Management (SessionManager)
Redis-based session lifecycle management with core design principles:
- CCID Protocol: The server returns a Conversation-ID on the first response; clients pass this ID back in subsequent calls. Requests with the same CCID are processed serially within the same session. Cross-client support is a "logical continuation" rather than shared state
- Distributed Lock: Only one request is allowed to execute at a time within the same session, preventing concurrent write conflicts
- History Compression: When conversation history exceeds the threshold, the earliest batches are automatically compressed into summaries to control the context window
- TTL Management: Each session has an expiration time; invalid sessions are periodically cleaned up
8. Qdrant Knowledge Retrieval
The system features a built-in Qdrant vector database providing RAG (Retrieval-Augmented Generation) capability for each Agent:
- Embedding Model: bge-small-en-v1.5 (approximately 90MB), auto-downloaded on first startup
- Knowledge Collections: Categorized by domain, e.g. pm-pure-project-management (project management methodologies), sa-architecture-patterns (architecture patterns), ba-business-rules (business rules), etc.
- Retrieval Strategy: Targeted retrieval from the corresponding Collection based on the current skill type; supports parallel multi-collection queries with result deduplication
- Result Injection: Retrieved knowledge snippets are injected into the Knowledge layer of the four-layer prompt for LLM reference
9. Deployment Topology
| Service | Path | Protocol |
|---|---|---|
| BA Agent | /ba/mcp | Streamable HTTP |
| SA Agent | /sa/mcp | Streamable HTTP |
| PM Agent | /pm/mcp | Streamable HTTP |
| Redis | — | RESP |
| Qdrant | — | gRPC / REST |