Technical Architecture Design

Complete technical architecture of Agent MCP Container — from frontend to LLM

1. Problem and Background

This technical architecture is designed to solve two core problems:

Skill Reuse — Each SKILL.md defines not just a prompt, but a complete skill workflow (phase breakdown, input constraints, output specifications, knowledge retrieval strategy). The same skill can be invoked by different AI Agent tools and reused across different projects within the same tool. The MCP protocol shields client differences, enabling skills to be "written once, used everywhere".
Skill Organization Assetization — As the 15 skills across BA/SA/PM Agents (with more Agents to come) progressively mature, they are no longer just "AI conversation assistants" but form a manageable knowledge asset repository. When a new project starts, you can directly load existing skill packs instead of writing prompts from scratch each time. Skill version management, hot loading, and per-Agent isolation allow this asset to continuously accumulate and evolve.

2. Overall Architecture

Agent MCP Container is an intelligent agent container system based on the MCP (Model Context Protocol). It encapsulates the expert capabilities of three roles — BA (Business Analyst), SA (Solution Architect), and PM (Project Manager) — as independent MCP Servers, exposing Tools and Resources externally via the Streamable HTTP transport protocol.

The system adopts a "Shared Module + Multi-Agent Service + Skill Definition" layered architecture. Each Agent runs as an independent process on its own port, exposed uniformly through an Nginx reverse proxy. Clients (OpenClaw / Cursor / Qoder and other AI Agent tools) connect via the MCP protocol to initiate skill invocations.

graph TB subgraph FRONT["Clients"] C["AI Agent Tools
OpenClaw / Cursor / Qoder
(MCP Streamable HTTP)"] end subgraph EDGE["Access Layer"] NGINX["Nginx Reverse Proxy
mcp-en.smartmoves.com.cn"] end subgraph AGENTS["Agent Service Layer (Independent Processes)"] BA["BA Server
Business Analyst
6 Requirements Analysis Skills"] SA["SA Server
Solution Architect
4 Architecture Design Skills"] PM["PM Server
Project Manager
5 Project Management Skills"] end subgraph SHARED["Shared Module shared/"] AC["AgentCore Engine
Four-Layer Prompt Assembly
Streaming LLM Calls"] SR["SkillRegistry
Skill Registration Center
SKILL.md Parsing & Schema Generation"] SM["SessionManager
Redis Session Management
Distributed Lock / History Compression"] end subgraph INFRA["Infrastructure"] REDIS["Redis
Session Storage"] QD["Qdrant Vector Database
Knowledge Retrieval"] LLM["LLM API
(OpenAI-compatible)"] EMBED["Embedding Model
bge-small-en-v1.5"] end C --> NGINX NGINX --> BA NGINX --> SA NGINX --> PM BA & SA & PM --> AC BA & SA & PM --> SR BA & SA & PM --> SM AC --> REDIS AC --> QD AC --> LLM SM --> REDIS QD --> EMBED

3. Full Request Processing Chain

When a user triggers an Agent skill in an AI Agent tool, the complete request processing chain is as follows:

sequenceDiagram participant C as Client participant N as Nginx participant S as MCP Server participant SM as SessionManager participant SR as SkillRegistry participant Q as QdrantRetriever participant AC as AgentCore participant LLM as LLM API C->>N: POST /{agent}/mcp N->>S: Forward by port S->>SM: Get/Create Redis session SM-->>S: Return session context + CCID S->>SR: Look up skill definition SR-->>S: Return Skill object (param Schema) S->>AC: Execute skill (message + session) AC->>SM: Get conversation history AC->>Q: Retrieve knowledge by skill Q-->>AC: Return vector retrieval results AC->>AC: Four-layer prompt assembly AC->>LLM: Streaming HTTP request LLM-->>AC: SSE streaming tokens AC-->>S: Real-time streaming output S-->>C: Streaming response + [DOC] marker AC->>SM: Update session history

4. MCP Server Layer

Each Agent is an independent FastMCP instance, responsible for:

Dynamic Skill Registration: Scans the skills/{agent}/ directory at startup and registers each SKILL.md as an MCP Tool via SkillRegistry
Protocol Message Handling: Responds to MCP protocol messages such as Initialize, tools/list, tools/call
Streaming Response: LLM output is pushed to clients in real-time via SSE (Server-Sent Events)
Session Management: Each request is associated with a unique Conversation ID (CCID); clients pass this ID back in subsequent calls to continue the session context
Health Checks: Each Agent exposes a /health endpoint to view service status, active session count, and loaded skill list

5. AgentCore Engine

AgentCore is the core execution engine for each Agent, employing a unified four-layer prompt assembly mechanism:

Layer	Content	Source
Role Layer	Agent identity definition, behavioral constraints, responsibility boundaries	Agent definition files (ba-master/pm-master/sa-master.md)
Skill Layer	The current skill's complete SKILL.md: execution steps, constraint rules, phased workflow	SKILL.md parsed by SkillRegistry
Knowledge Layer	Related domain knowledge retrieved from the engineering knowledge base (project management methodologies/business rules/architecture patterns, etc.)	QdrantRetriever vector retrieval
History Layer	Conversation history summary of the current session (auto-compressed when threshold is exceeded)	SessionManager

LLM calls use a streaming output + exponential backoff retry strategy. Streaming tokens are pushed to clients one by one via SSE.

6. Skill Registry (SkillRegistry)

SkillRegistry scans SKILL.md files under skills/{agent}/, parses YAML Frontmatter and body content, and dynamically generates MCP Tool JSON Schemas. Agents automatically load all skills at startup — no manual configuration needed.

Parameter Mapping: Parameters defined in SKILL.md are automatically mapped to MCP Tool inputSchema
Multi-Agent Isolation: Each Agent only loads skills from its own directory; BA skills are not visible to SA, and vice versa
Hot-Loading Support: After modifying a SKILL.md, simply restart the corresponding Agent for changes to take effect

7. Session Management (SessionManager)

Redis-based session lifecycle management with core design principles:

CCID Protocol: The server returns a Conversation-ID on the first response; clients pass this ID back in subsequent calls. Requests with the same CCID are processed serially within the same session. Cross-client support is a "logical continuation" rather than shared state
Distributed Lock: Only one request is allowed to execute at a time within the same session, preventing concurrent write conflicts
History Compression: When conversation history exceeds the threshold, the earliest batches are automatically compressed into summaries to control the context window
TTL Management: Each session has an expiration time; invalid sessions are periodically cleaned up

8. Qdrant Knowledge Retrieval

The system features a built-in Qdrant vector database providing RAG (Retrieval-Augmented Generation) capability for each Agent:

Embedding Model: bge-small-en-v1.5 (approximately 90MB), auto-downloaded on first startup
Knowledge Collections: Categorized by domain, e.g. pm-pure-project-management (project management methodologies), sa-architecture-patterns (architecture patterns), ba-business-rules (business rules), etc.
Retrieval Strategy: Targeted retrieval from the corresponding Collection based on the current skill type; supports parallel multi-collection queries with result deduplication
Result Injection: Retrieved knowledge snippets are injected into the Knowledge layer of the four-layer prompt for LLM reference

9. Deployment Topology

Service	Path	Protocol
BA Agent	/ba/mcp	Streamable HTTP
SA Agent	/sa/mcp	Streamable HTTP
PM Agent	/pm/mcp	Streamable HTTP
Redis	—	RESP
Qdrant	—	gRPC / REST