Agent Training Mcp

@FwithBon a year ago

2 MIT

FreeCommunity

AI Systems

# MCP Learning: AI Agent Training Project Based on YOLO

What is Agent Training Mcp

agent_training_mcp is an AI agent training project based on the YOLO architecture, utilizing a client-server model to enable natural language control for training YOLO models.

Use cases

Use cases include training YOLO models to recognize specific objects (e.g., cats and dogs) through natural language commands, remote monitoring of training processes, and enabling multiple users to access the training server simultaneously.

How to use

To use agent_training_mcp, start by running the server on a machine with GPU using ‘python server.py’. Then, configure the client by modifying the server URL in ‘main.py’ and setting the OpenRouter API key. Finally, run the client with ‘python main.py’ to interact with the training process using natural language commands.

Key features

Key features include a client-server architecture for distributed training control, natural language instruction parsing via OpenRouter API, remote training control from lightweight devices, flexible deployment across different machines, and real-time feedback on training status and results.

Where to use

agent_training_mcp can be used in various fields such as computer vision, machine learning research, and AI development, particularly for applications requiring object detection and classification.

Clients Supporting MCP

The following are the main client software that supports the Model Context Protocol. Click the link to visit the official website for more information.

Claude Desktop: Official desktop application from Anthropic, natively supports MCP protocol. claude.ai

Cherry Studio: Cross-platform desktop client supporting multiple LLM providers, built-in MCP server support. cherry-ai.com

LobeChat: Modern open-source ChatGPT/LLMs UI, supports MCP protocol integration. lobehub.com

DeepChat: Cross-platform desktop AI assistant, compatible with MCP protocol, focusing on privacy and efficiency. deepchat.thinkinai.xyz

5ire: Cross-platform open-source desktop intelligent assistant MCP client, supports local knowledge base and MCP server. 5ire.app

View More MCP Clients

Overview

What is Agent Training Mcp

agent_training_mcp is an AI agent training project based on the YOLO architecture, utilizing a client-server model to enable natural language control for training YOLO models.

Use cases

How to use

Key features

Where to use

agent_training_mcp can be used in various fields such as computer vision, machine learning research, and AI development, particularly for applications requiring object detection and classification.

Clients Supporting MCP

The following are the main client software that supports the Model Context Protocol. Click the link to visit the official website for more information.

Claude Desktop: Official desktop application from Anthropic, natively supports MCP protocol. claude.ai

Cherry Studio: Cross-platform desktop client supporting multiple LLM providers, built-in MCP server support. cherry-ai.com

LobeChat: Modern open-source ChatGPT/LLMs UI, supports MCP protocol integration. lobehub.com

DeepChat: Cross-platform desktop AI assistant, compatible with MCP protocol, focusing on privacy and efficiency. deepchat.thinkinai.xyz

5ire: Cross-platform open-source desktop intelligent assistant MCP client, supports local knowledge base and MCP server. 5ire.app

View More MCP Clients

Content

YOLO MCP 训练助手

这是一个基于MCP架构的YOLO训练系统，通过客户端-服务器模式实现了自然语言控制YOLO模型训练的功能。本项目采用消息通信协议(MCP)将原本单机的YOLO训练系统拆分为服务器端和客户端两个独立组件，使用户可以从任何设备远程控制训练过程。客户端通过OpenRouter API解析自然语言指令，将其转换为结构化请求发送给服务器，服务器执行实际训练任务并返回结果。

项目特点

MCP客户端-服务器架构：将用户交互与计算密集型任务分离，实现分布式训练控制
自然语言指令解析：集成OpenRouter API，支持中文自然语言控制训练流程
远程训练控制：可以在笔记本等轻量级设备上控制服务器的训练任务
灵活部署：客户端和服务器可以部署在不同机器上，通过网络通信
实时结果反馈：完整的训练状态和结果返回机制

项目背景与动机

本项目是对原单机版YOLO Agent训练助手的架构升级，解决了以下问题：

资源分离：将高性能计算任务与用户交互界面分离，优化资源使用
远程控制：支持从任何设备远程启动和监控训练过程
多用户支持：为未来支持多用户同时访问训练服务器奠定基础
持续服务：服务端可以持续运行，客户端可以随时连接或断开

项目结构

agent_training_mcp/
├── server.py        # MCP服务器，接收训练请求并执行训练
├── train.py         # 训练模块，负责调用Ultralytics YOLO进行训练
├── main.py          # MCP客户端，处理用户交互、指令解析和请求发送
├── requirements.txt # 项目依赖列表
└── README.md        # 项目说明文档

组件说明

server.py：
- 使用Flask框架创建HTTP服务器
- 提供/train接口接收训练请求
- 调用train.py中的训练函数执行实际训练
- 将训练结果以JSON格式返回给客户端
train.py：
- 封装YOLOv8训练功能为可调用函数
- 处理各种训练参数（模型类型、轮数、数据集等）
- 提供标准化的返回格式，包含训练状态和结果信息
main.py：
- 提供用户交互界面（命令行）
- 通过OpenRouter API解析自然语言指令
- 将解析结果转换为HTTP请求发送给服务器
- 接收并显示服务器返回的训练结果

环境与依赖

服务器端依赖

Python 3.6+
ultralytics：YOLOv8模型训练库
flask：HTTP服务器框架

客户端依赖

Python 3.6+
openai：用于调用OpenRouter API
requests：用于发送HTTP请求

安装依赖

# 服务器端
pip install flask ultralytics

# 客户端
pip install requests openai

使用方法

步骤1：启动服务器

在训练服务器（通常是具有GPU的机器）上运行：

python server.py

服务器将在0.0.0.0:5000端口监听训练请求。

步骤2：配置客户端

在客户端（可以是任何设备）上，修改main.py中的服务器URL：

# 将此行
url = 'http://localhost:5000/train'
# 修改为服务器的IP地址
url = 'http://服务器IP:5000/train'

同时确保设置了OpenRouter API密钥：

# 方法1：设置环境变量
export OPENROUTER_API_KEY=您的API密钥

# 方法2：直接在代码中设置
API_KEY = "您的API密钥"

步骤3：运行客户端

在客户端机器上执行：

python main.py

客户端将显示菜单，您可以选择：

使用自然语言指令训练模型（例如：“训练一个yolov8n模型识别猫狗，跑3轮”）
手动指定训练参数
退出程序

实际测试效果

本项目已在以下环境成功测试：

服务器：Windows PC（192.168.1.6）运行server.py和train.py
客户端：笔记本电脑运行main.py，通过WiFi连接到服务器
笔记本成功解析自然语言指令，发送到PC，PC执行训练并返回结果

技术实现详解

MCP架构实现

项目采用了简单而有效的MCP（消息通信协议）实现方式：

通信协议：使用HTTP协议和JSON格式作为基础通信机制
API端点：服务器定义/train端点接收POST请求
消息格式：
- 请求体：{"model_type": "yolov8n", "epochs": 1, "data": "coco128.yaml"}
- 响应体：{"status": "success/error", "message": "描述", "details": "详情"}

自然语言解析

客户端使用OpenRouter API进行自然语言解析：

定义function calling格式，指定训练函数和参数结构
将用户输入发送给LLM，请求其解析为函数调用
解析返回的函数调用参数，转换为HTTP请求参数

错误处理机制

系统实现了多层次的错误处理：

客户端解析错误：当LLM无法解析指令时，使用默认参数
网络通信错误：添加重试机制和超时处理
服务器执行错误：捕获训练过程异常并返回详细错误信息
日志记录：使用logging模块跟踪系统运行状态和错误

与单机版YOLO Agent的区别

特性	单机版YOLO Agent	MCP架构YOLO Agent
架构	单体应用，直接函数调用	客户端-服务器分离，通过HTTP通信
部署	单一设备	可分布在多个设备上
资源使用	用户交互和训练在同一设备	将计算密集型任务放在专用服务器
远程控制	不支持	支持从任何设备远程控制
可扩展性	有限	高（支持多客户端、负载均衡等）
状态保持	必须保持应用运行	服务器可持续运行，客户端可随时连接

未来计划

实时进度反馈：实现训练过程中的实时进度推送（WebSocket/SSE）
多任务队列：支持多个训练任务排队和并发执行
用户认证：添加认证机制，确保只有授权用户可以使用服务
Web界面：开发基于浏览器的用户界面，替代命令行客户端
分布式训练：支持跨多台服务器的分布式训练任务

贡献与许可证

贡献：欢迎通过Issues和Pull Requests参与项目改进
许可证：本项目采用MIT许可证

Dev Tools Supporting MCP

The following are the main code editors that support the Model Context Protocol. Click the link to visit the official website for more information.

Zed: High-performance collaborative code editor, supports MCP protocol, providing a smooth programming experience. zed.dev

Cursor: AI code editor built on VS Code, supports MCP protocol for context-aware programming. cursor.com

Windsurf: AI code editor from Codeium, integrates MCP protocol to provide intelligent code assistance. windsurf.com

Continue: Open-source AI programming assistant plugin, supports VS Code and JetBrains, compatible with MCP protocol. continue.dev

Trae: AI-driven code editor, supports MCP protocol, focusing on enhancing developer programming experience. trae.ai

View More MCP Dev Tools

Tools

No tools

Comments

Recommend MCP Servers

Tavily MCP Server The Tavily MCP server provides: search, extract, map, crawl tools Real-time web search capabilities through the tavily-search tool Intelligent data extraction from web pages via the tavily-extract tool Powerful web mapping tool that creates a structured map of website Web crawler that systematically explores websites.

MCP Server Chart This is a TypeScript-based MCP server that provides chart generation capabilities. It allows you to create various types of charts through MCP tools. You can also use it in Dify.

GitHub MCP Server MCP Server for the GitHub API, enabling file operations, repository management, search functionality, and more.

Brave Search MCP Server Web and local search using Brave's Search API

Firecrawl MCP Server Advanced web scraping with JavaScript rendering, PDF support, and smart rate limiting

Context7 MCP LLMs rely on outdated or generic information about the libraries you use. You get:

Slack MCP server Channel management and messaging capabilities

Sequential Thinking MCP Server Dynamic and reflective problem-solving through thought sequences

Fetch MCP Server A Model Context Protocol server that provides web content fetching capabilities.

Playwright MCP A Model Context Protocol (MCP) server that provides browser automation capabilities using [Playwright](https://playwright.dev). This server enables LLMs to interact with web pages through structured accessibility snapshots, bypassing the need for screenshots or visually-tuned models.

View All MCP Servers