Swarmui Mcp Agent

1 MIT

A simple prototype SwarmUI MCP server with accompanying local LangChain Ollama agent

What is Swarmui Mcp Agent

SwarmUI_MCP_Agent is a simple prototype MCP server designed to facilitate text-to-image generation using a local LangChain Ollama agent. It primarily exposes the /GenerateText2Image endpoint.

Use cases

Use cases include generating artwork from descriptive text, creating visual content for marketing, and assisting in design processes by converting ideas into images.

How to use

To use SwarmUI_MCP_Agent, start the server by running the server.js file. You can then make POST requests to the /generate-image endpoint with your image prompt to generate images.

Key features

Key features include the ability to generate images from text prompts, integration with LangChain for enhanced functionality, and a simple Express server setup for handling requests.

Where to use

SwarmUI_MCP_Agent can be used in fields such as artificial intelligence, creative design, and any application requiring image generation from textual descriptions.

Content

SwarmUI_MCP_Agent

A simple prototype SwarmUI MCP server with an accompanying local LangChain Ollama agent. The prototype only exposes the
SwarmUI /GenerateText2Image endpoint. It uses Python, Node.js, TypeScript, LangChain, @ModelContextProtocol, Axios, Express,
and Pydantic (though Pydantic is mostly unused in this implementation).

SwarmUI MCP Server

The SwarmUI MCP Server prototype is declared in server.ts.

The /GenerateText2Image and /GetNewSession SwarmUI API endpoints are exposed via the generateImage() and getSessionId()
methods, which are described in the (MCP) Server capabilities config passed when instantiating the Server. The agent
uses this config to learn the input and output schemas and descriptions of the available tools, and the LLM relies
on the text descriptions to understand each tool.

When the server.js Node server is started, the SwarmUIServer is initialized and an Express server is created
with a POST endpoint at ‘/generate-image’ that calls SwarmUIServer.generateImage; the Express server is then
started. An Axios instance is created to make HTTP requests to the local SwarmUI API and is stored in the SwarmUIServer class.

DEV NOTE: You will likely need to change the port used by SwarmUI as well as the model hardcoded for image generation.
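The request flow described above (get a session id, then request an image) can be sketched in Python. This is an illustration only, not the project's server.ts: the /API path prefix and the JSON field names reflect my understanding of SwarmUI's HTTP API, while the port 7801, the model name, and the helper names build_generate_payload and post_json are assumptions made for this sketch. The port and model are the two values the DEV NOTE says you will likely need to change.

```python
import json
import urllib.request

# Assumptions for this sketch (not taken from the project):
# the port and the model name are placeholders to adapt to your SwarmUI install.
SWARMUI_BASE_URL = "http://localhost:7801"
DEFAULT_MODEL = "your-model-name"  # hypothetical placeholder


def build_generate_payload(session_id, prompt, model=DEFAULT_MODEL,
                           images=1, width=1024, height=1024):
    """Build the JSON body for a GenerateText2Image request."""
    return {
        "session_id": session_id,
        "images": images,
        "prompt": prompt,
        "model": model,
        "width": width,
        "height": height,
    }


def post_json(path, body, base_url=SWARMUI_BASE_URL):
    """POST a JSON body to the SwarmUI API and return the decoded reply."""
    request = urllib.request.Request(
        base_url + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def generate_image(prompt):
    """Fetch a session id, then request one image for the prompt."""
    session = post_json("/API/GetNewSession", {})
    payload = build_generate_payload(session["session_id"], prompt)
    return post_json("/API/GenerateText2Image", payload)
```

Keeping the payload builder separate from the network call makes the hardcoded generation settings easy to inspect and change in one place.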

GenerateImage StructuredTool

A generate_image StructuredTool is declared in tools.py that POSTs to the /generate-image endpoint of the Express server
started by the SwarmUIServer. The description of the StructuredTool and its docstring are used by the LLM
to understand the tool. In theory, Pydantic models should validate the tool input, but the current setup doesn’t work for
multiple inputs yet.
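As a rough illustration of what such a tool does, the sketch below shows a function that POSTs a prompt to the Express endpoint. It is not the project's tools.py: the port 3000, the {"prompt": ...} body shape, and the helper name build_request are assumptions, and the input check is a plain-Python stand-in for the Pydantic validation mentioned above.

```python
import json
import urllib.request

# Hypothetical address of the Express server; the port is an assumption.
EXPRESS_URL = "http://localhost:3000/generate-image"


def build_request(prompt, url=EXPRESS_URL):
    """Build the POST request; the {"prompt": ...} body shape is assumed."""
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt must be a non-empty string")
    return urllib.request.Request(
        url,
        data=json.dumps({"prompt": prompt.strip()}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_image(prompt):
    """Generate an image from a text prompt via the local SwarmUI MCP server."""
    with urllib.request.urlopen(build_request(prompt)) as response:
        return response.read().decode("utf-8")
```

A function like this could be wrapped with LangChain's StructuredTool.from_function, in which case the docstring or an explicit description becomes the text the LLM sees.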

LangChain Ollama Agent

A local Ollama agent (currently a dolphin-mistral model) is instantiated with tool-calling instructions, a max_iterations of 1,
an AgentType of ZERO_SHOT_REACT_DESCRIPTION, and the generate_image StructuredTool. User input is taken as the image
prompt, and the agent is then invoked with that input. The agent uses the generate_image Action with the supplied prompt
as Action Input, which calls the local SwarmUI API /GenerateText2Image endpoint with the prompt and other hardcoded image
generation settings.
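To make the Action / Action Input step concrete, here is a minimal sketch of how a ReAct-style reply can be split into a tool name and its input. This is not LangChain's own output parser; the regular expression, the parse_action name, and the sample reply are all illustrative assumptions.

```python
import re

# Matches a single-line "Action:" followed by a single-line "Action Input:".
ACTION_PATTERN = re.compile(
    r"Action\s*:\s*(?P<tool>[^\n]+)\n+Action Input\s*:\s*(?P<input>[^\n]+)"
)


def parse_action(llm_output):
    """Return (tool_name, tool_input), or None if no action is present."""
    match = ACTION_PATTERN.search(llm_output)
    if match is None:
        return None
    tool = match.group("tool").strip()
    # Models often wrap the input in quotes; strip them for convenience.
    tool_input = match.group("input").strip().strip('"')
    return tool, tool_input


reply = (
    "Thought: I should draw this.\n"
    "Action: generate_image\n"
    'Action Input: "a red fox in snow"'
)
print(parse_action(reply))
```

With max_iterations set to 1, the agent gets a single such Action before stopping, which suits a one-shot image request.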

Starting the SwarmUI MCP Server

From the /SwarmUI_MCP_Agent/src directory, run: tsc server.ts && node server.js

Running the Agent

With the SwarmUIMCPServer running and SwarmUI itself running, from the /SwarmUI_MCP_Agent directory, run: python run.py

Viewing Generated Images

Generated images are available in the SwarmUI Image History or directly in the SwarmUI /Output directory.

Quickstart

1. Clone Repository

git clone
cd

2. Install Dependencies (Script)

Run setup.sh

2B. Install Dependencies (Manual)

npm install
pip install -r requirements.txt

3. Start the API server

tsc server.ts && node server.js

4. Start the UI

streamlit run run.py

5. Ensure SwarmUI is running and accessible.

6. Enter your SwarmUI port number under Agent Settings > SwarmUI API Port.

7. Select an LLM Model and (optionally) a Vision Model.

8. Enter an image request and click submit. If the models have not been downloaded yet, wait for the download to finish.
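For the step that asks you to ensure SwarmUI is running and accessible, a quick check is to see whether its port accepts TCP connections. The sketch below is a generic reachability test, not part of the project; the is_port_open name is hypothetical, and you would pass the host and port your own SwarmUI instance uses.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, is_port_open("127.0.0.1", 7801) would test a SwarmUI instance on port 7801, where 7801 is an assumed value. A True result only shows that something is listening on the port, not that the SwarmUI API itself is healthy.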
