What is mcp-slurm?
mcp-slurm is a Model Context Protocol (MCP) server designed to manage SLURM (Simple Linux Utility for Resource Management) clusters. It enables AI assistants to interact with High-Performance Computing (HPC) clusters via SSH for job submission, resource checking, queue management, and job status monitoring.
Use cases
Use cases for mcp-slurm include submitting computational jobs for simulations in physics, managing data processing tasks in bioinformatics, running machine learning models on HPC clusters, and monitoring job statuses in large-scale data analysis projects.
How to use
To use mcp-slurm, first clone the repository and install the necessary dependencies. Configure the connection details in a .env file, including your cluster’s login node and authentication method. Start the server with npm start; it listens on port 1337 by default.
Key features
Key features of mcp-slurm include cluster information querying, job submission with customizable parameters, job management capabilities (cancel, hold, suspend, etc.), script upload for direct execution, file operations for managing job outputs, and secure SSH connectivity.
Where to use
mcp-slurm is primarily used in fields that require high-performance computing, such as scientific research, data analysis, machine learning, and simulations, where managing computing resources efficiently is crucial.
SLURM MCP Server
A Model Context Protocol (MCP) server for managing SLURM (Simple Linux Utility for Resource Management) clusters. This server allows AI assistants to interact with HPC clusters via SSH to submit jobs, check resources, manage queues, and monitor job status.
Features
- Cluster Information: Query node status, partitions, and resource availability
- Job Submission: Submit jobs with customizable parameters including resource requests
- Job Management: Cancel, hold, release, suspend, resume, and modify running jobs
- Script Upload: Upload and execute job scripts directly to the cluster
- File Operations: View job outputs, list directories, and manage files
- SSH Connectivity: Secure connection to login nodes with password or key authentication
Quick Start
1. Installation
# Clone the repository
git clone <your-repo>
cd mcp-slurm
# Install dependencies
npm install
# Build the project
npm run build
2. Configuration
Create a .env file in the project root with your cluster connection details:
# Required: Cluster connection details
SLURM_HOST=your-cluster-login-node.example.com
SLURM_USERNAME=your-username
# Authentication (choose one)
SLURM_PASSWORD=your-password
# OR
SLURM_SSH_KEY_PATH=/path/to/your/private/key
# Optional: Connection settings
SLURM_PORT=22
# Optional: Default SLURM parameters
SLURM_DEFAULT_PARTITION=compute
SLURM_DEFAULT_ACCOUNT=your-account
3. Running the Server
# Start the server
npm start
# Or run in development mode
npm run watch
The server will start on port 1337 by default.
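To confirm the server is up, you can check that the port is bound (a quick sanity check; 1337 is the documented default):
# macOS/Linux: is anything listening on 1337?
lsof -i :1337
# or, on Linux
ss -tln | grep 1337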
Tools Available
1. slurm_info
Get cluster information including nodes, partitions, queues, and job accounting.
Parameters:
- command_type: Type of command (sinfo, squeue, sacct, scontrol)
- detailed: Get detailed output (optional)
- partition: Query specific partition (optional)
- node: Query specific node (optional)
Examples:
- Check node status:
{command_type: "sinfo", detailed: true}
- View job queue:
{command_type: "squeue"}
- Check specific partition:
{command_type: "sinfo", partition: "gpu"}
2. slurm_submit
Submit jobs to the SLURM scheduler with customizable parameters.
Parameters:
- job_name: Name for the job
- command: Command or script to execute
- partition: Partition to submit to (optional)
- nodes: Number of nodes (optional)
- cpus_per_task: CPUs per task (optional)
- memory: Memory per node (optional)
- time_limit: Time limit (optional)
- account: Account to charge (optional)
- And many more…
Example:
{
"job_name": "my_simulation",
"command": "python simulate.py",
"nodes": 2,
"cpus_per_task": 16,
"memory": "64G",
"time_limit": "2:00:00",
"partition": "compute"
}
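For orientation, the request above corresponds roughly to this sbatch invocation (how the server assembles the command internally is not documented, so treat this as an approximation):
sbatch --job-name=my_simulation \
  --nodes=2 \
  --cpus-per-task=16 \
  --mem=64G \
  --time=2:00:00 \
  --partition=compute \
  --wrap="python simulate.py"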
3. slurm_job_control
Control SLURM jobs: cancel, hold, release, suspend, resume, requeue, or modify.
Parameters:
- job_id: Job ID to control
- action: Action to perform (cancel, hold, release, etc.)
- reason: Reason for action (optional)
- modify_parameter: Parameter to modify (for the modify action)
- modify_value: New value (for the modify action)
Examples:
- Cancel job:
{job_id: "12345", action: "cancel", reason: "User request"}
- Hold job:
{job_id: "12345", action: "hold"}
- Modify time limit:
{job_id: "12345", action: "modify", modify_parameter: "TimeLimit", modify_value: "4:00:00"}
4. slurm_script
Upload job scripts and submit them to SLURM.
Parameters:
- script_name: Name for the script file
- script_content: Content of the job script
- remote_path: Directory to store the script (optional)
- submit_immediately: Whether to submit after upload (default: true)
- additional_sbatch_args: Extra sbatch arguments (optional)
Example:
{
"script_name": "job.slurm",
"script_content": "#!/bin/bash\n#SBATCH --job-name=test\n#SBATCH --time=1:00:00\n\necho 'Hello from SLURM!'",
"submit_immediately": true
}
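Once uploaded, the script_content above is written out verbatim, so the file on the cluster reads:
#!/bin/bash
#SBATCH --job-name=test
#SBATCH --time=1:00:00

echo 'Hello from SLURM!'
With submit_immediately set to true, the server then submits it, which is equivalent to running sbatch job.slurm in the remote directory.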
5. slurm_files
Manage files on the cluster including viewing job outputs.
Parameters:
- action: Action to perform (list, view, tail, head, delete, find_outputs)
- path: File or directory path (optional)
- job_id: Job ID to find outputs for (optional)
- lines: Number of lines to show (optional)
- pattern: Search pattern (optional)
Examples:
- List home directory:
{action: "list"}
- View job output:
{action: "view", path: "slurm-12345.out"}
- Find job outputs:
{action: "find_outputs", job_id: "12345"}
- Tail log file:
{action: "tail", path: "job.log", lines: 100}
Claude Desktop Integration
Local Development
Add this to your Claude Desktop configuration:
Windows: %APPDATA%/Claude/claude_desktop_config.json
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
{
"mcpServers": {
"mcp-slurm": {
"command": "node",
"args": [
"C:/Users/tejasv/Documents/mcp-slurm/dist/index.js"
],
"env": {
"SLURM_HOST": "your-cluster.example.com",
"SLURM_USERNAME": "your-username",
"SLURM_PASSWORD": "your-password"
}
}
}
}
After Publishing to npm
{
"mcpServers": {
"mcp-slurm": {
"command": "npx",
"args": [
"mcp-slurm"
],
"env": {
"SLURM_HOST": "your-cluster.example.com",
"SLURM_USERNAME": "your-username",
"SLURM_PASSWORD": "your-password"
}
}
}
}
Security Considerations
- Store sensitive credentials securely (use SSH keys when possible; see the sketch after this list)
- Limit the MCP server’s access to specific user accounts
- Consider using dedicated service accounts for automated operations
- Review and audit job submissions regularly
- Use network restrictions to limit access to trusted hosts
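A minimal sketch of the key-based setup (the key filename and the source-address restriction are illustrative choices, not requirements of mcp-slurm):
# generate a dedicated key for the MCP server
ssh-keygen -t ed25519 -f ~/.ssh/mcp_slurm_key -C "mcp-slurm"
# install the public key on the cluster
ssh-copy-id -i ~/.ssh/mcp_slurm_key.pub your-username@your-cluster-login-node.example.com
# optionally restrict it in the cluster's ~/.ssh/authorized_keys, e.g.:
# from="203.0.113.0/24",no-port-forwarding,no-X11-forwarding ssh-ed25519 AAAA... mcp-slurm
Then point SLURM_SSH_KEY_PATH at ~/.ssh/mcp_slurm_key in your .env file.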
Common Use Cases
Checking Cluster Status
Ask Claude: “What’s the current status of the cluster?” or “Show me available nodes in the GPU partition”
Submitting Jobs
Ask Claude: “Submit a job named ‘data_analysis’ that runs ‘python analyze.py’ using 4 CPUs and 16GB memory for 2 hours”
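Behind the scenes, that request should map onto a slurm_submit call along these lines (the exact arguments Claude chooses may vary):
{
  "job_name": "data_analysis",
  "command": "python analyze.py",
  "cpus_per_task": 4,
  "memory": "16G",
  "time_limit": "2:00:00"
}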
Monitoring Jobs
Ask Claude: “Show me all my running jobs” or “What’s the status of job 12345?”
Managing Outputs
Ask Claude: “Show me the output of job 12345” or “Find all output files for my recent jobs”
Script Management
Ask Claude: “Upload and submit this job script: [paste script content]”
Troubleshooting
Connection Issues
- Verify SSH connectivity: ssh username@hostname (see the checks after this list)
- Check firewall rules and network access
- Ensure SSH key permissions are correct (600)
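Concretely, run these from the machine hosting the MCP server (the hostname and key path are placeholders):
# verbose mode surfaces where authentication or networking fails
ssh -v -p 22 your-username@your-cluster-login-node.example.com exit
# private keys must be readable by you alone
chmod 600 /path/to/your/private/key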
Authentication Errors
- Verify username and password/key path
- Check if two-factor authentication is required
- Ensure the user has SLURM access
Job Submission Failures
- Check if default partition/account are set correctly
- Verify resource requests are within limits (see the commands after this list)
- Check SLURM configuration and policies
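Standard SLURM diagnostics that help here (run on the login node; the partition name is a placeholder):
# partition limits: MaxTime, MaxNodes, allowed accounts
scontrol show partition compute
# your account/QOS associations
sacctmgr show assoc user=$USER format=account,partition,qos
# why a pending job has not started (NODELIST(REASON) column)
squeue -j 12345 -l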
Development
To add new SLURM functionality:
- Create a new tool in src/tools/
- Extend the SlurmSSHClient if needed
- Build and test: npm run build && npm start
The framework automatically discovers and loads tools from the src/tools/ directory.
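The tool contract isn’t spelled out here, so the following is only a sketch of what a new module under src/tools/ might look like, assuming the framework picks up an exported object with a name, description, input schema, and handler (every identifier below is an assumption modeled on common MCP tool shapes, not this repo’s actual interface):
// src/tools/slurm_usage.ts — hypothetical tool summarizing a user's recent jobs
// NOTE: the exported shape is an assumption; mirror the existing tools in src/tools/.
export const slurmUsageTool = {
  name: "slurm_usage",
  description: "Summarize a user's recent jobs via sacct",
  inputSchema: {
    type: "object",
    properties: {
      user: { type: "string", description: "Cluster username" },
      days: { type: "number", description: "Look-back window in days" },
    },
    required: ["user"],
  },
  // assumed handler signature: validated args plus an SSH client wrapper
  async handler(args: { user: string; days?: number }, ssh: { exec(cmd: string): Promise<string> }) {
    const days = args.days ?? 7;
    // sacct flags are standard SLURM; -S accepts relative times like now-7days
    return ssh.exec(
      `sacct -u ${args.user} -S now-${days}days --format=JobID,JobName,State,Elapsed`
    );
  },
};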
License
This project is licensed under the MIT License.