BitNet VSCode Extension
Overview
What is BitNet VSCode Extension?
BitNet-VSCode-Extension is a Visual Studio Code extension that lets users run BitNet through Model Context Protocol (MCP) endpoints, making it easy to manage and interact with BitNet server instances.
Use cases
Use cases include developing applications that leverage AI models, testing server performance, managing multiple BitNet instances, and integrating AI capabilities into development workflows.
How to use
To use BitNet-VSCode-Extension, install the extension and Docker. Open the Command Palette in VSCode with Ctrl+Shift+P, type ‘BitNet’ to see the available commands, then select ‘BitNet: Initialize MCP Server’ to download the Docker image. After that, start the server with ‘BitNet: Start Server’ and configure GitHub Copilot to connect to the server.
Key features
Key features include pulling Docker images from Docker Hub, creating Docker containers for BitNet, enabling parallel communication with multiple BitNet servers, integration with GitHub Copilot, running performance benchmarks, and checking server statuses.
Where to use
BitNet-VSCode-Extension can be used in software development environments, particularly for projects that require AI model interactions, REST API management, and performance testing.
Content
BitNet VSCode Extension
This VSCode extension fully automates the deployment and management of the FastAPI-BitNet backend. It runs the powerful REST API inside a Docker container and exposes its capabilities directly to GitHub Copilot Chat as a set of tools.
With this extension, you can initialize, manage, and communicate with multiple BitNet model instances (both llama-server and persistent llama-cli sessions) directly from your editor’s chat panel.
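If you want to sanity-check that the backend the extension started is actually running, a quick look at Docker and the MCP port is enough. The sketch below is illustrative only: it assumes the MCP endpoint is at 127.0.0.1:8080, as in the setup instructions later in this document, and the container and image names are whatever the extension pulled.

```python
import socket
import subprocess

# List running containers; the FastAPI-BitNet container started by the extension
# should show up here (its exact name/image is managed by the extension).
print(subprocess.run(
    ["docker", "ps", "--format", "{{.Names}}\t{{.Image}}"],
    capture_output=True, text=True,
).stdout)

# Probe the assumed MCP endpoint port (127.0.0.1:8080, per the setup section below).
with socket.socket() as s:
    s.settimeout(2)
    print("MCP port reachable:", s.connect_ex(("127.0.0.1", 8080)) == 0)
```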
Core Features
This extension empowers GitHub Copilot with a rich set of tools to manage the entire lifecycle of BitNet instances.
- Automated Backend Management:
  - Automatically pulls the required Docker image from Docker Hub.
  - Starts the Docker container with the FastAPI-BitNet service.
  - Gracefully shuts down the container when you close VSCode.
- Seamless Copilot Integration:
  - Exposes all API endpoints as tools for GitHub Copilot via the Model Context Protocol (MCP).
- Resource & Server Management (llama-server):
  - Estimate Capacity: Check how many servers can be run based on available RAM and CPU.
  - Initialize/Shutdown: Start and stop single or multiple llama-server instances in batches.
  - Check Status: Query the status, PID, and configuration of any running server.
- Persistent CLI Session Management (llama-cli):
  - Start/Stop Sessions: Create and terminate persistent, conversational llama-cli processes, each identified by a unique alias.
  - Batch Operations: Manage multiple CLI sessions with a single command.
  - Check Status: Get the status and configuration of any active CLI session.
- Advanced Interaction:
  - Chat: Send prompts to specific llama-server instances.
  - Multi-Chat: Broadcast a prompt to multiple servers concurrently and get all responses.
  - Conversational Chat: Maintain an ongoing conversation with a persistent llama-cli session.
- Model Utilities:
  - Benchmark: Programmatically run llama-bench to evaluate model performance.
  - Perplexity: Calculate model perplexity for a given text using llama-perplexity.
  - Model Info: Get the file sizes of all available BitNet models.
Requirements
- Docker Desktop: You must have Docker installed and running on your computer.
- Disk Space: At least 8GB of free disk space is required for the uncompressed Docker image.
- RAM: Each BitNet instance consumes roughly 1.5GB of RAM. This is manageable for a few instances, but it adds up quickly; make sure your system has enough memory for the number of instances you plan to run (a rough estimate is sketched below).
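As a back-of-the-envelope check (not a substitute for the extension's own capacity-estimate tool), you can divide available memory by the roughly 1.5GB each instance needs. The snippet below assumes the psutil package is installed and uses the 1.5GB figure from above:

```python
import psutil  # third-party: pip install psutil

PER_INSTANCE_GB = 1.5  # approximate RAM per BitNet instance (from the note above)
HEADROOM_GB = 2.0      # leave some memory for the OS and the editor (arbitrary choice)

available_gb = psutil.virtual_memory().available / 1024**3
usable_gb = max(available_gb - HEADROOM_GB, 0)
print(f"~{int(usable_gb // PER_INSTANCE_GB)} instances should fit "
      f"({available_gb:.1f} GB currently available)")
```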
Setup Instructions
- Install this extension from the VSCode Marketplace.
- Ensure Docker Desktop is running.
- Open the VSCode Command Palette using Ctrl+Shift+P.
- Type BitNet: Initialize MCP Server and press Enter. This will download the large Docker image and may take some time.
- Once initialization is complete, run BitNet: Start Server from the Command Palette.
- Open the GitHub Copilot Chat panel (it must be in “Chat” view, not “Inline Chat”).
- Click the wrench icon (Configure Tools…) at the top of the chat panel.
- Scroll to the bottom and select + Add MCP Server, then choose HTTP.
- Enter the URL: http://127.0.0.1:8080/mcp
- Copilot will now detect the available tools. You may need to click the refresh icon next to the wrench if they don’t appear immediately.
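If the tools never show up, it can help to poke the MCP endpoint directly. The sketch below sends a JSON-RPC tools/list request (a standard Model Context Protocol method) to the URL above. It assumes the backend speaks MCP's Streamable HTTP transport and answers with plain JSON; a strictly spec-compliant server may require an initialize handshake first or reply as Server-Sent Events, so treat this as illustrative rather than a drop-in health check.

```python
import requests  # third-party: pip install requests

MCP_URL = "http://127.0.0.1:8080/mcp"

# JSON-RPC request defined by the Model Context Protocol: list the exposed tools.
payload = {"jsonrpc": "2.0", "id": 1, "method": "tools/list", "params": {}}

resp = requests.post(
    MCP_URL,
    json=payload,
    # Streamable HTTP servers expect clients to accept both JSON and SSE responses.
    headers={"Accept": "application/json, text/event-stream"},
    timeout=10,
)
print(resp.status_code)
print(resp.text[:2000])  # tool names and descriptions should appear in the body
```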
AI Chat Instructions & Examples
Once set up, you can instruct GitHub Copilot to use the new tools. You can always ask “What tools can you use?” to get a fresh list of available actions.
1. Estimate Server Capacity
Before starting, see what your system can handle.
Your Prompt: “Estimate how many BitNet servers I can run.”
2. Manage llama-server Instances
You can launch one or many llama-server processes; however, cnv/instruction mode is currently unsupported for servers.
Start a single server: “Initialize a BitNet server, with 1 thread each, and a system prompt of you’re a helpful assistant.”
Start multiple servers: “Initialize a batch of two BitNet servers. The first on port 9001 with 2 threads and the second on port 9002 with 2 threads, with system prompts of you’re a helpful assistant.”
Check status: “What is the status of the server on port 9001?”
Shut down a server: “Shut down the server on port 9002.”
3. Manage Persistent llama-cli Sessions
You can launch one or many llama-cli processes, which fully support cnv/instruction mode by default!
Start a session: “Start a persistent llama-cli session with the alias ‘research-chat’ using the default model, with 1 thread each, and a system prompt of you’re a helpful assistant.”
Start many sessions: “Start 5 llama-cli sessions, with 1 thread each, and a system prompt of you’re a helpful assistant.”
Check status: “What is the status of the llama-cli session ‘research-chat’?”
Shut down a session: “Stop the llama-cli session named ‘research-chat’.”
4. Chat with Models
You can talk to servers or conversational CLI sessions.
Chat with a server: “Using the server on port 9000, tell me about the BitNet architecture.”
Multi-chat with servers: “Ask both the server on port 9000 and the server on port 9001 to explain the concept of perplexity. Compare their answers.”
Chat with a CLI session: “Using the llama-cli session ‘research-chat’, what is the capital of France?”
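Normally Copilot issues these calls for you, but it can be useful to see what a direct request to one of the managed servers might look like. The sketch below assumes the BitNet build of llama-server keeps llama.cpp's OpenAI-compatible /v1/chat/completions route, that an instance is listening on port 9000 as in the prompts above, and that the port is reachable from the host (if it isn't published outside the Docker container, you would need to run this from inside the container). If the FastAPI backend proxies chat through its own endpoints instead, the path will differ.

```python
import requests  # third-party: pip install requests

# Assumed: a llama-server instance on port 9000 exposing llama.cpp's
# OpenAI-compatible chat route; verify the actual route and port for your setup.
resp = requests.post(
    "http://127.0.0.1:9000/v1/chat/completions",
    json={
        "messages": [
            {"role": "system", "content": "You're a helpful assistant."},
            {"role": "user", "content": "Tell me about the BitNet architecture."},
        ],
        "max_tokens": 256,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```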
5. Run Utilities
Analyze model performance directly from the chat.
Run a benchmark: “Run a benchmark on the default BitNet model with 128 tokens and 4 threads.”
Calculate perplexity: “Calculate the perplexity of the text ‘The quick brown fox jumps over the lazy dog’ using the default model and a context size of 10.”
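These utilities wrap the standard llama.cpp tooling that ships inside the container. If you ever want to reproduce a run by hand from a shell inside the container, the underlying invocations look roughly like the sketch below; the model path is a placeholder, and the exact binary locations depend on the FastAPI-BitNet image.

```python
import subprocess

MODEL = "/path/to/bitnet-model.gguf"  # placeholder: substitute the actual model file

# Rough equivalent of "benchmark with 128 tokens and 4 threads" (llama.cpp's llama-bench).
subprocess.run(["llama-bench", "-m", MODEL, "-n", "128", "-t", "4"], check=True)

# Rough equivalent of the perplexity example: context size 10, with the evaluation
# text supplied as a file (llama-perplexity reads its input via -f).
subprocess.run(["llama-perplexity", "-m", MODEL, "-f", "sample.txt", "-c", "10"], check=True)
```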