
LLMOps Dashboard

@Cre4T3Tiv3 · 10 months ago · MIT · Free · Community · AI Systems

#agents #dashboard #grafana #langchain #llm #llmops #monitoring #observability #openai #pipelines #python #tracing #fastapi #llama3 #ollama #open-source #prometheus #prompt-logging #mcp #metrics
LLMOps Dashboard is a lightweight observability and control plane for LLM-powered apps — featuring real-time metrics, model policy enforcement, and secure request tracing. Built with FastAPI, Ollama (LLaMA3), Prometheus, and Grafana, it offers local-first tracking of latency, token usage, user flows, and fallback behavior across models and routes.

Overview

What is LLMOps Dashboard

LLMOps Dashboard is a lightweight observability and control plane designed for LLM-powered applications. It provides real-time metrics, model policy enforcement, and secure request tracing, built using FastAPI, Ollama (LLaMA3), Prometheus, and Grafana.

Use cases

Use cases include monitoring LLM application performance, enforcing usage policies per user, analyzing token consumption, and tracking user interactions with models.

How to use

To use LLMOps Dashboard, set up the application using the provided starter template. Integrate it with your LLM applications to monitor metrics such as latency, token usage, and user flows. Utilize the Model Control Plane (MCP) to manage model registrations and enforce usage policies.

Key features

Key features include real-time dashboards for analysis, prompt/response metadata tracking, per-client policy enforcement, dynamic policy control, and JWT-based user tracking.

Where to use

LLMOps Dashboard is applicable in various fields such as AI development, machine learning operations, and any environment where LLM applications are deployed, whether locally or in the cloud.

Content


Secure, observable, local-first LLM workflows powered by FastAPI, LLaMA3, and Prometheus


Why This Exists

LLMOps Dashboard is a modular, open-source observability stack
for LLM systems, built with FastAPI, Prometheus, Grafana, and SQLite.

It helps you monitor:

  • Prompt/response metadata
  • Latency (p95, per-user)
  • Token usage and fallback behavior
  • JWT-based user tracking
  • Real-time dashboards for analysis

This OSS project provides a full-stack starter template for building
production-grade observability for LLM applications — local or cloud.

ℹ️ Built for local-first development, extensibility, and minimal infra overhead.


Model Control Plane (MCP)

The MCP (Model Control Plane) is a lightweight module that tracks which models are used, by whom, and under what policy constraints.

It enables:

  • Model Registration: track models by name, size, alias, and source
  • Per-Client Policies: enforce token limits per user or client
  • Dynamic Policy Control: modify policies at runtime or pre-configure them at boot
  • Metrics Integration: token counts and usage policies propagate into the /metrics Prometheus feed
  • Identity Tracking: associates JWT-authenticated users with tracked model usage

Example usage:

from llmops.mcp import model_registry, usage_policy

# Register a model
model_registry.register_model("llama3", "8b", alias="dev")

# Apply a per-user token limit
usage_policy.set_policy("client-x", max_tokens=5000)

ℹ️ This system can evolve into a policy enforcement and audit framework, especially in multi-user environments where tracking LLM usage, enforcing limits, or billing per token becomes critical.
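A minimal standalone sketch of the per-client token-limit pattern described above (for illustration only; the real logic lives in llmops/mcp/usage_policy.py and its internals may differ):

# sketch: per-client token budgets, standalone for illustration
# (the real implementation is llmops/mcp/usage_policy.py)
_policies: dict[str, int] = {}
_usage: dict[str, int] = {}

def set_policy(client_id: str, max_tokens: int) -> None:
    _policies[client_id] = max_tokens

def record_usage(client_id: str, tokens: int) -> None:
    # Reject the request once the client's cumulative budget is exhausted
    used = _usage.get(client_id, 0) + tokens
    if used > _policies.get(client_id, float("inf")):
        raise PermissionError(f"{client_id} exceeded its token limit")
    _usage[client_id] = used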


What It Does (Current Stack)

  • JWT Auth: secures /llm with per-user access tokens
  • MCP Integration: tracks model usage, policy limits, and token stats
  • Prometheus Metrics: request rate, p95 latency, fallback %, etc. (see the sketch below)
  • Grafana Dashboard: includes working starter panels for request rate and latency
  • SQLite Audit Trail: logs prompt, user, model, and token count
  • Simulation + Testing: run make simulate or make smoke-test
  • LLM Integration: supports mock and real LLaMA 3 (Ollama) model endpoints
  • LLaMA 3 (Ollama): real local inference via /llm/echo using Ollama
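
As a rough sketch of how this instrumentation fits together, here is a minimal FastAPI + prometheus_client setup (metric names here are illustrative, not the project's actual ones):

# sketch: FastAPI + prometheus_client wiring (metric names are illustrative)
import time

from fastapi import FastAPI, Request
from prometheus_client import Counter, Histogram, make_asgi_app

app = FastAPI()

LLM_REQUESTS = Counter("llm_requests_total", "Total LLM requests", ["user_id"])
LLM_LATENCY = Histogram("llm_request_latency_seconds", "LLM request latency", ["user_id"])

@app.middleware("http")
async def track_requests(request: Request, call_next):
    start = time.perf_counter()
    response = await call_next(request)
    user_id = request.headers.get("x-user-id", "anonymous")
    LLM_REQUESTS.labels(user_id=user_id).inc()
    LLM_LATENCY.labels(user_id=user_id).observe(time.perf_counter() - start)
    return response

# Expose the endpoint Prometheus scrapes
app.mount("/metrics", make_asgi_app())

Grafana's p95 panels are then derived from the latency histogram with PromQL's histogram_quantile().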

LLM Integration Status

This project supports both mock and real LLM inference:

/llm (Simulated)

Default route for testing:

  • Returns mock responses instantly
  • Used for simulating traffic and testing fallback logic
  • No network or real model required

import random

def simulated_llm_response(prompt: str) -> dict:
    # Simulated fallback logic: ~30% of requests "fall back" to the local model
    model_used = "openai-gpt"
    if random.random() < 0.3:
        model_used = "local-ollama"
    return {"response": f"[{model_used.capitalize()}] Answer to: {prompt}"}

/llm/echo (Real LLaMA 3 via Ollama)

Backed by real local inference using Ollama:

ollama run llama3

Once pulled, the model runs offline and is used for actual inference.

To call:

curl -X POST http://localhost:8000/llm/echo \
 -H "Authorization: Bearer <your-jwt>" \
 -H "x-user-id: demo-user" \
 -H "Content-Type: application/json" \
 -d '{"prompt": "What is vector search?"}'

Planned Integrations (Roadmap)

Currently supported:

  • ✅ Local LLM echo endpoint via llama3 + Ollama (/llm/echo)
  • ✅ GPU-ready Docker support with offline model warmup
  • ✅ Prometheus + Grafana instrumentation
  • ✅ Secure JWT-authenticated observability for LLM events
  • ✅ SQLite-based request logging
  • ✅ Test coverage and E2E support

Coming soon:

  • [ ] Auto Summary Mode

    • Nightly background task summarizes recent logs via LLM
    • Stored in DB or JSON for display in Grafana summary panel
  • [ ] Copilot UI Widget

    • Frontend prompt box sends input to /llm
    • Response is streamed or displayed with built-in observability
  • [ ] Runtime LLM backend toggle (OpenAI, Ollama, HF)

  • [ ] OAuth / Auth0 provider support

  • [ ] Token pricing and billing estimation

  • [ ] Slack alerting or LLM log summaries

Pluggable LLM Providers:

  • [ ] OpenAI API via openai.ChatCompletion
  • [x] Local Ollama models via ollama run
  • [ ] Hugging Face transformers with local inference engine

ℹ️ Contributions welcome — especially around modular LLM adapters and frontend UX.


JWT Secrets

This project requires JWT_SECRET to be set via .env, environment variables, or secret injection.

JWT_SECRET=supersecretkey        # ⚠️ For local testing only (ChangeMe)

Used in code

# token_issuer.py / auth.py
JWT_SECRET = os.getenv("JWT_SECRET")
if not JWT_SECRET:
    raise RuntimeError("JWT_SECRET must be set")

Used in tests

# conftest.py
secret = os.getenv("JWT_SECRET")
if not secret:
    raise RuntimeError("❌ JWT_SECRET not set in environment")

.env is auto-loaded in local development and test runs.
Docker services consume JWT_SECRET via docker-compose.yaml.

⚠️ Before production use, replace with secure injection methods:

  • Docker secrets
  • CI/CD secret management
  • Vault-backed key providers
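
For reference, issuing and verifying such a token with PyJWT might look like this (a sketch assuming HS256 signing; the project's token_issuer.py and auth.py may differ):

# sketch: issue and verify a short-lived JWT with PyJWT (HS256 assumed)
import os
import time

import jwt  # PyJWT

JWT_SECRET = os.environ["JWT_SECRET"]

def issue_token(user_id: str, ttl_seconds: int = 3600) -> str:
    payload = {"sub": user_id, "exp": int(time.time()) + ttl_seconds}
    return jwt.encode(payload, JWT_SECRET, algorithm="HS256")

def verify_token(token: str) -> dict:
    # Raises jwt.ExpiredSignatureError / jwt.InvalidTokenError on failure
    return jwt.decode(token, JWT_SECRET, algorithms=["HS256"])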

Grafana Access

By default, the dashboard uses:

GRAFANA_ADMIN_USER=admin         # ⚠️ Used for initial dashboard provisioning and local testing only (ChangeMe)
GRAFANA_ADMIN_PASSWORD=llmops    # ⚠️ Used for initial dashboard provisioning and local testing only (ChangeMe)
GRAFANA_ALLOW_ANON=true          # ⚠️ Used for initial dashboard provisioning and local testing only (ChangeMe)

⚠️ Change these in .env for production deployments.

You can also enable or disable anonymous access via Grafana’s provisioning config.


Grafana Overview Dashboard

grafana/dashboards/llmops_overview.json includes:

  • LLM Request Rate by User: frequency of requests per unique user
  • Latency by User (p95): p95 latency distribution by user ID

ℹ️ Auto-loaded by Grafana on container start using provisioning config.
ℹ️ Anonymous access enabled via .env.example credentials.

More panels (e.g., fallback %, token bar charts) can be added easily.


Run Tests (Unit + E2E)

make test-unit     # Fast logic tests (auth, db, policy)
make test-e2e      # Full-stack smoke test w/ JWT and DB

ℹ️ E2E tests simulate real API calls via HTTP, JWT, and DB assertions.
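
The shape of such a test, sketched in pytest (the real tests live under tests/e2e/; the HS256 token minting mirrors the JWT sketch above and is an assumption):

# sketch: shape of an E2E test (see tests/e2e/ for the real ones)
import os
import time

import httpx
import jwt  # PyJWT; HS256 signing is an assumption

def test_llm_roundtrip():
    token = jwt.encode(
        {"sub": "demo-user", "exp": int(time.time()) + 300},
        os.environ["JWT_SECRET"],
        algorithm="HS256",
    )
    resp = httpx.post(
        "http://localhost:8000/llm",
        headers={"Authorization": f"Bearer {token}", "x-user-id": "demo-user"},
        json={"prompt": "ping"},
    )
    assert resp.status_code == 200
    assert "response" in resp.json()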


Quickstart

git clone https://github.com/Cre4T3Tiv3/llmops-dashboard.git
cd llmops-dashboard
make init

This does everything:

  • Verifies required tools (docker, sqlite3, ollama, etc.) via make check
  • Installs uv if missing
  • Sets up .venv and installs pyproject.toml dependencies
  • Auto-creates .env from .env.example if missing
  • Confirms your selected Ollama model (e.g. llama3) is available locally

This project uses uv — a fast and modern Python package manager — for all local and Docker-based dependency management.

ℹ️ No requirements.txt is needed — dependencies are resolved via pyproject.toml.


Step 1: Verify Local Environment

Run the following to re-check your setup at any time:

make check

This confirms:

  • Docker and docker-compose are available
  • sqlite3 is installed (required for make smoke-test)
  • .env is present and contains necessary keys like JWT_SECRET
  • ollama CLI is installed and working
  • Your selected model (via $OLLAMA_MODEL) is installed

If the model is missing, you’ll see a warning like:

❌ Model 'llama3' not found in ollama list

ℹ️ This step is included in make init but can be run independently.


Step 2: Launch the Full Stack

make up

This builds and starts:

  • FastAPI: http://localhost:8000
  • Prometheus: http://localhost:9090
  • Grafana: http://localhost:3000

ℹ️ Grafana auto-loads grafana/dashboards/llmops_overview.json as its default dashboard


Want more? See:

HOWTO and E2E Testing Guide
Contributor Guide


Sample Authenticated Request

make generate-jwt
curl -X POST http://localhost:8000/llm \
  -H "Authorization: Bearer <token>" \
  -H "x-user-id: demo-user" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "What is RAG?"}'

Directory Layout

llmops-dashboard/
├── .dockerignore
├── .env
├── .env.example
├── .github/
│   └── workflows/
│       └── ci.yml
├── .gitignore
├── .jwt.tmp
├── Dockerfile
├── LICENSE
├── Makefile
├── README.dev.md
├── README.md
├── data/
├── docker-compose.override.yml
├── docker-compose.yaml
├── docs/
│   ├── CONTRIBUTING.md
│   └── HOWTO_and_E2E_Testing.md
├── grafana/
│   ├── dashboards/
│   │   └── llmops_overview.json
│   └── provisioning/
│       └── dashboards/
│           └── dashboards.yaml
├── llmops/
│   ├── auth.py
│   ├── database.py
│   ├── main.py
│   ├── mcp/
│   │   ├── __init__.py
│   │   ├── client_tracker.py
│   │   ├── model_registry.py
│   │   └── usage_policy.py
│   └── routes/
│       ├── llm_echo.py
│       ├── llm_proxy.py
│       └── token_issuer.py
├── llmops_dashboard.egg-info/
├── prometheus.yml
├── pyproject.toml
└── tests/
    ├── conftest.py
    ├── e2e/
    │   ├── __init__.py
    │   ├── test_llm_echo.py
    │   ├── test_llm_flow.py
    │   ├── test_llm_traffic_simulation.py
    │   ├── test_metrics_exposure.py
    │   └── test_smoke_flow.py
    └── unit/
        ├── __init__.py
        ├── test_database.py
        ├── test_mcp_policy.py
        ├── test_mcp_registry.py
        ├── test_mcp_tracker.py
        └── test_reset_prometheus.py

Makefile Commands

make up                # Start full stack (FastAPI + Prometheus + Grafana)
make generate-jwt      # Create test JWT
make simulate          # Send mock traffic to /llm
make smoke-test        # Full E2E: token → API → DB → metrics
make reset-prometheus  # Clean and rebuild metrics store
make clean             # Delete usage.db and logs
make nuke              # Destroy all containers, volumes, cache

ℹ️ All commands assume uv is installed locally. See the uv GitHub page.



Requirements

  • Docker (v20+)
  • Linux or WSL (native Windows not supported yet)
  • Python ≥ 3.10 for CLI/test scripts (optional)
  • uv for local development and installs

Use Cases

  • OpenAI/Ollama observability for internal tools
  • Fine-grained request tracking (JWT, latency, token use)
  • Test model fallback logic or simulate production LLM traffic
  • Plug into billing or cost-monitoring with token metadata
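
For the billing and cost-monitoring case, a simple aggregation over the SQLite audit trail might look like this (table and column names are assumptions; check llmops/database.py for the real schema):

# sketch: token totals per user from the audit trail
# (table/column names are assumptions; see llmops/database.py for the real schema)
import sqlite3

conn = sqlite3.connect("data/usage.db")
rows = conn.execute(
    """
    SELECT user_id, SUM(token_count) AS total_tokens
    FROM usage
    GROUP BY user_id
    ORDER BY total_tokens DESC
    """
).fetchall()
conn.close()

for user_id, total_tokens in rows:
    print(f"{user_id}: {total_tokens} tokens")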

Built With

FastAPI · Ollama (LLaMA3) · Prometheus · Grafana · SQLite · uv


Philosophy

This project isn’t just a toy, but it’s also not a locked-in framework.

You can:

  • Swap SQLite for Postgres
  • Swap Prometheus for OpenTelemetry
  • Swap FastAPI for Flask or Django
  • Swap JWT for OAuth or session-based auth

The patterns are here.
The rest is yours to extend ♻️


License

MIT © 2025 @Cre4T3Tiv3


Built for the LLM observability era.
OSS, modular, and easy to reason about.

