Lemonade

45 Apache-2.0

FreeCommunity

AI Systems

#amd#llama#llm#llm-inference#llms#local-server#mistral#npu#onnxruntime#qwen#openai-api#mcp#mcp-server

Local LLM Server with NPU Acceleration

What is Lemonade

Lemonade is a Local LLM Server with NPU acceleration designed to serve, benchmark, and deploy large language models (LLMs) across various hardware platforms, including CPU, GPU, and NPU.

Use cases

Use cases for Lemonade include serving LLMs in applications, conducting performance benchmarks, experimenting with different LLMs and frameworks, and integrating LLM capabilities into software solutions.

How to use

To use Lemonade, you can install it on Windows or Linux, and utilize the Lemonade Server, Python API, and CLI to integrate LLMs into your applications, run experiments, and benchmark performance.

Key features

Key features of Lemonade include a server interface compatible with the Open AI API, high-level and low-level Python APIs for integration, and a CLI for mixing LLMs and frameworks, along with tools for prompting, measuring accuracy, benchmarking, and profiling memory usage.

Where to use

Lemonade can be used in various fields such as natural language processing, AI research, application development, and any scenario requiring the deployment of large language models.

Clients Supporting MCP

The following are the main client software that supports the Model Context Protocol. Click the link to visit the official website for more information.

Claude Desktop: Official desktop application from Anthropic, natively supports MCP protocol. claude.ai

Cherry Studio: Cross-platform desktop client supporting multiple LLM providers, built-in MCP server support. cherry-ai.com

LobeChat: Modern open-source ChatGPT/LLMs UI, supports MCP protocol integration. lobehub.com

DeepChat: Cross-platform desktop AI assistant, compatible with MCP protocol, focusing on privacy and efficiency. deepchat.thinkinai.xyz

5ire: Cross-platform open-source desktop intelligent assistant MCP client, supports local knowledge base and MCP server. 5ire.app

View More MCP Clients

Overview

What is Lemonade

Lemonade is a Local LLM Server with NPU acceleration designed to serve, benchmark, and deploy large language models (LLMs) across various hardware platforms, including CPU, GPU, and NPU.

Use cases

How to use

To use Lemonade, you can install it on Windows or Linux, and utilize the Lemonade Server, Python API, and CLI to integrate LLMs into your applications, run experiments, and benchmark performance.

Key features

Where to use

Lemonade can be used in various fields such as natural language processing, AI research, application development, and any scenario requiring the deployment of large language models.

Clients Supporting MCP

The following are the main client software that supports the Model Context Protocol. Click the link to visit the official website for more information.

Claude Desktop: Official desktop application from Anthropic, natively supports MCP protocol. claude.ai

Cherry Studio: Cross-platform desktop client supporting multiple LLM providers, built-in MCP server support. cherry-ai.com

LobeChat: Modern open-source ChatGPT/LLMs UI, supports MCP protocol integration. lobehub.com

DeepChat: Cross-platform desktop AI assistant, compatible with MCP protocol, focusing on privacy and efficiency. deepchat.thinkinai.xyz

5ire: Cross-platform open-source desktop intelligent assistant MCP client, supports local knowledge base and MCP server. 5ire.app

View More MCP Clients

Content

🍋 Lemonade SDK: Quickly serve, benchmark and deploy LLMs

The Lemonade SDK makes it easy to run Large Language Models (LLMs) on your PC. Our focus is using the best tools, such as neural processing units (NPUs) and Vulkan GPU acceleration, to maximize LLM speed and responsiveness.

Features

The Lemonade SDK is comprised of the following:

🌐 Lemonade Server: A local LLM server for running ONNX and GGUF models using the OpenAI API standard. Install and enable your applications with NPU and GPU acceleration in minutes.
🐍 Lemonade API: High-level Python API to directly integrate Lemonade LLMs into Python applications.
🖥️ Lemonade CLI: The lemonade CLI lets you mix-and-match LLMs (ONNX, GGUF, SafeTensors) with measurement tools to characterize your models on your hardware. The available tools are:
- Prompting with templates.
- Measuring accuracy with a variety of tests.
- Benchmarking to get the time-to-first-token and tokens per second.
- Profiling the memory utilization.

Click here to get started with Lemonade.

Supported Configurations

Maximum LLM performance requires the right hardware accelerator with the right inference engine for your scenario. Lemonade supports the following configurations, while also making it easy to switch between them at runtime.

Hardware	🛠️ Engine Support			🖥️ OS (x86/x64)
Hardware	OGA	llamacpp	HF	Windows	Linux
🧠 CPU	All platforms	All platforms	All platforms	✅	✅
🎮 GPU	—	Vulkan: All platforms Focus: Ryzen™ AI 7000/8000/300 Radeon™ 7000/9000	—	✅	✅
🤖 NPU	AMD Ryzen™ AI 300 series	—	—	✅	—

Inference Engines Overview

Engine	Description
OnnxRuntime GenAI (OGA)	Microsoft engine that runs `.onnx` models and enables hardware vendors to provide their own execution providers (EPs) to support specialized hardware, such as neural processing units (NPUs).
llamacpp	Community-driven engine with strong GPU acceleration, support for thousands of `.gguf` models, and advanced features such as vision-language models (VLMs) and mixture-of-experts (MoEs).
Hugging Face (HF)	Hugging Face’s `transformers` library can run the original `.safetensors` trained weights for models on Meta’s PyTorch engine, which provides a source of truth for accuracy measurement.

Integrate Lemonade Server with Your Application

Lemonade Server enables languages including Python, C++, Java, C#, Node.js, Go, Ruby, Rust, and PHP. For the full list and integration details, see docs/server/README.md.

Contributing

We are actively seeking collaborators from across the industry. If you would like to contribute to this project, please check out our contribution guide.

Maintainers

This project is sponsored by AMD. It is maintained by @danielholanda @jeremyfowers @ramkrishna @vgodsoe in equal measure. You can reach us by filing an issue or email [email protected].

License

This project is licensed under the Apache 2.0 License. Portions of the project are licensed as described in NOTICE.md.

Dev Tools Supporting MCP

The following are the main code editors that support the Model Context Protocol. Click the link to visit the official website for more information.

Zed: High-performance collaborative code editor, supports MCP protocol, providing a smooth programming experience. zed.dev

Cursor: AI code editor built on VS Code, supports MCP protocol for context-aware programming. cursor.com

Windsurf: AI code editor from Codeium, integrates MCP protocol to provide intelligent code assistance. windsurf.com

Continue: Open-source AI programming assistant plugin, supports VS Code and JetBrains, compatible with MCP protocol. continue.dev

Trae: AI-driven code editor, supports MCP protocol, focusing on enhancing developer programming experience. trae.ai

View More MCP Dev Tools

Tools

No tools

Comments

Recommend MCP Servers

Tavily MCP Server The Tavily MCP server provides: search, extract, map, crawl tools Real-time web search capabilities through the tavily-search tool Intelligent data extraction from web pages via the tavily-extract tool Powerful web mapping tool that creates a structured map of website Web crawler that systematically explores websites.

MCP Server Chart This is a TypeScript-based MCP server that provides chart generation capabilities. It allows you to create various types of charts through MCP tools. You can also use it in Dify.

GitHub MCP Server MCP Server for the GitHub API, enabling file operations, repository management, search functionality, and more.

Brave Search MCP Server Web and local search using Brave's Search API

Firecrawl MCP Server Advanced web scraping with JavaScript rendering, PDF support, and smart rate limiting

Context7 MCP LLMs rely on outdated or generic information about the libraries you use. You get:

Slack MCP server Channel management and messaging capabilities

Sequential Thinking MCP Server Dynamic and reflective problem-solving through thought sequences

Fetch MCP Server A Model Context Protocol server that provides web content fetching capabilities.

Playwright MCP A Model Context Protocol (MCP) server that provides browser automation capabilities using [Playwright](https://playwright.dev). This server enables LLMs to interact with web pages through structured accessibility snapshots, bypassing the need for screenshots or visually-tuned models.

View All MCP Servers