Lemonade
Overview
What is Lemonade
Lemonade is a local LLM server with NPU acceleration, designed to serve, benchmark, and deploy large language models (LLMs) across hardware platforms including CPU, GPU, and NPU.
Use cases
Use cases for Lemonade include serving LLMs in applications, conducting performance benchmarks, experimenting with different LLMs and frameworks, and integrating LLM capabilities into software solutions.
How to use
Install Lemonade on Windows or Linux, then use the Lemonade Server, Python API, or CLI to integrate LLMs into your applications, run experiments, and benchmark performance.
Key features
Key features of Lemonade include a server interface compatible with the OpenAI API, high-level and low-level Python APIs for integration, and a CLI for mixing and matching LLMs and frameworks, along with tools for prompting, measuring accuracy, benchmarking, and profiling memory usage.
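Because the server interface follows the OpenAI API standard, any OpenAI-compatible client can talk to a local Lemonade Server. Here is a minimal sketch using the official `openai` Python package; the base URL, port, and model name are assumptions for illustration, so check your server's actual configuration.

```python
# Minimal sketch: chat with a locally served model through Lemonade
# Server's OpenAI-compatible interface. The base URL, port, and model
# name are illustrative assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/api/v1",  # assumed local Lemonade Server
    api_key="lemonade",  # local servers generally accept a placeholder key
)

response = client.chat.completions.create(
    model="Llama-3.2-1B-Instruct-Hybrid",  # hypothetical model name
    messages=[{"role": "user", "content": "Say hello in five words."}],
)
print(response.choices[0].message.content)
```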
Where to use
Lemonade can be used in various fields such as natural language processing, AI research, application development, and any scenario requiring the deployment of large language models.
Content
🍋 Lemonade SDK: Quickly serve, benchmark and deploy LLMs
The Lemonade SDK makes it easy to run Large Language Models (LLMs) on your PC. Our focus is using the best tools, such as neural processing units (NPUs) and Vulkan GPU acceleration, to maximize LLM speed and responsiveness.
Features
The Lemonade SDK comprises the following:
- 🌐 Lemonade Server: A local LLM server for running ONNX and GGUF models using the OpenAI API standard. Install and enable your applications with NPU and GPU acceleration in minutes.
- 🐍 Lemonade API: High-level Python API to directly integrate Lemonade LLMs into Python applications (a sketch follows this list).
- 🖥️ Lemonade CLI: The `lemonade` CLI lets you mix-and-match LLMs (ONNX, GGUF, SafeTensors) with measurement tools to characterize your models on your hardware. The available tools are:
  - Prompting with templates.
  - Measuring accuracy with a variety of tests.
  - Benchmarking to get the time-to-first-token and tokens per second.
  - Profiling the memory utilization.
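As a sketch of the Lemonade API mentioned above, the high-level entry point follows a `from_pretrained` pattern. The model ID and recipe string below are illustrative assumptions; consult Lemonade's API docs for the exact names your install supports.

```python
# Minimal sketch of the high-level Lemonade Python API, assuming a
# from_pretrained entry point. The model ID and the recipe string
# ("oga-cpu") are illustrative assumptions, not verified names.
from lemonade.api import from_pretrained

# Load a model/tokenizer pair on a CPU recipe (assumed recipe name).
model, tokenizer = from_pretrained("Qwen/Qwen2.5-0.5B-Instruct", recipe="oga-cpu")

input_ids = tokenizer("What is an NPU?", return_tensors="pt").input_ids
outputs = model.generate(input_ids, max_new_tokens=30)
print(tokenizer.decode(outputs[0]))
```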
See the Lemonade documentation to get started.
Supported Configurations
Maximum LLM performance requires the right hardware accelerator with the right inference engine for your scenario. Lemonade supports the following configurations, while also making it easy to switch between them at runtime.
| Hardware | 🛠️ OGA | 🛠️ llamacpp | 🛠️ HF | 🖥️ Windows (x86/x64) | 🖥️ Linux (x86/x64) |
|---|---|---|---|---|---|
| 🧠 CPU | All platforms | All platforms | All platforms | ✅ | ✅ |
| 🎮 GPU | — | Vulkan: all platforms. Focus: Ryzen™ AI 7000/8000/300, Radeon™ 7000/9000 | — | ✅ | ✅ |
| 🤖 NPU | AMD Ryzen™ AI 300 series | — | — | ✅ | — |
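Because engines are chosen at runtime, a client can discover which models (and the engine variants behind them) a running server exposes via the standard OpenAI models endpoint. A minimal sketch, reusing the assumed local base URL from the example above:

```python
# Minimal sketch: enumerate the models a local Lemonade Server exposes.
# The base URL is an illustrative assumption.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/api/v1", api_key="lemonade")

for model in client.models.list():
    print(model.id)  # model names typically indicate the backend variant
```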
Inference Engines Overview
| Engine | Description |
|---|---|
| OnnxRuntime GenAI (OGA) | Microsoft engine that runs .onnx models and enables hardware vendors to provide their own execution providers (EPs) to support specialized hardware, such as neural processing units (NPUs). |
| llamacpp | Community-driven engine with strong GPU acceleration, support for thousands of .gguf models, and advanced features such as vision-language models (VLMs) and mixture-of-experts (MoEs). |
| Hugging Face (HF) | Hugging Face’s transformers library can run the original .safetensors trained weights for models on Meta’s PyTorch engine, which provides a source of truth for accuracy measurement. |
Integrate Lemonade Server with Your Application
Applications written in languages including Python, C++, Java, C#, Node.js, Go, Ruby, Rust, and PHP can integrate with Lemonade Server. For the full list and integration details, see docs/server/README.md.
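Because the server speaks plain HTTP with OpenAI-style JSON, any language with an HTTP client can integrate without a dedicated SDK. A minimal sketch in Python using `requests`, standing in for what the same call would look like from any of the languages above; the URL and model name remain illustrative assumptions:

```python
# Minimal sketch: call the OpenAI-style chat completions endpoint with
# a plain HTTP POST, as any language with an HTTP client could.
# The URL and model name are illustrative assumptions.
import requests

resp = requests.post(
    "http://localhost:8000/api/v1/chat/completions",
    json={
        "model": "Llama-3.2-1B-Instruct-Hybrid",  # hypothetical model name
        "messages": [{"role": "user", "content": "Summarize MCP in one line."}],
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```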
Contributing
We are actively seeking collaborators from across the industry. If you would like to contribute to this project, please check out our contribution guide.
Maintainers
This project is sponsored by AMD. It is maintained by @danielholanda, @jeremyfowers, @ramkrishna, and @vgodsoe in equal measure. You can reach us by filing an issue or by email at [email protected].
License
This project is licensed under the Apache 2.0 License. Portions of the project are licensed as described in NOTICE.md.