Tinyrl

@NoakLiuon a year ago

AI Systems

Multi-Model LLM Agent Framework with Sandbox Execution and MCP Design

What is TinyRL

TinyRL is a lightweight and powerful framework designed for building intelligent agents that can execute code in isolated sandbox environments. It supports multiple LLM models and facilitates collaborative multi-agent workflows.

Use cases

Use cases for TinyRL include single LLM execution for basic tasks, coordinated multi-environment execution for distributed computing, and multi-agent collaboration in shared environments for complex problem-solving.

How to use

To use TinyRL, install the framework via pip, set up your desired LLM models, and create agents that can execute tasks within sandboxed environments. The framework provides a quick start guide for installation and usage.

Key features

Key features of TinyRL include multi-model LLM support, isolated sandbox execution for safe code execution, multi-agent collaboration capabilities, dynamic tool creation through Automatic Model Context Protocol (MCP), web integration for information retrieval, async/parallel processing for high performance, and robust error recovery mechanisms.

Where to use

TinyRL can be used in various fields such as artificial intelligence, software development, data analysis, and any domain requiring intelligent automation and collaborative problem-solving.

Clients Supporting MCP

The following are the main clients that support the Model Context Protocol. Click the link to visit the official website for more information.

Claude Desktop: Official desktop application from Anthropic, natively supports MCP protocol. claude.ai

Cherry Studio: Cross-platform desktop client supporting multiple LLM providers, built-in MCP server support. cherry-ai.com

LobeChat: Modern open-source ChatGPT/LLMs UI, supports MCP protocol integration. lobehub.com

DeepChat: Cross-platform desktop AI assistant, compatible with MCP protocol, focusing on privacy and efficiency. deepchat.thinkinai.xyz

5ire: Cross-platform open-source desktop intelligent assistant MCP client, supports local knowledge base and MCP server. 5ire.app

Content

EchoRL: Learning to Plan through Experience for Bandwidth-Efficient Reinforcement Learning

EchoRL is a system framework that bridges reaction and planning in real-time reinforcement learning through experience-grounded infrastructure. It introduces three key innovations for bandwidth-efficient LLM-based reinforcement learning:

Latent Planning Optimization - structured rollout with continuation-based reasoning
Asynchronous Execution Engine - KV-cache sharing, bandwidth-aware scheduling, and token-level dispatch
Prioritized Replay Buffer - stratified hot/cold buffers for improved RL training efficiency

Key Features

Latent Planning: Trajectory-conditioned policy with KL regularization
Bandwidth-Efficient Execution: KV-cache sharing with effective bandwidth b_eff(s_{1:t}) and η_bw tracking
Async Execution: 78% KV reuse rate with bandwidth-aware priority scheduling
Prioritized Replay: Hot/cold buffer stratification with surprise-weighted sampling
Comprehensive Evaluation: Benchmarks across ALFWorld, WebShop, CRUXEval, ARC, and MiniGrid
Multi-Backbone Support: GPT-4o, Claude-3.5-Sonnet, Gemini-1.5-Pro, Llama-4, Qwen, DeepSeek-R1
Performance Monitoring: Real-time metrics, system monitoring, and statistical analysis

Table of Contents

Installation
Quick Start
Architecture
Examples
Benchmarking
API Reference

Installation

Prerequisites

Python 3.9+
PyTorch 2.0+
CUDA 11.8+ (for GPU acceleration)

Install EchoRL

# Clone the repository
git clone https://github.com/your-org/Echo-RL.git
cd Echo-RL

# Create virtual environment
conda create -n echo_rl python=3.10 -y
conda activate echo_rl

# Install dependencies
pip install -r requirements.txt

# Install EchoRL in development mode
pip install -e .

# Build C++ performance kernels (optional but recommended)
pip install pybind11
pip install -e ".[dev]"  # or: python setup.py build_ext --inplace

Optional Dependencies

For specific tasks and backbones, install additional dependencies:

# LLM API clients
pip install openai anthropic google-generativeai mistralai

# Local model support
pip install transformers accelerate bitsandbytes

# Environment-specific
pip install alfworld selenium  # For ALFWorld and WebShop tasks

Quick Start

Basic Training

Train EchoRL on the ALFWorld task with a GPT-4o backbone:

python examples/train_echo_rl.py \
    --task alfworld \
    --backbone gpt-4o \
    --timesteps 100000 \
    --num-actors 128 \
    --batch-size 256

Comprehensive Benchmarking

Run the full benchmark comparing EchoRL against baselines:

python examples/benchmark_echo_rl.py \
    --tasks alfworld webshop cruxeval \
    --backbones gpt-4o claude-3.5-sonnet \
    --baselines react tot ppo-rlhf \
    --num-seeds 10 \
    --num-episodes 100

Python API Usage

import asyncio
from echo_rl import EchoRLTrainer, TrainingConfig

async def main():
    # Create training configuration
    config = TrainingConfig(
        env_name="alfworld",
        total_timesteps=100000,
        num_actors=128,
        device="cuda"
    )
    
    # Initialize trainer
    trainer = EchoRLTrainer(config)
    
    # Run training
    metrics = await trainer.train()
    
    print(f"Success rate: {metrics.evaluation_results['success_rate']:.3f}")
    print(f"Avg reward: {metrics.evaluation_results['avg_reward']:.3f}")

asyncio.run(main())

Architecture (Components)

EchoRL coordinates three modules through one shared latent plan τ̄:

Latent Plan τ_t = F_φ(s_{t-k:t})
        │
        ├──► Soft-prefix policy π_θ(a_t | s_t, τ_t)
        ├──► Bandwidth-aware scheduling: priority = r / (b_eff + q + ε)
        └──► Planning-aware replay: score = ||τ_t - τ̄||² + α|r_t|

Bandwidth Efficiency

EchoRL optimizes the bandwidth efficiency metric:

η_bw(π) = E[Σ r_t] / (E[Σ b_eff(s_{1:t})] + E_B[w|ℓ_PG|])

where effective rollout bandwidth accounts for KV prefix reuse:

b_eff(s_{1:t}, t') = b(s_{1:t}) - b(s_{1:t'})   # t' = reused prefix length
b(s_{1:t}) = scale · t(t+1)/2                     # quadratic attention cost
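
As a worked instance of the two formulas above, the following minimal sketch evaluates them in plain Python. The function names mirror the kernel names used elsewhere in this README, but these are standalone illustrations written for this example, not the library's implementations.

```python
def attention_bandwidth_cost(seq_len: int, scale: float = 1.0) -> float:
    """b(s_{1:t}) = scale * t(t+1)/2, the quadratic attention cost."""
    return scale * seq_len * (seq_len + 1) / 2


def effective_bandwidth_cost(seq_len: int, reuse_len: int, scale: float = 1.0) -> float:
    """b_eff = b(s_{1:t}) - b(s_{1:t'}) for a reused prefix of length t'."""
    if not 0 <= reuse_len <= seq_len:
        raise ValueError("reuse_len must lie in [0, seq_len]")
    return attention_bandwidth_cost(seq_len, scale) - attention_bandwidth_cost(reuse_len, scale)


full = attention_bandwidth_cost(128)      # 128 * 129 / 2 = 8256.0
eff = effective_bandwidth_cost(128, 96)   # 8256.0 - 4656.0 = 3600.0
saved_fraction = 1 - eff / full           # share of the full cost avoided by reuse
print(full, eff, saved_fraction)
```

Because the cost is quadratic in sequence length, reusing the first 96 of 128 positions (75% of the tokens) removes a little over half of the cost under this model: the last 32 positions are the most expensive ones.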

C++ Performance Kernels

Performance-critical paths are implemented in C++ (echo_rl/kernels/) with Python fallbacks:

Kernel	Paper reference
`EMAPlanTracker`	Shared EMA plan τ̄ for replay scoring
`plan_surprise`	||τ_t - τ̄||² + α|r_t|
`prefix_match`	KV prefix reuse: KV(s₁:t) = KV_frozen ∪ KV_rolling
`priority_sample`	Softmax replay sampling + importance weights
`attention_bandwidth_cost`	Rollout bandwidth b(s₁:t)
`effective_bandwidth_cost`	KV-aware effective bandwidth b_eff(s₁:t)
`bandwidth_aware_priorities`	Scheduling priority r / (b + q + ε)
`bandwidth_efficiency`	η_bw learning return per bandwidth unit
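
To make the `EMAPlanTracker` and `plan_surprise` rows concrete, here is a rough pure-Python sketch of what such a fallback could compute. The class name `EMAPlanTrackerSketch`, the `decay` parameter, and the choice to initialize the mean from the first plan are all assumptions made for illustration; the actual kernel signatures are not shown in this README.

```python
from typing import List, Sequence


class EMAPlanTrackerSketch:
    """Keeps an exponential moving average of latent plans (hypothetical helper)."""

    def __init__(self, decay: float = 0.99) -> None:
        self.decay = decay
        self.mean: List[float] = []

    def update(self, plan: Sequence[float]) -> List[float]:
        if not self.mean:
            # Assumption: the first observed plan seeds the running mean.
            self.mean = list(plan)
        else:
            d = self.decay
            self.mean = [d * m + (1 - d) * p for m, p in zip(self.mean, plan)]
        return self.mean


def plan_surprise_sketch(plan: Sequence[float], mean_plan: Sequence[float],
                         reward: float, alpha: float = 1.0) -> float:
    """score = ||tau_t - tau_bar||^2 + alpha * |r_t|."""
    sq_dist = sum((p - m) ** 2 for p, m in zip(plan, mean_plan))
    return sq_dist + alpha * abs(reward)


tracker = EMAPlanTrackerSketch(decay=0.5)
tracker.update([1.0, 0.0])
tau_bar = tracker.update([0.0, 1.0])   # [0.5, 0.5] with decay 0.5
score = plan_surprise_sketch([1.0, 1.0], [0.0, 0.0], reward=-2.0, alpha=0.5)
print(tau_bar, score)
```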

Build kernels:

pip install pybind11
python setup.py build_ext --inplace
python -c "from echo_rl.kernels import kernels_available; print(kernels_available())"

EchoRL consists of three core components, numbered 1 to 3 below. Bandwidth-efficient scheduling (4) is listed first:

4. Bandwidth-Efficient Scheduling

from echo_rl.core.bandwidth import (
    BandwidthConfig,
    BandwidthEfficiencyTracker,
    BandwidthAwareScheduler,
)
from echo_rl.kernels import effective_bandwidth_cost, bandwidth_efficiency

# Effective bandwidth with KV prefix reuse
b_eff = effective_bandwidth_cost(seq_len=128, reuse_len=96, scale=1.0)

# Bandwidth-aware rollout scheduling
scheduler = BandwidthAwareScheduler(BandwidthConfig(bandwidth_weight=1.0))
priority = scheduler.compute_priority(reward=1.0, seq_len=128, queue_time=0.5, reuse_len=96)

# Track η_bw during training
tracker = BandwidthEfficiencyTracker()
tracker.record_rollout_step(reward=0.5, seq_len=64, reuse_len=48)
tracker.record_learner_update(weighted_pg_loss=0.02)
metrics = tracker.snapshot()
print(f"η_bw = {metrics.eta_bw:.4f}, saved = {metrics.total_bandwidth_saved:.2f}")
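
The arithmetic behind the scheduling priority and the η_bw metric can be sketched without the library. The names `rollout_priority` and `EtaBwSketch` below are hypothetical, and the tracker simply sums over one run as a single-sample estimate of the expectations in the η_bw formula; this is an illustration under those assumptions, not the behavior of `BandwidthAwareScheduler` or `BandwidthEfficiencyTracker`.

```python
from dataclasses import dataclass


def attention_cost(t: int, scale: float = 1.0) -> float:
    """Quadratic attention cost b(s_{1:t}) = scale * t(t+1)/2."""
    return scale * t * (t + 1) / 2


def rollout_priority(reward: float, seq_len: int, queue_time: float,
                     reuse_len: int = 0, eps: float = 1e-6) -> float:
    """priority = r / (b_eff + q + eps), with b_eff reduced by the reused prefix."""
    b_eff = attention_cost(seq_len) - attention_cost(reuse_len)
    return reward / (b_eff + queue_time + eps)


@dataclass
class EtaBwSketch:
    """Running estimate of eta_bw = sum(r) / (sum(b_eff) + sum(w * |l_PG|))."""
    total_reward: float = 0.0
    total_b_eff: float = 0.0
    total_learner_cost: float = 0.0

    def record_rollout_step(self, reward: float, seq_len: int, reuse_len: int = 0) -> None:
        self.total_reward += reward
        self.total_b_eff += attention_cost(seq_len) - attention_cost(reuse_len)

    def record_learner_update(self, weighted_pg_loss: float) -> None:
        self.total_learner_cost += abs(weighted_pg_loss)

    def eta_bw(self) -> float:
        denom = self.total_b_eff + self.total_learner_cost
        return self.total_reward / denom if denom > 0 else 0.0


# Same reward and queue time; only the amount of reusable KV prefix differs.
warm = rollout_priority(reward=1.0, seq_len=128, queue_time=0.5, reuse_len=96)
cold = rollout_priority(reward=1.0, seq_len=128, queue_time=0.5, reuse_len=0)

tracker = EtaBwSketch()
tracker.record_rollout_step(reward=0.5, seq_len=64, reuse_len=48)  # b_eff = 2080 - 1176 = 904
tracker.record_learner_update(weighted_pg_loss=0.02)
print(f"warm={warm:.6f} cold={cold:.6f} eta_bw={tracker.eta_bw():.6f}")
```

Under this formula a request with a large reusable prefix is scheduled ahead of an otherwise identical request that must recompute its whole context.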

1. Latent Planning Optimization

from echo_rl.core import PlanningConfig
from echo_rl.core.latent_planning import LatentPlanningOptimizer, TrajectoryEncoder
# PolicyNetwork is used below, but its import path is not shown in this README;
# import it from wherever your checkout defines it.

# Trajectory encoder: τ_t = F_φ(s_{t-k:t})
encoder = TrajectoryEncoder(state_dim=512, config=PlanningConfig())

# Policy conditioning: π_θ(a_t | s_t, τ_t)
policy = PolicyNetwork(state_dim=512, action_dim=20, latent_dim=512)

# KL regularization: L_KL = D_KL[p_φ(τ_t | s_{1:t}) || p_φ(τ_{t-1} | s_{1:t-1})]
optimizer = LatentPlanningOptimizer(state_dim=512, action_dim=20, config=PlanningConfig())
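
The KL term in the comment above can be illustrated with a closed-form example. The README does not state the family of p_φ, so the sketch below assumes diagonal Gaussians over the latent plan; `gaussian_kl` is a hypothetical helper, and `kl_weight=0.1` is taken from the `PlanningConfig` example in the Configuration section.

```python
import math
from typing import Sequence


def gaussian_kl(mu_p: Sequence[float], sigma_p: Sequence[float],
                mu_q: Sequence[float], sigma_q: Sequence[float]) -> float:
    """D_KL[N(mu_p, sigma_p^2) || N(mu_q, sigma_q^2)] for diagonal Gaussians."""
    return sum(
        math.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2 * sq ** 2) - 0.5
        for mp, sp, mq, sq in zip(mu_p, sigma_p, mu_q, sigma_q)
    )


# Plan distribution at step t versus step t-1 (illustrative numbers).
mu_t, sigma_t = [1.0, 0.0], [1.0, 1.0]
mu_prev, sigma_prev = [0.0, 0.0], [1.0, 1.0]

kl = gaussian_kl(mu_t, sigma_t, mu_prev, sigma_prev)  # 0.5: unit shift in one dimension
kl_weight = 0.1
kl_penalty = kl_weight * kl
print(kl, kl_penalty)
```

The penalty grows with how far the plan distribution moves between consecutive steps, which is what discourages abrupt plan changes.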

2. Asynchronous Execution Engine

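from echo_rl.core import ExecutionConfig
from echo_rl.core.async_execution import AsyncExecutionEngine, KVCacheManager

# KV-cache sharing: KV(s_{1:t}) = KV_frozen(s_{1:t'}) ∪ KV_rolling(s_{t'+1:t})
cache_manager = KVCacheManager(config=ExecutionConfig())

# Priority scheduling: priority(i) = r_i / (q_i + ε)
# policy_network is the policy model, e.g. the PolicyNetwork from the
# latent planning example.
execution_engine = AsyncExecutionEngine(
    config=ExecutionConfig(),
    model=policy_network,
    device="cuda"
)

# Submit an async rollout. submit_rollout() is awaited, so it must run inside
# an async function; state_window is the recent state sequence s_{t-k:t}.
async def submit(state_window):
    request_id = await execution_engine.submit_rollout(
        state_sequence=state_window,
        priority=1.0
    )
    return request_id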
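
The frozen/rolling split in the KV-cache comment above can be illustrated with token sequences alone. The sketch below is an assumption-laden toy: `PrefixCacheSketch` is a hypothetical class that finds the longest cached prefix by linear scan, whereas a production cache would more likely use a trie or radix tree and store actual key/value tensors.

```python
from typing import List, Sequence, Tuple


def common_prefix_len(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of leading positions where the two token sequences agree."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


class PrefixCacheSketch:
    """Remembers past token sequences and reports the reusable prefix."""

    def __init__(self) -> None:
        self._sequences: List[Tuple[int, ...]] = []

    def add(self, tokens: Sequence[int]) -> None:
        self._sequences.append(tuple(tokens))

    def split(self, tokens: Sequence[int]) -> Tuple[Tuple[int, ...], Tuple[int, ...]]:
        """Return (frozen, rolling): the longest cached prefix and the remainder."""
        reuse = max((common_prefix_len(s, tokens) for s in self._sequences), default=0)
        tokens = tuple(tokens)
        return tokens[:reuse], tokens[reuse:]


cache = PrefixCacheSketch()
cache.add([1, 2, 3, 4, 5])
frozen, rolling = cache.split([1, 2, 3, 9, 10, 11])
reuse_rate = len(frozen) / (len(frozen) + len(rolling))
print(frozen, rolling, reuse_rate)
```

Only the rolling part needs fresh attention computation; the length of the frozen part is the `reuse_len` that enters the effective bandwidth cost.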

3. Prioritized Replay Buffer

from echo_rl.core import ReplayConfig
from echo_rl.core.prioritized_replay import PrioritizedReplayBuffer, HotColdBuffer

# Hot/cold stratification
replay_buffer = PrioritizedReplayBuffer(config=ReplayConfig())

# Surprise-weighted sampling: score(t) = ||τ_t - τ̄||² + α|r_t|, with τ̄ the EMA plan
experiences, weights = replay_buffer.sample_batch(
    batch_size=256,
    temperature=1.0
)
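
The sampling step can be sketched with the standard library. This is a minimal illustration of softmax sampling over surprise scores with importance weights, plus an age-based hot/cold split; the helper names are hypothetical, the weight normalization (dividing by the largest weight in the batch) is a common convention rather than something this README specifies, and treating `age_threshold` as the hot/cold boundary is an assumption.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def softmax(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over scores / temperature."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def sample_batch(scores: Sequence[float], batch_size: int, temperature: float = 1.0,
                 rng: Optional[random.Random] = None) -> Tuple[List[int], List[float]]:
    """Sample indices proportionally to softmax(score) and return importance weights."""
    rng = rng or random.Random()
    probs = softmax(scores, temperature)
    n = len(scores)
    indices = rng.choices(range(n), weights=probs, k=batch_size)
    # w_i = 1 / (n * p_i) corrects for non-uniform sampling; scale so max weight is 1.
    raw = [1.0 / (n * probs[i]) for i in indices]
    largest = max(raw)
    return indices, [w / largest for w in raw]


def split_hot_cold(ages: Sequence[int], age_threshold: int) -> Tuple[List[int], List[int]]:
    """Indices younger than the threshold are 'hot'; the rest are 'cold'."""
    hot = [i for i, a in enumerate(ages) if a < age_threshold]
    cold = [i for i, a in enumerate(ages) if a >= age_threshold]
    return hot, cold


probs = softmax([0.0, 1.0, 2.0])
indices, weights = sample_batch([0.0, 1.0, 2.0], batch_size=8, rng=random.Random(0))
hot, cold = split_hot_cold([5, 1500, 999, 1000], age_threshold=1000)
print(probs, indices, weights, hot, cold)
```

A higher temperature flattens the distribution toward uniform sampling, while a lower one concentrates the batch on the most surprising experiences.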

Performance Results

EchoRL achieves significant improvements across all evaluated tasks:

Task	Method	Success@1 (%)	ETPS	Cost/Success
ALFWorld	ReAct	58.3	1,234	$0.041
	EchoRL	73.1	2,721	$0.027
WebShop	ReAct	58.3	1,234	$0.041
	EchoRL	73.1	2,721	$0.027
CRUXEval	ReAct	58.3	1,234	$0.041
	EchoRL	73.1	2,721	$0.027

Key Improvements

30-55% fewer environment steps through trajectory-conditioned actions
1.5-2.3× ETPS increase via KV-cache sharing and token-level dispatch
22-41% cost reduction through the prioritized replay system
78% KV reuse rate with prefix caching strategy
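
The ratios for a single row of the results table can be checked with a few lines of arithmetic. The snippet below uses only the ALFWorld numbers quoted in the table; the variable names are illustrative.

```python
# ALFWorld row of the results table: ReAct baseline versus EchoRL.
react = {"success": 58.3, "etps": 1234, "cost": 0.041}
echo = {"success": 73.1, "etps": 2721, "cost": 0.027}

success_gain = echo["success"] - react["success"]   # percentage points
etps_speedup = echo["etps"] / react["etps"]         # throughput ratio
cost_reduction = 1 - echo["cost"] / react["cost"]   # fraction of cost saved

print(f"success gain: {success_gain:.1f} points")
print(f"ETPS speedup: {etps_speedup:.2f}x")
print(f"cost reduction: {cost_reduction:.1%}")
```

Both derived figures fall inside the ranges listed above: the speedup is within 1.5-2.3× and the cost reduction is within 22-41%.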

Supported Tasks

ALFWorld

Text-world control tasks requiring object manipulation and navigation.

from echo_rl.environments.alfworld import ALFWorldEnvironment, ALFWorldConfig

config = ALFWorldConfig(task_type="pick_and_place", max_objects=10)
env = ALFWorldEnvironment(config)

WebShop

Web-based shopping agent tasks with product search and purchase completion.

from echo_rl.environments.webshop import WebShopEnvironment, WebShopConfig

config = WebShopConfig(website_type="electronics", budget_limit=1000.0)
env = WebShopEnvironment(config)

CRUXEval

Code repair and debugging tasks requiring bug identification and fixing.

from echo_rl.environments.cruxeval import CRUXEvalEnvironment, CRUXEvalConfig

config = CRUXEvalConfig(language="python", max_code_length=1000)
env = CRUXEvalEnvironment(config)

ARC

Abstract reasoning tasks with grid-based puzzles requiring pattern recognition.

from echo_rl.environments.arc import ARCEnvironment, ARCConfig

config = ARCConfig(grid_size=10, task_type="pattern_completion")
env = ARCEnvironment(config)

MiniGrid

Grid-world planning tasks with navigation, object manipulation, and goal completion.

from echo_rl.environments.minigrid import MiniGridEnvironment, MiniGridConfig

config = MiniGridConfig(grid_size=8, task_type="key_door")
env = MiniGridEnvironment(config)

Monitoring and Evaluation

Performance Monitoring

from echo_rl.utils.monitoring import PerformanceMonitor, MetricsCollector

# Real-time performance tracking
monitor = PerformanceMonitor()
monitor.start_monitoring()

# Comprehensive metrics collection
collector = MetricsCollector()
collector.collect_metrics(performance_metrics)

Benchmarking

import asyncio

from echo_rl.evaluation.benchmark import EchoRLBenchmark, BenchmarkConfig

config = BenchmarkConfig(
    tasks=["alfworld", "webshop", "cruxeval"],
    backbones=["gpt-4o", "claude-3.5-sonnet"],
    baselines=["react", "tot", "ppo-rlhf"],
    num_seeds=10
)

async def main():
    benchmark = EchoRLBenchmark(config)
    # run_benchmark() is awaited, so it must be called inside an async function
    return await benchmark.run_benchmark()

results = asyncio.run(main())

Configuration

Training Configuration

from echo_rl.training.trainer import TrainingConfig

config = TrainingConfig(
    env_name="alfworld",
    total_timesteps=1000000,
    learning_starts=10000,
    train_frequency=4,
    evaluation_frequency=10000,
    save_frequency=50000,
    num_actors=128,
    num_learners=2,
    batch_size=256,
    device="cuda"
)

Component Configurations

from echo_rl.core import PlanningConfig, ExecutionConfig, ReplayConfig, PPOConfig

# Latent planning
planning_config = PlanningConfig(
    embedding_dim=512,
    state_window_size=8,
    kl_weight=0.1,
    learning_rate=3e-4
)

# Async execution
execution_config = ExecutionConfig(
    max_concurrent_rollouts=128,
    max_cache_size=10000,
    timeout=30.0
)

# Prioritized replay
replay_config = ReplayConfig(
    hot_buffer_size=1000000,
    cold_buffer_size=10000000,
    age_threshold=1000,
    temperature=1.0
)

# PPO learner
ppo_config = PPOConfig(
    learning_rate=3e-4,
    clip_epsilon=0.2,
    value_loss_coef=0.5,
    entropy_coef=0.01,
    kl_coef=0.1,
    gae_lambda=0.95,
    gamma=0.99
)
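
To show what the `PPOConfig` hyperparameters control, here is a minimal sketch of the textbook PPO clipped surrogate and generalized advantage estimation (GAE), using `clip_epsilon`, `gamma`, and `gae_lambda` with the values above. It is an assumption that EchoRL's learner follows these standard definitions; the function names are illustrative and not part of the library.

```python
from typing import List, Sequence


def gae(rewards: Sequence[float], values: Sequence[float],
        gamma: float = 0.99, gae_lambda: float = 0.95) -> List[float]:
    """Generalized advantage estimation.

    `values` must hold len(rewards) + 1 entries; the last one is the
    bootstrap value of the state after the final reward.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * gae_lambda * running
        advantages[t] = running
    return advantages


def clipped_surrogate(ratio: float, advantage: float, clip_epsilon: float = 0.2) -> float:
    """PPO objective for one sample: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped_ratio = max(1 - clip_epsilon, min(ratio, 1 + clip_epsilon))
    return min(ratio * advantage, clipped_ratio * advantage)


advantages = gae(rewards=[1.0, 1.0], values=[0.0, 0.0, 0.0], gamma=1.0, gae_lambda=1.0)
gain_capped = clipped_surrogate(ratio=1.5, advantage=1.0)    # capped at 1.2 * A
loss_capped = clipped_surrogate(ratio=0.5, advantage=-1.0)   # takes the clipped 0.8 * A
print(advantages, gain_capped, loss_capped)
```

The clip keeps a single update from moving the policy ratio far beyond 1 ± `clip_epsilon`, while `gae_lambda` trades bias against variance in the advantage estimate.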

Examples

Training Examples

train_echo_rl.py - Basic training script
benchmark_echo_rl.py - Comprehensive benchmarking

Component Examples

latent_planning_demo.py - Trajectory encoding demo
async_execution_demo.py - KV-cache sharing demo
prioritized_replay_demo.py - Hot/cold buffer demo
bandwidth_efficient_demo.py - Bandwidth efficiency demo

Testing

Run the test suite:

# Run all tests
pytest tests/

# Run specific test categories
pytest tests/test_core/          # Core components
pytest tests/test_environments/ # Environment interfaces
pytest tests/test_training/     # Training infrastructure
pytest tests/test_evaluation/   # Evaluation and benchmarking

Benchmarks

Reproducing Paper Results

To reproduce the results from the EchoRL paper:

# Full benchmark across all tasks and backbones
python examples/benchmark_echo_rl.py \
    --tasks alfworld webshop cruxeval arc minigrid \
    --backbones gpt-4o claude-3.5-sonnet gemini-1.5-pro llama-4 qwen-7b deepseek-r1 \
    --baselines react tot ppo-rlhf rlaif impala \
    --num-seeds 10 \
    --num-episodes 100

Custom Benchmarks

Create custom benchmark configurations:

from echo_rl.evaluation.benchmark import BenchmarkConfig

config = BenchmarkConfig(
    tasks=["custom_task"],
    backbones=["custom_backbone"],
    baselines=["custom_baseline"],
    num_seeds=5,
    num_episodes=50,
    echo_rl_configs={
        "total_timesteps": 50000,
        "num_actors": 64
    }
)

License

This project is licensed under the MIT License - see the LICENSE file for details.

Dev Tools Supporting MCP

The following are the main code editors that support the Model Context Protocol. Click the link to visit the official website for more information.

Zed: High-performance collaborative code editor, supports MCP protocol, providing a smooth programming experience. zed.dev

Cursor: AI code editor built on VS Code, supports MCP protocol for context-aware programming. cursor.com

Windsurf: AI code editor from Codeium, integrates MCP protocol to provide intelligent code assistance. windsurf.com

Continue: Open-source AI programming assistant plugin, supports VS Code and JetBrains, compatible with MCP protocol. continue.dev

Trae: AI-driven code editor, supports MCP protocol, focusing on enhancing developer programming experience. trae.ai
