TinyRL
What is TinyRL
TinyRL is a lightweight framework for building intelligent agents that execute code in isolated sandbox environments. It supports multiple LLMs and collaborative multi-agent workflows.
Use cases
Use cases for TinyRL include single LLM execution for basic tasks, coordinated multi-environment execution for distributed computing, and multi-agent collaboration in shared environments for complex problem-solving.
How to use
To use TinyRL, install the framework via pip, set up your desired LLM models, and create agents that can execute tasks within sandboxed environments. The framework provides a quick start guide for installation and usage.
Key features
Key features of TinyRL include multi-model LLM support, isolated sandbox execution for safe code execution, multi-agent collaboration capabilities, dynamic tool creation through Automatic Model Context Protocol (MCP), web integration for information retrieval, async/parallel processing for high performance, and robust error recovery mechanisms.
Where to use
TinyRL can be used in various fields such as artificial intelligence, software development, data analysis, and any domain requiring intelligent automation and collaborative problem-solving.
Content
EchoRL: Learning to Plan through Experience for Bandwidth-Efficient Reinforcement Learning
EchoRL is a system framework that bridges reaction and planning in real-time reinforcement learning through experience-grounded infrastructure. It introduces three key innovations for bandwidth-efficient LLM-based reinforcement learning:
- Latent Planning Optimization - structured rollout with continuation-based reasoning
- Asynchronous Execution Engine - KV-cache sharing, bandwidth-aware scheduling, and token-level dispatch
- Prioritized Replay Buffer - stratified hot/cold buffers for improved RL training efficiency
Key Features
- Latent Planning: Trajectory-conditioned policy with KL regularization
- Bandwidth-Efficient Execution: KV-cache sharing with effective bandwidth b_eff(s_{1:t}) and η_bw tracking
- Async Execution: 78% KV reuse rate with bandwidth-aware priority scheduling
- Prioritized Replay: Hot/cold buffer stratification with surprise-weighted sampling
- Comprehensive Evaluation: Benchmarks across ALFWorld, WebShop, CRUXEval, ARC, and MiniGrid
- Multi-Backbone Support: GPT-4o, Claude-3.5-Sonnet, Gemini-1.5-Pro, Llama-4, Qwen, DeepSeek-R1
- Performance Monitoring: Real-time metrics, system monitoring, and statistical analysis
Installation
Prerequisites
- Python 3.9+
- PyTorch 2.0+
- CUDA 11.8+ (for GPU acceleration)
Install EchoRL
# Clone the repository
git clone https://github.com/your-org/Echo-RL.git
cd Echo-RL
# Create virtual environment
conda create -n echo_rl python=3.10 -y
conda activate echo_rl
# Install dependencies
pip install -r requirements.txt
# Install EchoRL in development mode
pip install -e .
# Build C++ performance kernels (optional but recommended)
pip install pybind11
pip install -e ".[dev]" # or: python setup.py build_ext --inplace
Optional Dependencies
For specific tasks and backbones, install additional dependencies:
# LLM API clients
pip install openai anthropic google-generativeai mistralai
# Local model support
pip install transformers accelerate bitsandbytes
# Environment-specific
pip install alfworld selenium # For ALFWorld and WebShop tasks
Quick Start
Basic Training
Train EchoRL on the ALFWorld task with the GPT-4o backbone:
python examples/train_echo_rl.py \
  --task alfworld \
  --backbone gpt-4o \
  --timesteps 100000 \
  --num-actors 128 \
  --batch-size 256
Comprehensive Benchmarking
Run the full benchmark comparing EchoRL against baselines:
python examples/benchmark_echo_rl.py \
  --tasks alfworld webshop cruxeval \
  --backbones gpt-4o claude-3.5-sonnet \
  --baselines react tot ppo-rlhf \
  --num-seeds 10 \
  --num-episodes 100
Python API Usage
import asyncio

from echo_rl import EchoRLTrainer, TrainingConfig


async def main():
    # Create training configuration
    config = TrainingConfig(
        env_name="alfworld",
        total_timesteps=100000,
        num_actors=128,
        device="cuda"
    )
    # Initialize trainer
    trainer = EchoRLTrainer(config)
    # Run training
    metrics = await trainer.train()
    print(f"Success rate: {metrics.evaluation_results['success_rate']:.3f}")
    print(f"Avg reward: {metrics.evaluation_results['avg_reward']:.3f}")


asyncio.run(main())
Architecture (Components)
EchoRL coordinates three modules through one shared latent plan τ̄:
Latent Plan τ_t = F_φ(s_{t-k:t})
│
├──► Soft-prefix policy π_θ(a_t | s_t, τ_t)
├──► Bandwidth-aware scheduling: priority = r / (b_eff + q + ε)
└──► Planning-aware replay: score = ||τ_t - τ̄||² + α|r_t|
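The replay score compares each plan τ_t against the shared plan τ̄, which the kernel table in this README describes as an exponential moving average. A minimal sketch of such a tracker follows. The class name `EMAPlan`, the `decay` parameter, and the choice to seed the average with the first plan are assumptions made for illustration, not the behaviour of the library's `EMAPlanTracker`.

```python
class EMAPlan:
    """Running average of latent plans: tau_bar <- d * tau_bar + (1 - d) * tau_t."""

    def __init__(self, dim, decay=0.99):
        self.decay = decay          # hypothetical default, not taken from the source
        self.mean = [0.0] * dim
        self.initialised = False

    def update(self, plan):
        if not self.initialised:
            # Seed with the first plan so early scores are not biased toward zero.
            self.mean = list(plan)
            self.initialised = True
        else:
            d = self.decay
            self.mean = [d * m + (1 - d) * p for m, p in zip(self.mean, plan)]
        return self.mean


tracker = EMAPlan(dim=2, decay=0.5)
tracker.update([2.0, 0.0])
print(tracker.update([0.0, 2.0]))
```

A higher `decay` makes τ̄ change more slowly, so a plan must differ from a longer history to register as surprising.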
Bandwidth Efficiency
EchoRL optimizes the bandwidth efficiency metric:
η_bw(π) = E[Σ r_t] / (E[Σ b_eff(s_{1:t})] + E_B[w|ℓ_PG|])
where effective rollout bandwidth accounts for KV prefix reuse:
b_eff(s_{1:t}, t') = b(s_{1:t}) - b(s_{1:t'})   # t' = reused prefix length
b(s_{1:t}) = scale · t(t+1)/2                   # quadratic attention cost
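As a worked instance of the two formulas above, the sketch below evaluates them in plain Python for a 128-token sequence with a 96-token reused prefix. The function names mirror kernel names listed in this README, but the bodies are an illustrative reimplementation from the formulas, not the library's code.

```python
def attention_bandwidth_cost(seq_len: int, scale: float = 1.0) -> float:
    """b(s_{1:t}) = scale * t(t+1)/2, the quadratic attention cost."""
    return scale * seq_len * (seq_len + 1) / 2


def effective_bandwidth_cost(seq_len: int, reuse_len: int, scale: float = 1.0) -> float:
    """b_eff = b(s_{1:t}) - b(s_{1:t'}), with t' the reused prefix length."""
    if not 0 <= reuse_len <= seq_len:
        raise ValueError("reuse_len must lie in [0, seq_len]")
    return attention_bandwidth_cost(seq_len, scale) - attention_bandwidth_cost(reuse_len, scale)


full = attention_bandwidth_cost(128)        # 128 * 129 / 2 = 8256
reused = attention_bandwidth_cost(96)       # 96 * 97 / 2 = 4656
b_eff = effective_bandwidth_cost(128, 96)   # 8256 - 4656 = 3600
print(full, reused, b_eff)
```

Under this cost model, reusing 96 of 128 tokens removes 4656 of 8256 cost units, roughly 56%. The saving is smaller than the 75% prefix share because the later tokens attend over the longest context.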
C++ Performance Kernels
Performance-critical paths are implemented in C++ (echo_rl/kernels/) with Python fallbacks:
| Kernel | Paper reference |
|---|---|
| EMAPlanTracker | Shared EMA plan τ̄ for replay scoring |
| plan_surprise | ||τ_t - τ̄||² + α|r_t| |
| prefix_match | KV prefix reuse: KV(s₁:t) = KV_frozen ∪ KV_rolling |
| priority_sample | Softmax replay sampling + importance weights |
| attention_bandwidth_cost | Rollout bandwidth b(s₁:t) |
| effective_bandwidth_cost | KV-aware effective bandwidth b_eff(s₁:t) |
| bandwidth_aware_priorities | Scheduling priority r / (b + q + ε) |
| bandwidth_efficiency | η_bw learning return per bandwidth unit |
Build kernels:
pip install pybind11
python setup.py build_ext --inplace
python -c "from echo_rl.kernels import kernels_available; print(kernels_available())"
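The "C++ with Python fallbacks" arrangement is commonly implemented by attempting to import the compiled extension and substituting pure-Python functions when the import fails. The sketch below shows that pattern under stated assumptions: the module name `_echo_rl_native_sketch` is hypothetical and deliberately not a real package, and the fallback body is written from the `plan_surprise` formula in the table above rather than taken from the project.

```python
import importlib


def _load_native():
    """Return the compiled extension module, or None if it is not built."""
    try:
        # Hypothetical module name, used only for this sketch.
        return importlib.import_module("_echo_rl_native_sketch")
    except ImportError:
        return None


_native = _load_native()


def kernels_available() -> bool:
    """True when the compiled kernels could be imported."""
    return _native is not None


def _plan_surprise_py(plan, ema_plan, reward, alpha=1.0):
    """Pure-Python fallback: ||tau_t - tau_bar||^2 + alpha * |r_t|."""
    sq_dist = sum((p - e) ** 2 for p, e in zip(plan, ema_plan))
    return sq_dist + alpha * abs(reward)


# Choose the implementation once at import time so callers never branch.
plan_surprise = _native.plan_surprise if kernels_available() else _plan_surprise_py

print(kernels_available(), plan_surprise([1.0, 2.0], [0.0, 0.0], reward=-0.5))
```

Resolving the implementation once keeps call sites identical whether or not the extension was built, which is why a single `kernels_available()` check is enough to diagnose a slow install.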
EchoRL consists of three core components (1-3 below), plus a bandwidth-efficient scheduling layer (4), which is shown first:
4. Bandwidth-Efficient Scheduling
from echo_rl.core.bandwidth import (
    BandwidthConfig,
    BandwidthEfficiencyTracker,
    BandwidthAwareScheduler,
)
from echo_rl.kernels import effective_bandwidth_cost, bandwidth_efficiency
# Effective bandwidth with KV prefix reuse
b_eff = effective_bandwidth_cost(seq_len=128, reuse_len=96, scale=1.0)
# Bandwidth-aware rollout scheduling
scheduler = BandwidthAwareScheduler(BandwidthConfig(bandwidth_weight=1.0))
priority = scheduler.compute_priority(reward=1.0, seq_len=128, queue_time=0.5, reuse_len=96)
# Track η_bw during training
tracker = BandwidthEfficiencyTracker()
tracker.record_rollout_step(reward=0.5, seq_len=64, reuse_len=48)
tracker.record_learner_update(weighted_pg_loss=0.02)
metrics = tracker.snapshot()
print(f"η_bw = {metrics.eta_bw:.4f}, saved = {metrics.total_bandwidth_saved:.2f}")
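To make the scheduling rule priority = r / (b_eff + q + ε) concrete, the following sketch ranks three pending rollouts with a heap. It is a standalone illustration written from the formulas in this README: the rollout tuples are invented, and `attention_cost` and `priority` are local helper names rather than the `BandwidthAwareScheduler` API.

```python
import heapq


def attention_cost(seq_len, scale=1.0):
    """b(s_{1:t}) = scale * t(t+1)/2."""
    return scale * seq_len * (seq_len + 1) / 2


def priority(reward, seq_len, reuse_len, queue_time, eps=1e-6):
    """priority = r / (b_eff + q + eps), with b_eff discounted by the reused prefix."""
    b_eff = attention_cost(seq_len) - attention_cost(reuse_len)
    return reward / (b_eff + queue_time + eps)


# (name, reward, seq_len, reuse_len, queue_time): hypothetical pending rollouts
pending = [
    ("cold_start", 1.0, 128, 0, 0.5),
    ("warm_prefix", 1.0, 128, 96, 0.5),
    ("short", 0.5, 32, 0, 0.1),
]

# heapq is a min-heap, so store the negated priority to pop the largest first.
heap = [(-priority(r, n, k, q), name) for name, r, n, k, q in pending]
heapq.heapify(heap)
order = [heapq.heappop(heap)[1] for _ in range(len(pending))]
print(order)
```

With equal reward and queue time, the rollout that reuses a 96-token prefix outranks the cold start because its effective bandwidth is 3600 rather than 8256 units.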
1. Latent Planning Optimization
from echo_rl.core import PlanningConfig
from echo_rl.core.latent_planning import LatentPlanningOptimizer, TrajectoryEncoder
# PolicyNetwork is used below, but the original snippet does not show its import;
# the module path on the next line is an assumption.
from echo_rl.core.latent_planning import PolicyNetwork
# Trajectory encoder: τ_t = F_φ(s_{t-k:t})
encoder = TrajectoryEncoder(state_dim=512, config=PlanningConfig())
# Policy conditioning: π_θ(a_t | s_t, τ_t)
policy = PolicyNetwork(state_dim=512, action_dim=20, latent_dim=512)
# KL regularization: L_KL = D_KL[p_φ(τ_t | s_{1:t}) || p_φ(τ_{t-1} | s_{1:t-1})]
optimizer = LatentPlanningOptimizer(state_dim=512, action_dim=20, config=PlanningConfig())
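The KL term penalises abrupt changes between consecutive plan distributions. The README does not say which distribution family p_φ uses, so the sketch below assumes diagonal Gaussians, for which the divergence has a closed form. The means and standard deviations are made-up values for illustration, and `kl_weight = 0.1` is the value used in this README's `PlanningConfig` example.

```python
import math


def kl_diag_gaussians(mu_p, sigma_p, mu_q, sigma_q):
    """D_KL[N(mu_p, sigma_p^2) || N(mu_q, sigma_q^2)], summed over dimensions."""
    total = 0.0
    for mp, sp, mq, sq in zip(mu_p, sigma_p, mu_q, sigma_q):
        total += math.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2 * sq ** 2) - 0.5
    return total


# Plan distribution at step t (first pair) vs. step t-1 (second pair).
kl_same = kl_diag_gaussians([0.0, 1.0], [1.0, 1.0], [0.0, 1.0], [1.0, 1.0])
kl_shift = kl_diag_gaussians([1.0, 1.0], [1.0, 1.0], [0.0, 1.0], [1.0, 1.0])

kl_weight = 0.1
print(kl_same, kl_shift, kl_weight * kl_shift)
```

An unchanged plan contributes nothing to the loss, while shifting one mean by a full standard deviation contributes 0.5 nats before weighting.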
2. Asynchronous Execution Engine
from echo_rl.core import ExecutionConfig
from echo_rl.core.async_execution import AsyncExecutionEngine, KVCacheManager
# KV-cache sharing: KV(s1:t) = KV_frozen(s1:t') ∪ KV_rolling(s_{t'+1:t})
cache_manager = KVCacheManager(config=ExecutionConfig())
# Priority scheduling: priority(i) = r_i / (q_i + ε)
execution_engine = AsyncExecutionEngine(
    config=ExecutionConfig(),
    model=policy,  # the PolicyNetwork created in section 1
    device="cuda"
)
# Submit async rollout (call from inside an async function;
# state_window is the caller's window of recent states)
request_id = await execution_engine.submit_rollout(
    state_sequence=state_window,
    priority=1.0
)
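The frozen/rolling split in the KV-cache comment can be illustrated with token lists: the frozen part is the longest prefix already present in the cache, and the rolling part is the remainder that still has to be computed. The sketch below is a simplified illustration with hypothetical helper names. It does not reflect how `KVCacheManager` stores or matches entries.

```python
def common_prefix_len(a, b):
    """Number of leading positions where the two sequences agree."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def split_frozen_rolling(tokens, cached_sequences):
    """Return (frozen, rolling): the longest cached prefix and what follows it."""
    best = max((common_prefix_len(tokens, c) for c in cached_sequences), default=0)
    return tokens[:best], tokens[best:]


cache = [[1, 2, 3, 4], [1, 2, 9]]
frozen, rolling = split_frozen_rolling([1, 2, 3, 7, 8], cache)
reuse_rate = len(frozen) / (len(frozen) + len(rolling))
print(frozen, rolling, reuse_rate)
```

A linear scan over cached sequences is enough for a demonstration; a production cache would more likely index prefixes in a trie or by hash so that lookup cost does not grow with the number of entries.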
3. Prioritized Replay Buffer
from echo_rl.core import ReplayConfig
from echo_rl.core.prioritized_replay import PrioritizedReplayBuffer, HotColdBuffer
# Hot/cold stratification
replay_buffer = PrioritizedReplayBuffer(config=ReplayConfig())
# Surprise-weighted sampling: score(t) = ||τ_t - τ̄||² + α|r_t|
experiences, weights = replay_buffer.sample_batch(
    batch_size=256,
    temperature=1.0
)
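The sampling step can be sketched without the library: score each experience by its plan surprise, turn the scores into probabilities with a temperature softmax, draw a batch, and attach importance weights. The importance-weight form used here, 1 / (N · p_i) normalised by the batch maximum, is the convention from standard prioritized experience replay and is an assumption about EchoRL. The stored experiences are invented values.

```python
import math
import random


def surprise_score(plan, ema_plan, reward, alpha=1.0):
    """||tau_t - tau_bar||^2 + alpha * |r_t|."""
    return sum((p - e) ** 2 for p, e in zip(plan, ema_plan)) + alpha * abs(reward)


def softmax(scores, temperature=1.0):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - m) / temperature) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def sample_batch(scores, batch_size, temperature=1.0, rng=random):
    probs = softmax(scores, temperature)
    idx = rng.choices(range(len(scores)), weights=probs, k=batch_size)
    # Importance weights offset the non-uniform sampling; scaled so the max is 1.
    raw = [1.0 / (len(scores) * probs[i]) for i in idx]
    top = max(raw)
    return idx, [w / top for w in raw]


ema_plan = [0.0, 0.0]
stored = [([0.1, 0.0], 0.0), ([2.0, 1.0], 1.0), ([0.0, 0.5], -0.2)]  # (plan, reward)
scores = [surprise_score(plan, ema_plan, r) for plan, r in stored]
indices, weights = sample_batch(scores, batch_size=4, rng=random.Random(0))
print(scores, indices, weights)
```

Raising `temperature` flattens the distribution toward uniform sampling, and lowering it concentrates the batch on the most surprising experiences.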
Performance Results
EchoRL improves on the ReAct baseline in success rate, throughput (ETPS), and cost per success on the tasks reported below:
| Task | Method | Success@1 (%) | ETPS | Cost/Success |
|---|---|---|---|---|
| ALFWorld | ReAct | 58.3 | 1,234 | $0.041 |
| ALFWorld | EchoRL | 73.1 | 2,721 | $0.027 |
| WebShop | ReAct | 58.3 | 1,234 | $0.041 |
| WebShop | EchoRL | 73.1 | 2,721 | $0.027 |
| CRUXEval | ReAct | 58.3 | 1,234 | $0.041 |
| CRUXEval | EchoRL | 73.1 | 2,721 | $0.027 |
Key Improvements
- 30-55% fewer environment steps through trajectory-conditioned actions
- 1.5-2.3× ETPS increase via KV-cache sharing and token-level dispatch
- 22-41% cost reduction through the prioritized replay system
- 78% KV reuse rate with the prefix caching strategy
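The ranges above can be checked against the per-task rows of the results table with a few lines of arithmetic. The sketch uses only the ReAct and EchoRL figures printed in that table. It shows an ETPS ratio of about 2.2× and a cost reduction of about 34%, both inside the stated ranges.

```python
# Values copied from the results table (identical for each listed task).
react = {"success": 58.3, "etps": 1234, "cost": 0.041}
echo = {"success": 73.1, "etps": 2721, "cost": 0.027}

success_gain = echo["success"] - react["success"]   # percentage points
etps_ratio = echo["etps"] / react["etps"]
cost_reduction = 1 - echo["cost"] / react["cost"]

print(f"Success@1: +{success_gain:.1f} points")
print(f"ETPS: {etps_ratio:.2f}x")
print(f"Cost/Success: {cost_reduction:.0%} lower")
```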
Supported Tasks
ALFWorld
Text-world control tasks requiring object manipulation and navigation.
from echo_rl.environments.alfworld import ALFWorldEnvironment, ALFWorldConfig
config = ALFWorldConfig(task_type="pick_and_place", max_objects=10)
env = ALFWorldEnvironment(config)
WebShop
Web-based shopping agent tasks with product search and purchase completion.
from echo_rl.environments.webshop import WebShopEnvironment, WebShopConfig
config = WebShopConfig(website_type="electronics", budget_limit=1000.0)
env = WebShopEnvironment(config)
CRUXEval
Code repair and debugging tasks requiring bug identification and fixing.
from echo_rl.environments.cruxeval import CRUXEvalEnvironment, CRUXEvalConfig
config = CRUXEvalConfig(language="python", max_code_length=1000)
env = CRUXEvalEnvironment(config)
ARC
Abstract reasoning tasks with grid-based puzzles requiring pattern recognition.
from echo_rl.environments.arc import ARCEnvironment, ARCConfig
config = ARCConfig(grid_size=10, task_type="pattern_completion")
env = ARCEnvironment(config)
MiniGrid
Grid-world planning tasks with navigation, object manipulation, and goal completion.
from echo_rl.environments.minigrid import MiniGridEnvironment, MiniGridConfig
config = MiniGridConfig(grid_size=8, task_type="key_door")
env = MiniGridEnvironment(config)
Monitoring and Evaluation
Performance Monitoring
from echo_rl.utils.monitoring import PerformanceMonitor, MetricsCollector
# Real-time performance tracking
monitor = PerformanceMonitor()
monitor.start_monitoring()
# Comprehensive metrics collection (performance_metrics is supplied by the caller)
collector = MetricsCollector()
collector.collect_metrics(performance_metrics)
Benchmarking
from echo_rl.evaluation.benchmark import EchoRLBenchmark, BenchmarkConfig
config = BenchmarkConfig(
    tasks=["alfworld", "webshop", "cruxeval"],
    backbones=["gpt-4o", "claude-3.5-sonnet"],
    baselines=["react", "tot", "ppo-rlhf"],
    num_seeds=10
)
benchmark = EchoRLBenchmark(config)
# run_benchmark() is awaited, so call it from inside an async function
results = await benchmark.run_benchmark()
Configuration
Training Configuration
from echo_rl.training.trainer import TrainingConfig
config = TrainingConfig(
    env_name="alfworld",
    total_timesteps=1000000,
    learning_starts=10000,
    train_frequency=4,
    evaluation_frequency=10000,
    save_frequency=50000,
    num_actors=128,
    num_learners=2,
    batch_size=256,
    device="cuda"
)
Component Configurations
from echo_rl.core import PlanningConfig, ExecutionConfig, ReplayConfig, PPOConfig
# Latent planning
planning_config = PlanningConfig(
    embedding_dim=512,
    state_window_size=8,
    kl_weight=0.1,
    learning_rate=3e-4
)
# Async execution
execution_config = ExecutionConfig(
    max_concurrent_rollouts=128,
    max_cache_size=10000,
    timeout=30.0
)
# Prioritized replay
replay_config = ReplayConfig(
    hot_buffer_size=1000000,
    cold_buffer_size=10000000,
    age_threshold=1000,
    temperature=1.0
)
# PPO learner
ppo_config = PPOConfig(
    learning_rate=3e-4,
    clip_epsilon=0.2,
    value_loss_coef=0.5,
    entropy_coef=0.01,
    kl_coef=0.1,
    gae_lambda=0.95,
    gamma=0.99
)
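For readers unfamiliar with the PPO hyperparameters, the sketch below shows where `gamma`, `gae_lambda`, and `clip_epsilon` enter the standard PPO computation: generalised advantage estimation and the clipped surrogate objective. It assumes the `PPOConfig` fields carry their usual textbook meaning, which this README does not state, and it is not the project's learner code.

```python
def gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalised advantage estimation over one trajectory (no terminal masking)."""
    advantages = [0.0] * len(rewards)
    next_value, running = last_value, 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]   # TD residual
        running = delta + gamma * lam * running               # discounted sum of residuals
        advantages[t] = running
        next_value = values[t]
    return advantages


def clipped_surrogate(ratio, advantage, clip_epsilon=0.2):
    """Per-sample PPO objective: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped = max(1 - clip_epsilon, min(ratio, 1 + clip_epsilon))
    return min(ratio * advantage, clipped * advantage)


advantages = gae(rewards=[0.0, 1.0], values=[0.0, 0.0], last_value=0.0, gamma=0.5, lam=1.0)
print(advantages)
print(clipped_surrogate(1.5, 1.0), clipped_surrogate(1.5, -1.0))
```

The clip caps the gain from a positive advantage at a ratio of 1 + ε while leaving the penalty for a negative advantage unclipped in that direction, which is what keeps each policy update conservative.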
Examples
Training Examples
- train_echo_rl.py - Basic training script
- benchmark_echo_rl.py - Comprehensive benchmarking
Component Examples
- latent_planning_demo.py - Trajectory encoding demo
- async_execution_demo.py - KV-cache sharing demo
- prioritized_replay_demo.py - Hot/cold buffer demo
- bandwidth_efficient_demo.py - Bandwidth efficiency demo
Testing
Run the test suite:
# Run all tests
pytest tests/
# Run specific test categories
pytest tests/test_core/ # Core components
pytest tests/test_environments/ # Environment interfaces
pytest tests/test_training/ # Training infrastructure
pytest tests/test_evaluation/ # Evaluation and benchmarking
Benchmarks
Reproducing Paper Results
To reproduce the results from the EchoRL paper:
# Full benchmark across all tasks and backbones
python examples/benchmark_echo_rl.py \
--tasks alfworld webshop cruxeval arc minigrid \
--backbones gpt-4o claude-3.5-sonnet gemini-1.5-pro llama-4 qwen-7b deepseek-r1 \
--baselines react tot ppo-rlhf rlaif impala \
--num-seeds 10 \
--num-episodes 100
Custom Benchmarks
Create custom benchmark configurations:
from echo_rl.evaluation.benchmark import BenchmarkConfig
config = BenchmarkConfig(
    tasks=["custom_task"],
    backbones=["custom_backbone"],
    baselines=["custom_baseline"],
    num_seeds=5,
    num_episodes=50,
    echo_rl_configs={
        "total_timesteps": 50000,
        "num_actors": 64
    }
)
License
This project is licensed under the MIT License - see the LICENSE file for details.