MCP Explorer

ETL Migration Agent

@aymenfurteron · 9 months ago
1 · MIT
Free · Community
AI Systems
A set of MCP tools / agents to accelerate ETL migrations.

Overview

What is ETL Migration Agent?

ETL Migration Agent is a Model Context Protocol (MCP) server designed to facilitate the migration of legacy ETL scripts to Python using AI-powered tools.

Use cases

Use cases include migrating SQL-based ETL logic to Python, automating the validation of ETL transformations, and ensuring data consistency during migration.

How to use

To use ETL Migration Agent, integrate it with your existing ETL processes, utilize the provided tools for analysis and conversion, and validate the generated Python code against your legacy scripts.

Key features

Key features include AI-driven code generation, row order consistency checks, output difference analysis, and seamless integration with GitHub Copilot.

Where to use

ETL Migration Agent is applicable in data engineering, data migration projects, and any scenario where legacy ETL scripts need to be modernized or converted to Python.

Content

ETL Migration Agent Logo

ETL Migration Agent

GitHub Copilot for Migrating Legacy ETL Scripts 🤯

GitHub Copilot
MCP Server
Azure AI

Overview

ETL Migration Agent is a Model Context Protocol (MCP) server that extends GitHub Copilot with specialized tools for migrating legacy ETL code to Python. It provides a suite of AI-powered tools that work together to analyze, convert, and validate ETL transformations.

Sample Output: See test_data/transformation_script.py for an example of generated Python code that migrates legacy SQL ETL logic to pandas operations.

Architecture

ETL Migration Agent Architecture

Core Services

  • Azure AI Foundry: Provides access to LLMs (o3, gpt-4.1, gpt-4o) for language understanding and code generation
  • Azure AI Agent Service: Used to build agents that perform changes or analysis on files
  • MCP Server: Enables integration with GitHub Copilot by exposing tools through the Model Context Protocol

Tool and Agent Overview

| Tool/Agent | Purpose | Implementation | Models Used |
| --- | --- | --- | --- |
| Order Consistency Agent | Ensures source and target files have matching row orders; uses an LLM judge to select the best ordering from multiple models | Azure AI Agent Service + Code Interpreter + LLM Judge | gpt-4, gpt-4.1 |
| Code Bootstrap Agent | Initial Python code generation from legacy ETL | Azure AI Foundry | Configured deployment model |
| Rowlevel Analyzer Agent | Comparison of output differences; uses an LLM judge to identify matching rows across files | Azure AI Agent Service + Code Interpreter + LLM Judge | Multiple models in parallel |
| Code Refinement Agent | Refines Python code based on legacy ETL knowledge and current output | Azure AI Foundry | gpt-4.1 (configurable) |
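Two of the agents above rely on an "LLM judge" to pick the best result from several model outputs. As a rough intuition for that selection step, here is a purely illustrative stand-in (not the project's actual judge, which delegates the decision to a model): score each candidate row ordering by positional agreement with a reference and keep the highest scorer.

```python
# Hypothetical stand-in for the LLM-judge selection step: given several
# candidate reorderings of the same CSV rows, pick the one whose row
# order best matches a reference ordering. The real agents ask a model
# (e.g. gpt-4.1) to make this judgment; here we use a positional score.

def agreement_score(candidate, reference):
    """Fraction of rows that sit at the same index in both orderings."""
    matches = sum(1 for a, b in zip(candidate, reference) if a == b)
    return matches / max(len(reference), 1)

def select_best_ordering(candidates, reference):
    """Return the candidate ordering with the highest agreement score."""
    return max(candidates, key=lambda c: agreement_score(c, reference))
```

The same shape (generate N candidates, score, keep the argmax) applies whether the scorer is a heuristic like this one or an LLM prompt.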

Migration Steps with Prompts

Teacher Cat explaining the process

1. File Order Alignment

call #order_consistency_agent
Legacy ETL Output file: /workspaces/etl-migration-agent/test_data/output.csv
Source File: /workspaces/etl-migration-agent/test_data/input.csv
Data directory: /workspaces/etl-migration-agent/test_data

Review the generated files in the /workspaces/etl-migration-agent/test_data directory. The agent will create a new file with the same name as the original output file, but with _ordered appended to it (e.g., output_ordered.csv).
You may need to rerun the agent multiple times to ensure the row order matches between the source and target files. The agent will use multiple models to generate different row orders, and a built-in LLM judge will select the best one.
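To see whether a rerun is still needed, a quick local check is enough. This sketch (an illustrative helper, not the agent itself; file names are placeholders for your own source/target files) classifies two CSVs as aligned, same rows in a different order, or genuinely different content:

```python
import csv

# Illustrative check: do two CSV files contain the same rows, and do
# those rows appear in the same order? "same-rows-different-order"
# means another pass of the order-consistency step could still help.

def rows_of(path):
    """Read a CSV file into a list of row tuples."""
    with open(path, newline="") as f:
        return [tuple(row) for row in csv.reader(f)]

def order_report(source_rows, target_rows):
    """Return 'aligned', 'same-rows-different-order', or 'different-rows'."""
    if source_rows == target_rows:
        return "aligned"
    if sorted(source_rows) == sorted(target_rows):
        return "same-rows-different-order"
    return "different-rows"
```

If the report is "different-rows", reordering alone cannot fix the mismatch and the transformation logic itself needs attention.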

2. Initial Code Generation

call #code_bootstrap_agent
Target directory: /workspaces/etl-migration-agent/test_data
Output file: transformation_script.py

This may take a few minutes as the agent analyzes the legacy ETL code and generates the initial Python transformation script. The generated script will be saved as transformation_script.py in the specified directory. We’ll use the strongest reasoning model available to ensure the code is as accurate as possible.

3. Code Refinement Options

Choose the appropriate refinement method based on the issue:

A. Row Mapping Issues

call #rowlevel_analyzer_agent
Working directory: /workspaces/etl-migration-agent/test_data
Files: output_PY.csv, output.csv

B. Missing Legacy Functionality

call #code_refinement_agent
Working directory: /workspaces/etl-migration-agent/test_data
Files: transformation_script.py, input.csv, output.csv, output_PY.csv

C. CLI-based Validation

Use diff (or other CLI tools) to compare outputs:

diff <(head -n 5 output_PY.csv) <(head -n 5 output.csv)
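If process substitution is unavailable (e.g. on Windows shells), the same comparison can be done in Python. This is a sketch of a `head`-then-`diff` equivalent; the paths are placeholders:

```python
import csv
from itertools import islice

# Python equivalent of `diff <(head -n 5 a.csv) <(head -n 5 b.csv)`:
# report which of the first n rows differ between two CSV files.

def head_diff(path_a, path_b, n=5):
    """Return a list of (line_no, row_a, row_b) triples that differ."""
    with open(path_a, newline="") as fa, open(path_b, newline="") as fb:
        rows_a = list(islice(csv.reader(fa), n))
        rows_b = list(islice(csv.reader(fb), n))
    diffs = []
    for i in range(max(len(rows_a), len(rows_b))):
        ra = rows_a[i] if i < len(rows_a) else None
        rb = rows_b[i] if i < len(rows_b) else None
        if ra != rb:
            diffs.append((i + 1, ra, rb))
    return diffs
```

An empty result means the first `n` rows already match.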

Prompt 3a: Analyzing Row-Level Differences (GitHub Copilot - Edit Mode)

Evaluate the current Python code and its output.  
Compare it to the expected output.

For files with different row mappings or sorting, use:  

call #rowlevel_analyzer_agent  

(Python output: output_PY.csv, Original output: output.csv)  

Files are located in: /workspaces/etl-migration-agent/test_data

Prompt 3b: Refining Missing Functionality (GitHub Copilot - Edit Mode)

Evaluate the current Python code and its output.  
Compare it to the expected output.

For missing functionality that requires knowledge of the legacy ETL code, use:  

call #code_refinement_agent  

(Python output: output_PY.csv, Original output: output.csv, input file: input.csv, transformation_script.py is not complete)  

Files are located in: /workspaces/etl-migration-agent/test_data

Goal: Fix transformation_script.py so that output_PY.csv matches output.csv.

Prompt 3c: Applying Final Fixes (GitHub Copilot - Edit Mode)

Before running this prompt:
Compare outputs using this command:

diff <(head -n 5 output_PY.csv) <(head -n 5 output.csv)

Then run this prompt:

call #terminalLastCommand

Apply necessary fixes in transformation_script.py to resolve any differences.

You can use this command to check the similarity percentage between the two CSV files:

python3 -c "import csv; a = list(csv.reader(open('output.csv'))); b = list(csv.reader(open('output_PY.csv'))); total = min(len(a), len(b)) * min(len(a[0]), len(b[0])); same = sum(1 for i in range(min(len(a), len(b))) for j in range(min(len(a[i]), len(b[i]))) if a[i][j] == b[i][j]); print(f'{same/total*100:.2f} % similar')"

You may need to diff more lines as you get closer to the final output.
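For readability, the similarity one-liner above can be unpacked into a named helper with the same cell-counting logic, operating on already-parsed rows:

```python
# The similarity one-liner above, unpacked into a readable helper.
# It counts matching cells across the two lists of parsed CSV rows
# and expresses the result as a percentage of the overlapping region.

def cell_similarity(rows_a, rows_b):
    """Percentage of cells equal between two lists of CSV rows."""
    total = min(len(rows_a), len(rows_b)) * min(len(rows_a[0]), len(rows_b[0]))
    same = sum(
        1
        for i in range(min(len(rows_a), len(rows_b)))
        for j in range(min(len(rows_a[i]), len(rows_b[i])))
        if rows_a[i][j] == rows_b[i][j]
    )
    return same / total * 100
```

Note that, like the one-liner, this only measures the overlapping region: extra rows or columns present in just one file are not penalized.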

Setup

Prerequisites

  • Python 3.11+
  • Azure OpenAI API credentials
  • Azure AI Agents Project endpoint

Configuration Options

| Environment Variable | Description | Default |
| --- | --- | --- |
| AZURE_OPENAI_API_KEY | Azure OpenAI API key | Required |
| AZURE_OPENAI_ENDPOINT | Azure OpenAI endpoint URL | Required |
| AZURE_OPENAI_DEPLOYMENT_NAME | OpenAI model deployment name | o3 |
| AZURE_OPENAI_API_VERSION | API version | 2024-12-01-preview |
| MODEL_DEPLOYMENTS | Comma-separated list of models | gpt-4,gpt-4o-dz,gpt-4.1,gpt-4o |
| REFINEMENT_MODEL | Model for code refinement | gpt-4.1 |
| BEST_OUTPUT_SELECTOR_MODEL | Model for selecting best output | gpt-4.1 |
| LEGACY_ETL_CODE_DESCRIPTION | Type of legacy code | e.g. SQL |
| PROJECT_ENDPOINT | Azure AI Agents project endpoint | Required |
| LOG_LEVEL | Logging level | INFO |
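A loader for this configuration might look like the following sketch (illustrative only; the server's actual loading code may differ). Required variables raise early if missing, and optional ones fall back to the documented defaults:

```python
import os

# Illustrative configuration loader mirroring the table above:
# required variables raise KeyError if absent, optional ones fall
# back to their documented defaults.

def load_config(env=os.environ):
    def required(name):
        if name not in env:
            raise KeyError(f"missing required environment variable: {name}")
        return env[name]

    return {
        "api_key": required("AZURE_OPENAI_API_KEY"),
        "endpoint": required("AZURE_OPENAI_ENDPOINT"),
        "project_endpoint": required("PROJECT_ENDPOINT"),
        "deployment": env.get("AZURE_OPENAI_DEPLOYMENT_NAME", "o3"),
        "api_version": env.get("AZURE_OPENAI_API_VERSION", "2024-12-01-preview"),
        "model_deployments": env.get(
            "MODEL_DEPLOYMENTS", "gpt-4,gpt-4o-dz,gpt-4.1,gpt-4o"
        ).split(","),
        "refinement_model": env.get("REFINEMENT_MODEL", "gpt-4.1"),
        "log_level": env.get("LOG_LEVEL", "INFO"),
    }
```

Failing fast on the three required variables surfaces misconfiguration at startup rather than mid-migration.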

Quick Start

  1. Create and activate a virtual environment:

     python -m venv .venv
     source .venv/bin/activate  # On Windows: .venv\Scripts\activate

  2. Install dependencies:

     pip install -r requirements.txt

  3. Run the server with environment variables:

     export AZURE_OPENAI_API_KEY="your_api_key_here"
     export AZURE_OPENAI_ENDPOINT="your_endpoint_here"
     export PROJECT_ENDPOINT="your_project_endpoint"
     export LOG_LEVEL="INFO"
     python server.py

VS Code MCP Integration

Add this configuration to your VS Code settings (mcp.json):

{
  "inputs": [
    {
      "type": "promptString",
      "id": "azure_openai_api_key",
      "description": "Azure OpenAI API Key",
      "password": true
    },
    {
      "type": "promptString",
      "id": "azure_openai_endpoint",
      "description": "Azure OpenAI Endpoint URL"
    },
    {
      "type": "promptString",
      "id": "azure_openai_deployment_name",
      "description": "Azure OpenAI Deployment Name (e.g., gpt-4, o3-mini)"
    },
    {
      "type": "promptString",
      "id": "azure_openai_api_version",
      "description": "Azure OpenAI API Version"
    },
    {
      "type": "promptString",
      "id": "project_endpoint",
      "description": "Azure AI Agents Project Endpoint URL"
    },
    {
      "type": "promptString",
      "id": "data_folder_path",
      "description": "Path to folder containing legacy code and CSV files"
    },
    {
      "type": "promptString",
      "id": "log_level",
      "description": "Logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL)"
    }
  ],
  "servers": {
    "etl-migration": {
      "command": "python",
      "args": [
        "${workspaceFolder}/server.py"
      ],
      "env": {
        "PYTHONUNBUFFERED": "1",
        "AZURE_OPENAI_API_KEY": "${input:azure_openai_api_key}",
        "AZURE_OPENAI_ENDPOINT": "${input:azure_openai_endpoint}",
        "AZURE_OPENAI_DEPLOYMENT_NAME": "${input:azure_openai_deployment_name}",
        "LEGACY_ETL_CODE_DESCRIPTION": "SQL",
        "AZURE_OPENAI_API_VERSION": "${input:azure_openai_api_version}",
        "PROJECT_ENDPOINT": "${input:project_endpoint}",
        "LOG_LEVEL": "${input:log_level}"
      },
      "workingDirectory": "${workspaceFolder}/mcp"
    }
  }
}

Detailed Dev Workflow

graph TD
    A[Legacy ETL Code + CSV Files] --> B[Order Consistency Agent]
    B -->|Reordered Files| C[Code Bootstrap Agent]
    C -->|Initial Python Code| D[Iterative Refinement]
    D --> E{Analysis Method}
    E -->|Simple Differences| F[Terminal diff]
    E -->|Complex Row Mappings| G[Rowlevel Analyzer Agent]
    E -->|Legacy Logic Needed| H[Code Refinement Agent]
    F & G & H --> I{Issues Resolved?}
    I -->|No| D
    I -->|Yes| J[Migration Complete]

License

This project is licensed under the MIT License - see the LICENSE file for details.
