finAgent
What is finAgent?
finAgent is an AI-powered finance assistant that utilizes Google’s Gemini for natural language processing and Whisper for voice transcription, enabling users to interact through text, voice commands, and file uploads.
Use cases
Use cases include retrieving account balances, listing transactions, querying document contents, and processing uploaded financial documents.
How to use
Users can access finAgent via its web interface, where they can type messages, use voice commands, or upload files such as CSV, Excel, TXT, and JSON for processing.
Key features
Key features include conversational AI for understanding requests, voice interaction for speech-to-text conversion, file processing for structured data extraction, and dynamic tool usage through MCP for executing various financial tasks.
Where to use
finAgent is suitable for finance-related tasks in various sectors, including personal finance management, accounting, and financial analysis.
AI Finance Agent with Voice, File Upload, and Tool Use
Live Demo: https://finagent-vc73.onrender.com/
Overview
This project implements an AI-powered assistant designed to help with finance-related tasks. It features a web-based chat interface where users can interact using text, voice commands, or by uploading files. The agent leverages Google’s Gemini language model for natural language understanding and response generation, Groq’s API for fast voice transcription, and a custom backend server using the Model Context Protocol (MCP) for executing specific tools and actions.
While the initial vision included building a highly dynamic agent builder for complex ERP systems using only open-source components and advanced RAG, this implementation focuses on a functional Finance Agent showcasing:
- Conversational AI: Interacting with Gemini for understanding requests and generating responses.
- Voice Interaction: Whisper model for speech-to-text.
- File Processing: Uploading documents (CSV, Excel, TXT, JSON) and automatically converting tabular/text data into a structured JSON format stored alongside the original.
- Tool Use: Gemini dynamically calls backend tools (via MCP) to perform actions like retrieving account balances, listing transactions, querying document contents, and processing uploaded files.
- Modular Backend: Using MCP allows the backend logic (tool implementations) to run as a separate process, promoting separation of concerns.
Features
- Web Interface: Clean chat interface built with FastAPI and HTMX for dynamic updates without full page reloads.
- Text & Voice Commands: Interact via typed messages or by recording voice commands.
- Fast Transcription: Utilizes Groq API for quick and accurate speech-to-text conversion (Whisper-large-v3).
- File Upload & Processing:
- Upload CSV, Excel (.xls, .xlsx), JSON, or TXT files.
- The backend automatically saves the original file.
- Automated JSON Conversion: Attempts to convert the full content of uploaded CSV, Excel (all sheets), or TXT files into a structured JSON representation (`<original_filename>.json`). This JSON version is saved in the knowledge base alongside the original.
- Handles potential decoding errors and imposes size limits for conversion (`MAX_ROWS_FULL_CONVERSION`, `MAX_CHARS_FULL_CONVERSION`).
- Gemini Integration: Uses the `google-generativeai` library to interact with the Gemini family of models (configurable, defaults to 1.5 Flash).
- Tool Calling:
- Gemini identifies when a specific capability (tool) defined on the backend server is needed.
- The agent logic calls the appropriate tool via MCP (e.g., `get_account_balance`, `list_transactions`, `create_invoice`, `get_document_content`, `process_uploaded_document`).
- Results from the tool are sent back to Gemini to inform the final response.
- MCP Backend Server: A separate Python process (`finance_agent_server_json.py`) hosts the tool implementations and manages data persistence (`finance_data.json`) and the document knowledge base (`uploaded_files/`).
- Knowledge Base / Basic RAG:
  - Uploaded files (originals and generated `.json` versions) are stored in the `uploaded_files/` directory.
  - The `get_document_content` tool allows Gemini to retrieve the full content of any specific file (original or JSON).
  - The `query_knowledge_base` tool performs basic keyword searches across all files in the knowledge base.
- Chat History Management: Maintains conversation history for context and provides a “Clear History” function.
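The automated JSON conversion described above can be sketched roughly as follows. This is an illustrative stand-in, not the project's actual code: the real server uses Pandas and also handles Excel sheets, while this sketch sticks to the standard library's `csv` module to stay dependency-free. The function name is hypothetical; only the `MAX_ROWS_FULL_CONVERSION` constant name comes from the README.

```python
import csv
import json

# Illustrative limit; the real server exposes this as MAX_ROWS_FULL_CONVERSION
# (alongside MAX_CHARS_FULL_CONVERSION for text files).
MAX_ROWS_FULL_CONVERSION = 1000

def csv_to_json_sketch(csv_path: str, json_path: str) -> bool:
    """Convert a CSV file into a list-of-records JSON file, skipping oversized inputs."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    if len(rows) > MAX_ROWS_FULL_CONVERSION:
        return False  # too large for a full conversion; only the original is kept
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(rows, f, indent=2)
    return True
```

The resulting `<original_filename>.json` sits next to the original in the knowledge base, so the model can request the structured version instead of re-parsing raw CSV text.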
Architecture
The application consists of three main components:
- FastAPI Web Server (`main.py`):
  - Serves the HTML frontend (using Jinja2 templates).
  - Handles HTTP requests for chat, file uploads, voice transcription, file listing, and history clearing.
  - Communicates with the `FinanceAgentClientLogic`.
  - Uses HTMX to update the UI dynamically.
  - Manages the Groq API client for transcription.
- Agent Client Logic (`finance_agent_logic.py`):
  - Acts as the orchestrator between the frontend, the AI model, and the backend server.
  - Manages the connection to the MCP server using the `mcp-fastmcp` client library.
  - Formats available MCP tools for Gemini.
  - Manages the conversation history with Gemini.
  - Handles the interaction flow: sending prompts, processing function calls, calling MCP tools, sending results back.
  - Specifically handles file uploads by first calling the `process_uploaded_document` tool on the server before prompting Gemini, informing the LLM about the newly available original and JSON files.
- MCP Backend Server (`finance_agent_server_json.py`):
  - Runs as a separate process, typically managed by the `FinanceAgentClientLogic` via stdio.
  - Implements the MCP server using `mcp-fastmcp`.
  - Defines the actual tools (Python functions decorated with `@mcp.tool()`) that perform specific actions (e.g., accessing `finance_data.json`, interacting with files in `uploaded_files/`, converting uploaded files to JSON).
  - Handles data persistence for basic financial records (`static/finance_data.json`).
  - Manages the `uploaded_files/` directory.
Interaction Flow:
- User Input (Text/Voice/File): User interacts via the web UI. Voice is transcribed via Groq.
- FastAPI: Receives the request. If a file is present, it’s read.
- Agent Logic:
- Ensures MCP connection is active.
- If a file was uploaded, calls the `process_uploaded_document` tool on the MCP server. The server saves the original and attempts to create/save a `.json` version.
- Constructs a prompt for Gemini, including the user command and information about any processed file (mentioning the original and the generated `.json` filename).
- Sends the prompt and available tools list to Gemini.
- Gemini: Processes the prompt. Either generates a text response or requests a function call (tool use).
- Agent Logic:
  - If Gemini requests a tool:
    - Calls the corresponding tool on the MCP server via the MCP protocol.
    - Receives the result from the MCP server.
    - Sends the tool result back to Gemini.
    - Receives the final text response from Gemini.
  - If Gemini provides a direct text response, uses that.
- FastAPI: Receives the final text response from the Agent Logic, converts it to HTML (using Markdown), and sends it back to the browser via HTMX.
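The "formats available MCP tools for Gemini" step in this flow amounts to translating each tool's name, description, and JSON-schema parameters into the function-declaration shape that `google-generativeai` accepts. The sketch below is dependency-free and illustrative; the MCP-side field names (`name`, `description`, `inputSchema`) follow the MCP tool-listing convention, and the output shape is an assumption about what the Gemini SDK expects, not the project's actual code.

```python
def mcp_tools_to_gemini_declarations(mcp_tools: list[dict]) -> list[dict]:
    """Translate MCP tool descriptors into Gemini-style function declarations.

    Assumes each MCP tool dict carries "name", "description", and a
    JSON-schema "inputSchema", as in an MCP tools/list response.
    """
    declarations = []
    for tool in mcp_tools:
        declarations.append({
            "name": tool["name"],
            "description": tool.get("description", ""),
            # Gemini function declarations take a JSON-schema "parameters" object.
            "parameters": tool.get("inputSchema", {"type": "object", "properties": {}}),
        })
    return declarations
```

The agent logic would pass these declarations to the model alongside the prompt so Gemini can choose when to request a tool call.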
Technology Stack
- Backend Framework: FastAPI
- Web Server: Uvicorn
- Templating: Jinja2
- Frontend Dynamics: HTMX (via CDN in `index.html`)
- AI Model: Google Gemini (via `google-generativeai`)
- Voice Transcription: Whisper model (via Groq API)
- Backend Communication: MCP (Model Context Protocol) via `mcp-fastmcp`
- Data Handling (Server): Pandas (for CSV/Excel processing)
Setup and Running
1. Prerequisites:
   - Python 3.8 or higher
   - Git

2. Clone the Repository:

   ```shell
   git clone <your-repository-url>
   cd <repository-directory>
   ```

3. Create a Virtual Environment:

   ```shell
   python -m venv venv
   # On Windows
   .\venv\Scripts\activate
   # On macOS/Linux
   source venv/bin/activate
   ```

4. Install Dependencies:

   ```shell
   pip install -r requirements.txt
   ```

   (Note: You need to create a `requirements.txt` file listing all dependencies — fastapi, uvicorn, jinja2, python-dotenv, google-generativeai, mcp-fastmcp, groq, pandas, markdown, requests, aiohttp, plus any others imported — for example by running `pip freeze > requirements.txt` after installing them manually.)

5. Set Up Environment Variables:
   - Create a file named `.env` in the project root directory.
   - Add your API keys to the `.env` file:

     ```
     GEMINI_API_KEY=YOUR_GOOGLE_AI_STUDIO_API_KEY
     GROQ_API_KEY=YOUR_GROQ_CLOUD_API_KEY
     ```

   - Get your Gemini API key from Google AI Studio.
   - Get your Groq API key from GroqCloud.

6. Run the Application:

   ```shell
   uvicorn main:app --reload --host 127.0.0.1 --port 5001
   ```

7. Access the UI:
   Open your web browser and navigate to `http://127.0.0.1:5001`.
How It Works: Detailed Flows
Voice Command
- User clicks the microphone button.
- Browser records audio (typically webm format).
- Audio data is sent to the `/transcribe` endpoint in `main.py`.
- `main.py` sends the audio data to the Groq API using the `groq` Python client.
- Groq transcribes the audio using Whisper-large-v3.
- The transcribed text is returned to the browser and populates the chat input.
- User clicks “Send” (or presses Enter).
- Follows the standard “Text Command” flow below.
Text Command / Sending Transcribed Text
- User types a command or sends transcribed text.
- The command is sent to the `/chat` endpoint in `main.py`.
- `main.py` calls `agent_logic_instance.process_single_command`.
- `finance_agent_logic` sends the command (and chat history) to Gemini.
sends the command (and chat history) to Gemini.- Gemini responds with text or a function call request.
- If it’s a function call (tool use):
  - `finance_agent_logic` parses the tool name and arguments.
  - It sends a `call_tool` request to the `finance_agent_server_json` process via MCP.
  - The server executes the tool function (e.g., `get_account_balance`).
  - The server returns the result via MCP.
  - `finance_agent_logic` sends the tool result back to Gemini.
  - Gemini generates the final text response based on the tool result.
- The final text response is sent back through `main.py` to the browser.
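The function-call branch above can be sketched as a small dispatch step. Here the MCP call and the Gemini reply are replaced with plain Python stand-ins so the control flow is visible; none of these names are the project's actual code, and the hard-coded balance is mock data.

```python
# Stand-in tool registry; in finAgent the tools live on the MCP server
# and are invoked with call_tool over the MCP protocol.
TOOLS = {
    "get_account_balance": lambda account_id: {"account_id": account_id, "balance": 1250.75},
}

def handle_model_turn(response: dict) -> dict:
    """Dispatch one model turn: pass text through, or execute a requested tool call.

    `response` mimics a parsed Gemini reply: either {"text": ...} or
    {"function_call": {"name": ..., "args": {...}}}.
    """
    if "function_call" in response:
        call = response["function_call"]
        result = TOOLS[call["name"]](**call["args"])  # MCP call_tool in the real agent
        # The real agent sends `result` back to Gemini, which writes the final answer.
        return {"tool_result": result}
    return {"text": response["text"]}
```

In the real flow this loop runs inside `finance_agent_logic`, and the tool result is fed back to Gemini for one more turn rather than returned directly.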
File Upload
- User selects a file and optionally types a command.
- The file and command are sent to the `/chat` endpoint in `main.py`.
- `main.py` reads the file content and passes it along with the command to `agent_logic_instance.process_single_command`.
- Crucially: `finance_agent_logic` first calls the `process_uploaded_document` tool on the MCP server, passing the filename and base64-encoded content.
- The `finance_agent_server_json` process:
  - Decodes the content.
  - Saves the original file to the `uploaded_files/` directory.
  - Detects the file type (CSV, Excel, JSON, TXT).
  - If convertible (CSV, Excel, TXT, valid JSON):
    - Reads/parses the content (using Pandas for CSV/Excel).
    - Applies size limits.
    - Creates a structured JSON representation.
    - Saves this structure as `<original_filename>.json` in `uploaded_files/`.
  - Returns a JSON response summarizing the outcome (success, original only, partial failure), including the original filename and the generated JSON filename (if successful).
- `finance_agent_logic` receives this summary response from the server.
- `finance_agent_logic` constructs the prompt for Gemini:
  - Includes the summary message from the file processing.
  - Explicitly tells Gemini that the original file (`filename`) and a structured JSON version (`filename.json`) are available in the knowledge base and can be accessed using the `get_document_content` tool.
  - Includes the user’s original text command (if any).
- Sends the augmented prompt to Gemini.
- The flow continues like a standard text command, but Gemini is now aware of the newly processed file and its JSON counterpart, potentially using `get_document_content` in a subsequent turn if needed.
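The decode-and-save step the server performs for `process_uploaded_document` reduces to something like the following sketch. The function name echoes the tool, but the body is illustrative, not the server's actual implementation.

```python
import base64
import os

def save_uploaded_document(upload_dir: str, filename: str, content_b64: str) -> str:
    """Decode a base64-encoded upload and write it into the knowledge-base directory."""
    os.makedirs(upload_dir, exist_ok=True)
    # basename() guards against path traversal in the client-supplied filename.
    safe_name = os.path.basename(filename)
    path = os.path.join(upload_dir, safe_name)
    with open(path, "wb") as f:
        f.write(base64.b64decode(content_b64))
    return path
```

After this step the server would detect the file type and, if convertible, write the companion `<original_filename>.json` next to the original.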
File Handling & Knowledge Base
- Storage: All uploaded files and their generated JSON counterparts are stored in the `uploaded_files/` directory relative to the server script’s location (`finance_agent_server_json.py`).
- JSON Conversion: The server attempts to create a complete JSON representation of CSV, Excel (all sheets), and TXT files. This allows Gemini to potentially analyze the structured data via the `get_document_content` tool applied to the `.json` file.
- Access: The `get_document_content(file_name)` tool can retrieve the content of any file in the `uploaded_files/` directory, whether it’s an original upload or a server-generated `.json` file.
- Querying: The `query_knowledge_base(query)` tool performs a simple, case-insensitive keyword search across the text content of all files in `uploaded_files/`. It returns snippets from files containing the query terms. This is a basic form of RAG.
Configuration
- API Keys: Configure `GEMINI_API_KEY` and `GROQ_API_KEY` in the `.env` file.
- Server Script: The path to the backend server script is defined in `main.py` (`SERVER_SCRIPT_PATH`).
- Upload Directory: The directory for uploads (`UPLOAD_DIR`) is derived relative to the server script path in both `main.py` and `finance_agent_server_json.py`. Ensure consistency if paths are changed.
- Data File: The path for the simple JSON database is configured in `finance_agent_server_json.py` (`DATA_FILE`).
- Conversion Limits: `MAX_ROWS_FULL_CONVERSION` and `MAX_CHARS_FULL_CONVERSION` in `finance_agent_server_json.py` control the size limits for generating JSON files from large uploads.
- Gemini Model: The Gemini model name can be changed in `finance_agent_logic.py` (`model_name='gemini-1.5-flash-latest'`).
Adding New Tools/Operations
- Define the Tool Function: In `finance_agent_server_json.py`, create a new `async` Python function that performs the desired action.
- Decorate the Function: Add the `@mcp.tool()` decorator above your function definition.
- Add Type Hinting & Docstring: Use Python type hints for arguments and the return type. Write a clear docstring explaining what the tool does, its arguments, and what it returns. Gemini uses this information to understand how and when to use the tool. The return value should ideally be a JSON string representing the result or status.
- Restart the Application: When the application restarts, the `FinanceAgentClientLogic` will automatically connect to the MCP server, list the available tools (including your new one), and make it available to Gemini during subsequent interactions.
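Putting the steps together, a new tool might look like the sketch below. Because `mcp-fastmcp` may not be installed where you read this, a no-op stand-in takes the place of the real `@mcp.tool()` decorator; in the actual server you would use the decorator provided by the MCP library. The tool name, its arguments, and the hard-coded rate are all hypothetical examples, not part of finAgent.

```python
import json

# Stand-in for the real @mcp.tool() decorator from the MCP server library.
# It simply returns the function unchanged so the sketch runs on its own.
def _tool():
    def wrap(fn):
        return fn
    return wrap

class _McpStandIn:
    tool = staticmethod(_tool)

mcp = _McpStandIn()  # in the real server: mcp = FastMCP("finance-agent") or similar

@mcp.tool()
async def get_exchange_rate(base: str, quote: str) -> str:
    """Return a mock exchange rate for the given currency pair.

    Args:
        base: ISO currency code to convert from (e.g. "USD").
        quote: ISO currency code to convert to (e.g. "EUR").

    Returns:
        A JSON string with the pair and a (hard-coded, illustrative) rate.
    """
    rates = {("USD", "EUR"): 0.92}  # hypothetical static data
    rate = rates.get((base.upper(), quote.upper()))
    return json.dumps({"pair": f"{base}/{quote}", "rate": rate})
```

Note that the docstring and type hints carry the full contract: they are what Gemini sees when deciding whether and how to call the tool, so they deserve as much care as the implementation.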
License
This project is licensed under the MIT License; see the LICENSE file for details. (If the repository does not yet include one, create a LICENSE file containing the MIT license text.)
Contributing
Contributions are welcome! Please feel free to submit issues or pull requests.