Unstructured API MCP Server

28 MIT


An MCP server implementation for interacting with the Unstructured API. This server provides tools to list sources and workflows.

What is Unstructured API MCP Server

The Unstructured MCP Server is a service designed for interacting with the Unstructured API, enabling users to manage and operate various data sources and workflows. It provides a set of tools to list, create, update, and delete source and destination connectors, as well as workflows that facilitate data processing tasks.

Use cases

The server can be utilized in various scenarios, including data ingestion from multiple sources like S3, Azure, and Google Drive, and exporting processed data to destinations like Pinecone and MongoDB. It’s particularly useful for automating workflows that integrate machine learning models with diverse data sources and destinations.

How to use

To set up the Unstructured MCP Server, users need Python 3.12+, an environment management tool such as uv, and an API key from Unstructured. The server can be run directly from the command line or through integration with applications like Claude Desktop. After configuration, users can call the server's tools to list sources, create workflows, and manage jobs.

Key features

The Unstructured MCP Server boasts features like listing various sources and destinations, creating and managing workflows, running jobs asynchronously, and integrating with Firecrawl for web crawling and text generation tasks. It supports multiple connectors, ensuring flexibility in data handling.

Where to use

This server can be deployed in environments where automated data processing is required, such as data engineering pipelines, machine learning model training setups, and web data extraction tasks. It is suitable for developers and data scientists aiming to streamline their workflows and enhance data accessibility.

Clients Supporting MCP

The following are the main client software that supports the Model Context Protocol. Click the link to visit the official website for more information.

Claude Desktop: Official desktop application from Anthropic, natively supports MCP protocol. claude.ai

Cherry Studio: Cross-platform desktop client supporting multiple LLM providers, built-in MCP server support. cherry-ai.com

LobeChat: Modern open-source ChatGPT/LLMs UI, supports MCP protocol integration. lobehub.com

DeepChat: Cross-platform desktop AI assistant, compatible with MCP protocol, focusing on privacy and efficiency. deepchat.thinkinai.xyz

5ire: Cross-platform open-source desktop intelligent assistant MCP client, supports local knowledge base and MCP server. 5ire.app

Available Tools

Tool	Description
`list_sources`	Lists available sources from the Unstructured API.
`get_source_info`	Get detailed information about a specific source connector.
`create_source_connector`	Create a source connector.
`update_source_connector`	Update an existing source connector by params.
`delete_source_connector`	Delete a source connector by source id.
`list_destinations`	Lists available destinations from the Unstructured API.
`get_destination_info`	Get detailed information about a specific destination connector.
`create_destination_connector`	Create a destination connector by params.
`update_destination_connector`	Update an existing destination connector by destination id.
`delete_destination_connector`	Delete a destination connector by destination id.
`list_workflows`	Lists workflows from the Unstructured API.
`get_workflow_info`	Get detailed information about a specific workflow.
`create_workflow`	Create a new workflow with source, destination id, etc.
`run_workflow`	Run a specific workflow by workflow id.
`update_workflow`	Update an existing workflow by params.
`delete_workflow`	Delete a specific workflow by id.
`list_jobs`	Lists jobs for a specific workflow from the Unstructured API.
`get_job_info`	Get detailed information about a specific job by job id.
`cancel_job`	Cancel a specific job by id.
`list_workflows_with_finished_jobs`	Lists all workflows that have any completed job, together with information about source and destination details.

Below is a list of the connectors the UNS-MCP server currently supports. For the full lists of source and destination connectors that the Unstructured platform supports, see the Unstructured documentation. We are planning on adding more!

Source	Destination
S3	S3
Azure	Weaviate
Google Drive	Pinecone
OneDrive	AstraDB
Salesforce	MongoDB
Sharepoint	Neo4j
	Databricks Volumes
	Databricks Volumes Delta Table

To use the tool that creates/updates/deletes a connector, the credentials for that specific connector must be defined in your .env file. Below is the list of credentials for the connectors we support:

Credential Name	Description
`ANTHROPIC_API_KEY`	required to run the `minimal_client` to interact with our server.
`AWS_KEY`, `AWS_SECRET`	required to create S3 connector via `uns-mcp` server, see how in documentation and here
`WEAVIATE_CLOUD_API_KEY`	required to create Weaviate vector db connector, see how in documentation
`FIRECRAWL_API_KEY`	required to use Firecrawl tools in `external/firecrawl.py`, sign up on Firecrawl and get an API key.
`ASTRA_DB_APPLICATION_TOKEN`, `ASTRA_DB_API_ENDPOINT`	required to create AstraDB connector via `uns-mcp` server, see how in documentation
`AZURE_CONNECTION_STRING`	required option 1 to create Azure connector via `uns-mcp` server, see how in documentation
`AZURE_ACCOUNT_NAME`+`AZURE_ACCOUNT_KEY`	required option 2 to create Azure connector via `uns-mcp` server, see how in documentation
`AZURE_ACCOUNT_NAME`+`AZURE_SAS_TOKEN`	required option 3 to create Azure connector via `uns-mcp` server, see how in documentation
`NEO4J_PASSWORD`	required to create Neo4j connector via `uns-mcp` server, see how in documentation
`MONGO_DB_CONNECTION_STRING`	required to create MongoDB connector via `uns-mcp` server, see how in documentation
`GOOGLEDRIVE_SERVICE_ACCOUNT_KEY`	a base64-encoded string value. The original service account key (see documentation) is a JSON file; run `base64 < /path/to/google_service_account_key.json` in a terminal to get the string value
`DATABRICKS_CLIENT_ID`,`DATABRICKS_CLIENT_SECRET`	required to create Databricks volume/delta table connector via `uns-mcp` server, see how in documentation and here
`ONEDRIVE_CLIENT_ID`, `ONEDRIVE_CLIENT_CRED`,`ONEDRIVE_TENANT_ID`	required to create OneDrive connector via `uns-mcp` server, see how in documentation
`PINECONE_API_KEY`	required to create Pinecone vector DB connector via `uns-mcp` server, see how in documentation
`SALESFORCE_CONSUMER_KEY`,`SALESFORCE_PRIVATE_KEY`	required to create Salesforce source connector via `uns-mcp` server, see how in documentation
`SHAREPOINT_CLIENT_ID`, `SHAREPOINT_CLIENT_CRED`,`SHAREPOINT_TENANT_ID`	required to create SharePoint connector via `uns-mcp` server, see how in documentation
`LOG_LEVEL`	Used to set logging level for our `minimal_client`, e.g. set to ERROR to suppress debug output
`CONFIRM_TOOL_USE`	set to true so that `minimal_client` can confirm execution before each tool call
`DEBUG_API_REQUESTS`	set to true so that `uns_mcp/server.py` can output request parameters for better debugging
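
Before asking the server to create a connector, it can help to check that the matching variables from the table above are actually set. This is a minimal sketch covering a few connectors. The connector labels (`"s3"`, `"azure"`, and so on) and both helper functions are hypothetical names chosen for illustration; only the environment variable names come from the table.

```python
import os

# Environment variables per connector, taken from the credentials table.
# Each inner tuple is one acceptable combination (Azure has three options).
REQUIRED_CREDENTIALS = {
    "s3": [("AWS_KEY", "AWS_SECRET")],
    "azure": [
        ("AZURE_CONNECTION_STRING",),
        ("AZURE_ACCOUNT_NAME", "AZURE_ACCOUNT_KEY"),
        ("AZURE_ACCOUNT_NAME", "AZURE_SAS_TOKEN"),
    ],
    "neo4j": [("NEO4J_PASSWORD",)],
    "pinecone": [("PINECONE_API_KEY",)],
}


def has_credentials(connector: str, env=None) -> bool:
    """Return True if any acceptable combination of variables is fully set."""
    env = os.environ if env is None else env
    options = REQUIRED_CREDENTIALS[connector]
    return any(all(env.get(name) for name in option) for option in options)


def missing_connectors(env=None) -> list[str]:
    """List the connectors whose credentials are not fully configured."""
    return [name for name in REQUIRED_CREDENTIALS if not has_credentials(name, env)]


print("Connectors missing credentials:", missing_connectors())
```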

Firecrawl Source

Firecrawl is a web crawling API that provides two main capabilities in our MCP:

HTML Content Retrieval: Using invoke_firecrawl_crawlhtml to start crawl jobs and check_crawlhtml_status to monitor them
LLM-Optimized Text Generation: Using invoke_firecrawl_llmtxt to generate text and check_llmtxt_status to retrieve results

How Firecrawl works:

Web Crawling Process:

Starts with a specified URL and analyzes it to identify links
Uses the sitemap if available; otherwise follows links found on the website
Recursively traverses each link to discover all subpages
Gathers content from every visited page, handling JavaScript rendering and rate limits
Jobs can be cancelled with cancel_crawlhtml_job if needed
Use this if you need all the content extracted as raw HTML; Unstructured’s workflow cleans it up really well.

LLM Text Generation:

After crawling, extracts clean, meaningful text content from the crawled pages
Generates optimized text formats specifically formatted for large language models
Results are automatically uploaded to the specified S3 location
Note: LLM text generation jobs cannot be cancelled once started. The cancel_llmtxt_job function is provided for consistency but is not currently supported by the Firecrawl API.

Note: A FIRECRAWL_API_KEY environment variable must be set to use these functions.
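
Because crawl and text-generation jobs run asynchronously, a caller typically starts a job and then polls its status. The loop below is a generic sketch of that pattern. `wait_for_job` is a hypothetical helper, and the status strings (`"scraping"`, `"completed"`) and the `{"status": ...}` shape are assumptions for the demo rather than documented return values of `check_crawlhtml_status` or `check_llmtxt_status`.

```python
import time
from typing import Callable


def wait_for_job(
    check_status: Callable[[], dict],
    is_finished: Callable[[dict], bool],
    poll_seconds: float = 5.0,
    max_polls: int = 60,
) -> dict:
    """Call check_status() until is_finished(result) is true or polls run out."""
    for attempt in range(max_polls):
        result = check_status()
        if is_finished(result):
            return result
        if attempt < max_polls - 1:
            time.sleep(poll_seconds)
    raise TimeoutError(f"job not finished after {max_polls} polls")


# Stand-in for a real status check: reports "scraping" twice, then "completed".
_fake_statuses = iter(["scraping", "scraping", "completed"])


def fake_check() -> dict:
    return {"status": next(_fake_statuses)}


final = wait_for_job(fake_check, lambda r: r["status"] == "completed", poll_seconds=0)
print(final)
```

Capping the number of polls keeps a stuck job from blocking the caller indefinitely, which matters here because LLM text generation jobs cannot be cancelled.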

Installation & Configuration

This guide provides step-by-step instructions to set up and configure the UNS_MCP server using Python 3.12 and the uv tool.

Prerequisites

Python 3.12+
uv for environment management
An API key from Unstructured. You can sign up and obtain your API key here.

Using uv (Recommended)

No additional installation is required when using uvx as it handles execution. However, if you prefer to install the package directly:

uv pip install uns_mcp

Configure Claude Desktop

For integration with Claude Desktop, add the following content to your claude_desktop_config.json:

Note: On macOS, the file is located in the ~/Library/Application Support/Claude/ directory.

Using uvx Command:

{
  "mcpServers": {
    "UNS_MCP": {
      "command": "uvx",
      "args": [
        "uns_mcp"
      ],
      "env": {
        "UNSTRUCTURED_API_KEY": "<your-key>"
      }
    }
  }
}

Alternatively, Using Python Package:

{
  "mcpServers": {
    "UNS_MCP": {
      "command": "python",
      "args": [
        "-m",
        "uns_mcp"
      ],
      "env": {
        "UNSTRUCTURED_API_KEY": "<your-key>"
      }
    }
  }
}
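
If the config file already lists other MCP servers, the entry has to be merged in rather than pasted over the file. The following sketch does that with the standard library, using the `uvx` variant shown above. `add_uns_mcp_server` is a hypothetical helper, and the demo writes to a temporary file instead of the real Claude Desktop config.

```python
import json
import tempfile
from pathlib import Path


def add_uns_mcp_server(config_path: Path, api_key: str) -> dict:
    """Merge a UNS_MCP entry into the config file, keeping other servers."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["UNS_MCP"] = {
        "command": "uvx",
        "args": ["uns_mcp"],
        "env": {"UNSTRUCTURED_API_KEY": api_key},
    }
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2))
    return config


# Demo against a throwaway file that already contains another server entry.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "claude_desktop_config.json"
    demo_path.write_text(json.dumps({"mcpServers": {"other": {"command": "x"}}}))
    merged = add_uns_mcp_server(demo_path, "<your-key>")
    print(sorted(merged["mcpServers"]))
```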

Using Source Code

Clone the repository.
Install dependencies:
```
uv sync
```
Set your Unstructured API key as an environment variable. Create a .env file in the root directory with the following content:
```
UNSTRUCTURED_API_KEY="YOUR_KEY"
```
Refer to .env.template for the configurable environment variables.
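
As a quick pre-flight check, the `.env` file can be read and inspected before starting the server. The parser below is a deliberately simplified sketch that handles plain `KEY=VALUE` lines only. It is not how the server itself loads the file, and `parse_env` is a hypothetical helper.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding single or double quotes from the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


sample = 'UNSTRUCTURED_API_KEY="YOUR_KEY"\n# logging\nLOG_LEVEL=ERROR\n'
env = parse_env(sample)
if "UNSTRUCTURED_API_KEY" not in env:
    raise SystemExit("UNSTRUCTURED_API_KEY is missing from .env")
print(sorted(env))
```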

You can now run the server using one of the following methods:

Using Editable Package Installation

Install as an editable package:

uv pip install -e .

Update your Claude Desktop config:

{
  "mcpServers": {
    "UNS_MCP": {
      "command": "uvx",
      "args": [
        "uns_mcp"
      ]
    }
  }
}

Note: Remember to point to the uvx executable in the environment where you installed the package.

Using SSE Server Protocol

Note: Not supported by Claude Desktop.

For SSE protocol, you can debug more easily by decoupling the client and server:

Start the server in one terminal:

uv run python uns_mcp/server.py --host 127.0.0.1 --port 8080
# or
make sse-server

Test the server using a local client in another terminal:

uv run python minimal_client/client.py "http://127.0.0.1:8080/sse"
# or
make sse-client

Note: To stop the services, use Ctrl+C on the client first, then the server.

Using Stdio Server Protocol

Configure Claude Desktop to use stdio:

{
  "mcpServers": {
    "UNS_MCP": {
      "command": "ABSOLUTE/PATH/TO/.local/bin/uv",
      "args": [
        "--directory",
        "ABSOLUTE/PATH/TO/YOUR-UNS-MCP-REPO/uns_mcp",
        "run",
        "server.py"
      ]
    }
  }
}

Alternatively, run the local client:

uv run python minimal_client/client.py uns_mcp/server.py

Additional Local Client Configuration

Configure the minimal client using environment variables:

LOG_LEVEL="ERROR": Set to suppress debug outputs from the LLM, displaying clear messages for users.
CONFIRM_TOOL_USE='false': Disable tool use confirmation before execution. Use with caution, especially during development, as the LLM may execute expensive workflows or delete data.
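
The idea behind `CONFIRM_TOOL_USE` can be sketched as a small gate that asks before each tool call unless confirmation has been switched off. This is an illustration, not the `minimal_client` implementation: both function names are hypothetical, and defaulting to confirmation when the variable is unset is an assumption.

```python
import os


def confirmation_enabled(env=None) -> bool:
    """Treat anything other than an explicit 'false' as requiring confirmation."""
    env = os.environ if env is None else env
    return env.get("CONFIRM_TOOL_USE", "true").strip().lower() != "false"


def should_run(tool_name: str, arguments: dict, env=None, ask=input) -> bool:
    """Ask the user before a tool call unless confirmation is disabled."""
    if not confirmation_enabled(env):
        return True
    answer = ask(f"Run {tool_name} with {arguments}? [y/N] ")
    return answer.strip().lower() in {"y", "yes"}


# With confirmation disabled, the call proceeds without prompting.
print(should_run("delete_workflow", {"workflow_id": "abc"}, env={"CONFIRM_TOOL_USE": "false"}))
```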

Debugging tools

Anthropic provides the MCP Inspector tool to debug and test your MCP server. Run the following command to spin up a debugging UI. From there, you can add environment variables (pointing to your local env) in the left pane; include your personal API key there as an environment variable. Under Tools, you can test the capabilities you add to the MCP server.

mcp dev uns_mcp/server.py

If you need to log request call parameters to UnstructuredClient, set the environment variable DEBUG_API_REQUESTS=true.
The logs are stored in a file with the format unstructured-client-{date}.log, which can be examined to debug request call parameters to UnstructuredClient functions.

Add terminal access to minimal client

We are going to use @wonderwhy-er/desktop-commander to add terminal access to the minimal client. It is built on the MCP Filesystem Server. Be careful, as the client (and therefore the LLM) now has access to private files.

Execute the following command to install the package:

npx @wonderwhy-er/desktop-commander setup

Then start the client with an extra parameter:

uv run python minimal_client/client.py "http://127.0.0.1:8080/sse" "@wonderwhy-er/desktop-commander"
# or
make sse-client-terminal

Using subset of tools

If your client supports loading only a subset of tools, be aware of the following:

The update_workflow tool has to be loaded in the context together with the create_workflow tool, because the latter contains the detailed description of how to create and configure custom nodes.
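
The create_workflow description lists the node orders a custom workflow may use, so a client could check a `workflow_nodes` list before calling either tool. The sketch below encodes only the orders listed there (the Source and Destination nodes are added automatically and are left out). `ALLOWED_ORDERS` and `is_allowed_order` are hypothetical names, and the check does not validate node settings.

```python
# Node `type` sequences permitted between the Source and Destination nodes.
ALLOWED_ORDERS = {
    ("partition",),
    ("partition", "chunk"),
    ("partition", "chunk", "embed"),
    ("partition", "prompter", "chunk"),
    ("partition", "prompter", "chunk", "embed"),
}


def is_allowed_order(workflow_nodes: list[dict]) -> bool:
    """Return True if the nodes' types appear in one of the listed orders."""
    return tuple(node["type"] for node in workflow_nodes) in ALLOWED_ORDERS


nodes = [
    {"name": "Partitioner", "type": "partition", "subtype": "vlm", "settings": {}},
    {"name": "Chunker", "type": "chunk", "subtype": "chunk_by_title", "settings": {}},
]
print(is_allowed_order(nodes))
```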

Known issues

update_workflow needs the configuration of the workflow it is updating in its context, either provided by the user or fetched by calling the get_workflow_info tool. The tool does not apply patches; it fully replaces the workflow config.
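
Since the update replaces the whole config, a safe pattern is read, modify, then write: start from the current configuration, apply the change, and send the complete result. A minimal sketch follows, assuming the current config is available as a dictionary. `build_full_update` is a hypothetical helper, and the `"daily"` and `"weekly"` schedule values are placeholders.

```python
import copy


def build_full_update(current_config: dict, changes: dict) -> dict:
    """Return a complete workflow config: the current one with changes applied."""
    updated = copy.deepcopy(current_config)
    updated.update(changes)  # top-level keys only; nested values are replaced whole
    return updated


# `current` stands in for what get_workflow_info would report.
current = {
    "name": "nightly-ingest",
    "workflow_type": "custom",
    "schedule": "daily",
    "workflow_nodes": [{"name": "Partitioner", "type": "partition"}],
}
new_config = build_full_update(current, {"schedule": "weekly"})
print(new_config["schedule"], len(new_config["workflow_nodes"]))
```

Sending only `{"schedule": "weekly"}` instead would drop the name, type, and nodes, which is the failure mode this known issue describes.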

CHANGELOG.md

Any newly developed features, fixes, or enhancements will be added to CHANGELOG.md. The 0.x.x-dev pre-release format is preferred before we bump to a stable version.

Troubleshooting

If you encounter Error: spawn <command> ENOENT, it means <command> is not installed or not visible in your PATH:
- Make sure to install it and add it to your PATH.
- Or provide the absolute path to the command in the command field of your config. For example, replace python with /opt/miniconda3/bin/python

Dev Tools Supporting MCP

The following are the main code editors that support the Model Context Protocol. Click the link to visit the official website for more information.

Zed: High-performance collaborative code editor, supports MCP protocol, providing a smooth programming experience. zed.dev

Cursor: AI code editor built on VS Code, supports MCP protocol for context-aware programming. cursor.com

Windsurf: AI code editor from Codeium, integrates MCP protocol to provide intelligent code assistance. windsurf.com

Continue: Open-source AI programming assistant plugin, supports VS Code and JetBrains, compatible with MCP protocol. continue.dev

Trae: AI-driven code editor, supports MCP protocol, focusing on enhancing developer programming experience. trae.ai


Tools

create_s3_source

Create an S3 source connector.
Args:
- name: A unique name for this connector
- remote_url: The S3 URI to the bucket or folder (e.g., s3://my-bucket/)
- recursive: Whether to access subfolders within the bucket
Returns: String containing the created source connector information

update_s3_source

Update an S3 source connector.
Args:
- source_id: ID of the source connector to update
- remote_url: The S3 URI to the bucket or folder
- recursive: Whether to access subfolders within the bucket
Returns: String containing the updated source connector information

delete_s3_source

Delete an S3 source connector.
Args:
- source_id: ID of the source connector to delete
Returns: String containing the result of the deletion

create_azure_source

Create an Azure source connector.
Args:
- name: A unique name for this connector
- remote_url: The Azure Storage remote URL, with the format az://<container-name>/<path/to/file/or/folder/in/container/as/needed>
- recursive: Whether to access subfolders within the container
Returns: String containing the created source connector information

update_azure_source

Update an Azure source connector.
Args:
- source_id: ID of the source connector to update
- remote_url: The Azure Storage remote URL, with the format az://<container-name>/<path/to/file/or/folder/in/container/as/needed>
- recursive: Whether to access subfolders within the container
Returns: String containing the updated source connector information

delete_azure_source

Delete an Azure source connector.
Args:
- source_id: ID of the source connector to delete
Returns: String containing the result of the deletion

create_gdrive_source

Create a Google Drive source connector.
Args:
- name: A unique name for this connector
- remote_url: The gdrive URI to the bucket or folder (e.g., gdrive://my-bucket/)
- recursive: Whether to access subfolders within the bucket
Returns: String containing the created source connector information

update_gdrive_source

Update a Google Drive source connector.
Args:
- source_id: ID of the source connector to update
- remote_url: The gdrive URI to the bucket or folder
- recursive: Whether to access subfolders within the bucket
Returns: String containing the updated source connector information

delete_gdrive_source

Delete a Google Drive source connector.
Args:
- source_id: ID of the source connector to delete
Returns: String containing the result of the deletion

create_onedrive_source

Create a OneDrive source connector.
Args:
- name: A unique name for this connector
- path: The path to the target folder in the OneDrive account, starting with the account’s root folder
- user_pname: The User Principal Name (UPN) for the OneDrive user account in Entra ID. This is typically the user’s email address.
- recursive: Whether to access subfolders
- authority_url: The authentication token provider URL for the Entra ID app registration. The default is https://login.microsoftonline.com.
Returns: String containing the created source connector information

update_onedrive_source

Update a OneDrive source connector.
Args:
- source_id: ID of the source connector to update
- path: The path to the target folder in the OneDrive account, starting with the account’s root folder
- user_pname: The User Principal Name (UPN) for the OneDrive user account in Entra ID. This is typically the user’s email address.
- recursive: Whether to access subfolders
- authority_url: The authentication token provider URL for the Entra ID app registration. The default is https://login.microsoftonline.com.
- tenant: The directory (tenant) ID of the Entra ID app registration.
- client_id: The application (client) ID of the Microsoft Entra ID app registration that has access to the OneDrive account.
Returns: String containing the updated source connector information

delete_onedrive_source

Delete a OneDrive source connector.
Args:
- source_id: ID of the source connector to delete
Returns: String containing the result of the deletion

create_s3_destination

Create an S3 destination connector.
Args:
- name: A unique name for this connector
- remote_url: The S3 URI to the bucket or folder
- key: The AWS access key ID
- secret: The AWS secret access key
- token: The AWS STS session token for temporary access (optional)
- endpoint_url: Custom URL if connecting to a non-AWS S3 bucket
Returns: String containing the created destination connector information

update_s3_destination

Update an S3 destination connector.
Args:
- destination_id: ID of the destination connector to update
- remote_url: The S3 URI to the bucket or folder
Returns: String containing the updated destination connector information

delete_s3_destination

Delete an S3 destination connector.
Args:
- destination_id: ID of the destination connector to delete
Returns: String containing the result of the deletion

create_weaviate_destination

Create a Weaviate vector database destination connector.
Args:
- cluster_url: URL of the Weaviate cluster
- collection: Name of the collection to use in the Weaviate cluster
Note: A collection is like a table in the Weaviate cluster. The Unstructured platform has dedicated code that generates a collection for users; to keep this server simple, it does not generate one for you.
Returns: String containing the created destination connector information

update_weaviate_destination

Update a Weaviate destination connector.
Args:
- destination_id: ID of the destination connector to update
- cluster_url (optional): URL of the Weaviate cluster
- collection (optional): Name of the collection (like a table) to use in the Weaviate cluster
Returns: String containing the updated destination connector information

delete_weaviate_destination

Delete a Weaviate destination connector.
Args:
- destination_id: ID of the destination connector to delete
Returns: String containing the result of the deletion

create_astradb_destination

Create an AstraDB destination connector.
Args:
- name: A unique name for this connector
- collection_name: The name of the collection to use
- keyspace: The AstraDB keyspace
- batch_size: The batch size for inserting documents, must be positive (default: 20)
Note: A collection in AstraDB is a schemaless document store optimized for NoSQL workloads, equivalent to a table in traditional databases. A keyspace is the top-level namespace in AstraDB that groups multiple collections. Users must create their own collection and keyspace before creating the connector.
Returns: String containing the created destination connector information

update_astradb_destination

Update an AstraDB destination connector.
Args:
- destination_id: ID of the destination connector to update
- collection_name: The name of the collection to use (optional)
- keyspace: The AstraDB keyspace (optional)
- batch_size: The batch size for inserting documents (optional)
Note: Users must create their own collection and keyspace before creating the connector.
Returns: String containing the updated destination connector information

delete_astradb_destination

Delete an AstraDB destination connector.
Args:
- destination_id: ID of the destination connector to delete
Returns: String containing the result of the deletion

create_neo4j_destination

Create a Neo4j destination connector.
Args:
- name: A unique name for this connector
- database: The Neo4j database, e.g. "neo4j"
- uri: The Neo4j URI, e.g. neo4j+s://<neo4j_instance_id>.databases.neo4j.io
- username: The Neo4j username
Returns: String containing the created destination connector information

update_neo4j_destination

Update a Neo4j destination connector.
Args:
- destination_id: ID of the destination connector to update
- database: The Neo4j database, e.g. "neo4j"
- uri: The Neo4j URI, e.g. neo4j+s://<neo4j_instance_id>.databases.neo4j.io
- username: The Neo4j username
Returns: String containing the updated destination connector information

delete_neo4j_destination

Delete a Neo4j destination connector.
Args:
- destination_id: ID of the destination connector to delete
Returns: String containing the result of the deletion

create_mongodb_destination

Create a MongoDB destination connector.
Args:
- name: A unique name for this connector
- database: The name of the database to connect to.
- collection: The name of the target MongoDB collection
Returns: String containing the created destination connector information

update_mongodb_destination

Update a MongoDB destination connector.
Args:
- destination_id: ID of the destination connector to update
- database: The name of the database to connect to.
- collection: The name of the target MongoDB collection
Returns: String containing the updated destination connector information

delete_mongodb_destination

Delete a MongoDB destination connector.
Args:
- destination_id: ID of the destination connector to delete
Returns: String containing the result of the deletion

create_databricks_volumes_destination

Create a Databricks Volumes destination connector.
Args:
- name: A unique name for this connector
- catalog: Name of the catalog in the Databricks Unity Catalog service for the workspace.
- host: The Databricks host URL for the Databricks workspace.
- volume: Name of the volume associated with the schema.
- schema: Name of the schema associated with the volume. The default value is "default".
- volume_path: Any target folder path within the volume, starting from the root of the volume.
Returns: String containing the created destination connector information

update_databricks_volumes_destination

Update a Databricks Volumes destination connector.
Args:
- destination_id: ID of the destination connector to update
- catalog: Name of the catalog to update in the Databricks Unity Catalog service for the workspace.
- host: The Databricks host URL for the Databricks workspace to update.
- volume: Name of the volume associated with the schema to update.
- schema: Name of the schema associated with the volume to update. The default value is "default".
- volume_path: Any target folder path within the volume to update, starting from the root of the volume.
Returns: String containing the updated destination connector information

delete_databricks_volumes_destination

Delete a Databricks Volumes destination connector.
Args:
- destination_id: ID of the destination connector to delete
Returns: String containing the result of the deletion

create_databricks_delta_table_destination

Create a Databricks Delta Table destination connector.
Args:
- name: A unique name for this connector
- catalog: Name of the catalog in the Databricks Unity Catalog service for the workspace.
- database: The name of the schema (formerly known as a database) in Unity Catalog for the target table
- http_path: The cluster’s or SQL warehouse’s HTTP Path value
- server_hostname: The Databricks cluster’s or SQL warehouse’s Server Hostname value
- table_name: The name of the table in the schema
- volume: Name of the volume associated with the schema.
- schema: Name of the schema associated with the volume. The default value is "default".
- volume_path: Any target folder path within the volume, starting from the root of the volume.
Returns: String containing the created destination connector information

update_databricks_delta_table_destination

Update a Databricks Delta Table destination connector.
Args:
- destination_id: ID of the destination connector to update
- database: The name of the schema (formerly known as a database) in Unity Catalog for the target table
- http_path: The cluster’s or SQL warehouse’s HTTP Path value
- server_hostname: The Databricks cluster’s or SQL warehouse’s Server Hostname value
- volume_path: Any target folder path within the volume to update, starting from the root of the volume.
Returns: String containing the updated destination connector information

delete_databricks_delta_table_destination

Delete a Databricks Delta Table destination connector.
Args:
- destination_id: ID of the destination connector to delete
Returns: String containing the result of the deletion

invoke_firecrawl_crawlhtml

Start an asynchronous web crawl job using Firecrawl to retrieve HTML content.
Args:
- url: URL to crawl
- s3_uri: S3 URI where results will be uploaded
- limit: Maximum number of pages to crawl (default: 100)
Returns: Dictionary with crawl job information including the job ID

check_crawlhtml_status

Check the status of an existing Firecrawl HTML crawl job.
Args:
- crawl_id: ID of the crawl job to check
Returns: Dictionary containing the current status of the crawl job

invoke_firecrawl_llmtxt

Start an asynchronous llmfull.txt generation job using Firecrawl. This file is a standardized markdown file containing information to help LLMs use a website at inference time. The llmstxt endpoint leverages Firecrawl to crawl your website and extracts data using gpt-4o-mini.
Args:
- url: URL to crawl
- s3_uri: S3 URI where results will be uploaded
- max_urls: Maximum number of pages to crawl (1-100, default: 10)
Returns: Dictionary with job information including the job ID

check_llmtxt_status

Check the status of an existing llmfull.txt generation job.
Args:
- job_id: ID of the llmfull.txt generation job to check
Returns: Dictionary containing the current status of the job and text content if completed

cancel_crawlhtml_job

Cancel an in-progress Firecrawl HTML crawl job.
Args:
- crawl_id: ID of the crawl job to cancel
Returns: Dictionary containing the result of the cancellation

list_sources

List available sources from the Unstructured API.
Args:
- source_type: Optional source connector type to filter by
Returns: String containing the list of sources

get_source_info

Get detailed information about a specific source connector.
Args:
- source_id: ID of the source connector to get information for, should be a valid UUID
Returns: String containing the source connector information

list_destinations

List available destinations from the Unstructured API.
Args:
- destination_type: Optional destination connector type to filter by
Returns: String containing the list of destinations

get_destination_info

Get detailed information about a specific destination connector.
Args:
- destination_id: ID of the destination connector to get information for
Returns: String containing the destination connector information

list_workflows

List workflows from the Unstructured API.
Args:
- destination_id: Optional destination connector ID to filter by
- source_id: Optional source connector ID to filter by
- status: Optional workflow status to filter by
Returns: String containing the list of workflows

get_workflow_info

Get detailed information about a specific workflow.
Args:
- workflow_id: ID of the workflow to get information for
Returns: String containing the workflow information

create_workflow

Create a new workflow.

Args:
    workflow_config: A typed dictionary containing required fields (destination_id, which should be a valid UUID; name; source_id, which should be a valid UUID; and workflow_type) and optional fields (schedule and workflow_nodes). Note that workflow_nodes is only enabled when workflow_type is `custom`, and is a list of WorkflowNodeTypedDict entries: partition, prompter, chunk, embed. Below is an example of a partition workflow node:

    {
        "name": "vlm-partition",
        "type": "partition",
        "subtype": "vlm",
        "settings": {
            "provider": "your favorite provider",
            "model": "your favorite model"
        }
    }

Returns:
    String containing the created workflow information

Custom workflow DAG nodes

- If workflow_type is set to custom, you must also specify the settings for the workflow’s directed acyclic graph (DAG) nodes. These nodes’ settings are specified in the workflow_nodes array.
- A Source node is automatically created when you specify the source_id value outside of the workflow_nodes array.
- A Destination node is automatically created when you specify the destination_id value outside of the workflow_nodes array.
- You can specify Partitioner, Chunker, Prompter, and Embedder nodes.
- The order of the nodes in the workflow_nodes array will be the same order that these nodes appear in the DAG, with the first node in the array added directly after the Source node. The Destination node follows the last node in the array.
- Be sure to specify nodes in the allowed order. The following DAG placements are all allowed:
    - Source -> Partitioner -> Destination
    - Source -> Partitioner -> Chunker -> Destination
    - Source -> Partitioner -> Chunker -> Embedder -> Destination
    - Source -> Partitioner -> Prompter -> Chunker -> Destination
    - Source -> Partitioner -> Prompter -> Chunker -> Embedder -> Destination

Partitioner node

A Partitioner node has a type of partition and a subtype of auto, vlm, hi_res, or fast.

Examples:

- auto strategy:

    {
        "name": "Partitioner",
        "type": "partition",
        "subtype": "vlm",
        "settings": {
            "provider": "anthropic", (required)
            "model": "claude-3-5-sonnet-20241022", (required)
            "output_format": "text/html",
            "user_prompt": null,
            "format_html": true,
            "unique_element_ids": true,
            "is_dynamic": true,
            "allow_fast": true
        }
    }

- vlm strategy: Allowed values are provider and model. Below are examples:
    - "provider": "anthropic", "model": "claude-3-5-sonnet-20241022"
    - "provider": "openai", "model": "gpt-4o"

- hi_res strategy:

    {
        "name": "Partitioner",
        "type": "partition",
        "subtype": "unstructured_api",
        "settings": {
            "strategy": "hi_res",
            "include_page_breaks": <true|false>,
            "pdf_infer_table_structure": <true|false>,
            "exclude_elements": [ "<element-name>", "<element-name>" ],
            "xml_keep_tags": <true|false>,
            "encoding": "<encoding>",
            "ocr_languages": [ "<language>", "<language>" ],
            "extract_image_block_types": [ "image", "table" ],
            "infer_table_structure": <true|false>
        }
    }

- fast strategy:

    {
        "name": "Partitioner",
        "type": "partition",
        "subtype": "unstructured_api",
        "settings": {
            "strategy": "fast",
            "include_page_breaks": <true|false>,
            "pdf_infer_table_structure": <true|false>,
            "exclude_elements": [ "<element-name>", "<element-name>" ],
            "xml_keep_tags": <true|false>,
            "encoding": "<encoding>",
            "ocr_languages": [ "<language-code>", "<language-code>" ],
            "extract_image_block_types": [ "image", "table" ],
            "infer_table_structure": <true|false>
        }
    }

Chunker node

A Chunker node has a type of chunk and a subtype of chunk_by_character or chunk_by_title.

- chunk_by_character:

    {
        "name": "Chunker",
        "type": "chunk",
        "subtype": "chunk_by_character",
        "settings": {
            "include_orig_elements": <true|false>,
            "new_after_n_chars": <new-after-n-chars>, (required, if not provided set same as max_characters)
            "max_characters": <max-characters>, (required)
            "overlap": <overlap>, (required, if not provided set default to 0)
            "overlap_all": <true|false>,
            "contextual_chunking_strategy": "v1"
        }
    }

- chunk_by_title:

    {
        "name": "Chunker",
        "type": "chunk",
        "subtype": "chunk_by_title",
        "settings": {
            "multipage_sections": <true|false>,
            "combine_text_under_n_chars": <combine-text-under-n-chars>,
            "include_orig_elements": <true|false>,
            "new_after_n_chars": <new-after-n-chars>, (required, if not provided set same as max_characters)
            "max_characters": <max-characters>, (required)
            "overlap": <overlap>, (required, if not provided set default to 0)
            "overlap_all": <true|false>,
            "contextual_chunking_strategy": "v1"
        }
    }

Prompter node

A Prompter node has a type of prompter and a subtype of one of the following:

- openai_image_description
- anthropic_image_description
- bedrock_image_description
- vertexai_image_description
- openai_table_description
- anthropic_table_description
- bedrock_table_description
- vertexai_table_description
- openai_table2html
- openai_ner

Example:

    {
        "name": "Prompter",
        "type": "prompter",
        "subtype": "<subtype>",
        "settings": {}
    }

Embedder node

An Embedder node has a type of embed. Allowed values for subtype and model_name include:

- "subtype": "azure_openai"
    - "model_name": "text-embedding-3-small"
    - "model_name": "text-embedding-3-large"
    - "model_name": "text-embedding-ada-002"
- "subtype": "bedrock"
    - "model_name": "amazon.titan-embed-text-v2:0"
    - "model_name": "amazon.titan-embed-text-v1"
    - "model_name": "amazon.titan-embed-image-v1"
    - "model_name": "cohere.embed-english-v3"
    - "model_name": "cohere.embed-multilingual-v3"
- "subtype": "togetherai"
    - "model_name": "togethercomputer/m2-bert-80M-2k-retrieval"
    - "model_name": "togethercomputer/m2-bert-80M-8k-retrieval"
    - "model_name": "togethercomputer/m2-bert-80M-32k-retrieval"

Example:

    {
        "name": "Embedder",
        "type": "embed",
        "subtype": "<subtype>",
        "settings": {
            "model_name": "<model-name>"
        }
    }

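The create_workflow rules above (required fields, UUID-shaped IDs, and the allowed node order) can be checked locally before calling the tool. The following is a minimal sketch, not part of the server: `validate_workflow_config`, `ALLOWED_NODE_ORDERS`, and the example values (name, IDs, `max_characters`) are hypothetical, and it assumes the five listed DAG placements are the only valid ones, which the description does not state outright.

```python
from uuid import UUID

# Node "type" sequences between Source and Destination, copied from the
# allowed DAG placements listed for create_workflow.
# Assumption: placements not listed there are rejected.
ALLOWED_NODE_ORDERS = {
    ("partition",),
    ("partition", "chunk"),
    ("partition", "chunk", "embed"),
    ("partition", "prompter", "chunk"),
    ("partition", "prompter", "chunk", "embed"),
}

REQUIRED_FIELDS = ("name", "source_id", "destination_id", "workflow_type")


def validate_workflow_config(config: dict) -> None:
    """Raise ValueError if the config breaks a rule from the tool description."""
    missing = [field for field in REQUIRED_FIELDS if field not in config]
    if missing:
        raise ValueError(f"missing required fields: {missing}")

    # source_id and destination_id should be valid UUIDs.
    for field in ("source_id", "destination_id"):
        UUID(str(config[field]))  # raises ValueError on a malformed value

    nodes = config.get("workflow_nodes") or []
    if config["workflow_type"] != "custom":
        # workflow_nodes is only enabled for custom workflows.
        if nodes:
            raise ValueError("workflow_nodes requires workflow_type 'custom'")
        return

    order = tuple(node["type"] for node in nodes)
    if order not in ALLOWED_NODE_ORDERS:
        raise ValueError(f"node order {order} is not an allowed DAG placement")


# Hypothetical config: the IDs and name are placeholders, not real resources.
example_config = {
    "name": "example-custom-workflow",
    "source_id": "11111111-1111-1111-1111-111111111111",
    "destination_id": "22222222-2222-2222-2222-222222222222",
    "workflow_type": "custom",
    "workflow_nodes": [
        {
            "name": "Partitioner",
            "type": "partition",
            "subtype": "vlm",
            "settings": {
                "provider": "anthropic",
                "model": "claude-3-5-sonnet-20241022",
            },
        },
        {
            "name": "Chunker",
            "type": "chunk",
            "subtype": "chunk_by_title",
            "settings": {
                "max_characters": 1000,
                "new_after_n_chars": 1000,
                "overlap": 0,
            },
        },
        {
            "name": "Embedder",
            "type": "embed",
            "subtype": "bedrock",
            "settings": {"model_name": "amazon.titan-embed-text-v2:0"},
        },
    ],
}

validate_workflow_config(example_config)  # passes silently
```

A dictionary that passes this check can then be supplied as the workflow_config argument. The check only mirrors the description above; the Unstructured API remains the authority on what is accepted.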
run_workflow

Run a specific workflow. Args: workflow_id: ID of the workflow to run Returns: String containing the response from the workflow execution

update_workflow

Update an existing workflow.

Args:
    workflow_id: ID of the workflow to update
    workflow_config: A typed dictionary containing required fields (destination_id, name, source_id, workflow_type) and optional fields (schedule and workflow_nodes)

Returns:
    String containing the updated workflow information

Custom workflow DAG nodes

- If workflow_type is set to custom, you must also specify the settings for the workflow’s directed acyclic graph (DAG) nodes. These nodes’ settings are specified in the workflow_nodes array.
- A Source node is automatically created when you specify the source_id value outside of the workflow_nodes array.
- A Destination node is automatically created when you specify the destination_id value outside of the workflow_nodes array.
- You can specify Partitioner, Chunker, Prompter, and Embedder nodes.
- The order of the nodes in the workflow_nodes array will be the same order that these nodes appear in the DAG, with the first node in the array added directly after the Source node. The Destination node follows the last node in the array.
- Be sure to specify nodes in the allowed order. The following DAG placements are all allowed:
    - Source -> Partitioner -> Destination
    - Source -> Partitioner -> Chunker -> Destination
    - Source -> Partitioner -> Chunker -> Embedder -> Destination
    - Source -> Partitioner -> Prompter -> Chunker -> Destination
    - Source -> Partitioner -> Prompter -> Chunker -> Embedder -> Destination

Partitioner node

A Partitioner node has a type of partition and a subtype of auto, vlm, hi_res, or fast.

Examples:

- auto strategy:

    {
        "name": "Partitioner",
        "type": "partition",
        "subtype": "vlm",
        "settings": {
            "provider": "anthropic", (required)
            "model": "claude-3-5-sonnet-20241022", (required)
            "output_format": "text/html",
            "user_prompt": null,
            "format_html": true,
            "unique_element_ids": true,
            "is_dynamic": true,
            "allow_fast": true
        }
    }

- vlm strategy: Allowed values are provider and model. Below are examples:
    - "provider": "anthropic", "model": "claude-3-5-sonnet-20241022"
    - "provider": "openai", "model": "gpt-4o"

- hi_res strategy:

    {
        "name": "Partitioner",
        "type": "partition",
        "subtype": "unstructured_api",
        "settings": {
            "strategy": "hi_res",
            "include_page_breaks": <true|false>,
            "pdf_infer_table_structure": <true|false>,
            "exclude_elements": [ "<element-name>", "<element-name>" ],
            "xml_keep_tags": <true|false>,
            "encoding": "<encoding>",
            "ocr_languages": [ "<language>", "<language>" ],
            "extract_image_block_types": [ "image", "table" ],
            "infer_table_structure": <true|false>
        }
    }

- fast strategy:

    {
        "name": "Partitioner",
        "type": "partition",
        "subtype": "unstructured_api",
        "settings": {
            "strategy": "fast",
            "include_page_breaks": <true|false>,
            "pdf_infer_table_structure": <true|false>,
            "exclude_elements": [ "<element-name>", "<element-name>" ],
            "xml_keep_tags": <true|false>,
            "encoding": "<encoding>",
            "ocr_languages": [ "<language-code>", "<language-code>" ],
            "extract_image_block_types": [ "image", "table" ],
            "infer_table_structure": <true|false>
        }
    }

Chunker node

A Chunker node has a type of chunk and a subtype of chunk_by_character or chunk_by_title.

- chunk_by_character:

    {
        "name": "Chunker",
        "type": "chunk",
        "subtype": "chunk_by_character",
        "settings": {
            "include_orig_elements": <true|false>,
            "new_after_n_chars": <new-after-n-chars>, (required, if not provided set same as max_characters)
            "max_characters": <max-characters>, (required)
            "overlap": <overlap>, (required, if not provided set default to 0)
            "overlap_all": <true|false>,
            "contextual_chunking_strategy": "v1"
        }
    }

- chunk_by_title:

    {
        "name": "Chunker",
        "type": "chunk",
        "subtype": "chunk_by_title",
        "settings": {
            "multipage_sections": <true|false>,
            "combine_text_under_n_chars": <combine-text-under-n-chars>,
            "include_orig_elements": <true|false>,
            "new_after_n_chars": <new-after-n-chars>, (required, if not provided set same as max_characters)
            "max_characters": <max-characters>, (required)
            "overlap": <overlap>, (required, if not provided set default to 0)
            "overlap_all": <true|false>,
            "contextual_chunking_strategy": "v1"
        }
    }

Prompter node

A Prompter node has a type of prompter and a subtype of one of the following:

- openai_image_description
- anthropic_image_description
- bedrock_image_description
- vertexai_image_description
- openai_table_description
- anthropic_table_description
- bedrock_table_description
- vertexai_table_description
- openai_table2html
- openai_ner

Example:

    {
        "name": "Prompter",
        "type": "prompter",
        "subtype": "<subtype>",
        "settings": {}
    }

Embedder node

An Embedder node has a type of embed. Allowed values for subtype and model_name include:

- "subtype": "azure_openai"
    - "model_name": "text-embedding-3-small"
    - "model_name": "text-embedding-3-large"
    - "model_name": "text-embedding-ada-002"
- "subtype": "bedrock"
    - "model_name": "amazon.titan-embed-text-v2:0"
    - "model_name": "amazon.titan-embed-text-v1"
    - "model_name": "amazon.titan-embed-image-v1"
    - "model_name": "cohere.embed-english-v3"
    - "model_name": "cohere.embed-multilingual-v3"
- "subtype": "togetherai"
    - "model_name": "togethercomputer/m2-bert-80M-2k-retrieval"
    - "model_name": "togethercomputer/m2-bert-80M-8k-retrieval"
    - "model_name": "togethercomputer/m2-bert-80M-32k-retrieval"

Example:

    {
        "name": "Embedder",
        "type": "embed",
        "subtype": "<subtype>",
        "settings": {
            "model_name": "<model-name>"
        }
    }

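The Chunker settings carry two fallback rules: new_after_n_chars should be set to max_characters when not provided, and overlap should default to 0. The helper below is one possible way to apply those fallbacks when building a node for workflow_nodes. The function name `chunker_node` and its signature are hypothetical, not something the server exposes.

```python
from typing import Any, Optional


def chunker_node(
    subtype: str,
    max_characters: int,
    new_after_n_chars: Optional[int] = None,
    overlap: Optional[int] = None,
    **extra_settings: Any,
) -> dict:
    """Build a Chunker node dict, filling in the documented fallbacks."""
    if subtype not in ("chunk_by_character", "chunk_by_title"):
        raise ValueError(f"unknown chunker subtype: {subtype}")

    settings = {
        "max_characters": max_characters,
        # If not provided, set same as max_characters.
        "new_after_n_chars": (
            max_characters if new_after_n_chars is None else new_after_n_chars
        ),
        # If not provided, default to 0.
        "overlap": 0 if overlap is None else overlap,
    }
    # Optional settings such as include_orig_elements or overlap_all.
    settings.update(extra_settings)

    return {
        "name": "Chunker",
        "type": "chunk",
        "subtype": subtype,
        "settings": settings,
    }


node = chunker_node("chunk_by_title", max_characters=500, multipage_sections=True)
```

Here `node["settings"]` ends up with `new_after_n_chars` equal to 500 and `overlap` equal to 0, alongside the explicitly passed `multipage_sections` flag.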
delete_workflow

Delete a specific workflow. Args: workflow_id: ID of the workflow to delete Returns: String containing the response from the workflow deletion

list_jobs

List jobs via the Unstructured API. Args: workflow_id: Optional workflow ID to filter by status: Optional job status to filter by Returns: String containing the list of jobs

get_job_info

Get detailed information about a specific job. Args: job_id: ID of the job to get information for Returns: String containing the job information

cancel_job

Cancel a specific job. Args: job_id: ID of the job to cancel Returns: String containing the response from the job cancellation
