MCP Integration

The Model Context Protocol (MCP) enables GPT Researcher to connect with diverse data sources and tools through a standardized interface. GPT Researcher features an intelligent two-stage MCP approach that automatically selects the best tools and generates contextual research, powered by LangChain's MCP adapters for seamless integration.

How MCP Works in GPT Researcher

GPT Researcher uses a two-stage approach for MCP integration:

  1. Stage 1: Smart Tool Selection - The LLM analyzes your query and the tools exposed by the available MCP servers, then selects the most relevant ones
  2. Stage 2: Contextual Research - The LLM calls the selected tools with dynamically generated, query-specific arguments

This happens automatically behind the scenes, optimized for the best balance of speed, cost, and research quality. The integration leverages the langchain-mcp-adapters library, ensuring compatibility with the growing ecosystem of MCP tool servers.
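
To make the two stages concrete, here is a minimal, self-contained sketch. It is not GPT Researcher's implementation: the real system asks an LLM to make both decisions, while this sketch substitutes simple keyword overlap so it can run without any API key. The `Tool` class, `select_tools`, and `build_arguments` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Tool:
    """Hypothetical stand-in for a tool advertised by an MCP server."""
    name: str
    description: str


def select_tools(query: str, tools: list[Tool], max_tools: int = 2) -> list[Tool]:
    """Stage 1 stand-in: rank tools by word overlap with the query.

    GPT Researcher delegates this choice to an LLM; keyword overlap is
    only a placeholder that keeps the sketch runnable offline.
    """
    query_words = set(query.lower().split())

    def overlap(tool: Tool) -> int:
        return len(query_words & set(tool.description.lower().split()))

    ranked = sorted(tools, key=overlap, reverse=True)
    # Keep at most max_tools tools, and drop any with no overlap at all.
    return [tool for tool in ranked[:max_tools] if overlap(tool) > 0]


def build_arguments(query: str, tool: Tool) -> dict:
    """Stage 2 stand-in: produce query-specific arguments for one tool call."""
    return {"tool": tool.name, "arguments": {"query": query}}


tools = [
    Tool("search_repositories", "search github repositories for code"),
    Tool("get_weather", "current weather forecast for a city"),
    Tool("search_news", "search recent news articles"),
]
query = "react hooks code search"

selected = select_tools(query, tools)
calls = [build_arguments(query, tool) for tool in selected]
print([tool.name for tool in selected])
```

The point of the split is that only the tools surviving stage 1 receive arguments and get called in stage 2, which is how irrelevant tools are kept out of the research run.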

MCP Research Flow

The following diagram illustrates the hybrid strategy using RETRIEVER=tavily,mcp as an example:

[Figure: MCP research flow for the hybrid RETRIEVER=tavily,mcp strategy]

Flow Breakdown:

  1. Configuration: Set RETRIEVER environment variable to enable MCP
  2. Strategy Selection: Choose pure MCP or hybrid approach
  3. Initialization: GPT Researcher loads your mcp_configs
  4. Stage 1: LLM intelligently selects the most relevant tools from available MCP servers
  5. Stage 2: LLM executes research using selected tools with query-specific arguments
  6. Hybrid Processing: If using hybrid strategy, combines MCP results with web search
  7. Report Generation: Synthesizes all findings into a comprehensive report

Prerequisites

MCP support is included with GPT Researcher installation:

pip install gpt-researcher
# All MCP dependencies are included automatically

Essential Configuration: Enabling MCP

Important: To use MCP with GPT Researcher, you must set the RETRIEVER environment variable:

Pure MCP Research

export RETRIEVER=mcp

Hybrid Research (MCP + Web Search)

# Combines web search with MCP for comprehensive research
export RETRIEVER=tavily,mcp

# Alternative hybrid combination
export RETRIEVER=google,mcp,arxiv
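
Because RETRIEVER is a comma-separated list, it can help to check what a given value actually enables before starting a run. The helper below is a hypothetical sketch (the function name and the mode labels are not part of GPT Researcher) that classifies a value as pure MCP, hybrid, or no MCP at all.

```python
import os


def describe_retrievers(value: str) -> dict:
    """Split a RETRIEVER value and classify it (hypothetical helper)."""
    # Tolerate stray whitespace and mixed case around each name.
    names = [name.strip().lower() for name in value.split(",") if name.strip()]
    has_mcp = "mcp" in names
    if has_mcp and len(names) == 1:
        mode = "pure_mcp"
    elif has_mcp:
        mode = "hybrid"
    else:
        mode = "no_mcp"
    return {"retrievers": names, "mode": mode}


# Inspect whatever is currently set in the environment.
print(describe_retrievers(os.environ.get("RETRIEVER", "")))
```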

Quick Start

import asyncio
import os

from gpt_researcher import GPTResearcher

# Set retriever to enable MCP
os.environ["RETRIEVER"] = "tavily,mcp"  # Hybrid approach


async def main():
    # Simple MCP configuration - works automatically
    researcher = GPTResearcher(
        query="How does React's useState hook work?",
        mcp_configs=[
            {
                "name": "github_api",
                "command": "npx",
                "args": ["-y", "@modelcontextprotocol/server-github"],
                # The GitHub MCP server reads its token from this variable
                "env": {
                    "GITHUB_PERSONAL_ACCESS_TOKEN": os.getenv("GITHUB_PERSONAL_ACCESS_TOKEN")
                },
            }
        ],
    )

    context = await researcher.conduct_research()
    report = await researcher.write_report()
    print(report)


asyncio.run(main())

Configuration Structure

Each MCP configuration dictionary supports these keys:

| Key | Description | Example | Required |
|---|---|---|---|
| name | Identifier for the MCP server | "github" | Yes |
| command | Command to start the server | "python" | Yes* |
| args | Arguments for the server command | ["-m", "my_server"] | Yes* |
| env | Environment variables for the server | {"API_KEY": "key"} | No |
| connection_url | URL for remote connections | "wss://api.example.com" | Yes** |
| connection_type | Connection type (auto-detected) | "websocket" | No |
| connection_token | Authentication token | "bearer_token" | No |

* Local servers: Require name, command, and args
** Remote servers: Require name and connection_url
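
The required-key rules above are easy to check before handing a configuration to GPT Researcher. The validator below is a sketch written for this page, not a function the library provides; `validate_mcp_config` is a hypothetical name, and the check that args is a list is an assumption based on the example column.

```python
def validate_mcp_config(config: dict) -> list[str]:
    """Return the problems found in one MCP server config (empty if none)."""
    problems = []
    if not config.get("name"):
        problems.append("missing 'name'")
    if config.get("connection_url"):
        # Remote server: name and connection_url are sufficient.
        return problems
    # Local server: command and args are both required.
    for key in ("command", "args"):
        if key not in config:
            problems.append(f"local server missing '{key}'")
    if "args" in config and not isinstance(config["args"], list):
        problems.append("'args' must be a list")
    return problems


local = {"name": "github", "command": "npx",
         "args": ["-y", "@modelcontextprotocol/server-github"]}
remote = {"name": "remote_api", "connection_url": "wss://api.example.com"}

for config in (local, remote, {"command": "python"}):
    print(config.get("name"), validate_mcp_config(config))
```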

Examples

News and Web Research with Tavily

Perfect for current events, market research, and general information gathering:

from gpt_researcher import GPTResearcher
import os

# Enable hybrid research: web search + MCP
os.environ["RETRIEVER"] = "tavily,mcp"

researcher = GPTResearcher(
    query="What are the latest updates in the NBA playoffs?",
    mcp_configs=[
        {
            "name": "tavily",
            "command": "npx",
            "args": ["-y", "tavily-mcp@0.1.2"],
            "env": {
                "TAVILY_API_KEY": os.getenv("TAVILY_API_KEY")
            }
        }
    ]
)

# Run inside an async function or a notebook cell
context = await researcher.conduct_research()
report = await researcher.write_report()

Code Research with GitHub

Ideal for technical documentation, code examples, and software development research:

# Pure MCP research for technical queries
os.environ["RETRIEVER"] = "mcp"

researcher = GPTResearcher(
    query="What are the key features and implementation of React's useState hook? How has it evolved in recent versions?",
    mcp_configs=[
        {
            "name": "github",
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-github"],
            "env": {
                "GITHUB_PERSONAL_ACCESS_TOKEN": os.getenv("GITHUB_PERSONAL_ACCESS_TOKEN")
            }
        }
    ]
)

Academic Research with Hybrid Strategy

Combining academic papers with MCP tools:

# Academic + MCP hybrid approach
os.environ["RETRIEVER"] = "arxiv,semantic_scholar,mcp"

researcher = GPTResearcher(
    query="Analyze the latest developments in quantum error correction algorithms",
    mcp_configs=[
        {
            "name": "quantum_research",
            "command": "python",
            "args": ["quantum_mcp_server.py"],
            "env": {
                "ARXIV_API_KEY": os.getenv("ARXIV_API_KEY"),
                "RESEARCH_DB_PATH": "/path/to/quantum_papers.db"
            }
        }
    ]
)

Multi-Server Research: Comprehensive Market Analysis

Here's a real-world example combining multiple MCP servers for comprehensive business intelligence:

from gpt_researcher import GPTResearcher
import os

# Multi-retriever hybrid strategy for comprehensive coverage
os.environ["RETRIEVER"] = "tavily,google,mcp"

# Multi-domain research combining news, code, and financial data
researcher = GPTResearcher(
    query="Analyze Tesla's Q4 2024 performance, including stock trends, recent innovations, and market sentiment",
    mcp_configs=[
        # Financial data and stock analysis
        {
            "name": "financial_data",
            "command": "python",
            "args": ["financial_mcp_server.py"],
            "env": {
                "ALPHA_VANTAGE_KEY": os.getenv("ALPHA_VANTAGE_KEY"),
                "YAHOO_FINANCE_KEY": os.getenv("YAHOO_FINANCE_KEY")
            }
        },
        # News and market sentiment
        {
            "name": "news_research",
            "command": "npx",
            "args": ["-y", "tavily-mcp@0.1.2"],
            "env": {
                "TAVILY_API_KEY": os.getenv("TAVILY_API_KEY")
            }
        },
        # Technical innovations and patents
        {
            "name": "github_research",
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-github"],
            "env": {
                "GITHUB_PERSONAL_ACCESS_TOKEN": os.getenv("GITHUB_PERSONAL_ACCESS_TOKEN")
            }
        },
        # Academic research and papers
        {
            "name": "academic_papers",
            "command": "python",
            "args": ["arxiv_mcp_server.py"],
            "env": {
                "ARXIV_API_KEY": os.getenv("ARXIV_API_KEY")
            }
        }
    ]
)

# GPT Researcher automatically orchestrates all servers
# (run inside an async function or a notebook cell)
context = await researcher.conduct_research()
report = await researcher.write_report()

print(f"Generated comprehensive report using {len(researcher.mcp_configs)} MCP servers")
print(f"Research cost: ${researcher.get_costs():.4f}")

This example demonstrates how GPT Researcher intelligently:

  • Selects relevant tools from each server based on the query
  • Coordinates multi-domain research across financial, news, technical, and academic sources
  • Synthesizes information from different domains into a cohesive analysis
  • Optimizes performance by using only the most relevant tools from each server

E-commerce Competitive Analysis

Another practical multi-server scenario for business research:

# Comprehensive hybrid strategy
os.environ["RETRIEVER"] = "tavily,bing,exa,mcp"

researcher = GPTResearcher(
    query="Comprehensive competitive analysis of sustainable fashion brands in 2024",
    mcp_configs=[
        # Web trends and consumer sentiment
        {
            "name": "web_trends",
            "command": "npx",
            "args": ["-y", "tavily-mcp@0.1.2"],
            "env": {"TAVILY_API_KEY": os.getenv("TAVILY_API_KEY")}
        },
        # Social media analytics
        {
            "name": "social_analytics",
            "command": "python",
            "args": ["social_mcp_server.py"],
            "env": {
                "TWITTER_BEARER_TOKEN": os.getenv("TWITTER_BEARER_TOKEN"),
                "INSTAGRAM_ACCESS_TOKEN": os.getenv("INSTAGRAM_ACCESS_TOKEN")
            }
        },
        # Patent and innovation research
        {
            "name": "patent_research",
            "command": "python",
            "args": ["patent_mcp_server.py"],
            "env": {"USPTO_API_KEY": os.getenv("USPTO_API_KEY")}
        }
    ]
)

Remote MCP Server

# Enable MCP with web search fallback
os.environ["RETRIEVER"] = "tavily,mcp"

researcher = GPTResearcher(
    query="Latest AI research papers on transformer architectures",
    mcp_configs=[
        {
            "name": "arxiv_api",
            "connection_url": "wss://mcp.arxiv.org/ws",  # Auto-detects WebSocket
            "connection_token": os.getenv("ARXIV_TOKEN"),
        }
    ]
)

MCP works seamlessly alongside traditional web search for comprehensive research:

import os

from gpt_researcher import GPTResearcher

# Hybrid strategy: combines web search with MCP automatically
os.environ["RETRIEVER"] = "tavily,mcp"

researcher = GPTResearcher(
    query="Impact of AI on software development practices",
    # MCP will be used alongside web search automatically
    mcp_configs=[
        {
            "name": "github",
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-github"],
            "env": {
                "GITHUB_PERSONAL_ACCESS_TOKEN": os.getenv("GITHUB_PERSONAL_ACCESS_TOKEN")
            }
        }
    ]
)

# This uses both MCP (for code examples) and web search (for articles/news)
# (run inside an async function or a notebook cell)
context = await researcher.conduct_research()

Complete Working Example

Here's a production-ready example demonstrating MCP integration:

import asyncio
import os

from gpt_researcher import GPTResearcher


async def main():
    # Set up environment. Replace the placeholders with real keys, or export
    # the variables in your shell and delete these three lines.
    os.environ["GITHUB_PERSONAL_ACCESS_TOKEN"] = "your_github_token"
    os.environ["OPENAI_API_KEY"] = "your_openai_key"
    os.environ["TAVILY_API_KEY"] = "your_tavily_key"

    # Enable hybrid research strategy
    os.environ["RETRIEVER"] = "tavily,mcp"

    # Create researcher with multi-server MCP configuration
    researcher = GPTResearcher(
        query="How are leading tech companies implementing AI safety measures in 2024?",
        mcp_configs=[
            # Code repositories and technical implementations
            {
                "name": "github",
                "command": "npx",
                "args": ["-y", "@modelcontextprotocol/server-github"],
                "env": {
                    "GITHUB_PERSONAL_ACCESS_TOKEN": os.getenv("GITHUB_PERSONAL_ACCESS_TOKEN")
                }
            },
            # Current news and industry reports
            {
                "name": "tavily",
                "command": "npx",
                "args": ["-y", "tavily-mcp@0.1.2"],
                "env": {
                    "TAVILY_API_KEY": os.getenv("TAVILY_API_KEY")
                }
            }
        ],
        verbose=True  # See the intelligent research process
    )

    print("🔍 Starting multi-source research...")

    # Intelligent tool selection and research happens automatically
    context = await researcher.conduct_research()

    print("📝 Generating comprehensive report...")
    report = await researcher.write_report()

    print("✅ Research complete!")
    print(f"📊 Report length: {len(report)} characters")
    print(f"💰 Total cost: ${researcher.get_costs():.4f}")

    # Save the report (UTF-8 so non-ASCII characters are written safely)
    with open("ai_safety_research.md", "w", encoding="utf-8") as f:
        f.write(report)


if __name__ == "__main__":
    asyncio.run(main())

Retriever Strategy Comparison

| Strategy | Use Case | Performance | Coverage |
|---|---|---|---|
| RETRIEVER=mcp | Specialized domains, structured data | ⚡ Fast | 🎯 Focused |
| RETRIEVER=tavily,mcp | General research with specialized tools | ⚖️ Balanced | 🌐 Comprehensive |
| RETRIEVER=google,arxiv,tavily,mcp | Maximum coverage, redundancy | 🐌 Slower | 🌍 Extensive |
| RETRIEVER=arxiv,mcp | Academic + specialized research | ⚡ Fast | 🎓 Academic-focused |

Advanced Configuration

Research Strategies

For advanced users who need more control over how MCP research is executed:

| Strategy | Description | Use Case | Performance |
|---|---|---|---|
| "fast" | Run MCP once with main query (default) | Most research needs | ⚡ Optimal |
| "deep" | Run MCP for all sub-queries | Comprehensive analysis | 🔍 Thorough |
| "disabled" | Skip MCP entirely | Web-only research | ⚡ Fastest |

# Default behavior (recommended for most use cases)
os.environ["RETRIEVER"] = "tavily,mcp"
researcher = GPTResearcher(
    query="Analyze Tesla's performance",
    mcp_configs=[...]
)

# For comprehensive analysis (advanced)
os.environ["MCP_STRATEGY"] = "deep"
researcher = GPTResearcher(
    query="Comprehensive renewable energy analysis",
    mcp_configs=[...]
)

# For web-only research (advanced)
os.environ["RETRIEVER"] = "tavily"  # Excludes MCP entirely

Environment Variable Configuration

Set global defaults using environment variables:

# Essential: Enable MCP
export RETRIEVER=tavily,mcp

# Advanced: Set MCP strategy
export MCP_STRATEGY=deep

# Or in .env file
RETRIEVER=tavily,mcp
MCP_STRATEGY=fast
MCP_AUTO_TOOL_SELECTION=true
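
A typo in MCP_STRATEGY is easy to miss, so one option is to validate the value in your own startup code. The sketch below is an assumption-laden example rather than GPT Researcher's behavior: `read_mcp_strategy` is a hypothetical helper, and raising on unknown values is a choice made here; how the library itself treats an unrecognized value is not covered on this page.

```python
import os

# The three strategy names listed in the Research Strategies table.
VALID_STRATEGIES = ("fast", "deep", "disabled")


def read_mcp_strategy(environ=os.environ) -> str:
    """Return the MCP_STRATEGY value, falling back to "fast" when unset."""
    value = environ.get("MCP_STRATEGY", "fast").strip().lower()
    if value not in VALID_STRATEGIES:
        raise ValueError(
            f"MCP_STRATEGY must be one of {VALID_STRATEGIES}, got {value!r}"
        )
    return value


# Passing a plain dict makes the helper easy to exercise without
# touching the real environment.
print(read_mcp_strategy({"MCP_STRATEGY": "deep"}))
```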

Custom Tool Selection

Enable automatic tool selection for servers with multiple tools:

# Environment variable approach
os.environ["MCP_AUTO_TOOL_SELECTION"] = "true"
os.environ["RETRIEVER"] = "mcp"

researcher = GPTResearcher(
    query="your query",
    mcp_configs=[
        {
            "name": "multi_tool_server",  # name is required for every server
            "command": "python",
            "args": ["multi_tool_server.py"]
            # AI will choose the best tool automatically
        }
    ]
)

Connection Type Detection

GPT Researcher automatically detects connection types:

# WebSocket (detected from wss:// prefix)
{"connection_url": "wss://api.example.com/mcp"}

# HTTP (detected from https:// prefix)
{"connection_url": "https://api.example.com/mcp"}

# Stdio (default when no URL provided)
{"command": "python", "args": ["server.py"]}
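
The detection rules above can be approximated in a few lines. This is a sketch of the idea, not the library's code: `detect_connection_type` is a hypothetical function, and two behaviors are assumptions added here, namely that the unencrypted `ws://` and `http://` prefixes are treated like their secure counterparts and that an explicit connection_type takes precedence over detection.

```python
def detect_connection_type(config: dict) -> str:
    """Guess the transport for one MCP server config (illustrative only)."""
    # Assumption: an explicit connection_type overrides auto-detection.
    explicit = config.get("connection_type")
    if explicit:
        return explicit
    url = config.get("connection_url", "")
    # Assumption: ws:// and http:// behave like wss:// and https://.
    if url.startswith(("wss://", "ws://")):
        return "websocket"
    if url.startswith(("https://", "http://")):
        return "http"
    # No URL at all means a local process spoken to over stdio.
    return "stdio"


for config in (
    {"connection_url": "wss://api.example.com/mcp"},
    {"connection_url": "https://api.example.com/mcp"},
    {"command": "python", "args": ["server.py"]},
):
    print(detect_connection_type(config))
```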

Troubleshooting

Common Issues

"No retriever specified" or "MCP not working"

  • Solution: Set RETRIEVER=mcp or RETRIEVER=tavily,mcp
  • Verify environment variable is set: echo $RETRIEVER

"Invalid retriever(s) found"

  • Check available retrievers: tavily, mcp, google, bing, arxiv, etc.
  • Ensure no typos in retriever names

"No MCP server configurations found"

  • Ensure mcp_configs is a list of dictionaries
  • Verify at least one configuration is provided
  • Check configuration format matches examples

"MCP server connection failed"

  • Verify server command and arguments
  • Check environment variables are set correctly
  • Test the MCP server independently
  • Ensure required dependencies are installed
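
A frequent cause of connection failures for local servers is that the launch command (for example npx or python) is not on PATH in the environment where GPT Researcher runs. The check below is a small sketch using only the standard library; `check_command` is a hypothetical helper name, not part of GPT Researcher.

```python
import shutil


def check_command(config: dict) -> list[str]:
    """Report whether a local server's launch command can be found on PATH."""
    issues = []
    command = config.get("command")
    # Remote configs have no command, so there is nothing to check for them.
    if command and shutil.which(command) is None:
        issues.append(f"command not found on PATH: {command}")
    return issues


config = {
    "name": "github",
    "command": "npx",
    "args": ["-y", "@modelcontextprotocol/server-github"],
}
print(check_command(config) or "launch command found")
```

If the command is found but the connection still fails, running the same command and arguments directly in a terminal usually surfaces the server's own error output.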

"No tools available from MCP server"

  • Verify the server exposes tools correctly
  • Check server startup logs for errors
  • Try enabling MCP_AUTO_TOOL_SELECTION=true

"Tool execution failed"

  • Check authentication tokens and API keys
  • Verify tool arguments are valid
  • Review server logs for detailed errors
  • Enable debug logging for more information

Debug Mode

Enable detailed logging to diagnose issues:

import logging
logging.basicConfig(level=logging.DEBUG)

# Your research code here - will show detailed MCP operations

Testing Your Setup

Quick test to verify MCP configuration:

import os
from gpt_researcher import GPTResearcher

# Test retriever configuration
os.environ["RETRIEVER"] = "mcp"

# Test basic configuration
researcher = GPTResearcher(
    query="test query",
    mcp_configs=[
        {
            "name": "test",
            "command": "echo",
            "args": ["hello world"]
        }
    ]
)

print(f"✅ RETRIEVER set to: {os.environ.get('RETRIEVER')}")
print(f"✅ MCP configs loaded: {len(researcher.mcp_configs)}")

Best Practices

  1. Always set the RETRIEVER environment variable - This is required for MCP functionality
  2. Use hybrid strategies (tavily,mcp) for comprehensive research
  3. Use descriptive server names for easier debugging
  4. Store sensitive data in environment variables
  5. Test MCP servers independently before integration
  6. Enable verbose mode during development
  7. Choose appropriate retriever combinations based on your research domain
  8. Let the default settings handle optimization for most use cases
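
For practice 4, note that `os.getenv` returns None when a variable is unset, so a missing key can slip silently into a server's env dictionary. One way to fail early is a small helper like the one below; `env_for` is a hypothetical name introduced for this sketch, not a GPT Researcher function.

```python
import os


def env_for(*names: str) -> dict:
    """Collect the named variables, failing fast if any is missing or empty."""
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}


# Example usage inside an MCP server config (raises if TAVILY_API_KEY is unset):
# {"name": "tavily", "command": "npx", "args": ["-y", "tavily-mcp@0.1.2"],
#  "env": env_for("TAVILY_API_KEY")}
```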

For more examples and advanced use cases, check out the GPT Researcher examples repository.