MCP Integration
The Model Context Protocol (MCP) enables GPT Researcher to connect with diverse data sources and tools through a standardized interface. GPT Researcher features an intelligent two-stage MCP approach that automatically selects the best tools and generates contextual research, powered by LangChain's MCP adapters for seamless integration.
How MCP Works in GPT Researcher
GPT Researcher uses a two-stage approach for MCP integration:
- Stage 1: Smart Tool Selection - The LLM analyzes your query and the available MCP servers to select the most relevant tools
- Stage 2: Contextual Research - The LLM calls the selected tools with dynamically generated, query-specific arguments
This happens automatically behind the scenes, optimized for the best balance of speed, cost, and research quality. The integration leverages the langchain-mcp-adapters library, ensuring compatibility with the growing ecosystem of MCP tool servers.
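To make the two stages concrete, here is a toy sketch of the select-then-call pattern. It is not GPT Researcher's implementation: the `Tool` class, `select_tools`, and `build_calls` are hypothetical names, and a simple word-overlap score stands in for the LLM's judgment in Stage 1.

```python
from dataclasses import dataclass


@dataclass
class Tool:
    """Hypothetical description of a tool exposed by an MCP server."""
    name: str
    description: str


def select_tools(query: str, tools: list[Tool], limit: int = 2) -> list[Tool]:
    """Stage 1 stand-in: rank tools by word overlap with the query.

    The real system asks an LLM; word overlap is only an illustration.
    """
    query_words = set(query.lower().split())

    def overlap(tool: Tool) -> int:
        return len(query_words & set(tool.description.lower().split()))

    ranked = sorted(tools, key=overlap, reverse=True)
    return [tool for tool in ranked[:limit] if overlap(tool) > 0]


def build_calls(query: str, selected: list[Tool]) -> list[dict]:
    """Stage 2 stand-in: pair each selected tool with query-specific arguments."""
    return [{"tool": tool.name, "arguments": {"query": query}} for tool in selected]


tools = [
    Tool("search_code", "search source code in repositories"),
    Tool("get_weather", "current weather forecast by city"),
    Tool("search_issues", "search issues and pull requests in repositories"),
]
query = "search repositories for useState hook code"
calls = build_calls(query, select_tools(query, tools))
print([call["tool"] for call in calls])
```

The point of the split is that only the tools chosen in Stage 1 are ever called in Stage 2, which is how the approach keeps cost and latency down when a server exposes many tools.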
MCP Research Flow
The following diagram illustrates the hybrid strategy, using RETRIEVER=tavily,mcp as an example:

Flow Breakdown:
- Configuration: Set the RETRIEVER environment variable to enable MCP
- Strategy Selection: Choose a pure MCP or hybrid approach
- Initialization: GPT Researcher loads your mcp_configs
- Stage 1: The LLM selects the most relevant tools from the available MCP servers
- Stage 2: The LLM executes research using the selected tools with query-specific arguments
- Hybrid Processing: If using the hybrid strategy, MCP results are combined with web search results
- Report Generation: All findings are synthesized into a comprehensive report
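The strategy-selection step above depends only on the comma-separated RETRIEVER value. A minimal sketch of how such a value could be classified follows; `parse_retrievers` and `retrieval_mode` are hypothetical helpers for illustration, not functions from the gpt_researcher package.

```python
def parse_retrievers(value: str) -> list[str]:
    """Split a comma-separated RETRIEVER value into clean names."""
    return [name.strip() for name in value.split(",") if name.strip()]


def retrieval_mode(value: str) -> str:
    """Classify a RETRIEVER value as unset, web-only, pure-mcp, or hybrid."""
    names = parse_retrievers(value)
    if not names:
        return "unset"
    if "mcp" not in names:
        return "web-only"
    # Only MCP listed means pure MCP; anything alongside it means hybrid.
    return "pure-mcp" if set(names) == {"mcp"} else "hybrid"


for value in ("mcp", "tavily,mcp", "tavily"):
    print(value, "->", retrieval_mode(value))
```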
Prerequisites
MCP support is included with GPT Researcher installation:
pip install gpt-researcher
# All MCP dependencies are included automatically
Essential Configuration: Enabling MCP
Important: To use MCP with GPT Researcher, you must set the RETRIEVER environment variable:
Pure MCP Research
export RETRIEVER=mcp
Hybrid Strategy (Recommended)
# Combines web search with MCP for comprehensive research
export RETRIEVER=tavily,mcp
# Alternative hybrid combinations
export RETRIEVER=google,mcp,arxiv
Quick Start
from gpt_researcher import GPTResearcher
import os

# Set retriever to enable MCP
os.environ["RETRIEVER"] = "tavily,mcp"  # Hybrid approach

# Simple MCP configuration - works automatically
researcher = GPTResearcher(
    query="How does React's useState hook work?",
    mcp_configs=[
        {
            "name": "github_api",
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-github"],
            "env": {"GITHUB_PERSONAL_ACCESS_TOKEN": os.getenv("GITHUB_PERSONAL_ACCESS_TOKEN")}
        }
    ]
)

# Run these calls inside an async function (or a notebook with top-level await)
context = await researcher.conduct_research()
report = await researcher.write_report()
Configuration Structure
Each MCP configuration dictionary supports these keys:
Key | Description | Example | Required |
---|---|---|---|
name | Identifier for the MCP server | "github" | Yes |
command | Command to start the server | "python" | Yes* |
args | Arguments for the server command | ["-m", "my_server"] | Yes* |
env | Environment variables for the server | {"API_KEY": "key"} | No |
connection_url | URL for remote connections | "wss://api.example.com" | Yes** |
connection_type | Connection type (auto-detected) | "websocket" | No |
connection_token | Authentication token | "bearer_token" | No |
* Local servers require name, command, and args.
** Remote servers require name and connection_url.
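The required-key rules in the table can be checked before a configuration is handed to GPT Researcher. The sketch below is one way to do that; `validate_mcp_config` is a hypothetical helper, and treating any entry with a connection_url as a remote server is an assumption based on the table above.

```python
def validate_mcp_config(config: dict) -> list[str]:
    """Return a list of problems; an empty list means the entry looks valid."""
    problems = []
    if not config.get("name"):
        problems.append("missing 'name'")

    if config.get("connection_url"):
        # Remote server: name and connection_url are enough.
        return problems

    # Local server: command and args are both required.
    for key in ("command", "args"):
        if key not in config:
            problems.append(f"missing '{key}' (required for local servers)")
    if "args" in config and not isinstance(config["args"], list):
        problems.append("'args' must be a list")
    return problems


local = {"name": "github", "command": "npx",
         "args": ["-y", "@modelcontextprotocol/server-github"]}
remote = {"name": "remote", "connection_url": "wss://api.example.com"}
broken = {"command": "python"}

for entry in (local, remote, broken):
    print(entry.get("name"), validate_mcp_config(entry))
```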
Examples
News and Web Research with Tavily
Perfect for current events, market research, and general information gathering:
from gpt_researcher import GPTResearcher
import os
# Enable hybrid research: web search + MCP
os.environ["RETRIEVER"] = "tavily,mcp"
researcher = GPTResearcher(
query="What are the latest updates in the NBA playoffs?",
mcp_configs=[
{
"name": "tavily",
"command": "npx",
"args": ["-y", "tavily-mcp@0.1.2"],
"env": {
"TAVILY_API_KEY": os.getenv("TAVILY_API_KEY")
}
}
]
)
context = await researcher.conduct_research()
report = await researcher.write_report()
Code Research with GitHub
Ideal for technical documentation, code examples, and software development research:
# Pure MCP research for technical queries
os.environ["RETRIEVER"] = "mcp"
researcher = GPTResearcher(
query="What are the key features and implementation of React's useState hook? How has it evolved in recent versions?",
mcp_configs=[
{
"name": "github",
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-github"],
"env": {
"GITHUB_PERSONAL_ACCESS_TOKEN": os.getenv("GITHUB_PERSONAL_ACCESS_TOKEN")
}
}
]
)
Academic Research with Hybrid Strategy
Combining academic papers with MCP tools:
# Academic + MCP hybrid approach
os.environ["RETRIEVER"] = "arxiv,semantic_scholar,mcp"
researcher = GPTResearcher(
query="Analyze the latest developments in quantum error correction algorithms",
mcp_configs=[
{
"name": "quantum_research",
"command": "python",
"args": ["quantum_mcp_server.py"],
"env": {
"ARXIV_API_KEY": os.getenv("ARXIV_API_KEY"),
"RESEARCH_DB_PATH": "/path/to/quantum_papers.db"
}
}
]
)
Multi-Server Research: Comprehensive Market Analysis
Here's a real-world example combining multiple MCP servers for comprehensive business intelligence:
from gpt_researcher import GPTResearcher
import os
# Multi-retriever hybrid strategy for comprehensive coverage
os.environ["RETRIEVER"] = "tavily,google,mcp"
# Multi-domain research combining news, code, and financial data
researcher = GPTResearcher(
query="Analyze Tesla's Q4 2024 performance, including stock trends, recent innovations, and market sentiment",
mcp_configs=[
# Financial data and stock analysis
{
"name": "financial_data",
"command": "python",
"args": ["financial_mcp_server.py"],
"env": {
"ALPHA_VANTAGE_KEY": os.getenv("ALPHA_VANTAGE_KEY"),
"YAHOO_FINANCE_KEY": os.getenv("YAHOO_FINANCE_KEY")
}
},
# News and market sentiment
{
"name": "news_research",
"command": "npx",
"args": ["-y", "tavily-mcp@0.1.2"],
"env": {
"TAVILY_API_KEY": os.getenv("TAVILY_API_KEY")
}
},
# Technical innovations and patents
{
"name": "github_research",
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-github"],
"env": {
"GITHUB_PERSONAL_ACCESS_TOKEN": os.getenv("GITHUB_PERSONAL_ACCESS_TOKEN")
}
},
# Academic research and papers
{
"name": "academic_papers",
"command": "python",
"args": ["arxiv_mcp_server.py"],
"env": {
"ARXIV_API_KEY": os.getenv("ARXIV_API_KEY")
}
}
]
)
# GPT Researcher automatically orchestrates all servers
context = await researcher.conduct_research()
report = await researcher.write_report()
print(f"Generated comprehensive report using {len(researcher.mcp_configs)} MCP servers")
print(f"Research cost: ${researcher.get_costs():.4f}")
This example demonstrates how GPT Researcher intelligently:
- Selects relevant tools from each server based on the query
- Coordinates multi-domain research across financial, news, technical, and academic sources
- Synthesizes information from different domains into a cohesive analysis
- Optimizes performance by using only the most relevant tools from each server
E-commerce Competitive Analysis
Another practical multi-server scenario for business research:
# Comprehensive hybrid strategy
os.environ["RETRIEVER"] = "tavily,bing,exa,mcp"
researcher = GPTResearcher(
query="Comprehensive competitive analysis of sustainable fashion brands in 2024",
mcp_configs=[
# Web trends and consumer sentiment
{
"name": "web_trends",
"command": "npx",
"args": ["-y", "tavily-mcp@0.1.2"],
"env": {"TAVILY_API_KEY": os.getenv("TAVILY_API_KEY")}
},
# Social media analytics
{
"name": "social_analytics",
"command": "python",
"args": ["social_mcp_server.py"],
"env": {
"TWITTER_BEARER_TOKEN": os.getenv("TWITTER_BEARER_TOKEN"),
"INSTAGRAM_ACCESS_TOKEN": os.getenv("INSTAGRAM_ACCESS_TOKEN")
}
},
# Patent and innovation research
{
"name": "patent_research",
"command": "python",
"args": ["patent_mcp_server.py"],
"env": {"USPTO_API_KEY": os.getenv("USPTO_API_KEY")}
}
]
)
Remote MCP Server
# Enable MCP with web search fallback
os.environ["RETRIEVER"] = "tavily,mcp"
researcher = GPTResearcher(
query="Latest AI research papers on transformer architectures",
mcp_configs=[
{
"name": "arxiv_api",
"connection_url": "wss://mcp.arxiv.org/ws", # Auto-detects WebSocket
"connection_token": os.getenv("ARXIV_TOKEN"),
}
]
)
Combining MCP with Web Search
MCP works seamlessly alongside traditional web search for comprehensive research:
from gpt_researcher import GPTResearcher
import os

# Hybrid strategy: combines web search with MCP automatically
os.environ["RETRIEVER"] = "tavily,mcp"

researcher = GPTResearcher(
    query="Impact of AI on software development practices",
    # MCP will be used alongside web search automatically
    mcp_configs=[
        {
            "name": "github",
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-github"],
            "env": {"GITHUB_PERSONAL_ACCESS_TOKEN": os.getenv("GITHUB_PERSONAL_ACCESS_TOKEN")}
        }
    ]
)

# This uses both MCP (for code examples) and web search (for articles/news)
context = await researcher.conduct_research()
Complete Working Example
Here's a complete, runnable example demonstrating MCP integration (replace the placeholder keys with your own):
import asyncio
import os
from gpt_researcher import GPTResearcher


async def main():
    # Set up environment
    os.environ["GITHUB_PERSONAL_ACCESS_TOKEN"] = "your_github_token"
    os.environ["OPENAI_API_KEY"] = "your_openai_key"
    os.environ["TAVILY_API_KEY"] = "your_tavily_key"

    # Enable hybrid research strategy
    os.environ["RETRIEVER"] = "tavily,mcp"

    # Create researcher with multi-server MCP configuration
    researcher = GPTResearcher(
        query="How are leading tech companies implementing AI safety measures in 2024?",
        mcp_configs=[
            # Code repositories and technical implementations
            {
                "name": "github",
                "command": "npx",
                "args": ["-y", "@modelcontextprotocol/server-github"],
                "env": {
                    "GITHUB_PERSONAL_ACCESS_TOKEN": os.getenv("GITHUB_PERSONAL_ACCESS_TOKEN")
                }
            },
            # Current news and industry reports
            {
                "name": "tavily",
                "command": "npx",
                "args": ["-y", "tavily-mcp@0.1.2"],
                "env": {
                    "TAVILY_API_KEY": os.getenv("TAVILY_API_KEY")
                }
            }
        ],
        verbose=True  # See the intelligent research process
    )

    print("🔍 Starting multi-source research...")
    # Intelligent tool selection and research happens automatically
    context = await researcher.conduct_research()

    print("📝 Generating comprehensive report...")
    report = await researcher.write_report()

    print("✅ Research complete!")
    print(f"📊 Report length: {len(report)} characters")
    print(f"💰 Total cost: ${researcher.get_costs():.4f}")

    # Save the report
    with open("ai_safety_research.md", "w") as f:
        f.write(report)


if __name__ == "__main__":
    asyncio.run(main())
Retriever Strategy Comparison
Strategy | Use Case | Performance | Coverage |
---|---|---|---|
RETRIEVER=mcp | Specialized domains, structured data | ⚡ Fast | 🎯 Focused |
RETRIEVER=tavily,mcp | General research with specialized tools | ⚖️ Balanced | 🌐 Comprehensive |
RETRIEVER=google,arxiv,tavily,mcp | Maximum coverage, redundancy | 🐌 Slower | 🌍 Extensive |
RETRIEVER=arxiv,mcp | Academic + specialized research | ⚡ Fast | 🎓 Academic-focused |
Advanced Configuration
Research Strategies
For advanced users who need more control over how MCP research is executed:
Strategy | Description | Use Case | Performance |
---|---|---|---|
"fast" | Run MCP once with main query (default) | Most research needs | ⚡ Optimal |
"deep" | Run MCP for all sub-queries | Comprehensive analysis | 🔍 Thorough |
"disabled" | Skip MCP entirely | Web-only research | ⚡ Fastest |
# Default behavior (recommended for most use cases)
os.environ["RETRIEVER"] = "tavily,mcp"
researcher = GPTResearcher(
query="Analyze Tesla's performance",
mcp_configs=[...]
)
# For comprehensive analysis (advanced)
os.environ["MCP_STRATEGY"] = "deep"
researcher = GPTResearcher(
query="Comprehensive renewable energy analysis",
mcp_configs=[...]
)
# For web-only research (advanced)
os.environ["RETRIEVER"] = "tavily" # Excludes MCP entirely
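The three strategies differ only in which queries are sent to MCP. The following sketch illustrates that difference; `queries_for_mcp` is a hypothetical function, not part of the library, and whether "deep" also runs the main query is not stated above, so this version sends the sub-queries only.

```python
def queries_for_mcp(strategy: str, main_query: str, sub_queries: list[str]) -> list[str]:
    """Return the queries that would be sent to MCP under a given strategy."""
    if strategy == "disabled":
        return []                 # skip MCP entirely
    if strategy == "fast":
        return [main_query]       # one MCP run with the main query
    if strategy == "deep":
        return list(sub_queries)  # one MCP run per sub-query
    raise ValueError(f"unknown MCP strategy: {strategy!r}")


main_query = "Comprehensive renewable energy analysis"
sub_queries = ["solar cost trends", "wind capacity growth", "grid storage options"]

for strategy in ("fast", "deep", "disabled"):
    print(strategy, len(queries_for_mcp(strategy, main_query, sub_queries)))
```

The number of MCP runs grows with the number of sub-queries under "deep", which is why "fast" is the better default when cost and latency matter.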
Environment Variable Configuration
Set global defaults using environment variables:
# Essential: Enable MCP
export RETRIEVER=tavily,mcp
# Advanced: Set MCP strategy
export MCP_STRATEGY=deep
# Or in .env file
RETRIEVER=tavily,mcp
MCP_STRATEGY=fast
MCP_AUTO_TOOL_SELECTION=true
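If you want to inspect these variables from your own code before starting a run, a small loader along these lines can help. It is a sketch under assumptions: `McpSettings` and `load_mcp_settings` are hypothetical names, the "fast" default follows the strategy table above, and the default of true for MCP_AUTO_TOOL_SELECTION plus the accepted truthy spellings are guesses rather than documented behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_STRATEGIES = ("fast", "deep", "disabled")
TRUTHY = {"1", "true", "yes", "on"}  # assumed spellings, not documented


@dataclass(frozen=True)
class McpSettings:
    retriever: str
    strategy: str
    auto_tool_selection: bool


def load_mcp_settings(env: Optional[Mapping[str, str]] = None) -> McpSettings:
    """Read the MCP-related variables from a mapping (os.environ by default)."""
    env = os.environ if env is None else env

    strategy = env.get("MCP_STRATEGY", "fast").strip().lower()
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"MCP_STRATEGY must be one of {VALID_STRATEGIES}, got {strategy!r}")

    auto = env.get("MCP_AUTO_TOOL_SELECTION", "true").strip().lower() in TRUTHY
    return McpSettings(
        retriever=env.get("RETRIEVER", ""),
        strategy=strategy,
        auto_tool_selection=auto,
    )


print(load_mcp_settings({"RETRIEVER": "tavily,mcp", "MCP_STRATEGY": "deep"}))
```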
Custom Tool Selection
Enable automatic tool selection for servers with multiple tools:
# Environment variable approach
os.environ["MCP_AUTO_TOOL_SELECTION"] = "true"
os.environ["RETRIEVER"] = "mcp"

researcher = GPTResearcher(
    query="your query",
    mcp_configs=[
        {
            "name": "multi_tool",  # name is required for every server
            "command": "python",
            "args": ["multi_tool_server.py"]
            # AI will choose the best tool automatically
        }
    ]
)
Connection Type Detection
GPT Researcher automatically detects connection types:
# WebSocket (detected from wss:// prefix)
{"connection_url": "wss://api.example.com/mcp"}
# HTTP (detected from https:// prefix)
{"connection_url": "https://api.example.com/mcp"}
# Stdio (default when no URL provided)
{"command": "python", "args": ["server.py"]}
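The detection rules above can be sketched as a small function. This is an illustration, not the library's code: `detect_connection_type` is a hypothetical name, the "http" and "stdio" return labels are assumed, and handling the unencrypted ws:// and http:// schemes the same way as their secure variants is also an assumption.

```python
from urllib.parse import urlparse


def detect_connection_type(config: dict) -> str:
    """Guess the connection type for one MCP config entry."""
    explicit = config.get("connection_type")
    if explicit:
        return explicit  # an explicit setting wins over detection

    url = config.get("connection_url")
    if not url:
        return "stdio"   # no URL: a local process started via command/args

    scheme = urlparse(url).scheme.lower()
    if scheme in ("ws", "wss"):
        return "websocket"
    if scheme in ("http", "https"):
        return "http"
    raise ValueError(f"cannot detect connection type from URL scheme {scheme!r}")


examples = [
    {"connection_url": "wss://api.example.com/mcp"},
    {"connection_url": "https://api.example.com/mcp"},
    {"command": "python", "args": ["server.py"]},
]
print([detect_connection_type(entry) for entry in examples])
```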
Troubleshooting
Common Issues
"No retriever specified" or "MCP not working"
- Solution: Set RETRIEVER=mcp or RETRIEVER=tavily,mcp
- Verify the environment variable is set: echo $RETRIEVER
"Invalid retriever(s) found"
- Check available retrievers: tavily, mcp, google, bing, arxiv, etc.
- Ensure no typos in retriever names
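A quick way to catch typos is to compare each name against a list of known retrievers and suggest the closest match. In the sketch below, `KNOWN_RETRIEVERS` contains only the names that appear in this guide and is not the library's full list, and `check_retrievers` is a hypothetical helper.

```python
from difflib import get_close_matches

# Names used in this guide only; the library may support more.
KNOWN_RETRIEVERS = ("arxiv", "bing", "exa", "google", "mcp", "semantic_scholar", "tavily")


def check_retrievers(value: str) -> dict[str, list[str]]:
    """Map each unrecognized retriever name to its closest known spelling."""
    unknown = {}
    for name in (part.strip() for part in value.split(",")):
        if name and name not in KNOWN_RETRIEVERS:
            unknown[name] = get_close_matches(name, KNOWN_RETRIEVERS, n=1)
    return unknown


print(check_retrievers("tavilly,mcp"))
```

An empty result means every name was recognized; otherwise each key is a suspect name and its value holds the suggested correction, if one is close enough.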
"No MCP server configurations found"
- Ensure mcp_configs is a list of dictionaries
- Verify at least one configuration is provided
- Check configuration format matches examples
"MCP server connection failed"
- Verify server command and arguments
- Check environment variables are set correctly
- Test the MCP server independently
- Ensure required dependencies are installed
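Two of the checks above can be automated before a run: whether the server command is on your PATH, and whether every value in env is actually set (os.getenv returns None for a missing variable). The `preflight` helper below is a hypothetical sketch using only the standard library; it does not start the server or talk to GPT Researcher.

```python
import shutil
import sys


def preflight(config: dict) -> list[str]:
    """Return problems likely to cause a connection failure for a local server."""
    problems = []

    command = config.get("command")
    if command and shutil.which(command) is None:
        problems.append(f"command not found on PATH: {command}")

    for key, value in (config.get("env") or {}).items():
        if not value:
            problems.append(f"environment variable {key} is empty or unset")

    return problems


# sys.executable is used here only because it is a command known to exist.
ok_config = {"name": "demo", "command": sys.executable, "args": ["server.py"],
             "env": {"API_KEY": "key"}}
bad_config = {"name": "demo", "command": "no-such-mcp-command-xyz", "args": [],
              "env": {"API_KEY": None}}

print(preflight(ok_config))
print(preflight(bad_config))
```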
"No tools available from MCP server"
- Verify the server exposes tools correctly
- Check server startup logs for errors
- Try enabling MCP_AUTO_TOOL_SELECTION=true
"Tool execution failed"
- Check authentication tokens and API keys
- Verify tool arguments are valid
- Review server logs for detailed errors
- Enable debug logging for more information
Debug Mode
Enable detailed logging to diagnose issues:
import logging
logging.basicConfig(level=logging.DEBUG)
# Your research code here - will show detailed MCP operations
Testing Your Setup
Quick test to verify MCP configuration:
import os
from gpt_researcher import GPTResearcher
# Test retriever configuration
os.environ["RETRIEVER"] = "mcp"
# Test basic configuration
researcher = GPTResearcher(
query="test query",
mcp_configs=[
{
"name": "test",
"command": "echo",
"args": ["hello world"]
}
]
)
print(f"✅ RETRIEVER set to: {os.environ.get('RETRIEVER')}")
print(f"✅ MCP configs loaded: {len(researcher.mcp_configs)}")
Best Practices
- Always set the RETRIEVER environment variable - This is required for MCP functionality
- Use hybrid strategies (tavily,mcp) for comprehensive research
- Use descriptive server names for easier debugging
- Store sensitive data in environment variables
- Test MCP servers independently before integration
- Enable verbose mode during development
- Choose appropriate retriever combinations based on your research domain
- Let the default settings handle optimization for most use cases
For more examples and advanced use cases, check out the GPT Researcher examples repository.