Retrievers
Retrievers are search engines used to find the most relevant documents for a given research task. You can specify your preferred web search engine or use any custom retriever of your choice.
Web Search Engines
GPT Researcher uses the Tavily search engine by default to retrieve search results.
You can switch to another search engine by setting the RETRIEVER env var. Note that each search engine has its own API key requirements and usage limits.
For example:
RETRIEVER=bing
You can also specify multiple retrievers by separating them with commas. The system will use each specified retriever in sequence. For example:
RETRIEVER=tavily, arxiv
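As an illustration of how a comma-separated value like the one above could be turned into an ordered list of retriever names, here is a minimal sketch. The function name `parse_retrievers` and the fallback to `tavily` for an empty value are assumptions made for this example; this is not GPT Researcher's actual parsing code.

```python
def parse_retrievers(value: str, default: str = "tavily") -> list[str]:
    """Split a comma-separated RETRIEVER value into an ordered list of names.

    Hypothetical helper: whitespace around each name is ignored, and an
    empty value falls back to the default retriever (an assumption here).
    """
    names = [name.strip() for name in value.split(",")]
    names = [name for name in names if name]  # drop empty entries such as "a,,b"
    return names or [default]


print(parse_retrievers("tavily, arxiv"))  # ['tavily', 'arxiv']
```

Keeping the list in its original order matches the statement that retrievers are used in sequence.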
Thanks to our community, we have integrated the following web search engines:
- Tavily - Default
- Bing - Env: RETRIEVER=bing
- Google - Env: RETRIEVER=google
- SearchApi - Env: RETRIEVER=searchapi
- Serp API - Env: RETRIEVER=serpapi
- Serper - Env: RETRIEVER=serper
- Searx - Env: RETRIEVER=searx
- Duckduckgo - Env: RETRIEVER=duckduckgo
- Arxiv - Env: RETRIEVER=arxiv
- Exa - Env: RETRIEVER=exa
- PubMedCentral - Env: RETRIEVER=pubmed_central
Custom Retrievers
You can also use any custom retriever of your choice by setting the RETRIEVER=custom env var.
A custom retriever lets you use any search engine that exposes an API for retrieving documents. This approach is widely used for enterprise research tasks.
In addition to setting the RETRIEVER env, you also need to set the following env vars:
- RETRIEVER_ENDPOINT: The endpoint URL of the custom retriever.
- Additional arguments required by the retriever should be prefixed with RETRIEVER_ARG_ (e.g., RETRIEVER_ARG_API_KEY).
Example
RETRIEVER=custom
RETRIEVER_ENDPOINT=https://api.myretriever.com
RETRIEVER_ARG_API_KEY=YOUR_API_KEY
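To make the prefix convention concrete, the sketch below collects every RETRIEVER_ARG_ variable into a dictionary of arguments. The helper name `collect_retriever_args` is hypothetical, and stripping the prefix and lowercasing the remainder (so RETRIEVER_ARG_API_KEY becomes `api_key`) is an assumption for illustration, not a confirmed description of how GPT Researcher forwards these values.

```python
import os


def collect_retriever_args(environ=None, prefix: str = "RETRIEVER_ARG_") -> dict:
    """Gather prefixed env vars into a dict of retriever arguments.

    Hypothetical helper: the prefix is removed and the rest of the name is
    lowercased (an assumed convention), e.g. RETRIEVER_ARG_API_KEY -> api_key.
    """
    env = os.environ if environ is None else environ
    return {
        key[len(prefix):].lower(): value
        for key, value in env.items()
        if key.startswith(prefix)
    }


example_env = {
    "RETRIEVER": "custom",
    "RETRIEVER_ENDPOINT": "https://api.myretriever.com",
    "RETRIEVER_ARG_API_KEY": "YOUR_API_KEY",
}
print(collect_retriever_args(example_env))  # {'api_key': 'YOUR_API_KEY'}
```

Passing a plain dictionary instead of reading `os.environ` directly keeps the helper easy to test.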
Response Format
For the custom retriever to work correctly, the response from the endpoint should be in the following format:
[
  {
    "url": "http://example.com/page1",
    "raw_content": "Content of page 1"
  },
  {
    "url": "http://example.com/page2",
    "raw_content": "Content of page 2"
  }
]
The system assumes this response format and processes the list of sources accordingly.
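If you are building the endpoint yourself, a quick check of the response shape can catch formatting mistakes early. The following is a minimal sketch of such a check using only the standard library. The function name `parse_sources` and the error messages are hypothetical, and it is a testing aid under the format described above, not part of GPT Researcher.

```python
import json


def parse_sources(body: str) -> list[dict]:
    """Validate a custom retriever response and return its sources.

    Hypothetical helper: expects a JSON array of objects, each with a
    "url" and a "raw_content" key, as described in the format above.
    """
    data = json.loads(body)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of sources")
    sources = []
    for index, item in enumerate(data):
        if not isinstance(item, dict) or "url" not in item or "raw_content" not in item:
            raise ValueError(f"source {index} must be an object with 'url' and 'raw_content'")
        sources.append({"url": item["url"], "raw_content": item["raw_content"]})
    return sources


sample = '[{"url": "http://example.com/page1", "raw_content": "Content of page 1"}]'
print(parse_sources(sample))
```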
Missing a retriever? Feel free to contribute to this project by submitting issues or pull requests on our GitHub page.