Tailored Research

The GPT Researcher package lets you tailor research to your needs: you can restrict it to specific sources or local documents, and you can supply the agent prompt instruction that guides the research.

Research on Specific Sources 📚

You can specify the sources you want GPT Researcher to use by providing a list of URLs. GPT Researcher will then conduct research on the provided sources only. Pass the list as the source_urls argument to the GPTResearcher class and set report_source to "static".

from gpt_researcher import GPTResearcher
import asyncio

async def get_report(query: str, report_source: str, sources: list) -> str:
    researcher = GPTResearcher(query=query, report_source=report_source, source_urls=sources)
    await researcher.conduct_research()
    return await researcher.write_report()

if __name__ == "__main__":
    query = "What are the biggest trends in AI lately?"
    report_source = "static"
    sources = [
        "https://en.wikipedia.org/wiki/Artificial_intelligence",
        "https://www.ibm.com/think/insights/artificial-intelligence-trends",
        "https://www.forbes.com/advisor/business/ai-statistics"
    ]
    report = asyncio.run(get_report(query=query, report_source=report_source, sources=sources))
    print(report)
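
Because research is restricted to the listed URLs, a malformed or duplicated entry can quietly shrink the material the report draws on. The sketch below shows one way to tidy the list before passing it as source_urls. The helper name clean_source_urls is hypothetical and not part of the gpt_researcher package; it uses only the Python standard library.

```python
from urllib.parse import urlparse


def clean_source_urls(urls: list[str]) -> list[str]:
    """Return the http(s) URLs from `urls`, stripped and de-duplicated in order.

    Hypothetical helper for illustration; not part of gpt_researcher.
    """
    seen: set[str] = set()
    cleaned: list[str] = []
    for raw in urls:
        url = raw.strip()
        parsed = urlparse(url)
        # Keep only absolute web URLs that include a host.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            continue
        if url not in seen:
            seen.add(url)
            cleaned.append(url)
    return cleaned


sources = clean_source_urls([
    "https://en.wikipedia.org/wiki/Artificial_intelligence",
    " https://en.wikipedia.org/wiki/Artificial_intelligence ",  # duplicate with stray whitespace
    "not-a-url",  # dropped: no scheme or host
])
print(sources)
```

The cleaned list can then be passed as the sources argument in the example above.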

Specify Agent Prompt 📝

You can specify the agent prompt instruction that guides the research. This lets you steer the research in a specific direction and tailor the report layout. Pass the prompt as the query argument to the GPTResearcher class and set report_type to "custom_report".

from gpt_researcher import GPTResearcher
import asyncio

async def get_report(prompt: str, report_type: str) -> str:
    researcher = GPTResearcher(query=prompt, report_type=report_type)
    await researcher.conduct_research()
    report = await researcher.write_report()
    return report

if __name__ == "__main__":
    report_type = "custom_report"
    prompt = "Research the latest advancements in AI and provide a detailed report in APA format including sources."

    report = asyncio.run(get_report(prompt=prompt, report_type=report_type))
    print(report)
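
Since the prompt carries both the research direction and the layout requirements, it can help to assemble it from parts rather than hand-edit one long string. The following is a minimal sketch under that assumption; build_prompt and its parameters are hypothetical names introduced here, and the wording of the generated prompt simply mirrors the example above.

```python
def build_prompt(topic: str, citation_style: str, sections: list[str]) -> str:
    """Compose a custom-report prompt (hypothetical helper, for illustration only)."""
    lines = [
        f"Research {topic} and provide a detailed report in {citation_style} format including sources.",
    ]
    if sections:
        # Spell out the desired layout as a numbered list the agent can follow.
        lines.append("Structure the report with these sections, in order:")
        lines.extend(f"{i}. {name}" for i, name in enumerate(sections, start=1))
    return "\n".join(lines)


prompt = build_prompt(
    topic="the latest advancements in AI",
    citation_style="APA",
    sections=["Summary", "Key Findings", "References"],
)
print(prompt)
```

The resulting string can be passed as the prompt argument in the example above.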

Research on Local Documents 📄

You can instruct GPT Researcher to research local documents by providing the path to the folder that contains them. Supported file formats are currently PDF, plain text, CSV, Excel, Markdown, PowerPoint, and Word documents.

Step 1: Set the environment variable DOC_PATH to the folder where your documents are located.

For example:

export DOC_PATH="./my-docs"

Step 2: When you create an instance of the GPTResearcher class, pass "local" as the report_source argument.

GPT Researcher will then conduct research on the provided documents.

from gpt_researcher import GPTResearcher
import asyncio

async def get_report(query: str, report_source: str) -> str:
    researcher = GPTResearcher(query=query, report_source=report_source)
    await researcher.conduct_research()
    report = await researcher.write_report()
    return report

if __name__ == "__main__":
    query = "What can you tell me about myself based on my documents?"
    report_source = "local" # "local" or "web"

    report = asyncio.run(get_report(query=query, report_source=report_source))
    print(report)
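
Before starting a local run, it can be useful to confirm that DOC_PATH is set and actually contains files in a supported format. The sketch below is a hypothetical pre-flight check, not part of the gpt_researcher package. The extension set is an assumption that maps the formats listed above to common file suffixes, and the recursive search is also an assumption; the package's own loader may behave differently.

```python
import os
from pathlib import Path
from typing import Optional

# Assumed suffixes for the formats listed above (PDF, plain text, CSV, Excel,
# Markdown, PowerPoint, Word). The package's loader may accept a different set.
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".csv", ".xls", ".xlsx", ".md", ".pptx", ".docx"}


def list_local_documents(doc_path: Optional[str] = None) -> list[Path]:
    """Return supported files under doc_path, or under DOC_PATH if omitted.

    Hypothetical helper for illustration; not part of gpt_researcher.
    """
    root = doc_path or os.environ.get("DOC_PATH")
    if not root:
        raise RuntimeError("DOC_PATH is not set")
    folder = Path(root)
    if not folder.is_dir():
        raise FileNotFoundError(f"DOC_PATH is not a directory: {folder}")
    # Walk the folder recursively and keep files with a supported suffix.
    return sorted(
        p for p in folder.rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Calling list_local_documents() with no argument reads DOC_PATH from the environment; an empty result suggests the folder holds nothing the researcher can use.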

Hybrid Research 🔄

You can combine the above methods to conduct hybrid research. For example, you can instruct GPT Researcher to research both web sources and local documents. Provide the sources and set the report_source argument to "hybrid".

Note: for this to work, you must configure the retrievers for the web sources and the doc path for the local documents. To learn more about retrievers, see the Retrievers documentation.
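
As a rough illustration, a hybrid run might look like the sketch below. It follows the same pattern as the local-documents example and changes only report_source. It assumes that DOC_PATH is set and that a web retriever is already configured; hybrid_kwargs is a hypothetical helper introduced here, and passing only query and report_source is an assumption about the minimal set of arguments.

```python
import os


def hybrid_kwargs(query: str) -> dict:
    """Build GPTResearcher arguments for a hybrid run (hypothetical helper)."""
    # Hybrid research needs local documents, so fail early if DOC_PATH is missing.
    if not os.environ.get("DOC_PATH"):
        raise RuntimeError("Set DOC_PATH to your documents folder before hybrid research")
    return {"query": query, "report_source": "hybrid"}


async def get_report(query: str) -> str:
    # Imported lazily so the check above can run without the package installed.
    from gpt_researcher import GPTResearcher

    researcher = GPTResearcher(**hybrid_kwargs(query))
    await researcher.conduct_research()
    return await researcher.write_report()


# Usage (requires the gpt_researcher package and its API keys):
#   import asyncio
#   print(asyncio.run(get_report("How do my documents compare with recent AI trends?")))
```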

Research on LangChain Documents 🦜️🔗

You can instruct GPT Researcher to research a list of LangChain Document instances.

For example:

from langchain_core.documents import Document
from typing import Any, Dict, List
from gpt_researcher import GPTResearcher
from langchain_postgres.vectorstores import PGVector
from langchain_openai import OpenAIEmbeddings
from sqlalchemy import create_engine
import asyncio


CONNECTION_STRING = 'postgresql://someuser:somepass@localhost:5432/somedatabase'

def get_retriever(collection_name: str, search_kwargs: Dict[str, Any]):
    engine = create_engine(CONNECTION_STRING)
    embeddings = OpenAIEmbeddings()

    # connect to an existing pgvector collection instead of re-indexing
    index = PGVector.from_existing_index(
        use_jsonb=True,
        embedding=embeddings,
        collection_name=collection_name,
        connection=engine,
    )

    return index.as_retriever(search_kwargs=search_kwargs)


async def get_report(query: str, report_type: str, report_source: str, documents: List[Document]) -> str:
    researcher = GPTResearcher(query=query, report_type=report_type, report_source=report_source, documents=documents)
    await researcher.conduct_research()
    report = await researcher.write_report()
    return report

if __name__ == "__main__":
    query = "What can you tell me about blue cheese based on my documents?"
    report_type = "research_report"
    report_source = "langchain_documents"

    # using a LangChain retriever to get all the documents regarding cheese
    # https://api.python.langchain.com/en/latest/retrievers/langchain_core.retrievers.BaseRetriever.html#langchain_core.retrievers.BaseRetriever.invoke
    langchain_retriever = get_retriever("cheese_collection", { "k": 3 })
    documents = langchain_retriever.invoke("All the documents about cheese")
    report = asyncio.run(get_report(query=query, report_type=report_type, report_source=report_source, documents=documents))
    print(report)