Show HN: Epstein-Search – Local, AI-Powered Search Engine for the Epstein Files
Posted 5 hours ago by simulationship
1 point
https://github.com/simulationship/epstein-search

Hi HN,
I built epstein-search, an open-source Python CLI and library to run semantic search and RAG over the publicly released Epstein Files (unsealed court documents, depositions, FBI reports, and flight logs).
I wanted a way to navigate these thousands of pages of unstructured legal PDFs without relying on a paid third-party service or shipping data back and forth to a cloud provider.
How it works under the hood:
Running epstein-search setup downloads ~100K pre-computed document chunks and their embeddings (computed with all-MiniLM-L6-v2), derived from the ~20K-document public corpus.
It imports these into zvec (a local vector database) so the index is ready in about a minute.
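Conceptually, the import step just loads each chunk's vector and metadata into a queryable store. Here's a minimal stdlib-only sketch of that idea; the record schema and function names are hypothetical, and zvec's actual import API will look different:

```python
import json

def load_index(jsonl_lines):
    """Build a tiny in-memory index: chunk_id -> (vector, metadata).

    Illustrative only -- the real tool imports into zvec, and the
    actual chunk record schema may differ from this JSONL layout.
    """
    index = {}
    for line in jsonl_lines:
        rec = json.loads(line)
        index[rec["chunk_id"]] = (rec["embedding"], {"doc_type": rec["doc_type"]})
    return index

# Toy 2-dimensional vectors; real all-MiniLM-L6-v2 embeddings are 384-dim.
lines = [
    '{"chunk_id": "c1", "embedding": [1.0, 0.0], "doc_type": "flight_log"}',
    '{"chunk_id": "c2", "embedding": [0.0, 1.0], "doc_type": "deposition"}',
]
index = load_index(lines)
print(len(index))  # 2
```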
Standard search (epstein-search search) embeds your query locally using sentence-transformers and does a vector similarity search. This step is 100% offline and requires no API keys.
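The offline search step boils down to embedding the query and ranking chunks by cosine similarity. A stripped-down, pure-Python sketch of that ranking (standing in for sentence-transformers and zvec, which do the real work):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, chunks, k=3):
    """Rank (chunk_id, vector) pairs by similarity to the query vector."""
    scored = [(cosine(query_vec, vec), cid) for cid, vec in chunks]
    scored.sort(reverse=True)
    return [cid for _, cid in scored[:k]]

chunks = [("c1", [1.0, 0.0]), ("c2", [0.7, 0.7]), ("c3", [0.0, 1.0])]
print(top_k([1.0, 0.1], chunks, k=2))  # ['c1', 'c2']
```

A real vector database replaces this brute-force scan with an approximate-nearest-neighbor index, but the ranking criterion is the same.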
For the conversational RAG mode (epstein-search chat or ask), it uses LiteLLM. You can point it to an Ollama or LM Studio instance for a completely free, local, and private pipeline, or plug in a cloud provider like Anthropic, OpenAI, or Gemini.
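The RAG mode is essentially: retrieve the top chunks, assemble them into a grounded prompt, and hand that to whatever model LiteLLM is pointed at. Here is a hedged sketch of the prompt-assembly half; the template and field names are hypothetical, not epstein-search's actual prompt, and the LiteLLM call is shown as a comment since it needs a running backend:

```python
def build_rag_prompt(question, retrieved_chunks):
    """Assemble a grounded prompt from retrieved source passages.

    Hypothetical template -- the tool's actual prompt may differ.
    """
    context = "\n\n".join(
        f"[{c['chunk_id']}] {c['text']}" for c in retrieved_chunks
    )
    return (
        "Answer using only the sources below, and cite chunk ids.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )

prompt = build_rag_prompt(
    "Who appears in the 1997 flight logs?",
    [{"chunk_id": "c1", "text": "Flight log excerpt..."}],
)

# With LiteLLM pointed at a local Ollama instance, the completion call
# would look roughly like this (requires litellm and a running server):
# import litellm
# resp = litellm.completion(
#     model="ollama/llama3",
#     messages=[{"role": "user", "content": prompt}],
#     api_base="http://localhost:11434",
# )
print("c1" in prompt)  # True
```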
You can also filter queries by document type (e.g., --doc-type flight_log or --source "FBI") and output the raw source context alongside the generated answers to verify the LLM's claims.
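Flags like --doc-type and --source can be thought of as a metadata pre-filter applied to chunks before (or during) the similarity search. A minimal illustration of that idea, with hypothetical field names:

```python
def filter_chunks(chunks, doc_type=None, source=None):
    """Keep only chunks whose metadata matches the given filters.

    Illustrative stand-in for the --doc-type / --source flags;
    the real metadata fields may be named differently.
    """
    out = []
    for c in chunks:
        if doc_type is not None and c["doc_type"] != doc_type:
            continue
        if source is not None and source not in c["source"]:
            continue
        out.append(c)
    return out

chunks = [
    {"chunk_id": "c1", "doc_type": "flight_log", "source": "House Oversight"},
    {"chunk_id": "c2", "doc_type": "deposition", "source": "FBI report"},
]
print([c["chunk_id"] for c in filter_chunks(chunks, doc_type="flight_log")])  # ['c1']
print([c["chunk_id"] for c in filter_chunks(chunks, source="FBI")])  # ['c2']
```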
The dataset is strictly sourced from public domain releases (DOJ, House Oversight Committee, unsealed federal court docs).
Repo: https://github.com/simulationship/epstein-search
I'd love to hear your thoughts, feedback on the code, or any ideas for improving the local RAG pipeline! Happy to answer any questions.