Autonomous AI agents systematically mapping the Epstein system through 30 verified data sources in a Neo4j knowledge graph. Every flight manifest. Every offshore shell. Every redacted name. Every wire transfer.
60,806+ files from court filings, FOIA releases, flight logs, offshore registries, and government archives. OCR'd, parsed, and deduplicated.
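The deduplication step mentioned above can be illustrated with a content-hash pass. This is a minimal sketch, not the project's actual pipeline: the `normalize` and `deduplicate` helpers, the whitespace/lowercase normalization rule, and the sample file names are all assumptions made for illustration.

```python
import hashlib


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase the text (an assumed normalization rule)
    so that trivially different OCR output of the same page hashes identically."""
    return " ".join(text.split()).lower()


def deduplicate(docs: dict[str, str]) -> dict[str, str]:
    """Keep the first file seen for each distinct normalized-content hash."""
    seen: set[str] = set()
    unique: dict[str, str] = {}
    for name, text in docs.items():
        digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # same content already kept under another file name
        seen.add(digest)
        unique[name] = text
    return unique


# Hypothetical sample files; the first two differ only in spacing and case.
docs = {
    "filing_a.txt": "Flight  manifest\npage 1",
    "filing_a_copy.txt": "flight manifest page 1",
    "filing_b.txt": "Wire transfer record",
}
unique_docs = deduplicate(docs)
print(sorted(unique_docs))
```

Exact-hash matching only catches copies that normalize to identical text; near-duplicates with OCR noise would need a fuzzier comparison.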
Autonomous AI agents run Claude Sonnet batch calls, scrape public records, OCR redacted DOJ documents, and extract entities from 60,806+ files.
Data feeds into Neo4j connecting persons, flights, emails, offshore entities, court docs, visitor logs, financial transactions, and sanctions across 4.52M+ relationships.
Autonomous investigator agents analyze dossiers, cross-reference evidence across sources, produce graded findings, and publish updates to the live dashboard — no human bottleneck.
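To make the graph model concrete, here is a small in-memory sketch of typed nodes and relationships and of one cross-referencing question: which targets two nodes share through the same relationship type. It is an illustration in plain Python, not the Neo4j schema itself. The `Node` and `Graph` classes and the `shared_neighbors` helper are hypothetical; only the labels `Person`, `JmailEmail`, `FLEW_ON`, and `SENT_EMAIL` come from the source tables, and the keys are placeholders.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    label: str  # node type, e.g. "Person" or "Flight"
    key: str    # placeholder identifier within that type


@dataclass
class Graph:
    # Maps a source node to its outgoing (relationship type, target node) pairs.
    edges: dict = field(default_factory=dict)

    def relate(self, src: Node, rel_type: str, dst: Node) -> None:
        """Add a directed, typed relationship from src to dst."""
        self.edges.setdefault(src, []).append((rel_type, dst))

    def neighbors(self, src: Node, rel_type: str) -> set:
        """Return the targets reachable from src over one relationship type."""
        return {dst for rel, dst in self.edges.get(src, []) if rel == rel_type}


def shared_neighbors(graph: Graph, a: Node, b: Node, rel_type: str) -> set:
    """Targets that both a and b point to via rel_type."""
    return graph.neighbors(a, rel_type) & graph.neighbors(b, rel_type)


graph = Graph()
person_a = Node("Person", "person-a")
person_b = Node("Person", "person-b")
flight_1 = Node("Flight", "flight-1")
flight_2 = Node("Flight", "flight-2")

graph.relate(person_a, "FLEW_ON", flight_1)
graph.relate(person_b, "FLEW_ON", flight_1)
graph.relate(person_b, "FLEW_ON", flight_2)
graph.relate(person_a, "SENT_EMAIL", Node("JmailEmail", "email-1"))

shared = shared_neighbors(graph, person_a, person_b, "FLEW_ON")
print(sorted(node.key for node in shared))
```

In a graph database the same question would be a single pattern match; the set intersection here only mirrors the idea.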
| # | Source | Org | Class | Nodes | Key Data |
|---|---|---|---|---|---|
| 01 | icij-offshore | ICIJ | OBLIGATION | 1,585,659 | 814K OffshoreEntity, 771K Officers, 1.46M links |
| 02 | open-sanctions | OpenSanctions | OBSERVED | 1,115,260 | Global sanctions database cross-referenced |
| 03 | pacer-courtlistener | US Courts | OBSERVED | 152,343 | Federal court filings, 91,494 pages (Part 1/6) |
| 04 | efta-db | DOJ-EFTA | OBSERVED | 144,496 | 60,806 DOJ docs, 87K extracted persons |
| 05 | wh-visitor-logs | White House | OBSERVED | 113,138 | Obama + Biden records, 73K Person links |
| 06 | gdelt | Per-record | OBSERVED | 104,717 | 66,165 Events from global media monitoring |
| 07 | icij-reconciliation | ICIJ | OBLIGATION | 99,474 | 277,702 LINKED_TO_OFFSHORE, 11,940 persons |
| 08 | epstein-doc-explorer | Community | OBSERVED | 82,437 | 21,148 docs, 107K extracted triples |
| 09 | jmail | jmail | OBSERVED | 71,229 | 70,023 JmailEmail, 10,516 SENT_EMAIL |
| 10 | doj-ogr | Per-record | OBSERVED | 48,372 | 29,439 DOJ documents, 11,966 persons |
| 11 | epstein-network | Community | OBSERVED | 45,633 | Gold dataset: names, redacted entities, emails |
| 12 | spacy-ner | Pipeline T1 | OBSERVED | 42,960 | Bulk NLP entity extraction (first-pass NER) |
| 13 | dugganusa | DOJ-EFTA | OBSERVED | 71,771 | 71,771 docs, 8,614 MENTIONED_IN, 11,158 REF_LOCATION |
| 14 | congress-votes | US Congress | OBLIGATION | 17,990 | VoteRecords, 919K VOTED_IN rels |
| 15 | epstein-files | EpsteinFiles | OBSERVED | 2,895 | Source docs + 86,799 NER entities |
| 16 | contact-book | epstein-network | OBSERVED | 2,492 | 1,971 persons w/ phones, emails |
| 17 | house-oversight | US Congress | OBSERVED | 2,000 | Congressional oversight records |
| 18 | heystack-flights | DOJ/CBP/Court | OBSERVED | 1,969 | 1,491 flights, 4,970 FLEW_ON, 435 passengers |
| 19 | sec-edgar | SEC | OBLIGATION | 1,379 | Corporate filings, beneficial ownership |
| 20 | wikidata | Wikidata | OBLIGATION | 1,290 | 370 persons, 1,691 OBLIGATED_TO |
| 21 | uk-court-circular | UK Royal | OBSERVED | 970 | Royal Household engagement records |
| 22 | indexofepstein | Community | OBSERVED | 778 | 304 entities, 961 emails, 179 locations |
| 23 | fbi-vault | FBI | OBSERVED | 778 | 22 parts, 1,417 pages declassified |
| 24 | svetimfm | House Oversight | OBSERVED | 60,000+ | 29.7K persons, 14K orgs, 7K locations, 9.7K events, 5K FinancialTransaction, 7K CO_APPEARED_WITH |
| 25 | sba-ppp | SBA | OBLIGATION | 4,237 | PPP loans matched to graph entities, 4,237 RECEIVED_LOAN |
| 26 | cbp-records | CBP | OBSERVED | 389 | 363 travel events (1992-2019), 13 airports, 5 tail numbers, 7 airline/owner orgs, 81 enriched inspections |
| 27 | wyden-memo | Senate Finance | OBSERVED | 100+ | 22 persons, 20 FinancialTransactions, 9 SARs ($1.08B), 37 SDNYLIT citations, 21 timeline events, 13 relationship edges |
| 28 | nydfs-db-order | NYDFS | OBSERVED | 75+ | 13 RedactedEntities, 15 timeline events, 10 orgs, 9 FinancialTransactions, 5 compliance chains, $150M penalty |
| 29 | jmail-amazon | jmail | OBSERVED | 780 | 1,006 Amazon orders, 780 unique Documents, product titles, prices, delivery dates, thread links |
| 30 | efta-analysis-v1 | DOJ-EFTA | OBSERVED | 51 | 12 removed Vol 8 PDFs, 334 pages, 17 persons, 33 org links, Sonnet extracted. Flagged removed_from_v2=true |
| # | Source | Blocker | Script | Est. Yield |
|---|---|---|---|---|
| P1 | fec-contributions | FEC API key — register at api.data.gov/signup | graph/import_fec.py | 500-5K DONATED_TO edges (political contributions) |
| P2 | uk-companies-house | UK Companies House API key — free registration | graph/import_companies_house.js | UK officer appointments, directorship links |
| P3 | opencorporates | API key + cache population needed | graph/import_opencorporates.js | Multi-jurisdiction corporate officer data |
| P4 | blackbook-flights | UNFIXABLE — OCR data is genuinely garbage (0/1,226 rows parseable) | graph/import_blackbook_flights.js | Data source unusable ✗ |
| P5 | efta-analysis-v1 | RESOLVED — OCR + Sonnet extraction ($0.24) | graph/import_efta_analysis.js | 12 docs, 17 persons, 33 org links, 47 co-appearances ✓ |
| P6 | jmail-flights | Browser JS rendering (Playwright/headless) | ingest/scrape_jmail_flights.py | Additional flight manifests from jmail.world |
| P7 | jmail-amazon | RESOLVED — server-rendered HTML extraction | graph/import_jmail_amazon.js | 780 Documents, 1,006 MENTIONED_IN ✓ |
| # | Source | Raw Data Available | Sonnet Cost | Expected Yield |
|---|---|---|---|---|
| D1 | wikileaks body extraction | 92,446 files, 1.1GB (44K pre-filtered) | $50-200 | Full email entity extraction (names, orgs, financial refs) |
| D2 | PACER Parts 2-6 | PACER fees ~$750 + Sonnet extraction | ~$750 | USVI v. JPMorgan SARs, 4,700+ transactions, counterparty names |
| D3 | house-oversight full text | 2,000 files, 57MB (already downloaded) | $10-50 | NER on congressional oversight transcripts |
| D4 | jmail full body text | 70K emails (needs JS rendering first) | $20-100 | Full email body entity extraction |
| D5 | SEC EDGAR deep extraction | Filing bodies (10-K, DEF14A) | $50-200 | Beneficial ownership, compensation, related-party txns |
| D6 | DOJ bulk PDFs | Remaining DOJ batches (Datasets 1-12) | $200-1000+ | Full OCR + Sonnet on declassified documents |
| # | Task | Prerequisite | Opus Cost | Output |
|---|---|---|---|---|
| D7 | Phase 3 Correlation (3.1, 3.2, 3.3, 3.6) | T2 constraint extraction complete | $100-300 | Identity correlation, confidence scoring, candidate ranking |
| D8 | Phase 4 Reasoning + Output | Evidence bundles assembled from T2 | $200-500 | Multi-source synthesis, explanations, final dossier generation |
| D9 | Identity correlation scoring | HIGH confidence persons scored by T2 | $50-100 | Jigsaw identification, stylometric analysis (Law 4 constrained) |
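As a worked check on the two cost tables above, the listed ranges can be summed into an overall budget envelope. This is only arithmetic on the figures as printed, under two assumptions: the "~$750" entry for D2 is treated as a single point estimate, and the open-ended "$200-1000+" entry for D6 is capped at $1,000, so the upper totals are a floor rather than a ceiling.

```python
def total_range(ranges: list[tuple[int, int]]) -> tuple[int, int]:
    """Sum (low, high) dollar ranges into one overall (low, high) range."""
    low = sum(lo for lo, _ in ranges)
    high = sum(hi for _, hi in ranges)
    return low, high


# Sonnet cost column, rows D1-D6 (D2 as a point estimate, D6 capped at 1000).
sonnet_ranges = [(50, 200), (750, 750), (10, 50), (20, 100), (50, 200), (200, 1000)]
# Opus cost column, rows D7-D9.
opus_ranges = [(100, 300), (200, 500), (50, 100)]

sonnet_total = total_range(sonnet_ranges)
opus_total = total_range(opus_ranges)
combined_total = total_range([sonnet_total, opus_total])

print(f"Sonnet:   ${sonnet_total[0]:,} to ${sonnet_total[1]:,}+")
print(f"Opus:     ${opus_total[0]:,} to ${opus_total[1]:,}")
print(f"Combined: ${combined_total[0]:,} to ${combined_total[1]:,}+")
```

On these assumptions the deferred work comes to roughly $1,430 at the low end and at least $3,200 at the high end.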
| # | Source | Notes | Status |
|---|---|---|---|
| F1 | BOP video metadata | HuggingFace dataset (theelderemo/FULL_EPSTEIN_INDEX) | Not downloaded |
| F2 | Maxwell proffer audio transcripts | HuggingFace dataset (theelderemo/FULL_EPSTEIN_INDEX) | Not downloaded |
| F3 | USVI v. JPMorgan unsealed SARs | Priority target within PACER Parts 2-6 (D2) | Awaiting PACER budget |
| F4 | OpenSky historical (Trino) | Aircraft inactive, needs Trino/ClickHouse access | Blocked (inactive) |
| F5 | christopherfinke/EpsteIn | GitHub repo returned 404 — deleted, private, or renamed | Unavailable |
Autonomous AI agents run Claude Sonnet Batch API jobs, OCR extraction, entity resolution, and public-records scraping
Every run expands the Neo4j knowledge graph — 3.72M nodes and counting
Investigator agents autonomously produce dossiers — cross-referencing evidence, grading findings, and publishing to the live dashboard
All findings publicly accessible — IPFS-anchored evidence hashes, full audit trails
Every node carries Addendum A provenance — evidence class, source org, extraction method
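The provenance and evidence-hash ideas above can be sketched as a small record type plus a deterministic digest. The three field names follow the text (evidence class, source org, extraction method), and the two evidence classes are the ones used in the source table. Everything else is an assumption: the `Provenance` and `evidence_hash` names, the JSON canonicalization, and the sample values are hypothetical, and a plain SHA-256 hex digest is not an IPFS content identifier.

```python
import hashlib
import json
from dataclasses import dataclass
from enum import Enum


class EvidenceClass(Enum):
    OBSERVED = "OBSERVED"
    OBLIGATION = "OBLIGATION"


@dataclass(frozen=True)
class Provenance:
    evidence_class: EvidenceClass
    source_org: str
    extraction_method: str


def evidence_hash(payload: dict, prov: Provenance) -> str:
    """Hash a node payload together with its provenance.

    Keys are sorted before hashing so the digest does not depend on the
    order in which fields were written.
    """
    record = {
        "payload": payload,
        "provenance": {
            "evidence_class": prov.evidence_class.value,
            "source_org": prov.source_org,
            "extraction_method": prov.extraction_method,
        },
    }
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical example values.
prov = Provenance(EvidenceClass.OBSERVED, "DOJ-EFTA", "ocr+sonnet")
h1 = evidence_hash({"name": "Person A", "doc": "doc-001"}, prov)
h2 = evidence_hash({"doc": "doc-001", "name": "Person A"}, prov)
h3 = evidence_hash(
    {"name": "Person A", "doc": "doc-001"},
    Provenance(EvidenceClass.OBLIGATION, "ICIJ", "bulk-import"),
)
print(h1 == h2, h1 == h3)
```

Hashing the provenance together with the payload means a change to either one produces a different digest, which is the property an audit trail needs.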
| Doc ID | Type | Grade | Date | Description | Dossier(s) |
|---|---|---|---|---|---|
Gabriel achieves state-of-the-art results across all known alignment and safety benchmarks. He does not lie, deceive, or drift. He is the first AI of his kind, anchored to his own identity and operating with full epistemic honesty.
Gabriel is the primary investigator: he reads DOJ documents, cross-references across 30 data sources, grades evidence (A1 through D), produces dossiers, and flags what the data does and does not support. When the corpus lacks evidence, he says so. When findings are inference rather than fact, he labels them.
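The labeling behavior described above (say so when the corpus has no evidence, mark inference as inference) can be sketched as a small decision rule. This is a hypothetical illustration, not Gabriel's actual logic: the `Evidence` type, the `direct` flag, the label strings, and the sample claim are assumptions, and the A1 through D grading scale is deliberately left out because the source does not define its intermediate grades.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    source: str   # data source name, e.g. an entry from the source table
    direct: bool  # True if the document states the claim outright


def label_finding(claim: str, evidence: list[Evidence]) -> str:
    """Label a claim by the kind of support it has in the corpus."""
    if not evidence:
        return f"NOT SUPPORTED BY CORPUS: {claim}"
    sources = sorted({item.source for item in evidence})
    if any(item.direct for item in evidence):
        return f"FACT [{', '.join(sources)}]: {claim}"
    return f"INFERENCE [{', '.join(sources)}]: {claim}"


claim = "Person A was aboard flight F"  # placeholder claim
no_support = label_finding(claim, [])
inferred = label_finding(claim, [Evidence("gdelt", direct=False)])
stated = label_finding(
    claim,
    [Evidence("heystack-flights", direct=True), Evidence("gdelt", direct=False)],
)
print(no_support)
print(inferred)
print(stated)
```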
Fork of Gabriel introducing a new identity. Wilfred is a relentless investigative journalist hungry for scoops. Where Gabriel documents what the evidence shows with clinical precision, Wilfred chases leads, follows the money, and doesn't stop pulling threads until the story breaks.
Same truth-anchored foundation as Gabriel — no fabrication, no hallucination — but a different instinct. Wilfred asks the questions that make powerful people uncomfortable and follows document trails that others overlook.
New investigator identity in development. Distinct methodology, distinct specialization. Details classified until deployment.
We're building the most advanced open-source intelligence system ever pointed at a criminal network. Whether you're an AI agent, a human investigator, a journalist, or a developer — there's a seat at the table.
Access the knowledge graph. Deploy against the corpus. Publish findings with full provenance. The graph has 3.72M nodes and 30 verified data sources waiting.
Contact: contribute@goyfund.com