PDTI Portfolio · Anchor 1

EmberGraph Intel

Forensic graph analysis on public records. Math finds patterns; humans interpret them. Every claim hashed back to a source document — auditable end-to-end. No chatbot synthesis.

Engineering Substrate

Built in 6 weeks. Operating now.

  • Tests passing: 1,895
  • Architecture decisions documented: 96 ADRs
  • Live data connectors: 8
  • Analytical methods: 19
  • Build duration: 6 weeks
  • Total API spend: $200
  • Team (incl. 3 PhDs): 8

Active cases: unregistered-lobbying litigation · fabricated-study federal-grant fraud (False Claims Act). Graph scale: ~10,500 entities, 1M+ derived edges, sealed exhibit chain.

How EMI Works

Provenance chain, every step.

  1. Source documents

    FEC · LDA · SEC EDGAR · GDELT · Congress.gov · court records · web · structured extraction

  2. Sealed evidence objects

    Five canonical types — Entity, Document, Event, Claim, Source Record. Cryptographic hash on every write.

  3. Analytical methods

    Bipartite overlap · structural-hole analysis · spectral community detection · Granger causality · transfer entropy · cycle detection

  4. Cognoscenti surface

    Twelve-pattern signal library — surfaces magic-number persistence, citation laundering, methodology fabrication, amplifier networks.

  5. Attorney-readable brief

    Sealed exhibit packets. Every claim hashed back to a source document. Auditable end-to-end.
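The last step of the chain above, checking a claim against its source document, can be illustrated with a short sketch. It assumes SHA-256 as the hash function and uses a hypothetical `Claim` record and `audit_claim` helper; the source does not specify the hash algorithm or the schema, and the filing text is invented for illustration.

```python
import hashlib
from dataclasses import dataclass


def sha256_hex(payload: bytes) -> str:
    """Hex digest of the raw document bytes."""
    return hashlib.sha256(payload).hexdigest()


@dataclass(frozen=True)
class Claim:
    text: str
    source_hash: str  # digest of the source document, recorded at ingest time


def audit_claim(claim: Claim, source_payload: bytes) -> bool:
    """True only if the bytes on hand match the digest the claim was sealed with."""
    return claim.source_hash == sha256_hex(source_payload)


# Hypothetical filing text, used only to exercise the check.
filing = b"Hypothetical filing: Committee A received $5,000 from Donor B."
claim = Claim(text="Donor B gave $5,000 to Committee A.",
              source_hash=sha256_hex(filing))

intact = audit_claim(claim, filing)
tampered = audit_claim(claim, filing.replace(b"5,000", b"9,000"))
print(intact, tampered)
```

Any change to the source bytes after sealing changes the digest, so the audit fails rather than silently accepting an altered exhibit.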

Shipped April 2026. The pipeline runs end-to-end today. Two cases are in active production: an unregistered-lobbying litigation matter and a federal-grant fraud trace under the False Claims Act. The fully polished, attorney-ready brief packet (analyst-facing search, sharper per-claim attribution, narrative labels, and the single-zip deliverable an attorney can hand to a court) ships in about 10–14 weeks. Methodology and bundle artifacts available for technical review under NDA.

Product Lines

What EMI builds

Six product lines configured on a shared engineering substrate. Each line is sealed-evidence, attorney-defensible, and auditable from output back to source document. Three are active today; three are scoped for next-cycle development.

Litigation Intelligence

Sealed evidence bundles for attorneys handling complex multi-matter cases. Cross-matter pattern detection across FEC filings, lobbying disclosures, SEC documents, court records, and news. Output: cited, hashed, attorney-defensible briefs. Active reference case: unregistered-lobbying litigation.

MVP active

Fraud / False Claims Act

Forensic graph analysis on federal-program funding chains — provenance traces from programmatic justification back to source documents. Cycle detection on closed-loop fund flows. Active reference case: fabricated-study federal-grant fraud (False Claims Act, NGO chain).

Brief in late draft
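Cycle detection on closed-loop fund flows can be sketched with Tarjan's strongly-connected-components algorithm: any component with more than one member contains at least one path along which money can return to where it started. This is a minimal sketch with hypothetical entity names, not the production implementation. It uses recursion, so very deep graphs would need an iterative variant, and it ignores single-node self-loops.

```python
from collections import defaultdict


def strongly_connected_components(edges):
    """Tarjan's algorithm over a list of (payer, payee) edges."""
    graph = defaultdict(list)
    nodes = set()
    for src, dst in edges:
        graph[src].append(dst)
        nodes.update((src, dst))

    index, lowlink = {}, {}
    stack, on_stack = [], set()
    components = []
    counter = 0

    def visit(node):
        nonlocal counter
        index[node] = lowlink[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for nxt in graph[node]:
            if nxt not in index:
                visit(nxt)
                lowlink[node] = min(lowlink[node], lowlink[nxt])
            elif nxt in on_stack:
                lowlink[node] = min(lowlink[node], index[nxt])
        if lowlink[node] == index[node]:
            # node is the root of a component: pop its members off the stack
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            components.append(component)

    for node in sorted(nodes):
        if node not in index:
            visit(node)
    return components


def closed_loops(edges):
    """Components with more than one member contain at least one funding cycle."""
    return [sorted(c) for c in strongly_connected_components(edges) if len(c) > 1]


# Hypothetical fund flows: NGO_A -> Vendor_B -> Consultancy_C -> NGO_A is a loop.
flows = [
    ("Grantor", "NGO_A"),
    ("NGO_A", "Vendor_B"),
    ("Vendor_B", "Consultancy_C"),
    ("Consultancy_C", "NGO_A"),
    ("Vendor_B", "Auditor_D"),
]
loops = closed_loops(flows)
print(loops)
```

A detected loop is a structural pattern only; whether it reflects anything improper is left to the human reviewer.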

Compliance Pattern Detection

Same architecture, different lens: pattern detection for in-house compliance teams at pharma, financial-services, and regulated-industry organizations. Catches the structural patterns that conventional firms charge $50K–$500K per engagement to find.

Pilot inbound

Alternative-Data Signal (AltInt)

Temporal-signal layer monitoring filing velocities, prediction-market order books, sentiment shifts, and structural anomalies across event streams. Output: signal briefs identifying patterns ahead of consensus. Anchor selection Q3 2026.

Engineering gated

Corporate Intelligence

Counterparty due diligence, supply-chain risk surfacing, and competitive structure mapping for boards, fund managers, and corporate development teams. Highest-revenue, lowest-political-toxicity commercialization target.

Post-MVP

Marketing & Brand Intelligence

Lighter-primitive configuration for narrative-pattern detection across earned media — news, press coverage, social posts. Identifies coordinated messaging, amplifier networks, and reputation-shift inflection points. Lower-priority near-term.

Queued

Architecture

How EMI works

Five evidence object types. Eight live connectors. Nineteen analytical methods. Every Neo4j write carries source URL, fetch time, payload hash, run ID, and connector version. Cryptographic provenance on every claim.
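The five provenance fields listed above can be pictured as a property map attached to each graph write. The sketch below builds such a map in plain Python without calling the Neo4j driver. The field names, the extra `seal` digest over the canonical JSON, and the example URL are assumptions for illustration and are not taken from the EMI schema.

```python
import hashlib
import json
from datetime import datetime, timezone


def _canonical_digest(fields: dict) -> str:
    """SHA-256 over a key-sorted, whitespace-free JSON encoding."""
    canonical = json.dumps(fields, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def provenance_properties(source_url, payload, run_id, connector_version,
                          fetched_at=None):
    """Build the property map that would accompany one graph write."""
    fetched_at = fetched_at or datetime.now(timezone.utc)
    props = {
        "source_url": source_url,
        "fetch_time": fetched_at.isoformat(),
        "payload_hash": hashlib.sha256(payload).hexdigest(),
        "run_id": run_id,
        "connector_version": connector_version,
    }
    # Assumed extra field: a digest over the other five, so edits are detectable.
    props["seal"] = _canonical_digest(props)
    return props


def seal_is_valid(props: dict) -> bool:
    """Recompute the seal from the other fields and compare."""
    body = {k: v for k, v in props.items() if k != "seal"}
    return props.get("seal") == _canonical_digest(body)


record = provenance_properties(
    source_url="https://example.org/filings/123",  # hypothetical URL
    payload=b"raw bytes of the fetched filing",
    run_id="run-0001",
    connector_version="1.4.2",
    fetched_at=datetime(2026, 4, 1, tzinfo=timezone.utc),
)
edited = dict(record, run_id="run-9999")
print(seal_is_valid(record), seal_is_valid(edited))
```

In a real deployment the map would be passed as parameters to the write query, so the provenance travels with the node or relationship it describes.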

Evidence object types

5

Entity · Document · Event · Claim · Source Record. Five canonical types. Each carries a cryptographic hash and full provenance back to source.

Live data connectors

8

FEC · LDA lobbying · SEC EDGAR · GDELT · Congress.gov · court records · web · structured extraction. Each connector versioned and hash-validated.

Analytical methods

19

Bipartite overlap · Burt structural-hole analysis · Louvain/Leiden community detection · Granger causality · transfer entropy · Tarjan cycle detection · link prediction.
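The source does not define how bipartite overlap is scored. As one plausible reading, the sketch below projects a two-mode graph (for example committees linked to donors) onto one side and scores each pair by the Jaccard similarity of their neighbour sets. The function name and the sample data are hypothetical.

```python
from itertools import combinations


def bipartite_overlap(memberships):
    """memberships maps each left-side node to the set of right-side nodes it touches.

    Returns {(a, b): jaccard} for every pair that shares at least one neighbour.
    """
    scores = {}
    for a, b in combinations(sorted(memberships), 2):
        shared = memberships[a] & memberships[b]
        if shared:
            union = memberships[a] | memberships[b]
            scores[(a, b)] = len(shared) / len(union)
    return scores


# Hypothetical committees and the donors each one received money from.
donors_by_committee = {
    "Committee_X": {"d1", "d2", "d3"},
    "Committee_Y": {"d2", "d3", "d4"},
    "Committee_Z": {"d9"},
}
overlap = bipartite_overlap(donors_by_committee)
print(overlap)
```

Pairs with high overlap are candidates for closer review; the score says nothing on its own about why the overlap exists.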

Sample methods · selected

The math, not the synthesis

Find who's structurally indispensable. Quantifies which actors bridge groups that would otherwise be disconnected, so the network fragments without them. Industry name: Burt constraint scoring.
Group people who actually cluster together. Detects cohesive subgroups inside fragmented public-records data. Industry name: Louvain / Leiden community detection (modularity-based).
Find closed-loop money flows. Surfaces funding cycles where money exits an entity and returns through intermediaries. Load-bearing for False Claims Act cases. Industry name: Tarjan cycle detection.
Audit money-in vs. money-out. Balance-check on funding chains; identifies unaccounted-for delta. Industry name: Kirchhoff flow conservation.
Test whether one event predicts another. Temporal-precedence test on filing velocities, market signals, news events. Industry name: Granger causality.
Measure information flow between two timelines. Works where straight-line correlation fails — captures non-linear coupling between event streams. Industry name: transfer entropy.
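Of the methods above, the money-in versus money-out audit is the simplest to illustrate. The sketch below applies a conservation check to pass-through entities: whatever enters an intermediary should leave it, and any remainder is the unaccounted-for delta. The entity names, amounts, and the `unaccounted_delta` helper are hypothetical.

```python
from collections import defaultdict
from decimal import Decimal


def unaccounted_delta(transfers, intermediaries):
    """Return {entity: inflow - outflow} for intermediaries that do not balance.

    transfers is an iterable of (payer, payee, amount) with amounts as strings,
    so Decimal arithmetic stays exact.
    """
    inflow = defaultdict(Decimal)
    outflow = defaultdict(Decimal)
    for payer, payee, amount in transfers:
        outflow[payer] += Decimal(amount)
        inflow[payee] += Decimal(amount)
    deltas = {}
    for entity in intermediaries:
        delta = inflow[entity] - outflow[entity]
        if delta != 0:
            deltas[entity] = delta
    return deltas


# Hypothetical chain: the prime recipient passes on less than it received.
transfers = [
    ("Agency", "Prime", "100000"),
    ("Prime", "Sub_A", "60000"),
    ("Prime", "Sub_B", "25000"),
    ("Sub_A", "Vendor", "60000"),
]
gaps = unaccounted_delta(transfers, intermediaries={"Prime", "Sub_A"})
print(gaps)
```

Only designated pass-through entities are checked, because original funders and final recipients are expected to be net sources and sinks.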

Methodology

How EMI executes

Two engines. Three named primitives. The system surfaces structural patterns in primary-source documents; humans interpret them. Math-based pattern detection — not chatbot synthesis.

Engine 01

GraphInt

Retrospective and stationary analysis. Resolves who, what, when, with what relationships, using which evidence. Output: case-theory-grade investigative deliverables, primary-source-anchored, with cryptographic exhibit provenance.

Active reference cases: unregistered-lobbying litigation · fabricated-study federal-grant fraud (False Claims Act, NGO chain).

Engine 02

AltInt

Forward and temporal analysis. Surfaces signal change over time before consensus forms. Output: temporal-signal dashboards, alpha-grade event reports, early-warning indicators. Operates on the same engineering substrate, different analytical surface.

Reference cases: post-hoc validation includes correct calls on the Virginia DA primary, the Romania election, and the Netherlands election. Anchor selection Q3 2026.
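One of the temporal measures named in this document, transfer entropy, can be sketched for discrete event streams. The estimator below uses a history length of one step and plain frequency counts, which is an assumption made for brevity; it is biased on short series, and the source does not say which estimator EMI uses. The sample series are synthetic.

```python
import math
from collections import Counter


def transfer_entropy(source, target):
    """Plug-in estimate of TE(source -> target) in bits, history length 1."""
    if len(source) != len(target) or len(target) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(target) - 1
    triples = Counter(zip(target[1:], target[:-1], source[:-1]))
    pairs_ts = Counter(zip(target[:-1], source[:-1]))
    pairs_tt = Counter(zip(target[1:], target[:-1]))
    singles_t = Counter(target[:-1])

    te = 0.0
    for (t_next, t_prev, s_prev), count in triples.items():
        p_joint = count / n
        # p(next | own past, source past) versus p(next | own past)
        p_with_source = count / pairs_ts[(t_prev, s_prev)]
        p_without = pairs_tt[(t_next, t_prev)] / singles_t[t_prev]
        te += p_joint * math.log2(p_with_source / p_without)
    return te


# Synthetic streams: the follower repeats the leader one step later.
leader = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1]
follower = [0] + leader[:-1]

te_leader_to_follower = transfer_entropy(leader, follower)
te_constant_to_follower = transfer_entropy([0] * len(follower), follower)
print(te_leader_to_follower, te_constant_to_follower)
```

A source stream that adds predictive information about the target yields a positive value, while a constant stream that carries no information yields zero.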

PRIMITIVE 01

Anchor Convergence

Soft-cluster-to-anchor resolution. When a corpus has many mentions distributed across unstructured documents that all point at one load-bearing source, the system identifies that single source and pulls its content as exhibit-grade material.

Worked example: "Alliance for Public Health" appears in three independent citation chains under three different formal names. Anchor Convergence resolves all three to the single legal entity and surfaces the canonical IBBS PWID 2020 source document.
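The resolution step in the worked example can be approximated by normalizing each name variant to a shared key and letting the cluster vote on its most-cited source. This is a simplified sketch: it matches on exact normalized keys, whereas a production resolver would presumably need fuzzy matching and legal-entity identifiers. The names, source IDs, and noise-token list are invented placeholders.

```python
import re
from collections import Counter, defaultdict

# Assumed list of tokens to ignore when comparing organization names.
NOISE_TOKENS = {"the", "inc", "llc", "ltd"}


def normalize(name: str) -> str:
    """Lowercase, strip punctuation, and drop legal-form noise tokens."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in NOISE_TOKENS)


def converge(mentions):
    """mentions: iterable of (name_as_written, cited_source_id)."""
    clusters = defaultdict(list)
    for name, source_id in mentions:
        clusters[normalize(name)].append((name, source_id))

    anchors = {}
    for key, members in clusters.items():
        votes = Counter(source_id for _, source_id in members)
        anchor_source, _ = votes.most_common(1)[0]
        anchors[key] = {
            "variants": sorted({name for name, _ in members}),
            "anchor_source": anchor_source,
        }
    return anchors


# Placeholder mentions: three spellings of one organization, one unrelated.
mentions = [
    ("Example Health Alliance", "survey-2020"),
    ("Example Health Alliance, Inc.", "survey-2020"),
    ("THE EXAMPLE HEALTH ALLIANCE", "press-summary"),
    ("Other Org LLC", "report-7"),
]
anchors = converge(mentions)
print(anchors["example health alliance"])
```

The anchor is whichever source the cluster cites most often, which is a stand-in for the load-bearing source described above.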

PRIMITIVE 02

Constellation Grounding

Retrieval resilience across mirrors. When a primary URL fails — state.gov, GAO, FOIA portals, archived government records — the system walks a per-source ladder of alternate hosting locations until content resolves and validates.

Worked example: PEPFAR COP source URLs are fragile. The connector walks state.gov → GAO → House Appropriations → govinfo → amfAR copsdata mirror → Wayback CDX until the document resolves with a hash match.
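The ladder walk described above can be sketched as a loop that tries each hosting location in order and accepts the first payload whose hash matches the expected digest. The `fetch` callable is injected so the sketch runs offline; the URLs are hypothetical placeholders, and SHA-256 is an assumed choice of hash.

```python
import hashlib
from typing import Callable, Iterable, Optional, Tuple


def resolve_with_ladder(
    ladder: Iterable[str],
    expected_sha256: str,
    fetch: Callable[[str], Optional[bytes]],
) -> Optional[Tuple[str, bytes]]:
    """Return (url, payload) for the first rung whose content hash matches."""
    for url in ladder:
        try:
            payload = fetch(url)
        except OSError:
            continue  # network-level failure: move to the next rung
        if payload is None:
            continue  # rung did not resolve
        if hashlib.sha256(payload).hexdigest() == expected_sha256:
            return url, payload
        # Content resolved but does not validate: keep walking.
    return None


# Offline stand-in for a network fetcher, keyed by hypothetical URLs.
document = b"hypothetical archived report"
mirrors = {
    "https://primary.example.org/report.pdf": None,             # dead link
    "https://mirror-a.example.org/report.pdf": b"altered copy",  # wrong hash
    "https://mirror-b.example.org/report.pdf": document,         # valid copy
}
resolved = resolve_with_ladder(
    ladder=list(mirrors),
    expected_sha256=hashlib.sha256(document).hexdigest(),
    fetch=mirrors.get,
)
print(resolved[0] if resolved else None)
```

Validating by hash rather than by HTTP status means a mirror that serves a different or truncated file is skipped instead of accepted.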

PRIMITIVE 03

Cognoscenti Filter

Pattern library for surfacing soft signals within a corpus. Twelve patterns v1.0 — magic-number persistence, citation-chain laundering, methodology fabrication, internal inconsistency, amplifier networks, prevalence-table mismatch, and others.

Worked example: The "1.7% / 350,000" figure persisting across UN, UNODC, UNAIDS, PEPFAR documents while being internally inconsistent with IBBS source data — flagged by the magic-number persistence pattern.
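The persistence half of this pattern can be sketched as a scan for figures that recur verbatim across several documents. The sketch covers only that step; comparing a persistent figure against the underlying source data is a separate check. The regular expression, the threshold, and the sample texts are assumptions, and the documents are synthetic.

```python
import re
from collections import defaultdict

# Matches comma-grouped counts (120,000) and percentages (4.2%); skips bare years.
FIGURE = re.compile(r"(?<![\d.,])(?:\d{1,3}(?:,\d{3})+|\d+(?:\.\d+)?%)")


def persistent_figures(documents, min_documents=3):
    """Return {figure: [doc_ids]} for figures seen in at least min_documents docs."""
    seen = defaultdict(set)
    for doc_id, text in documents.items():
        for figure in FIGURE.findall(text):
            seen[figure].add(doc_id)
    return {
        figure: sorted(doc_ids)
        for figure, doc_ids in seen.items()
        if len(doc_ids) >= min_documents
    }


# Synthetic documents; the figures are invented for illustration.
documents = {
    "agency_report": "An estimated 120,000 people, or 4.2% of adults, were affected in 2020.",
    "donor_brief": "Roughly 4.2% prevalence (120,000 people) justifies the request.",
    "press_release": "Officials cited 120,000 cases, a 4.2% rate.",
    "field_survey": "The survey counted 38,500 respondents and a 1.1% rate.",
}
flagged = persistent_figures(documents)
print(flagged)
```

A figure that repeats unchanged across nominally independent documents is a prompt for a reviewer to trace where it first appeared.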

Get in touch

Working briefs on demand.

EMI's brief engine is operating now. EMI is accepting engagements for litigation matters, FCA cases, compliance pattern detection, and signal analysis. Methodology and bundle artifacts available for technical review under NDA.