AI Researcher & Founder
I keep picking problems a little out of my depth 🪨, then poking at them till they give in 🧩, betting one of these swings moves the world forward 🚀 (probably).
Part dreamer 💭, part builder 🛠️, part sucker for a good “why” 🌀 that keeps me up well past 2am. Overthinks everything, regrets none of it.
Count me in for
- a game of chess
- coffee runs & café-hopping
- long, aimless drives
- a good book
- teaching kids who never got a fair shot
Selected Work
Things I’ve built to make machines reason well.
Research and infrastructure at the edge of memory, retrieval, and reasoning. Every number below is measured, not claimed.
Vrin
Founder · Reasoning infrastructure · 2025 →
A production knowledge & reasoning layer for AI agents. Documents become temporal knowledge graphs that answers can be traced back to.
Multi-tenant RAG running on AWS (Neptune knowledge graph, OpenSearch, a Lambda fleet) with live sub-second graph retrieval. Bi-temporal fact versioning and provenance let agents reason about what was true at a point in time, something standard RAG can't do. Cross-account VPC deployment keeps enterprise data fully sovereign. Ships an MCP server (PyPI), a TypeScript SDK and CLI (npm), and a multi-step agentic query pipeline.
- 95.1%: MultiHop-RAG accuracy
- +28%: over HippoRAG-2 on MuSiQue
- sub-second: graph retrieval
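To make "what was true at a point in time" concrete, here is a minimal sketch of a bi-temporal lookup. It is an illustration only, not Vrin's schema or API: the `Fact` fields, the `as_of` helper, and the sample data are all hypothetical. Each fact carries two timelines, when it held in the world (valid time) and when the system learned it (recorded time), plus a source for provenance.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Fact:
    # All field names are assumptions made for this sketch.
    subject: str
    predicate: str
    value: str
    source: str                # provenance: where the fact came from
    valid_from: date           # when it became true in the world
    valid_to: Optional[date]   # when it stopped being true (None = open-ended)
    recorded_at: date          # when the system learned this version


def as_of(facts, subject, predicate, valid_on, known_on):
    """Return the fact believed on `known_on` to hold on `valid_on`, or None."""
    candidates = [
        f for f in facts
        if f.subject == subject
        and f.predicate == predicate
        and f.recorded_at <= known_on                       # already known then
        and f.valid_from <= valid_on                        # already true then
        and (f.valid_to is None or valid_on < f.valid_to)   # not yet ended
    ]
    # Simplification: the most recently recorded matching version wins.
    return max(candidates, key=lambda f: f.recorded_at, default=None)


# Append-only history: nothing is overwritten, later records refine earlier ones.
FACTS = [
    Fact("Acme", "ceo", "Alice", "press-release-1", date(2020, 1, 1), None, date(2020, 1, 5)),
    Fact("Acme", "ceo", "Alice", "press-release-2", date(2020, 1, 1), date(2023, 6, 1), date(2023, 6, 2)),
    Fact("Acme", "ceo", "Bob", "press-release-2", date(2023, 6, 1), None, date(2023, 6, 2)),
]
```

Asking about 2024 with everything known returns Bob; asking about 2024 with only what was known in early 2023 returns Alice, because the handover had not been recorded yet. A single "latest value" store cannot answer the second question.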
SUPERSEDE
Research · RL environment + paper · 2026
Diagnosing and training the memory-update gap in LLM agents: the first RL environment whose reward targets temporal fact-currency.
When facts change, agents read the update fine but keep citing the stale value. SUPERSEDE isolates this 'supersession gap' as a distinct failure: even gpt-5.4 drops from 92% to 77% the moment it must maintain bounded memory (p=0.0033), and the gap doesn't close with scale or more memory. The fix is training, not size. A GRPO-tuned Qwen2.5-3B (LoRA) trained on procedurally generated timelines nearly doubled held-out accuracy on real LongMemEval conversations, from 9.0% to 16.7%. Open environment on the Prime Intellect Hub; preprint (cs.CL) also formatted for the LLA @ COLM workshop.
- 9.0% → 16.7%: held-out accuracy (+86%)
- 92% → 77%: frontier gap, bounded memory
- Preprint: cs.CL · LLA @ COLM
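As a rough illustration of what a reward that targets fact-currency could look like, the sketch below scores an answer against a timeline of updates. It is not the SUPERSEDE environment's reward: the `(step, key, value)` timeline format, the function names, and the 1.0 / 0.0 scoring are assumptions made for this example.

```python
from typing import Optional

# Hypothetical timeline of (step, key, value) updates, in the order stated.
TIMELINE = [
    (1, "meeting_day", "Tuesday"),
    (2, "office_city", "Seattle"),
    (3, "meeting_day", "Thursday"),  # supersedes the step-1 value
]


def current_value(timeline, key, at_step) -> Optional[str]:
    """Latest value written for `key` at or before `at_step`, or None."""
    value = None
    for step, k, v in sorted(timeline, key=lambda e: e[0]):
        if k == key and step <= at_step:
            value = v
    return value


def score(answer, timeline, key, at_step):
    """Return (reward, label); reward is 1.0 only for the current value."""
    current = current_value(timeline, key, at_step)
    seen = {v.lower() for step, k, v in timeline if k == key and step <= at_step}
    norm = answer.strip().lower()
    if current is not None and norm == current.lower():
        return 1.0, "current"
    if norm in seen:
        # The answer was true once but has since been superseded.
        return 0.0, "stale"
    return 0.0, "other"
```

Separating "stale" from "other" is the point of the sketch: an agent that answers "Tuesday" at step 3 clearly read the timeline, it just failed to update.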
Engram
Open source · RAG library · 2025
An open-source RAG library with an adaptive per-query router that hits SOTA-tier accuracy at ~40% lower median latency.
Engram reasons over a corpus instead of just matching chunks. It layers iterative retrieve-and-reason (IRCoT) and optional knowledge-graph reasoning on top of standard retrieval, then a strategic router decides, per query, how much machinery to spend. The result matches field SOTA for gpt-4o-mini on MuSiQue (F1 0.54 / EM 0.40) while cutting median latency by ~40%. Every architectural decision is backed by a published ablation, including the ones that didn't work.
- F1 0.54 / EM 0.40: MuSiQue (gpt-4o-mini)
- −40%: median latency
- Field SOTA: matched, measured
About
Currently
- Building Vrin, a production knowledge & reasoning layer for AI agents.
- Researching how agents update beliefs over time (the memory-update gap).
- Open-sourcing the parts of the engine worth sharing.
I'm an AI researcher and founder working at the intersection of memory, retrieval, and reasoning: the parts of intelligence that decide whether an agent is right, not just fluent.
Most of my work starts from the same frustration: language models reason beautifully over good context, and fall apart without it. So I build the layer underneath: temporal knowledge graphs, graph-aware retrieval, and training environments that teach models to keep what they know current.
Underneath the research is a simpler reason for all of it. I believe carefully built tools can hand more people a fair shot, and that's the throughline from what I build to what I care about most: making a real education reachable for kids who never got one.
The Path
Where I’ve done the work.
Vrin
Chief AI Researcher & Founder
Oct 2025 – Present · Folsom, CA
- Built and deployed a production multi-tenant RAG knowledge & reasoning layer for AI agents on AWS, serving live sub-second graph retrieval.
- Designed bi-temporal fact versioning and provenance for point-in-time reasoning that standard RAG can't do.
- Achieved a 28% relative accuracy gain over HippoRAG-2 (published SOTA) on MuSiQue, and field-SOTA F1 on MultiHop-RAG.
Drevol
AI/ML Research Engineer
Aug 2025 – Mar 2026 · Redmond, WA (Hybrid)
- Lifted average order value (AOV) by 5% and add-to-cart attach rate by 12% with a cart-add-on recommender on Azure Databricks.
- Built a demand-sensing forecasting pipeline with LangGraph orchestration and LLM observability, cutting stockouts by 65%.
Microsoft (through Drevol)
Software Engineering Intern
Apr 2025 – Jun 2025 · Redmond, WA
- Built privacy-preserving RAG workflows for distributed log analysis, cutting triage time 65% and improving resolution accuracy 40%.
- Designed a load-balanced, K8s-backed test-orchestration service raising throughput 50% and release cadence 25%.
UC Davis
Lead ML & GenAI (Agentic Systems) Researcher
Feb 2024 – Jun 2025 · Davis, CA
- Shipped a clinician-facing Health LLM aggregating temporal, multi-source signals into FHIR/JSON pipelines, with 92% prediction precision at 30% lower latency.
- Built a quantized TinyML ECG classifier (92.8% accuracy) extending wearable battery life from 14 to 30+ days; published at IEEE ISCAS.


