Eight years of shipping LLM, RAG, NLP, and MLOps systems into businesses that need them to work — quietly, accurately, and under audit.
I don't build AI to impress. I build it to do work no one else wants to do — protect sensitive records, surface revenue hiding in unstructured data, and make predictions operations teams actually act on.
The interesting question is never whether the model can do it. It's whether anyone uses what the model produces, whether the system holds up under audit, and whether the metric it moves is one the business already cares about. Those are the only three things that matter.
Privacy isn't a compliance checkbox. It's an architectural decision made on the first day. Models that handle people's data should be designed like banks are designed — trust is the product, and everything else follows.
These aren't slogans. They're the filter every system passes through before I'd put my name on it. Most failed AI projects I've seen broke one of these six rules first — usually rule five.
I build AI systems that do things that matter — systems that protect millions of sensitive records, surface revenue hiding in unstructured data, and make predictions operations teams actually act on.
Over eight years, I've shipped production ML across the full stack: BERT-based NER pipelines that auto-tag PII at 97.3% accuracy across 5M+ records, LLM-powered RAG tools that cut manual acquisition effort for marketing teams, federated learning architectures that train across siloed data without moving a single sensitive record, and anomaly-detection systems that slashed breach detection time from 14 days to under 2 hours.
My engineering instinct is to build things that scale and comply — AWS SageMaker, Lambda, Step Functions, KMS. My product instinct is to connect model outputs to decisions that move numbers: churn, conversion, settlement success, cost. Compliance isn't a constraint — it's a design requirement.
I publish what I learn. The AI Career Radar series on LinkedIn — eight guides, 250+ pages — is the resource I wish I'd had when I started. If something in my head might save someone a week of debugging, it belongs in writing.
A production GenAI pipeline combining Microsoft Presidio and AWS Bedrock, deployed via serverless orchestration and SageMaker, automatically redacting PII from 50K+ debt-settlement documents per month — keeping records analytically usable while sharply reducing exposure risk in a heavily regulated domain.
A production RAG retrieval layer using sentence-transformer embeddings, an HNSW-indexed vector store, and hybrid (vector + BM25) search with a cross-encoder re-ranking pass over the top 50 candidates. Chunking is tuned for legal and financial documents so that chunk boundaries fall between clauses rather than splitting them.
A spaCy / BERT named-entity model detecting SSNs, account numbers, and routing numbers across intake records, deployed via AWS Lambda and feeding clean data straight to reporting dashboards.
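Identifiers with a fixed format can also be caught by deterministic rules, which are often run alongside a learned NER model. The sketch below is illustrative only and is not the deployed model: the regular expressions and the function names `is_valid_routing` and `find_identifiers` are assumptions, while the checksum is the standard ABA routing-number check.

```python
import re

# Assumed patterns for illustration; real intake data would need more formats.
SSN_RE = re.compile(r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b")
ROUTING_RE = re.compile(r"\b\d{9}\b")


def is_valid_routing(number: str) -> bool:
    """ABA check: 3*(d1+d4+d7) + 7*(d2+d5+d8) + (d3+d6+d9) is divisible by 10."""
    if len(number) != 9 or not number.isdigit():
        return False
    d = [int(c) for c in number]
    total = (
        3 * (d[0] + d[3] + d[6])
        + 7 * (d[1] + d[4] + d[7])
        + (d[2] + d[5] + d[8])
    )
    return total % 10 == 0


def find_identifiers(text: str) -> list[tuple[str, str]]:
    """Return (label, value) pairs: SSN-shaped strings, then checksum-valid routing numbers."""
    found = [("SSN", m.group()) for m in SSN_RE.finditer(text)]
    found += [
        ("ROUTING", m.group())
        for m in ROUTING_RE.finditer(text)
        if is_valid_routing(m.group())
    ]
    return found
```

The checksum filters out most arbitrary nine-digit strings, which keeps false positives down before anything reaches a dashboard.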
TensorFlow Federated training settlement-prediction models across 3 data centers with encrypted model updates and cloud key management — improving accuracy without ever moving 2M+ sensitive records.
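The aggregation step at the heart of this kind of setup is federated averaging: each site trains locally and sends only model weights, which are combined in proportion to the number of examples behind them. The sketch below shows that idea in plain Python. It does not use the TensorFlow Federated API, it omits encryption and key management entirely, and `federated_average` is a hypothetical name.

```python
def federated_average(updates):
    """Combine per-site weights into one global vector.

    updates: list of (weights, n_examples) pairs, one per site.
    Returns the example-weighted mean of the weight vectors.
    """
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]


# Two hypothetical sites; only weights and counts leave each site, never records.
site_updates = [([1.0, 2.0], 100), ([3.0, 4.0], 300)]
global_weights = federated_average(site_updates)  # [2.5, 3.5]
```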
A real-time anomaly-detection layer over data-access patterns, surfacing exfiltration risk and integrity issues fast enough that the security team can act before damage compounds.
Survival-analysis pipelines modeling client lifetime and settlement risk across $200M+ in settlements — lifting retention and tightening revenue forecasts the business plans against.
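For readers unfamiliar with survival analysis, the basic building block is the Kaplan-Meier estimator, which estimates the probability that a client is still active at time t while correctly handling clients who have not yet churned (censored observations). This is a minimal sketch on made-up data, not the production pipeline; `kaplan_meier` and its inputs are hypothetical.

```python
def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each time with at least one observed event.

    durations: time each client was followed.
    observed:  True if the event (e.g. churn) happened at that time,
               False if the client was still active (censored).
    """
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        events = sum(1 for d, o in zip(durations, observed) if d == t and o)
        leaving = sum(1 for d in durations if d == t)
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored clients both leave the risk set
    return curve


# Made-up example: five clients, two of them censored.
curve = kaplan_meier([2, 3, 3, 5, 8], [True, True, False, True, False])
```

With this data the estimated survival probability drops to roughly 0.8, 0.6, and 0.3 at times 2, 3, and 5.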
Multi-touch attribution across 1.2M+ marketing touchpoints via AWS Glue ETL and TensorFlow sequence modeling — exposing channels the business had been underweighting.
I publish deep interview guides on the systems I build with — RAG, embeddings, LangGraph memory, MCP, fine-tuning, AI governance. Each one is the resource I wish I'd had when I started.
A field this fast moves in waves. These are the ones I'm sitting closest to in 2026 — the ones I think will reshape how AI gets built in production over the next eighteen months.
How short-term checkpointing versus long-term stores in LangGraph change agent behaviour as conversations stretch from minutes into weeks. The boundary between RAM and disk for an agent isn't obvious — and getting it wrong is expensive.
DPO and LoRA on sensitive corpora without leaking training data. Federated learning solved the training-distribution problem; the open question now is what happens when the model itself memorises what it shouldn't.
If Model Context Protocol holds, the N×M tool-integration problem collapses to N+M. That's not a small refactor — it's a redesign of how every internal AI system at every enterprise gets wired together.
HNSW for recall, IVF-PQ for memory, flat for ground truth — and re-ranking layered on top of any of them. Picking the wrong index for the corpus shape is the silent killer of RAG quality. Most teams discover this six months too late.
RAG quality is still mostly judged by feel. The teams that turn precision-at-k, recall, faithfulness, and groundedness into routine dashboards — not occasional spot-checks — are the ones whose enterprise GenAI actually compounds over time.
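The retrieval half of those metrics is cheap to compute once a labelled set of relevant documents exists per query. A minimal sketch, with hypothetical function names and document IDs; faithfulness and groundedness need a judge model or human labels and are not covered here.

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    if k <= 0:
        return 0.0
    return sum(1 for doc in retrieved[:k] if doc in relevant) / k


def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for doc in set(retrieved[:k]) if doc in relevant) / len(relevant)


# One hypothetical query: ranked results and the labelled relevant set.
retrieved = ["a", "b", "c", "d"]
relevant = {"a", "c", "e"}
p3 = precision_at_k(retrieved, relevant, 3)  # 2 of the top 3 are relevant
r3 = recall_at_k(retrieved, relevant, 3)     # 2 of the 3 relevant docs were found
```

Averaging these per-query values over a fixed evaluation set gives a number that can be tracked on a dashboard from one release to the next.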
Pure semantic search misses exact-match queries — account numbers, statutes, SKUs, error codes — that BM25 finds trivially. The interesting work right now is in fusion: reciprocal rank, learned sparse encoders, cross-encoder re-ranking. The boring answer wins.
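Reciprocal rank fusion is the simplest of these to show: each document scores 1 / (k + rank) in every ranked list it appears in, and the scores are summed. The sketch below assumes two already-ranked lists of hypothetical document IDs; k = 60 is the commonly used default, and ties are broken alphabetically as an arbitrary choice.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    rankings: list of lists, each ordered best-first.
    Returns document IDs sorted by summed 1 / (k + rank), best-first.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results for one query from a vector index and a BM25 index.
vector_hits = ["d1", "d2", "d3"]
bm25_hits = ["d3", "d1", "d4"]
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
# fused == ["d1", "d3", "d2", "d4"]
```

Because only ranks are used, the two retrievers' raw scores never need to be put on a common scale, which is a large part of why this method is a common baseline.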
If you're building AI infrastructure, operationalising LLMs, or need ML that holds up under compliance scrutiny — I'd be glad to hear about it.