Neeraj Agarwala — AI Engineer · LLM, RAG & MLOps Systems

A short manifesto

I don't build AI to impress. I build it to do work no one else wants to do — protect sensitive records, surface revenue hiding in unstructured data, and make predictions operations teams actually act on.

The interesting question is never "can the model do it?" It's whether anyone uses what the model produces, whether the system holds up under audit, and whether the metric it moves is one the business already cares about. Those are the only three things that matter.

Privacy isn't a compliance checkbox. It's an architectural decision made on the first day. Models that handle people's data should be designed like banks are designed — trust is the product, and everything else follows.

Neeraj Agarwala · Dallas, TX

01 / Principles

Six rules I engineer by.

These aren't slogans. They're the filter every system passes through before I'd put my name on it. Most failed AI projects I've seen broke one of these six rules first — usually rule five.

Compliance is architecture.

Privacy, access control, and auditability are not bolt-ons. They live in the system diagram alongside the model — or they don't live at all. The federated learning architecture I designed didn't add privacy to a model; it made the model possible because of privacy. Data never moved. That's the entire point.

Every model has a number.

A model that doesn't move a metric the business already tracks is a research project, not a product. Settlement success rate. PII exposure incidents. Churn. Forecast variance. Pick the number first. Build backwards. If you can't name the number, don't build the model.

Production is the only proof.

Notebooks don't ship. The bar is a system that survives real traffic, real edge cases, and the slow drift of real-world data, with monitoring honest enough to admit when it stops working. Ten production systems beat a hundred decks. Every time.

Ship the smallest useful version first.

A 70% solution shipping in week three beats a 95% solution presented in month six. The 70% solution tells you what 95% should actually look like — and half the time the answer is not what you thought. Iteration on real data is the only feedback loop that matters.

Talk to the people who use it.

Operations teams, analysts, marketing leads — they know the failure modes you'll never see in evaluation. They know the inputs that look fine and produce nonsense. Embedded work beats handoff every time. This is the rule most teams break first, and most don't recover.

Models drift. Keep them honest.

The world the model was trained on isn't the world the model lives in. Drift detection, calibration checks, and evaluation harnesses aren't optional infrastructure — they're the difference between a system that ages well and one that quietly poisons decisions for six months before anyone notices.
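One common way to put a number on drift is the Population Stability Index (PSI), which compares how a feature or model score is distributed at training time versus in live traffic. The sketch below is a minimal pure-Python illustration, not the monitoring used in any system described here; the function name, the equal-width binning, and the `eps` floor are assumptions made for the example. A frequently cited rule of thumb treats PSI above roughly 0.25 as a shift worth investigating, though the right threshold depends on the feature.

```python
import math
from typing import List, Sequence


def population_stability_index(
    baseline: Sequence[float],
    current: Sequence[float],
    n_bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI between two samples of one numeric feature (or model score)."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant baseline

    def bin_shares(values: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp so values outside the baseline range land in the edge bins.
            counts[min(max(idx, 0), n_bins - 1)] += 1
        total = len(values)
        # Floor each share at eps so the logarithm below is always defined.
        return [max(c / total, eps) for c in counts]

    expected = bin_shares(baseline)
    actual = bin_shares(current)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


training_scores = [float(x) for x in range(100)]
live_scores = [x + 50.0 for x in training_scores]  # hypothetical shifted traffic
drift = population_stability_index(training_scores, live_scores)
```

Running a check like this on a schedule, per feature and on the model's output score, is one way to notice a shift in days rather than months.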

02 / About

I build AI systems that do things that matter — systems that protect millions of sensitive records, surface revenue hiding in unstructured data, and make predictions operations teams actually act on.

Over eight years, I've shipped production ML across the full stack: BERT-based NER pipelines that auto-tag PII at 97.3% accuracy across 5M+ records, LLM-powered RAG tools that cut manual acquisition effort for marketing teams, federated learning architectures that train across siloed data without moving a single sensitive record, and anomaly-detection systems that slashed breach detection time from 14 days to under 2 hours.

My engineering instinct is to build things that scale and comply — AWS SageMaker, Lambda, Step Functions, KMS. My product instinct is to connect model outputs to decisions that move numbers: churn, conversion, settlement success, cost. Compliance isn't a constraint — it's a design requirement.

I publish what I learn. The AI Career Radar series on LinkedIn — eight guides, 250+ pages — is the resource I wish I'd had when I started. If something in my head might save someone a week of debugging, it belongs in writing.

Based in: Dallas, Texas

Experience: 8+ years

Focus: LLMs · RAG · NLP · MLOps

Recognition: Outstanding Award · Making IT Happen Award

Education: M.S. Business Analytics, UT Dallas

Languages: Python · SQL · PL/SQL · JavaScript

LLM Engineering · RAG Pipelines · Vector Search · HNSW Indexing · Hybrid Retrieval · Sentence Embeddings · Cross-encoder Re-ranking · AWS Bedrock · SageMaker · Federated Learning · NER · MLOps

FAISS · Pinecone · Weaviate · pgvector · ChromaDB · BM25 + Vector · Survival Analysis · Markov Attribution · spaCy · PyTorch · Presidio · A/B Testing · Python

03 / Experience

Where I've shipped.

Data Scientist / AI Engineer

Own the full AI lifecycle — problem framing to production — having built and deployed 10+ ML & GenAI systems protecting sensitive data, optimizing lead channels, and improving settlement outcomes.
Architected a GPT-based PII redaction pipeline (Microsoft Presidio + AWS Bedrock, serverless + SageMaker) processing 50K+ documents/month — 92% fewer PII exposure incidents while keeping documents analytically usable.
Built a BERT-based NER model (spaCy) detecting SSNs, account & routing data across intake records — 97.3% accuracy, 80% less manual review, deployed via AWS Lambda into reporting dashboards.
Designed a TensorFlow Federated system training settlement-prediction models across 3 data centers with encrypted updates — +18% accuracy with zero transfer of 2M+ sensitive records.
Stood up an anomaly-detection system cutting breach-detection time from 14 days to under 2 hours.
Built Markov-chain attribution on 1.2M+ touchpoints (AWS Glue + TensorFlow), uncovering a 35% lift from underweighted channels.
Built survival-analysis pipelines that improved client retention 24% and revenue-forecast accuracy 31% across $200M+ in settlements.

Outstanding Award · Making IT Happen Award

Data Scientist

Built predictive models (ML with Python) that deepened customer relationships, strengthened longevity, and personalized interactions.
Ran n-gram analysis on Google AdWords data for Marketing — saved the company >$500K.
Cut operating cost >$100K by matching legacy products to new SKUs and optimizing on-hand inventory.

Analytics, Reporting Infrastructure

Optimized ETL stored procedures so reporting infrastructure stayed within SLA.
Built KPI dashboards across Oracle, SAP BW & QVDs; scheduled Qlik Sense QVD extracts.
Wrote PL/SQL stored procedures supporting cross-functional data integration.

Analytics Delivery Services

Missing-value imputation & outlier detection (Pandas, PySpark) to reduce customer complaints.
Real-time analysis & visualization (Hive, Zeppelin, SparkSQL, Seaborn, Plotly) to root-cause complaints.
Built CI/CD pipelines (Jenkins, GitHub, Docker, Maven, Ansible) to speed product releases.

Senior Technical Analyst → Technical Analyst

Recommended an inventory solution on a telecom project cutting processing time 20%; automated client-side validation in Advanced Excel for $15K annual savings.
Built custom Tableau / Excel dashboards; delivered Big Data initiatives on the Hadoop ecosystem.
Wrote PL/SQL procedures (MySQL, SSRS, VB6) improving performance 15% and meeting SLA response times.

04 / Selected Work

Systems I'm proud of.

01 — Featured

GPT-Based PII Redaction Pipeline

A production GenAI pipeline combining Microsoft Presidio and AWS Bedrock, deployed via serverless orchestration and SageMaker, automatically redacting PII from 50K+ debt-settlement documents per month — keeping records analytically usable while sharply reducing exposure risk in a heavily regulated domain.

92% fewer PII exposure incidents · 50K+ docs/month

GPT · AWS Bedrock · Presidio · SageMaker · Serverless

Semantic Search & Vector Retrieval

A production RAG retrieval layer using sentence-transformer embeddings, an HNSW-indexed vector store, and hybrid (vector + BM25) search with a cross-encoder re-ranking pass over the top 50 candidates. Chunking tuned for legal–finance docs so boundaries don't break across clauses.

Powers the RAG that cut manual acquisition effort for marketing

RAG · HNSW · Sentence Embeddings · Hybrid Search · BM25 · Cross-encoder Re-ranking
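The clause-aware chunking mentioned in this project can be sketched roughly as follows. This is an illustrative pure-Python version, not the production chunker: the function name, the `max_chars` budget, and the choice to split only after a period or semicolon are assumptions for the example. A real legal or finance corpus would also need to handle abbreviations such as "No." and numbered sub-clauses, which this regex would split incorrectly.

```python
import re
from typing import List


def chunk_by_clause(text: str, max_chars: int = 800) -> List[str]:
    """Pack whole clauses into chunks without cutting a clause in half."""
    # Split on whitespace that follows a period or semicolon.
    clauses = [c.strip() for c in re.split(r"(?<=[.;])\s+", text) if c.strip()]
    chunks: List[str] = []
    current = ""
    for clause in clauses:
        candidate = f"{current} {clause}".strip()
        if current and len(candidate) > max_chars:
            # Adding this clause would exceed the budget: close the chunk.
            chunks.append(current)
            current = clause
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


sample = (
    "The debtor shall pay the sum. The creditor shall release the lien; "
    "no further claims remain. Notices go to the address on file."
)
sample_chunks = chunk_by_clause(sample, max_chars=70)
```

A single clause longer than `max_chars` is kept whole as its own oversized chunk, on the assumption that a broken clause hurts retrieval more than a long one.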

BERT NER for PII Detection

A spaCy / BERT named-entity model detecting SSNs, account and routing numbers across intake records, deployed via AWS Lambda and feeding clean data straight to reporting dashboards.

97.3% accuracy · 80% less manual review

BERT · spaCy · AWS Lambda · NER
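A model-based detector for routing numbers is often paired with a deterministic check, since US ABA routing numbers carry a published checksum. The sketch below shows that checksum as a hypothetical post-filter on model candidates; nothing in the project description says the deployed pipeline does this, and the function name is invented for the example.

```python
def is_valid_aba_routing_number(candidate: str) -> bool:
    """Check the ABA checksum: 3(d1+d4+d7) + 7(d2+d5+d8) + (d3+d6+d9) = 0 mod 10."""
    if len(candidate) != 9 or not (candidate.isascii() and candidate.isdigit()):
        return False
    d = [int(ch) for ch in candidate]
    checksum = (
        3 * (d[0] + d[3] + d[6])
        + 7 * (d[1] + d[4] + d[7])
        + (d[2] + d[5] + d[8])
    )
    return checksum % 10 == 0
```

A passing checksum only says the digits are well formed, not that the number belongs to a real institution, so it narrows false positives rather than replacing the model.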

Federated Learning System

TensorFlow Federated training settlement-prediction models across 3 data centers with encrypted model updates and cloud key management — improving accuracy without ever moving 2M+ sensitive records.

+18% over single-source baselines

TF Federated · Privacy · KMS · Distributed ML
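The core aggregation step behind this kind of system is usually federated averaging: each site trains locally and only model parameters travel, weighted by how much data each site holds. The sketch below shows that step in plain Python under simplifying assumptions. It is not the TensorFlow Federated API, it omits the encryption of updates described above, and the function and variable names are hypothetical.

```python
from typing import List, Sequence


def federated_average(
    site_weights: Sequence[Sequence[float]],
    site_sample_counts: Sequence[int],
) -> List[float]:
    """Average per-site parameter vectors, weighted by local sample count."""
    total = sum(site_sample_counts)
    averaged = [0.0] * len(site_weights[0])
    for weights, count in zip(site_weights, site_sample_counts):
        share = count / total  # this site's share of all training records
        for i, w in enumerate(weights):
            averaged[i] += share * w
    return averaged


# Three hypothetical data centers, each sending only its parameters.
global_weights = federated_average(
    site_weights=[[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]],
    site_sample_counts=[100, 100, 200],
)
```

The raw records never appear in this function's inputs, which is the property the architecture depends on.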

Anomaly Detection for Data Security

Real-time anomaly-detection layer over data-access patterns, surfacing exfiltration risk and integrity issues fast enough that the security team can act before damage compounds.

14 days → under 2 hours to detection

Anomaly Detection · Streaming · Security · Python
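The detection method is not specified above, so the following is only a stand-in to show the general shape of a streaming check over access counts: a rolling z-score that flags an interval when it deviates sharply from the recent window. The window size, threshold, and function name are assumptions for illustration.

```python
from collections import deque
from statistics import mean, pstdev
from typing import List, Sequence


def flag_anomalies(
    counts: Sequence[float],
    window: int = 24,
    threshold: float = 3.0,
) -> List[int]:
    """Return indexes whose value is more than `threshold` sigmas from the rolling mean."""
    history: deque = deque(maxlen=window)
    flagged: List[int] = []
    for i, count in enumerate(counts):
        if len(history) == window:  # only score once the window is full
            mu, sigma = mean(history), pstdev(history)
            if sigma > 0 and abs(count - mu) / sigma > threshold:
                flagged.append(i)
        history.append(count)
    return flagged


# Hypothetical hourly record-access counts for one account.
hourly_access = [10, 12, 11, 9, 10, 11, 12, 10, 500, 11]
alerts = flag_anomalies(hourly_access, window=8)
```

One known weakness of this simple form is that a flagged spike still enters the window and inflates the baseline for later points, so real deployments often exclude flagged values or use robust statistics.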

Survival Analysis & Retention

Survival-analysis pipelines modeling client lifetime and settlement risk across $200M+ in settlements — lifting retention and tightening revenue forecasts the business plans against.

+24% retention · +31% forecast accuracy

Survival Analysis · Forecasting · Python · Retention
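As a reference point for what a survival-analysis pipeline estimates, here is a minimal Kaplan-Meier estimator in plain Python. The choice of Kaplan-Meier is an assumption made for illustration, since the description above does not name the estimator used, and the data in the example is invented. Clients who are still active at the end of observation are treated as censored rather than as churned.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(
    durations: Sequence[float],
    event_observed: Sequence[bool],
) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each time where an event occurred."""
    records = sorted(zip(durations, event_observed))
    at_risk = len(records)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(records):
        t = records[i][0]
        events = removed = 0
        # Group every client whose observation ends at the same time t.
        while i < len(records) and records[i][0] == t:
            events += records[i][1]  # True counts as 1 (churned), False as 0 (censored)
            removed += 1
            i += 1
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Months in program; False means the client was still enrolled when observed.
km_curve = kaplan_meier(
    durations=[2, 3, 3, 5, 8],
    event_observed=[True, True, False, True, False],
)
```

Treating censored clients correctly is the main reason to use survival methods here instead of a plain churn rate, which would either drop them or miscount them.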

Markov-Chain Attribution

Multi-touch attribution across 1.2M+ marketing touchpoints via AWS Glue ETL and TensorFlow sequence modeling — exposing channels the business had been underweighting.

35% lift uncovered from underweighted channels

Markov Chains · AWS Glue · TensorFlow · Attribution
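Markov-chain attribution is commonly computed through removal effects: estimate channel-to-channel transition probabilities from observed paths, then measure how much the overall conversion probability drops when a channel is removed. The sketch below is a small pure-Python illustration of that idea under the assumption of a first-order chain. It is not the Glue and TensorFlow pipeline described above, and all names and sample paths are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Path = Tuple[List[str], bool]  # (ordered channel touches, converted?)
START, CONV, NULL = "(start)", "(conversion)", "(null)"


def transition_probs(paths: Sequence[Path]) -> Dict[str, Dict[str, float]]:
    """Estimate first-order transition probabilities from observed paths."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for channels, converted in paths:
        states = [START, *channels, CONV if converted else NULL]
        for src, dst in zip(states, states[1:]):
            counts[src][dst] += 1
    return {
        src: {dst: n / sum(dsts.values()) for dst, n in dsts.items()}
        for src, dsts in counts.items()
    }


def conversion_probability(
    probs: Dict[str, Dict[str, float]],
    removed: Optional[str] = None,
    sweeps: int = 200,
) -> float:
    """Probability of reaching CONV from START; traffic into `removed` is lost."""
    value: Dict[str, float] = defaultdict(float)
    value[CONV] = 1.0
    for _ in range(sweeps):  # fixed-point iteration over the absorbing chain
        for src, dsts in probs.items():
            if src == removed:
                continue
            value[src] = sum(
                p * (0.0 if dst == removed else value[dst])
                for dst, p in dsts.items()
            )
    return value[START]


def removal_effect_attribution(paths: Sequence[Path]) -> Dict[str, float]:
    """Share of conversions credited to each channel via removal effects."""
    probs = transition_probs(paths)
    base = conversion_probability(probs)
    effects = {
        channel: 1 - conversion_probability(probs, removed=channel) / base
        for channel in probs
        if channel != START
    }
    total = sum(effects.values())
    return {channel: effect / total for channel, effect in effects.items()}


example_paths: List[Path] = [
    (["search", "email"], True),
    (["search"], False),
    (["email"], True),
    (["social"], False),
]
shares = removal_effect_attribution(example_paths)
```

In this toy data, email receives about two thirds of the credit and search one third, while social receives none because no path through it converts. The sketch assumes at least one converting path; with none, `base` is zero and the division fails.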

05 / Writing & Thought Leadership

Teaching the field forward.

I publish deep interview guides on the systems I build with — RAG, embeddings, LangGraph memory, MCP, fine-tuning, AI governance. Each one is the resource I wish I'd had when I started.

Published guides: 8

Total pages: 250+

LinkedIn followers: 3.9K

G/01

30 Embeddings Interview Questions

NLP · Vectors · 32 pages

G/02

30 LangGraph Memory Questions

Agents · 32 pages

G/03

30 Fine-Tuning AI Models Questions

LoRA · DPO · RLHF · 35 pages

G/04

30 MCP Server Questions

Architecture · 32 pages

G/05

30 LangChain & LangGraph Questions

Orchestration · 32 pages

G/06

30 AI Governance Interview Questions

Compliance · 32 pages

G/07

30 AI Fundamentals — Helpful in 2026

For Everyone · 32 pages

G/08

AI Career Radar — Ongoing Series

Weekly · LinkedIn

06 / Currents

What I'm thinking about right now.

A field this fast moves in waves. These are the ones I'm sitting closest to in 2026 — the ones I think will reshape how AI gets built in production over the next eighteen months.

Active study

Agent memory at scale

How short-term checkpointing versus long-term stores in LangGraph change agent behaviour as conversations stretch from minutes into weeks. The boundary between RAM and disk for an agent isn't obvious — and getting it wrong is expensive.

Building toward

Privacy-preserving fine-tuning

DPO and LoRA on sensitive corpora without leaking training data. Federated learning solved the training-distribution problem; the open question now is what happens when the model itself memorises what it shouldn't.

Watching closely

MCP as the integration layer

If Model Context Protocol holds, the N×M tool-integration problem collapses to N+M. That's not a small refactor — it's a redesign of how every internal AI system at every enterprise gets wired together.
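The arithmetic behind that claim is simple enough to write down. The sketch below compares the number of integrations to build with and without a shared protocol; the function names and the example counts are illustrative assumptions, not figures from any real enterprise.

```python
def point_to_point_integrations(n_apps: int, m_tools: int) -> int:
    """Every AI application needs its own connector to every tool."""
    return n_apps * m_tools


def shared_protocol_integrations(n_apps: int, m_tools: int) -> int:
    """Each application and each tool implements the protocol once."""
    return n_apps + m_tools


bespoke = point_to_point_integrations(6, 20)   # 120 connectors
shared = shared_protocol_integrations(6, 20)   # 26 implementations
```

The gap widens with every application or tool added, which is why the change reads as a redesign and not a refactor.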

Daily decision

The vector-index trade-off

HNSW for recall, IVF-PQ for memory, flat for ground truth — and re-ranking layered on top of any of them. Picking the wrong index for the corpus shape is the silent killer of RAG quality. Most teams discover this six months too late.
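A rough memory estimate makes the trade-off concrete. The sketch below uses per-vector rules of thumb of the kind given in the FAISS index-selection guidelines: float32 storage for a flat index, the same plus graph links for HNSW, and a compressed code plus an ID for IVF-PQ. The parameter defaults are assumptions, and the estimates ignore centroids, codebooks, and allocator overhead, so treat the output as an order-of-magnitude guide only.

```python
def flat_index_bytes(n_vectors: int, dim: int) -> int:
    """Flat index: every vector stored as float32."""
    return n_vectors * dim * 4


def hnsw_index_bytes(n_vectors: int, dim: int, m_links: int = 32) -> int:
    """HNSW: full float32 vectors plus roughly 2 * M four-byte links per vector."""
    return n_vectors * (dim * 4 + m_links * 2 * 4)


def ivfpq_index_bytes(
    n_vectors: int,
    pq_subvectors: int = 64,
    bits_per_code: int = 8,
    id_bytes: int = 8,
) -> int:
    """IVF-PQ: one compressed code plus a stored ID per vector."""
    return n_vectors * (pq_subvectors * bits_per_code // 8 + id_bytes)


n, dim = 1_000_000, 768  # hypothetical corpus of one million 768-d embeddings
flat_total = flat_index_bytes(n, dim)       # 3_072_000_000 bytes
hnsw_total = hnsw_index_bytes(n, dim)       # 3_328_000_000 bytes
ivfpq_total = ivfpq_index_bytes(n)          # 72_000_000 bytes
```

On these assumptions IVF-PQ is more than forty times smaller than either alternative, which it pays for in recall, and that is the gap a re-ranking pass is meant to close.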

Persistent obsession

Retrieval evals, not vibes

RAG quality is still mostly judged by feel. The teams that turn precision-at-k, recall, faithfulness, and groundedness into routine dashboards — not occasional spot-checks — are the ones whose enterprise GenAI actually compounds over time.
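The two retrieval metrics named above are cheap to compute once a labeled set of relevant documents exists per query. A minimal sketch, with hypothetical document IDs, is below. Faithfulness and groundedness are left out because they need a judgment about the generated answer, which a few lines of code cannot supply.

```python
from typing import Sequence, Set


def precision_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(doc in relevant for doc in retrieved[:k]) / k


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


retrieved_ids = ["d3", "d1", "d7", "d2", "d9"]  # ranked retriever output
relevant_ids = {"d1", "d2", "d4"}               # human-labeled ground truth

p_at_3 = precision_at_k(retrieved_ids, relevant_ids, k=3)
r_at_5 = recall_at_k(retrieved_ids, relevant_ids, k=5)
```

Averaging these over a fixed query set and charting them per release is what turns a spot-check into a dashboard.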

Quietly maturing

Hybrid search beats pure vector

Pure semantic search misses exact-match queries — account numbers, statutes, SKUs, error codes — that BM25 finds trivially. The interesting work right now is in fusion: reciprocal rank, learned sparse encoders, cross-encoder re-ranking. The boring answer wins.
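Reciprocal rank fusion, the first of the fusion methods listed, needs only the rank positions from each retriever, so it avoids calibrating BM25 scores against cosine similarities. The sketch below is a minimal version with hypothetical document IDs; the constant `k = 60` is the value commonly used in the literature and is an assumption here, not a tuned setting.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[str]],
    k: int = 60,
) -> List[str]:
    """Fuse ranked lists by summing 1 / (k + rank) for each document."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_hits = ["acct-4471", "doc-a", "doc-b"]     # exact-match query favors BM25
vector_hits = ["doc-a", "doc-c", "acct-4471"]   # semantic neighbors
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
# fused == ["doc-a", "acct-4471", "doc-c", "doc-b"]
```

Documents that both retrievers rank well rise to the top, and the fused list can then be handed to a cross-encoder for the final re-ranking pass.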

07 / Toolkit

What I build with.

A / LLM & Generative AI

LLMs (GPT, BERT) · RAG · AWS Bedrock · LangChain · LangGraph · NER · LLM Evaluation · Prompt Engineering · Fine-tuning (LoRA, DPO) · spaCy · Presidio

B / Vector Search & Retrieval

Sentence Embeddings · OpenAI Embeddings · FAISS · Pinecone · Weaviate · ChromaDB · pgvector · Qdrant · HNSW · IVF / IVF-PQ · Hybrid Search · BM25 · Cross-encoder Re-ranking · Semantic Chunking

C / ML, Stats & Modeling

PyTorch · TensorFlow · TF Federated · scikit-learn · Survival Analysis · Markov Chains · Anomaly Detection · Forecasting

D / Cloud & MLOps

AWS SageMaker · Bedrock · Lambda · Step Functions · Glue · KMS · S3 · Docker · Jenkins · CI/CD · MLflow · Ansible

E / Analytics & Experimentation

A/B Testing · Causal Inference · Attribution · KPI Definition · Cohort Analysis · PySpark · Hadoop

F / Languages & Data

Python · SQL · PL/SQL · JavaScript · Java · Tableau · Power BI · Qlik · Oracle · MongoDB · SAP BW · ETL

08 / Education & Recognition

Where it started.

Education

The University of Texas at Dallas

M.S. Business Analytics — Data Science

2017 — 2019 · GPA 3.72

Teaching Assistant — Web Analytics, under Prof. Amit Mehra

B. M. S. College of Engineering

B.E. Computer Science

2011 — 2015 · GPA 3.8

Recognition

Outstanding Award

Landmark Management Group

Recent

Making IT Happen Award