Neural Food Search
Production-Grade GenAI Search Pipeline
Search 13,591 restaurants across 4 US cities with natural language
About
Neural Food Search is an end-to-end GenAI search pipeline that transforms natural language queries like “cozy ramen locals love, not too loud” into ranked restaurant results. It demonstrates how to build, evaluate, and operate a production ML system using modern retrieval and ranking techniques.
The pipeline combines BM25 keyword search, dense vector retrieval (BGE-M3), sparse retrieval, Reciprocal Rank Fusion, cross-encoder reranking, and LLM listwise reranking into a 4-stage architecture deployed on Google Cloud Run.
Beyond the search pipeline itself, the project showcases a complete MLOps lifecycle: rigorous offline evaluation with ablation studies, custom model training experiments (LambdaMART reranker, DistilBERT/T5 analyzer distillation), and production monitoring with drift detection frameworks.
Pipeline Architecture
1. Query Analysis: parses natural language into structured intent, a HyDE document, filters, and negative constraints.
2. Hybrid Retrieval: three parallel paths (BM25 + dense + sparse) fused with Reciprocal Rank Fusion (k=60).
3. Cross-Encoder Reranking: pairwise relevance scoring on the top-50 candidates using a cross-attention transformer.
4. LLM Listwise Reranking: holistic reordering with comparative reasoning and natural-language match explanations.
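The fusion step in stage 2 fits in a few lines, which is part of its appeal: RRF needs no training and no score normalization, only ranks. A minimal sketch (the doc ids and retriever lists are illustrative, not project data):

```python
from collections import defaultdict

def rrf_fuse(ranked_lists, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d)).

    ranked_lists: iterable of ranked doc-id lists, best first (ranks are 1-indexed).
    Returns doc ids sorted by fused score, highest first.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Toy example: three retrievers with partially overlapping results.
bm25   = ["r1", "r2", "r3"]
dense  = ["r2", "r1", "r4"]
sparse = ["r2", "r5", "r1"]
print(rrf_fuse([bm25, dense, sparse]))  # "r2" wins: it ranks highly in all three lists
```

With k=60, a single top rank contributes only 1/61, so consensus across retrievers dominates any one list's opinion.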
What We Found
Combining BM25, dense, and sparse retrieval with a zero-parameter rank fusion formula outperformed every learned model we trained. With only 30 eval queries, simplicity beats complexity.
Our first T5 training run produced 0% valid JSON. Switching to pipe-delimited output gave 100% parse rate with the same model and data. The bottleneck was representation, not model capacity.
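The representation fix is easy to illustrate. The field names below are hypothetical, not the project's actual analyzer schema; the point is that a flat delimiter format gives a small seq2seq model far less syntax to get wrong than nested JSON:

```python
# Hypothetical analyzer schema (illustrative): cuisine|vibe|price|negations
def parse_analyzer_output(text):
    """Parse a pipe-delimited analyzer line into a dict of field -> values.

    A flat delimiter format has no braces, quotes, or nesting to emit
    correctly, so a small fine-tuned T5 can produce it reliably where
    JSON decoding failed on every output.
    """
    fields = ["cuisine", "vibe", "price", "negations"]
    # Pad with empty strings so missing trailing fields parse cleanly.
    parts = (text.strip().split("|") + [""] * len(fields))[:len(fields)]
    return {name: [v for v in part.split(",") if v] for name, part in zip(fields, parts)}

print(parse_analyzer_output("ramen|cozy,quiet|moderate|loud"))
# {'cuisine': ['ramen'], 'vibe': ['cozy', 'quiet'], 'price': ['moderate'], 'negations': ['loud']}
```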
Cross-encoder and LLM stages appear to decrease NDCG because they promote ungraded documents into the top-10. We documented this honestly rather than hiding it — the eval methodology matters as much as the numbers.
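This effect falls out of how NDCG handles unjudged documents. A sketch of the dual computation with toy grades (illustrative only, not the project's eval data): counting ungraded docs as relevance 0 versus scoring judged docs only:

```python
import math

def dcg(gains):
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg_at_k(ranking, grades, k=10, skip_ungraded=False):
    """NDCG@k where `grades` maps doc id -> graded relevance.

    skip_ungraded=False: ungraded docs count as relevance 0, which
    penalizes a reranker that surfaces good-but-unjudged documents.
    skip_ungraded=True: ungraded docs are dropped before truncating
    to k, giving a judged-only NDCG.
    """
    if skip_ungraded:
        ranking = [d for d in ranking if d in grades]
    gains = [grades.get(d, 0) for d in ranking[:k]]
    ideal = sorted(grades.values(), reverse=True)[:k]
    return dcg(gains) / dcg(ideal) if dcg(ideal) > 0 else 0.0

grades = {"a": 3, "b": 2, "c": 1}   # judged pool
reranked = ["x", "a", "b", "c"]     # "x" may be relevant but was never graded
print(ndcg_at_k(reranked, grades, k=3))                      # ungraded-as-zero: drops
print(ndcg_at_k(reranked, grades, k=3, skip_ungraded=True))  # judged-only: unchanged
```

Reporting both numbers separates "the reranker got worse" from "the reranker found documents the judgment pool never covered."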
End-to-End Process
This project covers the full lifecycle of building, measuring, and operating a production ML system.
Data pipeline (Yelp → spaCy NER → Claude Batch API → BGE-M3 embeddings → Elastic indexing) processing 13K restaurants and 7M reviews.
3 Cloud Run services (models, API, UI) with service-to-service auth, secret management, and scale-to-zero infrastructure.
100 eval queries across 10 types, 7-stage ablation study, per-query-type breakdown, and honest failure analysis on the worst-performing queries.
LambdaMART custom reranker (26 features, boost tradeoff analysis) and T5 analyzer distillation (2,768 training examples, V1 classifier → V2 generator).
Component contract monitoring for frozen-weight models, dual NDCG tracking, deployment decision frameworks, and concrete next-step actions.
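One way to implement a component contract for a frozen-weight model is to fingerprint its outputs on a fixed probe set: if the hash changes, the model, tokenizer, or serving stack changed underneath you. A hedged sketch (the probe queries and toy scorer are stand-ins, not the project's code):

```python
import hashlib
import json

def output_fingerprint(outputs, precision=4):
    """Hash a component's outputs on a fixed probe set, rounding floats
    so benign numeric noise doesn't trip the check."""
    canon = json.dumps([round(x, precision) for x in outputs])
    return hashlib.sha256(canon.encode()).hexdigest()

def check_contract(component, probe_inputs, baseline_fp):
    """Contract check for a frozen-weight component: same probes in,
    same (rounded) scores out. A mismatch signals a silent change
    somewhere in the model or its serving stack."""
    fp = output_fingerprint([component(x) for x in probe_inputs])
    return fp == baseline_fp

# Toy "frozen" scorer standing in for a cross-encoder (hypothetical):
scorer = lambda q: 0.1 * len(q)
probes = ["cozy ramen", "late night tacos", "quiet coffee shop"]
baseline = output_fingerprint([scorer(p) for p in probes])
print(check_contract(scorer, probes, baseline))  # True while the component is unchanged
```

Run the check on a schedule in production; a failed contract is a deployment decision point, not necessarily an error.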
Explore the Demo
The live demo includes 5 pages that tell a complete portfolio story:
Try natural language queries with a live two-panel layout showing pipeline steps
How each stage works with input/output examples and technical decisions
Ablation study, per-query-type breakdown, failure analysis, and production considerations
Custom LambdaMART reranker and T5 analyzer distillation with real results
Observe, Detect, Experiment, Decide — the full operational lifecycle