🍷 CellarChat™ RAG Agent for 1M+ Users
Built CellarChat™, an AI wine assistant using RAG, agents, OpenAPI tools, and LLM evals for CellarTracker's 1M+ member platform.

Shipping CellarChat required solving several production constraints that were easy to miss in demos. Early model context windows were much smaller than they are now, so retrieval and orchestration quality had to carry real load. We also needed a way to show product and executive stakeholders whether quality was improving or regressing as the system changed week to week.
The core challenge was building an assistant that felt genuinely useful instead of generic. Members wanted personalized recommendations, not broad wine advice. That meant the system had to combine structured collection state with unstructured tasting language and return grounded responses that mapped to real actions inside the app. It also had to handle broad question types, from what to drink tonight to collection analysis and food pairing. Because this shipped as a production feature, reliability and behavior consistency mattered as much as model quality. A brittle demo would not survive real usage patterns.
CellarTracker is a wine cellar management platform where members track inventory, decide what to open, monitor drinking windows, and learn from community tasting notes. The user problem sounds simple, but the data is not. Wine data spans structured records like bottle counts, vintages, storage locations, and dates, plus unstructured content like tasting notes, preferences, and free-form prompts. Good answers depend on both data types together. The assistant also has to reason in product context: what the member owns, where bottles are stored, and what is actionable right now.
I worked across the stack to turn CellarChat from concept into a production AI surface during my August 2023 to July 2025 engagement. I led and implemented core RAG architecture, agentic workflows, OpenAPI tool integration, and LLM observability. I also built CI/CD-connected evaluations to catch regressions before release, and I contributed to the Python backend that powered AI orchestration and retrieval flows. On the product side, I integrated the AI system into TypeScript frontend experiences and React Native mobile surfaces so the assistant was available where members already manage their collections. I treated this as a systems problem, not a prompt-only problem.

For retrieval-augmented generation, I focused on grounding responses in member-specific context and trusted wine data. The retrieval path combined personal cellar information with broader tasting intelligence so the model could answer recommendation and exploration questions with context, not guesses. Query handling needed to support both direct lookups and fuzzy intent because users asked for wines by variety, pairing, vintage, readiness, region, and price constraints. The key engineering decision was to optimize retrieval quality and relevance first, then model behavior. In practice, better retrieval design delivered bigger gains than prompt tweaking alone.
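As a rough illustration of combining structured cellar state with unstructured tasting language, the sketch below filters a member's bottles on structured fields first, then ranks the survivors against the query text. It is not CellarChat's actual retrieval code: the `Bottle` fields, the `retrieve` function, and the token-overlap score are all assumptions made for the example, and the overlap score is only a stand-in for whatever embedding or search-based relevance a real system would use.

```python
import re
from dataclasses import dataclass


@dataclass
class Bottle:
    # Hypothetical record shape; real cellar data will differ.
    wine: str
    variety: str
    vintage: int
    quantity: int
    drink_from: int
    drink_until: int
    tasting_note: str


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(cellar, query, year, variety=None, top_k=3):
    """Structured filter first, then rank by overlap with tasting notes."""
    query_tokens = tokenize(query)
    # Structured side: in stock, inside the drinking window, optional variety.
    candidates = [
        b for b in cellar
        if b.quantity > 0
        and b.drink_from <= year <= b.drink_until
        and (variety is None or b.variety.lower() == variety.lower())
    ]
    # Unstructured side: crude token overlap as a placeholder relevance score.
    ranked = sorted(
        candidates,
        key=lambda b: len(query_tokens & tokenize(b.tasting_note)),
        reverse=True,
    )
    return ranked[:top_k]
```

Filtering before ranking keeps bottles the member cannot open right now (out of stock or outside the drinking window) from ever reaching the model's context, which is one way to keep answers actionable.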
I helped architect OpenAPI-based tool-calling workflows so the model could execute reliable product-aware actions instead of only generating text. Tool access gave the assistant a controlled interface into collection and product capabilities, which improved response usefulness and reduced hallucinated behavior. I also worked on fine-tuning and orchestration patterns that improved how the system selected tools and handled multi-step requests. The objective was predictable behavior under real user prompts: select the right tool path, gather the right context, and produce a clear answer tied to what the user can actually do next.
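The controlled-interface idea can be sketched as a small dispatcher that only runs tools it knows about and only with the arguments they declare. Everything here is hypothetical: the `list_ready_bottles` tool, the `TOOLS` registry, and the in-memory `_CELLAR` data are illustrations, and in an OpenAPI-based setup the tool names and required parameters would presumably be derived from the spec's operations instead of being written by hand.

```python
import json
from typing import Any

# Hypothetical in-memory stand-in for a collection API.
_CELLAR = {
    "member-1": [
        {"wine": "Example Syrah", "drink_from": 2022, "drink_until": 2030},
        {"wine": "Example Nebbiolo", "drink_from": 2028, "drink_until": 2040},
    ]
}


def list_ready_bottles(member_id: str, year: int) -> list[str]:
    """Return wines whose drinking window includes the given year."""
    return [
        b["wine"]
        for b in _CELLAR.get(member_id, [])
        if b["drink_from"] <= year <= b["drink_until"]
    ]


# Registry of allowed tools and the exact argument names each one expects.
TOOLS: dict[str, dict[str, Any]] = {
    "list_ready_bottles": {
        "handler": list_ready_bottles,
        "required": {"member_id", "year"},
    },
}


def dispatch(tool_call_json: str) -> dict[str, Any]:
    """Validate a model-produced tool call, then run it or return an error."""
    try:
        call = json.loads(tool_call_json)
    except json.JSONDecodeError:
        return {"ok": False, "error": "tool call is not valid JSON"}
    if not isinstance(call, dict):
        return {"ok": False, "error": "tool call must be a JSON object"}
    name = call.get("name")
    if not isinstance(name, str) or name not in TOOLS:
        return {"ok": False, "error": f"unknown tool: {name!r}"}
    spec = TOOLS[name]
    args = call.get("arguments", {})
    if not isinstance(args, dict) or set(args) != spec["required"]:
        expected = sorted(spec["required"])
        return {"ok": False, "error": f"expected arguments: {expected}"}
    return {"ok": True, "result": spec["handler"](**args)}
```

Returning a structured error instead of raising lets the orchestration layer feed the failure back to the model for a retry, so a malformed call does not turn into an invented answer.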
I built a structured evaluation system so we could answer a core leadership question every week: are our AI changes improving the product or regressing it? I designed a golden set made of both canonical user prompts and golden set users so each run tested the model against realistic cellar profiles, not synthetic averages. I then split golden questions into quantitative and qualitative tracks and graphed both over time in an internal dashboard to make trend lines, inflection points, and regressions visible before broad rollout.
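A minimal sketch of the quantitative track, assuming each golden case pairs a golden set user with a canonical prompt and a list of terms the answer must contain. The names `GoldenCase`, `pass_rate`, and `release_gate`, the substring check, and the tolerance value are all illustrative assumptions, not the actual evaluation system; the qualitative track would need human or model-graded review, which is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class GoldenCase:
    user_id: str                  # golden set user with a realistic cellar profile
    prompt: str                   # canonical user prompt
    must_mention: tuple[str, ...]  # terms a grounded answer should contain


def pass_rate(
    cases: Sequence[GoldenCase],
    assistant: Callable[[str, str], str],
) -> float:
    """Fraction of golden cases whose answer contains every required term."""
    if not cases:
        raise ValueError("golden set is empty")
    passed = 0
    for case in cases:
        answer = assistant(case.user_id, case.prompt).lower()
        if all(term.lower() in answer for term in case.must_mention):
            passed += 1
    return passed / len(cases)


def release_gate(current: float, baseline: float, tolerance: float = 0.02) -> bool:
    """Allow a release only if the score has not dropped beyond the tolerance."""
    return current >= baseline - tolerance
```

Run in CI, a check like `release_gate` can fail the build when a change regresses the golden set, and logging each run's `pass_rate` provides the data points for a trend line over time.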
I connected AI workflows to real user surfaces across web and mobile. CellarChat was not a side experiment; it was integrated into the core CellarTracker experience, including the collections flow where members decide what to drink and manage inventory. I worked in Python on backend AI services, in TypeScript on frontend integration, and in React Native to deliver cohesive mobile experiences. I also contributed beyond AI-only surfaces, including subscription and payment-related improvements, because shipping successful AI products requires alignment with broader product, platform, and business workflows.
CellarChat brought AI assistance into CellarTracker's core wine experience for a platform with 1M+ members. It connected personalized recommendations and collection intelligence to daily user decisions, not just novelty chat interactions. Through evaluations and observability, we improved system reliability and created a stronger release process for AI behavior changes. Through full-stack integration, we delivered AI capabilities directly in product surfaces across web and mobile, so members could reach the assistant where they already manage their collections. The work also supported broader product and revenue efforts, including subscription and payment improvements.
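This project reinforced a few practical engineering principles for production LLM systems: treat the assistant as a systems problem rather than a prompt-only problem, invest in retrieval quality before tuning model behavior, give the model a controlled tool interface instead of relying on generated text alone, and evaluate every change against a golden set before release.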
CellarChat™ represents the kind of AI engineering I enjoy most: turning ambiguous product problems into reliable, evaluated, production-ready LLM systems.