RYAN ZERNACH

Senior AI Systems Engineer

🍷 CellarChat™ RAG Agent for 1M+ Users
We engineered a new cross-platform mobile app for an online wine community that is 25+ years old and has 1M+ members.

Development Challenges & Solutions

Shipping CellarChat required solving several production constraints that were easy to miss in demos. Early model context windows were much smaller than they are now, so retrieval and orchestration quality had to carry real load. We also needed a way to prove quality movement to product and executive stakeholders as the system changed week to week.

  • Context window limits: at the time, the practical ceiling was around 150k tokens, so I used embeddings and retrieval strategies to compress and prioritize relevant context instead of trying to pass everything directly. This reduced token cost and fit critical context into a single request window.
  • Evaluation discipline for investment decisions: leadership wanted to know whether weekly system changes were improving or regressing behavior, so I implemented CI/CD-connected evals and observability to compare versions and make go or no-go release calls with evidence.
  • Deterministic math for agentic workflows: for questions like bottle counts and inventory arithmetic, I extended tool-calling paths with math-capable tools so the agent could compute reliably instead of depending on non-deterministic LLM-only calculations.
  • Handling Mendocino ambiguity edge cases: "Mendocino" can mean a region, a subregion, or an appellation depending on user intent. I generated thousands of synthetic conversation examples that demonstrated disambiguation behavior, then fine-tuned an OpenAI model with LoRA in Azure AI Foundry so the assistant asked clarifying questions instead of assuming.
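The context window point above can be illustrated with a minimal Python sketch. It assumes chunks have already been scored for relevance by an embedding search. The names `Chunk`, `estimate_tokens`, and `pack_context`, along with the four-characters-per-token estimate, are illustrative assumptions and not the production implementation.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    score: float  # relevance score from a hypothetical embedding search


def estimate_tokens(text: str) -> int:
    # Rough heuristic (about 4 characters per token).
    # A real system would count tokens with the model's own tokenizer.
    return max(1, len(text) // 4)


def pack_context(chunks: list[Chunk], budget_tokens: int) -> list[Chunk]:
    """Greedily keep the highest-scoring chunks that fit the token budget."""
    selected = []
    used = 0
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        cost = estimate_tokens(chunk.text)
        if used + cost <= budget_tokens:
            selected.append(chunk)
            used += cost
    return selected
```

Greedy packing by score is only one possible policy. It favors the most relevant context and skips any chunk that would overflow the budget, which keeps the request inside a single window at a predictable token cost.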

Problem

The core challenge was building an assistant that felt genuinely useful instead of generic. Members wanted personalized recommendations, not broad wine advice. That meant the system had to combine structured collection state with unstructured tasting language and return grounded responses that mapped to real actions inside the app. It also had to handle broad question types, from what to drink tonight to collection analysis and food pairing. Because this shipped as a production feature, reliability and behavior consistency mattered as much as model quality. A brittle demo would not survive real usage patterns.

Context

CellarTracker is a wine cellar management platform where members track inventory, decide what to open, monitor drinking windows, and learn from community tasting notes. The user problem sounds simple, but the data is not. Wine data spans structured records like bottle counts, vintages, storage locations, and dates, plus unstructured content like tasting notes, preferences, and free-form prompts. Good answers depend on both data types together. The assistant also has to reason in product context: what the member owns, where bottles are stored, and what is actionable right now.

My Role

I worked across the stack to turn CellarChat from concept into a production AI surface during my August 2023 to July 2025 engagement. I led and implemented core RAG architecture, agentic workflows, OpenAPI tool integration, and LLM observability. I also built CI/CD-connected evaluations to catch regressions before release, and I contributed to the Python backend that powered AI orchestration and retrieval flows. On the product side, I integrated the AI system into TypeScript frontend experiences and React Native mobile surfaces so the assistant was available where members already manage their collections. I treated this as a systems problem, not a prompt-only problem.

  • RAG architecture and retrieval quality tuning for structured and unstructured wine data
  • Agentic workflows with OpenAPI tool-calling and multi-step orchestration paths
  • LLM observability and CI/CD evaluations for release quality and regression control
  • Python backend delivery plus TypeScript and React Native product integration
I also shipped core mobile features beyond AI, including custom barcode scanning and bottle label capture workflows.

Technical Approach: Retrieval-Augmented Generation

For retrieval-augmented generation, I focused on grounding responses in member-specific context and trusted wine data. The retrieval path combined personal cellar information with broader tasting intelligence so the model could answer recommendation and exploration questions with context, not guesses. Query handling needed to support both direct lookups and fuzzy intent because users asked for wines by variety, pairing, vintage, readiness, region, and price constraints. The key engineering decision was to optimize retrieval quality and relevance first, then model behavior. In practice, better retrieval design delivered bigger gains than prompt tweaking alone.
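As a rough illustration of combining structured cellar state with unstructured tasting language, the sketch below filters bottles on structured fields first and then ranks the survivors by embedding similarity. The `Bottle` fields, the `retrieve` function, and the toy two-dimensional embeddings are assumptions made for the example; the production retrieval path is not described at this level of detail.

```python
import math
from dataclasses import dataclass


@dataclass
class Bottle:
    name: str
    region: str
    quantity: int
    note_embedding: list[float]  # embedding of tasting notes (toy values)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(cellar, query_embedding, region=None, top_k=3):
    """Filter on structured fields, then rank by tasting-note similarity."""
    candidates = [
        b for b in cellar
        if b.quantity > 0 and (region is None or b.region == region)
    ]
    ranked = sorted(
        candidates,
        key=lambda b: cosine(b.note_embedding, query_embedding),
        reverse=True,
    )
    return ranked[:top_k]
```

Filtering before ranking means the model only sees bottles the member actually has on hand, so a recommendation always maps to something actionable.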

Technical Approach: Agentic Tool Use / OpenAPI Workflows

I helped architect OpenAPI-based tool-calling workflows so the model could execute reliable product-aware actions instead of only generating text. Tool access gave the assistant a controlled interface into collection and product capabilities, which improved response usefulness and reduced hallucinated behavior. I also worked on fine-tuning and orchestration patterns that improved how the system selected tools and handled multi-step requests. The objective was predictable behavior under real user prompts: select the right tool path, gather the right context, and produce a clear answer tied to what the user can actually do next.
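The tool-calling idea can be sketched in a few lines of Python. This is a simplified stand-in, not the OpenAPI-based workflow itself: it swaps the OpenAPI layer for an in-process registry, and the `INVENTORY` records, the `count_bottles` tool, and the `dispatch` function are all hypothetical. It shows how a model-emitted tool call can route to plain code so that arithmetic such as bottle counts is computed deterministically.

```python
import json

# Toy inventory standing in for a member's cellar; the records are made up.
INVENTORY = [
    {"wine": "Pinot Noir", "region": "Mendocino", "quantity": 3},
    {"wine": "Zinfandel", "region": "Mendocino", "quantity": 2},
    {"wine": "Cabernet Sauvignon", "region": "Napa Valley", "quantity": 6},
]


def count_bottles(region=None):
    """Sum bottle quantities in plain code, optionally filtered by region."""
    return sum(
        row["quantity"]
        for row in INVENTORY
        if region is None or row["region"] == region
    )


# Registry of tools the model is allowed to call (hypothetical names).
TOOLS = {"count_bottles": count_bottles}


def dispatch(tool_call_json):
    """Run a model-emitted tool call and return a JSON string for the model."""
    call = json.loads(tool_call_json)
    tool = TOOLS.get(call.get("name"))
    if tool is None:
        # Unknown tools return an error payload instead of raising.
        return json.dumps({"error": "unknown tool"})
    return json.dumps({"result": tool(**call.get("arguments", {}))})
```

For example, `dispatch('{"name": "count_bottles", "arguments": {"region": "Mendocino"}}')` returns a JSON string whose `result` is 5, computed by code and not by the model.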

Technical Approach: Evaluation and Observability

I built a structured evaluation system so we could answer a core leadership question every week: are our AI changes improving the product or regressing it? I designed a golden set that paired canonical user prompts with golden-set user accounts, so each run tested the model against realistic cellar profiles rather than synthetic averages. I then split the golden questions into quantitative and qualitative tracks and graphed both over time in an internal dashboard, which made trend lines, inflection points, and regressions visible before broad rollout.

  • Golden set design: I curated stable, high-value prompts and paired them with representative user accounts covering different cellar sizes, data quality patterns, and usage behaviors.
  • Quantitative track: questions with objectively verifiable answers, where expected outputs were computed from hard-coded SQL queries against user profile and cellar tables. This gave deterministic pass/fail signals for metrics like bottle counts, holdings by region or variety, and other inventory facts.
  • Qualitative track: recommendation and interpretation prompts where correctness is not a single database value. For these, I used an LLM-as-judge framework with explicit rubrics to score dimensions like relevance, grounding, clarity, and actionability.
  • Dashboard and release decisions: I graphed both tracks over time to compare model versions and prompt or retrieval changes, then used those trends in CI/CD gating and release reviews to decide when a change was safe to ship.
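The two tracks above can be combined into a release gate along the following lines. This is a minimal sketch under stated assumptions: `RunResult`, `safe_to_ship`, the 0.95 pass-rate floor, and the 0.1 allowed quality drop are illustrative choices, and the judge scores are assumed to arrive as plain numbers from a separate LLM-as-judge step.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunResult:
    quantitative: list[bool]  # pass/fail per question with a SQL-derived answer
    qualitative: list[float]  # rubric scores from an LLM judge, e.g. 1 to 5


def pass_rate(run: RunResult) -> float:
    """Fraction of quantitative golden questions answered correctly."""
    return sum(run.quantitative) / len(run.quantitative)


def safe_to_ship(candidate: RunResult, baseline: RunResult,
                 min_pass_rate: float = 0.95,
                 max_quality_drop: float = 0.1) -> bool:
    """Gate a release on both tracks relative to the current baseline."""
    if pass_rate(candidate) < min_pass_rate:
        return False  # hard inventory facts regressed
    drop = mean(baseline.qualitative) - mean(candidate.qualitative)
    return drop <= max_quality_drop  # tolerate small judge-score noise
```

A CI job could call `safe_to_ship` after each evaluation run and fail the pipeline when it returns `False`, which turns the dashboard trends into an explicit go or no-go signal.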

Technical Approach: Full-Stack Product Integration

I connected AI workflows to real user surfaces across web and mobile. CellarChat was not a side experiment; it was integrated into the core CellarTracker experience, including the collections flow where members decide what to drink and manage inventory. I worked in Python on backend AI services, in TypeScript on frontend integration, and in React Native to deliver cohesive mobile experiences. I also contributed beyond AI-only surfaces, including subscription and payment-related improvements, because shipping successful AI products requires alignment with broader product, platform, and business workflows.

Impact

CellarChat brought AI assistance into CellarTracker's core wine experience for a platform with 1M+ members. It connected personalized recommendations and collection intelligence to daily user decisions, not just novelty chat interactions. Through evaluations and observability, we improved system reliability and created a stronger release process for AI behavior changes. Through full-stack integration, we delivered AI capabilities directly in product surfaces across web and mobile, so members could use them where they already manage their collections. The work also supported broader product and revenue efforts, including subscription and payment improvements.

  • Shipped production AI assistance for CellarTracker's 1M+ member platform.
  • Connected AI workflows to real React Native mobile product surfaces and let members rate AI responses with a thumbs up or thumbs down and leave custom feedback.
  • Improved reliability using LLM evaluations and observability practices in CI/CD.
  • Contributed to broader growth work: app rating improvements, Apple Pay subscriptions, and recurring revenue initiatives.

Lessons Learned

This project reinforced a few practical engineering principles for production LLM systems:

  • RAG quality depends on data modeling and retrieval strategy more than prompt iteration alone.
  • Agentic systems need constraints, fallback behavior, and evaluation coverage to stay reliable at scale.
  • Production LLM apps require observability and regression tracking, not just strong prompts.
  • AI features are only valuable when integrated into real workflows where users already make decisions.

Closing

CellarChat™ represents the kind of AI engineering I enjoy most: turning ambiguous product problems into reliable, evaluated, production-ready LLM systems.
