✦ Available for consulting · Bay Area & Remote
I design, build, and evaluate AI systems for production — RAG, agents, multilingual pipelines. You'll always know whether it's working, because measurement is part of how I build.
Getting AI to work in a controlled test is the easy part. Getting it to work reliably, measurably, and consistently for real users — that's where most projects stall. Here's where I can help.
→ AI quality & measurement
Real users ask questions differently than your test cases. I build measurement systems that catch these gaps before users do — and tell you exactly what's going wrong and why.

→ Document AI & knowledge systems
Contracts, manuals, reports, policies. I build systems that let AI reliably find and use the right information from your documents — in English or other languages.

→ AI automation & workflows
Data lookup, form completion, routing, multi-step workflows. I design AI that acts — and fails gracefully instead of silently doing the wrong thing.

→ Performance & cost optimisation
I've cut AI infrastructure costs by 70% while making systems faster. The bottleneck is rarely where you think — I diagnose first, then fix the actual problem.

→ Multilingual AI
Most AI is built and tested in English. For global deployments, quality silently drops in other languages — nobody notices until users complain. I find and fix those gaps.

→ AI audit & due diligence
Before a major launch, or when something isn't working as expected. I'll tell you what's actually broken, what risks you're carrying, and what to fix first.

Most AI projects fail because teams build before they've defined what success looks like. I always flip that order.
Every figure below comes from a live production system. Happy to talk through the context behind any of them.
Starting from zero, I designed and built the system that lets a top-five global consulting firm's internal AI answer questions from thousands of business documents — in English and Chinese. It now powers multiple client deployments worldwide.
Still running today

A supplier stopped sending certain product categories. Total row counts looked normal. The AI would have trained on bad data and given wrong predictions for weeks. My monitoring system caught it in time — before a single model retrained.
Adopted by 20+ engineers across 4 teams

Everyone assumed the AI models needed tuning. Two weeks of profiling showed the real problem: loading data took 60 minutes and blocked everything else. After the fix: under 5 minutes. Training went from weekly to daily. Costs dropped 70%.
Production standard · <200ms · 99.9% uptime

"I've found problems clients didn't know they had — a multilingual AI silently giving worse results to half its users, a data feed corrupting model training for weeks undetected. The most valuable thing I do is often the measurement, not the fix."
"Before I write a line of code, I want to know how we'll measure whether it worked. Most teams skip that step. It's why most AI projects disappoint."
"I'll tell you if something isn't worth building. A consultant who scopes every problem into a large engagement isn't working in your interest."
"Lana is extremely trustworthy and reliable. She possesses a remarkable ability to grasp complex situations quickly, allowing her to make informed decisions and drive projects efficiently. Her leadership was instrumental in driving our projects forward and ensuring timely and successful deliveries."
"I highly recommend consulting Lana for projects involving data analysis. Her expertise guided me in selecting the right tools and methods, resulting in a very successful outcome. I was extremely pleased with both the results and Lana's professionalism as well as responsiveness."
Employer details anonymised at their request.
I take on a small number of projects at a time. Every engagement gets my direct involvement — not a junior team with my name on it. Typical engagements range from $5,000 for an audit to $20,000–50,000 for a full build. Retainers from $4,000/month.
Build AI that reliably answers questions from your company's documents — contracts, manuals, reports, policies. Works in English and other languages. Includes accuracy measurement so you know it's actually working.
Design and build AI that takes actions — not just answers. Data lookup, form completion, routing, multi-step workflows. Designed so failures are visible and recoverable, not silent and catastrophic.
Build the testing and monitoring that tells you whether your AI is actually working — and catches problems before users do. Includes dashboards, alerts, and a repeatable testing process.
When your AI system is too slow, too expensive, or too fragile to scale. I diagnose the actual bottleneck (rarely where you expect) and fix it — with every decision explained and documented.
Independent review of an existing AI system or planned build. I'll tell you what's actually broken, what risks you're carrying, and what to prioritise. Useful before a major launch or when something isn't working.
For teams that need senior AI judgment on an ongoing basis without a full-time hire. I join as a technical advisor or part-time AI lead — architecture decisions, team mentorship, strategic direction.
CTOs and engineering leads — this section is for you. Click any area to expand.
Built core RAG infrastructure for an enterprise-scale agentic platform — NV-Ingest for multimodal document parsing, dual vector DB architecture (Milvus + Azure AI Search), bilingual retrieval with Recall@N/Precision@N evaluation, and a Chinese-English concept mapping layer that recovered a 32pp recall gap. Benchmarked HippoRAG2 vs MS GraphRAG; a 14× speedup was demonstrated but kept as a PoC, given the maintenance tradeoffs for frequently updated corpora.
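For readers unfamiliar with the retrieval metrics mentioned above, here is a minimal sketch of Recall@N and Precision@N. This is illustrative only, not the production evaluation harness; it assumes you have document IDs for retrieved results and a labeled set of relevant IDs per query.

```python
def recall_at_n(retrieved: list[str], relevant: set[str], n: int) -> float:
    """Fraction of all relevant documents that appear in the top-n results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:n] if doc_id in relevant)
    return hits / len(relevant)


def precision_at_n(retrieved: list[str], relevant: set[str], n: int) -> float:
    """Fraction of the top-n results that are relevant."""
    if n == 0:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:n] if doc_id in relevant)
    return hits / n


# Hypothetical example: 2 of 3 relevant docs retrieved in the top 5.
retrieved = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d4", "d9"}
r5 = recall_at_n(retrieved, relevant, 5)      # 2/3
p5 = precision_at_n(retrieved, relevant, 5)   # 2/5
```

Tracking both numbers per language is what surfaces gaps like the 32pp bilingual recall difference: a system can look fine on aggregate while one language's recall quietly lags.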
Built a 5-dimension agentic eval framework (tool selection, parameter accuracy, artifact generation, reference accuracy, output quality) tested against the live agent API — not mocked responses. Langfuse integration for cross-release regression tracking. LLM-as-judge calibrated against human judgments. Replaced ad-hoc spot-checking and caught regressions before production.
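The shape of that framework can be sketched as follows. This is a simplified illustration under stated assumptions — per-case scores in [0, 1] for each dimension, aggregated per release and compared against a baseline — not the actual implementation; all names and the tolerance value are hypothetical.

```python
from dataclasses import dataclass

# The five dimensions named above.
DIMENSIONS = ("tool_selection", "parameter_accuracy", "artifact_generation",
              "reference_accuracy", "output_quality")


@dataclass
class EvalResult:
    case_id: str
    scores: dict  # dimension name -> score in [0, 1]


def aggregate(results: list[EvalResult]) -> dict:
    """Mean score per dimension across eval cases, for cross-release comparison."""
    agg = {}
    for dim in DIMENSIONS:
        vals = [r.scores[dim] for r in results if dim in r.scores]
        agg[dim] = sum(vals) / len(vals) if vals else None
    return agg


def regressions(current: dict, baseline: dict, tolerance: float = 0.05) -> list[str]:
    """Dimensions where the current release dropped more than `tolerance` below baseline."""
    return [dim for dim in DIMENSIONS
            if current.get(dim) is not None and baseline.get(dim) is not None
            and baseline[dim] - current[dim] > tolerance]
```

The key design point is the comparison step: ad-hoc spot-checking answers one question ("does this case look right?"), while a per-dimension baseline answers the one that matters before a release ("did anything get worse?").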
Re-architected end-to-end training and inference stack for a global retail analytics platform — pushed joins to Cassandra storage layer to eliminate Spark shuffle (weekly → daily training, 70% cost reduction), DuckDB for partitioned local loading (60min → 5min), FastAPI + Kubernetes HPA for inference, RabbitMQ for async persistence decoupled from the prediction critical path. Two-stage data quality monitoring with semaphore alerting adopted by 20+ engineers.
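The data quality monitoring is the piece that caught the supplier incident described earlier: total row counts looked normal while whole categories went missing. A minimal sketch of that idea, with illustrative thresholds and names (the real system's stages and alerting are more involved):

```python
def category_semaphore(current: dict, baseline: dict,
                       warn_drop: float = 0.3, fail_drop: float = 0.8) -> dict:
    """Compare per-category row counts against a baseline.

    An aggregate row count can look normal while one category silently
    disappears, so each category gets its own green/yellow/red status.
    """
    status = {}
    for cat, expected in baseline.items():
        observed = current.get(cat, 0)
        drop = 1 - observed / expected if expected else 0.0
        if drop >= fail_drop:
            status[cat] = "red"     # block retraining, alert the owner
        elif drop >= warn_drop:
            status[cat] = "yellow"  # alert, but don't block
        else:
            status[cat] = "green"
    return status


# Hypothetical feed: totals are close, but one category vanished entirely.
baseline = {"shoes": 1000, "hats": 500}
current = {"shoes": 990, "hats": 0}
statuses = category_semaphore(current, baseline)  # hats -> "red"
```

A "red" category is exactly the signal that stops a model from retraining on bad data — the check runs before training, not after predictions have already gone wrong.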
I take on a small number of engagements at a time. If you have a project in mind, reach out — even if you're not sure yet what shape it should take.
Send a message
Or book a call
Book a 30-min Intro Call
No commitment. I use the call to understand your situation and tell you honestly whether I can help — and if so, how.