Your RAG works in the demo. Users break it in five minutes.

Build retrieval that holds up under real queries. We design ingestion pipelines, chunking and embedding strategies, and evaluation harnesses that keep retrieval useful when it matters.

Most RAG systems look fine in a demo, then fail on real questions. The data is stale, the chunks split context in the wrong places, citations are missing, and retrieval returns vaguely related content instead of what the user actually asked for.

We build production-ready retrieval. Ingestion and data pipelines that actually update. Chunking and embedding strategies that preserve context. Citations that point to the right source. Evaluation harnesses that tell you when quality drops.

The goal is retrieval that works under pressure, not just on a slide.

Ingestion pipelines and data freshness

Reliable ingestion that keeps data up to date. No more answers built on last quarter's data.

Chunking and embedding strategies

Chunking that preserves context and embedding strategies that surface what users actually mean.

Citations and source attribution

Citations that point to the right source. No more hallucinated or vague attribution.

Evaluation harnesses and quality testing

Measure retrieval quality with evaluation harnesses, regression tests, and acceptance criteria.

Ready to discuss your needs?

Book a 30-minute call


Things That is a product engineering practice focused on building AI systems that help people make sense of complexity. For over 20 years, we've worked with teams at Google, IBM, Air New Zealand, Kpler, EE, News UK, Tesco, and The Economist, sitting in the space between product strategy and hands-on engineering. When specialist help is needed, we work with a network of senior consultants and product designers.

We're not consultants who hand off to developers. We're product engineers and designers who think strategically about what users need, then build it, from architecture to APIs to interfaces to production deployment. Our work is hands-on: writing code, reviewing pull requests, designing schemas, and testing edge cases, always with the human experience in mind.