Waheed Zarif · Applied AI · 5 min read
What Building an AI Tool for Regulated Teams Taught Me About Trust
I spent the past year building an AI tool for compliance and risk teams — largely solo, from the AWS infrastructure to the retrieval pipeline.
Over the past year I built and shipped an AI product called Egret, a retrieval-augmented generation (RAG) advisory platform for teams working in business continuity, risk, and cyber resilience. I did it largely solo: the AWS infrastructure, the retrieval pipeline, the full-stack app, the billing. Part of the motivation was honest curiosity: I wanted to keep my hands in the modern stack rather than just manage people who use it. But the deeper reason was a problem I’ve watched play out for 14 years across biotech, IoT, and the built environment: in regulated work, a confident wrong answer is worse than no answer at all.
That single constraint changed almost every technical decision I made. Here’s what I learned.
1. In regulated domains, citation isn’t a feature — it’s the product
Most people experience AI as a black box that produces fluent, plausible text. That’s fine when you’re brainstorming. It’s disqualifying when someone is preparing for an audit or interpreting a regulatory requirement, because the cost of being confidently wrong is measured in fines, failed examinations, or worse.
So the core of what I built wasn’t the language model — it was the retrieval and grounding layer in front of it. Every answer Egret generates is tied back to a specific source: the originating document, the exact excerpt, a relevance score, and a link to the original file. The model isn’t asked to know things; it’s asked to synthesize from cited material. The trust comes from the user’s ability to verify every claim, not from the model’s eloquence.
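To make that concrete, here is a minimal sketch of what a citation-first answer might look like as a data structure. The class and field names (Citation, GroundedAnswer, render) are illustrative assumptions, not Egret's actual code; the only part taken from the description above is the set of four fields attached to every source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    document: str  # originating document title or ID
    excerpt: str   # exact passage the claim rests on
    score: float   # retrieval relevance score (assumed to lie in 0.0 to 1.0)
    url: str       # link back to the original file


@dataclass(frozen=True)
class GroundedAnswer:
    text: str
    citations: tuple[Citation, ...]

    def __post_init__(self) -> None:
        # Refuse to construct an answer that nothing backs up.
        if not self.citations:
            raise ValueError("an answer must cite at least one source")


def render(answer: GroundedAnswer) -> str:
    """Format the answer followed by numbered, verifiable sources."""
    lines = [answer.text, "", "Sources:"]
    for i, c in enumerate(answer.citations, start=1):
        lines.append(
            f'[{i}] {c.document} (relevance {c.score:.2f}): "{c.excerpt}" <{c.url}>'
        )
    return "\n".join(lines)
```

The design choice worth noticing is the check in __post_init__: an uncited answer is not a degraded answer, it is an invalid object that cannot be created at all.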
The lesson generalizes far beyond compliance: as AI moves into high-stakes work, “show your sources” stops being a nice-to-have and becomes the entire value proposition.
2. “I don’t have enough to answer” is a feature you have to build on purpose
Language models are trained to be helpful, which means their default failure mode is to fill silence with something plausible. In a regulated context, that’s exactly backwards.
One of the design choices I’m most glad I made was building an explicit “insufficient context” path: when the system can’t retrieve enough relevant material to answer confidently, it says so — and explains what it could and couldn’t find — rather than generating a guess. That sounds simple. It is not the default. It takes deliberate engineering to make a system comfortable admitting the limits of what it knows.
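A rough sketch of such a gate, placed between retrieval and generation, might look like the following. The thresholds (MIN_SCORE, MIN_CHUNKS), the Chunk type, and the gate function are hypothetical and chosen for illustration; real values would need tuning against the retriever in use, and this is not a description of how Egret implements it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    document: str
    excerpt: str
    score: float  # relevance score, assumed to lie in 0.0 to 1.0


MIN_SCORE = 0.6  # hypothetical relevance cutoff
MIN_CHUNKS = 2   # hypothetical minimum number of strong passages


def gate(question: str, chunks: list[Chunk]) -> dict:
    """Decide whether retrieval found enough to justify calling the model."""
    strong = [c for c in chunks if c.score >= MIN_SCORE]
    if len(strong) >= MIN_CHUNKS:
        return {"status": "answerable", "context": strong}

    # Not enough support: report what was and was not found instead of guessing.
    weak_docs = sorted({c.document for c in chunks if c.score < MIN_SCORE})
    return {
        "status": "insufficient_context",
        "message": (
            f"Only {len(strong)} passage(s) met the relevance bar; "
            f"at least {MIN_CHUNKS} are needed to answer: {question!r}"
        ),
        "found": [c.document for c in strong],
        "weak_matches": weak_docs,
    }
```

In this sketch the language model is only invoked on the "answerable" branch, so the refusal is decided by retrieval evidence rather than left to the model's judgment.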
I think this is one of the most underrated problems in applied AI right now. The teams that win in regulated markets won’t be the ones with the most fluent models; they’ll be the ones whose systems are honest about uncertainty.
3. Data isolation is a trust problem before it’s a technical one
The moment you ask a risk or compliance team to upload their internal policies, you’ve made a promise — and you have to architect that promise, not just assert it.
For Egret, that meant a few non-negotiables. Each organization gets its own dedicated retrieval index, not a filtered view of a shared one, so there’s no architectural path for one customer’s documents to surface in another’s results. Storage is isolated per organization. Everything is encrypted at rest (AES-256) and in transit (TLS 1.3). And critically, I built on Amazon Bedrock specifically because of its guarantee that customer data is never used to train or improve foundation models: queries are processed and discarded, not absorbed.
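The difference between a dedicated index and a filtered view is easier to see in code. The sketch below is a toy, in-memory illustration with hypothetical names (TenantIndexRegistry, org_id), and a substring match stands in for real vector retrieval; it is not Egret's implementation. The point is structural: a search call can only ever receive one organization's index object.

```python
class TenantIndexRegistry:
    """One separate index per organization; no shared index to filter."""

    def __init__(self) -> None:
        self._indexes: dict[str, list[str]] = {}

    def create(self, org_id: str) -> None:
        if org_id in self._indexes:
            raise ValueError(f"index already exists for {org_id!r}")
        self._indexes[org_id] = []

    def add_document(self, org_id: str, text: str) -> None:
        # Raises KeyError for an unknown organization rather than
        # silently creating or falling back to some default index.
        self._indexes[org_id].append(text)

    def search(self, org_id: str, term: str) -> list[str]:
        # Only the caller's own index is ever touched here.
        index = self._indexes[org_id]
        return [doc for doc in index if term.lower() in doc.lower()]


registry = TenantIndexRegistry()
registry.create("org-a")
registry.create("org-b")
registry.add_document("org-a", "Org A incident response policy")
registry.add_document("org-b", "Org B vendor risk policy")

print(registry.search("org-a", "policy"))  # only Org A's document
```

With a shared index and a tenant filter, a single missing or wrong filter clause leaks data. Here there is no filter to forget, because the other organization's documents are not in the structure being searched.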
None of that is glamorous. All of it is the reason a regulated team would consider trusting the tool at all. In this market, your security architecture is your marketing.
4. Be honest about what you haven’t done yet
When I wrote up Egret’s security posture, I made a point of stating plainly that the platform is in public beta and has not yet pursued formal certifications like SOC 2 or ISO 27001 — and that VPC-level network hardening is on the roadmap, not done.
It would have been easy to imply more maturity than the product has. But the audience for regulated software is professionally skeptical; they read between the lines for what you’re not saying. Counterintuitively, stating your limitations precisely builds more trust than projecting false completeness. It’s the same instinct that makes a good risk register valuable: naming the gap is the first step to managing it.
Why this matters beyond one product
I’ve spent most of my career on the deployment end of technology — getting complex systems to actually work in environments where failure isn’t an option: thousands of sensors across public-sector sites, hardware in military housing, risk frameworks for high-stakes partnerships. Building Egret put me on the other side of that same problem, as the person designing the system rather than deploying someone else’s.
What struck me is how much the two have in common. Whether it’s a sensor network or a language model, the hard part is never the core technology. It’s trust — making something reliable, verifiable, and honest enough that a serious professional will stake their work on it. AI hasn’t changed that. If anything, it’s raised the stakes, because the technology is now fluent enough to be convincingly wrong.
The companies and teams that internalize this — that treat citation, calibrated uncertainty, and data isolation as first-class design goals rather than afterthoughts — are the ones who’ll earn the right to operate where the cost of a wrong answer is too high to guess.
That’s the problem I find most worth working on right now.
Waheed Zarif builds and deploys technology in regulated and high-reliability environments — IoT, PropTech, and applied AI. He’s the creator of Egret and writes about program management, risk, and AI.