The Biggest Problem With AI Nobody Talks About
I’ve been building an AI-powered pipeline monitoring system. Ask it a question, get a report. Cool concept, right?
But here’s the thing nobody in the AI hype cycle wants to admit: how do you actually trust the output?
This Is Not a ChatGPT Problem
I’m not talking about asking ChatGPT a question and wondering if the answer is correct. That’s a different thing.
I’m talking about production systems. You have a BI report, a sales report, or a reliability dashboard, and an AI agent is generating it. Your business makes decisions based on that report. Can you trust it?
That is the real problem.
Here’s What I’m Actually Building
I’m working on a system that analyzes pipeline failures. I have DAGs, ingestion jobs, aggregation, partitioning, and so on. I create metadata around runs and failures, I store logs, and I use embeddings and vector search to make everything queryable.
The idea: instead of digging through logs manually, you just ask a question. “How many pipeline failures in the last 30 days?” And you get an answer.
When it’s a SQL query under the hood, I can verify it. I can look at the generated SELECT statement and check if the logic is right. That part I trust.
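To make that concrete, here is a minimal sketch of what reviewing a generated query can look like. The `pipeline_runs` table, its columns, the sample rows, and the fixed "today" date are all assumptions for illustration; the real schema is not shown in this post.

```python
import sqlite3

# Hypothetical run-metadata table, invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pipeline_runs (run_id INTEGER, dag TEXT, status TEXT, run_date TEXT)"
)
conn.executemany(
    "INSERT INTO pipeline_runs VALUES (?, ?, ?, ?)",
    [
        (1, "ingest_orders", "failed", "2024-05-28"),
        (2, "ingest_orders", "success", "2024-05-29"),
        (3, "aggregate_sales", "failed", "2024-05-30"),
        (4, "aggregate_sales", "failed", "2024-03-01"),  # outside the 30-day window
    ],
)

# The kind of statement an agent might generate for
# "How many pipeline failures in the last 30 days?"
# Every filter is visible and can be reviewed before trusting the number.
generated_sql = """
SELECT COUNT(*)
FROM pipeline_runs
WHERE status = 'failed'
  AND run_date >= date(:today, '-30 days')
"""

# "today" is pinned so the example gives the same result on every run.
failures = conn.execute(generated_sql, {"today": "2024-06-01"}).fetchone()[0]
print(failures)  # 2
```

The point is not the query itself but that the status filter and the date window are right there in the text, so a wrong assumption is something you can spot.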
But when it’s a vector search against log embeddings? That’s where it gets uncomfortable.
The Trust Problem With Vector Search
The system found schema drift issues on specific dates. It found quality check failures with exit code 1. It generated a full reliability report based on all of this.
But how do I know that’s the complete picture? Maybe my semantic search didn’t match everything relevant. Maybe the embedding didn’t surface the right logs. The system doesn’t tell me what it missed. It just tells me what it found.
That’s the problem. With SQL, you can read the query. With vector search and LLM reasoning on top of it, the logic is invisible.
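One partial way to see what the search missed is to cross-check it against something deterministic. The sketch below assumes you already have the IDs the vector search returned and compares them with a plain regex match over the same logs. The log texts, run IDs, and the `missed_by_search` helper are made up for the example, and a keyword filter only catches misses that share the keyword.

```python
import re


def missed_by_search(logs, retrieved_ids, pattern):
    """Return IDs of logs that match `pattern` but were not retrieved."""
    regex = re.compile(pattern, re.IGNORECASE)
    expected = {log_id for log_id, text in logs.items() if regex.search(text)}
    return sorted(expected - set(retrieved_ids))


# Invented sample logs, keyed by a hypothetical run ID.
logs = {
    "run-101": "Schema drift detected: column 'price' changed type",
    "run-102": "Quality check failed with exit code 1",
    "run-103": "schema drift on table orders, new column 'discount'",
    "run-104": "Job finished successfully",
}

# Pretend this is what the semantic search surfaced for "schema drift".
retrieved_ids = ["run-101", "run-102"]

missed = missed_by_search(logs, retrieved_ids, r"schema drift")
print(missed)  # ['run-103']
```

This does not make the search trustworthy on its own, but it turns "what did it miss?" into at least one number you can watch.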
You Can Test It in a Controlled Environment
If you set up a controlled environment where you know exactly which errors exist, you can write tests. Does the system return the right results for this specific scenario? Yes or no.
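A rough sketch of such a test, assuming each scenario maps a question to the run IDs you seeded on purpose. The questions, run IDs, and the `fake_system` stand-in are hypothetical; in a real setup you would pass in whatever function calls your agent.

```python
def precision_recall(expected, returned):
    """Compare the run IDs a scenario should surface with what came back."""
    expected, returned = set(expected), set(returned)
    hits = expected & returned
    # Convention chosen here: an empty side counts as a perfect score.
    precision = len(hits) / len(returned) if returned else 1.0
    recall = len(hits) / len(expected) if expected else 1.0
    return precision, recall


def failing_scenarios(system, scenarios, min_recall=1.0):
    """Return the questions where the system missed seeded failures."""
    failed = []
    for question, expected_ids in scenarios.items():
        _, recall = precision_recall(expected_ids, system(question))
        if recall < min_recall:
            failed.append(question)
    return failed


# Seeded ground truth: these run IDs are invented for the example.
SCENARIOS = {
    "Which runs failed with schema drift?": ["run-101", "run-103"],
    "Which runs failed a quality check?": ["run-102"],
}


def fake_system(question):
    """Stand-in for the real agent; it deliberately misses run-103."""
    canned = {
        "Which runs failed with schema drift?": ["run-101"],
        "Which runs failed a quality check?": ["run-102"],
    }
    return canned[question]


print(failing_scenarios(fake_system, SCENARIOS))
```

Here the first scenario fails because only one of the two seeded schema drift runs comes back, which is exactly the yes-or-no signal you want from a controlled test.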
That works in development. But in production? With real, messy data? With errors you didn’t anticipate? You can’t write tests for every possible failure mode.
And if this is generating a monthly sales report that executives act on, “it worked in dev” is not good enough.
The Old Way Was Predictable
Before all this, you had SQL tables. You ran queries. You got exact results. Before that, Excel with formulas. You could trace every number back to its source. It was deterministic. Predictable. Auditable.
Now you ask a question in natural language and get an answer that might be different the next time you run it. The same question on a different day can give slightly different results, depending on what the vector search surfaces. That’s not a bug; that’s how it works. But it makes trust really hard.
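You can at least measure that drift. The sketch below assumes you ask the same question several times, record which run IDs each answer was based on, and compute the Jaccard overlap between every pair of result sets. The sample results are invented, and what counts as "stable enough" is a threshold you would have to pick yourself.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap of two result sets: 1.0 means identical, 0.0 means disjoint."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def min_pairwise_overlap(runs):
    """Worst-case overlap across all pairs of repeated runs."""
    if len(runs) < 2:
        raise ValueError("need at least two runs to compare")
    return min(jaccard(a, b) for a, b in combinations(runs, 2))


# Invented example: the same question asked three times.
runs = [
    ["run-101", "run-102", "run-103"],
    ["run-101", "run-102", "run-103"],
    ["run-101", "run-102", "run-107"],
]

stability = min_pairwise_overlap(runs)
print(stability)  # 0.5
```

A low score does not say which answer is right. It only tells you the report is built on shifting ground and deserves a closer look.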
One Tool I Know That’s Working on This
One company I’ve worked with that’s actually tackling this is arato.ai. They basically compare what you gave the LLM against what it returned and measure how well the output fits the input. They also flag things like hallucinations and off-topic responses at scale.
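To be clear, the following is not how arato.ai does it; I don’t know their implementation. It is only a toy version of the general idea of comparing input against output: pull the dates and numbers out of the generated report and flag any that never appeared in the context the LLM was given. The regex, the sample texts, and the `unsupported_facts` helper are assumptions for illustration.

```python
import re

# Dates like 2024-05-28 first, then plain numbers such as exit codes.
TOKEN = re.compile(r"\d{4}-\d{2}-\d{2}|\d+(?:\.\d+)?")


def unsupported_facts(context, report):
    """Return dates/numbers in the report that do not occur in the context."""
    known = set(TOKEN.findall(context))
    return sorted(set(TOKEN.findall(report)) - known)


# Invented example: what was retrieved and passed to the LLM.
context = (
    "2024-05-28 ingest_orders failed: schema drift. "
    "2024-05-30 quality check failed with exit code 1."
)

# Invented example: what the LLM wrote. The second date is wrong.
report = (
    "Failures: schema drift on 2024-05-28 and a quality check failure "
    "(exit code 1) on 2024-05-31."
)

flagged = unsupported_facts(context, report)
print(flagged)  # ['2024-05-31']
```

A literal check like this is crude. It would also flag legitimate derived values such as a failure count that the model computed itself, so treat the flags as things to review, not as proof of a hallucination.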
But honestly, I don’t know of a solid, widespread solution for this yet. And I find it suspicious that most people selling AI-powered analytics tools aren’t talking about it at all.
This Needs a Real Answer
If you’re building AI systems that people rely on for real decisions, this isn’t a nice-to-have. Trust and verifiability are core requirements. Not features. Requirements.
I don’t have a clean answer here. I don’t think many people do.
If you know of libraries, frameworks, or approaches for testing and validating AI-generated reports at scale, drop them in the comments. I’ll be looking into this more and I want to know what’s actually working out there.


