👑 One database to rule them all | 🤖 AI you can't trust | 🎛️ The harness matters more than the model
Why harness engineering is the real unlock. How do you verify what an AI agent actually returns? Plus a hands-on Oracle pipeline agent build!
Hey folks,
I shared a few things this past week that I think are flying under the radar right now. A hands-on build, and two topics that I believe matter a lot more than the industry is admitting.
Here’s what you might have missed:
The biggest problem with AI that nobody talks about
I ran a quick poll this week: do you trust AI-generated reports? 10% said yes. 59% said only if they can verify it.
That’s exactly the problem I’ve been running into while building an AI-powered pipeline monitoring system. And it raised a question the AI hype cycle doesn’t want to answer: how do you actually trust the output?
Not ChatGPT answers. I’m talking about production BI reports and reliability dashboards that executives base decisions on. With SQL you can verify the query. With vector search and an LLM on top? The logic is invisible. That’s a real problem.
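One way to make that trust question concrete: have the agent return the query behind every number, then re-run it independently. The sketch below is only an illustration under stated assumptions. It uses Python's built-in sqlite3 as a stand-in database, and the `runs` table, the `report` dictionary, and the `verify_claim` helper are all hypothetical names, not part of any real monitoring system.

```python
import sqlite3

def verify_claim(conn, sql, claimed_value):
    """Re-run the query an agent says it used and compare the result."""
    # Only accept read-only statements so verification cannot change data.
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements can be verified")
    actual = conn.execute(sql).fetchone()[0]
    return actual == claimed_value, actual

# Hypothetical pipeline-run table, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (dag TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO runs VALUES (?, ?)",
    [("orders", "failed"), ("orders", "success"), ("orders", "failed")],
)

# What an agent might hand back: a number plus the query behind it.
report = {
    "claim": 2,
    "sql": "SELECT COUNT(*) FROM runs WHERE dag = 'orders' AND status = 'failed'",
}

ok, actual = verify_claim(conn, report["sql"], report["claim"])
print(ok, actual)
```

This only covers the SQL half. Anything that came out of a vector search plus an LLM summary has no query to re-run, which is exactly why that part is harder to trust.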
I Built an AI Agent You Can Ask Why Your Pipeline Is Failing - With Oracle AI Database 26ai
Most pipeline observability setups need 3 separate systems: a metadata store, a log store, and a vector database. That means 3 connections, 3 things to break, 3 places to debug.
In my latest video I show you how to consolidate all of that into one using Oracle’s AI Database 26ai, and build a LangGraph ReAct agent on top that answers natural language questions about your pipeline failures.
Ask it why a DAG keeps failing on Mondays, and it figures out whether to run SQL, do a semantic log search, or both. It runs completely locally via Docker, for free.
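To picture the "SQL, semantic search, or both" decision, here is a minimal sketch. It is not the LangGraph agent from the video and it does not use Oracle: sqlite3 stands in for the database, a simple word-overlap ranking stands in for vector search, and a keyword-based `route` function stands in for the choice the LLM would make. All table and function names are assumptions made up for this example.

```python
import re
import sqlite3

# Hypothetical data: one table for run metadata, one for log lines.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (dag TEXT, weekday TEXT, status TEXT)")
conn.execute("CREATE TABLE logs (dag TEXT, message TEXT)")
conn.executemany("INSERT INTO runs VALUES (?, ?, ?)", [
    ("orders", "Mon", "failed"),
    ("orders", "Tue", "success"),
    ("orders", "Mon", "failed"),
])
conn.executemany("INSERT INTO logs VALUES (?, ?)", [
    ("orders", "connection timeout to upstream warehouse"),
    ("orders", "loaded 1200 rows"),
])

def sql_tool(dag):
    """Structured question: count failed runs per weekday."""
    rows = conn.execute(
        "SELECT weekday, COUNT(*) FROM runs "
        "WHERE dag = ? AND status = 'failed' GROUP BY weekday",
        (dag,),
    )
    return dict(rows.fetchall())

def log_search_tool(dag, query):
    """Stand-in for semantic search: rank log lines by shared words."""
    words = set(re.findall(r"\w+", query.lower()))
    rows = conn.execute("SELECT message FROM logs WHERE dag = ?", (dag,))
    scored = [(len(words & set(re.findall(r"\w+", m.lower()))), m) for (m,) in rows]
    return [m for score, m in sorted(scored, reverse=True) if score > 0]

def route(question):
    """Stand-in for the model's tool choice: SQL, log search, or both."""
    q = question.lower()
    tools = []
    if any(w in q for w in ("how often", "how many", "mondays")):
        tools.append("sql")
    if any(w in q for w in ("why", "error", "timeout")):
        tools.append("logs")
    return tools or ["sql", "logs"]

question = "Why does orders keep failing on Mondays with a timeout?"
results = {}
for tool in route(question):
    if tool == "sql":
        results["sql"] = sql_tool("orders")
    else:
        results["logs"] = log_search_tool("orders", question)
print(results)
```

In a real ReAct agent the model picks the tools itself based on the question and the tool descriptions. The point of the sketch is only that both kinds of lookup can sit behind one connection.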
I learned about AI harnesses and you should too
When an AI system fails, the first reaction is always: we need a better model. But very often, especially in data work, that’s not the issue.
The real issue is that the model has no proper structure around it. That structure is the harness: it controls which tools the model can use, what it is allowed to do, and where guardrails are enforced.
I think it’s way more important than most people realize right now.
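A minimal sketch of what such a harness could look like, in plain Python with no agent framework involved. The `Harness` class, the `run_sql` tool name, and the read-only rule are assumptions chosen for illustration, and the "only SELECT" check is deliberately naive, not a production-grade SQL guardrail.

```python
class Harness:
    """Wraps tool calls with an allowlist, a guardrail, and a step budget."""

    def __init__(self, tools, max_steps=3):
        self.tools = tools          # name -> callable; anything else is refused
        self.max_steps = max_steps  # hard cap on successful tool calls
        self.steps = 0
        self.audit_log = []         # record of every attempted call

    def call(self, name, arg):
        self.audit_log.append((name, arg))
        if self.steps >= self.max_steps:
            raise RuntimeError("step budget exhausted")
        if name not in self.tools:
            raise PermissionError(f"tool not allowed: {name}")
        # Naive guardrail: the SQL tool may only read, never modify.
        if name == "run_sql" and not arg.lstrip().lower().startswith("select"):
            raise PermissionError("guardrail: only read-only SQL is allowed")
        self.steps += 1
        return self.tools[name](arg)

def fake_run_sql(query):
    """Placeholder tool; a real one would query the database."""
    return f"ran: {query}"

harness = Harness({"run_sql": fake_run_sql})
print(harness.call("run_sql", "SELECT 1"))

# Whatever the model asks for, the harness decides what actually runs.
for name, arg in [("run_sql", "DROP TABLE runs"), ("send_email", "hi")]:
    try:
        harness.call(name, arg)
    except PermissionError as err:
        print("blocked:", err)
```

Notice that none of this depends on which model sits behind it. Swapping in a better model changes what gets requested, but the allowlist, the guardrail, and the audit log stay the same.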
Now on YouTube and Udemy: Spark Declarative Pipelines on Databricks
My free 2-hour Databricks lab is now available as a full course on both platforms.
It covers Spark Declarative Pipelines and Lakeflow Designer: how to define Bronze → Silver → Gold pipelines with SQL, let Databricks handle dependencies and orchestration, and build pipelines visually with almost no code.
If you’re still managing pipeline logic in notebooks, this is worth two hours of your time.
That’s it for this time 🙌 If any of this hit home, especially the trust question around AI outputs, I’d love to hear how you’re thinking about it in the comments.
Talk soon,
Andreas
***
Ready to become a Data Engineer? Then join my Learn Data Engineering Academy today!
If you want to build real platforms, master the full stack, and close your skill gaps, check out my Data Engineer Coaching program.
If you are interested but still have a few burning questions, feel free to contact me via hello@learndataengineering.com.
For more information and content on Data Engineering, also check out my other blog posts, videos and more on Medium, YouTube and LinkedIn!



I'm building something similar for our project, although using Snowflake.
Thanks for sharing the details.