The Future of Data Observability

Why Data Observability Is Becoming Essential for Modern Data Engineering

Sep 25, 2024

In today’s data-driven world, maintaining the health of data pipelines is a crucial responsibility for data engineers. If you're one of them, you probably know the constant challenge of ensuring data flows smoothly, without interruptions or inconsistencies. This is where data observability comes in—a concept that is rapidly gaining importance in the field of data engineering.

Recently, I had the opportunity to sit down with Ryan Yackel, CMO at IBM Databand, to unpack the idea of data observability and its impact on the modern data engineering landscape. Below, I’ll share some key insights from our conversation, along with practical takeaways for engineers and teams looking to improve the reliability of their data pipelines.

What is Data Observability?

Data observability is essentially about having visibility into the health and status of your data as it moves across pipelines. Whether you’re working with data ingestion, transformation, or analytics, being able to monitor and predict pipeline failures is critical. Ryan explains that Databand helps teams “produce data faster but with more reliability.” The goal isn't just to see whether pipelines are up and running, but also to detect issues proactively, before they become major problems.

This is particularly relevant as data engineering teams work with complex systems that include tools like Airflow, Snowflake, or Databricks. The interconnectedness of these systems means that a single failure can cascade through the entire pipeline, affecting everything from real-time analytics to machine learning models.

Ryan described how Databand's observability layer sits over the entire pipeline, helping teams identify when “something breaks and how to go fix it,” which ultimately leads to more reliable data delivery.

The Role of IBM and the Data Fabric Strategy

IBM's acquisition of Databand was part of a broader strategy to enhance its data fabric offerings. Ryan explained that IBM has long been a leader in the AI space, well before the buzz around tools like ChatGPT. For IBM, acquiring Databand was about filling a gap in its data quality and reliability monitoring capabilities.

A data fabric, as Ryan describes it, is a set of architectures and tools that help organizations manage their data more efficiently. The acquisition of Databand allowed IBM to offer customers proactive monitoring tools to ensure the reliability of their data pipelines, which is essential when dealing with the massive amounts of data that modern enterprises handle daily.

This fits into IBM's broader strategy of offering a comprehensive suite of solutions for AI, data analytics, and machine learning.

Monitoring vs. Observability: What's the Difference?

While monitoring tells you if something went wrong, data observability takes this a step further by answering why it went wrong and how to fix it. Ryan elaborates on this distinction with an example:

Let’s say you’ve set up a simple Airflow DAG to process data. Monitoring would notify you if the DAG fails to run or complete. Observability, however, provides insight into the entire workflow, from the ingestion of raw data through the transformation steps to the final output, and highlights exactly where the failure occurred and why. Perhaps an external API failed, or a schema change broke a downstream step.

In Ryan’s words, data observability is about “a holistic view” of your pipelines, helping you understand the “what, where, and why” of failures.
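To make the distinction concrete, here is a minimal Python sketch. It assumes a hypothetical `TaskRun` record per task (not an Airflow or Databand API) and a known expected schema for the ingested data. `monitor` only answers whether the run succeeded; `observe` walks the tasks in execution order and reports where the run broke and a likely reason, such as a schema change.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskRun:
    """One task execution in a DAG run (hypothetical record, not an Airflow API)."""
    task_id: str
    succeeded: bool
    error: Optional[str] = None


def monitor(runs: list[TaskRun]) -> bool:
    """Monitoring: only answers 'did the whole run succeed?'."""
    return all(r.succeeded for r in runs)


def observe(runs: list[TaskRun], expected_cols: set[str],
            actual_cols: set[str]) -> Optional[str]:
    """Observability sketch: report where the run broke and a likely why."""
    for r in runs:  # runs are assumed to be listed in execution order
        if r.succeeded:
            continue
        missing = expected_cols - actual_cols
        if missing:
            # The ingested data no longer has columns that downstream tasks expect.
            return f"{r.task_id} failed: likely schema change, missing {sorted(missing)}"
        return f"{r.task_id} failed: {r.error or 'unknown error'}"
    return None  # every task succeeded


runs = [
    TaskRun("ingest_raw", True),
    TaskRun("transform", False, error="KeyError: 'customer_id'"),
    TaskRun("publish", False, error="upstream failed"),
]
print(monitor(runs))  # False
print(observe(runs, {"customer_id", "amount"}, {"cust_id", "amount"}))
# transform failed: likely schema change, missing ['customer_id']
```

A real observability tool draws on much richer metadata (logs, lineage, data profiles); the sketch only illustrates the shift from a yes/no status to a located, explained failure.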

Key Takeaways for Engineers Getting Started with Data Observability

For engineers looking to start their data observability journey, Ryan offers some practical advice:

Identify Critical Pipelines: Start by identifying the most critical pipelines in your data stack and set up simple alerts around them. This could include state alerts (is the pipeline running or not?) and duration alerts (is it taking longer than expected to complete?).
Set Proactive Alerts: Instead of reacting to problems after they happen, set up proactive alerts that can notify you of potential issues before they cause a full-blown outage. This could include schema changes, pipeline latency, or unusual data trends.
Look Beyond Monitoring: Don’t stop at basic monitoring. Implement tools that give you end-to-end visibility into how your data pipelines are behaving, and more importantly, tools that help you identify the root causes of failures.
Make Use of Modern Tools: Tools like Databand integrate easily with existing data stacks, whether you're using Airflow, Spark, Snowflake, or others. This makes it easier to get started without having to overhaul your entire infrastructure.
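The state and duration alerts from the first tip, plus the schema-change and unusual-volume checks from the second, can be sketched as simple rules evaluated against a single run summary. The `PipelineRun` shape, the thresholds, and the three-sigma z-score rule for flagging unusual row counts are illustrative assumptions, not how Databand or any specific tool implements these checks.

```python
from __future__ import annotations

import statistics
from dataclasses import dataclass


@dataclass
class PipelineRun:
    """Summary of one pipeline run (hypothetical shape, not a Databand or Airflow API)."""
    pipeline: str
    state: str          # e.g. "success" or "failed"
    duration_s: float
    columns: set[str]
    row_count: int


def check_run(run: PipelineRun, expected_columns: set[str], max_duration_s: float,
              past_row_counts: list[int], z_threshold: float = 3.0) -> list[str]:
    alerts = []
    # State alert: is the pipeline running successfully or not?
    if run.state != "success":
        alerts.append(f"STATE: {run.pipeline} is {run.state}")
    # Duration alert: is it taking longer than expected to complete?
    if run.duration_s > max_duration_s:
        alerts.append(f"DURATION: {run.duration_s:.0f}s > {max_duration_s:.0f}s")
    # Schema-change alert: columns added or removed relative to the expected schema.
    added = run.columns - expected_columns
    removed = expected_columns - run.columns
    if added or removed:
        alerts.append(f"SCHEMA: added={sorted(added)} removed={sorted(removed)}")
    # Volume alert: row count far outside recent history (simple z-score).
    if len(past_row_counts) >= 2:
        mean = statistics.mean(past_row_counts)
        stdev = statistics.stdev(past_row_counts)
        if stdev > 0 and abs(run.row_count - mean) / stdev > z_threshold:
            alerts.append(f"VOLUME: {run.row_count} rows vs. mean {mean:.0f}")
    return alerts


run = PipelineRun("orders_daily", "success", 5400,
                  {"order_id", "amount", "currency"}, 1200)
for alert in check_run(run, {"order_id", "amount"}, max_duration_s=3600,
                       past_row_counts=[10_000, 10_200, 9_800, 10_000]):
    print(alert)
```

In practice you would tune thresholds per pipeline and route alerts to a channel your team actually watches; a z-score is just one simple choice for spotting unusual volumes.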

Observability Is a Team Effort

One of the key points Ryan emphasizes is that observability isn’t just the responsibility of data engineers. It involves collaboration between different teams, from data scientists to operations and finance teams. Observability tools help break down these silos by giving everyone the same level of visibility into the data and pipelines.

For example, with Databand, both engineers and analysts can see how a failure in one part of the pipeline affects downstream processes, such as dashboards or machine learning models. This shared visibility helps teams collaborate more effectively and resolve issues faster.
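Seeing how a failure propagates downstream rests on lineage: knowing which assets consume which. Here is a minimal Python sketch, assuming a hypothetical hand-written lineage map (real tools collect this metadata automatically), that finds every asset affected by a failure with a breadth-first traversal.

```python
from __future__ import annotations

from collections import deque

# Hypothetical lineage: each asset maps to the assets that consume it.
LINEAGE = {
    "raw_orders": ["stg_orders"],
    "stg_orders": ["fct_revenue", "churn_features"],
    "fct_revenue": ["finance_dashboard"],
    "churn_features": ["churn_model"],
    "churn_model": ["retention_dashboard"],
}


def downstream_impact(failed: str, lineage: dict[str, list[str]]) -> list[str]:
    """Return every asset reachable from the failed one, in breadth-first order."""
    seen = {failed}
    queue = deque([failed])
    impacted = []
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:  # guard against revisiting shared descendants
                seen.add(child)
                impacted.append(child)
                queue.append(child)
    return impacted


print(downstream_impact("stg_orders", LINEAGE))
# ['fct_revenue', 'churn_features', 'finance_dashboard', 'churn_model', 'retention_dashboard']
```

Both the finance dashboard and the ML model show up as impacted, which is exactly the shared picture that lets engineers and analysts coordinate on a fix.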

Final Thoughts: The Future of Data Observability

As data engineering becomes more complex, the need for reliable, proactive observability tools will only grow. Ryan and I agreed that data observability is more than just a trend; it’s becoming a necessity for any organization that wants to ensure the reliability and accuracy of its data.

Whether you’re just starting out as a data engineer or you’re part of a mature team, investing in data observability tools like Databand can help you ensure that your data pipelines are resilient, reliable, and ready for whatever challenges lie ahead.

Plumbers of Data Science Podcast

By the way, this blog post is based on a Plumbers of Data Science podcast Hero Talk episode. You can watch the complete conversation with Ryan on YouTube here, or listen to it on all major podcast platforms here.

Best,

Andreas

🍀

Read my free 80+ page Data Engineering Cookbook on GitHub: Read the Cookbook

Follow me on: LinkedIn | Instagram | X (Twitter) | YouTube

Learn Data Engineering at my Data Engineering Academy, trusted by over 1,500 students 💪: Click here to learn more

The Data Engineering Insider

Discussion about this post