Why Adding Azure and Databricks to Your Skill Set Is Super Important These Days
Understand the importance of Azure and Databricks and get to know a great guy who can teach you how to work with them
Hey everyone! A few days ago I had an exciting chat with Nnaemezue Obi-Eyisi. Mezue is a seasoned Data Engineer with almost 10 years of experience. He comes from an electrical engineering background and holds a master’s degree in Data Science. Currently, Mezue works as a Senior Data Engineer at Capgemini, where he specializes in Databricks.
Having worked across various industry domains, Mezue is driven by a strong desire to teach, mentor, and guide individuals into the field of Data Engineering. That’s why he created his intensive bootcamp ‘Azure Data Engineering with Databricks’. Let’s dive into his insights on Data Engineering and education in the following interview!
Andreas: Hello Mezue, great to have you here. Let's start with an introduction. Can you tell us a bit about yourself, what you do, and where you're from?
Mezue: Sure, my name is Mezue and I'm a Senior Data Engineer at Capgemini, a top IT consulting firm. I specialize in the Databricks Practice Center of Excellence. I'm originally from Nigeria, which you might guess from my accent. I am currently based in Houston, Texas in the US. I enjoy teaching Data Engineering and working in the field.
Andreas: What do you like to do in your free time?
Mezue: In my free time, I enjoy working out and dancing. I actually started dance classes a few months ago.
Andreas: Do you have a dance partner?
Mezue: Not yet, I just practice by myself for now.
Why Azure and Databricks should be part of every Data Engineer’s skill set
Andreas: You mentioned Databricks and Azure. Could you explain what those are?
Mezue: Sure. Let's start with Azure. Azure is a cloud computing platform built by Microsoft that helps a lot of companies maintain their applications and data analytics all in the cloud. It's a service model that lets businesses scale efficiently without worrying about the upfront investment of building their own data centers or infrastructure. As for Databricks, it's one of the most popular computing and data processing platforms in the Azure space. It's built on Spark and is used as a distributed framework to process and analyze data, helping businesses get valuable insights. Besides Data Engineering, it also supports Machine Learning and AI use cases.
Andreas: Why is this such an important skill to have? Can you give a few examples?
Mezue: It's really important because, take the healthcare system for example, they use data warehouses to keep track of patient stays in hospitals and to assist their finance teams in processing claims and managing payments. As a Data Engineer, it's crucial that we can use these skills to support clients like healthcare systems because they need to process a lot more data nowadays.
With the advent of wearable devices, there's more data being generated, and traditional on-prem data warehouses cannot handle this. That's why we need cloud-based tools like Databricks to help these businesses migrate their data analytics to a scalable infrastructure that can handle large amounts of data and also perform more advanced tasks like Machine Learning to predict diseases or other metrics that businesses need to track.
Andreas: How would you describe your experience with Azure so far?
Mezue: My experience with Azure has been very versatile. It offers so many different tech stacks that can help any business gain valuable insights on their data. In the Data Engineering space, I have worked on building pipelines using tools like Data Factory, which is a highly configurable ETL tool that helps you create a framework rather than doing tedious work.
For example, as clients migrate to the cloud, they often need to move large amounts of data, such as hundreds of tables to their cloud data analytics platform. As a Data Engineer, you wouldn't want to create hundreds of different pipelines, which would be too tedious. With a tool like Data Factory, we make it configurable so that once we build it, all we have to do is update a configuration file with the table names we want to pull, and it can automatically pull all that data for us without much change to the original pipeline.
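The configuration-driven idea Mezue describes can be sketched in a few lines of plain Python. This is only an illustration of the pattern, not Data Factory itself: the in-memory SQLite databases stand in for a source system and a cloud target, and the names `CONFIG`, `copy_table`, `run_pipeline`, and `build_demo_source`, as well as the demo tables, are hypothetical.

```python
import json
import sqlite3

# Hypothetical configuration: in practice this could live in a JSON file
# or a control table. Adding a table means editing only this list.
CONFIG = json.loads('{"tables": ["patients", "claims"]}')


def copy_table(source, target, table):
    """Copy one table with the same generic logic, whatever its columns are."""
    cursor = source.execute(f'SELECT * FROM "{table}"')
    columns = [description[0] for description in cursor.description]
    rows = cursor.fetchall()
    column_list = ", ".join(f'"{column}"' for column in columns)
    placeholders = ", ".join("?" for _ in columns)
    target.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({column_list})')
    target.executemany(
        f'INSERT INTO "{table}" ({column_list}) VALUES ({placeholders})', rows
    )
    target.commit()
    return len(rows)


def run_pipeline(source, target, config):
    """Run the one generic copy step for every table named in the config."""
    return {table: copy_table(source, target, table) for table in config["tables"]}


def build_demo_source():
    """Create a small in-memory source database (made-up demo data)."""
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE patients (id INTEGER, name TEXT)")
    source.execute("CREATE TABLE claims (id INTEGER, patient_id INTEGER, amount REAL)")
    source.executemany("INSERT INTO patients VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
    source.executemany("INSERT INTO claims VALUES (?, ?, ?)", [(10, 1, 250.0)])
    source.commit()
    return source


if __name__ == "__main__":
    demo_source = build_demo_source()
    demo_target = sqlite3.connect(":memory:")
    print(run_pipeline(demo_source, demo_target, CONFIG))
```

The point of the design is that `copy_table` never mentions a specific table: pulling a hundred more tables means adding a hundred names to the configuration, not writing a hundred pipelines.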
The core pillars of Azure Data Engineering
Andreas: What are some core pillars for everyone who wants to get into Azure Data Engineering?
Mezue: To get into Azure Data Engineering, you really need to understand some fundamentals. First, you have to master fundamental programming languages. My favorites are SQL and Python. SQL is the language for all databases; literally, all databases speak SQL. Not only databases, but other processing engines in data warehousing, like Snowflake or Databricks, also use SQL to process data. Python is a highly effective programming language for building data transformation logic and controlling complex transformations.
Another pillar is learning about data modeling and data warehousing. This topic is really important because businesses need an easy way to analyze their data. When they're analyzing data, there are certain core principles that need to be in play. You need to know how to model data in a dimensional way, like using a star schema, and you need to learn about Kimball data modeling and some other best practices about slowly changing dimensions. I don't want to get too technical, but these principles help businesses keep a historical track of their data and be able to answer questions about what happened in the past.
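To make the idea of keeping a historical track more concrete, here is a minimal sketch of one common variant, a Type 2 slowly changing dimension, in plain Python. Mezue does not name a specific type in the interview, so choosing Type 2 is an assumption, and the column names (`patient_id`, `city`, `valid_from`, `valid_to`, `is_current`) and both functions are hypothetical.

```python
from datetime import date


def apply_scd2(dimension, incoming, effective):
    """Apply a Type 2 change: close the current row and append a new version."""
    current = next(
        (
            row
            for row in dimension
            if row["patient_id"] == incoming["patient_id"] and row["is_current"]
        ),
        None,
    )
    if current is not None and current["city"] == incoming["city"]:
        return dimension  # Nothing changed, so no new version is needed.
    if current is not None:
        # Keep the old row, but mark where its validity ends.
        current["valid_to"] = effective
        current["is_current"] = False
    dimension.append(
        {
            "patient_id": incoming["patient_id"],
            "city": incoming["city"],
            "valid_from": effective,
            "valid_to": None,
            "is_current": True,
        }
    )
    return dimension


def city_as_of(dimension, patient_id, when):
    """Answer a question about the past: which city applied on a given date?"""
    for row in dimension:
        if row["patient_id"] != patient_id:
            continue
        ends = row["valid_to"]
        if row["valid_from"] <= when and (ends is None or when < ends):
            return row["city"]
    return None


if __name__ == "__main__":
    dim = []
    apply_scd2(dim, {"patient_id": 1, "city": "Houston"}, date(2022, 1, 1))
    apply_scd2(dim, {"patient_id": 1, "city": "Austin"}, date(2023, 6, 1))
    print(city_as_of(dim, 1, date(2022, 12, 31)))
    print(city_as_of(dim, 1, date(2023, 6, 1)))
```

Because the old row is closed rather than overwritten, the dimension can still tell you that the patient was recorded in Houston at the end of 2022, even though the current value is Austin.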
Then, another core is ETL, which stands for Extract, Transform, and Load. We do that all the time as Data Engineers. We're extracting data from a source, many sources, be it a SQL database, a web API, or a CSV file, text file, whatever it is. We have to extract the data, store it in a central repository that can be centralized for analytics, and then transform that data using the tools I talked about like Databricks and SQL-based tools. Another pillar is Azure cloud experience. Since we are working in the cloud, you need to understand certain principles about working in the cloud.
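The extract, store, and transform flow described above can be sketched as follows. This is a toy stand-in under stated assumptions: an in-memory SQLite database plays the central repository, a CSV string and a Python list play the CSV file and web API sources, and all names and numbers (`CSV_SOURCE`, `API_SOURCE`, `raw_stays`, `stay_summary`) are made up for illustration.

```python
import csv
import io
import sqlite3

# Made-up sources: a CSV export and a stand-in for a web API response.
CSV_SOURCE = "patient_id,days\n1,3\n2,5\n1,2\n"
API_SOURCE = [{"patient_id": 2, "days": 1}]


def extract():
    """Extract rows from both sources into one common shape."""
    reader = csv.DictReader(io.StringIO(CSV_SOURCE))
    rows = [(int(record["patient_id"]), int(record["days"])) for record in reader]
    rows += [(record["patient_id"], record["days"]) for record in API_SOURCE]
    return rows


def load_raw(conn, rows):
    """Store the extracted data unchanged in the central repository."""
    conn.execute("CREATE TABLE raw_stays (patient_id INTEGER, days INTEGER)")
    conn.executemany("INSERT INTO raw_stays VALUES (?, ?)", rows)


def transform(conn):
    """Transform the stored data with SQL into an analytics-ready table."""
    conn.execute(
        "CREATE TABLE stay_summary AS "
        "SELECT patient_id, SUM(days) AS total_days "
        "FROM raw_stays GROUP BY patient_id"
    )
    return conn.execute(
        "SELECT patient_id, total_days FROM stay_summary ORDER BY patient_id"
    ).fetchall()


def run_etl():
    """Run extract, store, and transform end to end."""
    conn = sqlite3.connect(":memory:")
    load_raw(conn, extract())
    return transform(conn)


if __name__ == "__main__":
    print(run_etl())
```

Note that the raw data lands in the repository before any transformation happens, which mirrors the order Mezue gives: extract, store centrally, then transform.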
Then finally, Databricks that I've spoken a lot about. It's hugely important because it gives you the foundation for distributed processing, which is super important in this day and age because of the massive volume of data, how frequently we're getting the data, and the different structures of data that we have to process.
Andreas: What are your prospects as someone who has these skills?
Mezue: Once you have the skills, some of the prospects are, number one, you can get a very good high-paying Data Engineering job. Data Engineering is one of the highest-paid fields in the tech industry. Junior Engineers, at least in the USA, make at least $100K, six figures just starting. And not only do you get a big salary and financial benefits, but you also get a lot of flexibility. For a lot of jobs, because of the nature of the work you do as a Data Engineer, you don't necessarily have to be onsite or go to the office every single day. You could work from home, you can find fully remote jobs, you could be hybrid. It's very flexible.
And then another benefit I like the most is you can actually progress. When you start your journey and start as a Junior Data Engineer, within a year or two, you can really be a mid-level or senior, and if you continue to advance, you will gain even more financial rewards for that. And it all depends on you, how much motivation you have, how much you're willing to put in the work. You definitely progress, the sky is your limit. That's why I love this field; it doesn't limit you, you can be your own boss in a way.
An Azure and Databricks curriculum to success
Andreas: That's true. I've also worked a lot remotely, and it's a great feature of the Data Engineering job. Now, for these highly motivated people you were mentioning, you are also teaching this. How are you doing that?
Mezue: Yes, I'm teaching them all these things I talked about, the core pillars. Normally, I start with the SQL and Python review because my expectation is as someone that's trying to learn how to be a Data Engineer, you need to have some technical background. One of the best places that you have to prep yourself is in learning some basics on SQL and Python. That way, when you come into my program, I can take you to the more advanced concepts.
So, I will start with teaching you that SQL and Python review, making sure you understand, you internalize the core fundamentals that are really important for the beginning use cases. And then I start reviewing some of the cloud fundamentals because I know that a lot of people might not know what exactly is the benefit of the cloud, what's cloud computing. Some of the important concepts to learn are how do we deal with the network inside of the cloud, how do we deal with access control, how do we deal with the security of our data, what are the key benefits of the cloud, what are the different types of service offerings in the cloud. So, these are things that you get to learn.
And we also start diving into the core part of ETL, which is where we're going to learn a tool called Data Factory. It's one of the best tools to learn as a beginner because it really does a lot for you. Not only is it very versatile in connecting to multiple hundreds of data sources, but we also learn to use it to build frameworks that will help you make your life super easy in the real world to ingest as much data as possible without really spending a lot of time doing that.
Then we will also learn the data warehousing concepts, some basics of the data modeling I talked about, which really help you get that grounding in what it means to build an analytics platform.
And we're going to have a lot of time that we spend with Databricks because Databricks is huge in the industry and it has a lot of use because that's where a lot of the processing, a lot of the complexity comes in. We learn from fundamentals around Spark, we learn about using the Databricks features, how it can help us process all sorts of data - structured, semi-structured, process streaming data - and really make our pipelines more efficient.
And finally, I'll give you a capstone project where you can use all the things that we have learned. This capstone project is based on personal real-world experience. I'm drawing on my own experience from over 10 years to compile a really great capstone project that, once you work on it, gives you a real feel for what it is to be a Data Engineer in the real world. And after that, I'll support you with the resume, like preparing your CV, and give some interview tips.
Job-ready in a few months
Andreas: How long does this run, and what's the setup? Is it one-on-one coaching for a few weeks, or how is it structured?
Mezue: The coaching approach is a form of boot camp that runs for a minimum of three months. But as we're working on the content right now, it's actually looking to be closer to four or five months because it's going to cover a lot of material. That way, when you are finished, you are ready to hit the ground running. You are ready to start working in the industry and providing value to your client.
Andreas: And they're getting one-on-one help from you?
Mezue: Exactly, you get one-on-one help from me. I'll always be there. We have multiple sessions every single week, and we have office hours, a working session where you can work on the projects and the weekly assignments that I give, and I'll be there to answer any questions. But it's also for you to collaborate with your teammates and work on learning the materials.
Andreas: Awesome. Where can people find you, where can people find the boot camp?
Mezue: People can find this boot camp on my website. They can also follow me on LinkedIn, I'm very active there. You can follow my Medium blog as well, where I write a lot of great articles that help even experienced Data Engineers with the difficulties in their day-to-day tasks. I'm also working on my YouTube channel, so that will come soon to help people with my videos as well.
Andreas: All right, I'm going to share your LinkedIn, your boot camp website, and also your YouTube channel here so they're available to everybody.
Mezue: Thank you so much, Andreas. It was a pleasure to be here.
Start Your Azure Data Engineering Journey Now
Already on fire to get into Azure Data Engineering with Databricks? Then get in touch with Mezue on LinkedIn or check out his teaching platform. With his well-founded boot camp you will learn how to become an Azure Databricks Engineer and acquire highly in-demand skills for a successful career.
You can also watch the complete interview recording with Mezue and me on YouTube.
By the way, Mezue is part of my newly established Turbo Mentors program that I offer in my Academy. Here I train the next generation of online coaches. You want to be part of it? Learn more here!
🍀
Read my free 80+ page Data Engineering Cookbook on GitHub: Read the Cookbook
Follow me on: LinkedIn | Instagram | X (Twitter) | YouTube
Learn Data Engineering at my Data Engineering Academy, trusted by over 1,500 students 💪: Click here to learn more