As an expert and coach for Data Engineering, I get asked a lot about Python skills for Data Engineers. Many of my students, and also potential students, get in touch with me via LinkedIn or email and ask: "Andreas, what skills do I need to work efficiently with Python?"
Our Python Introduction Course
We have a course in the Academy about the Python fundamentals, perfect for someone who has never really coded before. It’s called “Introduction to Python” and is taught by Amit Jain.
Here, Amit goes through the absolute basics: how to create a main.py, mathematical expressions, strings, variables, loops, functions, lists and tuples, dictionaries and sets. He also briefly dives into reading and writing JSON and CSV files. So as you can see, the course covers the bare minimum you need as a Python beginner.
Why I Think a Basic Course Is Not Enough
In my opinion, this basic course is not enough for those people who get in touch with me. And I didn’t like the way I had to interact with them.
When only the basics course was available in the Academy, I had to send people to external sources for further information, with no specific path at hand. It’s like sending them down a rabbit hole where they spend a lot of time on things they don’t really need, instead of focusing on what matters most for becoming a data engineer.
That is why I created a course that actually helps them and answers their questions: “Python for Data Engineers”. With this course I wanted to dig a bit deeper into what a Data Engineer actually does and which Python tools they actually need and use frequently. There are a few things on the list.
The Skills You Need
First of all, connecting to outside sources such as databases and APIs. Connecting to databases is, I think, extremely important: psycopg2 for Postgres databases, and also the standard MySQL packages. And then requests, for sending data to APIs or communicating with APIs in general.
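To give you a feeling for the database part, here is a minimal sketch of the connect, write, query, close pattern. It is an assumption-laden illustration, not code from the course: I use Python's built-in sqlite3 module as a stand-in so it runs without a database server, and the table name `readings` and the function `load_and_count` are made up. psycopg2 follows the same DB-API style (connection, cursor, execute), with `%s` placeholders instead of `?`.

```python
import sqlite3


def load_and_count(rows):
    """Insert (sensor, value) rows into a hypothetical table and return the row count."""
    # sqlite3 is a stand-in here; with psycopg2 you would call
    # psycopg2.connect(...) with your own connection details instead.
    conn = sqlite3.connect(":memory:")
    try:
        cur = conn.cursor()
        cur.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
        # Parameterized inserts keep values out of the SQL string itself.
        cur.executemany(
            "INSERT INTO readings (sensor, value) VALUES (?, ?)", rows
        )
        conn.commit()
        cur.execute("SELECT COUNT(*) FROM readings")
        return cur.fetchone()[0]
    finally:
        # Always release the connection, even if a statement fails.
        conn.close()


if __name__ == "__main__":
    print(load_and_count([("a", 1.5), ("b", 2.0)]))
```

The important habit is the same for every driver: pass values as parameters and close the connection when you are done.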
Then comes the whole processing part, like processing JSONs, reading in strings, making JSON objects out of them, and also JSON validation which is very important.
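As a rough sketch of what that looks like in practice, the snippet below turns a string into a JSON object and checks it against a tiny hand-written rule set. The field names and the helper `parse_and_validate` are hypothetical, and the checks are deliberately simple. For real schemas, a dedicated library such as jsonschema is one option.

```python
import json

# Hypothetical schema: required field name -> expected Python type.
REQUIRED_FIELDS = {"id": int, "name": str}


def parse_and_validate(raw):
    """Parse a JSON string and check that the required fields are present."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc

    if not isinstance(record, dict):
        raise ValueError("expected a JSON object")

    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            raise ValueError(f"missing field: {field}")
        if not isinstance(record[field], expected_type):
            raise ValueError(f"wrong type for field: {field}")
    return record


if __name__ == "__main__":
    print(parse_and_validate('{"id": 1, "name": "sensor-a"}'))
```

Rejecting bad records right at the entry point saves you from chasing strange errors further down the pipeline.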
Datetime is also a crucial topic, because most of the time you’re working with timestamps and knowing how to process them is key.
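Here is a small sketch of typical timestamp handling with the standard library: parse an ISO 8601 string, normalize it to UTC, and compute a duration. The timestamps and the helper name `to_utc` are made up for illustration.

```python
from datetime import datetime, timezone


def to_utc(timestamp):
    """Parse an ISO 8601 string that carries a UTC offset and convert it to UTC."""
    parsed = datetime.fromisoformat(timestamp)
    if parsed.tzinfo is None:
        # Without an offset we cannot know which moment in time is meant.
        raise ValueError("timestamp has no UTC offset")
    return parsed.astimezone(timezone.utc)


start = to_utc("2024-03-01T10:00:00+02:00")
end = to_utc("2024-03-01T09:30:00+00:00")
elapsed_minutes = (end - start).total_seconds() / 60

if __name__ == "__main__":
    print(start.isoformat())  # 2024-03-01T08:00:00+00:00
    print(elapsed_minutes)    # 90.0
```

Normalizing everything to UTC early makes comparisons and durations much less error-prone than juggling local times.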
NumPy is also part of the processing area and essential for mathematical topics.
But the most important thing is pandas. You are going to use it all the time, for example when reading JSON or tabular CSV files. With pandas, you can join tables on their rows and columns and modify them, for instance.
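To illustrate the kind of math I mean, here is a short sketch with made-up temperature readings. It assumes NumPy is installed and shows the core idea: you apply an operation to a whole array at once instead of looping over the values.

```python
import numpy as np

# Made-up readings in degrees Celsius.
celsius = np.array([20.0, 25.0, 30.0])

# Arithmetic is applied element by element, no explicit loop needed.
fahrenheit = celsius * 9 / 5 + 32

# A boolean mask selects only the values that match a condition.
warm = celsius[celsius > 22]

if __name__ == "__main__":
    print(fahrenheit.tolist())  # [68.0, 77.0, 86.0]
    print(warm.tolist())        # [25.0, 30.0]
    print(celsius.mean())       # 25.0
```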
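A minimal sketch of that workflow, assuming pandas is installed: read tabular CSV data, join it with a second table, add a column, and aggregate. All table and column names here are invented for the example, and the CSV is read from an in-memory string so the snippet runs without any files.

```python
import io

import pandas as pd

# Made-up CSV content; in practice this would be a file path or URL.
csv_text = (
    "order_id,customer_id,amount\n"
    "1,10,10.0\n"
    "2,11,20.0\n"
    "3,10,5.0\n"
)
orders = pd.read_csv(io.StringIO(csv_text))

customers = pd.DataFrame({"customer_id": [10, 11], "name": ["Ada", "Grace"]})

# Join the two tables on their shared key column.
joined = orders.merge(customers, on="customer_id", how="left")

# Modify the table by deriving a new column from an existing one.
joined["amount_cents"] = (joined["amount"] * 100).astype(int)

# Aggregate: total amount per customer name.
totals = joined.groupby("name", as_index=False)["amount"].sum()

if __name__ == "__main__":
    print(joined)
    print(totals)
```

Reading, joining, deriving columns, and aggregating cover a large share of day-to-day transformation work.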
Get Right Into It
I think these are the main things that every data engineer needs: connecting to outside data sources like databases, talking to APIs and then transforming the data and/or processing the data.
And this is exactly what I put into the Python for Data Engineers course in my Academy, as a further addition to the introduction course. I think it is very helpful for people who want to grow within Data Engineering and develop their Python skills, so it will be worth working through this course.
Already on fire? Then get right into it at LearnDataEngineering.com!
Watch the Live Stream Recording on YouTube
In this live stream I went through the Python course content step by step. Perfect for everyone who wants to get a better overview.
🍀
Read my free 80+ page Data Engineering Cookbook on GitHub: Read the Cookbook
Follow me on: LinkedIn | Instagram | X (Twitter) | YouTube
Learn Data Engineering at my Data Engineering Academy, trusted by over 1,500 students 💪: Click here to learn more