The no-nonsense take on data engineering — part 2
The role, the skill gap in the job market, the challenges around the knowledge transfer. No sugar coating.
We caught up with Peter Fabian and Daniel Molnar, founders of Pipeline Data Engineering Academy, to learn about the first data-engineering-oriented boot camp out there. We learned much more than that, gaining great insights into what is currently the most in-demand field in data.
This is part 2 where we discuss Daniel’s and Peter’s approach towards skills transfer and how it shapes the boot camp. You can check out part 1 here.
Aleks: Let’s focus on the boot camp and how you teach data engineering. Can you tell us about your approach?
Peter: So before we got to teach, first, we had to create a curriculum. And we spent last year developing it. Data engineering is something like five years behind data science when it comes to educational opportunities. Whether you looked at university education or college education, or online courses, there was a lack of structured learning in data engineering. As we already established, defining what a data engineer is and what the skill set should look like was the actual first challenge.
Daniel: And we very much try to be incredibly hands-on. If someone walks in the door, they should be able to load a CSV file, create and drop a table, you know, do very practical stuff. It’s great if you can do TensorFlow in the cloud. You might not need it at your company at the moment. Maybe you will later, maybe you won’t. But right now, you have to make sure that your data pipelines are running. You have to maintain the code that runs in the production environment. And you should be able to talk about it with all the different stakeholders, such as analysts, salespeople, and product owners, who have different agendas. At the end of the day, somebody has to make sure that this whole thing is running. Peter very much likes this metaphor: it’s like civilization, where you turn the tap and hot water comes out. A lot of things are happening in the back to make that happen. You start shouting when it stops flowing. You wonder what’s going on. Well, something is going on. So I think we try to take this down-to-earth approach with our boot camp.
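As a rough illustration of the kind of practical exercise Daniel mentions (loading a CSV file, creating and dropping a table), here is a minimal sketch in Python using only the standard library's `csv` and `sqlite3` modules. The table name `orders`, its columns, the helper function names, and the sample rows are all made up for illustration; they are not taken from the Pipeline Academy curriculum.

```python
import csv
import io
import sqlite3

# Hypothetical sample data; a real exercise would read a file from disk.
SAMPLE_CSV = "id,name,amount\n1,alice,10.5\n2,bob,7.25\n3,carol,3.0\n"


def load_csv(conn, csv_text, table="orders"):
    """Create the table if needed, load the CSV rows, and return the row count.

    The table name is interpolated into the SQL, so it must come from
    trusted code, never from user input.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # Convert the text fields to proper types before inserting.
    rows = [(int(r["id"]), r["name"], float(r["amount"])) for r in reader]
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} (id INTEGER, name TEXT, amount REAL)"
    )
    conn.executemany(
        f"INSERT INTO {table} (id, name, amount) VALUES (?, ?, ?)", rows
    )
    conn.commit()
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


def table_exists(conn, table):
    """Return True if a table with this name exists in the database."""
    cur = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    )
    return cur.fetchone() is not None


def drop_table(conn, table="orders"):
    """Drop the table; do nothing if it is already gone."""
    conn.execute(f"DROP TABLE IF EXISTS {table}")
    conn.commit()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")  # throwaway in-memory database
    print("rows loaded:", load_csv(connection, SAMPLE_CSV))
    drop_table(connection)
    print("table still exists:", table_exists(connection, "orders"))
```

An in-memory SQLite database keeps the sketch self-contained; in practice the same steps would target whatever database the pipeline actually uses.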
Aleks: How did you decide on the curriculum?
Peter: On the one hand, there is the traditional software engineering perspective: working in teams and building software. And then you have what Daniel calls the Depeche Mode mindset, meaning what is trendy and hyped. Two months ago, there was a new hot thing on the market and two months from now, there will be another new thing. So data engineers are expected to be up-to-date. However, they also need to know that the tool that they’re using today might be gone three years from now. So it’s always about understanding trade-offs, selecting what’s timeless and what works. At the same time, you need to deal with the latest stuff, because companies are trying it out. So this is the balance that you have to strike.
Aleks: I can see that’s a tricky balance to strike. How do you walk a fine line like that?
Peter: Our goal should be to give our students or participants a compass instead of a map because maps are easily outdated. And in this data tooling and data methodology landscape, a map might be outdated in two weeks, or a year. But if you have a compass, if you have an understanding of how to use it and how to navigate that ever-changing landscape to create value, to create maintainable systems and infrastructure, that’s what should be your guiding light.
And this was the main challenge when we defined our curriculum, or syllabus, and the whole approach to it. Our main goal with Pipeline Academy is to give people an opportunity to find a job as data engineers, or to help those working as, say, data scientists, data analysts, or even product people understand the data domain better. For example, we see that there’s a huge need for product owners and product managers who understand the data engineering process or the fundamentals of infrastructure.
Aleks: There’s one staple part of the curriculum I would like to touch upon, namely sustainability. Can you elaborate on this?
Peter: Something that’s unique to us is the fact that we include sustainability as a vital part of the data engineering process. The example that I usually use is that if you’re an architect today, in 2021, you’re expected to know about the latest materials, the latest technologies, and how to build a building that is efficient and doesn’t harm the environment. It’s a basic thing. But for some reason, we have often ignored these factors when it comes to software and engineering. If I think about the European Union putting out, I don’t know, a tender in 2025 for a company to work with or get some funding, they’re going to ask how efficient your data infrastructure is. How much energy does it use? How can you use less and reduce the environmental impact? We teach our students about this. If I have to decide between two cloud providers, I’m going to take the one that uses less energy. Data infrastructure can also easily become a big cost driver. You have to understand that every decision you make about infrastructure will have an ecological and social impact.
Aleks: This is very true.
Peter: Just to give you an example, if you don’t write maintainable code and you leave a company, the next person who comes there will have the so-called “spaghetti code” on their plate to deal with. It’s not fun. It’s important to work in a manner that can be understood by others so that your weekends are not spent figuring out what this person did with the code. And there are various other social aspects of dealing with data, of course, such as how you deal as a responsible company with other people’s data.
Aleks: And how is the boot camp structured? How long does it take to go from “0” to enough applicable knowledge?
Peter: There’s a primer we very strongly recommend students do before the boot camp starts. The boot camp itself runs for twelve weeks and it’s designed for a group of no more than twelve people. We set ourselves quality standards. We want to be able to spend enough time with each of our students and be able to personalize the experience the right way for them. Last year, of course, online education had an amazing boom because of the pandemic and the time that people could spend at home. And in 2021, you’re seeing the results: the outcomes and the success rates in online education haven’t changed. People who are on Kurzarbeit [a scheme in Germany in which an employee’s hours are cut and the government pays the salary difference to the employee], or who are in between jobs, can receive a voucher, the so-called Bildungsgutschein, to join our boot camp, basically free of charge. So the federal government takes care of the funding. I think this is a tremendous opportunity for people who are in this situation to learn something very valuable on the job market right now. The course is still fully online, so the dynamic is different from a classroom experience. But I love it. It’s amazing to see people and how they grow and how they learn. Of course, we’re going to keep our curriculum updated because the tooling and the landscape are changing, and we want to be able to respond to the market.
Aleks: Are you planning to scale the boot camp up and increase the number of students in each class?
Peter: I’m reluctant to just scale up and say that, okay, from now on, we’re going to have 40 people, let’s say, with some increased staff and more teaching assistants. I don’t want to jeopardize the students’ learning experience. We’re trying to focus way more on quality than on quantity. I don’t expect that online education will go away for Pipeline Academy. I just want to find the right balance between meeting people and catering for those who are not necessarily in our geographical region or even in our time zone. In January and February, we did some mentoring work with some people in Silicon Valley, and they understood how acquiring data engineering skills could bump up your salary to six figures. The demand is just insane and the market there values data engineering very differently when it comes to actual salary.
Aleks: Going back to online education, I couldn’t help but notice a massive spike in people posting their newly finished courses through various online educational platforms. What is your take on certifications vs practical skills and knowledge transfer?
Peter: What we are witnessing working in education is that there are two distinct businesses within this realm. There’s an education business where people transfer skills and knowledge from one person to the other person. And there is also the certification business where people want to have not just approval, they also want to have another “star” on LinkedIn. And you see that these two businesses are often not related to each other. You provide some seal of approval after an educational process, but often only the final certificate is required, not what you’ve learned. We’re super hands-on with the boot camp. It’s not solely focused on theory. You have to have some theory. But the boot camp’s very much about exploring, writing code, failing, learning from that, and then succeeding.
Aleks: There’s one thing, though, about certification that I want to ask, and it also ties in a little bit to what we said about management earlier. How do we demonstrate to people who are not data people our ability to do the job that they are asking us to do?
Daniel: I have two perspectives. Let’s say that you have to restructure your company in terms of teams because of the pandemic. You need to make some tough decisions and you want to make sure that you keep the people that bring the most value. Put your marketing team on Kurzarbeit for a month or put your data engineers on Kurzarbeit for a month and then see what happens.
Aleks: So you’re saying that way a manager would be able to see the value data engineers bring to a company?
Daniel: Exactly. The other thing I have been discussing with Peter lately is being able to present some kind of a project portfolio based on which people outside the data realm can gauge the skills. We are still working out the “what” and “how”, but at the end of the day, you need to be able to demonstrate you can ship a data product.
Aleks: Your third and final cohort of 2021 starts in October. What is the best way to follow you or to get in touch in case someone would like to grab one of the last remaining seats?
Peter: I encourage everyone to check out www.dataengineering.academy, our website. We’re active on social media and we also produce podcasts. If you’re interested in data engineering, check out the blog on our website. We try to share as much with the community as possible. Even if you are not interested in joining the boot camp, you will find recommendations for good reads and online courses we’ve tested. We try to share insights about the job market, what data engineering is about, and which new things we consider important. And if you want to talk to us, we’re trying to make ourselves as accessible as possible. So every first Tuesday of the month, we have a virtual open house on Zoom and everybody can join to ask questions.
Aleks: Thank you for such a fascinating insight into the impactful and meaningful work you’re doing.