The no-nonsense take on data engineering — part 1

The role, the skill gap in the job market, the challenges around knowledge transfer. No sugarcoating.

Aleksander Kruszelnicki
7 min read · Aug 27, 2021
Photo by christian buehner on Unsplash

We caught up with Peter Fabian and Daniel Molnar, founders of Pipeline Data Engineering Academy, to learn about the first data-engineering-oriented boot camp out there. We learned much more than that, getting great overall insights into the most in-demand profession in data.

This is part 1 where we dive into the data engineering role itself.

Aleks: So tell me about yourselves and why you started Pipeline Data Engineering Academy.

Daniel: I’ve been involved in data for twelve years and in start-ups for twenty. During that time, I’ve worn different hats, such as data analyst, data scientist, data engineer, and product owner. But I kept returning to data engineering because I feel that’s where you can add a lot of value. When I was trying to hire for open positions on my teams, I realized that it’s super hard to find data engineers, especially junior ones. So I decided to team up with Peter, who has experience in product and service development in the digital communication and innovation space.

Peter: Like Daniel, I’ve worn different hats in the past. I’m a teacher and economist. Over the last ten years, I’ve seen the need for people who have the right skills and competence to deal with data in the new products that are being built, whether these are hardware products with sensors or digital interfaces where user content is being generated. It’s been very exciting over the last five or six years to see how the roles that work with data have developed. Data is at the forefront of innovation, as it generates tremendous value for an organization. There’s a huge demand for data engineers right now, but companies can’t fill their vacancies.

Aleks: The struggle is real. Finding data engineers at the moment is an incredible problem. It makes our life a little bit easier because, as a two-person agency, we can’t complain about a lack of work. A lot of companies are turning to alternatives to hiring on a full-time basis.

Peter: So Daniel and I figured there was a need for a way of teaching data skills. And this is how the boot camp idea was born. It is important to understand that the reason why Pipeline Academy exists is the demand we felt from the market: the challenge not just to hire people, but to hire people with the right skill-set. We wanted to focus on the engineering part of the data puzzle as it is incredibly underserved from the education perspective.

Aleks: Speaking of shortages, I ran some quick LinkedIn searches and found that in Europe alone there were 48,000 open data engineer positions, yet only 32,000 people with that title who have at least one year of experience in the field. There’s an even bigger gap in the US. Why do you think this is?

Daniel: I think the reason is executives’ incompetence. I mean, it’s very simple. They are in charge of allocating the budgets, figuring out the challenges, and making plans to address them. And then all of a sudden it’s like: “It’s snowing, we didn’t expect it.” It’s winter. Up until now, everyone was riding the “machine learning this, AI that” train, hiring tons of data scientists. But to do machine learning you need to cover the basics first. You need data and a reliable infrastructure to process and utilise that data. This is where data engineering comes into play. And that part was ignored.

Peter: I would agree with Daniel. I think there’s some confusion about what the role of a data engineer is. And the definition is challenging. It starts with asking what that person should do in my team. What are the executable tasks that end up adding value? Then you need to translate this into a job ad. And the reason it is hard is that it might differ from company to company. Besides the engineering part, it might include some analytics, maybe some data science stuff. Maybe it is much less engineering and more data modelling. And I also see that the hype around certain professions, tools, and programming languages distracts people from what’s useful and what creates value.

Aleks: Great segue. Let’s define, then, what data engineering is and what data engineers do.

Daniel: I believe data engineering is integration engineering, first and foremost. I mean, there are going to be systems around you, there are going to be service providers, cloud providers, infrastructure that is already in place, and you should try to figure out how to make sure that every piece of this massive puzzle can “talk” to other pieces when it’s needed. Besides that, I think it’s mostly about communication. It’s one of the most communicative engineering fields, I think, again, because of its nature of integrating things. If you are a front-end developer, for example, you’re very much focused on your domain. Data engineers should have more of a generalist approach. They should understand what’s happening in the front-end, the back-end, and around the infrastructure, which means being familiar with DevOps topics. They have to speak all the “languages”.

Aleks: So they “just” need to understand the whole landscape, and be able to talk to machines and humans?

Daniel: Yes. They need to understand data storage, data acquisition, data management, tracking, cloud providers, and much more. And they need to understand how data is passed between the machines and how humans interact with it. There’s no way around it if you want to be successful in the field. And these people don’t grow on trees…

Aleks: If they don’t grow on trees, where do they grow? What kinds of people do well in the data engineering role? How can you tell that someone has a knack for it?

Daniel: Great question. Data engineering is a blue-collar job. There, I said it.

Aleks: I like that. Very interesting perspective.

Daniel: It’s true. The best system administrator — who today you would call a data engineer — that I ever worked with had been a car mechanic. And I’ve seen someone coming from that background more than once. They simply like to solve problems. It’s more about the mindset. Poking here and there, patiently checking where things break or could break. And if something works, don’t touch it.

Peter: Data engineering is about having a commonsensical approach to problems. A good data engineer can sometimes solve a problem just with good old communication, without writing any code. Now, of course, a data table won’t move from A to B if you don’t write the code and don’t maintain that pipeline, but if you can solve it without new or additional code and understand that adding more complexity doesn’t generate value for anyone, you did a good job. Don’t build a bridge if there’s no need for a bridge. Getting things done and being results-orientated should define both data and software engineers.

Aleks: I couldn’t agree more. I think that building and maintaining the data infrastructure is essentially product development. And coming from that background as a product manager, I can vouch that there’s no greater satisfaction than cracking a problem without building anything.

Peter: It’s also important to understand the differences between a data analyst, a data scientist, and a data engineer. They are substantial and crucial to wrap your head around if you are considering a career move into the data field. We have a lot of free resources online on our blog that explain all this.

Daniel: Our lives today are not like 50 years ago, where you entered a profession and stayed in it for the rest of your working life. You now need to be constantly learning, especially if you work in data engineering. But if you invest time in learning, it will pay off.

Aleks: And how do you think the pandemic has affected data engineering jobs?

Peter: We have found that data engineering has been resilient to the pandemic because companies need to get data about their products and customers, even when revenues are low. Having people who are skilled at using data, whether in tech companies or non-tech companies, is more essential than ever. You mentioned it yourself: despite the lockdowns and the lay-offs they caused, there’s no lack of demand for the role. There’s a significant shortage in supply though. That’s where we can help. We want to help people take advantage of the opportunities that are out there.

Aleks: I know you are tool-agnostic, which is something I not only respect but also really like about you two. Regardless, I will ask about the data engineering toolkit. Mostly because the landscape is massive and I believe you can offer a piece of advice to make it easier to navigate. What are the tools that you think are here to stay?

Daniel: Great question. A good rule of thumb when thinking about it is this: when you pick a technology, check how long it has been around. You can then expect it will stay around for at least the same amount of time. Think about the QWERTY keyboard. We have had Unix for 50 years now, SQL for over 40 years and Python for 30. I expect them to stay here for another few decades.

Software these days is a bit like a religion. Everyone has an agenda. It’s all fine. You just need to evaluate whether you need that complexity in your architecture, especially when you don’t have the slightest idea why someone made their solution this way. And it also comes back to the fact that it has to be maintainable. You have to be able to understand it. If you have a simple and dumb solution that’s not elegant, but it does the job and you can handle it, that’s the good solution.
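
Daniel’s rule of thumb about technology age can be written down as a few lines of Python. This is only a minimal sketch of the heuristic as stated in the interview, not a forecasting method: the function names are made up for illustration, and the first-release years are commonly cited approximations that we filled in ourselves (the interview only gives rounded ages).

```python
# Approximate first-release years (our assumption; the interview only
# says "50 years", "over 40 years" and "30 years").
FIRST_RELEASE_YEAR = {"Unix": 1971, "SQL": 1974, "Python": 1991}


def expected_remaining_years(first_year: int, as_of_year: int) -> int:
    """Rule of thumb: expect a technology to stay around for at least
    as long as it has already existed."""
    age = as_of_year - first_year
    if age < 0:
        raise ValueError("first_year must not be later than as_of_year")
    return age


def rank_by_staying_power(release_years: dict, as_of_year: int) -> list:
    """Sort technology names from longest to shortest expected remaining life."""
    return sorted(
        release_years,
        key=lambda name: expected_remaining_years(release_years[name], as_of_year),
        reverse=True,
    )


if __name__ == "__main__":
    # 2021 is the year this interview was published.
    for name in rank_by_staying_power(FIRST_RELEASE_YEAR, 2021):
        years = expected_remaining_years(FIRST_RELEASE_YEAR[name], 2021)
        print(f"{name}: expect at least {years} more years")
```

Under the same rule, a tool released two years ago gets an expectation of only two more years, which is the point of the heuristic: age is a cheap first filter before you evaluate whether you need the extra complexity at all.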

Written by Aleksander Kruszelnicki

“A problem well stated is a problem half solved” — Charles Kettering | Bridging the gap between business and data | Co-founder of leukos.io
