How to hire a data engineer before Q5 2000-and-never

Look for data engineers where no one is looking

Aleksander Kruszelnicki
8 min read · Mar 17, 2021
Berghain, the infamous Berlin club, located in a former power plant. Photo by Simon Tartarotti on Unsplash

This is Part 2 of a three-part series. Part 1 focuses on a data stack definition in the SME context and the fundamentals of using data effectively and efficiently. I’d suggest starting there as it explains the key concepts, and provides a good introduction to the problem at hand:

SMEs struggle to go beyond Excel reporting quickly, intuitively, and reliably enough without spending a “fortune”. Data analysts are the ones that feel the heat as they can’t perform the job they were hired for in the best possible way.

In Part 2 we will address the following questions:

  • Why is data delivery a problem for a lot of SMEs?
  • How do you hire a data engineer when everyone else wants one too?

How SMEs end up in the data delivery pickle

Imagine you start a company (maybe you did). In the vast majority of cases, you won’t have any quantitative data to look at in the beginning. You will be making decisions based on qualitative data (e.g. talking to your customers) and your experience, i.e. gut feel, i.e. “calculated guess”.

Things are going well, you manage to get customers, you hire your first team members. You start collecting quantitative data. Some of it lives in spreadsheets, some of it in the different tools you use to help your business (e.g. Google Analytics, CRM, SaaS applications, etc.). If your business is tech-enabled, chances are you are building software in-house that also collects data.

Your data is all over the place, but this is OK at this stage because you’re still figuring things out. Your distribution channels, the product / service itself, processes and tools to support those, etc., keep changing frequently. So do your data points and data models as a result. You want to support your growing team’s reporting needs, hence you hire your first data analyst.

Fast forward and you either reached or are just about to reach product-market fit. This is where things start to get interesting. By now you probably have questions you can’t answer reliably or quickly enough with the current data setup in place.

If you want to keep driving your business forward at scale, you need to be able to answer them. If you are curious what those questions might be, a case study on one of our clients has real-life examples.

No need to panic. You have a data analyst or two, so you should be fine, right? Well, if your data is spread across multiple sources and difficult to extract then you’re not.

Data analysts are great with SQL, specifically when it comes to analysing data with it. However, to do that they need easy and reliable access to organised data.

Remember the data interpretation pillar from Part 1 of the series? You have people on your team that are experts in that. But your data delivery needs have become much bigger by now. You need to take a step back and address those.

You need someone who knows how moving data from A to B works beyond the conceptual level. How to do it repeatably, and as quickly and cost-efficiently as possible. How to “transform data into a useful format for analysis”.

You need a data engineer, or a person with an equivalent set of skills, to unlock a data analyst’s potential, and as a result to unlock the company’s data potential.

And this is where SMEs hit the wall. Taking that very first step towards having a reliable data infrastructure is freaking tough. You are perfectly aware you need a data stack and why you need it. But you don’t know how to go about putting one in place. Usually, it’s because at this stage you don’t have the people on the team with this particular set of skills (or you are unaware you do).

This sucks for the data analyst. They are unable to give their best due to something beyond their control. Can you imagine how frustrating this is? Been there, done that, and rest assured, it ain’t pretty.

This sucks even more for the whole company. To successfully compete these days you need to be able to extract value out of your data. Failing to do so can lead to, well, failing completely. Given the current pandemic situation, this is even more crucial: not in order to thrive, but just to survive.

Well, hire a data engineer… Duh?

There is an infamous techno club in Berlin called Berghain. It’s infamous because of its door policy and the fact that there is (almost) always a massive queue to get in. The wait can take a good couple of hours. The guarantee to get in? 0%. People still queue up and wait. Most of them will end up hearing the classic “Heute leider nicht” (“Unfortunately not today”).

How is that relevant? If you want to hire a data engineer today, you are in that queue and the data engineer is like Berghain. The only difference is the wait time — it’s weeks, if not months.

According to LinkedIn, as of February 1, 2021, the total data engineering market in Europe, defined as people with a data engineer title and at least 1 year of experience in the field, was 32K people. Current demand in Europe? Roughly 48K open positions for a data engineer role… Demand amounts to 150% of the total market available… When we look at the US it is even worse: the total market is 41K people and demand is 79K job postings. Demand amounts to 193% of the total market available. Let me say it again…

In Europe there are 32K data engineers and 48K open positions to hire one. In the US the ratio is 41K to 79K…
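
If you want to check the percentages yourself, here is a minimal Python sketch of the arithmetic. It only uses the LinkedIn figures quoted above; the function and variable names are made up for illustration and do not come from any LinkedIn tool or report.

```python
# Figures quoted above (LinkedIn, as of February 1, 2021), in thousands.
# The dictionary layout and names are illustrative assumptions.
markets = {
    "Europe": {"engineers": 32, "open_positions": 48},
    "US": {"engineers": 41, "open_positions": 79},
}


def demand_ratio_pct(engineers, open_positions):
    """Open positions as a percentage of the existing talent pool, rounded."""
    return round(100 * open_positions / engineers)


for region, m in markets.items():
    pct = demand_ratio_pct(m["engineers"], m["open_positions"])
    print(f"{region}: demand is {pct}% of the total market")
```

Anything above 100% means there are more open positions than people who currently hold the title, before you even ask whether any of them want to change jobs.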

According to the Dice 2020 Tech Job Report, data engineer job postings in the US grew 50% YoY in 2019, and the average time to fill the position was 46 days, a figure predicted to increase significantly in 2020 due to increasing demand. One of our potential clients told me in February this year, and I quote: “We are fucked… We thought we hired a data engineer but the candidate pulled out last minute. Twice…” They’ve been trying to recruit since November 2020.

According to the Stack Overflow Developer Surveys, the global median salary for a data engineer position in 2020 was 65K USD (4th position on the list). In 2017 it was 55K USD, so it grew 18% over 3 years. What is interesting here is that according to the report “data engineers command a disproportionately higher salary compared to developers within a similar level of experience in different roles”. As an example, in 2020 there was around a 20% difference in median salary between data engineers (65K USD) and full-stack developers (54K USD). Salaries are usually a good indicator of how scarce a certain role is.
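
The two salary percentages above are plain percentage changes. A small Python sketch, assuming only the median salaries quoted in this paragraph (the helper name `pct_change` is an illustrative choice, not something from the survey):

```python
def pct_change(old, new):
    """Percentage change from old to new, rounded to a whole percent."""
    return round(100 * (new - old) / old)


# Median salaries quoted above, in thousands of USD.
data_engineer_2017 = 55
data_engineer_2020 = 65
full_stack_2020 = 54

# Growth of the data engineer median between 2017 and 2020.
growth = pct_change(data_engineer_2017, data_engineer_2020)

# Gap between data engineers and full-stack developers in 2020,
# measured relative to the full-stack median.
gap = pct_change(full_stack_2020, data_engineer_2020)

print(f"Growth 2017 to 2020: {growth}%")
print(f"Gap vs full-stack developers: {gap}%")
```

Note that the gap is measured against the full-stack median. Measured against the data engineer median instead, the same 11K USD difference comes out slightly smaller.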

If you are planning to solve your data delivery challenges before Q5 of 2000-and-never, you need to get a little bit more creative.

Don’t hire a data engineer. Hire “Liam Neeson” instead.

Wait, what? Yes, you read that right. Technically speaking you don’t need a data engineer. You need L̵i̵a̵m̵ ̵N̵e̵e̵s̵o̵n̵ a person with a very particular set of skills. Set aside the titles, the tools, and the buzzwords. Examine the role closely (start here). Then put it in the context of what you are trying to achieve and you’ll see it too.

Data engineering is like any other kind of engineering. It is more about a mindset, an approach, a certain level of stubbornness. It’s problem-solving. Software engineers are excellent problem solvers because this is what they do all day, every day for a living. You’re looking for a backend developer proficient in Python or Java who is interested in transitioning into data engineering (or just likes building big systems that work reliably). This is very common these days, especially when you consider the salary differences. And you want to catch that person before they make the transition.

Backend developers might not be solving data-specific problems daily. That being said, solving problems around building production-ready systems is their bread and butter. This includes:

  • Architecting a system that goes beyond the most immediate use case
  • Adding new features without breaking anything
  • Monitoring the entire thing to detect when something breaks and fixing it before anyone else notices
  • Dealing with a lot of moving pieces at every stage of the product development cycle

And on top of that, they know how to create an agile working environment allowing them to ship improvements incrementally, with a minimum amount of manual work. If you want a reliable data infrastructure — whoever builds it will need to cover all of the above.

Another thing to consider is that you will be building a data stack from scratch. The emphasis here is on: “from scratch”. There is a huge difference between greenfield projects and working on already existing products or services (this includes data infrastructure). The reason is that they come with a completely different set of problems to solve and decisions to make. The most important question that needs to be constantly answered is: “What won’t we do and still add value?”

Therefore you’re looking for someone who has done greenfield projects before, ideally multiple times. Someone who has the 0 to 1 mentality and knows that consistency beats perfection. Ask yourself this: Who will have a steeper learning curve? A Python developer with multiple greenfield projects under their belt, but zero data engineering experience? Or a data engineer who has only worked on a pre-existing infrastructure?

Last but not least, you want a good communicator, unless you’re planning to add a product manager into the mix (not a bad idea, but overkill at this stage). Great communication skills always go a long way but in this scenario are key. There will be stakeholders to manage: data analysts, decision-makers, people who will look at the metrics, and teams who produce data points for those metrics. Garnering trust and managing expectations is hard. Being able to get to the bottom of company needs and gather requirements the right way is even harder.

“Liam Neeson’s” profile in a nutshell

  • Seasoned Python / Java developer
  • Built production-ready systems from scratch
  • Exceptional communication skills (successfully led teams in the past)
  • Bonus points: experience with integrating different systems via APIs

Now mix in some of the boilerplate bullet points based on your particular use case, and you’ve got yourself a job posting that might attract just the right crowd.

Finding a seasoned data engineer, experienced in setting up data stacks from scratch, is ideal. Your chances of landing one within a reasonable time frame? Extremely low.

Flip the script instead and look for a data engineer where very few are looking. Give the above profile a go. It will take them a little bit of time to catch up on all things data engineering and brush up on their SQL skills. But you’d be surprised how quickly good developers can figure things out. And if you have an in-house tech team building your product, chances are you already have a good candidate on board. Just ask around.

Bonus (general) recruiting tip: talk to folks in your network who very recently made a hire (and who you trust). Ask them whether getting contact details of their second- and third-best candidates in the process is an option. Sometimes the reasons candidates didn’t make the cut can be completely irrelevant from your perspective. Or be exactly what you’re after.

But does it make sense to make a full-time hire after all?

That is a perfectly valid question to ask. Depending on your exact needs, how many data sources you have, how unstructured the data is, what kind of metrics you want to calculate, etc., you might not need a full-time hire to take care of your data stack.

In Part 3 we will go through a simple data setup that goes a long way and doesn’t require a full-time data engineer to implement and maintain.

Aleksander Kruszelnicki

“A problem well stated is a problem half solved” — Charles Kettering | Bridging the gap between business and data | Co-founder of leukos.io