Data stacks are useless. Unless…
The 4 pillars of using data effectively and efficiently
Here’s some tough love: if you’re an SME¹ and you need a data stack, you’re kind of fucked… Yes, that sounds a bit dramatic and harsh. Yet the struggle is real, and a lot of SMEs are in a tough spot right now in terms of their analytics capabilities. But it doesn’t have to be this way.
The problem statement:
SMEs struggle to go beyond Excel reporting quickly, intuitively, and reliably enough without spending a “fortune”. Data analysts are the ones who feel the heat, as they can’t perform the job they were hired for in the best possible way.
“If I had more time I would have written a shorter l̵e̵t̵t̵e̵r̵ blog post”. However, this is a multifaceted problem, and how well it is solved can make or break a company. Hence you’re getting a 3-part series on the topic:
- Part 1: What’s a data stack and what is needed to create value out of data
- Part 2: How SMEs end up in a data delivery pickle and why not looking for a data engineer can increase your chances of hiring one
- Part 3: What a simple setup that goes a long way looks like and how to build one
What’s a Data Stack?
A data stack goes by many names: data infrastructure, data setup, and data pipeline are all viable synonyms.
If you work in data or tech, you are most likely familiar with the term. If you are not, think of it at the most fundamental level the following way:
A data stack is a set of tools you use in order to help an organisation make better decisions with data.
Technically, an Excel spreadsheet would count. A data stack can also take the form of a very complex system of many different tools connected to each other, where each tool is responsible for a very specific part of the process. In the context of SMEs’ needs, we are talking about something in between. A simplified modern data stack architecture will look as follows:
There are a lot of good resources² out there describing in detail what a modern data stack is (go check them out). Therefore, let’s focus on the basics so that we are all on the same page.
So how does the data stack help you make decisions with data? Imagine data flowing through the pipeline while “stuff” is happening to it. This data flow is called the ELT³ process (Extract, Load, Transform).
You have different places that store the data you’re interested in. You want to take (extract) that data in its raw format and send (load) it to a central storage place.
Now that you have a single source of truth (a data warehouse) you can start blending, cleaning, and modeling (transforming) that data into something you can, potentially, make sense out of.
Once this is done, you can interpret the data and deliver the insights needed to make decisions.
You also want to make sure that this process happens automatically with the desired frequency — this is called orchestration.
In a nutshell, your data stack will include a tool (in one form or another) for:
- Getting the data out of the source (extraction + load)
- Storing it somewhere and being able to shape it to your liking (storage + transformation)
- Being able to analyse the data and share the results (analytics + visualization)
- Orchestrating the entire thing in a timely and reliable fashion (scheduling + monitoring)
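To make the four parts above concrete, here is a minimal toy sketch of an ELT run. It is purely illustrative: the source rows are hard-coded stand-ins for an API or production database, an in-memory SQLite database stands in for the warehouse, and the `extract`, `load`, `transform`, and `run_pipeline` names are my own, not from any particular tool.

```python
import sqlite3

def extract():
    # Extract: pull raw records from a source system (here, hard-coded
    # rows standing in for an API or a production database).
    return [
        ("2024-01-05", "DE", 120.0),
        ("2024-01-05", "FR", 80.0),
        ("2024-01-06", "DE", 95.5),
    ]

def load(conn, rows):
    # Load: land the data in the warehouse in its raw, untouched form.
    conn.execute(
        "CREATE TABLE raw_orders (order_date TEXT, country TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)

def transform(conn):
    # Transform: model the raw data into something an analyst can use,
    # e.g. daily revenue per country.
    conn.execute("""
        CREATE TABLE daily_revenue AS
        SELECT order_date, country, SUM(amount) AS revenue
        FROM raw_orders
        GROUP BY order_date, country
    """)

def run_pipeline():
    # Orchestration: in a real stack, a scheduler (cron, Airflow, etc.)
    # would trigger this sequence at the desired frequency and monitor it.
    conn = sqlite3.connect(":memory:")
    load(conn, extract())
    transform(conn)
    return conn.execute(
        "SELECT country, SUM(revenue) FROM daily_revenue "
        "GROUP BY country ORDER BY country"
    ).fetchall()

print(run_pipeline())  # total revenue per country
```

In a real setup, each of these functions is replaced by a dedicated tool (an ingestion connector, a warehouse, a transformation layer, a scheduler), but the shape of the flow stays the same.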
The 4 Pillars of Using Data Effectively and Efficiently
The total amount of value a data stack adds on its own = 0. At a certain stage, however, it is necessary to have one to even think about creating value out of data.
What does it mean to use data effectively and efficiently? And where does the data stack fit in that context?
If you want to create value out of data, there are 4 key links in the “chain” that lead to it: context framing, data delivery, data interpretation, and the process of creating the value itself.
Context Framing
First, you need to set a goal or hypothesis and figure out what the right question to ask is. Then you need to understand why you want that question answered in the first place, what metric will answer it, and how that metric is defined. Finally, make sure you have access to the required data, and if not, figure out how to get it. I cannot emphasise the significance of this step enough. Sadly, decision-makers tend to skip it way too often.
Data Delivery
This is where the big chunk of the data stack “sits”. Now that you’ve framed your context, you need to get your hands on the right data and get it ready for interpretation. Remember the ELT process described above? This is where it happens.
Data Interpretation
This is the realm of the data analyst, where the magic happens. You analyse the data and (hopefully) get the answer to your question. Possibly you end up with even more questions, which isn’t necessarily bad. When it comes to the data stack, the tools responsible for analytics and visualisation are the key here. The T part of ELT, the transformation, will also “spill over” here from the data delivery pillar.
Value Creation
Last but not least, now that you’ve derived all the knowledge and insights, you need to take certain steps and actions to add the value that you’re looking for. Otherwise, what’s the point of the whole “exercise”? Of course, doing nothing can be a perfectly viable decision in many cases. If it is, just be clear about it and communicate accordingly.
How All the Pieces Fit Together
It is important to note that this process is multiplicative, which is a fancy way of saying that if a single link is zero, your impact will be zero. You can’t be good at 3 out of 4 and succeed. They are all essential, and they all impact each other.
The Data Delivery Space Is Broken
It might seem it is not, given all the tremendous advancements in recent years in available tech and engineering skill sets.
Examined closely, though, those advancements mostly benefit large companies with deep pockets or deep-tech startups that can attract the scarce talent required to create value out of data.
However, from the perspective of 9X% of regular startups / SMEs, it’s not all rainbows and unicorns. Quite the opposite.
Now that we have the context framed, let’s focus on how data delivery has become the bottleneck.
If you find this relatable or just interesting, stay tuned for the next parts of the series. If you disagree, drop a comment below. I’d love to hear your thoughts on the topic.
[1]: I chose the European Union definition of an SME. Also, full disclosure: the companies I will be referring to operate within the startup ecosystem. That being said, startups fall into the SME category.
[2]: Here’s the list of resources going into the details on the modern data stack:
- Emerging Architectures for Modern Data Infrastructure; from A16Z
- The 3 Things to Keep in Mind While Building the Modern Data Stack
- Resilience and Vibrancy: The 2020 Data & AI Landscape
- Data Integration: The Definitive Guide
[3]: Traditionally, the process is called ETL (Extract, Transform, Load). ELT is an evolution of ETL that is becoming the new standard. If you are interested in the differences, I recommend referring to the resources listed above.