How to increase the ROI of your data stack

Your data stack is an expensive investment. Here are six ways to make sure that investment pays off.

October 13, 2022
Matt Hammel

The pendulum has swung and times are changing, fast. Gone are the days of 24-hour turnaround term sheets and 200x revenue multiples.

With uncertain economic conditions and tightening investor funding, startups must do what they can to maximize their runway. The common prescriptions: reduce headcount, cut back on perks, or raise prices (when possible) – whatever it takes to get to default alive (or at least default investable). 

But there’s another option that’s often overlooked: improving the ROI of your data stack.

For the purposes of this post, “data stack” refers to…

👥 Team members, including full-time employees and contractors
⚙️ Tooling, aka everything in your modern data stack
💵 Services costs to support and maintain the above

How much does a typical startup spend on data analytics?

While data budgets vary widely from one company to another, it’s not uncommon for a Series A to Series C startup to spend 10-20% of ARR on data each year.

Below is an approximate range for what a Series B company that “fully” invests in its data stack might spend annually, and how that breaks down. (The figures are a composite of a few companies at this stage that the AirOps team has worked with.)

Example investment outlay for a Series B company's data stack
Stage: Series B
ARR: ~$8m

| Layer (example vendors) | What they do (when set up properly…) | Annual cost range |
|---|---|---|
| ETL (Fivetran, Airbyte) | Extract, transform, and load data into your warehouse | $10-60k |
| Warehouse (Redshift, Snowflake) | Centralize your data | $50-300k |
| Modeling (dbt, LookML) | Standardize queryable metrics and tables | $0-30k |
| BI Tool (Tableau, Mode, Looker) | Visualize your data | $20-100k |
| CDP (Segment) | Consistent customer data in all your marketing tools | $0-50k |
| Product Analytics (Amplitude) | Understand product usage | $30k |
| Reverse ETL (Hightouch, Census) | Transform and load data from your warehouse to SaaS tools | $0-20k |
| Head of Data x1 | Data strategy, team building, and oversee build/management of stack | $225k |
| Data Eng x0.5-1 | Build and manage data pipelines | $90-180k |
| Data Analyst x1-2 | Create / manage tables and metrics, build dashboards | $150-300k |
| Total | | $0.6-1.4m |

Annual data stack cost (as % of ARR): 8-18%
Portion that is headcount: 50-80%
Portion that is tooling: 20-50%
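
If you want to run the same arithmetic on your own budget, here is a minimal Python sketch. It takes the line-item ranges from the table above and adds them up. The variable and function names are my own, and because the table's totals are rounded composites, the sums land close to the stated figures rather than exactly on them.

```python
# Line items from the example table, as (low, high) annual cost in $ thousands.
TOOLING = {
    "ETL": (10, 60),
    "Warehouse": (50, 300),
    "Modeling": (0, 30),
    "BI Tool": (20, 100),
    "CDP": (0, 50),
    "Product Analytics": (30, 30),
    "Reverse ETL": (0, 20),
}
HEADCOUNT = {
    "Head of Data": (225, 225),
    "Data Eng": (90, 180),
    "Data Analyst": (150, 300),
}
ARR_K = 8_000  # ~$8m ARR, expressed in $ thousands


def total_range(items):
    """Sum the low ends and the high ends of a set of (low, high) ranges."""
    low = sum(lo for lo, _ in items.values())
    high = sum(hi for _, hi in items.values())
    return low, high


tool_low, tool_high = total_range(TOOLING)
head_low, head_high = total_range(HEADCOUNT)
total_low = tool_low + head_low
total_high = tool_high + head_high

print(f"Tooling:   ${tool_low}k-${tool_high}k")
print(f"Headcount: ${head_low}k-${head_high}k")
print(f"Total:     ${total_low}k-${total_high}k")
print(f"% of ARR:  {total_low / ARR_K:.0%}-{total_high / ARR_K:.0%}")
```

Swap in your own line items and ARR to see where you fall relative to the ranges above.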

When compared to other line items in your budget, like marketing, sales, R&D, and G&A, data stacks up similarly. Here are some operating expense benchmarks for SaaS startups:

  • Sales & Marketing: 45-65% (20-40% each)
  • Research & Development: 20-30%
  • General & Administrative: 10-20%

As we’ve written previously, it’s difficult to rigorously measure the hard ROI of your data stack investments. This presents a Catch-22: spend 10-20% of your ARR on data without clear ROI, or forgo well-instrumented metrics and fly blind – or worse, semi-blind – at life-or-death milestones, including:

  • Fundraising rounds
  • Board meetings
  • Ramping up marketing and sales spend
  • Ramping up headcount

Well-defined metrics are also critical ingredients to transparency for your team, especially in All Hands and weekly business reviews / leadership meetings.

Unfortunately, we’ve seen many companies invest in a data stack and find themselves underwater on that investment. While measuring data ROI is difficult, there are concrete ways you can improve your ROI from wherever you sit now.

How can you generate more value from your data stack?

There are two primary ways to squeeze additional value from the investments that you make in data: 

📈 Generate more value from your data stack (all else being equal)

or…

📉 Reduce the cost of your data stack (overall or as a percentage of revenue)

This simple formula sums it up:

Data Stack ROI = Value generated from your data stack ÷ Investment in your data stack

This calculation offers a helpful way to frame the data ROI issue because, mathematically, there are only two ways to increase ROI:

⬆️ Increase the numerator

or…

⬇️ Decrease the denominator

There are different ways to move those figures – levers you can pull in your startup – to increase the ROI of your data stack.
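
To make the two directions concrete, here is a small Python sketch of the ratio the post uses (value generated ÷ investment). The function name and the dollar figures are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def data_stack_roi(value_generated: float, investment: float) -> float:
    """Return value generated divided by investment in the data stack."""
    if investment <= 0:
        raise ValueError("investment must be positive")
    return value_generated / investment


# Hypothetical baseline: $1.0m of value on $1.0m of spend.
baseline = data_stack_roi(1_000_000, 1_000_000)

# Increase the numerator: 25% more value on the same spend.
more_value = data_stack_roi(1_250_000, 1_000_000)

# Decrease the denominator: the same value on 20% less spend.
lower_cost = data_stack_roi(1_000_000, 800_000)

print(baseline, more_value, lower_cost)
```

In this made-up example, a 25% gain in value and a 20% cut in cost produce the same ROI, which is why both sides of the equation deserve attention.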

A diagram that shows how companies can increase the ROI of their data stacks: Increase value or reduce cost.

How can you increase the value of your data stack?

There are three levers you can pull to increase the value your data stack generates:

  1. Improve organization-wide data accuracy and increase trust in data
  2. Build analytics products that generate direct business value
  3. Improve adoption and usage of data products

Let’s dive into each.

Lever #1: Improve data accuracy and increase trust

When your team knows the data they’re using is accurate and fresh, they’ll be empowered to make better, more data-informed decisions. To gauge whether you have a problem here, ask your Head of Marketing and Head of Sales (the two people in your company whose pay is most tied to performance), “How confident are you in our data?”

To improve here, you have a few options, depending on your level of investment to date:

If you’re at square one (i.e., you haven’t made investments to build out your stack yet):

Make sure you fully consider the time and resourcing needed to get a data function up and running; it’s a large cross-functional commitment of resources. Until you reach product-market fit, the reporting features in your SaaS tools may even be a better way to measure performance.

Once you’ve decided to make initial investments in your “0 to 1” data stack, there are lots of decisions to be made: data storage, data modeling, data orchestration, business intelligence (BI), and data operationalization.

If you’re looking for a place to start your research, these resources will be helpful:

🥞 How to build a modern data stack 

☁️ Best cloud data warehouses: Top provider comparison

🖼️ How to build your foundational data models

◀️ What is Reverse ETL?

📊 Best BI tools for startups: How to choose a BI tool

If you’ve already invested in your 0 to 1 data stack:

If you have a modern data stack in place, there are a few things you can do to improve data accuracy and trust.

First, make it easier for everyone who uses data to understand the quality and freshness of your data assets.

Then, improve data quality, data visibility, and data alerting for your data team. There are a growing number of metadata products that can bring critical context and quality signals to data assets. Some examples include data catalogs like Atlan, Amundsen, and Collibra, plus features in tools like Fivetran, dbt, and Astronomer, which help generate the metadata related to quality and freshness.
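
As a rough illustration of what a freshness signal can look like, here is a minimal Python sketch. It compares each table's last load time against a freshness target and labels the table accordingly. The table names, targets, and timestamps are all hypothetical; in practice the load times would come from your pipeline or catalog metadata rather than a hard-coded dictionary.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical freshness targets per table.
FRESHNESS_SLA = {
    "orders": timedelta(hours=1),
    "marketing_spend": timedelta(hours=24),
}


def freshness_report(last_loaded, now, slas=FRESHNESS_SLA):
    """Label each table 'fresh', 'stale', or 'unknown' against its target."""
    report = {}
    for table, sla in slas.items():
        loaded_at = last_loaded.get(table)
        if loaded_at is None:
            report[table] = "unknown"  # no load recorded for this table
        elif now - loaded_at <= sla:
            report[table] = "fresh"
        else:
            report[table] = "stale"
    return report


# Hypothetical load times, checked at noon UTC.
now = datetime(2022, 10, 13, 12, 0, tzinfo=timezone.utc)
last_loaded = {
    "orders": datetime(2022, 10, 13, 11, 30, tzinfo=timezone.utc),
    "marketing_spend": datetime(2022, 10, 11, 9, 0, tzinfo=timezone.utc),
}
report = freshness_report(last_loaded, now)
print(report)
```

A label like this, shown next to a table or metric, is the kind of signal that tells a user whether the number in front of them can be trusted today.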

Don’t stop at the data team – data quality and visibility are important to non-data team members, too. 

As I mentioned earlier, people need to trust data. They need to know that it’s accurate and up to date; otherwise, they won’t use it in their work, which hurts both the business and the ROI of your data stack.

Our view at AirOps is that there’s a massive opportunity to bring data quality and data visibility signals to non-data team users who just want to know, “Can I access the right data quickly and easily, is it the latest and greatest version, and will it help me accomplish what I need to accomplish?”


Lever #2: Build analytics products that generate direct business value

Not all analyses generate the same amount of business value. Some analyses even generate negative value, but that’s a discussion we can save for another day.

At the risk of stating the obvious, analytics products that have a direct impact on core business outcome metrics are more valuable than those that don’t. They typically increase revenue, improve margin, reduce CAC/improve ROAS, improve sales efficiency (e.g., burn ratio), or reduce churn. Some examples might include:

  • Marketing: custom audiences and multi-touch attribution models
  • Sales: dynamic lead scoring and sentiment analysis
  • Customer success/support: churn risk scoring and dynamic SLA monitoring for customer experience and customer support

There are also entire categories of SaaS products that are dedicated to function-specific data, like Heap, Funnel.io, and Supermetrics for Growth/Marketing analytics, Amplitude and Mixpanel for Product analytics, and Gainsight for Customer Success analytics.

These tools make it easy to generate high-value insights for individual functions. They can also help improve data’s speed-to-value trajectory and avoid adding headcount on your data team. But, these benefits aren’t without tradeoffs.

First, it’s not easy to combine data from multiple sources and functions. Marketing data is siloed within the marketing analytics tool, customer data is in another tool, sales has its own tool, too… and, well, you can see how it’d be difficult to get a 360° view of business performance (and data performance).

Your tooling ecosystem can also balloon out of control. Fast. When every function needs a dedicated “transformation and BI tool,” IT overhead and overall IT costs will skyrocket if you aren’t extremely careful.

Lever #3: Improve adoption and usage of data products

If people aren’t using data to guide their decisions and improve performance, all of the investments you’ve made in tools, processes, and people won’t be worth much. Improving adoption is the next lever for driving more ROI from your data stack.

The best way to do this is to make it easier for your teams to find and access the right data.

The Eckerson Group, a data analytics consulting and research organization, tracks data adoption rates amongst organizations around the world. A 2022 survey found that only 25% of employees actively use BI and analytics tools – which reflects minimal growth during the seven years they’ve tracked this metric. Clearly, dashboards alone aren’t the answer.

Typically, organizations will adopt a BI tool and push dashboards as the main interface that users should rely on to interact with data. 

This approach leads to low adoption rates for BI-generated data products. If teams do use dashboards, the most common use case is downloading CSVs and doing their own ad hoc analyses from there. This isn’t the best use of anyone’s time. It also creates unnecessary risks for data quality and freshness.

Why do dashboards fail so frequently? Our hypothesis at AirOps is that they don't meet users in their preferred workflows, which likely include tools like Google Sheets and Airtable and SaaS tools like Salesforce, HubSpot, and Zendesk. When teams can access high-quality data inside of the tools they’re already using, data adoption generally improves.

We’re building a tool that will make it easier for non-data team employees to get data into their preferred workflows and tools, in a way that keeps the data team in the know.

How can you reduce the cost of your data stack?

Remember the Data Stack ROI formula from the beginning of this article? 

(Just in case you need a refresher it’s: Data Stack ROI = Value generated from your data stack ÷ Investment in your data stack.)


We just covered the “increase the value generated from your data stack” side of the equation, so now let’s review some of the options you have for reducing your data and analytics-related expenses.

The story is similar here – each option will make sense at different points in your data journey. 

Lever #4: Increase the leverage of your data team

To give your data team more leverage, increase the ratio of data customers per data team member (e.g., Product Manager to Data Analyst ratio, business unit to Data Analyst ratio). 

There are two possible ways I suggest doing this:

Shift fixed headcount to variable, consumption-based services. In other words, forgo hiring FTEs and instead procure consumption-based services for data support. This avoids the administrative cost and ramp time associated with hiring a dedicated data team and mitigates the risk that someone might not work out. One great option is Mozart Data. You can work with their data analysts, and their data platform is an all-in-one service that includes ETL, data warehousing, and the basics for data transformation and reverse ETL. You’ll save on tooling and can even delay hiring a full-time data lead.

Reduce the headcount of your data team or slow down hiring relative to other functions. You need to be careful here, though – removing too many data team members can have a negative impact on data’s ROI. To strike the right balance, make sure to do the following:

  • Eliminate time wasted on rework or redundant analyses
  • Increase user self-serve in a safe and scalable way (self-service data is difficult to get right, so proceed with caution)
  • Adopt mechanisms for continuous documentation to ensure that all data team members are in the loop, regardless of staffing changes
  • Manage expectations as the organization changes

That last point is the most important – when a business becomes accustomed to concierge-style data analytics (e.g., Slacking the data team for any and all requests), it can be a challenging habit to change. Education and communication are critical.

Lever #5: Optimize your data warehouse’s storage and compute

There’s a lot of nuance in how data warehouse storage and compute costs are calculated. 

For the purposes of this article, here’s a quick roundup of some of the things you can do to pull this lever, reduce warehouse costs, and reduce the cost of your data stack:

  • Use incremental materialization when working with large tables, so each run processes only new or changed rows instead of rebuilding everything.
  • Optimize query plans. Adding sort keys and partitions to a table lets the warehouse optimize join and filter predicates. Optimized queries execute more efficiently, scan less data, and decrease total compute consumption.
  • Materialize frequently executed queries. This is especially valuable for queries that scan large amounts of data or perform aggregation. Pre-materializing the data set can significantly reduce the number of scanned rows and improve query throughput. The Snowflake query planner can even recognize when a materialized view should be used and automatically rewrite the query on execution. Snowflake's documentation covers working with materialized views in more detail.
  • Optimize warehouse storage with a data lake strategy. By storing raw data in a data lake, the amount of data queried by the data warehouse can be greatly reduced.
  • For Amazon Redshift and Postgres, regularly run VACUUM and ANALYZE on the tables in the warehouse to recover space from deleted records.
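
To show the idea behind incremental materialization from the first bullet, here is a minimal Python sketch. It uses the standard library's in-memory SQLite as a stand-in for a warehouse, and the table names (raw_events, daily_revenue) are hypothetical. In a real stack, your modeling tool or warehouse would normally handle this for you; the sketch only illustrates the "high-water mark" pattern of re-aggregating recent data rather than rebuilding the whole table.

```python
import sqlite3

# In-memory SQLite stands in for a warehouse; table names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE raw_events (day TEXT, amount REAL);
    CREATE TABLE daily_revenue (day TEXT PRIMARY KEY, revenue REAL);
    """
)


def refresh_incremental(conn):
    """Re-aggregate only days at or after the latest day already materialized."""
    (high_water,) = conn.execute(
        "SELECT COALESCE(MAX(day), '') FROM daily_revenue"
    ).fetchone()
    # The latest materialized day is recomputed because it may have been
    # loaded before all of its events arrived.
    conn.execute(
        """
        INSERT OR REPLACE INTO daily_revenue (day, revenue)
        SELECT day, SUM(amount) FROM raw_events WHERE day >= ? GROUP BY day
        """,
        (high_water,),
    )
    conn.commit()


conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?)",
    [("2022-10-11", 10), ("2022-10-11", 5), ("2022-10-12", 7)],
)
refresh_incremental(conn)  # first run: nothing materialized yet, so all days

conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?)",
    [("2022-10-12", 3), ("2022-10-13", 4)],
)
refresh_incremental(conn)  # second run: recomputes 2022-10-12 onward only

rows = conn.execute(
    "SELECT day, revenue FROM daily_revenue ORDER BY day"
).fetchall()
print(rows)
```

On the second run, the row for 2022-10-11 is left untouched. On a large, partitioned warehouse table, that is where the compute savings come from.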

Lever #6: Replace, bundle, and eliminate tools

According to data from Productiv, the average company relies on 254 third-party SaaS apps. Their research also found that most teams use 40 - 60 SaaS apps on average. So, if you’re looking to reduce the cost of your data stack, taking inventory of your tool library is a fantastic avenue to explore.

There are three primary ways to reduce tooling bloat:

1. Eliminate low-value tools and consolidate redundant tools

You know it’s a problem when you hear, “We’re spending $2,000 a month on this tool but we have absolutely no idea who is using it.”

You can avoid this problem with a quarterly review of your software spend. Go through each tool one at a time, assess its value, and assign priority levels. 

More often than not, you’ll identify a few tools that no one is using (or that can easily be switched to another tool that offers more value).
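
If your spend and usage data lives in a spreadsheet export, a first pass at that review can be as simple as the Python sketch below. The tool names, costs, user counts, and the $200-per-active-user threshold are all hypothetical, and "active users" is whatever definition your vendors' admin consoles give you.

```python
from dataclasses import dataclass


@dataclass
class Tool:
    name: str
    monthly_cost: float
    active_users: int


def flag_for_review(tools, max_cost_per_user=200.0):
    """Return (tool name, reason) pairs for tools worth a closer look."""
    flagged = []
    for tool in tools:
        if tool.active_users == 0:
            flagged.append((tool.name, "no active users"))
        elif tool.monthly_cost / tool.active_users > max_cost_per_user:
            flagged.append((tool.name, "high cost per active user"))
    return flagged


# Hypothetical inventory for one quarterly review.
inventory = [
    Tool("Tool A", monthly_cost=2000, active_users=0),
    Tool("Tool B", monthly_cost=1500, active_users=3),
    Tool("Tool C", monthly_cost=900, active_users=30),
]
flagged = flag_for_review(inventory)
print(flagged)
```

A flag here is a prompt for a conversation with the tool's owner, not an automatic cancellation.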

2. Switch from point-to-point solutions to bundled solutions

In Lever #2 (Build analytics products that generate direct business value), I recommended function-specific analytics tools as one way to develop high-value analytics products. As I alluded to, though, this can lead to problems. If you aren’t careful, soon enough accounting will ask with frustration, “Why do Marketing, Growth, Product, and CS all have their own BI tool?!”

Solving this one can be trickier: If your company doesn't have a data team and analytics infrastructure, getting all your marketing metrics in one easy-to-use tool is tremendously valuable and creates leverage. The same is true for Product Analytics, Sales, Customer Experience, and other functions. 

To determine whether an analytics tool has a positive impact on the company’s bottom line, pose the following question: 

“If we removed this tool from our workflow, would there be a measurable negative impact to a business outcome that’s tied to our P&L?”

If the answer is no, nix that tool. If there would be a measurable negative impact, you need to ask a follow-up question: 

“How quickly can we offset the loss in value if we remove this tool from our ecosystem, and how much time, effort, and money will it take? Can we afford to take the hit?”

3. Take a scalpel to your seat licenses

“Wait a minute… we have 100 BI licenses and only 10 active users?”

This scenario is more common than you might think. Here’s what you can do:

  • If you're on a month-to-month plan with no annual minimum, then reduce the number of unused seat licenses or downgrade users to a less expensive tier.
  • If you have a BI tool with a fixed seat license requirement or a minimum annual spend (I’m looking at you, Looker 👀), consider switching to a more flexible and scalable tool that will better suit your team’s needs. This is an especially solid choice if you’ve already moved your modeling and transformation work into a tool like dbt.
  • Ditch the BI tool. This might sound crazy, but hear me out: The primary outputs of BI tools are dashboards and other data visualizations. This isn’t a helpful format and it doesn’t help people make better, more data-driven decisions. There are better ways to spend your data budget.
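
To put a number on the "100 licenses, 10 active users" scenario, here is a short Python sketch. The $50 per seat per month price is a made-up figure for illustration; substitute your own contract terms.

```python
def unused_seat_spend(purchased: int, active: int, monthly_price_per_seat: float):
    """Estimate seat utilization and the annual cost of unused licenses."""
    if purchased <= 0:
        raise ValueError("purchased must be positive")
    unused = max(purchased - active, 0)
    return {
        "unused_seats": unused,
        "utilization": active / purchased,
        "annual_cost_of_unused": unused * monthly_price_per_seat * 12,
    }


# 100 BI licenses, 10 active users, hypothetical $50 per seat per month.
result = unused_seat_spend(purchased=100, active=10, monthly_price_per_seat=50)
print(result)
```

Even at a modest per-seat price, the unused portion adds up quickly, which is why this is worth checking before your next renewal.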

Each of the above solutions will help you pull the “replace, bundle, and eliminate tools” lever. I’m biased toward option number three because it has the greatest impact. Full disclosure, though: AirOps does have a horse in this race. Our tool reduces (and potentially even eliminates) the need for a BI tool.

I want to increase the ROI of my data stack – which lever should I pull first?

No matter which stage of the data journey your organization is at right now, there’s always a “lever” that you can pull to generate more value from your data stack.

But which lever should you pull first? Honestly, it depends on the state of data in your organization, where you are in your data maturity journey, the investments you’ve made to date, and which parts of your stack are ripe for efficiency gains.

In our next post, we’ll dive into how to prioritize which levers to pull based on your data maturity and how much relative value you can expect to generate from each action.