5 reasons dbt sets the standard for data transformation tools
dbt is the de facto data transformation tool for the modern data stack (and here's why).
If you’ve been involved in the data community over the last few years, you may have noticed an uptick in conversations about dbt and dbt Labs (formerly Fishtown Analytics).
Not to be confused with the other DBT (dialectical behavior therapy), this dbt is a command-line tool that enables data analysts and engineers to transform data inside their warehouses more effectively. If you can write SQL, you’ll be able to use dbt…
…and that’s why its popularity has skyrocketed so quickly. Between mid-2016 and Q4 of 2021, dbt went from zero users to more than 9,000 companies using the tool.
In the press release linked above, dbt describes itself as the “industry standard for data transformation in the cloud.” I’m inclined to agree. So is the rest of Team AirOps – and since we’re such big fans, we want to help other users get the most out of their dbt accounts.
Today, I’m going to zoom out and take a macro look at dbt, with an emphasis on why it’s such a great tool for startups and other high-growth companies.
The rest of this blog will assume you’re already semi-familiar with dbt and its place in the modern data stack. If you aren’t, here are some of my favorite intro resources:
⮕ What Is dbt and Why Are Companies Using It? This rundown from Benjamin Rogojan (aka the Seattle Data Guy) is a great introduction to dbt’s core features and how it works.
⮕ ETL vs. ELT: Understanding the differences & benefits. dbt takes care of the “T” part of ELT/ETL, so understanding how both processes work is helpful if you want to know why the data team is suddenly all gung-ho about dbt.
⮕ How to build your foundational data models.
Whether you’re part of a data team, the manager of a data team, or are interested in how to get the maximum ROI from your company’s data function, read on to learn how dbt empowers data teams and makes them more effective.
1. dbt isn’t “business user-friendly” (and that’s a good thing).
(Sure, the meme is talking about the other DBT, but it’s relevant here, too 😉)
There’s a lot of talk in the data world about building tooling that’s easy and accessible for business users. Oftentimes, this is a Very Good Thing. It encourages business units to adopt data and use it in their decision-making, which can be a powerful catalyst for enhanced performance and increased growth.
But there’s a time and a place for business-friendly data tooling. A drag-and-drop GUI might be ideal for non-technical users. For someone technical who knows SQL, however, a UI designed for business users is more likely to slow things down.
dbt Labs recognized this difference in the way business teams and data teams work, so it purposefully built a tool for technical personas. Basically, dbt is the best solution out there for SQL-driven data transformations.
So, while someone unfamiliar with SQL will probably go 🤨 the first time they log in to dbt, data practitioners are more than thrilled with the interface. That’s a good thing because the data team can get to work and focus on delivering business value.
The implicit barrier to entry of dbt has another benefit for the broader data analytics community: It heightens the status and prestige of the analytics engineer role.
2. dbt turns data analysts into engineers.
Data analysts don’t always have the same skill sets as engineers, but dbt allows analysts to perform data engineering with the power of SQL, data modeling, and version control (e.g., Git, with repositories hosted on a service such as GitHub).
One of the best things about dbt is that it uses standard software engineering practices. Analysts can do their jobs in a Git workflow, just like other developers in the business. More specifically, Git introduces version control and change management practices into an analytics engineer’s workflow.
This means multiple engineers can work on the same project without impacting or overwriting each other's work. As a result, an analytics engineering team can scale from a single engineer to an entire team while keeping the same processes. This, coupled with peer code review best practices, makes managing and scaling a large team of engineers much smoother… which is great news for companies experiencing rapid growth.
Overall, this approach also introduces additional rigor into the data analyst workflow to improve things in three main areas:
- Data documentation
- Data standardization
- Maintenance of data assets
As the organization’s data complexity grows, you’ll be glad for those three pillars.
3. dbt empowers analytics engineers to work however they want.
When developing dbt models, analytics engineers can use their preferred development environment, whether it’s a text editor, notebook, or the dbt Cloud IDE (integrated development environment). At its core, dbt is a command-line interface (CLI) tool, so any IDE or text editor works for development, and analytics engineers are free to choose whatever environment makes the most sense for them. For analysts who don’t have a particularly strong opinion, I think the dbt Cloud IDE is a great way to work. Plus, there’s only minimal setup required to get started.
The current Cloud IDE isn’t without limitations, though: it sometimes takes a while to start up, and interactions can be sluggish. dbt Labs recognizes this and is rebuilding the dbt Cloud IDE with a renewed focus on performance and reliability; as of September 2022, the new version is in beta. Updates include significantly faster load and save times, an updated user interface, and easy access to commands (e.g., run, build) via the UI.
The flexibility doesn’t stop there, either…
☁️ dbt also enables SDLC (software development lifecycle) best practices by supporting development, staging, and production environments. Development environments can exist locally on an analytics engineer's desktop, but can also live entirely on dbt Cloud.
🏠 Unique warehouse credentials for development on a user-by-user basis help users maintain flexible and independent development workflows without conflicting with other users' work. Promoting to production is then as simple as making a pull request to the production branch of the project repository.
⏰ Within dbt Cloud, projects are scheduled to run directly from the dbt Cloud UI, meaning an analytics engineer can go from dev to prod to the warehouse within minutes.
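To make the dev/prod separation a little more concrete, here’s a minimal Python sketch of the idea (not dbt’s actual configuration format). It assumes a hypothetical naming convention where each developer builds into their own `dbt_<username>` schema and production builds into a shared `analytics` schema; the `Target` class, `resolve_target` function, and schema names are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    name: str    # "dev" or "prod"
    schema: str  # warehouse schema that models are built into


def resolve_target(environment: str, username: str) -> Target:
    """Pick the schema a run builds into (hypothetical naming convention)."""
    if environment == "prod":
        # Assumed shared production schema.
        return Target(name="prod", schema="analytics")
    if environment == "dev":
        # Assumed per-user schema, so developers never overwrite each other.
        return Target(name="dev", schema=f"dbt_{username}")
    raise ValueError(f"unknown environment: {environment!r}")


alice = resolve_target("dev", "alice")
bob = resolve_target("dev", "bob")
prod = resolve_target("prod", "alice")
print(alice.schema, bob.schema, prod.schema)
```

Because every developer resolves to a different schema, two people can rebuild the same model at the same time without touching each other’s tables or anything in production.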
4. dbt establishes trust and confidence in data.
There are plenty of reasons data adoption is so low for so many companies, ranging from cumbersome tooling (I’m looking at you, BI tools) to lack of education. Another common problem I see that exacerbates the data adoption issue is a lack of trust and confidence in the data.
Using dbt won’t solve data adoption on its own, but it can help the data team generate more trust in data and analytics. Integration, quality, and performance testing are all built into dbt, alongside several other trust-building features:
Data documentation is part of the development process. dbt has extensive documentation features, including an auto-generated documentation option and the ability to view documentation right next to the code.
You can piggyback off of dbt’s powerful documentation features with a tool like AirOps, which enables you to selectively share metrics and data sets from your data warehouse and dbt project with the entire business. That way, business users can find and create with trusted data, and data teams have full visibility into how data assets are used throughout the organization.
Built-in data lineage helps business users and technical users alike understand data context. dbt helps users visualize lineage using a directed acyclic graph (DAG), which shows the dependencies between different data models in a project. Users can see the source models / raw data on the left-hand side of their screen, and the final mart models on the right. Every mart model can be easily traced back to the raw data tables that it uses. This makes it easy to see errors and determine the root cause of any problems.
From a business user perspective, explaining projects – and the value they’ll deliver – is a lot easier, too. A graph is a lot more intuitive compared to lines of code, and it’s much easier for non-technical people to understand what’s happening to the data.
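If you’re curious what “tracing a mart model back to its raw tables” means mechanically, here’s a rough Python sketch of lineage as a DAG. It’s an illustration of the concept rather than how dbt implements it, and the model names (`stg_orders`, `mart_customer_ltv`, and so on) are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical project: each model maps to the models or raw tables it selects from.
DEPENDS_ON = {
    "stg_orders": ["raw.orders"],
    "stg_customers": ["raw.customers"],
    "int_customer_orders": ["stg_orders", "stg_customers"],
    "mart_customer_ltv": ["int_customer_orders"],
}


def upstream_sources(model: str, graph: dict[str, list[str]]) -> set[str]:
    """Return the raw tables a model ultimately depends on."""
    parents = graph.get(model)
    if parents is None:
        # Not a key in the graph, so treat it as a raw table (a leaf).
        return {model}
    sources: set[str] = set()
    for parent in parents:
        sources |= upstream_sources(parent, graph)
    return sources


def build_order(graph: dict[str, list[str]]) -> list[str]:
    """Order nodes so that every parent comes before its children."""
    return list(TopologicalSorter(graph).static_order())


print(sorted(upstream_sources("mart_customer_ltv", DEPENDS_ON)))
print(build_order(DEPENDS_ON))
```

The first function answers the “where did this number come from?” question, and the second shows why a DAG is useful for running things: each model can only be built after everything upstream of it.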
Here’s an example of what the built-in documentation and data lineage graph both look like when you’re working inside dbt:
Now if that isn’t the cleanest data model ever built, I don’t know what is.
Expose metadata to end consumers — visible in AirOps. Every time that dbt Cloud runs a dbt project, it generates metadata about the accuracy, recency, configuration, and structure of the views and tables in the warehouse. Business end-users can easily view metadata from approved metrics and datasets inside of AirOps, which gives them much-needed context and background about the data they’re using to make decisions.
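As for the quality testing mentioned at the start of this section, a lot of it boils down to simple assertions about the data, such as “this column is never empty” or “this column has no duplicates.” dbt expresses these checks in SQL and YAML rather than Python, so treat the following as a plain-Python sketch of the logic only. The function names and the sample `orders` rows are hypothetical.

```python
from collections import Counter


def not_null_failures(rows: list[dict], column: str) -> list[dict]:
    """Rows where the column is missing; an empty list means the check passes."""
    return [row for row in rows if row.get(column) is None]


def unique_failures(rows: list[dict], column: str) -> list:
    """Values that appear more than once in the column (nulls are ignored)."""
    counts = Counter(row[column] for row in rows if row.get(column) is not None)
    return sorted(value for value, n in counts.items() if n > 1)


# Hypothetical sample data with one null and one duplicate key.
orders = [
    {"order_id": 1, "customer_id": 10},
    {"order_id": 2, "customer_id": None},
    {"order_id": 2, "customer_id": 11},
]

print(not_null_failures(orders, "customer_id"))
print(unique_failures(orders, "order_id"))
```

Both checks return the offending records rather than a bare true/false, which mirrors the general idea that a data test passes when it finds zero failing rows.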
5. The dbt community is 🧡.
As dbt has grown, so too has the community that surrounds it.
It’s pretty common for open source projects to have a developer community grow around them, but dbt is unique in how it has prioritized community from the start. dbt Labs has poured resources into content, user documentation, social networks (check out the active Slack community), in-person meetups, the annual Coalesce conference, and training.
Earlier I mentioned that one of my favorite things about dbt is how it introduces software engineering best practices into the analyst workflow. While there are plenty of training resources available, the Slack community deserves a special shout-out. If you ever get stuck or have a question, head over to Slack where you’ll find an active group of dbt users at all levels. They’re a solid resource whenever you have questions or need help debugging your models.
Part of the attraction (and why I enjoy participating) stems from how successfully dbt has defined, elevated, and supported the role of the analytics engineer. dbt, and analytics engineering as a whole, are constantly evolving in a world that’s increasingly powered by cloud-driven data platforms. The various dbt communities are fantastic resources for learning more and staying up to date.
As great as dbt is, it’s just a tool 🔨
dbt automates a lot of the repetitive, manual parts of data transformation. It makes analysts more efficient and frees up time for them to focus on other tasks and projects.
However, remember that dbt is an addition to your data stack, not a replacement for a thoughtful data strategy with clear goals.
Getting your data investment “right” is really, really hard.
Hiring a data team, or even a single data leader, isn’t straightforward. There are tons of tools out there, some more useful than others. There are so many potential metrics to measure, it can feel impossible to know which ones are worth paying attention to. Even if you have the team, the tools, and access to the right metrics, that doesn’t mean the organization won’t continue to struggle with using data effectively.
So, while I wholeheartedly recommend dbt to startups and other high-growth companies, it’s worth noting that dbt is only really effective with a strong underlying foundation that includes the right teams, tools, and processes.
And if you’re not sure how to build a strong foundation for data success, AirOps can help:
📚 Check out our comprehensive blog library of how-tos and best practices, or…
💬 Get in touch with our team to learn more about how AirOps can help your team unlock the full potential of your organization’s data assets.