dbt and Dataform are tools for building complex data transformations in SQL. Both let you model, test, and version-control data. Their use cases and functionality are similar, but they differ in cost and in value proposition. Here’s a rundown to help you decide which works for you and your organisation.
If you're looking for a more technical guide on moving models from dbt to Dataform, we've written a practical guide on how we moved the Fivetran dbt model to Dataform for our own analysis. Check it out here.
Overview
dbt Core is an open-source tool with a command-line interface that enables data teams to transform data using analytics engineering best practices. If you prefer a graphical interface, dbt Cloud, the proprietary hosted product with a dashboard, is the better pick. Similarly, Dataform Core is open source, and users have the option of using Google’s hosted version. Because Dataform is now owned by Google, the hosted version fits seamlessly into the Google stack.
These tools help you write SQL in a more maintainable way, with version control and data testing built in. If you're on GCP, Dataform has the advantage; in most other cases, dbt is the better fit. If you don't work with a data warehouse at all, consider Acho or Alteryx instead.
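To make "data testing" concrete: in both tools a test is essentially a query that selects the rows breaking a rule, and the test passes when that query returns nothing. The sketch below imitates the idea with Python's built-in sqlite3 module standing in for a warehouse. The `orders` table, its rows, and the `failing_rows` helper are hypothetical and are not part of dbt or Dataform.

```python
import sqlite3


def failing_rows(conn, test_sql):
    """Run a test query and return the rows that violate the expectation."""
    return conn.execute(test_sql).fetchall()


# An in-memory database stands in for the warehouse (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 20.0), (2, 35.5), (2, 35.5), (3, None)],
)

# Each test selects the rows that break a rule; zero rows means the test passes.
unique_test = """
    SELECT order_id, COUNT(*) AS occurrences
    FROM orders
    GROUP BY order_id
    HAVING COUNT(*) > 1
"""
not_null_test = "SELECT order_id FROM orders WHERE amount IS NULL"

duplicates = failing_rows(conn, unique_test)
missing_amounts = failing_rows(conn, not_null_test)

print("duplicate order_ids:", duplicates)
print("rows with no amount:", missing_amounts)
```

Here both tests fail on purpose: order 2 is duplicated and order 3 has no amount. In a real project you would declare these checks in the tool's own syntax rather than hand-rolling them.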
Origins
Both dbt and Dataform are tools for transforming data using SQL. They are typically used after raw data has been ingested into a warehouse like BigQuery. This raw data is often unusable for analysis until it has been transformed. dbt and Dataform handle the "T" in ELT (Extract, Load, Transform).
Fishtown Analytics created dbt as a solution for its own data transformation needs. Tristan Handy, the CEO of Fishtown, garnered interest from the data engineering community by blogging about his frustrations with existing technologies that weren't equipped to manage SQL transformations. This resonated with the audience, and the tool was built with the help of an enthusiastic and invested community.
dbt began as a command-line interface (CLI) tool for managing SQL transformations and later gained a graphical user interface (GUI). Dataform was created as a front-end to make dbt more accessible to less technical analysts. Later, Dataform was developed further into a replacement for the newly built dbt backend, prompting dbt to launch its own GUI SaaS tool, now called dbt Cloud.
dbt Core is owned by dbt Labs (formerly Fishtown Analytics), but it is open source. Similarly, Dataform is owned by Google, but it is also open source. If you would rather not run the infrastructure yourself, Google or dbt Labs can host these tools for you. Advanced users will be well served by the open-source versions. When deploying across an organisation, however, it helps to have a GUI for less experienced users.
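As a rough illustration of the "T" step, the sketch below loads a few messy raw rows and then defines a SQL view that cleans them up for analysis. It uses Python's built-in sqlite3 module as a stand-in for a warehouse such as BigQuery. The `raw_signups` table, its columns, and the `signups_clean` view are hypothetical names, and neither dbt nor Dataform is involved.

```python
import sqlite3

# An in-memory database stands in for the warehouse (assumption for illustration).
conn = sqlite3.connect(":memory:")

# "Extract" and "Load": raw data lands as-is, inconsistent values included.
conn.execute("CREATE TABLE raw_signups (email TEXT, signed_up_at TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO raw_signups VALUES (?, ?, ?)",
    [
        (" Ana@Example.com ", "2023-01-05 10:12:00", "gb"),
        ("bob@example.com", "2023-01-05 18:40:00", "GB"),
        ("cy@example.com", "2023-01-06 09:00:00", "us"),
    ],
)

# "Transform": a SQL model that turns the raw table into something analysts can query.
conn.execute("""
    CREATE VIEW signups_clean AS
    SELECT
        LOWER(TRIM(email)) AS email,
        DATE(signed_up_at) AS signup_date,
        UPPER(country)     AS country
    FROM raw_signups
""")

# Analysis now runs against the cleaned model, not the raw table.
daily_by_country = conn.execute("""
    SELECT signup_date, country, COUNT(*) AS signups
    FROM signups_clean
    GROUP BY signup_date, country
    ORDER BY signup_date, country
""").fetchall()

print(daily_by_country)
```

A dbt or Dataform project is, loosely speaking, a version-controlled collection of SELECT statements like the one inside `signups_clean`, which the tool materialises as tables or views for you.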
Cost
dbt can be deployed in two ways: self-hosted or cloud-hosted. If you self-host, you cover your own computing expenses. Alternatively, you can subscribe to the cloud-hosted version from dbt Labs, which includes a dashboard and costs $100 per user per month.
In contrast, Dataform is entirely free to use. It runs queries in BigQuery to generate new tables and views and to execute other SQL commands, so BigQuery will still charge you for running those queries.
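For a back-of-the-envelope comparison, the arithmetic can be sketched as below. The $100 per seat figure is the one quoted above. The team size and the monthly warehouse spend are made-up inputs, and the sketch assumes query charges are the same whichever tool issues the queries, which may not hold for your workload.

```python
DBT_CLOUD_SEAT_USD_PER_MONTH = 100  # per-user price quoted in the text above


def annual_cost_usd(users, seat_usd_per_month, warehouse_usd_per_month):
    """Yearly total: per-seat licence fees plus warehouse query charges."""
    return 12 * (users * seat_usd_per_month + warehouse_usd_per_month)


# Hypothetical inputs: a team of 5 and 300 USD per month of warehouse query spend.
team_size = 5
warehouse_spend = 300

dbt_cloud_total = annual_cost_usd(team_size, DBT_CLOUD_SEAT_USD_PER_MONTH, warehouse_spend)
dataform_total = annual_cost_usd(team_size, 0, warehouse_spend)

print("dbt Cloud:", dbt_cloud_total)
print("Dataform: ", dataform_total)
print("difference:", dbt_cloud_total - dataform_total)
```

Swap in your own headcount and warehouse bill. The licence difference grows linearly with the number of users, while the query charges depend on how much data your transformations scan.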
Templating Languages
dbt uses Jinja, a templating language commonly used by Python developers, while Dataform uses JavaScript. Either way, the templating layer turns your project into a programmable SQL environment.
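"Programmable SQL" mostly means generating repetitive SQL from loops and variables instead of writing it by hand. The sketch below shows the idea in plain Python string formatting. It is an analogy only: dbt would express the loop in Jinja and Dataform in JavaScript, and the `pivot_payments_sql` function, the `payments` table, and the column names are all hypothetical.

```python
def pivot_payments_sql(payment_methods, source_table="payments"):
    """Build one SUM(CASE ...) column per payment method instead of hand-writing each."""
    # The method names are trusted constants here; never interpolate user input into SQL.
    columns = [
        f"    SUM(CASE WHEN payment_method = '{method}' THEN amount ELSE 0 END)"
        f" AS {method}_amount"
        for method in payment_methods
    ]
    return (
        "SELECT\n    order_id,\n"
        + ",\n".join(columns)
        + f"\nFROM {source_table}\nGROUP BY order_id"
    )


sql = pivot_payments_sql(["card", "voucher", "bank_transfer"])
print(sql)
```

Adding a fourth payment method becomes a one-word change to the list rather than another copy-pasted column, which is the maintainability benefit both templating approaches aim for.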
Features of dbt & Dataform
Both tools are useful across a range of use cases. It's worth noting that Dataform offers direct integration with BigQuery, making it an excellent choice for organisations that use BigQuery as their primary data warehouse. On the other hand, dbt may be the better choice if your data is spread across several warehouses, because it works with many warehouse platforms rather than being tied to one.
The command-line versions of both tools are on par with industry standards. At their core, they’re simple scheduled SQL runners that integrate with git, manage separate projects, and generate documentation. Both can also produce some good-looking visualisations if you’d like them.
dbt Cloud
- Comes with a dashboard for monitoring dbt runs and schedules.
- Provides a web-based interface for managing and deploying dbt projects.
- Supports version control and collaboration features.
- Offers an integrated development environment (IDE) with a SQL editor and autocomplete functionality.
- Provides a built-in documentation generator.
- Offers advanced analytics features, such as data lineage tracking and lineage visualisations.
Dataform
- Integrates more seamlessly with BigQuery and other GCP services.
- Offers a web-based interface for managing and deploying Dataform projects.
- Provides a built-in SQL editor with autocomplete functionality.
- Offers a built-in testing framework for data validation.
- Provides a built-in documentation generator.
- Offers advanced analytics features, such as data lineage tracking and lineage visualisations.
dbt is more mature in terms of features, but Dataform is more lightweight and has better GCP integration.
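The data lineage listed for both tools can be pictured as a dependency graph: each model records which models it reads from, and the tool uses that graph to decide what to run first and to draw the lineage view. Below is a minimal sketch of that idea using Python's standard-library `graphlib` (Python 3.9+). The model names and the `run_order` and `upstream` helpers are hypothetical, and this is not how either tool is implemented internally.

```python
from graphlib import TopologicalSorter

# Hypothetical project: each model maps to the set of models it reads from.
dependencies = {
    "stg_orders": {"raw_orders"},
    "stg_customers": {"raw_customers"},
    "customer_orders": {"stg_orders", "stg_customers"},
    "revenue_report": {"customer_orders"},
}


def run_order(deps):
    """Return an execution order in which every model comes after its inputs."""
    return list(TopologicalSorter(deps).static_order())


def upstream(model, deps):
    """Return every model that the given model depends on, directly or indirectly."""
    seen = set()
    stack = [model]
    while stack:
        for parent in deps.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


order = run_order(dependencies)
print("run order:", order)
print("lineage of revenue_report:", sorted(upstream("revenue_report", dependencies)))
```

Several valid run orders exist for this graph, so the exact sequence printed may vary. What is guaranteed is that no model runs before the models it reads from.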
Ecosystem and Community
dbt has a large and active community, which has contributed to its growth and success. There are many resources available, including documentation, tutorials, and blog posts. dbt also has an official Slack channel and a community forum where users can ask questions and get help from other users. This was the Slack that started this whole thing, after all!
Dataform's community is smaller, but it is growing quickly. Its acquisition by Google is rapidly boosting its visibility in organisations that run on the Google stack. Dataform also has a Slack channel and a community forum where users can ask questions and get help, albeit smaller than dbt's.
Conclusion
In conclusion, both tools have similar functionality and use cases. Choosing one over the other comes down to personal and organisational preferences. If you have data spread across multiple warehouses, try dbt. If you are comfortable with the Google stack, try Dataform.
At Cobry, we help our clients with digital transformation and provide ongoing support. We love the simplicity and seamless integration of Dataform with the Google stack. If you are looking to get more from your data, get in touch and we will guide you all the way 🙂