July 31, 2024 · 5 min read

Should I model my data in Looker or BigQuery?

The short answer: you should do both. To answer the question properly, you need to know the difference between a semantic model and a data/dimensional model. These are two core parts of a functioning data warehouse and analytics strategy, and understanding them will help us answer the question in more detail.

The Dimensional Model - the BigQuery Side

The short answer: this is where you do your data cleaning, standardisation, unions, case statements, and simple one-to-one joins.

Every organisation worth its salt building a data warehouse uses some sort of raw, staging and 'analytics-ready' areas. You can call it a medallion architecture, a multi-stage environment, or whatever. The layers are usually something along the lines of:

  • Raw: The dataset 'as is' from the source. You may store this data in blobs (JSON, Parquet, Avro) beforehand, but when it reaches your warehouse, it's still considered raw: you haven't applied any transformations to it. Store everything as a string, and keep your schema loose.
  • Staging: The fields have been renamed and cast to the right types. You may do some 1-1 joins here to make tables wider, and 1-M joins to bring in context around some of your dimensions. The data is now ready 'to be analysed' (a minimal staging model is sketched after this list).
  • Intermediate: If you need to encode any business logic at this stage, such as complex SQL queries or unioning data, here's where you do it.
  • Mart: These are the tables where you 'finish', the ones you send downstream to your analysts to query. We recommend using Kimball's dimensional modelling techniques and building fact and dimension tables.
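To make the staging layer concrete, here's a minimal sketch of what a dbt-style staging model might look like. The raw.orders table and its column names are hypothetical; the point is the renaming and casting described above.

```sql
-- stg_orders.sql: a hypothetical staging model (dbt-style).
-- Renames raw fields and casts the loose string columns from the
-- raw layer into proper BigQuery types.
SELECT
  CAST(id AS INT64)           AS order_id,
  CAST(cust_ref AS INT64)     AS customer_id,
  CAST(order_ts AS TIMESTAMP) AS ordered_at,
  CAST(amt AS NUMERIC)        AS order_amount_gbp,
  LOWER(TRIM(status))         AS order_status
FROM raw.orders
```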

At this point, you may be tempted to join your customers and sales into wide reporting tables, taking the fact_sales and dim_customer tables of your star schema and denormalising them in the warehouse layer. Intuition tells us to denormalise to help our team get the most out of their analytics.

If you were running Power BI, Tableau, or Looker Studio, this would be standard practice. But you have Looker! And with Looker, you have a semantic model, which comes with a host of benefits we won't discuss here; we assume you've bought in.

At the point you hand over to Looker, you want your data to be as normalised as possible: lots of tables and joins, and little redundancy. Your grain should be as fine as possible, so that Looker can do the aggregations.
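As a sketch of what that handover looks like, the model file below keeps orders and customers as separate, normalised tables and tells Looker how to join them at query time. The connection and view names are hypothetical.

```lookml
# A minimal, hypothetical model file: normalised tables joined in an explore.
connection: "bigquery_prod"
include: "/views/*.view.lkml"

explore: orders {
  join: customers {
    type: left_outer
    sql_on: ${orders.customer_id} = ${customers.customer_id} ;;
    relationship: many_to_one   # many orders per customer
  }
}
```

Because the grain stays at one row per order, Looker is free to aggregate up to whatever level the user asks for.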

The Semantic Model - LookML

The short answer: this is where you do your metric calculations, add business descriptions, define the relationships between your models, and create data access rules. This is all done in your LookML. Let's take a look at how a project is structured. Within your LookML, we see the following types of files:

  • Documentation: These markdown files tell the analysts how they should use this data model. They'd contain your own LookML style guides and processes, and high-level information about the Looker project/models.
  • Models: These define your database connection and which views join together into explores. You should categorise these by business area.
  • Views: These map to the tables in your underlying datastore, as well as your derived and extended tables. More on those later.
  • Dashboards: These are your dashboard files, a way to make sure the dashboards your team consumes are version-controlled.

Most of the work here is done in the models and views. In the views, you define your dimensions and measures, and mark them up with descriptive labels and business context. Remember, the semantic model is the translation layer between database speak and business speak. It's easier for business users to understand "Monthly Sales (GBP)" than "mon_sales_standardised".
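Here's a minimal, hypothetical view showing that translation; the table and column names (including the ugly mon_sales_standardised from above) are stand-ins:

```lookml
# A hypothetical view: dimensions and measures with business-friendly metadata.
view: sales {
  sql_table_name: marts.fct_sales ;;   # hypothetical mart table

  dimension: sale_id {
    primary_key: yes
    type: number
    sql: ${TABLE}.sale_id ;;
  }

  dimension_group: sold {
    type: time
    timeframes: [date, week, month]
    sql: ${TABLE}.sold_at ;;
  }

  measure: monthly_sales_gbp {
    label: "Monthly Sales (GBP)"
    description: "Total sales value in GBP. Aggregate by the Sold Month dimension."
    type: sum
    sql: ${TABLE}.mon_sales_standardised ;;
    value_format_name: gbp
  }
}
```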

We're working on a one-click-deploy boilerplate for a project structured like this, so you don't have to spool it up every time you set up a project. This structure was derived from this blog.

Final Architecture

This is a gross oversimplification of how the architecture should look, but it should give you the right steer on whether you should do your modelling within your data warehouse (dbt/Dataform) or within Looker (LookML).

You can stop reading here. If you're still stuck, contact us. If you want to learn a bit about derived tables, read on.

Elephant in the room: (P)DTs

Looker has a feature called persistent derived tables, or PDTs. These are 'temporary' tables, defined in your LookML project using either SQL or LookML, that are written back to the database. How they work (a sketch follows the steps):

  1. You write a query in Looker using SQL or LookML
  2. Looker executes that query and saves the results back to BigQuery on a schedule or a trigger. (This can be done incrementally for tables that use timestamps).
  3. Whenever someone queries that table, the results from that persistent table get returned.
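As a sketch, a SQL-based PDT might look like the view below. The table name and datagroup are hypothetical (the datagroup itself would be defined in the model file); datagroup_trigger handles step 2, and increment_key makes the rebuild incremental on a date column rather than recreating the whole table.

```lookml
# A hypothetical persistent derived table: a daily sales rollup.
view: daily_sales_pdt {
  derived_table: {
    sql:
      SELECT
        DATE(sold_at) AS sale_date,
        SUM(mon_sales_standardised) AS total_sales_gbp
      FROM marts.fct_sales
      GROUP BY sale_date ;;
    datagroup_trigger: nightly_etl   # persist and rebuild when this datagroup fires
    increment_key: "sale_date"       # only rebuild recent days (incremental PDT)
    increment_offset: 3              # re-process the trailing 3 days for late-arriving data
  }

  dimension: sale_date {
    type: date
    sql: ${TABLE}.sale_date ;;
  }

  measure: total_sales_gbp {
    type: sum
    sql: ${TABLE}.total_sales_gbp ;;
  }
}
```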

Sound familiar? Yeah. You're basically replicating dbt or Dataform behaviour in Looker. Why might you want to do this? A few reasons:

  • Leveraging LookML: Derived tables are tightly integrated with LookML, allowing for advanced Looker-specific features like parameters, Liquid templating, and native Looker functions, which may not be directly supported in BigQuery (see the sketch after this list).
  • Complex Business Logic: When your data transformations require complex LookML functionality, such as nested derived tables or complex joins, PDTs allow these to be embedded directly within the LookML layer.
  • Looker-Specific Testing and Validation: If your data validation and quality checks are primarily designed to run within Looker, PDTs can accommodate these requirements directly.
  • Localised Data Transformations: If certain transformations are specific to a Looker project and are not intended for use outside of Looker, it might be simpler to manage them as PDTs rather than creating models that must be maintained separately.
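To illustrate the first bullet, here's a hypothetical derived table using a templated filter via Liquid: the WHERE clause is rewritten at query time from whatever the user filters on. One caveat worth knowing: because the SQL changes per query, a derived table like this can't be persisted, so it's a plain derived table rather than a PDT.

```lookml
# A hypothetical derived table with a templated filter.
view: customer_facts {
  derived_table: {
    sql:
      SELECT customer_id, SUM(order_amount_gbp) AS lifetime_value
      FROM marts.fct_sales
      WHERE {% condition sale_region %} region {% endcondition %}
      GROUP BY customer_id ;;
  }

  filter: sale_region {
    type: string
  }

  dimension: customer_id {
    primary_key: yes
    type: number
    sql: ${TABLE}.customer_id ;;
  }

  dimension: lifetime_value {
    type: number
    sql: ${TABLE}.lifetime_value ;;
  }
}
```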

Data engineering is a funny business of trade-offs. You may want to use PDTs, but we recommend KISS (keep it simple, stupid!): start without them, and reach for them when you find a genuine use case.

Conclusion

The dimensional model in BigQuery focuses on cleaning, standardising, and organising data into an 'analytics-ready' format, including staging, intermediate, and mart layers. It emphasises normalising data and using techniques like Kimball's data modelling to create fact and dimension tables. The idea is to prepare the data comprehensively before handing it over to the semantic layer in Looker.

In contrast, the semantic model in Looker, implemented through LookML, focuses on defining business metrics, adding descriptive labels, and setting up relationships between data models. This layer acts as a bridge between technical data and business-friendly terminology, making it easier for business users to understand and use the data.

We didn't touch on data governance, performance, cost management, or how BigQuery and Looker integrate, because we didn't think we needed to in this blog. If you need answers on those, get in touch 👇
