BigQuery Clustering Optimization

BigQuery - This article is part of a series.

Part 2: This Article

Clustering

Consider a query that filters on specific column values, such as country and registration date:

SELECT *
FROM user_signups
WHERE
  country = 'Lebanon'
  AND registration_date = '2023-12-01'

Clustering lets BigQuery do less work when reading data, which speeds up queries that filter on the clustering columns. Before clustering a table, however, consider how much data it holds: on a small table, the overhead of clustering can outweigh its benefit.

For example, for a BigQuery table that holds only 10 rows, clustering costs more than simply scanning the whole table.

According to a former Google engineer, if the group of data being clustered is smaller than about 100 MB, a full scan may perform better than clustering.

Reference: Google BigQuery clustered table not reducing query size
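
As a rough way to apply this rule of thumb, you can check how large a table is before deciding to cluster it. The sketch below reads the INFORMATION_SCHEMA.TABLE_STORAGE view; the region qualifier (region-us) and the dataset name (mydataset) are assumptions to replace with your own, and user_signups is the table from the query at the top of this section.

```sql
-- Report the row count and logical size (in MB) of one table.
-- Assumptions: the dataset lives in the US multi-region and is named mydataset.
SELECT
  table_schema,
  table_name,
  total_rows,
  ROUND(total_logical_bytes / POW(1024, 2), 1) AS logical_mb
FROM `region-us`.INFORMATION_SCHEMA.TABLE_STORAGE
WHERE table_schema = 'mydataset'
  AND table_name = 'user_signups';
```

If the reported size is only a few megabytes, a full scan is already cheap and clustering is unlikely to pay off.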

Important Note

Additionally, if a query does not filter on the clustering column, BigQuery has nothing to skip, and clustering does not improve that query's performance at all.
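
The difference can be illustrated with two queries against a table clustered by country, such as the clustered_table created in the next section. This is only a sketch: the tier value 'premium' is a made-up example value.

```sql
-- Filters on the clustering column (country):
-- BigQuery can skip storage blocks that hold no rows for 'Lebanon'.
SELECT username, tier
FROM `myproject.mydataset.clustered_table`
WHERE country = 'Lebanon';

-- Filters only on a non-clustering column (tier):
-- no blocks can be skipped, so the referenced columns are scanned in full.
SELECT username, country
FROM `myproject.mydataset.clustered_table`
WHERE tier = 'premium';  -- 'premium' is a hypothetical value
```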

Example of Creating Clustered Tables

CREATE TABLE `myproject.mydataset.clustered_table` (
  registration_date DATE,
  country STRING,
  tier STRING,
  username STRING
) CLUSTER BY country;

Clustering Features

Up to 4 clustering columns per table
Unlike partitioning, not restricted to INT64 and date/time columns (DATE, DATETIME, TIMESTAMP)
Types such as STRING and GEOGRAPHY can also be used
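
A minimal sketch of clustering on more than one column is shown below. The table name multi_clustered_table and the tier value 'premium' are hypothetical. Column order matters: rows are sorted by the first clustering column, then by the second, so queries benefit most when they filter on the leading column.

```sql
-- Cluster on two STRING columns (up to 4 are allowed).
CREATE TABLE `myproject.mydataset.multi_clustered_table` (
  registration_date DATE,
  country STRING,
  tier STRING,
  username STRING
)
CLUSTER BY country, tier;  -- sorted by country first, then by tier

-- Filtering on the leading column (and optionally the next one)
-- lets BigQuery skip blocks that cannot match.
SELECT username
FROM `myproject.mydataset.multi_clustered_table`
WHERE country = 'Lebanon'
  AND tier = 'premium';  -- hypothetical value
```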

Combine Clustering with Partitioning

Using partitioning and clustering together enables more efficient data access.

Combination Strategy

Partitioning: divide the data by date
Clustering: sort the data further within each partition
Result: better query performance and lower cost
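
The strategy above can be sketched as a single table definition that partitions by registration_date and clusters by country. The table name partitioned_clustered_table is an assumption chosen for illustration; the columns are the same as in the earlier example.

```sql
-- Partition by date, then cluster by country within each partition.
CREATE TABLE `myproject.mydataset.partitioned_clustered_table` (
  registration_date DATE,
  country STRING,
  tier STRING,
  username STRING
)
PARTITION BY registration_date
CLUSTER BY country;

-- The date filter limits the scan to one partition;
-- the country filter then skips blocks inside that partition.
SELECT username, tier
FROM `myproject.mydataset.partitioned_clustered_table`
WHERE registration_date = DATE '2023-12-01'
  AND country = 'Lebanon';
```

With this layout, the query from the start of the article touches a single day's partition instead of the whole table, and only the blocks for the requested country within it.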
