Automating Creation of a BigQuery Dataset and Table in GCP with Terraform and GitLab CI/CD
Streamlining Data Management for Efficient Cloud Workflows
Introduction: In today's data-driven world, managing data efficiently and securely is crucial for organizations of all sizes. In this blog post, we will explore how to automate the creation of a BigQuery dataset and a table in Google Cloud Platform (GCP) using Terraform. Furthermore, we will leverage GitLab CI/CD to establish a continuous integration and deployment pipeline that automates the entire process, enabling seamless and reliable data operations.

Prerequisites: Before we dive into the implementation, ensure you have the following prerequisites in place:
- A Google Cloud Platform (GCP) account with the necessary permissions to create BigQuery datasets and tables.
- A GitLab account with a repository set up to manage your Terraform code.
Repo Structure: To maintain a well-organized project, we will follow the directory structure sketched below within our GitLab repository.
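A rough sketch of the layout, inferred from the files discussed below and the pipeline's TF_DIR variable:

.
├── .gitlab-ci.yml
└── terraform/
    ├── main.tf
    └── provider.tf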

You can quickly clone my public repository: GitLab-Repo
Terraform Configuration: Let's explore the details of each component of our Terraform code:
main.tf: The main.tf file contains the core Terraform configuration, defining the resources and their properties to be provisioned in the target cloud environment. In this context, it sets up a Google Cloud Platform (GCP) BigQuery dataset and a table for data storage and analysis.
resource "google_bigquery_dataset" "dataset" {
dataset_id = "my_gitlab_dataset" //Replace with your dataset-id
friendly_name = "test"
description = "This is a dataset from Terraform script"
location = "US"
default_table_expiration_ms = 3600000
labels = {
env = "default"
}
}
resource "google_bigquery_table" "default" {
dataset_id = google_bigquery_dataset.dataset.dataset_id
table_id = "my-gitlab-table" //Replace with your table-id
time_partitioning {
type = "DAY"
}
labels = {
env = "default"
}
deletion_protection=false
}
provider.tf: The provider.tf file specifies the configuration for the Terraform provider, defining the target cloud platform and its necessary details, such as the backend bucket, project ID, region, and zone. It allows Terraform to interact with GCP and manage resources in the designated project.
terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "4.58.0"
    }
  }

  backend "gcs" {
    bucket = "your-backend-bucket" # Replace with your backend bucket name
    prefix = "terraform/state"
  }
}

provider "google" {
  project = "your-project-id" # Replace with your project ID
  region  = "us-central1"     # Replace with your desired region
  zone    = "us-central1-c"   # Replace with your desired zone
}
GitLab CI/CD Configuration: The .gitlab-ci.yml file sets up the CI/CD pipeline for automating the infrastructure deployment process. It defines stages, jobs, and associated scripts to perform tasks such as validating, planning, applying, and destroying Terraform changes.
---
workflow:
  rules:
    - if: $CI_COMMIT_BRANCH != "main" && $CI_PIPELINE_SOURCE != "merge_request_event"
      when: never
    - when: always

variables:
  TF_DIR: ${CI_PROJECT_DIR}/terraform
  STATE_NAME: "gitlab-terraform-gcp-tf"

stages:
  - validate
  - plan
  - apply
  - destroy

image:
  name: hashicorp/terraform:light
  entrypoint: [""]

before_script:
  - terraform --version
  - cd ${TF_DIR}
  - terraform init -reconfigure

validate:
  stage: validate
  script:
    - terraform validate
  cache:
    key: ${CI_COMMIT_REF_NAME}
    paths:
      - ${TF_DIR}/.terraform
    policy: pull-push

plan:
  stage: plan
  script:
    - terraform plan
  dependencies:
    - validate
  cache:
    key: ${CI_COMMIT_REF_NAME}
    paths:
      - ${TF_DIR}/.terraform
    policy: pull

apply:
  stage: apply
  script:
    - terraform apply -auto-approve
  dependencies:
    - plan
  cache:
    key: ${CI_COMMIT_REF_NAME}
    paths:
      - ${TF_DIR}/.terraform
    policy: pull

destroy:
  stage: destroy
  script:
    - terraform destroy -auto-approve
  dependencies:
    - plan
    - apply
  cache:
    key: ${CI_COMMIT_REF_NAME}
    paths:
      - ${TF_DIR}/.terraform
    policy: pull
  when: manual
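Before pushing, you can reproduce the validate and plan stages locally to catch errors early. A minimal sketch, assuming Terraform is installed and you have a service account key on disk (the key path is a placeholder):

# Point Terraform at your service account key locally
export GOOGLE_APPLICATION_CREDENTIALS="$HOME/key.json"

cd terraform
terraform init -reconfigure   # Connects to the GCS backend
terraform validate            # Same check the validate stage runs
terraform plan                # Same preview the plan stage produces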
Implementation Steps: Now that we have our code and pipeline set up, let's walk through the implementation steps to automate the creation of a BigQuery dataset and a table in GCP using Terraform and GitLab CI/CD.
Set up GitLab Repository: Create a new repository on GitLab or use an existing one to host your Terraform code. If you haven't already, clone the repository from the following link: GitLab-Repo
Configure GCP Provider: In the provider.tf file, configure the GCP provider by specifying your GCP backend bucket, project ID, region, and zone.
Set Secrets in GitLab: In your GitLab repository, navigate to Settings > CI/CD > Variables. Add a new variable named "GOOGLE_CREDENTIALS" and paste the contents of your Google Cloud service account key file into the value field. This securely provides the necessary credentials for Terraform to authenticate with GCP.

Note: Make sure to remove any whitespace or newlines from the key content before pasting it, otherwise authentication may fail.
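If you don't yet have a service account key, one way to create one is with the gcloud CLI. The service account name below is a placeholder; it needs BigQuery permissions in your project (and read/write access to the backend state bucket, e.g. roles/storage.objectAdmin on that bucket):

# Grant the service account permission to manage BigQuery resources
gcloud projects add-iam-policy-binding your-project-id \
  --member="serviceAccount:terraform@your-project-id.iam.gserviceaccount.com" \
  --role="roles/bigquery.admin"

# Generate a JSON key to paste into the GOOGLE_CREDENTIALS variable
gcloud iam service-accounts keys create key.json \
  --iam-account=terraform@your-project-id.iam.gserviceaccount.com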
Run the Pipeline: Commit and push your Terraform code to the GitLab repository. This action will trigger the GitLab CI/CD pipeline. Monitor the pipeline execution in the CI/CD section of your repository to ensure it completes successfully.
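For example:

git add terraform/ .gitlab-ci.yml
git commit -m "Add BigQuery dataset and table via Terraform"
git push origin main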
Verify Resource Creation in GCP: After the pipeline finishes, verify the creation of the BigQuery dataset and table in the Google Cloud Platform (GCP) Console. Ensure that both have been provisioned as defined in your Terraform code.
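Besides the Console, you can also verify from the command line with the bq tool, using the dataset and table names from main.tf:

# List datasets in the project; my_gitlab_dataset should appear
bq ls --project_id=your-project-id

# Show the table's metadata, including its DAY partitioning
bq show --format=prettyjson your-project-id:my_gitlab_dataset.my-gitlab-table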
Conclusion: In this blog post, we automated the creation of a BigQuery dataset and a table in Google Cloud Platform using Terraform and GitLab CI/CD. Following the steps above, you can manage your GCP data infrastructure as code: provisioning becomes repeatable and consistent, manual errors are minimized, and every change is version-controlled and auditable. Keep your Terraform code and pipeline up to date as your data requirements evolve, and use merge requests to review infrastructure changes collaboratively. Happy automating!
References: GitLab-Repo