
Automating Creation of a BigQuery Dataset and Table in GCP with Terraform and GitLab CI/CD

Streamlining Data Management for Efficient Cloud Workflows

Introduction: In today's data-driven world, managing data efficiently and securely is crucial for organizations of all sizes. In this blog post, we will explore how to automate the creation of a BigQuery dataset and a table in Google Cloud Platform (GCP) using Terraform. Furthermore, we will leverage GitLab CI/CD to establish a continuous integration and deployment pipeline that automates the entire process, enabling seamless and reliable data operations.

Prerequisites: Before we dive into the implementation, ensure you have the following prerequisites in place:

  • A Google Cloud Platform (GCP) account with the necessary permissions to create BigQuery datasets and tables.

  • A GitLab account with a repository set up to manage your Terraform code.
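If you have not yet created a service account for Terraform to use, a minimal sketch with the gcloud CLI might look like the following. The account name, project ID, and role bindings are placeholders; adjust them to your environment and grant only the permissions you actually need (BigQuery for the resources, plus access to the state bucket):

```shell
# Create a service account for Terraform (name and project are placeholders)
gcloud iam service-accounts create terraform-ci \
  --project=your-project-id \
  --display-name="Terraform CI"

# Allow it to manage BigQuery resources
gcloud projects add-iam-policy-binding your-project-id \
  --member="serviceAccount:terraform-ci@your-project-id.iam.gserviceaccount.com" \
  --role="roles/bigquery.admin"

# It also needs read/write access to the GCS bucket holding the Terraform state

# Download a JSON key to use later as the GOOGLE_CREDENTIALS CI/CD variable
gcloud iam service-accounts keys create key.json \
  --iam-account=terraform-ci@your-project-id.iam.gserviceaccount.com
```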

Repo Structure: To maintain a well-organized project, the repository keeps the Terraform configuration (main.tf and provider.tf) alongside the .gitlab-ci.yml pipeline definition: GitLab-Repo

You can quickly clone my public repository: GitLab-Repo

Terraform Configuration: Let's explore the details of each component of our Terraform code:

main.tf: The main.tf file contains the core Terraform configuration, defining the resources and their properties to be provisioned in the target cloud environment. In this context, it sets up a Google Cloud Platform (GCP) BigQuery dataset and a table for data storage and analysis.

resource "google_bigquery_dataset" "dataset" {
  dataset_id                  = "my_gitlab_dataset" // Replace with your dataset ID
  friendly_name               = "test"
  description                 = "This is a dataset from Terraform script"
  location                    = "US"
  default_table_expiration_ms = 3600000
  labels = {
    env = "default"
  }
}

resource "google_bigquery_table" "default" {
  dataset_id = google_bigquery_dataset.dataset.dataset_id
  table_id   = "my-gitlab-table" // Replace with your table ID

  time_partitioning {
    type = "DAY"
  }

  labels = {
    env = "default"
  }

  deletion_protection = false
}
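Note that the table above is created without an explicit schema. If you want Terraform to manage the columns as well, you can add a schema argument (a JSON string) to the google_bigquery_table resource. The field names below are purely illustrative:

```hcl
resource "google_bigquery_table" "with_schema" {
  dataset_id = google_bigquery_dataset.dataset.dataset_id
  table_id   = "my_table_with_schema" // illustrative table ID

  // Column definitions as a JSON schema string
  schema = <<EOF
[
  {"name": "event_id", "type": "STRING",    "mode": "REQUIRED"},
  {"name": "payload",  "type": "STRING",    "mode": "NULLABLE"},
  {"name": "event_ts", "type": "TIMESTAMP", "mode": "NULLABLE"}
]
EOF

  deletion_protection = false
}
```

With a managed schema, later edits to the JSON will surface as in-place updates or replacements in the Terraform plan, so review plans carefully before applying.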

provider.tf: The provider.tf file specifies the configuration for the Terraform provider, defining the target cloud platform and its necessary details, such as backend bucket, project ID, region, and zone. It allows Terraform to interact with GCP and manage resources in the designated project.

terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "4.58.0"
    }
  }
  backend "gcs" {
    bucket  = "your-backend-bucket" // Replace with your backend bucket name
    prefix  = "terraform/state"
  }
}

provider "google" {
  project = "your-project-id" // Replace with your project ID
  region  = "us-central1" // Replace with your desired region
  zone    = "us-central1-c" // Replace with your desired zone
}
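To avoid hardcoding the project ID and region, you could optionally move them into input variables, for example in a variables.tf file (the defaults below are placeholders):

```hcl
variable "project_id" {
  type        = string
  description = "GCP project to deploy into"
  default     = "your-project-id" // placeholder
}

variable "region" {
  type        = string
  description = "Default region for resources"
  default     = "us-central1"
}
```

The provider block would then reference var.project_id and var.region instead of literal strings, making the same code reusable across environments.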

GitLab CI/CD Configuration: The .gitlab-ci.yml file sets up the CI/CD pipeline for automating the infrastructure deployment process. It defines stages, jobs, and associated scripts to perform tasks such as validation, planning, applying, and destroying Terraform changes.

---
workflow:
  rules:
    - if: $CI_COMMIT_BRANCH != "main" && $CI_PIPELINE_SOURCE != "merge_request_event"
      when: never
    - when: always

variables:
  TF_DIR: ${CI_PROJECT_DIR}/terraform
  STATE_NAME: "gitlab-terraform-gcp-tf"

stages:
  - validate
  - plan
  - apply
  - destroy

image:
  name: hashicorp/terraform:light
  entrypoint: [""]
  
before_script:
  - terraform --version
  - cd ${TF_DIR}
  - terraform init -reconfigure

validate:
  stage: validate
  script:
    - terraform validate
  cache:
    key: ${CI_COMMIT_REF_NAME}
    paths:
    - ${TF_DIR}/.terraform
    policy: pull-push

plan:
  stage: plan
  script:
    - terraform plan
  dependencies:
    - validate
  cache:
    key: ${CI_COMMIT_REF_NAME}
    paths:
    - ${TF_DIR}/.terraform
    policy: pull

apply:
  stage: apply
  script:
    - terraform apply -auto-approve
  dependencies:
    - plan
  cache:
    key: ${CI_COMMIT_REF_NAME}
    paths:
    - ${TF_DIR}/.terraform
    policy: pull

destroy:
  stage: destroy
  script:
    - terraform destroy -auto-approve
  dependencies:
    - plan
    - apply
  cache:
    key: ${CI_COMMIT_REF_NAME}
    paths:
    - ${TF_DIR}/.terraform
    policy: pull
  when: manual
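One caveat with the pipeline as written: because the plan output is never saved, the apply job re-plans from scratch rather than applying the exact plan that was reviewed. A common refinement, sketched below as an optional change (not part of the repository above), writes the plan to a file and passes it to apply as a job artifact:

```yaml
plan:
  stage: plan
  script:
    - terraform plan -out=tfplan   # save the plan for the apply stage
  artifacts:
    paths:
      - terraform/tfplan           # artifact paths are relative to the project dir
    expire_in: 1 week

apply:
  stage: apply
  script:
    - terraform apply -auto-approve tfplan   # apply exactly what was planned
  dependencies:
    - plan
```

Since the before_script changes into ${TF_DIR}, the restored artifact lands in the working directory where apply runs.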

Implementation Steps: Now that we have our code and pipeline set up, let's walk through the implementation steps to automate the creation of a BigQuery dataset and a table in GCP using Terraform and GitLab CI/CD.

  1. Set up GitLab Repository: Create a new repository on GitLab or use an existing one to host your Terraform code. If you haven't already, clone the repository from the following link: GitLab-Repo

  2. Configure GCP Provider: In the provider.tf file, configure the GCP provider by specifying your GCP backend bucket, project ID, region, and zone.

  3. Set Secrets in GitLab: In your GitLab repository, navigate to Settings > CI/CD > Variables. Add a new variable named "GOOGLE_CREDENTIALS" and paste the contents of your Google Cloud service account key file into the value field. This securely provides the necessary credentials for Terraform to authenticate with GCP.

Note: GitLab masked variables must be a single line, so remove any line breaks from the key content before pasting it (or leave the variable unmasked).
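One way to collapse the key file to a single line before pasting it, assuming a POSIX shell (`tr` simply deletes the newline characters):

```shell
# Print the service-account key as a single line, suitable for a masked variable
tr -d '\n' < key.json
```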

  4. Run the Pipeline: Commit and push your Terraform code to the GitLab repository. This triggers the GitLab CI/CD pipeline. Monitor the pipeline in the CI/CD section of your repository to confirm it completes successfully.

  5. Verify Resource Creation in GCP: After the pipeline finishes, open the BigQuery section of the Google Cloud Platform (GCP) Console and confirm that the dataset and table were provisioned as expected.
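You can also verify from the command line with the bq tool; the dataset and table names below match the Terraform code above, and the project ID is a placeholder:

```shell
# List datasets in the project; my_gitlab_dataset should appear
bq ls --project_id=your-project-id

# Show the table's metadata, including its time-partitioning settings
bq show your-project-id:my_gitlab_dataset.my-gitlab-table
```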

Conclusion: In this blog post, we automated the creation of a BigQuery dataset and table in Google Cloud Platform using Terraform and GitLab CI/CD. Combining the two gives you consistent, repeatable data infrastructure with fewer manual errors, plus version control and an auditable history of every change. Remember to keep your Terraform code and pipeline up to date as your data requirements evolve, and foster collaboration through merge requests for a secure, reviewable workflow. Happy automating!

References: GitLab-Repo