Effortless Management of BigQuery Analytics Hub and Secure Dataset Sharing with Terraform and GitHub Actions
Automate Your GCP Infrastructure for Streamlined Analytics and Collaborative Data Access
Introduction: As cloud engineers working with GCP, we often need to automate complex cloud workflows. In this blog post, we will explore how to automate the creation of a BigQuery Analytics Hub data exchange and listing, and the sharing of a dataset with a service account in another project, using Terraform and GitHub Actions. We'll walk through provisioning the required resources, setting up the CI/CD pipeline, and wiring everything together into a repeatable, automated process.

Prerequisites: Before diving into the technical details, ensure that you have the following:
A Google Cloud Platform (GCP) account with the necessary permissions to create resources such as BigQuery Analytics Hub data exchanges, listings, and datasets, and with the required APIs enabled (a Terraform sketch for enabling them follows this list).
A GitHub account with a repository set up to manage your Terraform code.
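If the BigQuery and Analytics Hub APIs are not yet enabled in the project, you can manage that from Terraform as well. Below is a minimal sketch using the google_project_service resource; the resource names and the assumption that you want Terraform to own API enablement are ours, so adapt or drop it as needed.
# Enable the APIs used in this post (optional)
resource "google_project_service" "bigquery" {
  project = var.project
  service = "bigquery.googleapis.com"
}

resource "google_project_service" "analytics_hub" {
  project = var.project
  service = "analyticshub.googleapis.com"
}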
Project Structure: To keep the repository easy to navigate, we'll organize it around a src directory for the Terraform code and a .github/workflows directory for the CI/CD pipeline, as outlined below.
You can clone this public repository as a starting point: GitHub-Repo
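Based on the files discussed in the rest of this post, the layout looks roughly like this (treat the exact placement of module.tf as an assumption rather than a requirement):
.github/
  workflows/
    terraform.yml      # GitHub Actions CI/CD pipeline
module.tf              # backend, provider, and module call
src/
  main.tf              # data exchange, listing, dataset, and IAM policies
  variables.tf         # input variables
  terraform.tfvars     # variable values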
Terraform Configuration: Let's explore the details of the Terraform configuration in the main.tf and variables.tf files within the src directory.
main.tf: This file contains the main Terraform configuration for creating and managing cloud resources. It defines the infrastructure components, such as resources, providers, and data sources, and specifies their configuration details. In our example, main.tf defines the BigQuery Analytics Hub data exchange, its IAM policy, the listing, and the dataset being shared.
# Create BigQuery Analytics Hub Data Exchange
resource "google_bigquery_analytics_hub_data_exchange" "data_exchange" {
  location         = var.region
  data_exchange_id = var.data_exchange_id
  display_name     = "product_vendor"
  description      = "data exchange"
}

# IAM policy granting the publisher service account access to the data exchange
data "google_iam_policy" "admin" {
  binding {
    role = "roles/viewer"
    members = [
      "serviceAccount:serviceAccountA", # Replace with your publisher service account email (keep the serviceAccount: prefix)
    ]
  }
}

resource "google_bigquery_analytics_hub_data_exchange_iam_policy" "policy" {
  project          = google_bigquery_analytics_hub_data_exchange.data_exchange.project
  location         = google_bigquery_analytics_hub_data_exchange.data_exchange.location
  data_exchange_id = google_bigquery_analytics_hub_data_exchange.data_exchange.data_exchange_id
  policy_data      = data.google_iam_policy.admin.policy_data
}
# Create BigQuery Analytics Hub Listing
resource "google_bigquery_analytics_hub_listing" "listing" {
project = var.project
location = var.region
data_exchange_id = google_bigquery_analytics_hub_data_exchange.data_exchange.data_exchange_id
listing_id = var.listing_id
display_name = "my_listing_data"
description = "data exchange"
bigquery_dataset {
dataset = google_bigquery_dataset.listing.dataset_id
}
# Add publisher and request access
publisher {
name = "your-name" #Replace with your publisher name
primary_contact = "serviceAccountA", #Replace with your publisher service account
}
request_access = "serviceAccountB", #Replace with your requester service account
}
resource "google_bigquery_dataset" "listing" {
dataset_id = "product_vendor_data_123"
friendly_name = "product_vendor_data"
description = "my dataset"
location = var.region
}
# Create google_bigquery_analytics_hub_listing_iam_policy
resource "google_bigquery_analytics_hub_listing_iam_policy" "policy" {
  project          = google_bigquery_analytics_hub_listing.listing.project
  location         = google_bigquery_analytics_hub_listing.listing.location
  data_exchange_id = google_bigquery_analytics_hub_listing.listing.data_exchange_id
  listing_id       = google_bigquery_analytics_hub_listing.listing.listing_id
  policy_data      = data.google_iam_policy.admin.policy_data
}
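If you want to make the created resources easier to reference elsewhere (for example, when handing the listing to a subscriber), you can optionally add a few outputs. This is a minimal sketch; the output names are our own, and the name attributes are the full resource names computed by the provider.
# Optional outputs exposing the created resources
output "data_exchange_name" {
  description = "Full resource name of the data exchange"
  value       = google_bigquery_analytics_hub_data_exchange.data_exchange.name
}

output "listing_name" {
  description = "Full resource name of the listing"
  value       = google_bigquery_analytics_hub_listing.listing.name
}

output "shared_dataset_id" {
  description = "ID of the dataset shared through the listing"
  value       = google_bigquery_dataset.listing.dataset_id
}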
variables.tf: This file declares the input variables used throughout the Terraform configuration. These variables let you parameterize your infrastructure code, making it more flexible and reusable. In our example, variables.tf declares project, region, data_exchange_id, listing_id, and dataset_id.
variable "project" {
description = "The region where the resources will be created"
default = "your-project-id" #Replace with your project ID
}
variable "region" {
description = "The region where the resources will be created"
default = "us-central1" #Replace with your region
}
variable "data_exchange_id" {
description = "ID of the data_exchange"
default = "my_data_exchange"
}
variable "listing_id" {
description = "ID of the listing_id"
default = "my_listing"
}
variable "dataset_id" {
description = "ID of the dataset_id"
default = "purchasing"
}
terraform.tfvars: This file assigns values to the variables declared above: region, data_exchange_id, listing_id, dataset_id, and project. For example, region is set to "us-central1" and data_exchange_id to "my_data_exchange".
region = "us-central1"
data_exchange_id = "my_data_exchange"
listing_id = "my_listing"
dataset_id = "purchasing"
project = "project-id"
Note: Replace above data with your desired values.
module.tf: This file is typically used when you're defining a Terraform module. A module is a reusable collection of Terraform resources and configurations that can be used in different parts of your infrastructure code, encapsulating a specific set of resources with their configuration. In our example, module.tf acts as the root configuration: it defines the backend and provider settings and calls the configuration in the src directory as a module.
terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "4.58.0"
    }
  }

  backend "gcs" {
    bucket = "your-backend-bucket" # Replace with your backend bucket
    prefix = "bigquery-analytics-hub"
  }
}

provider "google" {
  project = "your-project-id" # Replace with your project ID
}

module "bigquery_analytics_hub" {
  source = "./src"
}
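If you would rather not rely on the defaults baked into src/variables.tf, the module call can pass the inputs explicitly. A minimal sketch, using the variable names declared earlier (the values are placeholders):
module "bigquery_analytics_hub" {
  source = "./src"

  # Override the defaults declared in src/variables.tf
  project          = "your-project-id"
  region           = "us-central1"
  data_exchange_id = "my_data_exchange"
  listing_id       = "my_listing"
  dataset_id       = "purchasing"
}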
GitHub Actions Configuration: The .github/workflows/terraform.yml file sets up the GitHub Actions CI/CD pipeline that automates the infrastructure deployment. The pipeline initializes, formats, plans, and applies the Terraform configuration, and includes a commented-out destroy step you can enable when you want to tear everything down.
name: "BigQuery Analytics Hub"
on:
push:
branches:
- main
jobs:
terraform:
name: "Terraform"
runs-on: ubuntu-latest
env:
GOOGLE_CREDENTIALS: ${{ secrets.GOOGLE_CREDENTIALS }}
defaults:
run:
working-directory: ./src
steps:
- name: Checkout
uses: actions/checkout@v2
- name: Setup Terraform
uses: hashicorp/setup-terraform@v1
with:
terraform_version: 1.0.1
terraform_wrapper: false
- name: Terraform Init
id: init
run: terraform init
- name: Terraform Format
id: fmt
run: terraform fmt
- name: Terraform Plan
id: plan
run: terraform plan
- name: Terraform Apply
id: apply
run: terraform apply -auto-approve
#- name: Terraform Destroy
# id: destroy
# run: terraform destroy -auto-approve
Implementation Steps: With the configurations in place, let's walk through the implementation steps.
Set up GitHub Repository: Create a new repository on GitHub or use an existing one to host your Terraform code. You can clone the provided public repository as a starting point: GitHub-Repo
Configure GCP Provider: In the module.tf file, ensure that you've configured the GCP provider and backend with your project ID, backend bucket, and other required values.
Set Secrets in GitHub: In your GitHub repository, navigate to Settings > Secrets. Add a new secret named GOOGLE_CREDENTIALS and paste the contents of your Google Cloud service account key file. This secret allows Terraform to authenticate with GCP.

Note: Make sure no stray whitespace or line breaks are introduced into the key content when you paste it.
Push to GitHub: Commit and push your Terraform code to the GitHub repository. The GitHub Actions CI/CD pipeline will automatically trigger, executing the defined stages and jobs.
Verify Resource Creation: Check the Google Cloud Console to confirm the successful creation of the BigQuery Analytics Hub and dataset.
Check Dataset Transfer Status: After initiating the dataset sharing process, periodically check the status of the dataset transfer to ensure it is completed successfully. You can use GCP monitoring tools or APIs to monitor the progress and verify that the shared dataset is accessible to the intended service account in the target project.
By incorporating this step, you can ensure that the dataset is fully transferred and available for use in the destination project, enhancing the reliability of your automation process.
Conclusion: In this blog post, we have demonstrated how to automate the setup of a BigQuery Analytics Hub and dataset sharing across different GCP projects using Terraform and GitHub Actions. By following the steps outlined above, you can streamline complex cloud workflows, ensure consistency, and minimize manual intervention. Regularly update your Terraform code and pipeline to adapt to changes in your cloud infrastructure requirements. The combination of Terraform and GitHub Actions empowers cloud engineers to achieve efficient and automated cloud resource management. Happy automating!
References: GitHub-Repo