General information:
We use Terraform agents to provision resources in our environments (brave, stage, etc.) and in our CICD workspace.
Terraform agents are used because they run directly in GCP and can modify resources that TFC itself would not have access to. More information on why we chose Terraform agents can be found in Terraform Agents - PoC.
Deployment:
Each agent is deployed as a VM running in a GCP project. For example, in brave the VMs run in the sharedvpc project (safi-env-brave-sharedvpc).
Deployment steps: first, decide whether the environment needs agents and how many. This is set via tfc_agents in our shared_variables.tf:
brave = {
  domain_name = "smallog.tech"
  features = {
    ably                = true
    confluent           = true
    confluent-mp        = false
    gke_release_channel = "REGULAR"
    environment_type    = "dev"
  }
  tfc_agents = 2
}
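For comparison, an environment that should not get any agents could set tfc_agents to 0. This is a minimal sketch and not taken from our configuration: the environment name `example` and all of its values are hypothetical. Because the agent VMs and their reserved IPs are created with count = local.safi_environments[var.env_name].tfc_agents, a value of 0 would create none of them.

```hcl
# Hypothetical environment entry, for illustration only.
example = {
  domain_name = "example.invalid" # placeholder domain, not a real one
  features = {
    ably                = false
    confluent           = false
    confluent-mp        = false
    gke_release_channel = "REGULAR"
    environment_type    = "dev"
  }
  # count = 0 on the agent resources means no agent VMs or IPs are created.
  tfc_agents = 0
}
```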
Once this change is applied, the dispatcher creates the agent pool (tfe_agent_pool) and the agent token (tfe_agent_token) for each environment and distributes the token to the necessary workspaces as a variable.
resource "tfe_agent_pool" "this" {
  for_each     = toset(concat(keys(local.safi_environments), ["cicd"]))
  name         = format("%s-%s-tfe-agent-pool", local.prefix, each.key)
  organization = var.tfe_organization
}

resource "tfe_agent_token" "this" {
  for_each      = toset(concat(keys(local.safi_environments), ["cicd"]))
  agent_pool_id = tfe_agent_pool.this[each.key].id
  description   = format("%s-%s-tfe-agent-token", local.prefix, each.key)
}
  key         = "tfc_agent_token"
  value       = tfe_agent_token.this[each.key].token
  category    = "terraform"
  sensitive   = true
  description = local.managed_by_terraform
}],
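On the receiving side, the workspace needs a matching variable declaration so that var.tfc_agent_token resolves. The actual declaration is not shown on this page; the following is a minimal sketch that assumes it is declared as a sensitive string.

```hcl
# Assumed declaration in the workspace that provisions the agent VMs.
# The name matches the key set by the dispatcher ("tfc_agent_token").
variable "tfc_agent_token" {
  description = "Agent token for this environment's agent pool, set by the dispatcher."
  type        = string
  sensitive   = true # keeps the token out of plan and apply output
}
```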
Those variables are then used to provision the necessary VMs.
locals {
  tfc_agent_machine_type = {
    dev   = "n1-standard-2"
    brave = "n1-standard-2"
    stage = "n1-standard-2"
    prod  = "n1-standard-2"
  }
}

# SA
# -----------------------------------------------
resource "google_service_account" "tfc-agent-sa" {
  provider     = google.sharedvpc
  account_id   = "${local.prefix}-${var.env_name}-tfc-agent-sa"
  display_name = "${local.prefix}-${var.env_name}-tfc-agent-sa"
}

# Subnetwork networkUser in sharedVPC
resource "google_compute_subnetwork_iam_member" "tfc-agent-private-default-network-user" {
  provider   = google-beta.sharedvpc
  subnetwork = google_compute_subnetwork.private-default["tms"].id
  role       = "roles/compute.networkUser"
  region     = var.google_region
  member     = format("serviceAccount:%s", google_service_account.tfc-agent-sa.email)
}

# KMS
# -----------------------------------------------
# Create a KMS key ring to store the key
resource "google_kms_key_ring" "tfc_agent_key_ring" {
  provider = google.sharedvpc
  location = var.google_region
  name     = "${local.prefix}-${var.env_name}-tfc-agent-key-ring"
}

# Create a crypto key in the key ring generated above
resource "google_kms_crypto_key" "tfc_agent_crypto_key" {
  provider        = google.sharedvpc
  name            = "${local.prefix}-${var.env_name}-tfc-agent-crypto-key"
  key_ring        = google_kms_key_ring.tfc_agent_key_ring.id
  rotation_period = "100000s"

  lifecycle {
    prevent_destroy = true
  }
}

data "google_project" "sharedvpc_project" {
  provider = google.sharedvpc
}

# Giving permissions to Service account to use the key
resource "google_kms_crypto_key_iam_binding" "tfc_agent_crypto_key_iam_binding" {
  provider      = google.sharedvpc
  crypto_key_id = google_kms_crypto_key.tfc_agent_crypto_key.id
  role          = "roles/cloudkms.cryptoKeyEncrypterDecrypter"

  members = [
    format("serviceAccount:service-%s@compute-system.iam.gserviceaccount.com", data.google_project.sharedvpc_project.number),
  ]
}

# Reserve IP
# -----------------------------------------------
resource "google_compute_address" "tfc-agent-private" {
  provider     = google.sharedvpc
  count        = local.safi_environments[var.env_name].tfc_agents
  name         = "${local.prefix}-${var.env_name}-tfc-agent-${count.index}-v2"
  subnetwork   = google_compute_subnetwork.private-default["tms"].id
  address_type = "INTERNAL"
  region       = var.google_region
}

# Instance
# -----------------------------------------------
resource "google_compute_instance" "tfc-agent" {
  provider     = google.sharedvpc
  count        = local.safi_environments[var.env_name].tfc_agents
  name         = "${local.prefix}-${var.env_name}-tfc-agent-${count.index}-v2"
  machine_type = local.tfc_agent_machine_type[var.env_name]
  zone         = var.google_zone
  tags         = ["tfc-agents"]

  boot_disk {
    initialize_params {
      image = "cos-cloud/cos-97-16919-103-28"
    }
    kms_key_self_link = google_kms_crypto_key.tfc_agent_crypto_key.id
  }

  labels = {
    container-vm = "cos-97-16919-103-28"
  }

  metadata = {
    google-logging-enabled = "true"
    gce-container-declaration = yamlencode({
      spec = {
        containers = [
          {
            image = "docker.io/hashicorp/tfc-agent:1.3"
            name  = "${local.prefix}-${var.env_name}-tfc-agent-${count.index}"
            env = [
              {
                name  = "TFC_AGENT_TOKEN"
                value = var.tfc_agent_token
              },
              {
                name  = "TFC_AGENT_NAME"
                value = "${local.prefix}-${var.env_name}-tfc-agent-${count.index}"
              },
              {
                name  = "TFC_AGENT_SINGLE"
                value = true
              }
            ]
          }
        ]
        stdin         = false
        tty           = false
        restartPolicy = "Always"
      }
    })
  }

  network_interface {
    network    = google_compute_network.shared_vpc.id
    subnetwork = google_compute_subnetwork.private-default["tms"].id
    network_ip = google_compute_address.tfc-agent-private[count.index].address
  }

  service_account {
    email  = google_service_account.tfc-agent-sa.email
    scopes = ["cloud-platform", "userinfo-email"]
  }
}
Integration:
The agents are added to firewall rules and authorized networks as necessary so they can run the Terraform commands needed to modify resources. For example, for GKE we add them to the authorized networks:
authorized_cidrs = concat(
  [
    {
      cidr         = local.safi_network.cicd.k8s.nodes
      display_name = "Argo CD K8s nodes"
    },
  ],
  [
    for index, address in data.terraform_remote_state.common_workspace.outputs.tfc_agents_ip_addresses : {
      cidr         = "${address}/32"
      display_name = "TFC Agent #${index}"
    }
  ]
)
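The tfc_agents_ip_addresses value read here through remote state has to be exported by the common workspace. Its definition is not shown on this page; a plausible sketch, assuming it simply exposes the reserved internal addresses of the agents, might look like this:

```hcl
# Assumed output in the common workspace (the real definition may differ).
# The splat expression returns one address per reserved agent IP, in count order,
# so the list index lines up with the "TFC Agent #<index>" display name.
output "tfc_agents_ip_addresses" {
  description = "Internal IP addresses reserved for the TFC agents."
  value       = google_compute_address.tfc-agent-private[*].address
}
```

With this shape, an environment with tfc_agents = 0 would export an empty list, and the for expression above would add no agent entries.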
Limitations:
With our setup of multiple workspaces, it takes the agents a while to provision a whole new environment. The more agents we have in an environment, the faster it goes.
Starting and shutting down the agents takes a long time.
If you provision resources (for example GKE) with one agent and then add another, you might run into issues because of the way we add agents to authorized_networks: the new agent's IP is only authorized once the affected workspace has been applied again.
Additional agents are expensive, since adding agents means buying more agent licenses (see Scalability).
Scalability:
The agents scale well: the more you have, the faster they provision the resources.
Scalability depends on buying more agent licenses.