Backup for GKE is a service for backing up and restoring workloads in GKE clusters. It has two components:

  • A Google Cloud API that serves as the control plane for the service.

  • A GKE add-on (the Backup for GKE agent) that must be enabled in each cluster for which you wish to perform backup and restore operations.

Backups of your workloads may be useful for disaster recovery, CI/CD pipelines, cloning workloads, or upgrade scenarios. Protecting your workloads can help you achieve business-critical recovery point objectives.

Warning: Backup for GKE requires full privileges to read and write every object in the cluster. The Backup for GKE agent that runs in GKE cluster versions prior to 1.24 runs as a workload in the GKE user cluster. Users or workloads with root access to the underlying node on which the Backup for GKE Pod is scheduled, such as through Pod hostPath mounts or SSH, can gain these root-in-cluster privileges. To avoid this potential node-to-cluster escalation, we highly recommend that you run Backup for GKE in GKE clusters running version 1.24.4-gke.800 or higher, where the agent runs on an inaccessible host in the GKE control plane.

Once enabled, the Backup for GKE service integrates with the GKE UI, Google Cloud CLI and REST APIs, providing consistent workflows for development and operations. Two forms of data are captured in a backup:

  • Config backup: a set of Kubernetes resource descriptions extracted from the API server of the cluster undergoing backup, capturing the cluster state.

  • Volume backups: a set of volume backups that correspond to PersistentVolumeClaim resources found in the config backup.
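The relationship between the two forms of data can be sketched as follows. This is a minimal illustration, not the service's implementation: the function name and the dictionary layout of the resources are assumptions made for the example. It only shows that the set of volume backups is derived from the PersistentVolumeClaim resources present in the config backup.

```python
from typing import Any

# A Kubernetes resource description, modeled here as a plain dictionary.
Resource = dict[str, Any]


def pvcs_needing_volume_backup(config_backup: list[Resource]) -> list[tuple[str, str]]:
    """Return (namespace, name) for each PersistentVolumeClaim in the config backup.

    Hypothetical helper: each returned pair stands for one volume backup.
    """
    return [
        (r["metadata"].get("namespace", "default"), r["metadata"]["name"])
        for r in config_backup
        if r.get("kind") == "PersistentVolumeClaim"
    ]


# Example config backup with one PVC among other resources (made-up names).
config_backup = [
    {"kind": "Deployment", "metadata": {"namespace": "shop", "name": "web"}},
    {"kind": "PersistentVolumeClaim", "metadata": {"namespace": "shop", "name": "web-data"}},
    {"kind": "Service", "metadata": {"namespace": "shop", "name": "web"}},
]

volume_backup_targets = pvcs_needing_volume_backup(config_backup)
```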

You can choose which workloads you want to back up or restore, or you can back up or restore all workloads. You can back up workloads from one cluster and restore them into another cluster. You can also schedule backups to run automatically, so that you can recover your workloads quickly in the event of an incident.

Restoring a workload involves re-creating Kubernetes resources in the target cluster. After the resources are created, restoration of workload functionality is subject to the cluster reconciliation process (for example, Pods are scheduled to nodes, and then Pods are started on those nodes). During restoration, you can optionally apply substitution rules, which are used to match a set of resources and substitute the current value of an attribute on those resources for a new value.
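To make the idea of a substitution rule concrete, here is a small sketch of the matching-and-substituting step. The `SubstitutionRule` class, its fields, and `apply_rule` are hypothetical names invented for this example and do not reflect the actual Backup for GKE rule schema. The storage class names are example values only.

```python
import copy
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SubstitutionRule:
    """Hypothetical rule shape for illustration; not the Backup for GKE schema."""
    target_kinds: set                      # resource kinds the rule applies to
    field_path: list                       # keys leading to the attribute to change
    new_value: Any                         # value written when the rule matches
    original_value: Optional[Any] = None   # None means "match any current value"


def apply_rule(resource: dict, rule: SubstitutionRule) -> dict:
    """Return a modified copy if the rule matches, else the resource unchanged."""
    if resource.get("kind") not in rule.target_kinds:
        return resource
    updated = copy.deepcopy(resource)
    node = updated
    # Walk down to the dictionary that holds the target attribute.
    for key in rule.field_path[:-1]:
        node = node.get(key)
        if not isinstance(node, dict):
            return resource
    leaf = rule.field_path[-1]
    if leaf not in node:
        return resource
    if rule.original_value is not None and node[leaf] != rule.original_value:
        return resource
    node[leaf] = rule.new_value
    return updated


pvc = {
    "kind": "PersistentVolumeClaim",
    "metadata": {"namespace": "shop", "name": "web-data"},
    "spec": {"storageClassName": "standard-rwo"},
}
deployment = {"kind": "Deployment", "metadata": {"namespace": "shop", "name": "web"}}

rule = SubstitutionRule(
    target_kinds={"PersistentVolumeClaim"},
    field_path=["spec", "storageClassName"],
    new_value="regional-rwo",       # example target value
    original_value="standard-rwo",  # only substitute when this is the current value
)

restored_pvc = apply_rule(pvc, rule)
restored_deployment = apply_rule(deployment, rule)
```

In this sketch the rule only touches resources whose kind and current attribute value both match, which mirrors the "match a set of resources, then substitute a value" behavior described above.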

The combination of selective backup and restore with substitutions is designed to enable and support many different backup and restore scenarios, for example:

  • Back up all workloads in a cluster and restore them into a separate cluster for disaster recovery.

  • Back up all workloads, but selectively roll back a single workload in the source cluster.

  • Back up the resources in one namespace and clone them into another namespace.

  • Migrate or clone a workload from one cluster to another cluster.

  • Change the storage parameters for a workload (for example, move the workload from a zonal persistent disk to a regional persistent disk).

You must create a target cluster with the Backup for GKE service enabled before you can back up or restore any workloads.

Backup for GKE consists of two main components:

  • A service that runs in Google Cloud and supports a resource-based REST API. This service serves as the control plane for Backup for GKE. The service includes Google Cloud console UI elements that interact with this API.

  • An agent that runs in every cluster where backups or restores are performed. The agent runs backup and restore operations in these clusters by interacting with the Backup for GKE API.

The following diagram shows the relationship between the different Backup for GKE components:

[Figure: Backup for GKE architecture]

Service

The Backup for GKE service provides an API endpoint for clients to interact with. The Backup for GKE API, like most Google Cloud APIs, operates against application-specific cloud resources in a resource hierarchy. Backup for GKE manages a database of these application-specific resources and the service API methods mostly correspond to create, read, update, or delete operations against these resources.

There are two primary active resource types in the cloud resource model:

  • Backup: Represents the backup of a particular portion of a GKE cluster at a specific point in time. Creating a Backup resource initiates the backup process (eventually storing copies of the target Kubernetes resources and creating snapshots of the target persistent disk volumes). Deleting a Backup deletes these stored artifacts.

  • Restore: Represents the restore of a selected portion of a specific Backup into a GKE cluster. Creating a Restore resource initiates the restore process. Deleting a Restore has no side effects, and simply removes the record of the restore from the database.

Backup for GKE also includes two configuration and control resource types:

  • BackupPlan: a parent resource for Backup resources that represent a chain of backups. This resource contains a backup configuration including the source cluster, the selection of which workloads to back up, and the region in which Backup artifacts produced under this plan are stored.

  • RestorePlan: provides a reusable restore template. This resource contains a restore configuration including the target cluster in which you want to restore the backup, the source backup plan, the scope of the restore, conflict handling, and substitution rules.
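The four resource types and their side effects can be modeled roughly as below. This is an in-memory sketch under stated assumptions, not the Backup for GKE API: all class fields, method names, and artifact strings are invented for illustration, and placing Restore records under a RestorePlan is an assumption of the sketch. It shows the two behaviors described above: deleting a Backup removes its stored artifacts, while deleting a Restore only removes the record.

```python
from dataclasses import dataclass, field


@dataclass
class Backup:
    name: str
    artifacts: list = field(default_factory=list)  # stored config and volume data


@dataclass
class BackupPlan:
    """Parent of Backup resources; holds the backup configuration."""
    name: str
    source_cluster: str
    selected_namespaces: list
    region: str
    backups: dict = field(default_factory=dict)

    def create_backup(self, name: str) -> Backup:
        # Creating a Backup starts the backup process; artifacts are placeholders.
        backup = Backup(name, [f"{name}/config", f"{name}/volumes"])
        self.backups[name] = backup
        return backup

    def delete_backup(self, name: str) -> list:
        # Deleting a Backup also deletes its stored artifacts.
        backup = self.backups.pop(name)
        removed = list(backup.artifacts)
        backup.artifacts.clear()
        return removed


@dataclass
class Restore:
    name: str
    backup_name: str


@dataclass
class RestorePlan:
    """Reusable restore template that points at a backup plan and a target cluster."""
    name: str
    target_cluster: str
    backup_plan: BackupPlan
    restores: dict = field(default_factory=dict)

    def create_restore(self, name: str, backup_name: str) -> Restore:
        if backup_name not in self.backup_plan.backups:
            raise KeyError(f"unknown backup: {backup_name}")
        restore = Restore(name, backup_name)
        self.restores[name] = restore
        return restore

    def delete_restore(self, name: str) -> None:
        # No side effects: only the record of the restore is removed.
        del self.restores[name]


plan = BackupPlan("nightly", "prod-cluster", ["shop"], "us-central1")
plan.create_backup("backup-001")
restore_plan = RestorePlan("dr", "dr-cluster", plan)
restore_plan.create_restore("restore-001", "backup-001")
restore_plan.delete_restore("restore-001")
```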

Agent

The Backup for GKE agent is deployed and runs in each GKE cluster that you configure to be backed up by the Backup for GKE service. The agent is responsible for running the backup and restore activities, for example:

  • Backup:

    • Orchestrating the backup process.

    • Fetching resources from the Kubernetes API server, serializing them into an archive, and storing the archive.

    • Creating backups of underlying volumes associated with PersistentVolumeClaims.

  • Restore:

    • Orchestrating the restore process.

    • Fetching the Kubernetes resource archive from storage, extracting the selected resources, applying the appropriate modifications to these resources, and creating them in the target cluster.

    • Creating volumes and wiring them into the Kubernetes configuration of the target cluster.
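The archive-related steps in the lists above can be sketched as a round trip: serialize resources into an archive, then read back only a selected subset. The archive layout used here (a gzip-compressed tar with one JSON file per resource, stored in memory) is purely an assumption for the example; the agent's real archive format and storage location are not described in this document. The function names are likewise hypothetical.

```python
import io
import json
import tarfile


def write_archive(resources: list) -> bytes:
    """Serialize resources into an in-memory tar.gz, one JSON file per resource."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for res in resources:
            meta = res["metadata"]
            payload = json.dumps(res).encode("utf-8")
            # Assumed path layout: <namespace>/<kind>/<name>.json
            info = tarfile.TarInfo(
                name=f'{meta["namespace"]}/{res["kind"]}/{meta["name"]}.json'
            )
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def read_selected(archive: bytes, namespaces: set) -> list:
    """Extract only the resources whose namespace is in the selection."""
    selected = []
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
        for member in tar.getmembers():
            namespace = member.name.split("/", 1)[0]
            if namespace in namespaces:
                selected.append(json.loads(tar.extractfile(member).read()))
    return selected


cluster_resources = [
    {"kind": "Deployment", "metadata": {"namespace": "shop", "name": "web"}},
    {"kind": "Service", "metadata": {"namespace": "shop", "name": "web"}},
    {"kind": "Deployment", "metadata": {"namespace": "billing", "name": "api"}},
]

archive = write_archive(cluster_resources)
to_restore = read_selected(archive, {"shop"})
```

A real restore would then apply any modifications to the selected resources and create them through the Kubernetes API server of the target cluster; that step is omitted here.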

The agent is packaged as a containerized Kubernetes workload and runs under a dedicated service account in a dedicated namespace in each cluster.

Administrators do not interact with the agent, as the agent is driven by custom Kubernetes resources (BackupJob and RestoreJob) automatically created in the cluster by the Backup for GKE service in response to the creation of backup and restore cloud resources. However, administrators can influence the orchestration of backups by creating optional ProtectedApplication Kubernetes resources in the cluster. These ProtectedApplication resources are unique to Backup for GKE and provide more fine-grained options for defining backup and restore scope.

What's not backed up

You can only back up Kubernetes resources and underlying persistent volumes with Backup for GKE. Backup for GKE does not back up the following:

  • GKE cluster configuration information such as node configuration, node pools, initial cluster size, or enabled features.

  • Container images referenced by a backup. Only the Kubernetes resources that describe the workload and refer to the container images are backed up. If an image referenced by a workload description in a backup is removed from its image repository, then a subsequent restore of that configuration will not successfully restore the workload.

  • Configuration information or state of services outside the cluster, such as Cloud SQL or external load balancers.

Next steps

Guides:

  1. Install Backup for GKE - how to enable Backup for GKE

  2. Plan a set of backups - how to create a Backup for GKE backup plan, which is used for backing up your workloads in GKE

  3. Back up your workloads - how to create a backup of your workloads in GKE using the Backup for GKE service

  4. Plan a set of restores - how to create a Backup for GKE restore plan, which is used for restoring your backups in GKE

  5. Restore a backup - how to restore a backup into a cluster in GKE using the Backup for GKE service
