High availability (HA) is a system feature designed to provide a consistent level of uptime for prolonged periods. Google Cloud provides a highly available architecture built on regions and zones (24 regions and 73 zones at the time of writing).
The purpose of an HA configuration is to reduce downtime when a zone or instance becomes unavailable. This might happen during a zonal outage, or when an instance runs out of memory. With HA, your data continues to be available to client applications.
The HA configuration provides data redundancy.
Configure HA for GCP resources
GKE - Google Kubernetes Engine (google_container_cluster, google_container_node_pool)
GKE offers two types of clusters: regional and zonal. In a zonal cluster, the control plane and nodes all run in a single compute zone that you specify when you create the cluster. In a regional cluster, the control plane and nodes are replicated across multiple zones within a single region.
Regional clusters run a quorum of three Kubernetes control plane replicas, which gives your cluster's control plane API higher availability than a zonal cluster can provide.
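In Terraform, whether a google_container_cluster is regional or zonal is decided by its location argument: a region gives a regional cluster, a zone gives a zonal one. A minimal sketch; the cluster name and region are hypothetical placeholders:

```hcl
# Sketch: a regional GKE cluster. Name and region are placeholders.
resource "google_container_cluster" "primary" {
  name = "example-regional-cluster"

  # A region (e.g. "europe-west1") creates a regional cluster with a
  # replicated control plane; a zone (e.g. "europe-west1-b") would
  # create a zonal cluster instead.
  location = "europe-west1"

  # Common pattern: remove the default node pool and manage nodes with a
  # separate google_container_node_pool resource.
  remove_default_node_pool = true
  initial_node_count       = 1
}
```

Because location cannot be changed in place, editing it later makes Terraform plan to destroy and recreate the cluster.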
After you create a cluster, you cannot change it from zonal to regional or from regional to zonal.
Scale horizontally and vertically:
Capacity planning is important, but you can't predict everything. To keep your workloads running properly at peak load, and to control costs at normal or low load, use the GKE autoscaling capabilities that best fit your needs.
Enable Cluster Autoscaler to automatically resize your node pools based on demand.
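With Terraform, Cluster Autoscaler is enabled per node pool through the autoscaling block of google_container_node_pool. This sketch assumes a cluster declared as google_container_cluster.primary; the pool name, machine type, and node limits are hypothetical:

```hcl
# Sketch: a node pool resized by Cluster Autoscaler.
# Assumes a cluster resource named google_container_cluster.primary.
resource "google_container_node_pool" "primary_nodes" {
  name     = "example-pool"
  cluster  = google_container_cluster.primary.name
  location = google_container_cluster.primary.location

  # Autoscaler bounds. In a regional cluster these counts apply per zone,
  # so 1..3 means 3..9 nodes when the pool spans three zones.
  autoscaling {
    min_node_count = 1
    max_node_count = 3
  }

  node_config {
    machine_type = "e2-standard-4"
  }
}
```

If you prefer to bound the total size of the pool rather than the per-zone size, recent provider versions also accept total_min_node_count and total_max_node_count in the same block.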
Use Horizontal Pod Autoscaling (HPA) to automatically increase or decrease the number of pods based on utilization metrics.
Use Vertical Pod Autoscaling (VPA) together with node auto-provisioning (NAP, also called node pool auto-provisioning) to let GKE scale your cluster efficiently: VPA right-sizes the pods, and NAP provisions nodes that fit them.
VPA automatically sets CPU and memory requests and limits for your containers. NAP automatically manages node pools and removes the default constraint of starting new nodes only from the set of user-created node pools.
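In Terraform, both features are switched on in google_container_cluster: vertical_pod_autoscaling enables VPA for the cluster, and cluster_autoscaling turns on node auto-provisioning within cluster-wide CPU and memory limits. A sketch with a hypothetical cluster name, region, and limits (memory limits are in GB):

```hcl
# Sketch: cluster-level VPA and node auto-provisioning (NAP).
# Name, region, and resource limits are placeholders.
resource "google_container_cluster" "primary" {
  name     = "example-regional-cluster"
  location = "europe-west1"

  remove_default_node_pool = true
  initial_node_count       = 1

  # Enables VPA in the cluster; each workload still needs its own
  # VerticalPodAutoscaler object.
  vertical_pod_autoscaling {
    enabled = true
  }

  # Enables NAP: GKE may create and delete node pools as long as the
  # whole cluster stays within these CPU and memory limits.
  cluster_autoscaling {
    enabled = true

    resource_limits {
      resource_type = "cpu"
      minimum       = 4
      maximum       = 64
    }

    resource_limits {
      resource_type = "memory"
      minimum       = 16
      maximum       = 256
    }
  }
}
```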
Cloud Storage Bucket (google_storage_bucket):
Geo-redundant storage offers the highest level of availability and performance and is ideal for low-latency, high-QPS content serving to users distributed across geographic regions. Cloud Storage provides the availability and throughput needed to stream audio or video directly to apps or websites.
You permanently set a geographic location for storing your object data when you create a bucket.
You cannot change a bucket's location after it's created, but you can move your data to a bucket in a different location.
You can select from the following location types:
- A region is a specific geographic place, such as São Paulo.
- A dual-region is a specific pair of regions, such as Tokyo and Osaka.
- A multi-region is a large geographic area, such as the United States, that contains two or more geographic places.
All Cloud Storage data is redundant across at least two zones within at least one geographic place as soon as you upload it.
Additionally, objects stored in a multi-region or dual-region are geo-redundant. Objects that are geo-redundant are stored redundantly in at least two separate geographic places separated by at least 100 miles.
Default replication is designed to provide geo-redundancy for 99.9% of newly written objects within a target of one hour. Newly written objects include uploads, rewrites, copies, and compositions.
Turbo replication provides geo-redundancy for all newly written objects within a target of 15 minutes. It is available only for dual-region buckets.
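A minimal Terraform sketch of a dual-region bucket with turbo replication. The bucket name is a hypothetical placeholder (bucket names are globally unique), and ASIA1 is the predefined dual-region that pairs Tokyo and Osaka:

```hcl
# Sketch: geo-redundant dual-region bucket with turbo replication.
resource "google_storage_bucket" "assets" {
  name = "example-ha-assets-bucket" # placeholder, must be globally unique

  # Predefined dual-region: asia-northeast1 (Tokyo) + asia-northeast2 (Osaka).
  # The location cannot be changed after the bucket is created.
  location = "ASIA1"

  # "ASYNC_TURBO" = turbo replication (15-minute target, dual-region only);
  # "DEFAULT" = default replication.
  rpo = "ASYNC_TURBO"

  uniform_bucket_level_access = true
}
```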
See the Cloud Storage documentation for the list of available bucket locations.
PostgreSQL DB (google_sql_database_instance)
You can configure an instance for high availability when you create the instance, or you can enable high availability on an existing instance.
A Cloud SQL instance configured for HA is also called a regional instance and has a primary and secondary zone within the configured region. Within a regional instance, the configuration is made up of a primary instance and a standby instance. Through synchronous replication to each zone's persistent disk, all writes made to the primary instance are replicated to disks in both zones before a transaction is reported as committed. In the event of an instance or zone failure, the standby instance becomes the new primary instance. Users are then rerouted to the new primary instance. This process is called a failover.
After a failover, the instance that received the failover continues to be the primary instance, even after the original instance comes back online. After the zone or instance that experienced an outage becomes available again, the original primary instance is destroyed and recreated. Then it becomes the new standby instance. If a failover occurs in the future, the new primary will fail over to the original instance in the original zone.
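In Terraform, HA for a Cloud SQL instance is controlled by settings.availability_type: REGIONAL gives the primary/standby pair described above, and ZONAL gives a single instance. A minimal sketch; the instance name, region, zones, and machine tier are hypothetical:

```hcl
# Sketch: Cloud SQL for PostgreSQL configured for HA (regional instance).
resource "google_sql_database_instance" "postgres" {
  name             = "example-postgres-ha"
  database_version = "POSTGRES_15"
  region           = "europe-west1"

  settings {
    tier = "db-custom-2-7680" # 2 vCPU, 7.5 GB RAM

    # REGIONAL = primary + standby in another zone (HA); ZONAL = no HA.
    availability_type = "REGIONAL"

    # Optional: pin the primary and standby (secondary) zones.
    location_preference {
      zone           = "europe-west1-b"
      secondary_zone = "europe-west1-c"
    }

    # Commonly enabled alongside HA; check the Cloud SQL docs for what
    # your engine version requires.
    backup_configuration {
      enabled                        = true
      point_in_time_recovery_enabled = true
    }
  }

  # Terraform-side guard against accidentally destroying the instance.
  deletion_protection = true
}
```

Since availability_type can be changed on an existing instance, switching it to REGIONAL is also how you enable HA later; expect Cloud SQL to restart the instance when you do.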
Kubernetes apps HA configuration
For fault tolerance, configure charts to run at least two replicas, for example:
```yaml
replicaCount: 2
```
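If the chart supports a HorizontalPodAutoscaler, first define pod resource requests. The HPA measures CPU and memory utilization as a percentage of the requested resources, so it cannot scale on utilization without them. For example:
```yaml
resources:
  requests:
    memory: 512Mi
    cpu: 200m
```
Then enable autoscaling, for example:
```yaml
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 5
  targetCPUUtilizationPercentage: 80
  targetMemoryUtilizationPercentage: 80
```
The exact keys depend on the chart; these follow the layout generated by helm create.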
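With these values, the HPA controller picks the replica count using the standard Kubernetes scaling rule, computed separately for each metric:

```latex
\text{desiredReplicas} = \left\lceil \text{currentReplicas} \times \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil
```

For example, if 2 replicas average 120% of their CPU request against the 80% target, the HPA proposes ⌈2 × 120 / 80⌉ = 3 replicas. When both CPU and memory targets are set, it uses the larger of the two proposals, and the result is always kept between minReplicas and maxReplicas (here 2 and 5). By default the HPA also skips scaling when the ratio is within 10% of 1.0, to avoid flapping.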
Some nuances and details
>> A DaemonSet doesn't support replicaCount or autoscaling: it runs one pod on each eligible node, so its pod count follows the number of nodes.
>> Some apps support HPA but need extra configuration before they can run more than one replica.
For example, the official Grafana chart (used in the SaFi project):
- By default, Grafana uses SQLite3 as its database, but HA requires an external database shared by all Grafana instances, such as PostgreSQL. After you configure the external database, you can scale Grafana with the replicas parameter.
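If the chart is deployed with Terraform's Helm provider (an assumption; the same keys can go into a plain values file), the Grafana settings might look like the sketch below. The database host, database and user names, and the grafana-db Secret are hypothetical placeholders. The password is read from the GF_DATABASE_PASSWORD environment variable, which Grafana maps to the password setting of its [database] section.

```hcl
# Sketch: Grafana with an external PostgreSQL database so it can run
# more than one replica. Host, names, and Secret are placeholders.
resource "helm_release" "grafana" {
  name       = "grafana"
  repository = "https://grafana.github.io/helm-charts"
  chart      = "grafana"

  values = [yamlencode({
    # Safe above 1 only once all replicas share the same database.
    replicas = 2

    # Rendered into grafana.ini, [database] section.
    "grafana.ini" = {
      database = {
        type = "postgres"
        host = "10.0.0.5:5432"
        name = "grafana"
        user = "grafana"
      }
    }

    # Existing Kubernetes Secret whose keys become environment variables;
    # it is expected to contain GF_DATABASE_PASSWORD.
    envFromSecret = "grafana-db"
  })]
}
```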