Site Reliability Engineers (SREs) are responsible for designing, building, and operating systems that meet specified service level objectives (SLOs) for availability, performance, and scalability. SRE teams typically follow a set of best practices for designing and building systems, including:
Proactive monitoring and alerting to detect and respond to problems before they affect users
Automated deployment and testing to quickly and safely roll out changes to systems
Root cause analysis to understand and prevent recurring issues
Capacity planning to ensure that systems have sufficient resources to meet anticipated demand
Performance optimization to ensure that systems are running efficiently and effectively
Disaster recovery planning to ensure that systems can be restored quickly in the event of a failure.
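As a rough illustration of what meeting an availability SLO means in practice, the sketch below compares a measured success ratio against a target. The function names, the request counts, and the 99.9% target used in the usage note are assumptions made for this example; they are not taken from any team's actual objectives.

```python
def availability(successful_requests: int, total_requests: int) -> float:
    """Return the fraction of requests that succeeded (1.0 when there was no traffic)."""
    if successful_requests < 0 or total_requests < 0:
        raise ValueError("request counts must be non-negative")
    if successful_requests > total_requests:
        raise ValueError("successful_requests cannot exceed total_requests")
    if total_requests == 0:
        # Assumption: a window with no traffic counts as fully available.
        return 1.0
    return successful_requests / total_requests


def meets_availability_slo(successful_requests: int, total_requests: int, target: float) -> bool:
    """Return True when the measured availability is at or above the SLO target."""
    return availability(successful_requests, total_requests) >= target
```

For example, `meets_availability_slo(9995, 10000, target=0.999)` checks a hypothetical window of 10,000 requests against a 99.9% target.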
Pair Programming
Presentation slides
Agreed Guidelines for Pairing
Pairing should not start until the coding phase. During the research phase, engineers should preferably work separately.
Pair engineers as Senior-Senior or Senior-Mid, not Junior-Junior. Whether Mid-Mid pairing is acceptable is still an open question.
If neither partner (or no one on the team) is familiar with the project, refer to the first guideline above.
Engineers who have friction with each other should not be paired.
If your partner is absent or on leave, there is no need to arrange a temporary pairing.
Driver - Controls the keyboard and mouse, types the actual code, leads the conversation and discussion throughout the process, consults with the Navigator, and focuses on short-term actions.
Navigator - Pays attention to the Driver, looks ahead at strategy, monitors code and design, respects the Driver's comfort zone, and makes relevant suggestions.
Switch roles roughly every 30 minutes to 2 hours.
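The role switch described above (every 30 minutes to 2 hours) can be sketched as a simple rotation schedule. This is only an illustration: the function name `rotation_schedule`, its parameters, and the engineer names in the usage note are hypothetical, and the only rule taken from the guidelines is the 30-minute to 2-hour range.

```python
from datetime import datetime, timedelta

# Bounds taken from the guideline: switch every 30 minutes to 2 hours.
MIN_INTERVAL = timedelta(minutes=30)
MAX_INTERVAL = timedelta(hours=2)


def rotation_schedule(start, end, interval, first_driver, first_navigator):
    """Return a list of (switch_time, driver, navigator) tuples for one session."""
    if not (MIN_INTERVAL <= interval <= MAX_INTERVAL):
        raise ValueError("interval must be between 30 minutes and 2 hours")
    if end <= start:
        raise ValueError("end must be after start")

    schedule = []
    driver, navigator = first_driver, first_navigator
    current = start
    while current < end:
        schedule.append((current, driver, navigator))
        # Swap roles for the next slot.
        driver, navigator = navigator, driver
        current += interval
    return schedule
```

Calling `rotation_schedule(datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 11, 0), timedelta(minutes=45), "alice", "bob")` would yield one entry per 45-minute slot, with the two names alternating as Driver.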
Things to avoid:
Interrupting every time your partner starts doing something in a way other than how you'd do it
Wrestling the keyboard away and immediately deleting your partner's last edit, then doing what you were thinking of instead
Ignoring your partner's objections or talking over them
Speaking non-stop, so that your partner never has a moment of silence to speak, read code, or think
Tasks That Don't Need Pairing
"Trivial work" typically refers to routine, repetitive tasks that are not high-impact or critical to the operation of a system. These tasks can often be automated or outsourced and do not require the attention of highly skilled engineers. The following are examples (not an exhaustive list) of tasks that a Site Reliability Engineer might perform without pair programming:
Task | Condition |
---|---|
User creation (Okta, GCP, Confluent, etc.) | Unless the engineer is a new hire |
Deploying a new microservice or updating app parameters in Argo CD | Unless the engineer is a new hire |
Creating Terraform resources with templated parameters from a YAML file (e.g. users.yaml, projects.yaml, buckets.yaml) | Unless the engineer is a new hire |
DNS record creation and updates | Unless the engineer is a new hire |
Performance monitoring and tuning: analyzing and improving the performance of systems and services through metrics and log analysis | Unless the engineer is a new hire |
Maintenance: performing regular maintenance tasks such as software updates, security patches, and backups | Unless the engineer is a new hire |
Monitoring, collecting metrics on system performance, and troubleshooting | |
Handling low-priority alerts or incidents | |
Post-mortem analysis | |
Deploying software updates or bug fixes | |
Documentation | |
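The table above reads as a small decision rule: routine tasks are done solo, except that some of them are paired when the engineer is a new hire. A minimal sketch of that rule follows. The task keys, the `NEW_HIRE_EXCEPTION` marker, and the `needs_pairing` function are hypothetical names invented for this example, and only a few rows of the table are encoded.

```python
NEW_HIRE_EXCEPTION = "unless_new_hire"

# Hypothetical task keys, loosely named after rows in the table above.
# The value is the condition under which the task is paired anyway (None = no condition listed).
ROUTINE_TASKS = {
    "user_creation": NEW_HIRE_EXCEPTION,
    "argocd_deploy": NEW_HIRE_EXCEPTION,
    "terraform_templated_resources": NEW_HIRE_EXCEPTION,
    "dns_update": NEW_HIRE_EXCEPTION,
    "low_priority_alerts": None,
    "documentation": None,
}


def needs_pairing(task: str, is_new_hire: bool) -> bool:
    """Return True when a routine task should still be paired on.

    Only tasks listed in ROUTINE_TASKS are covered; anything else is left
    to the team's judgment, so an unknown task raises ValueError.
    """
    if task not in ROUTINE_TASKS:
        raise ValueError(f"{task!r} is not in the routine-task table; decide case by case")
    return ROUTINE_TASKS[task] == NEW_HIRE_EXCEPTION and is_new_hire
```

Raising on unknown tasks rather than guessing is a deliberate choice in this sketch, since the table is described as a non-exhaustive list of examples.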
Tasks That Need Pairing
Here are some examples of SRE tasks that might benefit from pair programming:
Designing and building new features
Incident management: pairing helps ensure that incidents are handled quickly and efficiently by letting engineers work together to understand and resolve problems.
Refactoring or optimizing existing code, or improving a process that already works
New technology projects to be implemented in or integrated with the current system
Security and compliance: pair programming can help ensure that security and compliance guidelines are incorporated into system design and code. A second pair of eyes can validate that the system meets the requirements and identify potential security or compliance issues.
In summary, pair programming is useful when two engineers need to work together to solve a problem or complete a task, and when there is a need to share knowledge or collaborate on design decisions. The idea is for two people to share their thoughts and knowledge to tackle a problem, increase productivity, and reduce errors.