@Nadeesha Nanayakkara Good questions. SRE is a maturing area, and some of its concepts are more descriptive than prescriptive in nature. Here's how I would answer your questions:
Ans1: This is related to your earlier question in a way. Yes, an error budget is set for a given SLO you have defined. However, as a more advanced SRE practice, you can combine two or more SLOs into what is called a composite SLO, which can then be used to determine the error budget. A similar approach can be followed for SLIs (defining composite SLIs) when you want to define a single, 'larger' SLO. Either approach can help you reach your end goal. Breaching an SLO should mainly help you decide where to focus your development effort (new features vs. stability), rather than being used to assign a direct monetary ($$) value to the breach.
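To make the composite idea a bit more concrete, here is a minimal sketch, assuming each SLI is tracked as good/total event counts over the same window. It pools the events of several user journeys into one composite SLI and derives the error budget from a single SLO target. The `SliCounts` structure, the journey names and all the numbers are hypothetical, and pooling events is only one way to build a composite (another common one is multiplying the success ratios of serial dependencies).

```python
from dataclasses import dataclass


@dataclass
class SliCounts:
    """Good/total event counts for one SLI over the SLO window (hypothetical structure)."""
    name: str
    good: int
    total: int


def composite_sli(slis: list[SliCounts]) -> float:
    """Pool the events of several SLIs into one ratio, so busier journeys weigh more."""
    good = sum(s.good for s in slis)
    total = sum(s.total for s in slis)
    return good / total if total else 1.0


def error_budget(slis: list[SliCounts], slo_target: float) -> tuple[float, int, float]:
    """Return (allowed bad events, actual bad events, remaining budget) for the composite SLO."""
    total = sum(s.total for s in slis)
    bad = total - sum(s.good for s in slis)
    allowed = (1 - slo_target) * total  # e.g. 0.1% of all events for a 99.9% target
    return allowed, bad, allowed - bad


if __name__ == "__main__":
    # Hypothetical counts for three user journeys over one SLO window.
    journeys = [
        SliCounts("login", good=499_800, total=500_000),
        SliCounts("checkout", good=299_900, total=300_000),
        SliCounts("search", good=199_950, total=200_000),
    ]
    allowed, bad, remaining = error_budget(journeys, slo_target=0.999)
    print(f"composite SLI: {composite_sli(journeys):.5f}")
    print(f"budget: {allowed:.0f} bad events allowed, {bad} used, {remaining:.0f} left")
```

With pooled events, a journey with more traffic has more influence on the composite number, which is usually what you want when the SLO is meant to reflect overall user experience.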
Ans2: Determining the right error budget involves multiple factors and should not be decided by just one team; all stakeholders of the service need to be part of the decision. One of the considerations is selecting the right time period (window). This usually depends on factors like your development sprint duration, the nature of your service, the stage of your product (early vs. mature), your industry and your current SLA.
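As a quick illustration of why the window matters, here is a small sketch that converts an SLO target into a time-based error budget (allowed "bad" time) for a few window lengths. The function name and the 7/28/30-day windows are just examples I picked, not a recommendation.

```python
from datetime import timedelta


def allowed_downtime(slo_target: float, window: timedelta) -> timedelta:
    """Time-based error budget: how long the service may be 'bad' within the window."""
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return window * (1 - slo_target)  # timedelta * float rounds to whole microseconds


if __name__ == "__main__":
    # Same 99.9% target, different window lengths (e.g. a 1-week sprint vs. a 4-week cycle).
    for days in (7, 28, 30):
        budget = allowed_downtime(0.999, timedelta(days=days))
        minutes = budget.total_seconds() / 60
        print(f"99.9% over {days} days -> {minutes:.1f} minutes of downtime")
    # prints 10.1, 40.3 and 43.2 minutes respectively
```

The same 99.9% target gives roughly 10 minutes of budget per week but about 43 minutes per 30 days, so a short window reacts faster to incidents while a longer one is more forgiving of a single bad day.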
Having a common time period for all SLOs helps you use the error budget effectively, because you can compare budgets directly and decide how aggressively to move forward with new features or enhancements vs. reliability improvements. If your application has a common dev team for all services, a common time period helps, whereas if you have dedicated dev teams for different services, you can choose different time periods per service.
Please keep in mind that a core reason for defining an error budget is to make go/no-go decisions on development, which can then become your guiding light for the other decisions you have to make.
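Here is one way such a go/no-go rule could look once stakeholders have agreed on it. This is only a sketch: the three-tier policy, the `Decision` names and the 25% caution threshold are hypothetical values for illustration, and your own error budget policy may use different tiers or actions.

```python
from enum import Enum


class Decision(Enum):
    SHIP_FEATURES = "go: continue feature releases"
    SLOW_DOWN = "caution: prioritise reliability work alongside features"
    FREEZE = "no-go: freeze feature releases until the budget recovers"


def release_decision(allowed_bad: float, actual_bad: float,
                     caution_threshold: float = 0.25) -> Decision:
    """Map remaining error budget to a release decision (hypothetical policy)."""
    if allowed_bad <= 0:
        raise ValueError("allowed_bad must be positive")
    remaining_fraction = 1 - actual_bad / allowed_bad
    if remaining_fraction <= 0:
        return Decision.FREEZE  # budget exhausted: reliability work only
    if remaining_fraction < caution_threshold:
        return Decision.SLOW_DOWN  # budget nearly spent: rebalance priorities
    return Decision.SHIP_FEATURES  # plenty of budget left: keep shipping


if __name__ == "__main__":
    # Hypothetical window with 1000 allowed bad events and different amounts consumed.
    for used in (350, 800, 1200):
        print(used, release_decision(1000, used).value)
```

Writing the rule down this explicitly, whatever thresholds you pick, is what makes the error budget useful as a shared decision tool rather than just a dashboard number.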