What are RPO/RTO and Why do they Matter

First, we need to define each of these commonly used industry abbreviations:

·        Recovery Time Objective (RTO): The targeted duration and service level within which a business process must be restored after a service disruption or disaster event.

·        Recovery Point Objective (RPO): The maximum targeted period, measured back from the moment of the disruption, in which data might be lost due to a service disruption or disaster event.

RTO

In the most basic terms, RTO is how long it takes to get the workload back online. While this is a simple idea, the acceptable time can vary greatly from business to business, and even from workload to workload within the same business.

We need to determine whether the workload is mission critical, and what it costs for the workload to be offline. This is one of the major factors that will shape your Disaster Recovery (DR) architecture. If a workload affects few users and has little impact on revenue, without creating compliance or legal issues, a longer outage may be acceptable. This could be as long as a week, or in some cases more. However, if the workload affects a larger user base and prevents productivity, or causes a large loss in revenue, a long outage could be damaging to the company. Some industries may have legal or compliance requirements that certain systems, security logging systems for example, must have high levels of uptime. These systems may require a short RTO and a more complex architecture to allow for low to no downtime.
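The cost question above can be made concrete with a rough calculation. The following is a minimal sketch, not a standard formula: the function name, its parameters, and every figure are hypothetical, and it assumes the outage cost grows linearly with time.

```python
def estimated_outage_cost(outage_hours: float,
                          revenue_loss_per_hour: float,
                          affected_users: int,
                          productivity_loss_per_user_hour: float) -> float:
    """Rough cost of an outage, assuming cost accrues linearly per hour."""
    hourly_cost = (revenue_loss_per_hour
                   + affected_users * productivity_loss_per_user_hour)
    return outage_hours * hourly_cost


# Compare candidate RTOs (in hours) for one hypothetical workload.
for rto_hours in (1, 24, 168):
    cost = estimated_outage_cost(rto_hours,
                                 revenue_loss_per_hour=1000.0,
                                 affected_users=10,
                                 productivity_loss_per_user_hour=50.0)
    print(f"RTO of {rto_hours} h: estimated outage cost {cost:,.0f}")
```

Comparing the estimate for each candidate RTO against the cost of the architecture needed to achieve it gives a starting point for the discussion with the business.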

RPO

RPO asks: how much data can I stand to lose? If I must recover from a backup that is 24 hours old, what have I lost? Is the workload a transactional financial workload where loss of data is loss of revenue, or is it more statistical, so that losing 24 hours of metrics is an acceptable risk?

In a financial workload, it may be acceptable to stop taking transactions for a period of time so long as no transaction is lost. We may be willing to accept a longer RTO in exchange for a tighter, near-zero RPO.

If the service has transient data that is valued in volume, e.g. big data, and will not be affected by the loss of a few data points, then we are most concerned with getting the service back online, which is RTO. With a statistical workload, losing recent data may be acceptable so long as the workload is recovered and available to give analysis of the overall data. In this case availability (RTO) is more important than ensuring we have up-to-the-minute recovery (RPO).
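The 24-hour backup question above comes down to simple arithmetic: if recovery points are taken at a fixed interval, the worst case is a failure just before the next one, so the data lost approaches the full interval. A minimal sketch, assuming periodic backups are the only recovery points; the function names and the intervals are hypothetical:

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    """Worst case: the failure happens just before the next backup runs."""
    return backup_interval


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """True if the worst-case loss stays within the RPO target."""
    return worst_case_data_loss(backup_interval) <= rpo


# Hypothetical example: a 1-hour RPO checked against two backup schedules.
rpo = timedelta(hours=1)
for interval in (timedelta(hours=24), timedelta(minutes=15)):
    print(f"Backup every {interval}: meets RPO of {rpo}? {meets_rpo(interval, rpo)}")
```

A statistical workload might accept the daily schedule, while the financial workload described above would need continuous replication rather than periodic backups.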

Your RPO and RTO requirements will have a direct impact on the cost of the solution. The closer you get to zero data loss and zero downtime, the greater the complexity and expense of the solution.

Understanding the business requirements is key to planning any DR architecture. Only once the acceptable outage time and the acceptable data loss are known can a data replication plan and a system recovery strategy be designed and deployed.

Mauricio D.

CISSP, CCSP ; Secure Cloud and Content Storage Expert

6y

Good overview. In practical situations, the biggest challenge is 'when' to declare (0 HR) the disaster/failure.

Karen Wang

Program & project management, risk management, business continuity, crisis management

6y

You can replace "DR architecture" with "BC strategy" in this statement "Understanding what the business requirements are is key to understanding how to plan any DR architecture" to see how similar business-continuity planning is to DR planning.
