Sitecore Managed Cloud Standard — Disaster Recovery

  • The Sitecore Managed Cloud Disaster Recovery feature allows customers to maintain or quickly resume mission-critical functions after a disaster, supporting the customer's business continuity plan. When a disaster affects the region that hosts the production (primary) environment, the Disaster Recovery tool can recover the environment into another (secondary) region, also called the disaster recovery site.

    Sitecore currently provides three disaster recovery options:

    • Basic
    • Hot-warm
    • Hot-hot

    This article describes the configuration and workflow of each of these options, along with the architectural aspects to be aware of.

  • The following prerequisites apply to all disaster recovery options:

    1. The customer's Sitecore Managed Cloud solution must meet the compatibility requirements described in /articles/768387.
    2. The customer can request the Disaster Recovery feature only if it has been purchased as part of the Managed Cloud contract.
    3. When requesting the disaster recovery setup from Sitecore Support, the customer should have a valid Sitecore license file, the Sitecore certificate, and its password ready.
  • When a disaster occurs, Sitecore should receive an alert email within 15 minutes. Sitecore then validates the authenticity of the alert, creates a support ticket to investigate the issue, and informs the customer of the initial findings.

    If the investigation shows that a disaster has made the primary resource group temporarily unrecoverable, Sitecore starts the failover process after the customer approves it. The customer can also request a failover through the Support Portal.

    The failover process provides the customer with a secondary environment in which they can continue business-critical activities until the primary environment becomes available again.

    Notes:

    • Customers can request that the Disaster Recovery feature be added to their Sitecore environment. If they do, the feature is explicitly mentioned in the customer's contract.
    • When requesting the failover process, the customer should provide Sitecore with a valid Sitecore license, the Sitecore certificate (.pfx) file, and the certificate credentials.
    • The time required for the failover process to make the secondary environment available depends on the type of Disaster Recovery setup the customer uses.
    • For customers who have purchased Managed Cloud with SolrCloud, the SolrCloud failover procedure is described in the FAQ section.

    Sitecore offers three Disaster Recovery options, described as follows:

    Basic Disaster Recovery

    This option has the longest recovery time, because the secondary Sitecore XP environment is created as part of the failover process.

    • Backup technology: SQL Azure Geo-Replication, Azure APIs.
    • Secondary environment state: Created on-demand.
    • Recovery process:
      1. Deploy
      2. Restore
      3. Customer validation
      4. Go live

    Hot-warm Disaster Recovery

    This option recovers much faster than Basic Disaster Recovery, because the secondary Sitecore XP environment is already deployed and only needs to be woken up and validated before going live.

    Below are the key specifications:

    • Backup technology: SQL Azure Geo-Replication, Azure APIs.
    • Secondary environment state: Fully deployed, but shut down.
    • Recovery process:
      1. Wake up
      2. Customer validation
      3. Go live

    Hot-hot Disaster Recovery

    This option has the quickest recovery time of the three configurations, because the secondary Sitecore XP environment is already up and running. During the failover, the endpoints are switched in Traffic Manager to bring the service back online.

    Below are the key specifications:

    • Backup technology: SQL Azure Geo-Replication, Azure APIs.
    • Secondary environment state: Fully deployed, up and running, and an exact replica of the primary.
    • Recovery process:
      1. Switch-over
      2. Go live
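    The three options differ mainly in the state of the secondary environment and in the number of steps between a disaster and going live. The following minimal Python sketch encodes the specifications listed above as data, purely as a comparison aid; the `DrOption` type and its field names are hypothetical and are not part of any Sitecore API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrOption:
    """Key specifications of one Managed Cloud DR option, as listed in this article."""

    name: str
    backup_technology: tuple[str, ...]
    secondary_state: str
    recovery_steps: tuple[str, ...]


# All three options use the same backup technology.
BACKUP = ("SQL Azure Geo-Replication", "Azure APIs")

OPTIONS = (
    DrOption("Basic", BACKUP, "Created on-demand",
             ("Deploy", "Restore", "Customer validation", "Go live")),
    DrOption("Hot-warm", BACKUP, "Fully deployed, but shut down",
             ("Wake up", "Customer validation", "Go live")),
    DrOption("Hot-hot", BACKUP, "Fully deployed, up and running, exact replica of primary",
             ("Switch-over", "Go live")),
)

if __name__ == "__main__":
    # Sort by the number of recovery steps. For these three options this matches the
    # article's ordering by recovery time (Hot-hot quickest, Basic slowest), although
    # step count alone does not determine recovery time.
    for option in sorted(OPTIONS, key=lambda o: len(o.recovery_steps)):
        print(f"{option.name}: {option.secondary_state}; {' -> '.join(option.recovery_steps)}")
```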

    For detailed information on a specific Disaster Recovery option, raise a support query on the Sitecore Support Portal.

  • Disaster recovery introduces new considerations when you build a Sitecore XP solution. This section addresses the most common ones.

    Choosing your Azure Region

    Azure organizes its datacenters into regions: sets of datacenters deployed within a latency-defined perimeter and connected through a dedicated regional low-latency network. When choosing a secondary datacenter, we recommend choosing one in the same geography as the primary region, to ensure fast backups and consistent delivery speeds for customers. To find compatible regions, see the article here.

    Third-party service APIs

    If the Sitecore implementation uses third-party service APIs that restrict access by IP address, the IP addresses of the secondary datacenter must also be registered with those services. Otherwise, bringing the secondary Sitecore environment online could be delayed.
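    One way to catch a missing registration before it delays a failover is to compare the secondary environment's outbound IP addresses against each third-party allowlist. A minimal sketch using Python's standard `ipaddress` module; the `unregistered_ips` helper is hypothetical, and the allowlist and secondary addresses are placeholders from the documentation ranges of RFC 5737. The real outbound addresses would have to be taken from the actual secondary web apps.

```python
import ipaddress


def unregistered_ips(allowlist: list[str], secondary_ips: list[str]) -> list[str]:
    """Return the secondary IP addresses not covered by any allowlist entry.

    Allowlist entries may be single addresses ("203.0.113.10") or CIDR ranges
    ("198.51.100.0/28"); both are parsed as networks.
    """
    networks = [ipaddress.ip_network(entry, strict=False) for entry in allowlist]
    missing = []
    for ip in secondary_ips:
        address = ipaddress.ip_address(ip)
        # An address of a different IP version is never "in" a network, so it is reported.
        if not any(address in network for network in networks):
            missing.append(ip)
    return missing


if __name__ == "__main__":
    # Hypothetical: the allowlist holds the primary IPs plus one range already
    # registered for the secondary datacenter.
    allowlist = ["203.0.113.10", "203.0.113.11", "198.51.100.0/28"]
    secondary = ["198.51.100.5", "192.0.2.40", "192.0.2.41"]
    print(unregistered_ips(allowlist, secondary))  # ['192.0.2.40', '192.0.2.41']
```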

    Outage page

    Managed Cloud uses Azure Functions to serve an outage page during an outage. The outage page returns an HTTP 503 (Service Unavailable) status code to indicate that the service is temporarily unavailable. We recommend that the outage page contain only the information needed to reassure visitors that the site will be back online soon, for example:

    • That the site is temporarily unavailable.
    • That support staff are aware of the issue and are working on it.
    • The approximate recovery time.
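    The sketch below does not reproduce the Azure Function that Managed Cloud uses. It only illustrates, with Python's standard `http.server` module, the behavior described above: every request receives a 503 status and a short page covering the three recommended points. The `build_outage_response` helper, the 60-minute estimate, and the `Retry-After` header are assumptions; the article does not say whether Managed Cloud sets that header.

```python
from http.server import BaseHTTPRequestHandler


def build_outage_response(recovery_minutes: int) -> tuple[int, dict[str, str], bytes]:
    """Build the status code, headers, and body of a minimal outage page."""
    body = (
        "<!DOCTYPE html><html><head><title>Temporarily unavailable</title></head><body>"
        "<h1>This site is temporarily unavailable.</h1>"
        "<p>Our support staff are aware of the issue and are working on it.</p>"
        f"<p>Approximate recovery time: {recovery_minutes} minutes.</p>"
        "</body></html>"
    ).encode("utf-8")
    headers = {
        "Content-Type": "text/html; charset=utf-8",
        "Content-Length": str(len(body)),
        # Retry-After (in seconds) tells clients when to try again.
        "Retry-After": str(recovery_minutes * 60),
    }
    return 503, headers, body


class OutageHandler(BaseHTTPRequestHandler):
    """Answers every GET request with the 503 outage page."""

    recovery_minutes = 60  # hypothetical estimate, updated during an incident

    def do_GET(self) -> None:
        status, headers, body = build_outage_response(self.recovery_minutes)
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)
```

    To try it locally, `HTTPServer(("127.0.0.1", 8080), OutageHandler).serve_forever()` (with `HTTPServer` imported from `http.server`) serves the page on port 8080 until interrupted. Returning 503 rather than 200 signals to clients and crawlers that the outage is temporary.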
  • This section describes the limitations of the Disaster Recovery options provided by Managed Cloud.

    No removal of Control Resource Group

    The Control Resource Group contains all resources used to restore the Sitecore XP environment successfully in a secondary datacenter. Deleting the Control Resource Group or its resources can lead to an inability to perform a successful recovery.

    Files excluded from the Disaster Recovery backup

    The DR setup configures a process that backs up all the web apps to meet failover needs. Certain files in the primary web apps are excluded from this backup. The following table describes the exclusions (applicable to Sitecore XP 9.1 Initial Release and 9.1 Update-1):

    File | Topology role | Details
    \site\wwwroot\App_Data\logs, \site\wwwroot\App_Data\debug, \site\wwwroot\App_Data\diagnostics, \site\wwwroot\App_Data\MediaCache, \site\wwwroot\App_Data\packages, \site\wwwroot\App_Data\viewstate, \site\wwwroot\temp | CD | Temporary and log files. These are not backed up because logs are usually large, which would increase the backup duration and cost.
    \site\wwwroot\bin\Feature.HADR_PublishAPI.dll, \site\wwwroot\bin\Foundation.HADR_WebApi.dll | CM | HADR-related API files.
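    Azure App Service backups can exclude files and folders listed, one path per line, in a `_backup.filter` file placed in `D:\home\site\wwwroot\`, and the `\site\wwwroot\...` paths in the table above use the same form. Whether Managed Cloud applies its exclusions this way is an assumption. The minimal sketch below builds such a filter file per topology role from the table; the `EXCLUSIONS` mapping and the `backup_filter` function are hypothetical.

```python
# Exclusions per topology role, copied from the table above
# (Sitecore XP 9.1 Initial Release and 9.1 Update-1).
EXCLUSIONS: dict[str, list[str]] = {
    "CD": [
        r"\site\wwwroot\App_Data\logs",
        r"\site\wwwroot\App_Data\debug",
        r"\site\wwwroot\App_Data\diagnostics",
        r"\site\wwwroot\App_Data\MediaCache",
        r"\site\wwwroot\App_Data\packages",
        r"\site\wwwroot\App_Data\viewstate",
        r"\site\wwwroot\temp",
    ],
    "CM": [
        r"\site\wwwroot\bin\Feature.HADR_PublishAPI.dll",
        r"\site\wwwroot\bin\Foundation.HADR_WebApi.dll",
    ],
}


def backup_filter(role: str) -> str:
    """Return _backup.filter content for a topology role, one excluded path per line.

    Roles without exclusions yield an empty string.
    """
    return "".join(path + "\n" for path in EXCLUSIONS.get(role, []))


if __name__ == "__main__":
    print(backup_filter("CM"), end="")
```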

    xDB is excluded from the recovery time

    The recovery time of the failover process does not include rebuilding xDB, because the rebuild can take a significant amount of time for a large content database. If the analytics indexes are not rebuilt, only functionality that depends on lists (for example, EXM) should be affected; the front-end site should not be.

    No recovery of additional modules during failover

    Additional modules, such as CDN, WFFM, and JavaScript Services, that were installed as part of the primary Sitecore XP installation (that is, the customer's primary environment) cannot be recovered during the failover process. These modules must be added separately after the failover process is complete.

    xConnect Search Indexer

    Sitecore XP can have only one active xConnect Search Indexer WebJob across a solution. Therefore, during any failover or restoration of service, the indexer must be shut down so that only one indexer remains active across the solution.

    Certificates in Azure

    Managed Cloud Disaster Recovery currently supports only one website certificate. One possible workaround is to use a wildcard certificate.

    Azure requirements and cost considerations

    All disaster recovery options depend on Azure Web App backup and Traffic Manager, which require the web apps to run on at least the Standard tier.

    Failover situations not supported

    In a small number of situations, it might not be possible to restore a production site into the secondary datacenter, for example, when a global Azure service such as authentication or Traffic Manager is down.

    Azure Service Bus is not supported

    HADR does not support Azure Service Bus synchronization, backup/restore, or replication. This applies to Sitecore XP 9.2.0 only.

  • How do customers request the Managed Cloud Disaster Recovery (DR) feature for Sitecore Managed Cloud environments?
    The customer can ask to set up Disaster Recovery (Basic or Hot-warm) through the Sitecore regional office or Sitecore sales team.

    What actions do customers need to take once the DR setup is done?
    Once the DR setup is completed, customers are requested to perform the following actions:

    • Configure the custom domain of the CD instance to point to the DNS name of the Traffic Manager profile by using a DNS CNAME record.
    • Configure the outage page according to the customer's specifications.

    Sitecore engineers provide the instructions for these steps after the DR setup has been provisioned. Alternatively, the customer can raise a support query on the Sitecore Support Portal for detailed information.
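    The CNAME record maps the customer's CD host name to the DNS name of the Traffic Manager profile, which ends in `.trafficmanager.net`. A minimal Python sketch that validates and formats such a record as a DNS zone-file line; the `traffic_manager_cname` helper, host names, and TTL are hypothetical, and the actual Traffic Manager DNS name comes from Sitecore. It also rejects the zone apex (for example, `example.com` itself), because DNS does not allow a CNAME record alongside the SOA and NS records that the apex must hold.

```python
def traffic_manager_cname(host: str, zone: str, target: str, ttl: int = 3600) -> str:
    """Return a zone-file CNAME line pointing `host` at a Traffic Manager DNS name."""
    host = host.rstrip(".").lower()
    zone = zone.rstrip(".").lower()
    target = target.rstrip(".").lower()
    if host == zone:
        # A CNAME cannot coexist with the SOA/NS records at the zone apex.
        raise ValueError(f"{host} is the zone apex; use a subdomain such as www.{zone}")
    if not host.endswith("." + zone):
        raise ValueError(f"{host} is not inside the zone {zone}")
    if not target.endswith(".trafficmanager.net"):
        raise ValueError(f"{target} is not a Traffic Manager DNS name")
    # In zone-file syntax, fully qualified names end with a dot.
    return f"{host}. {ttl} IN CNAME {target}."


if __name__ == "__main__":
    print(traffic_manager_cname("www.example.com", "example.com", "contoso-dr.trafficmanager.net"))
    # www.example.com. 3600 IN CNAME contoso-dr.trafficmanager.net.
```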

    What are the new resources that are introduced once the DR setup is done?
    After the DR setup is provisioned, the customer can see the following resource groups, depending on the chosen DR type:

    • Basic Disaster Recovery: one additional resource group alongside the Primary Resource Group. This is the Control Resource Group, which contains the resources needed to monitor and execute the DR setup and failover activities.
    • Hot-warm or Hot-hot Disaster Recovery: two additional resource groups, the Control Resource Group and the Secondary Resource Group. The Control Resource Group contains the resources needed to monitor and execute the DR setup and failover activities. The Secondary Resource Group contains the same resources as the Primary Resource Group.

    Do customers have limited access rights to the DR resources?
    Yes. Sitecore gives customers limited access to the additional resource groups (Control and Secondary). This prevents changes to the configurations related to backup policies and automations.

    How is the paired region chosen for the DR setup?
    Sitecore chooses the most suitable paired region for the customer, in compliance with Microsoft's standards. More detailed descriptions are provided here.

    What is the procedure for enabling DR for SolrCloud?

    For customers who have purchased Managed Cloud instances with SolrCloud, Sitecore follows the procedures below when enabling the Disaster Recovery setup, so that DR covers both the Sitecore environment and SolrCloud.

    • Basic Disaster Recovery and Hot-warm Disaster Recovery setup
      During disaster recovery, Sitecore provisions a secondary Solr instance that is configured with the customer's secondary environment. In both scenarios, the indexes are rebuilt in the secondary Solr instance during the Disaster Recovery failover.
    • Hot-hot Disaster Recovery setup
      During disaster recovery, Sitecore, with the help of its SolrCloud provider, also enables the Hot Disaster Recovery feature for the SolrCloud cluster. This feature provides live replication from the primary Solr cluster to the secondary cluster and quick failover to the secondary cluster during a disaster.

Applies to:

Managed Cloud 1+

June 07, 2019
November 28, 2019