Sitecore Managed Cloud Standard — Disaster Recovery

  • The Sitecore Managed Cloud Disaster Recovery feature allows customers to maintain or quickly resume mission-critical functions following a disaster, supporting the customer's business continuity plan. When a disaster occurs in the region that hosts the production environment (primary), the Disaster Recovery tool allows the environment to be recovered into another region (secondary), also referred to as the disaster recovery site.

    Sitecore currently provides two disaster recovery options:

    • Basic 
    • Hot-warm

    This article describes the configuration, workflow, and architectural considerations for both Basic and Hot-warm disaster recovery.

  • The following prerequisites are common for both Basic and Hot-warm disaster recovery options:

    1. The customer's Sitecore Managed Cloud solution must comply with the compatibility requirements described in https://kb.sitecore.net/articles/768387.
    2. The customer is eligible to request the Disaster Recovery feature only if it has been purchased as part of the Managed Cloud contract.
    3. The customer must have a valid Sitecore license file, Sitecore certificate, and certificate password when requesting the disaster recovery setup from Sitecore Support.
  • When a disaster occurs, Sitecore should receive an alert email within 15 minutes. Based on the alert, Sitecore validates its authenticity, creates a support ticket to investigate the issue, and informs the customer of the initial findings.

    If the investigation shows that a disaster has made the primary resource group temporarily unrecoverable, Sitecore starts the failover process after the customer approves it. The customer can also request failover through the Support Portal.

    The failover process provides the customer with a secondary environment in which they can continue business-critical activities until the primary environment becomes available again.

    Notes:

    • Customers can request that the Disaster Recovery feature be added to their Sitecore environment. If they do, the feature is explicitly listed in the customer's contract.
    • The customer must provide Sitecore with a valid Sitecore license and Sitecore certificate (.pfx) file, together with the certificate credentials, when requesting the failover process.
    • The time required for the failover process to make the secondary environment available varies according to the Disaster Recovery setup the customer uses.

    Sitecore offers two Disaster Recovery options, described as follows:

    Basic Disaster Recovery

    This option has a longer recovery time because the secondary Sitecore environment is created as part of the failover process.

    • Backup technology: SQL Azure Geo-Replication, Azure APIs
    • Secondary environment state: Created on-demand
    • Recovery process:
      1. Deploy
      2. Restore
      3. Customer validation
      4. Go live

    Hot-Warm Disaster Recovery

    This option has a much shorter recovery time than Basic Disaster Recovery because the secondary Sitecore environment is already deployed and only needs to be woken up and validated before going live.

    Below are the key specifications:

    • Backup technology: SQL Azure Geo-Replication, Azure APIs
    • Secondary environment state: Fully deployed, but shut down
    • Recovery process:
      1. Wake up
      2. Customer validation
      3. Go live

    For detailed information about a specific Disaster Recovery option, raise a support query on the Sitecore Support Portal.

  • Disaster recovery introduces some new considerations when you build a Sitecore solution. This section addresses the most common ones.

    Choosing your Azure Region

    Azure organizes its datacenters into regions, each a set of datacenters within a latency-defined perimeter connected through a dedicated regional low-latency network. When choosing a secondary datacenter, we recommend choosing a region in the same geography as the primary region, to ensure fast backups and consistent customer delivery speeds. To find compatible regions, see the article here.

    Third party service APIs

    If the Sitecore implementation uses any third-party service APIs that restrict access by IP address, you must register the IP addresses of the secondary datacenter with those services. Failing to do so could delay bringing the secondary Sitecore environment online.
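
    As an illustration of this check, the sketch below lists which secondary-datacenter IP addresses are not yet covered by a third-party service's allowlist. It uses only Python's standard ipaddress module. The allowlist and IP values are hypothetical (taken from the RFC 5737 documentation ranges); in practice you would gather the secondary environment's outbound IP addresses, for example from the Web App's properties in the Azure portal, and the allowlist from the third-party provider.

```python
import ipaddress


def missing_from_allowlist(secondary_ips, allowlist):
    """Return the secondary IPs that no allowlisted address or CIDR range covers."""
    # strict=False accepts single addresses such as "198.51.100.7" (treated as a /32).
    networks = [ipaddress.ip_network(entry, strict=False) for entry in allowlist]
    missing = []
    for ip_text in secondary_ips:
        ip = ipaddress.ip_address(ip_text)
        if not any(ip in net for net in networks):
            missing.append(ip_text)
    return missing


# Hypothetical values for illustration only.
allowlist = ["203.0.113.0/28", "198.51.100.7"]
secondary_ips = ["203.0.113.5", "198.51.100.7", "192.0.2.44"]
print(missing_from_allowlist(secondary_ips, allowlist))  # ['192.0.2.44']
```

    Any address the function returns would need to be registered with the service before failover.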

    Outage page

    Managed Cloud uses Azure Storage to host an informational page in case of an outage. Because the page is hosted in Azure Storage, it must be static (pure HTML), so no custom backend code can run for it. We recommend that the outage page contain only the information needed to assure customers that the site will be back online soon, for example:

    • The site is temporarily unavailable.
    • Support staff are aware of the problem and are working on it.
    • The approximate recovery time.

    One limitation of using Azure Storage (or even Azure Web Apps) for the outage page is that you cannot return a 503 (Service Unavailable) HTTP status code with a custom page. When a search engine crawling your site receives a 503, it understands that the site is temporarily down and returns to re-index it later. If a search engine indexes your site while it is down and no 503 is returned, it may index the outage page in place of your content, which can have undesirable SEO effects.
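
    For contrast, the sketch below shows the behavior that Azure Storage cannot provide: a small WSGI application, using only the Python standard library conventions, that serves a static outage page with a 503 status and a Retry-After header. It is not part of Managed Cloud; the page text and the 3600-second Retry-After value are hypothetical placeholders.

```python
OUTAGE_HTML = b"""<!DOCTYPE html>
<html>
<head><title>Temporarily unavailable</title></head>
<body>
<h1>This site is temporarily unavailable</h1>
<p>Our support staff are aware of the problem and are working on it.</p>
<p>Approximate recovery time: 1 hour.</p>
</body>
</html>"""


def outage_app(environ, start_response):
    """WSGI app that answers every request with the static outage page and a 503."""
    start_response("503 Service Unavailable", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(OUTAGE_HTML))),
        # Tells crawlers and clients how many seconds to wait before retrying.
        ("Retry-After", "3600"),
    ])
    return [OUTAGE_HTML]
```

    To try it locally, pass outage_app to wsgiref.simple_server.make_server and call serve_forever() on the returned server.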

  • This section describes limitations to the Disaster Recovery options provided by Managed Cloud.

    No removal of Control Resource Group

    The Control Resource Group contains all resources used to restore Sitecore successfully in a secondary datacenter. Deleting the Control Resource Group or its resources can lead to an inability to perform a successful recovery.

    xDB rebuild is excluded from the recovery time

    The recovery time for the failover process does not include the xDB rebuild, because the rebuild can take a significant amount of time for a large content database. If the analytics indexes are not rebuilt, this should only affect functionality that depends on lists (for example, EXM) and should not affect the front-end site.

    No recovery of Additional Modules during Failover

    Additional modules such as CDN, WFFM, and JavaScript Services that were installed in the customer's primary Sitecore environment cannot be recovered during the failover process. These modules must therefore be added separately after the failover process is complete.

    xConnect Search Indexer

    Sitecore can have only one active xConnect Search Indexer WebJob across a solution. During any failover and subsequent restoration of service, an indexer must be shut down so that only one remains active.

    Certificates in Azure

    Only one website certificate is currently supported with Managed Cloud Disaster Recovery. One possible workaround is to use a wildcard certificate.
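
    A wildcard certificate such as *.example.com covers every host name exactly one label below the domain, which is why it can stand in for several site certificates. The sketch below applies a simplified form of the usual matching rule (the wildcard must be the whole left-most label and matches exactly one label). It is illustrative only, not how Managed Cloud validates certificates, and the host names are hypothetical.

```python
def wildcard_covers(cert_name: str, hostname: str) -> bool:
    """Return True if a certificate name (exact or '*.domain') covers hostname.

    Simplified rule: '*' may only be the entire left-most label and matches
    exactly one label, so '*.example.com' does not cover 'example.com' or
    'a.b.example.com'.
    """
    cert_labels = cert_name.lower().rstrip(".").split(".")
    host_labels = hostname.lower().rstrip(".").split(".")
    if len(cert_labels) != len(host_labels):
        return False
    first, *rest = cert_labels
    # Refuse wildcards directly under a top-level domain, such as '*.com'.
    if first == "*" and len(rest) < 2:
        return False
    # Everything to the right of the left-most label must match exactly.
    if rest != host_labels[1:]:
        return False
    return first == "*" or first == host_labels[0]


hosts = ["www.example.com", "shop.example.com", "example.com"]
print([h for h in hosts if not wildcard_covers("*.example.com", h)])  # ['example.com']
```

    In this example, the wildcard certificate alone would not cover the apex name example.com.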

    Azure requirements and cost considerations

    All disaster recovery options depend on Azure WebApp Backup and Traffic Manager, which require WebApps to run on at least the Standard tier.
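
    The tier requirement can be expressed as a simple pre-check. The sketch below compares an App Service plan tier name against Standard. The tier names follow Azure App Service plan tiers, but the list and its ordering are an assumption for illustration; only a tier's position relative to Standard matters here.

```python
# Assumed ordering of Azure App Service plan tiers, lowest first.
# Only whether a tier sits before or after "Standard" matters for this check.
TIER_ORDER = ["Free", "Shared", "Basic", "Standard", "Premium",
              "PremiumV2", "PremiumV3", "Isolated", "IsolatedV2"]


def meets_dr_minimum(tier: str, minimum: str = "Standard") -> bool:
    """Return True if the plan tier is at or above the minimum tier."""
    if tier not in TIER_ORDER:
        # Unknown tier names are treated as not meeting the requirement.
        return False
    return TIER_ORDER.index(tier) >= TIER_ORDER.index(minimum)


for tier in ["Basic", "Standard", "PremiumV3"]:
    print(tier, meets_dr_minimum(tier))
```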

    Failover situations not supported

    In a small set of situations, it might not be possible to restore a production site into the secondary datacenter, for example, when a global Azure service such as authentication or Traffic Manager is down.

    1. How do customers request the Managed Cloud Disaster Recovery (DR) feature for Sitecore Managed Cloud environments?
      Customers can request Disaster Recovery (Basic or Hot-warm) through their Sitecore regional office or the Sitecore sales team.
    2. What actions do customers need to take once the DR setup is done?
      Once the DR setup is completed, customers are requested to perform the following actions:
      • Configure the custom domain of the CD instance to point to the DNS name of the Traffic Manager profile using a DNS CNAME record.
      • Configure the outage page according to the customer's specifications.
      Sitecore engineers provide the instructions for these actions after the DR setup is provisioned. Alternatively, customers can raise a support query on the Sitecore Support Portal for detailed information.
    3. What are the new resources that are introduced once the DR setup is done?
      After the DR setup is provisioned, the customer can see the following resource groups, depending on the chosen DR type:
      • If the customer has chosen Basic Disaster Recovery, they can see one additional resource group alongside their primary resource group. This resource group, called the Control Resource Group, contains the resources needed to monitor and execute the DR setup and failover activities.
      • If the customer has chosen Hot-warm Disaster Recovery, they can see two additional resource groups: the Control Resource Group and the Secondary Resource Group. The Control Resource Group contains the resources needed to monitor and execute the DR setup and failover activities. The Secondary Resource Group contains the same resources as the primary resource group.
    4. Do customers have limited access rights to the DR resources?
      Yes. Sitecore gives customers limited access to the additional resource groups (Control and Secondary). This prevents changes to the configuration of backup policies and automations.
    5. How is the paired region chosen for the DR setup?
      Sitecore chooses the best paired region for the customer in line with Microsoft's standards. More detailed descriptions are provided here.
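
    The CNAME step in question 2 can be sketched as follows. Azure Traffic Manager profile DNS names end in .trafficmanager.net, so the helper below refuses any other target before producing a zone-file style CNAME record. The domain, profile name, and 300-second TTL are hypothetical; the record itself is created at your DNS provider.

```python
def cname_record(custom_domain: str, tm_dns_name: str, ttl: int = 300) -> str:
    """Build a zone-file style CNAME record pointing custom_domain at a
    Traffic Manager profile DNS name."""
    target = tm_dns_name.rstrip(".").lower()
    if not target.endswith(".trafficmanager.net"):
        raise ValueError(f"{tm_dns_name!r} is not a Traffic Manager DNS name")
    name = custom_domain.rstrip(".").lower()
    # Trailing dots make both names fully qualified in zone-file syntax.
    return f"{name}. {ttl} IN CNAME {target}."


print(cname_record("www.example.com", "mysite-tm.trafficmanager.net"))
# www.example.com. 300 IN CNAME mysite-tm.trafficmanager.net.
```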

Applies to:

Managed Cloud 1+

June 07, 2019
September 03, 2019