Sitecore Managed Cloud Standard — Disaster Recovery

  • Description

    The Sitecore Managed Cloud Disaster Recovery feature allows customers to maintain or quickly resume mission-critical functions following a disaster, thus supporting the customer's business continuity plan. When a disaster occurs in a region containing the production environment (primary), the Disaster Recovery tool allows the environment to be recovered into another region (secondary) or disaster recovery site.

    Sitecore currently provides two disaster recovery options:

    • Basic
    • Hot-warm

    This article describes the configurations, workflows, and architectural considerations for both the Basic and Hot-warm disaster recovery options.

  • Prerequisites

    The following prerequisites are common for both basic and hot-warm disaster recovery options:

    1. The customer's Sitecore Managed Cloud solution should meet the compatibility requirements described here: https://kb.sitecore.net/articles/768387.
    2. The customer is eligible to request the Disaster Recovery feature only if it was purchased as part of the Managed Cloud contract.
    3. The customer should have a valid Sitecore license file, Sitecore certificate, and password when requesting the disaster recovery setup from Sitecore Support.

  • When a disaster happens, Sitecore should receive an alert email within 15 minutes. Based on the alert email, Sitecore validates the authenticity of the alert, creates a support ticket to investigate the issue, and informs the customer about the initial investigation.

    If the investigation shows that a disaster has left the primary resource group temporarily unrecoverable, Sitecore starts the failover process after approval from the customer. The customer can also raise a request for this through the Support Portal.

    The failover process provides the customer with a secondary environment in which they can continue business-critical activities until the primary environment becomes available.

    Notes:

    • Customers can request the Disaster Recovery feature to be part of their Sitecore environment. By doing so, this feature will be explicitly mentioned in the customer's contract.
    • The customer should provide Sitecore with a valid Sitecore license and a Sitecore certificate (.pfx) file, together with the certificate credentials, when requesting the failover process.
    • The time required for the failover process to make the secondary environment available varies according to the kind of Disaster Recovery setup used by the customer.

    Sitecore has two Disaster Recovery features as described below.

    Basic Disaster Recovery

    This recovery option has a longer recovery time because the secondary Sitecore environment is created as part of the failover process.

    • Backup technology: SQL Azure Geo-Replication, Azure APIs
    • Secondary environment state: Created on-demand
    • Recovery process:
      1. Deploy
      2. Restore
      3. Customer validate
      4. Go live

    Hot-Warm Disaster Recovery

    This recovery option has a much shorter recovery time than Basic Disaster Recovery because the secondary Sitecore environment is already deployed and only needs to be validated before going live.

    Below are the key specifications:

    • Backup technology: SQL Azure Geo-Replication, Azure APIs
    • Secondary environment state: Fully deployed, but shut down
    • Recovery process:
      1. Wake up
      2. Customer validate
      3. Go live
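
    The two recovery processes above can be compared side by side with a small sketch. The following Python models each option as an ordered list of steps taken from this article. The class, constant, and function names are illustrative assumptions and are not part of any Sitecore tooling or API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class RecoveryOption:
    """Illustrative model of a disaster recovery option (hypothetical names)."""
    name: str
    secondary_state: str      # state of the secondary environment before failover
    steps: Tuple[str, ...]    # ordered failover steps, as listed in this article


BASIC = RecoveryOption(
    name="Basic",
    secondary_state="Created on-demand",
    steps=("Deploy", "Restore", "Customer validate", "Go live"),
)

HOT_WARM = RecoveryOption(
    name="Hot-warm",
    secondary_state="Fully deployed, but shut down",
    steps=("Wake up", "Customer validate", "Go live"),
)


def steps_skipped_by(faster: RecoveryOption, slower: RecoveryOption) -> List[str]:
    """Return the steps of `slower` that `faster` does not need to perform."""
    return [step for step in slower.steps if step not in faster.steps]


if __name__ == "__main__":
    for option in (BASIC, HOT_WARM):
        print(f"{option.name}: {' -> '.join(option.steps)}")
    print("Not needed for Hot-warm:", steps_skipped_by(HOT_WARM, BASIC))
```

    The comparison shows why Hot-warm recovers faster: the deploy and restore steps are not part of its failover process, because the secondary environment already exists.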

    For detailed information on specific Disaster Recovery features, please raise a support query on the Sitecore Support Portal.

  • Considerations

    Disaster recovery introduces some new considerations when you are building a Sitecore solution. This section of the document tries to address some of the most common ones.

    Choosing your Azure Region

    Azure organizes its datacenters into regions with a latency-defined perimeter and connected through a dedicated regional low-latency network. When choosing a secondary datacenter, we recommend choosing one in the same region as the primary, to ensure fast backups and consistent customer delivery speeds. To find compatible regions, see the article here.

    3rd-party service APIs

    If the Sitecore implementation uses any 3rd-party service APIs that limit access based on IP address, it is essential to register the IP addresses of the secondary datacenter with the service. Failure to register them could delay bringing the secondary Sitecore environment online.
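
    One way to catch missing registrations before a failover is to compare the outbound IP addresses of the secondary environment with the ranges registered at the 3rd-party service. The sketch below uses Python's standard `ipaddress` module. The function name and all addresses are hypothetical examples (the addresses come from the reserved documentation ranges), and how you obtain the real lists depends on your Azure setup and on the service in question.

```python
import ipaddress
from typing import Iterable, List


def unregistered_ips(outbound_ips: Iterable[str], allowed_ranges: Iterable[str]) -> List[str]:
    """Return the outbound IPs that no allowed address or CIDR range covers."""
    # A bare address such as "198.51.100.7" is treated as a single-host network.
    networks = [ipaddress.ip_network(entry, strict=False) for entry in allowed_ranges]
    missing = []
    for ip in outbound_ips:
        address = ipaddress.ip_address(ip)
        if not any(address in network for network in networks):
            missing.append(ip)
    return missing


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    registered_with_service = ["203.0.113.0/24", "198.51.100.7"]
    secondary_outbound = ["203.0.113.10", "198.51.100.8"]
    print(unregistered_ips(secondary_outbound, registered_with_service))
```

    Any address the function returns would still need to be registered with the service before the secondary environment can call it.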

    Outage page

    Managed Cloud uses Azure Storage to host an informational page in case of an outage. Using Azure Storage means the outage page needs to be static (that is, pure HTML), which means no custom backend code can be executed for the page. It is recommended that the outage page only contains the necessary information to assure customers that the site will be back online soon, for example:

    • The site is temporarily unavailable.
    • Support staff are aware of the issue and are working on it.
    • The approximate recovery time.
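
    As a minimal sketch of such a page, the following Python generates a static HTML file containing only the three items listed above. The function name, the wording of the page, and the file name in the usage note are assumptions for illustration. Managed Cloud does not prescribe this layout.

```python
from html import escape


def build_outage_page(site_name: str, estimated_recovery: str) -> str:
    """Return a static (pure HTML) outage page as a string."""
    name = escape(site_name)          # escape values so they cannot inject markup
    eta = escape(estimated_recovery)
    return f"""<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>{name} is temporarily unavailable</title>
</head>
<body>
  <h1>{name} is temporarily unavailable</h1>
  <p>Our support staff are aware of the issue and are working on it.</p>
  <p>Approximate recovery time: {eta}</p>
</body>
</html>
"""


if __name__ == "__main__":
    print(build_outage_page("Example Site", "2 hours"))
```

    The returned string can be written to a file such as index.html and uploaded as static content. Because the page contains no scripts or server-side code, it fits the static hosting constraint described above.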

    One of the limitations of using Azure Storage (or even Azure Web Apps) for the outage page is that you cannot return a 503 (Service Unavailable) HTTP code with a custom page. If a search engine is crawling your site and sees a 503, then it understands that your site is down temporarily and comes back and re-indexes later. If a search engine tries to index your site while it is down and no 503 is returned, then it can lead to some undesirable SEO effects.

  • Limitations

    This section describes limitations to the Disaster Recovery options provided by Managed Cloud.

    No removal of Control Resource Group

    The Control Resource Group contains all resources used to restore Sitecore successfully in a secondary datacenter. Deleting the Control Resource Group or its resources can lead to an inability to perform a successful recovery.

    xDB is excluded while considering the recovery time

    The recovery time for the failover process does not cover the xDB rebuild, because the rebuild can take a significant amount of time for a large content database. If the analytics indexes are not rebuilt, this should only affect functionality that depends on lists (for example, EXM) and should not affect the frontend site.

    No recovery of Additional Modules during Failover

    Additional modules such as CDN, WFFM, JavaScript Services, and so on, that were installed during the primary Sitecore installation (that is, in the primary environment of the customer) cannot be recovered during the failover process. Therefore, these modules have to be added separately after the failover process has been completed.

    xConnect Search Indexer

    Sitecore can only have one active xConnect Search Indexer WebJob across a solution. In case of any failover and restore of service, the indexer must be shut down.

    Certificates in Azure

    Only one website certificate is supported with Managed Cloud Disaster Recovery at this time. One possible workaround for this is using wildcard certificates.
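
    To judge whether a single wildcard certificate can serve as this workaround, it helps to check which host names it would cover. The sketch below applies the common rule that a wildcard stands for exactly one left-most label. It is a simplified illustration with a hypothetical function name and example domains, not a full certificate validation and not part of Managed Cloud.

```python
def wildcard_covers(pattern: str, hostname: str) -> bool:
    """Return True if a certificate name such as '*.example.com' covers hostname.

    Simplified rule: '*' is only accepted as the whole left-most label and
    matches exactly one label.
    """
    pattern_labels = pattern.lower().rstrip(".").split(".")
    host_labels = hostname.lower().rstrip(".").split(".")
    if len(pattern_labels) != len(host_labels):
        return False
    if pattern_labels[0] == "*":
        # The wildcard must match a non-empty label; the rest must be identical.
        return bool(host_labels[0]) and pattern_labels[1:] == host_labels[1:]
    return pattern_labels == host_labels


if __name__ == "__main__":
    for host in ("www.example.com", "shop.example.com", "example.com", "a.b.example.com"):
        print(host, wildcard_covers("*.example.com", host))
```

    Under this rule, a certificate for *.example.com covers www.example.com and shop.example.com, but not the bare domain example.com or deeper names such as a.b.example.com, so those host names would need to be accounted for separately.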

    Azure requirements and cost considerations

    All disaster recovery options are dependent on Azure WebApp Backup and Traffic Manager, which require a minimum of the Standard Tier for WebApps.

    Failover situations not supported

    There is a small set of situations where it might not be possible to restore a production site into the secondary datacenter, for example, when a global Azure service such as authentication or Traffic Manager is down.

Applies to:

Managed Cloud 1+

June 07, 2019
July 24, 2019