Sitecore Managed Cloud Standard — monitoring metrics

  • This article provides the detailed list of default metrics monitored by the Sitecore Cloud Operations team for all Managed Cloud customers. More metrics may be included in future releases.

    For more information about Managed Cloud Standard monitoring, refer to the service aspects article.

  • Sitecore Managed Cloud includes the Advanced Monitoring package for all new environments activated after May 2019. The full list of monitored metrics is given below.

    Azure Web Apps:

    • Web App Content Delivery alert - Availability Tests on /sitecore/service/keepalive.aspx - Every 5 min
    • Web App Http Request 5xx > 10 within 15 min (One alert per Web App)
    • Web App Average Response time > 1 sec (One alert per Web App)
    • Web App Page response time greater than 30 sec over the last 30 min
    • Web App Hosting Plan Memory usage > 95%
    • Web App Hosting Plan CPU usage > 95%
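
    The count-based rule "Http Request 5xx > 10 within 15 min" can be read as a trailing-window counter. The following is a minimal sketch of that reading, not the Cloud Operations implementation. The class name SlidingWindowCounter and the function demo are hypothetical, and events are assumed to arrive in timestamp order.

```python
from collections import deque
from datetime import datetime, timedelta


class SlidingWindowCounter:
    """Counts events inside a trailing time window (hypothetical helper)."""

    def __init__(self, window: timedelta, threshold: int):
        self.window = window
        self.threshold = threshold
        self._events = deque()

    def record(self, timestamp: datetime) -> bool:
        """Record one 5xx response; return True if the alert condition holds."""
        self._events.append(timestamp)
        cutoff = timestamp - self.window
        # Drop events that have fallen out of the trailing window.
        while self._events and self._events[0] <= cutoff:
            self._events.popleft()
        return len(self._events) > self.threshold


def demo() -> list:
    """Feed eleven 5xx responses, one per minute, into a 15 min window."""
    counter = SlidingWindowCounter(window=timedelta(minutes=15), threshold=10)
    start = datetime(2019, 10, 23, 12, 0, 0)
    return [counter.record(start + timedelta(minutes=i)) for i in range(11)]
```

    In this sketch the first ten responses stay at or below the threshold, and only the eleventh response inside the window satisfies "> 10".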

    Azure SQL database:

    • SQL Databases Throughput unit (DTU) > 95% for 5 min (One alert per DB) - Every 5 min
    • SQL Databases Storage percentage > 75% for 5 min (One alert per database) - Every 15 min (total breaches 1 & frequency 15 min)
    • SQL Databases Database connection failure (One alert per database)
    • SQL Databases CPU > 95% for 5 min (One alert per DB)
    • SQL Databases Deadlock count > 0 (One alert per database)
    • SQL Databases Data IO percentage > 95% for 5 min (One alert per database)
    • SQL Databases Log IO percentage > 95% for 5 min (One alert per database)
    • SQL Databases Workers percentage > 95% for 5 min (One alert per database)
    • SQL Databases Concurrent Sessions Limit > 95% over the last 5 min
    • SQL Databases Failed database connections > 5 over the last 5 min
    • SQL Databases In-Memory OLTP storage average greater than 95% over 30 min
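
    Several of the SQL rules above have the form "metric > N% for 5 min". The article does not state which aggregation the alert rules use, so the sketch below assumes the condition means "the average over the trailing 5 min window exceeds the threshold". The function name breaches_threshold and the (minute, percent) sample format are hypothetical.

```python
from statistics import mean


def breaches_threshold(samples, threshold_percent, window_minutes=5):
    """Return True when the trailing-window average exceeds the threshold.

    `samples` is a list of (minute, percent) tuples ordered by time.
    Assumption: "for 5 min" is interpreted as an average over the window.
    """
    if not samples:
        return False
    newest_minute = samples[-1][0]
    # Keep only the samples that fall inside the trailing window.
    window = [
        percent
        for minute, percent in samples
        if minute > newest_minute - window_minutes
    ]
    return mean(window) > threshold_percent
```

    For example, breaches_threshold(samples, 95) would model the DTU or CPU rule, and breaches_threshold(samples, 75) the storage rule.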

    Azure Search Service:

    • Azure Search Average response time > 1 sec over last 5 min (One alert per Azure Search)
    • Azure Search Throttled search queries > 5% over last 5 min (One alert per Azure Search)
    • Azure Search Average Search Query latency greater than 10 sec over the last 30 min
    • Azure Search Service Unavailable Responses > 250 in the last 15 min
    • Azure Search Service storage > 90% (already provided by Cloud Ops)

    Azure Redis Cache:

    • Azure Redis Cache Server load > 95%
    • Azure Redis Cache Percent Processor Time > 95%
    • Azure Redis Cache High number of connected clients over the last 30 min

    SearchStax (SOLR) server:

    • CPU Usage > 80%
    • JVM Heap Memory > 80%
    • Disk space > 80%
    • Search metrics:
      • Average time/request > 3 seconds
      • Timeouts > 10
      • Errors > 10
    • Indexing metrics:
      • Average time/request > 60 seconds
      • Timeouts > 10
      • Errors > 10
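
    The SearchStax thresholds above can be kept as a small lookup table and compared against observed values. This is only an illustrative sketch: the key names and the function breached_metrics are hypothetical, and the numbers are copied from the list above.

```python
# Thresholds copied from the SearchStax (SOLR) list; key names are hypothetical.
SEARCHSTAX_THRESHOLDS = {
    "cpu_percent": 80,
    "jvm_heap_percent": 80,
    "disk_percent": 80,
    "search_avg_seconds": 3,
    "search_timeouts": 10,
    "search_errors": 10,
    "indexing_avg_seconds": 60,
    "indexing_timeouts": 10,
    "indexing_errors": 10,
}


def breached_metrics(observed: dict) -> list:
    """Return the sorted names of metrics whose observed value exceeds its limit."""
    return sorted(
        name
        for name, limit in SEARCHSTAX_THRESHOLDS.items()
        if name in observed and observed[name] > limit
    )
```

    Every rule in the list uses a strict "greater than" comparison, so a value exactly equal to its threshold is not reported.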

    MongoDB server:

    • Availability
    • Performance: CPU > 90% or page faults > 10 per sec
    • Capacity: storage space used > 90%
    • Replica set rollback on failover
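
    The MongoDB performance rule combines two conditions with "or". A rough sketch of that check follows, assuming page faults are read as a cumulative counter sampled twice and converted to a per-second rate. The function name mongodb_performance_alert and its parameters are hypothetical.

```python
def mongodb_performance_alert(cpu_percent, faults_before, faults_after, interval_seconds):
    """Return True when CPU > 90% or page faults exceed 10 per second.

    Assumption: `faults_before` and `faults_after` are two readings of a
    cumulative page-fault counter taken `interval_seconds` apart.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    faults_per_second = (faults_after - faults_before) / interval_seconds
    return cpu_percent > 90 or faults_per_second > 10
```

    With readings of 1000 and 1700 taken 60 seconds apart, the rate is about 11.7 faults per second, which satisfies the rule even when CPU usage is low.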

    IMPORTANT NOTE: The monitoring package does not yet support the following deployment types:

    • Sitecore single topologies: xP0, xM0, xDB0
    • Sitecore XP version 9.2.0+
  • For customers who joined before May 2019, classic monitoring gave the Cloud Operations team the signals required to provide insights and react to outages or service degradation. Starting September 2019, all customers who joined before May 2019 were migrated to the Alerts Basic package. More details about the Basic package can be found in the relevant article or in the list of predefined alerts below.

    Azure Web Apps:

    • Web App Content Delivery alert - Availability Tests on /sitecore/service/keepalive.aspx - Every 5 min
    • Web App Http Request 5xx > 10 within 15 min (One alert per Web App)
    • Web App Hosting Plan Memory usage > 95%
    • Web App Hosting Plan CPU usage > 95%
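
    The availability test requests /sitecore/service/keepalive.aspx every 5 min. A similar probe can be sketched with the Python standard library. The success criterion (HTTP 200), the timeout value, and the function names fetch_status and is_available are assumptions for illustration and are not taken from the Cloud Operations configuration.

```python
import urllib.error
import urllib.request

KEEPALIVE_PATH = "/sitecore/service/keepalive.aspx"


def fetch_status(url: str, timeout: float = 30.0) -> int:
    """Return the HTTP status code for a GET request, or 0 if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status such as 500 or 503.
        return error.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar problems.
        return 0


def is_available(base_url: str, fetch=fetch_status) -> bool:
    """Treat the site as available only when the keepalive page answers HTTP 200."""
    return fetch(base_url.rstrip("/") + KEEPALIVE_PATH) == 200
```

    The fetch parameter exists so that the check can be exercised without network access. A scheduler that runs is_available every 5 minutes would approximate the test frequency listed above.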

    Azure SQL database:

    • SQL Databases Throughput unit (DTU) > 95% for 5 min (One alert per DB) - Every 5 min
    • SQL Databases CPU > 95% for 5 min (One alert per DB)
    • SQL Databases Storage percentage > 75% for 5 min (One alert per database) - Every 15 min (total breaches 1 & frequency 15 min)
    • SQL Databases Data IO percentage > 95% for 5 min (One alert per database)
    • SQL Databases Log IO percentage > 95% for 5 min (One alert per database)
    • SQL Databases Workers percentage > 95% for 5 min (One alert per database)

    Azure Redis Cache:

    • Azure Redis Cache Server load > 95%

    Azure Search Service:

    • Capacity: storage space used > 90%

    MongoDB server:

    • Availability
    • Performance: CPU > 90% or page faults > 10 per sec
    • Capacity: storage space used > 90%
    • Replica set rollback on failover

Applies to:

CMS 8.2 Update-1+, Managed Cloud 1+

January 03, 2018
October 23, 2019

Reference number:

347913

Keywords: 

  • Managed Cloud