Redis driver timeout issues

  • If a burst of traffic reaches the application while no free threads are available, a timeout exception is thrown as a result of the Redis driver design. The driver blocks the request thread until it has received a response from the Redis server and a callback has fully parsed the data. If no free thread is available to invoke that parsing callback in time (within one second by default), a timeout exception is thrown.

    Technical background

    The CLR thread pool can create new worker threads to process incoming load under certain conditions. Adding threads helps only if free CPU resources are available, so the thread pool injects new threads only while CPU usage is below 80%.

    Because the CPU performance counter reflects the system state over the previous second, the load produced by newly created threads shows up only a second later. To avoid overloading the CPU, the thread pool therefore creates no more than two new threads per second.

    Note: The CLR thread pool size management is an implementation detail that is subject to change at any time by the technology vendor.

    The current implementation is described in the Redis FAQ: Important details about ThreadPool growth.
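    The arithmetic behind the two-threads-per-second limit can be sketched in Python. This is a simplified model, not the CLR algorithm: it assumes the pool adds exactly two threads per second while CPU usage stays below 80%, and the function names `threads_injected` and `shortfall` are hypothetical.

```python
import math


def threads_injected(elapsed_seconds: float, rate_per_second: int = 2) -> int:
    """Threads added after elapsed_seconds under the simplified rule:
    at most rate_per_second new threads per second (CPU assumed below 80%)."""
    return math.floor(elapsed_seconds * rate_per_second)


def shortfall(extra_threads_needed: int, timeout_seconds: float,
              rate_per_second: int = 2) -> int:
    """Threads still missing when the operation timeout expires."""
    added = threads_injected(timeout_seconds, rate_per_second)
    return max(0, extra_threads_needed - added)


if __name__ == "__main__":
    # A burst that needs 10 threads beyond those currently in the pool:
    print(shortfall(10, timeout_seconds=1.0))  # 8
    print(shortfall(10, timeout_seconds=5.0))  # 0
```

    Under this model, a burst that needs 10 extra threads is still 8 threads short when a one-second timeout expires, while a 5-second timeout gives thread pool growth time to catch up.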

    Scenario

    If the thread pool has fewer free threads than the number of incoming requests, ASP.NET takes all free threads to process incoming requests, and a few more threads are created. The remaining requests wait in the work queue.

    No free worker threads are left to parse the Redis response because all are blocked while waiting for the parsing results.

    Because nothing marks response parsing as higher priority than request processing, a priority inversion takes place:

    • Sharing the general-purpose CLR thread pool makes it possible for the pool to become clogged with other work items (incoming ASP.NET request processing).
    • Newly created thread pool threads are not guaranteed to pick up the callback and may pick up an incoming ASP.NET request instead.
    • The ASP.NET request blocks the thread until the session state response has been parsed.
    • The response is not parsed due to a lack of free threads.

    The circular wait is broken only when a blocked ASP.NET thread is released by the timeout and throws an exception, which frees that thread.

    Depending on the current work queue, the released thread might then be assigned to process a pending callback.
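    The scenario can be reproduced in miniature with Python's `concurrent.futures.ThreadPoolExecutor` standing in for the CLR thread pool. This is a sketch under stated assumptions: the executor has a fixed size instead of growing by two threads per second, `handle_request` and `parse_response` are hypothetical stand-ins for an ASP.NET request and the Redis driver callback, and the barrier exists only to make the demo deterministic.

```python
import concurrent.futures
import threading


def parse_response(raw: bytes) -> str:
    """Hypothetical stand-in for the Redis driver callback that parses a reply."""
    return raw.decode("utf-8")


def handle_request(pool, barrier, timeout):
    """Hypothetical stand-in for an ASP.NET request reading session state."""
    # Wait until every request in the burst occupies a pool thread.
    barrier.wait()
    # Queue the parsing callback on the SAME shared pool, then block on it.
    callback = pool.submit(parse_response, b"session-data")
    try:
        result = callback.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        result = "timeout"
    # Hold this thread until the whole burst is done waiting, so a thread freed
    # by one timeout cannot rescue another request (keeps the demo deterministic).
    barrier.wait()
    return result


def run_burst(requests: int, spare_threads: int, timeout: float = 0.5) -> list:
    """Run a burst of concurrent requests on a fixed-size pool."""
    barrier = threading.Barrier(requests, timeout=10)
    with concurrent.futures.ThreadPoolExecutor(
            max_workers=requests + spare_threads) as pool:
        futures = [pool.submit(handle_request, pool, barrier, timeout)
                   for _ in range(requests)]
        return [f.result() for f in futures]


if __name__ == "__main__":
    print(run_burst(3, spare_threads=0))  # every request times out
    print(run_burst(3, spare_threads=3))  # every request gets its data
```

    With no spare threads, every request times out even though the reply data is already available; the queued callbacks run only after the timeouts release the threads. With spare threads, the same burst completes normally.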


  • To configure an application to secure free threads and tolerate network issues:

    1. Put the Sitecore.Support.210408.dll support patch assembly into the \bin folder.
    2. Put the Sitecore.Support.210408.config file into the \App_Config\Include\zzz folder.
    3. In both the Web.config file and the \App_Config\Sitecore\Marketing.Tracking\Sitecore.Analytics.Tracking.config file (or the \App_Config\Include\Sitecore.Analytics.Tracking.config file for Sitecore XP 8.x), define the following attributes for the Redis session state provider:
      • operationTimeoutInMilliseconds="5000"
      • connectionTimeoutInMilliseconds="3000"
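    Step 3 can be scripted. The following Python sketch uses `xml.etree.ElementTree` to set both attributes on every element whose `type` attribute mentions `RedisSessionStateProvider`. The sample configuration and the matching rule are hypothetical and abbreviated; check them against the provider definitions in your own Web.config and Sitecore.Analytics.Tracking.config files.

```python
import xml.etree.ElementTree as ET

# Abbreviated, hypothetical Web.config excerpt; real files contain much more.
WEB_CONFIG_SAMPLE = """<configuration>
  <system.web>
    <sessionState mode="Custom" customProvider="redis">
      <providers>
        <add name="redis"
             type="Sitecore.SessionProvider.Redis.RedisSessionStateProvider, Sitecore.SessionProvider.Redis"
             connectionString="session" />
      </providers>
    </sessionState>
  </system.web>
</configuration>"""


def set_redis_timeouts(config_xml: str, operation_ms: int, connection_ms: int) -> str:
    """Set both timeout attributes on every element whose type attribute
    mentions RedisSessionStateProvider (an assumed matching rule)."""
    root = ET.fromstring(config_xml)
    for element in root.iter():
        if "RedisSessionStateProvider" in element.get("type", ""):
            element.set("operationTimeoutInMilliseconds", str(operation_ms))
            element.set("connectionTimeoutInMilliseconds", str(connection_ms))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(set_redis_timeouts(WEB_CONFIG_SAMPLE, operation_ms=5000, connection_ms=3000))
```

    ElementTree drops XML comments and the original formatting, so review the output (or make the change by hand) before replacing a production configuration file.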

    Notes:

    • The configuration values (both thread numbers and operation timeouts) are for illustration only and serve only as a starting point.
    • The final values must be tuned for each solution based on load testing.
    • The source code of the patch is available in ConfigureThreadPool.cs.
  • To resolve this issue, install the hotfix for your specific Sitecore XP release:

    Note: See the readme.txt file inside the archive for the installation instructions.

    If you still experience issues caused by an insufficient number of threads, consider also applying the following configuration changes:

    • Disable ThreadPoolSizeMonitor by removing the following config node from the <initialize> pipeline in the Sitecore.Analytics.Tracking.config file:
      <processor type="Sitecore.Analytics.Pipelines.Loader.StartThreadPoolSizeMonitor, Sitecore.Analytics" />
    • Throttle session expiration by configuring session state provider polling: set the pollingMaxExpiredSessionsPerSecond setting as described here.
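    The first change can also be scripted. This Python sketch removes the `StartThreadPoolSizeMonitor` processor from every `<initialize>` element and leaves other processors in place. The sample XML is an abbreviated, hypothetical excerpt (the `Example.KeepMe` processor is invented for illustration), not the full Sitecore.Analytics.Tracking.config file.

```python
import xml.etree.ElementTree as ET

MONITOR_TYPE = ("Sitecore.Analytics.Pipelines.Loader.StartThreadPoolSizeMonitor, "
                "Sitecore.Analytics")

# Abbreviated, hypothetical excerpt of Sitecore.Analytics.Tracking.config.
TRACKING_CONFIG_SAMPLE = """<configuration>
  <sitecore>
    <pipelines>
      <initialize>
        <processor type="Example.KeepMe, Example" />
        <processor type="Sitecore.Analytics.Pipelines.Loader.StartThreadPoolSizeMonitor, Sitecore.Analytics" />
      </initialize>
    </pipelines>
  </sitecore>
</configuration>"""


def remove_thread_pool_monitor(config_xml: str) -> str:
    """Remove the StartThreadPoolSizeMonitor processor from <initialize>."""
    root = ET.fromstring(config_xml)
    # Materialize the lists first so the tree is not mutated mid-iteration.
    for pipeline in list(root.iter("initialize")):
        for processor in pipeline.findall("processor"):
            if processor.get("type", "").strip() == MONITOR_TYPE:
                pipeline.remove(processor)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(remove_thread_pool_monitor(TRACKING_CONFIG_SAMPLE))
```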
  • To overcome possible Redis timeout issues, apply the following configuration changes:

    • Change the value of the SessionExpirationThreadCount setting in the \App_Config\Sitecore.config file to specify how many threads should process the work items in the queue when a session expires.
    • Change the value of the maxConcurrencyLevel setting in the Web.config file to specify how many threads should handle session expiration logic.
    • In both the Web.config file and the \App_Config\Sitecore\Marketing.Tracking\Sitecore.Analytics.Tracking.config file, define:
      • operationTimeoutInMilliseconds to tolerate application CPU saturation.
      • connectionTimeoutInMilliseconds to tolerate network failures.

    Possible cases are as follows:

    1. Default values of the settings are used:
      • SessionExpirationThreadCount and maxConcurrencyLevel (Environment.ProcessorCount and Environment.ProcessorCount * 2, respectively)
      • operationTimeoutInMilliseconds and connectionTimeoutInMilliseconds (1000 ms and 5000 ms, respectively)

      In this case, expired items are removed slowly.

    2. Increased values of these settings are used. In this case, expired items are removed faster, and Redis memory usage is lower.
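    For case 1, the default thread-related values depend only on the processor count. The Python sketch below computes them, using `os.cpu_count()` (logical CPUs) as a stand-in for .NET's `Environment.ProcessorCount`; the function name is hypothetical, and the increased values for case 2 should come from load testing rather than a fixed formula.

```python
import os


def default_expiration_settings(processor_count=None):
    """Default SessionExpirationThreadCount and maxConcurrencyLevel (case 1).

    os.cpu_count() stands in for Environment.ProcessorCount.
    """
    if processor_count is None:
        processor_count = os.cpu_count() or 1
    return {
        "SessionExpirationThreadCount": processor_count,
        "maxConcurrencyLevel": processor_count * 2,
    }


if __name__ == "__main__":
    print(default_expiration_settings(4))
```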

Applies to:

CMS 8.0 Initial Release+

CMS 9.2 Initial Release

May 31, 2018
November 01, 2019

Reference number:

210408, 215600