Redis driver timeout issues

  • If a burst of traffic reaches the application while no free threads are available, a timeout exception is thrown because of the way the Redis driver is designed. The driver blocks the request thread until the response from the Redis server has been received and fully parsed by a callback. If no free thread is available to run that callback in time (within one second by default), the request fails with a timeout exception.

    Technical background

    The CLR thread pool creates new worker threads to handle incoming load only under certain conditions. Adding threads helps only when spare CPU capacity is available, so the thread pool injects new threads only while CPU usage is below 80%.

    Because the CPU performance counter reports the system state for the previous second, the load added by newly created threads becomes visible only a second later. To avoid overloading the CPU in the meantime, the thread pool creates no more than 2 new threads per second.
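    As a rough, illustrative estimate, assume a hypothetical server whose minimum worker thread count is 8 (on .NET Framework the default minimum equals the number of logical processors) and a hypothetical burst of 50 concurrent requests. Threads beyond the minimum are added at no more than 2 per second, so the pool needs roughly:

    ```latex
    t_{\text{grow}} \approx \frac{N_{\text{burst}} - N_{\min}}{r_{\text{inject}}}
      = \frac{50 - 8}{2\ \text{threads/s}} = 21\ \text{s} \gg 1\ \text{s}
    ```

    Within the one-second default timeout, only about 2 extra threads can appear, so most blocked requests cannot get a thread for their Redis callback in time. The estimate ignores the 80% CPU condition, which can only slow growth further.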

    Note: How the CLR thread pool manages its size is an implementation detail that Microsoft can change at any time.

    The current behavior is described in the Redis FAQ: Important details about ThreadPool growth.

    Scenario

    If the thread pool has fewer free threads than there are incoming requests, ASP.NET takes all of the free threads to process requests, and the pool creates a few more. The remaining requests wait in the work queue.

    No free worker threads are left to parse the Redis responses, because every busy thread is blocked waiting for those same parsing results.

    Because nothing tells the thread pool that response parsing has higher priority, a priority inversion occurs:

    • Because the Redis driver shares the general-purpose CLR thread pool, the pool can become clogged with other work items (processing of incoming ASP.NET requests).
    • Newly created thread pool threads are not guaranteed to pick up the callback; they can pick up an incoming ASP.NET request instead.
    • The ASP.NET request blocks the thread until the session state response has been parsed.
    • The response is not parsed due to the lack of free threads.

    This circular wait (a deadlock) is broken only when a blocked ASP.NET thread times out and throws an exception, which releases that thread.

    Depending on the current contents of the work queue, the released thread may then be assigned to process a pending callback.

  • To configure an application to reserve free threads and tolerate network issues:
    1. Put the Sitecore.Support.210408 support patch assembly (Sitecore.Support.210408.dll) into the \bin folder.
    2. Put the Sitecore.Support.210408.config file into the \App_Config\Include\zzz folder.
    3. In both the Web.config file and the \App_Config\Sitecore\Marketing.Tracking\Sitecore.Analytics.Tracking.config file (or the \App_Config\Include\Sitecore.Analytics.Tracking.config file for Sitecore XP 8.0.0-8.2.7), set the following attributes on the Redis session state provider:
      • operationTimeoutInMilliseconds="5000"
      • retryTimeoutInMilliseconds="16000"
      • connectionTimeoutInMilliseconds="3000"
    Notes:
    • The configuration values (both the thread counts and the timeouts) are for illustration only and serve only as a starting point.
    • Tune the final values for each solution based on load testing.
    • The source code of the patch is in ConfigureThreadPool.cs.
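
    As an illustration of step 3, the private session state provider in Web.config might look like the fragment below after the change. The provider type name Sitecore.SessionProvider.Redis.RedisSessionStateProvider and the connectionString, pollingInterval, and applicationName values are assumptions based on a typical Sitecore XP installation with Redis session state; keep the values already in your file and add only the three timeout attributes.

    ```xml
    <!-- Web.config, private session state (sketch).
         Assumed: provider type and the existing connectionString,
         pollingInterval, and applicationName values.
         The change from step 3 is the three *InMilliseconds attributes. -->
    <system.web>
      <sessionState mode="Custom" customProvider="redis" timeout="20">
        <providers>
          <add name="redis"
               type="Sitecore.SessionProvider.Redis.RedisSessionStateProvider, Sitecore.SessionProvider.Redis"
               connectionString="session"
               pollingInterval="2"
               applicationName="private"
               operationTimeoutInMilliseconds="5000"
               retryTimeoutInMilliseconds="16000"
               connectionTimeoutInMilliseconds="3000" />
        </providers>
      </sessionState>
    </system.web>
    ```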
  • To resolve this issue, download and install the hotfix compatible with the affected product version:

    Be aware that the hotfix was built for a specific Sitecore XP version and must not be installed on other Sitecore XP versions or combined with other hotfixes. If other hotfixes are already installed on a Sitecore XP instance, send a compatibility check request to Sitecore Support.

    Note that you must extract the ZIP file to find the installation instructions and related files. Install the hotfix on a CM instance, and then sync it to the other instances using standard development practices.

    If you still experience issues because there are not enough threads, consider also applying the following configuration changes:
    • Disable ThreadPoolSizeMonitor by removing the following config node from the <initialize> pipeline in the Sitecore.Analytics.Tracking.config file:
      <processor type="Sitecore.Analytics.Pipelines.Loader.StartThreadPoolSizeMonitor, Sitecore.Analytics" />
    • Throttle the processing of expired sessions by configuring session state provider polling: configure pollingMaxExpiredSessionsPerSecond as described here.
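
    Instead of editing Sitecore.Analytics.Tracking.config directly, the same removal could be expressed as an include patch file, sketched below. This assumes your Sitecore version supports the patch:delete instruction and that the file is placed in a folder that loads after the file it patches (for example, \App_Config\Include\zzz, as used for the support patch above); the file name Disable.ThreadPoolSizeMonitor.config is hypothetical.

    ```xml
    <!-- Hypothetical file: \App_Config\Include\zzz\Disable.ThreadPoolSizeMonitor.config
         Removes the StartThreadPoolSizeMonitor processor from the initialize pipeline. -->
    <configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
      <sitecore>
        <pipelines>
          <initialize>
            <!-- Matches the existing processor by its type attribute and deletes it. -->
            <processor type="Sitecore.Analytics.Pipelines.Loader.StartThreadPoolSizeMonitor, Sitecore.Analytics">
              <patch:delete />
            </processor>
          </initialize>
        </pipelines>
      </sitecore>
    </configuration>
    ```

    After deployment, the /sitecore/admin/showconfig.aspx page can be used to confirm that the processor no longer appears in the initialize pipeline.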
  • To overcome possible Redis timeout issues, set the following attributes in both the Web.config file and the \App_Config\Sitecore\Marketing.Tracking\Sitecore.Analytics.Tracking.config file:
    • operationTimeoutInMilliseconds, to tolerate application CPU saturation.
    • connectionTimeoutInMilliseconds and retryTimeoutInMilliseconds, to tolerate network failures.
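
    For the shared session state, the Redis provider definition in Sitecore.Analytics.Tracking.config might look like the sketch below, with each timeout annotated by what it is meant to tolerate. The element nesting, the provider type, and the connectionString, pollingInterval, and applicationName values are assumed from a typical installation; compare the fragment against your own file before applying it. The timeout values repeat the illustrative starting point given earlier and still need load testing.

    ```xml
    <!-- Sitecore.Analytics.Tracking.config, shared session state (sketch).
         Assumed: element nesting, provider type, and existing attribute values. -->
    <configuration>
      <sitecore>
        <tracking>
          <sharedSessionState defaultProvider="redis">
            <providers>
              <!-- operationTimeoutInMilliseconds: tolerate application CPU saturation.
                   connectionTimeoutInMilliseconds, retryTimeoutInMilliseconds:
                   tolerate network failures. -->
              <add name="redis"
                   type="Sitecore.SessionProvider.Redis.RedisSessionStateProvider, Sitecore.SessionProvider.Redis"
                   connectionString="sharedSession"
                   pollingInterval="2"
                   applicationName="shared"
                   operationTimeoutInMilliseconds="5000"
                   retryTimeoutInMilliseconds="16000"
                   connectionTimeoutInMilliseconds="3000" />
            </providers>
          </sharedSessionState>
        </tracking>
      </sitecore>
    </configuration>
    ```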

Applies to:

CMS 8.0 Initial Release - 9.1 Update-1

CMS 9.2 Initial Release

May 31, 2018
November 17, 2020

Reference number:

210408, 215600