Redis driver timeout issues

  • Description

    If a burst of traffic reaches your application while no free threads are available, a timeout exception is thrown as a consequence of the Redis driver design. The driver blocks the request thread until the response from the Redis server has been received and fully parsed by a callback. If no free thread is available to invoke that callback in time (within one second by default), the request fails with a timeout exception.

    Technical background

    A thread pool is allowed to create new worker threads to process incoming load under certain conditions. Adding more threads is beneficial only if free CPU resources are available. A thread pool will inject new threads when the CPU usage is below 80%.

    Because the CPU performance counter shows the system state for the previous second, the load produced by newly created threads becomes visible only after a second. For this reason, the thread pool creates no more than two threads per second, to avoid overloading the CPU.
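
    To get a feel for what this limit means during a burst, the arithmetic can be sketched as follows. This is an illustration only: the function name and the sample numbers (50 concurrent requests, 8 free threads) are hypothetical, and the rate of two threads per second is taken from the paragraph above.

```python
def seconds_to_grow(threads_needed, threads_available, threads_per_second=2.0):
    """Seconds the pool needs to add the missing threads at the given rate."""
    missing = max(0, threads_needed - threads_available)
    return missing / threads_per_second


if __name__ == "__main__":
    # Hypothetical burst: 50 concurrent requests arrive while 8 threads are free.
    delay = seconds_to_grow(threads_needed=50, threads_available=8)
    print(delay)  # 21.0 seconds, far longer than a one-second timeout
```

    Under these assumed numbers the pool needs about 21 seconds to catch up, so callbacks that must finish within one second are very likely to time out.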

    Note: The CLR thread pool size management is an implementation detail that is subject to change at any time by the technology vendor.

    The current implementation is described in the Redis FAQ: Important details about ThreadPool growth.

    Scenario

    If the thread pool has fewer free threads than the number of incoming requests, ASP.NET takes all of the threads for incoming request processing, and only a few more threads are created. The remaining requests wait in the work queue.

    No free worker threads are left to parse the Redis response because all are blocked, waiting for the parsing results.

    Because nothing gives response parsing a higher priority than other work, a priority inversion takes place:

    • Sharing the common-purpose CLR thread pool means the pool can be clogged by other work items (incoming ASP.NET request processing).
    • Newly created thread pool threads are not guaranteed to pick the callback and can pick the incoming ASP.NET request instead.
    • The ASP.NET request blocks the thread until the session state response is parsed.
    • The response is not parsed due to a lack of free threads.

    The circular wait is broken only when a blocked ASP.NET thread reaches its timeout and throws an exception, which releases the thread.

    The released thread might then be assigned to process a pending callback, depending on the current state of the work queue.
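
    The circular wait can be reproduced in miniature. The sketch below is not the Redis driver or the CLR thread pool; it is a simplified model written in Python, and the names simulate_request and parse_response as well as the 0.5 second timeout are assumptions made for illustration. A request handler queues its parsing callback on the same pool that runs the handler and then blocks waiting for the result.

```python
import concurrent.futures as cf


def parse_response():
    """Stand-in for the driver callback that parses a Redis reply."""
    return "parsed"


def simulate_request(pool_size, timeout_s=0.5):
    """Run one request on a shared pool of pool_size worker threads."""
    with cf.ThreadPoolExecutor(max_workers=pool_size) as pool:

        def handle_request():
            # The callback is queued on the same pool that runs this handler.
            callback = pool.submit(parse_response)
            try:
                # The request thread blocks until the callback has run.
                return callback.result(timeout=timeout_s)
            except cf.TimeoutError:
                # No free thread picked up the callback in time.
                return "timeout"

        return pool.submit(handle_request).result()


if __name__ == "__main__":
    print(simulate_request(pool_size=1))  # timeout
    print(simulate_request(pool_size=2))  # parsed
```

    With a single worker, the only thread is held by the request that is waiting for its own callback, so the wait ends only through the timeout. With one spare worker, the callback runs immediately. In the first case the callback is still executed once the timed-out request releases its thread, but the result arrives too late to be used.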

    Further reading

  • You can configure an application to secure free threads and to tolerate network issues as follows:

    1. Put the Sitecore.Support.210408.dll support patch assembly into the \bin folder.
    2. Put the Sitecore.Support.210408.config file into the \App_Config\Include\zzz folder.
    3. In both the Web.config file and the \App_Config\Sitecore\Marketing.Tracking\Sitecore.Analytics.Tracking.config file (or the \App_Config\Include\Sitecore.Analytics.Tracking.config file for Sitecore XP 8.x), define:
      • operationTimeoutInMilliseconds="5000"
      • retryTimeoutInMilliseconds="16000"
      • connectionTimeoutInMilliseconds="3000"
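
    As a rough illustration of step 3, the three attributes might sit on the Redis provider registration as shown below. Only the three timeout attributes and their values come from this article. The element name, the provider name, the type, and the connectionString value are assumptions about a typical Redis session provider registration, so keep whatever your existing configuration already contains and add only the timeout attributes.

```xml
<!-- Sketch only: everything except the three timeout attributes is an
     assumption; reuse your existing provider registration as it stands. -->
<add name="redis"
     type="Sitecore.SessionProvider.Redis.RedisSessionStateProvider, Sitecore.SessionProvider.Redis"
     connectionString="session"
     operationTimeoutInMilliseconds="5000"
     retryTimeoutInMilliseconds="16000"
     connectionTimeoutInMilliseconds="3000" />
```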

    Notes:

    • The configuration values (both thread numbers and operation timeouts) are given for illustration purposes only and act only as a starting point.
    • The final values must be tuned per-solution as a result of load testing.
    • The source code of the patch: ConfigureThreadPool.cs.

Applies to:

CMS 8.0 Initial Release+

May 31, 2018
June 06, 2018

Reference number:

21040, 215600