Distributed processing tasks hang and cannot be completed

  • Description

    Under some circumstances, all distributed processing tasks in the system hang and none of them complete. The cause is a bug in the GenericProcessing_GetState stored procedure in the Sitecore.Processing.Pools database. The stored procedure does not restrict its result to the specific processing pool; instead, it returns the state of all the pool items in the table. As a result, task A cannot finish while the processing pool still contains items for task B.

    In the log files, this behavior is visible only when DEBUG level logging is enabled. The following log messages appear:

    DEBUG TaskAgent (<MachineName> 3556 1): Picked task with Id 1d845ab5-d31d-4ccc-9e82-a7de496ef36a
    DEBUG DistributedWorkExecutor (<MachineName> 3556 1): Start execution.
    DEBUG InteractionCursorScheduler (<MachineName> 3556 1): Get next batch from cursor.
    DEBUG InteractionCursorScheduler (<MachineName> 3556 1): Get cursor through split.
    DEBUG InteractionCursorScheduler (<MachineName> 3556 1): No cursor to work on. Return.
    DEBUG BinaryKeyInteractionProcessingPoolScheduler: Get next batch from pool.
    DEBUG TaskAgent (<MachineName> 3556 1): Complete task 1d845ab5-d31d-4ccc-9e82-a7de496ef36a.
    DEBUG TaskAgent (<MachineName> 3556 1): Picked task with Id 1d845ab5-d31d-4ccc-9e82-a7de496ef36a.

    Note: There might be other log entries between any of these messages. An important pattern to notice is that, after completing a task, the TaskAgent picks up the task with the same ID (the last two log entries).

    In addition, in the Sitecore.Processing.Tasks database, all the cursor entries (ProcessingCursors table) whose TaskId column equals the task ID from the log entries above have the IsCompleted column set to true. This means that the same processing task is picked up indefinitely, even though there is nothing left to process.
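
    The cursor state described above can be checked with a query along the following lines. This is a minimal sketch, not an official script: the xdb_processing_tasks schema name is an assumption (verify it against your database), the database name is a placeholder, and the task ID is the one from the sample log entries, so substitute the ID from your own logs.

    ```sql
    -- Replace the placeholder with your Sitecore.Processing.Tasks database name.
    USE [<Sitecore.Processing.Tasks database>]

    -- Assumption: ProcessingCursors lives in the xdb_processing_tasks schema.
    -- Counts all cursors for the task and how many of them are marked completed.
    SELECT
        COUNT(*) AS TotalCursors,
        SUM(CASE WHEN [IsCompleted] = 1 THEN 1 ELSE 0 END) AS CompletedCursors
    FROM [xdb_processing_tasks].[ProcessingCursors]
    WHERE [TaskId] = '1d845ab5-d31d-4ccc-9e82-a7de496ef36a'
    ```

    If TotalCursors is greater than zero and equals CompletedCursors while the TaskAgent keeps picking up the same task, the situation matches the one described here.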

    One visible symptom is that Path Analyzer and Experience Analytics stop displaying new data, because they rely on the processing pools to calculate data.

  • Solution

    To fix this issue, you must clear the GenericProcessingPool table in the Sitecore.Processing.Pools database. Execute the following script (replace <Sitecore.Processing.Pools database> with the name of your Sitecore.Processing.Pools database):

    USE [<Sitecore.Processing.Pools database>]
    DELETE FROM [xdb_processing_pools].[GenericProcessingPool]

    After the GenericProcessingPool table has been cleared, restart the distributed processing tasks.
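
    Before restarting the tasks, it may be worth confirming that the table is actually empty. The following is a minimal sketch that reuses the schema and table names from the script above; the database name is again a placeholder for your Sitecore.Processing.Pools database.

    ```sql
    -- Replace the placeholder with your Sitecore.Processing.Pools database name.
    USE [<Sitecore.Processing.Pools database>]

    -- Counts the rows left in the pool table after the DELETE statement.
    SELECT COUNT(*) AS RemainingPoolItems
    FROM [xdb_processing_pools].[GenericProcessingPool]
    ```

    A result of 0 indicates that the pool table has been cleared.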

Applies to:

CMS 9.0 Initial Release - 9.0 Update-1

June 06, 2018