Indexing PDF files is not supported in the Azure App Service environment

  • Description

    The following exception is thrown in the Azure App Service environment when indexing the content of a media item whose blob contains a PDF file:

    ERROR Could not compute value for ComputedIndexField: _content for indexable: sitecore://master/{8EEE161B-F7D1-4339-AE77-1FA10B8CF8D2}?lang=en&ver=1
    Exception: System.Runtime.InteropServices.COMException
    Message: Exception from HRESULT: 0x80048605
    Source: Sitecore.ContentSearch
       at Sitecore.ContentSearch.Extracters.IFilterTextExtraction.IPersistStream.Load(IStream stream)
       at Sitecore.ContentSearch.Extracters.IFilterTextExtraction.FilterLoader.InitializeFilterAsPersistStream(IFilter filter, String fileName)
       at Sitecore.ContentSearch.Extracters.IFilterTextExtraction.FilterLoader.LoadAndInitIFilter(String fileName, String extension)
       at Sitecore.ContentSearch.Extracters.IFilterTextExtraction.FilterReader..ctor(String fileName)
       at Sitecore.ContentSearch.ComputedFields.MediaItemIFilterTextExtractor.ComputeFieldValue(IIndexable indexable)
       at Sitecore.ContentSearch.ComputedFields.MediaItemContentExtractor.ComputeFieldValue(IIndexable indexable)
       at Sitecore.ContentSearch.Azure.CloudSearchDocumentBuilder.AddComputedIndexFields()

    The current implementation of the media indexing feature is based on IFilters, which are not supported in the Azure App Service environment. To avoid these exceptions, you must disable the indexing of PDF files.

    Note: Sitecore is currently looking for ways to enable indexing of media items in the Azure App Service environment. The public reference number for this issue is 149879.

Applies to:

CMS 8.2 Update-1 - 9.0 Update-2

March 15, 2017
February 18, 2019

Reference number:

149909, 149879, 200006, 305015

Keywords: 

  • Azure
  • Search and Indexing