Hexadecimal value is an invalid character during media indexing

  • Index documents are submitted to the Solr server in XML format. Third-party media content extractors might produce characters that are not allowed in XML. As a result, the entire batch of documents cannot be indexed, and an error similar to the following appears in the log files:
    Exception: System.ArgumentException
    Message: '', hexadecimal value 0x1F, is an invalid character.
    Source: System.Xml
       at System.Xml.XmlEncodedRawTextWriter.WriteElementTextBlock(Char* pSrc, Char* pSrcEnd)
       at System.Xml.XmlEncodedRawTextWriter.WriteString(String text)
       at System.Xml.XmlWellFormedWriter.WriteString(String text)
       at System.Xml.Linq.ElementWriter.WriteElement(XElement e)
       at System.Xml.Linq.XElement.WriteTo(XmlWriter writer)
       at System.Xml.Linq.XNode.GetXmlString(SaveOptions o)
       at SolrNet.Commands.AddCommand`1.ConvertToXml()
       at SolrNet.Commands.AddCommand`1.Execute(ISolrConnection connection)
       at SolrNet.Impl.LowLevelSolrServer.SendAndParseHeader(ISolrCommand cmd)
       at Sitecore.ContentSearch.SolrProvider.SolrBatchUpdateContext.AddRange(IEnumerable`1 group, Int32 groupSize)
       at Sitecore.ContentSearch.SolrProvider.SolrBatchUpdateContext.Commit()
       at Sitecore.ContentSearch.AbstractSearchIndex.PerformUpdate(IEnumerable`1 indexableInfo, IndexingOptions indexingOptions)
       at Sitecore.ContentSearch.AbstractSearchIndex.Update(IEnumerable`1 indexableInfo)
  • To resolve the issue, apply the solution that matches the affected product version:
    • For Sitecore XP 9.3 Initial Release:
      1. Verify that the Sitecore.ContentSearch.SolrProvider.dll assembly version matches 6.0.0-r00353.3573 (right-click the file, click Properties, open the Details tab, and check the Product version property).
      2. Download and install the following hotfix: SC Hotfix 394458-2.
      Contact Sitecore Support if the assembly version does not match the default version.
    • For Sitecore XP 9.2 Initial Release:
      1. Verify that the Sitecore.ContentSearch.SolrProvider.dll assembly version matches 5.0.0-r00290 (right-click the file, click Properties, open the Details tab, and check the Product version property).
      2. Download and install the following hotfix: SC Hotfix 380817-1.
      Contact Sitecore Support if the assembly version does not match the default version.

    • For Sitecore XP 9.1 Update-1 and earlier: No solution is currently available for these versions. We recommend upgrading to a newer Sitecore XP version. If an upgrade is not possible at the moment, you can use one of the following mitigation options:
      • Option 1 (if searching through the content of media files is not used in your solution):

      • Option 2 (if the solution is not hosted in Azure Web Apps):
        Try switching to a different third-party media content extraction implementation.
        • For Sitecore XP 9.1 Update-1:

          1. Update the Sitecore XP configuration as described here.
          2. Install a third-party iFilter for PDF content extraction. 

        • For Sitecore XP 9.1 Initial Release:

          1. Verify that the Sitecore.ContentSearch.dll assembly version matches 4.0.0-r00230 (right-click the file, click Properties, open the Details tab, and check the Product version property).
          2. Download and install the following hotfix: SC Hotfix 305412-1.
            The hotfix enables iFilter-based PDF content extraction and disables content extraction for DOCX, PPTX, and XLSX files.
          3. Install a third-party iFilter for PDF content extraction.
          Contact Sitecore Support for an alternative solution if the assembly version does not match the default version.

        • For Sitecore XP 9.0 Update-2 and earlier:

          Try changing the iFilters that are used for media content extraction.

      • Option 3 (if none of the options above help):
        Contact Sitecore Support. Describe the options you tried and explain why they do not work for your solution.
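
The error occurs because XML 1.0 permits only the characters listed in the Char production of the specification: tab, line feed, carriage return, and the ranges #x20-#xD7FF, #xE000-#xFFFD, and #x10000-#x10FFFF. U+001F (Unit Separator) is a C0 control character outside these ranges, so the .NET XML writer rejects it when SolrNet serializes the batch. The Python sketch below illustrates that rule by detecting and removing such characters from extracted text before it is written to XML. It is a hypothetical illustration only. It is not the code in the Sitecore hotfixes, and Sitecore itself runs on .NET. The function names, the sample text, and the `_content` field name are assumptions made for the example.

```python
# Illustrative sketch only: this is NOT the Sitecore hotfix logic.
# It applies the XML 1.0 "Char" production (W3C XML 1.0, section 2.2):
#   Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
import xml.etree.ElementTree as ET
from typing import List


def is_xml10_char(ch: str) -> bool:
    """Return True if the single character ch is allowed in an XML 1.0 document."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def find_invalid_xml_chars(text: str) -> List[str]:
    """List offending characters in the same 0x.. form the .NET error message uses."""
    return [f"0x{ord(ch):02X}" for ch in text if not is_xml10_char(ch)]


def strip_invalid_xml_chars(text: str) -> str:
    """Drop every character that XML 1.0 does not allow (for example U+001F)."""
    return "".join(ch for ch in text if is_xml10_char(ch))


if __name__ == "__main__":
    # Hypothetical extractor output containing a Unit Separator (0x1F).
    extracted = "Annual report\x1f2019"
    print(find_invalid_xml_chars(extracted))  # ['0x1F']

    clean = strip_invalid_xml_chars(extracted)

    # Hypothetical Solr-style field element; "_content" is an assumed name.
    field = ET.Element("field", name="_content")
    field.text = clean
    xml_text = ET.tostring(field, encoding="unicode")

    # The cleaned value round-trips through a standard XML parser.
    print(ET.fromstring(xml_text).text)  # Annual report2019
```

Removing the characters is only one possible design choice. Replacing each invalid character with a space would stop adjacent words from merging, as "report" and "2019" do in the example.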

Applies to:

CMS 7.0 Initial Release - 9.3 Initial Release

CMS 10.0 Initial Release

June 23, 2020
August 04, 2020

Reference number:

305015, 334381, 361915, 380816, 402105

Keywords: 

  • Search and Indexing