Arbitrary indexing error (XMLStreamException Message: null)

Jens Görisch

Monday 31 August 2009 7:11:30 am

Hello,

as the title implies, I have problems with indexing into the eZ Find Solr index.

First, I want to make clear that I don't have problems with the eZ Find indexing script. Or at least I haven't checked whether this error also occurs with that script.

Explanation:
We are using a data model that is ezContentObject-compliant but more lightweight. To index this data model, we use the eZ Find schema file, since the core fields are the same.

This model holds about 8,300 objects, which are indexed twice so that we can switch between the "searchable index" and the "indexing index". eZ Publish has 97,800 objects indexed, which results in more than 100k objects in the index (2 × 8,300 + 97,800 = 114,400). I did not notice this error with smaller indexes.

Now to the error itself:
When indexing, the update process occasionally fails ("occasionally" meaning a few XML packets per run, not a few entire indexing runs). Solr returns an empty result and its log file contains the following entry:

SEVERE: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[4004,3038]
Message: null
	at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:586)
	at org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:321)
	at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:195)
	at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:123)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
	at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
	at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
	at org.mortbay.jetty.Server.handle(Server.java:285)
	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
	at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
	at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
	at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

I dumped the XML data and checked the reported position. It was always an ordinary character, a different one each time. Validating the XML data with xmllint also shows that it is valid XML. Occasionally no error occurs at all and indexing succeeds.
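A minimal Python sketch of the check described above, assuming the failing packet was dumped to a UTF-8 file and that the row and column in the log are 1-based and counted in characters (the helper names and file name are hypothetical). It returns the character at the reported position with some context, in case the parser's column convention is off by one, and uses Python's standard-library parser as a stand-in for xmllint to confirm well-formedness.

```python
import xml.etree.ElementTree as ET


def char_at(text: str, row: int, col: int, context: int = 20) -> tuple[str, str]:
    """Return the character at a 1-based (row, col) position plus surrounding context.

    Assumption: rows are separated by newlines and columns count characters,
    as in the "ParseError at [row,col]" message; the context helps if the
    parser's convention is off by one. Returns "" if col is past the line end.
    """
    line = text.split("\n")[row - 1]
    idx = col - 1
    ch = line[idx] if 0 <= idx < len(line) else ""
    return ch, line[max(0, idx - context): idx + context + 1]


def is_well_formed(raw: bytes) -> bool:
    """Parse the raw bytes so that the encoding in the XML declaration is honoured."""
    try:
        ET.fromstring(raw)
    except ET.ParseError:
        return False
    return True


def inspect_dump(path: str, row: int, col: int) -> None:
    """Print the character at the reported position and the well-formedness result."""
    with open(path, "rb") as f:
        raw = f.read()
    # Assumption: the dump is UTF-8 encoded.
    ch, around = char_at(raw.decode("utf-8"), row, col)
    print(f"char at [{row},{col}]: {ch!r}")
    print(f"context: {around!r}")
    print("well-formed:", is_well_formed(raw))
```

For example, inspect_dump("packet.xml", 4004, 3038) would inspect the position reported in the log entry above, assuming the packet was saved as packet.xml.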

I've found a temporary workaround: simply retrying the failing packets until eZSolrBase::addDocs() returns true (at most 3 attempts). Strangely, exactly the same XML works on the second or third attempt.
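The retry workaround can be sketched roughly as follows. This is a language-neutral illustration in Python, not eZ Find code: add_docs is a hypothetical callable standing in for the PHP method eZSolrBase::addDocs(), assumed to return True on success, and the optional delay between attempts is an addition not mentioned above.

```python
import time
from typing import Callable, Sequence


def add_docs_with_retry(
    add_docs: Callable[[Sequence[str]], bool],
    packet: Sequence[str],
    max_attempts: int = 3,
    delay_seconds: float = 0.0,
) -> bool:
    """Resend the same packet until add_docs reports success or attempts run out.

    add_docs is a hypothetical stand-in for eZSolrBase::addDocs(); it receives
    the documents of one packet and returns True on success, False otherwise.
    """
    for attempt in range(1, max_attempts + 1):
        if add_docs(packet):
            return True
        # Optional pause before the next attempt (not part of the original workaround).
        if attempt < max_attempts and delay_seconds > 0:
            time.sleep(delay_seconds)
    return False


# Usage with a fake sender that fails twice, then succeeds on the identical packet.
calls = []


def flaky_add_docs(packet: Sequence[str]) -> bool:
    calls.append(list(packet))
    return len(calls) >= 3


print(add_docs_with_retry(flaky_add_docs, ["doc-1"]))  # True after 3 calls
```

Returning False after the last attempt lets the caller log or dump the packet for later inspection instead of silently dropping it.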

Has anybody experienced similar problems? And perhaps already found the cause and a (real) solution?

Thanks in advance,

Jens Görisch
