Xerces Conversion from 1.7 to 2.xx


There are many benefits to moving to a more current Xerces version, but the programming interface differs significantly, requiring changes in all of our packages making Xerces calls or declaring objects belonging to Xerces classes. The two major changes breaking our code are

Compare documentation for the old API and for the new. Inevitably, when we move to Xerces 2.4 (or more likely at least 2.5, which came out in mid-February), all of our packages making use of Xerces classes will have to move in lockstep with it. However, I have attempted to minimize the number of source code changes which will be required at that time by enhancing the xml::Dom class so that most native Xerces calls can be replaced by calls to methods of xml::Dom in advance of the switchover. In the first stage of the conversion, package owners should eliminate Xerces calls in favor of xml::Dom methods wherever possible. Then the switch to 2.xx will require fewer code changes, and they should be more regular and mechanical.

Stage 1

This has already been done for the packages xml, xmlUtil, detModel, calibUtil, and CalibSvc. Remaining packages include at least flux, classification, optimizers, and likelihood. I'm assuming that package owners would rather make the Stage 1 conversion themselves, but if not I am willing to do it.

Hints to xml-using package owners

A typical change would be to replace something like this..

DOMString   valueString = elt.getAttribute("attName");
int         value = atoi(xml::Dom::transToChar(valueString));

with

try {
  int value = xml::Dom::getIntAttribute(elt, "attName");
}
catch (xml::DomException& ex) {
  std::string msg = ex.getMsg();
  ..
}

Be warned that the methods xml::Dom::getIntAttribute and xml::Dom::getDoubleAttribute are somewhat fussier than the library routines atoi and atof or even facilities::Util::atoi. The xml::Dom methods insist that all characters in the input be used to form a valid numeric string. See the (slightly out of date) doxygen documentation for xml::Dom or the latest CVS repository version of Dom.h in the xml package. [Note: DomElement is just a typedef for DOM_Element, DomString is a typedef for DOMString, and so forth.] I would like to make the transTo.. methods and the addAttribute method with a DOMString argument private to the xml package. The remaining methods avoid all use of DOMString in favor of std::string or char*. xml::Dom also has methods using std::vector<..> where native Xerces methods use, e.g., DOM_NodeList.
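
To make the difference in strictness concrete, here is a minimal sketch of a conversion which insists that every character be used. The helper name strictToInt is hypothetical and is not part of xml::Dom; the real getIntAttribute may be implemented quite differently and reports errors via xml::DomException rather than the standard exceptions used here.

```cpp
#include <cassert>
#include <cctype>
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical helper (NOT part of xml::Dom): converts text to int only if
// the whole string is a valid integer.
int strictToInt(const std::string& text) {
  // strtol would skip leading whitespace, so reject it (and "") up front.
  if (text.empty() || std::isspace(static_cast<unsigned char>(text[0])))
    throw std::invalid_argument("not an integer: '" + text + "'");
  errno = 0;
  char* end = nullptr;
  long value = std::strtol(text.c_str(), &end, 10);
  // strtol stops at the first character it cannot use; if it did not reach
  // the end of the string, some characters were left over.
  if (end != text.c_str() + text.size())
    throw std::invalid_argument("not an integer: '" + text + "'");
  if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
    throw std::out_of_range("integer out of range: '" + text + "'");
  return static_cast<int>(value);
}

// Contrast with atoi, which silently ignores the trailing junk.
void compareWithAtoi() {
  assert(std::atoi("12abc") == 12);
  bool rejected = false;
  try {
    strictToInt("12abc");
  } catch (const std::invalid_argument&) {
    rejected = true;
  }
  assert(rejected);
}
```

If your XML files contain attribute values with trailing blanks or units (something like "12 mm"), code which used to work with atoi will start throwing after the Stage 1 conversion, so it is worth checking for such values first.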

If, in the process of converting one of the remaining packages, you find you can't avoid DOMString objects or are having to make extensive native Xerces calls for any purpose,

Stage 2

The Plan

  1. Build debug version of Xerces 2.4 (or greater) for Redhat 9/gcc 3.2; install at SLAC in standard location.
  2. In my private development area, using a private version of IExternal/XMLEXT, convert xml, then xmlUtil. xml is a special case because it necessarily has lots of native calls to Xerces. xmlUtil is a heavy user of xml and has some native calls of its own. Once these packages are converted, converting the remaining packages should be relatively smooth sailing.
  3. Commit changes but do not tag (since tagging would break the LATEST build)
  4. Build or obtain Xerces 2.xx binaries for Windows (Visual Studio 7 and 7.1). Verify that the modified code works on Windows.
  5. In a working directory upgrade as many of the remaining packages as possible before tagging anything
  6. Commit and tag everything; look carefully at LATEST for both Science Tools and GlastRelease on both platforms

Anyone involved at all in this process might be interested in the Xerces migration guide.

The Reality

I more or less got through #2 above, but ran into a snag: I have been unable to do the standard processing on geometry files. Depending on how the parser is configured, the code fails in different places, and there seems to be no combination of options which will do the same kinds of things we've been doing with Xerces 1.7. This is most likely a Xerces bug, and I've filed a bug report with Xerces. It only concerns XML documents which, like our standard geometry files, use general entities to include several physical files. For our purposes, we need this include mechanism to behave just like #include does; that is, from the application's point of view the document might just as well have been a single physical file. This is the behavior described for validating parsers in the W3C XML 1.0 Recommendation (see sections 4.4, 4.4.2, 4.4.3), but I can't get newer versions of Xerces to behave this way. I've been advised that this bug is unlikely to be fixed soon unless I propose a patch.
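
For anyone unfamiliar with general entities, the behavior we depend on can be modeled with a toy sketch like the following. This is not Xerces code and does not parse XML; it only shows the #include-like semantics, with entity references replaced (recursively) by the text they stand for. The function name expandEntities and the element and entity names in the example are all made up for illustration.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

typedef std::map<std::string, std::string> EntityMap;

// Toy model of general-entity inclusion (NOT Xerces code). 'entities' maps
// an entity name to the text of the physical file it refers to.
std::string expandEntities(const std::string& text,
                           const EntityMap& entities,
                           int depth = 0) {
  if (depth > 16) throw std::runtime_error("entity nesting too deep");
  std::string out;
  std::string::size_type pos = 0;
  while (pos < text.size()) {
    std::string::size_type amp = text.find('&', pos);
    std::string::size_type semi =
        (amp == std::string::npos) ? std::string::npos : text.find(';', amp);
    if (semi == std::string::npos) {
      // No complete reference left: copy the rest unchanged.
      out.append(text, pos, std::string::npos);
      break;
    }
    out.append(text, pos, amp - pos);
    std::string name = text.substr(amp + 1, semi - amp - 1);
    EntityMap::const_iterator it = entities.find(name);
    if (it == entities.end()) {
      // Unknown name (e.g. &amp;): leave the reference as it is.
      out.append(text, amp, semi - amp + 1);
    } else {
      // Included text may itself contain references, like nested #includes.
      out += expandEntities(it->second, entities, depth + 1);
    }
    pos = semi + 1;
  }
  return out;
}

// Hypothetical two-file example: the result reads as one physical document.
void expandExample() {
  EntityMap files;
  files["partA"] = "<a/>";
  files["partB"] = "<b>&partA;</b>";
  assert(expandEntities("<top>&partB;</top>", files) ==
         "<top><b><a/></b></top>");
}
```

The point is only that the application should see the expanded result and never need to know how many physical files were involved.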

For the immediate future, I suggest we complete the Stage 1 conversion but stick with 1.7. My experience so far bears out the guess that, for most packages using xml, the Stage 2 conversion will be straightforward. If some compelling reason to change versions appears before the Xerces bug has been fixed, there is a possible work-around. In a preprocessing stage the geometry files for a particular instrument can be read in and written back out as a single file. There is a sample program distributed with Xerces called DOMPrint which does it correctly for our geometry files. Perhaps there is some way to get CMT to run this program or some similar program as part of the build procedure for xmlGeoDbs.

Patch

As of 7 March I have built a patched 2.5.0 Xerces library which behaves correctly for our applications. I had to modify two methods in the class AbstractDOMParser. I've submitted the changes as an attachment to the bug report.

The patched library has been installed in the standard place for SLAC Linux: $GLAST_EXT/xerces/2.5.0 (unpatched version is in $GLAST_EXT/xerces/2.5.0-save). If you would like to try it, you might want to check out packages users/jrb/IExternal/XMLEXT and users/jrb/xml. Eventually, when everyone is ready to move to the new Xerces, these or something very similar will be in the standard CVS subdirectories.

Xerces 2.6.0

Version 2.6.0 of Xerces has been released and includes the changes we need. I've built it and tested it on SLAC Linux. It handles our geometry files correctly. I also built it on Windows with Studio 7.1 but haven't yet tested it.

Migration help

Packages which are directly dependent on xml (that is, have #includes for files from the xml package) should have a use statement which includes the major version. Currently, for xml it should look like

use xml v4r*

After conversion the 4 will become a 5.

For an example of a package using xml which has been converted for the new Xerces, see users/jrb/xmlUtil. Most of the changes required are of one of the following types:

Packages needing conversion

Package         Extent          Containers
xml             major           GlastRelease, ScienceTools
xmlUtil         major           GlastRelease
detModel        middling        GlastRelease
flux            middling        GlastRelease, ScienceTools
calibUtil       minor           GlastRelease
CalibSvc        middling        GlastRelease
classification  minor/middling  GlastRelease
optimizers      ?               ScienceTools
Likelihood      ?               ScienceTools

J. Bogart
Last modified: Friday, 05-Nov-2004 16:51:48 PST