Yolinux.com Tutorial

Parsing XML with Xerces-C 2.7.0 C++ API

Xerces version 2.7.0 (See the YoLinux.com Xerces-C 3.0.1 update.)

Parsing XML files using the ApacheXML Xerces-C libraries.

Xerces-C Intro:

The Apache project's Xerces-C libraries support the DOM approach to XML parsing. The entire XML file is imported into memory and the data is held as nodes in a data tree which can be traversed for information.
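
To make the DOM idea concrete, here is a minimal sketch in plain C++ that does not use Xerces-C at all. The "Node" struct and the functions "findElement" and "makeSampleTree" are hypothetical names invented for this illustration; they only mimic how a parsed document is held in memory as a tree of nodes that can be searched.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a DOM element node (this is NOT the Xerces-C API).
struct Node {
    std::string name;                               // tag name
    std::map<std::string, std::string> attributes;  // attribute name -> value
    std::vector<Node> children;                     // nested elements
};

// Depth-first search for the first element with the given tag name.
// Returns a pointer into the tree, or NULL if no such element exists.
const Node* findElement(const Node& node, const std::string& tag)
{
    if (node.name == tag) return &node;
    for (std::size_t i = 0; i < node.children.size(); ++i) {
        const Node* hit = findElement(node.children[i], tag);
        if (hit) return hit;
    }
    return NULL;
}

// Build a small tree by hand: a "root" element with one child element
// that carries two attributes. The names and values are illustrative.
Node makeSampleTree()
{
    Node settings;
    settings.name = "ApplicationSettings";
    settings.attributes["option_a"] = "10";
    settings.attributes["option_b"] = "24";

    Node root;
    root.name = "root";
    root.children.push_back(settings);
    return root;
}
```

With Xerces-C the tree is built by the parser rather than by hand, but the traversal follows the same pattern: start at the document element and walk the child nodes.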

The Xerces-C C++ parser home page: http://xml.apache.org/xerces-c/

Compiling Xerces-C:

  • Go to your working directory, e.g.: cd /home/user-id/src
  • Download Xerces-C 2.7.0 source from the Apache archive.
  • Unpack the downloaded file: tar -xzf xerces-c-src_2_7_0.tar.gz
  • Set the XERCESCROOT environment variable to the directory which contains the source code: (bash shell example)
    export XERCESCROOT=/home/user-id/src/xerces-c-src_2_7_0
  • Go to source directory: cd xerces-c-src_2_7_0/src/xercesc
  • Run script which runs "configure": runConfigure -plinux -cgcc -xg++ -C--prefix=/opt/ApacheXML
  • Build: make
  • Install: make install
    [Potential Pitfall]: If installing as root (required when installing to directory paths like /opt and /usr), remember that root also requires the environment variable XERCESCROOT. Possible error:
    make -C /obj install
    make: *** /obj: No such file or directory.  Stop.
    make: *** [install] Error 2
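
The "/obj" path in the error above suggests that the build composes its object directory from XERCESCROOT followed by "/obj"; that is an assumption read from the error text, not from the Makefile itself. The sketch below (hypothetical function names, plain C++) shows why an unset variable would produce exactly that path.

```cpp
#include <cstdlib>
#include <string>

// Compose the object directory as the failing command suggests:
// the value of XERCESCROOT followed by "/obj" (assumed from the error text).
std::string objDirFor(const char* xercescRoot)
{
    // An unset variable expands to nothing, leaving only "/obj".
    std::string root = xercescRoot ? xercescRoot : "";
    return root + "/obj";
}

// Convenience wrapper that reads the real environment of the current user.
std::string objDirFromEnvironment()
{
    return objDirFor(std::getenv("XERCESCROOT"));
}
```

Calling objDirFor(NULL) yields "/obj", which matches the directory that make reports as missing when root runs the install without the variable set.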
                
Options for "runConfigure":

Option                     Description
-h                         Help
-p platform-name           Specify: aix, beos, linux, freebsd, netbsd, solaris, hp-10, hp-11, openserver, unixware, os400, os390, irix, ptx, tru64, macosx, cygwin, qnx, interix, mingw-msys
                           Required. No default.
                           Sets Makefile environment variable PLATFORM, e.g. PLATFORM=LINUX
-c compiler                Choices: gcc, cc, xlc_r, icc, icpc, ecc
                           Default: cc
                           Sets Makefile environment variable CC
-x C++_compiler            Choices: g++, CC, aCC, xlC_r, aCCOS, xlC_rv5compat, QCC
                           Default: g++
                           Sets Makefile environment variable CXX
-d                         Build debug version
-m message-loader          Choices: inmem, icu, MsgFile, iconv
                           Default: inmem
                           Sets Makefile environment variable MESSAGELOADER used by Xerces
-n net-accessor            Choices: fileonly, libwww, socket, native
                           Default: socket
                           Sets Makefile environment variable NETACCESSOR
-t transcoder              Choices: icu, Iconv400, uniconv390, IconvFBSD, IconvGNU, native
                           Default: native
                           Sets Makefile environment variable TRANSCODER
-r thread-option           Choices: pthread, dce (AIX, HP-11, Solaris), spoc (IRIX), none
                           Default: pthread
                           Sets Makefile environment variable THREADS
-b bits-to-build           Choices: 64, 32
                           Default: 32
                           Sets Makefile environment variable BITSTOBUILD
-l extra-linker-options    Sets Makefile environment variable LDFLAGS
-z compiler-options
-C configure-options       Example: -C--prefix=/opt
-P                         Install prefix

With a prefix of "/opt" (-C--prefix=/opt), this installs development files such as header files and libraries under "/opt". That directory is outside the compiler's default search paths, so compiler and linker flags are required:

  • Compiler flags: -I/opt/include
  • Linker flags: -L/opt/lib -lxerces-c


Creating an RPM for Xerces-C libraries:

The downloaded gzipped tar file can be used to generate an RPM:
rpmbuild -ta xerces-c-src_2_7_0.tar.gz

[Potential Pitfall]: RHEL6+ rpmbuild failure
Running the command rpmbuild -ta xerces-c-src_2_7_0.tar.gz may fail with the following error:
error: line 13: Unknown tag: Copyright:    Apache

This error was seen on RHEL6, whose rpmbuild is newer than the version in use when 2.7.0 was released and does not recognize the "Copyright" tag.

Fix:

  • Un-tar: tar xzf xerces-c-src_2_7_0.tar.gz
  • Edit the file xerces-c-src_2_7_0/xerces-c.spec and change line 13 from:

    Copyright:     Apache
          
    to:
    License:        Apache
          
  • Re-tar: tar czf xerces-c-src_2_7_0.tar.gz xerces-c-src_2_7_0
  • Build RPM: rpmbuild -ta xerces-c-src_2_7_0.tar.gz

[Potential Pitfall]: If you download the package "xerces-c-current.tar.gz", you may have to rename it to make it work. The error message will give you a clue as to what to name it.
In this example: mv xerces-c-current.tar.gz xerces-c-src_2_7_0.tar.gz
Then execute the "rpmbuild" command.

[Potential Pitfall]: If building as a regular (non-root) user, you will have to open up the directory permissions of /usr/src/redhat/... or build as the root user.

[Potential Pitfall]: Building the 2.7.0 RPM did not work on Red Hat Enterprise Linux 5 (RHEL4 with 2.7.0: ok; RHEL5 with 3.0.1: ok; RHEL5 with 2.7.0: not ok). In this case I downloaded the prebuilt RPMs from http://pkgs.repoforge.org/xerces-c/

Results of rpmbuild -ta xerces-c-src_2_7_0.tar.gz

Red Hat Enterprise 6.3 RPMs:

Wrote: /home/user1/rpmbuild/SRPMS/xerces-c-2.7.0-3.src.rpm
Wrote: /home/user1/rpmbuild/RPMS/x86_64/xerces-c-2.7.0-3.x86_64.rpm
Wrote: /home/user1/rpmbuild/RPMS/x86_64/xerces-c-devel-2.7.0-3.x86_64.rpm
Wrote: /home/user1/rpmbuild/RPMS/x86_64/xerces-c-doc-2.7.0-3.x86_64.rpm
Wrote: /home/user1/rpmbuild/RPMS/x86_64/xerces-c-debuginfo-2.7.0-3.x86_64.rpm
      

or the RHEL5 RPM packages:

  • /usr/src/packages/RPMS/i586/xerces-c-2.7.0-3.i586.rpm
  • /usr/src/packages/RPMS/i586/xerces-c-devel-2.7.0-3.i586.rpm
  • /usr/src/packages/RPMS/i586/xerces-c-doc-2.7.0-3.i586.rpm

or Red Hat Enterprise Linux 4 RPMs:

  • /usr/src/redhat/RPMS/i386/xerces-c-2.7.0-3.i386.rpm
  • /usr/src/redhat/RPMS/i386/xerces-c-devel-2.7.0-3.i386.rpm
  • /usr/src/redhat/RPMS/i386/xerces-c-debuginfo-2.7.0-3.i386.rpm
  • /usr/src/redhat/RPMS/i386/xerces-c-doc-2.7.0-3.i386.rpm

(Cleanup: rm -Rf /var/tmp/xerces-c-root /usr/src/redhat/BUILD/xerces-c-src_2_7_0)

or (Fedora Core 3 x86_64)

  • /usr/src/redhat/SRPMS/xerces-c-2.7.0-3.src.rpm
  • /usr/src/redhat/RPMS/x86_64/xerces-c-2.7.0-3.x86_64.rpm
  • /usr/src/redhat/RPMS/x86_64/xerces-c-devel-2.7.0-3.x86_64.rpm
  • /usr/src/redhat/RPMS/x86_64/xerces-c-doc-2.7.0-3.x86_64.rpm
  • /usr/src/redhat/RPMS/x86_64/xerces-c-debuginfo-2.7.0-3.x86_64.rpm
The exact output paths and package names depend on your distribution and platform.

Install the RPMs with the command: rpm -ivh xerces-c-2.7.0-3.i586.rpm xerces-c-devel-2.7.0-3.i586.rpm xerces-c-doc-2.7.0-3.i586.rpm

Installing the RPM will place files in:

  • Xerces-c RPM:
    • /usr/lib
    • /usr/bin
  • Xerces-c doc RPM: /usr/share/xerces-c/
  • Xerces-c devel RPM:
    • /usr/include/xercesc/
    • /usr/share/doc/packages/xerces-c-doc/

Because the development libraries and include files are located in the standard system directories searched by the compiler, no -I or -L flags are needed; the only flag required is "-lxerces-c".

Programming with Xerces-C:

XML file: sample.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<root>
   <ApplicationSettings
           option_a = "10"
           option_b = "24"
           >
   </ApplicationSettings>
   <OtherStuff
           option_x = "500"
           >
   </OtherStuff>
</root>

Include file: parser.hpp
#ifndef XML_PARSER_HPP
#define XML_PARSER_HPP
/**
 *  @file
 *  Class "GetConfig" provides the functions to read the XML data.
 *  @version 1.0
 */
#include <xercesc/dom/DOM.hpp>
#include <xercesc/dom/DOMDocument.hpp>
#include <xercesc/dom/DOMDocumentType.hpp>
#include <xercesc/dom/DOMElement.hpp>
#include <xercesc/dom/DOMImplementation.hpp>
#include <xercesc/dom/DOMImplementationLS.hpp>
#include <xercesc/dom/DOMNodeIterator.hpp>
#include <xercesc/dom/DOMNodeList.hpp>
#include <xercesc/dom/DOMText.hpp>

#include <xercesc/parsers/XercesDOMParser.hpp>
#include <xercesc/util/XMLUni.hpp>

#include <string>
#include <stdexcept>

// Error codes

enum {
   ERROR_ARGS = 1,
   ERROR_XERCES_INIT,
   ERROR_PARSE,
   ERROR_EMPTY_DOCUMENT
};

class GetConfig
{
public:
   GetConfig();
  ~GetConfig();

   // Note: dynamic exception specifications such as throw(...) were removed
   // in C++17. Build with an older -std setting, or drop the specification
   // both here and in parser.cpp.
   void readConfigFile(std::string&) throw(std::runtime_error);

   // Values are NULL until readConfigFile() has found the attributes.
   char *getOptionA() { return m_OptionA; }
   char *getOptionB() { return m_OptionB; }

private:
   xercesc::XercesDOMParser *m_ConfigFileParser;
   char* m_OptionA;
   char* m_OptionB;

   // Internal class use only. Hold Xerces data in UTF-16 XMLCh type.

   XMLCh* TAG_root;

   XMLCh* TAG_ApplicationSettings;
   XMLCh* ATTR_OptionA;
   XMLCh* ATTR_OptionB;
};
#endif

C++ Program file: parser.cpp
#include <string>
#include <iostream>
#include <sstream>
#include <stdexcept>
#include <list>

#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <errno.h>

#include "parser.hpp"

using namespace xercesc;
using namespace std;

/**
 *  Constructor initializes xerces-C libraries.
 *  The XML tags and attributes which we seek are defined.
 *  The xerces-C DOM parser infrastructure is initialized.
 */

GetConfig::GetConfig()
   : m_ConfigFileParser(NULL), m_OptionA(NULL), m_OptionB(NULL)
{
   // The option pointers start as NULL so that the destructor can safely
   // test them even if readConfigFile() was never called.

   try
   {
      XMLPlatformUtils::Initialize();  // Initialize Xerces infrastructure
   }
   catch( XMLException& e )
   {
      char* message = XMLString::transcode( e.getMessage() );
      cerr << "XML toolkit initialization error: " << message << endl;
      XMLString::release( &message );
      // throw exception here to return ERROR_XERCES_INIT
   }

   // Tags and attributes used in XML file.
   // Can't call transcode till after Xerces Initialize()
   TAG_root        = XMLString::transcode("root");
   TAG_ApplicationSettings = XMLString::transcode("ApplicationSettings");
   ATTR_OptionA = XMLString::transcode("option_a");
   ATTR_OptionB = XMLString::transcode("option_b");

   m_ConfigFileParser = new XercesDOMParser;
}

/**
 *  Class destructor frees memory used to hold the XML tag and
 *  attribute definitions. It also terminates use of the xerces-C
 *  framework.
 */

GetConfig::~GetConfig()
{
   // Free memory

   delete m_ConfigFileParser;
   if(m_OptionA)   XMLString::release( &m_OptionA );
   if(m_OptionB)   XMLString::release( &m_OptionB );

   try
   {
      XMLString::release( &TAG_root );

      XMLString::release( &TAG_ApplicationSettings );
      XMLString::release( &ATTR_OptionA );
      XMLString::release( &ATTR_OptionB );
   }
   catch( ... )
   {
      cerr << "Unknown exception encountered in GetConfig destructor" << endl;
   }

   // Terminate Xerces

   try
   {
      XMLPlatformUtils::Terminate();  // Called after memory is released
   }
   catch( xercesc::XMLException& e )
   {
      char* message = xercesc::XMLString::transcode( e.getMessage() );

      cerr << "XML toolkit teardown error: " << message << endl;
      XMLString::release( &message );
   }
}

/**
 *  This function:
 *  - Tests the access and availability of the XML configuration file.
 *  - Configures the xerces-c DOM parser.
 *  - Reads and extracts the pertinent information from the XML config file.
 *
 *  @param in configFile The text string name of the XML configuration file.
 */

void GetConfig::readConfigFile(string& configFile)
        throw( std::runtime_error )
{
   // Test to see if the file is ok.

   struct stat fileStatus;

   errno = 0;
   int iretStat = stat(configFile.c_str(), &fileStatus);
   if( iretStat == -1 )
   {
      if( errno == ENOENT )        // errno declared by include file errno.h
         throw ( std::runtime_error("Path file_name does not exist, or path is an empty string.") );
      else if( errno == ENOTDIR )
         throw ( std::runtime_error("A component of the path is not a directory."));
      else if( errno == ELOOP )
         throw ( std::runtime_error("Too many symbolic links encountered while traversing the path."));
      else if( errno == EACCES )
         throw ( std::runtime_error("Permission denied."));
      else if( errno == ENAMETOOLONG )
         throw ( std::runtime_error("Path or file name is too long."));
      else                         // any other stat() failure
         throw ( std::runtime_error("File can not be read."));
   }

   // Configure DOM parser.

   m_ConfigFileParser->setValidationScheme( XercesDOMParser::Val_Never );
   m_ConfigFileParser->setDoNamespaces( false );
   m_ConfigFileParser->setDoSchema( false );
   m_ConfigFileParser->setLoadExternalDTD( false );

   try
   {
      m_ConfigFileParser->parse( configFile.c_str() );

      // no need to free this pointer - owned by the parent parser object
      DOMDocument* xmlDoc = m_ConfigFileParser->getDocument();

      // Get the top-level element: Name is "root". No attributes for "root"

      DOMElement* elementRoot = xmlDoc->getDocumentElement();
      if( !elementRoot ) throw(std::runtime_error( "empty XML document" ));

      // Parse XML file for tags of interest: "ApplicationSettings"
      // Look one level nested within "root". (child of root)

      DOMNodeList*      children = elementRoot->getChildNodes();
      const  XMLSize_t nodeCount = children->getLength();

      // For all nodes, children of "root" in the XML tree.

      for( XMLSize_t xx = 0; xx < nodeCount; ++xx )
      {
         DOMNode* currentNode = children->item(xx);
         if( currentNode->getNodeType() &&  // true is not NULL
             currentNode->getNodeType() == DOMNode::ELEMENT_NODE ) // is element
         {
            // Found node which is an Element. Re-cast node as element
            DOMElement* currentElement
                        = dynamic_cast< xercesc::DOMElement* >( currentNode );
            if( XMLString::equals(currentElement->getTagName(), TAG_ApplicationSettings))
            {
               // Already tested node as type element and of name "ApplicationSettings".
               // Read attributes of element "ApplicationSettings".
               const XMLCh* xmlch_OptionA
                     = currentElement->getAttribute(ATTR_OptionA);
               m_OptionA = XMLString::transcode(xmlch_OptionA);

               const XMLCh* xmlch_OptionB
                     = currentElement->getAttribute(ATTR_OptionB);
               m_OptionB = XMLString::transcode(xmlch_OptionB);

               break;  // Data found. No need to look at other elements in tree.
            }
         }
      }
   }
   catch( xercesc::XMLException& e )
   {
      char* message = xercesc::XMLString::transcode( e.getMessage() );
      ostringstream errBuf;
      errBuf << "Error parsing file: " << message << flush;
      XMLString::release( &message );

      // Report the failure to the caller instead of silently discarding it.
      throw std::runtime_error( errBuf.str() );
   }
}

#ifdef MAIN_TEST
/* This main is provided for unit test of the class. */

int main()
{
   string configFile="sample.xml"; // stat file. Get ambiguous segfault otherwise.

   GetConfig appConfig;

   try
   {
      appConfig.readConfigFile(configFile);
   }
   catch( std::runtime_error& e )
   {
      cerr << e.what() << endl;
      return ERROR_PARSE;
   }

   cout << "Application option A="  << appConfig.getOptionA()  << endl;
   cout << "Application option B="  << appConfig.getOptionB()  << endl;

   return 0;
}
#endif
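
The file test in readConfigFile() translates a handful of errno values by hand. As an alternative sketch (not the approach used by the listing above), the standard strerror() function can supply the message for any errno value. The helper name "describeStatFailure" is hypothetical and the code is plain POSIX C++ with no Xerces-C dependency.

```cpp
#include <cerrno>
#include <cstring>
#include <string>
#include <sys/stat.h>

// Hypothetical helper: returns an empty string if the path can be stat()ed,
// otherwise the system's description of the errno value set by stat().
std::string describeStatFailure(const std::string& path)
{
    struct stat fileStatus;
    errno = 0;
    if (stat(path.c_str(), &fileStatus) == 0)
        return std::string();       // file is accessible
    return std::strerror(errno);    // e.g. the text for ENOENT or EACCES
}
```

A caller could throw std::runtime_error with the returned text whenever it is non-empty, which covers errno values that the hand-written chain does not list.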

Compile:

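  • RPM installed: g++ -g -Wall -pedantic parser.cpp -DMAIN_TEST -o parser -lxerces-c
    or
  • Installed to "/opt": g++ -g -Wall -pedantic -I/opt/include parser.cpp -DMAIN_TEST -o parser -L/opt/lib -lxerces-c
    (The library is listed after the source file because many linkers resolve symbols in command-line order.)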

Run: ./parser

Application option A=10
Application option B=24
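
The values printed above are character strings, not numbers. If the application needs them as integers, a conversion with error checking is one option. The sketch below is plain C++; the helper name "optionToInt" is hypothetical and not part of the GetConfig class.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert an attribute value such as "10" to an int.
// Throws std::runtime_error for NULL, empty, non-numeric or out-of-range text.
int optionToInt(const char* text)
{
    if (text == NULL || *text == '\0')
        throw std::runtime_error("option value is missing");

    char* end = NULL;
    errno = 0;
    long value = std::strtol(text, &end, 10);

    // strtol stops at the first character it cannot convert.
    if (*end != '\0')
        throw std::runtime_error(std::string("option value is not an integer: ") + text);
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        throw std::runtime_error(std::string("option value is out of range: ") + text);

    return static_cast<int>(value);
}
```

In the test main this could be used as: int a = optionToInt(appConfig.getOptionA());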

Books:

Professional XML Development with Apache Tools : Xerces, Xalan, FOP, Cocoon, Axis, Xindice
by Theodore W. Leung
ISBN #0764543555, Wrox Press
