Yolinux.com Tutorial

Parsing XML with Xerces-C C++ API

Version 3.0.1 (2.7)

Parsing XML files using the Apache XML Xerces-C libraries.

Xerces-C Intro:

The Apache project's Xerces-C libraries support the DOM approach to XML parsing. The entire XML file is imported into memory and the data is held as nodes in a data tree which can be traversed for information.
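
To make the "data tree" idea concrete, the following is a minimal sketch of what a DOM-style tree looks like once it is in memory. It deliberately does not use Xerces-C: the `Node` struct and the `findElement` and `buildSampleTree` functions are hypothetical names invented for this illustration, and the tree is built by hand rather than parsed from a file.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a DOM element: a tag name, its attributes,
// and its child elements. This is NOT a Xerces-C type.
struct Node
{
   std::string name;
   std::map<std::string, std::string> attributes;
   std::vector<Node> children;
};

// Depth-first traversal: return the first element with the given tag name,
// or a null pointer if the tree holds no such element.
const Node* findElement(const Node& node, const std::string& name)
{
   if (node.name == name) return &node;

   for (std::size_t i = 0; i < node.children.size(); ++i)
   {
      const Node* hit = findElement(node.children[i], name);
      if (hit) return hit;
   }
   return 0;
}

// Build by hand the kind of tree a DOM parser would hold in memory for a
// document with a "root" element and two child elements.
Node buildSampleTree()
{
   Node settings;
   settings.name = "ApplicationSettings";
   settings.attributes["option_a"] = "10";
   settings.attributes["option_b"] = "24";

   Node other;
   other.name = "OtherStuff";
   other.attributes["option_x"] = "500";

   Node root;
   root.name = "root";
   root.children.push_back(settings);
   root.children.push_back(other);
   return root;
}
```

Calling `findElement(buildSampleTree(), "ApplicationSettings")` walks the tree from the root and returns the node whose attributes hold the option values. A real DOM parser does the same kind of traversal over nodes it created while reading the file.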

The Xerces-C C++ parser home page: http://xml.apache.org/xerces-c/

Compiling/Installing Xerces-C:

  • Go to your working directory, e.g.: cd /home/user-1/src
  • Download Xerces-C source from one of the mirror sites.
  • Unpack the downloaded file: tar -xzf xerces-c-3.0.1.tar.gz
  • Go to unpacked directory: cd xerces-c-3.0.1
  • Configure: ./configure --prefix=/opt
  • Build: make
  • Install: make install

This installs the header files under "/opt/include" and the libraries under "/opt/lib". Neither directory is on the compiler's default search path, so both compiler and linker flags are required:

  • Compiler flags: -I/opt/include
  • Linker flags: -L/opt/lib -lxerces-c


Creating an RPM for Xerces-C libraries:

Create RPM for Red Hat/CentOS/Fedora/S.u.S.E. Linux systems.
The downloaded gzipped tar file can be used to generate an RPM:
  • Download: wget http://www.devlib.org/apache/xerces/c/3/sources/xerces-c-3.0.1.tar.gz
  • rpmbuild -ta xerces-c-3.0.1.tar.gz
This generates the RPM packages:
  • /usr/src/redhat/SRPMS/xerces-c-3.0.1-1.src.rpm
  • /usr/src/redhat/RPMS/x86_64/xerces-c-3.0.1-1.x86_64.rpm
  • /usr/src/redhat/RPMS/x86_64/xerces-c-devel-3.0.1-1.x86_64.rpm

Platform hardware and OS release will determine destination. e.g.:
  • /usr/src/packages/RPMS/i586/
  • /usr/src/redhat/RPMS/i386/

Cleanup: rm -Rf /var/tmp/xerces-c-root /usr/src/redhat/BUILD/xerces-c-src3_0_1

Install the RPMs with the command: rpm -ivh xerces-c-3.0.1-1.x86_64.rpm xerces-c-devel-3.0.1-1.x86_64.rpm xerces-c-doc-3.0.1-1.x86_64.rpm

Installing the RPM will place files in:

  • Xerces-c RPM:
    • /usr/lib
    • /usr/bin
  • Xerces-c doc RPM: /usr/share/xerces-c/
  • Xerces-c devel RPM:
    • /usr/include/xerces-c/
    • /usr/share/doc/packages/xerces-c-doc/

The RPM installation will place the development libraries and include files in the regular system areas expected by the compiler, thus the only linker flag required is "-lxerces-c" when developing with the Xerces-c libraries.

Note: Prebuilt RPMs are available from http://pkgs.repoforge.org/xerces-c/

[Potential Pitfall]: If building an RPM as a regular (non-root) Linux user, you will have to open up the directory permissions of /usr/src/redhat/... or build as the root user.


Installing Ubuntu Xerces-C libraries:

Install the binary package for Ubuntu precise (12.04.2 LTS)

Command: apt-get install libxerces-c3.1 libxerces-c-dev libicu-dev

Programming with Xerces-C:

XML file: sample.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<root>
   <ApplicationSettings
           option_a = "10"
           option_b = "24"
           >
   </ApplicationSettings>
   <OtherStuff
           option_x = "500"
           >
   </OtherStuff>
</root>

Include file: parser.hpp
#ifndef XML_PARSER_HPP
#define XML_PARSER_HPP
/**
 *  @file
 *  Class "GetConfig" provides the functions to read the XML data.
 *  @version 1.0
 */
#include <xercesc/dom/DOM.hpp>
#include <xercesc/dom/DOMDocument.hpp>
#include <xercesc/dom/DOMDocumentType.hpp>
#include <xercesc/dom/DOMElement.hpp>
#include <xercesc/dom/DOMImplementation.hpp>
#include <xercesc/dom/DOMImplementationLS.hpp>
#include <xercesc/dom/DOMNodeIterator.hpp>
#include <xercesc/dom/DOMNodeList.hpp>
#include <xercesc/dom/DOMText.hpp>

#include <xercesc/parsers/XercesDOMParser.hpp>
#include <xercesc/util/XMLUni.hpp>

#include <string>
#include <stdexcept>

// Error codes

enum {
   ERROR_ARGS = 1, 
   ERROR_XERCES_INIT,
   ERROR_PARSE,
   ERROR_EMPTY_DOCUMENT
};

class GetConfig
{
public:
   GetConfig();
  ~GetConfig();
   void readConfigFile(std::string&) throw(std::runtime_error);
 
   char *getOptionA() { return m_OptionA; }
   char *getOptionB() { return m_OptionB; }

private:
   xercesc::XercesDOMParser *m_ConfigFileParser;
   char* m_OptionA;
   char* m_OptionB;

   // Internal class use only. Hold Xerces data in UTF-16 XMLCh type.

   XMLCh* TAG_root;

   XMLCh* TAG_ApplicationSettings;
   XMLCh* ATTR_OptionA;
   XMLCh* ATTR_OptionB;
};
#endif

C++ Program file: parser.cpp
#include <string>
#include <iostream>
#include <sstream>
#include <stdexcept>
#include <list>

#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <errno.h>

#include "parser.hpp"

using namespace xercesc;
using namespace std;

/**
 *  Constructor initializes xerces-C libraries.
 *  The XML tags and attributes which we seek are defined.
 *  The xerces-C DOM parser infrastructure is initialized.
 */

GetConfig::GetConfig()
{
   try
   {
      XMLPlatformUtils::Initialize();  // Initialize Xerces infrastructure
   }
   catch( XMLException& e )
   {
      char* message = XMLString::transcode( e.getMessage() );
      cerr << "XML toolkit initialization error: " << message << endl;
      XMLString::release( &message );
      // throw exception here to return ERROR_XERCES_INIT
   }

   // Tags and attributes used in XML file.
   // Can't call transcode till after Xerces Initialize()
   TAG_root        = XMLString::transcode("root");
   TAG_ApplicationSettings = XMLString::transcode("ApplicationSettings");
   ATTR_OptionA = XMLString::transcode("option_a");
   ATTR_OptionB = XMLString::transcode("option_b");

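   m_ConfigFileParser = new XercesDOMParser;

   // Option values stay NULL until readConfigFile() fills them in.
   // The destructor tests these pointers before releasing them.
   m_OptionA = NULL;
   m_OptionB = NULL;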
}

/**
 *  Class destructor frees memory used to hold the XML tag and 
 *  attribute definitions. It also terminates use of the xerces-C
 *  framework.
 */

GetConfig::~GetConfig()
{
   // Free memory

   delete m_ConfigFileParser;
   if(m_OptionA)   XMLString::release( &m_OptionA );
   if(m_OptionB)   XMLString::release( &m_OptionB );

   try
   {
      XMLString::release( &TAG_root );

      XMLString::release( &TAG_ApplicationSettings );
      XMLString::release( &ATTR_OptionA );
      XMLString::release( &ATTR_OptionB );
   }
   catch( ... )
   {
      cerr << "Unknown exception encountered in GetConfig destructor" << endl;
   }

   // Terminate Xerces

   try
   {
      XMLPlatformUtils::Terminate();  // Terminate after release of memory
   }
   catch( xercesc::XMLException& e )
   {
      char* message = xercesc::XMLString::transcode( e.getMessage() );

      cerr << "XML toolkit teardown error: " << message << endl;
      XMLString::release( &message );
   }
}

/**
 *  This function:
 *  - Tests the access and availability of the XML configuration file.
 *  - Configures the xerces-c DOM parser.
 *  - Reads and extracts the pertinent information from the XML config file.
 *
 *  @param in configFile The text string name of the XML configuration file.
 */

void GetConfig::readConfigFile(string& configFile)
        throw( std::runtime_error )
{
   // Test to see if the file is ok.

   struct stat fileStatus;

   errno = 0;
   if(stat(configFile.c_str(), &fileStatus) == -1) // ==0 ok; ==-1 error
   {
       if( errno == ENOENT )      // errno declared by include file errno.h
          throw ( std::runtime_error("Path file_name does not exist, or path is an empty string.") );
       else if( errno == ENOTDIR )
          throw ( std::runtime_error("A component of the path is not a directory."));
       else if( errno == ELOOP )
          throw ( std::runtime_error("Too many symbolic links encountered while traversing the path."));
       else if( errno == EACCES )
          throw ( std::runtime_error("Permission denied."));
       else if( errno == ENAMETOOLONG )
          throw ( std::runtime_error("File name is too long."));
       else
          throw ( std::runtime_error("File can not be read."));
   }

   // Configure DOM parser.

   m_ConfigFileParser->setValidationScheme( XercesDOMParser::Val_Never );
   m_ConfigFileParser->setDoNamespaces( false );
   m_ConfigFileParser->setDoSchema( false );
   m_ConfigFileParser->setLoadExternalDTD( false );

   try
   {
      m_ConfigFileParser->parse( configFile.c_str() );

      // no need to free this pointer - owned by the parent parser object
      DOMDocument* xmlDoc = m_ConfigFileParser->getDocument();

      // Get the top-level element: Name is "root". No attributes for "root"
      
      DOMElement* elementRoot = xmlDoc->getDocumentElement();
      if( !elementRoot ) throw(std::runtime_error( "empty XML document" ));

      // Parse XML file for tags of interest: "ApplicationSettings"
      // Look one level nested within "root". (child of root)

      DOMNodeList*      children = elementRoot->getChildNodes();
      const  XMLSize_t nodeCount = children->getLength();

      // For all nodes, children of "root" in the XML tree.

      for( XMLSize_t xx = 0; xx < nodeCount; ++xx )
      {
         DOMNode* currentNode = children->item(xx);
         if( currentNode->getNodeType() == DOMNode::ELEMENT_NODE ) // is element 
         {
            // Found node which is an Element. Re-cast node as element
            DOMElement* currentElement
                        = dynamic_cast< xercesc::DOMElement* >( currentNode );
            if( XMLString::equals(currentElement->getTagName(), TAG_ApplicationSettings))
            {
               // Already tested node as type element and of name "ApplicationSettings".
               // Read attributes of element "ApplicationSettings".
               const XMLCh* xmlch_OptionA
                     = currentElement->getAttribute(ATTR_OptionA);
               m_OptionA = XMLString::transcode(xmlch_OptionA);

               const XMLCh* xmlch_OptionB
                     = currentElement->getAttribute(ATTR_OptionB);
               m_OptionB = XMLString::transcode(xmlch_OptionB);

               break;  // Data found. No need to look at other elements in tree.
            }
         }
      }
   }
   catch( xercesc::XMLException& e )
   {
      char* message = xercesc::XMLString::transcode( e.getMessage() );
      ostringstream errBuf;
      errBuf << "Error parsing file: " << message << flush;
      XMLString::release( &message );
      throw( std::runtime_error( errBuf.str() ) );  // report the failure to the caller
   }
}

#ifdef MAIN_TEST
/* This main is provided for unit test of the class. */

int main()
{
   string configFile="sample.xml"; // stat file. Get ambiguous segfault otherwise.

   GetConfig appConfig;

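   try
   {
      appConfig.readConfigFile(configFile);
   }
   catch( const std::runtime_error& e )
   {
      cerr << "Error: " << e.what() << endl;
      return ERROR_PARSE;
   }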

   cout << "Application option A="  << appConfig.getOptionA()  << endl;
   cout << "Application option B="  << appConfig.getOptionB()  << endl;

   return 0;
}
#endif
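
The file test at the top of readConfigFile() maps a handful of errno values to messages by hand. As an alternative, here is a minimal sketch that lets the C library supply the message text for any errno value. The function name `describeFileProblem` is hypothetical and not part of the class above. The extra check for a regular file is an added assumption about what a configuration file should be.

```cpp
#include <cerrno>
#include <cstring>
#include <string>

#include <sys/stat.h>

// Hypothetical helper: returns an empty string when "path" names an existing
// regular file, otherwise a human-readable reason why it can not be used.
std::string describeFileProblem(const std::string& path)
{
   struct stat fileStatus;

   errno = 0;
   if (stat(path.c_str(), &fileStatus) == -1)
   {
      // strerror() converts any errno value to the system's message text,
      // so no errno value is left without a message.
      return path + ": " + std::strerror(errno);
   }

   // stat() also succeeds for directories and devices, which a parser
   // can not read as an XML file.
   if (!S_ISREG(fileStatus.st_mode))
      return path + ": not a regular file";

   return std::string();
}
```

A caller could throw std::runtime_error with the returned text whenever it is not empty, which keeps the exception behavior of readConfigFile() while covering every failure stat() can report.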

Compile:

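  • RPM installed: g++ -g -Wall -pedantic parser.cpp -DMAIN_TEST -o parser -lxerces-c
    or
  • Installed to "/opt": g++ -g -Wall -pedantic -I/opt/include parser.cpp -DMAIN_TEST -o parser -L/opt/lib -lxerces-c

Libraries are listed after the source file because some linkers resolve symbols in command-line order.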

Run: ./parser
Application option A=10
Application option B=24
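
The getters return the attribute values as text, so an application that needs numbers has to convert them itself. The following is a minimal sketch of one way to do that with strtol() and explicit error checks. The function name `optionToLong` is hypothetical, and treating the options as base-10 integers is an assumption based on the sample values "10" and "24".

```cpp
#include <cerrno>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert attribute text such as "10" to a long.
// Throws std::runtime_error if the text is missing, is not entirely a
// base-10 integer, or does not fit in a long.
long optionToLong(const char* text)
{
   if (text == 0 || *text == '\0')
      throw std::runtime_error("option value is missing");

   char* end = 0;
   errno = 0;
   const long value = std::strtol(text, &end, 10);

   if (errno == ERANGE)
      throw std::runtime_error(std::string("option value out of range: ") + text);

   // strtol() stops at the first character it can not convert. Anything
   // left over means the text was not a plain integer.
   if (*end != '\0')
      throw std::runtime_error(std::string("option value is not a number: ") + text);

   return value;
}
```

For example, `optionToLong(appConfig.getOptionA())` would yield the number 10 for the sample file, and a null pointer from a getter is reported as an exception rather than dereferenced.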

Books:

Professional XML Development with Apache Tools : Xerces, Xalan, FOP, Cocoon, Axis, Xindice
by Theodore W. Leung
ISBN #0764543555, Wrox Press
