The Apache project's Xerces-C libraries support the DOM approach to XML parsing. The entire XML file is imported into memory and the data is held as nodes in a data tree which can be traversed for information.
The Xerces-C C++ parser home page: http://xml.apache.org/xerces-c/
- Go to your working directory, e.g.: cd /home/user-1/src
- Download the Xerces-C 2.7.0 source from the Apache archive.
- Unpack the downloaded file: tar -xzf xerces-c-src_2_7_0.tar.gz
- Set the XERCESCROOT environment variable to the directory which contains the source code (bash shell example):
export XERCESCROOT=/home/user-1/src/xerces-c-src_2_7_0
- Go to the source directory: cd xerces-c-src_2_7_0/src/xercesc
- Run the script which runs "configure": ./runConfigure -plinux -cgcc -xg++ -C--prefix=/opt/ApacheXML
- Build: make
- Install: make install
[Potential Pitfall]: If installing as root (required when installing to directory paths like /opt and /usr), remember that root also requires the environment variable XERCESCROOT. Possible error:
    make -C /obj install
    make: *** /obj: No such file or directory. Stop.
    make: *** [install] Error 2
runConfigure options:
Option | Description |
---|---|
-h | Help |
-p platform-name | Required. No default. Choices: aix, beos, linux, freebsd, netbsd, solaris, hp-10, hp-11, openserver, unixware, os400, os390, irix, ptx, tru64, macosx, cygwin, qnx, interix, mingw-msys. Sets Makefile environment variable PLATFORM, e.g. PLATFORM=LINUX |
-c compiler | Choices: gcc, cc, xlc_r, icc, icpc, ecc. Default: cc. Sets Makefile environment variable CC |
-x C++_compiler | Choices: g++, CC, aCC, xlC_r, aCCOS, xlC_rv5compat, QCC. Default: g++. Sets Makefile environment variable CXX |
-d | Build debug version |
-m message-loader | Choices: inmem, icu, MsgFile, iconv. Default: inmem. Sets Makefile environment variable MESSAGELOADER used by Xerces |
-n net-accessor | Choices: fileonly, libwww, socket, native. Default: socket. Sets Makefile environment variable NETACCESSOR |
-t transcoder | Choices: icu, Iconv400, uniconv390, IconvFBSD, IconvGNU, native. Default: native. Sets Makefile environment variable TRANSCODER |
-r thread-option | Choices: pthread, dce (AIX, HP-11, Solaris), sproc (IRIX), none. Default: pthread. Sets Makefile environment variable THREADS |
-b bits-to-build | Choices: 64, 32. Default: 32. Sets Makefile environment variable BITSTOBUILD |
-l extra-linker-options | Sets Makefile environment variable: LDFLAGS |
-z compiler-options | |
-C configure-options | Example: -C--prefix=/opt |
-P | Install prefix |
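With a configure prefix of "/opt" (i.e. -C--prefix=/opt), development files such as header files and libraries are installed under "/opt", so compiler and linker flags are required. If you used a different prefix, such as /opt/ApacheXML in the example above, substitute that path: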
- Compiler flags: -I/opt/include
- Linker flags: -L/opt/lib -lxerces-c
Creating an RPM for Xerces-C libraries:
The downloaded gzipped tar file can be used to generate an RPM: rpmbuild -ta xerces-c-src_2_7_0.tar.gz
[Potential Pitfall]: RHEL6+ rpmbuild failure
If you get the following error when running the command rpmbuild -ta xerces-c-src_2_7_0.tar.gz
    error: line 13: Unknown tag: Copyright: Apache
This error was seen on RHEL6, which ships a newer version of rpmbuild than the one in use when 2.7.0 was released.
Fix:
- Un-tar: tar xzf xerces-c-src_2_7_0.tar.gz
- Edit file xerces-c-src_2_7_0/xerces-c.spec
Change line 13 from:
    Copyright: Apache
to:
    License: Apache
- Re-tar: tar czf xerces-c-src_2_7_0.tar.gz xerces-c-src_2_7_0
- Build RPM: rpmbuild -ta xerces-c-src_2_7_0.tar.gz
[Potential Pitfall]: If you download the package "xerces-c-current.tar.gz", you may have to rename it to make it work. The error message will give you a clue as to what to name it.
In this example: mv xerces-c-current.tar.gz xerces-c-src_2_7_0.tar.gz
Then execute the "rpmbuild" command.
[Potential Pitfall]: If building as a non-root user, you will have to open up the directory permissions of /usr/src/redhat/... or build as the root user.
[Potential Pitfall]: This did not work with Red Hat Enterprise 5. (RHEL4 and 2.7.0 ok. RHEL5 and 3.0.1 ok. RHEL5 and 2.7.0 not ok.) In this case I just downloaded the prebuilt RPMs from http://pkgs.repoforge.org/xerces-c/
Results of rpmbuild -ta xerces-c-src_2_7_0.tar.gz
Red Hat Enterprise 6.3 RPMs:
    Wrote: /home/user1/rpmbuild/SRPMS/xerces-c-2.7.0-3.src.rpm
    Wrote: /home/user1/rpmbuild/RPMS/x86_64/xerces-c-2.7.0-3.x86_64.rpm
    Wrote: /home/user1/rpmbuild/RPMS/x86_64/xerces-c-devel-2.7.0-3.x86_64.rpm
    Wrote: /home/user1/rpmbuild/RPMS/x86_64/xerces-c-doc-2.7.0-3.x86_64.rpm
    Wrote: /home/user1/rpmbuild/RPMS/x86_64/xerces-c-debuginfo-2.7.0-3.x86_64.rpm
or the RHEL5 RPM packages:
- /usr/src/packages/RPMS/i586/xerces-c-2.7.0-3.i586.rpm
- /usr/src/packages/RPMS/i586/xerces-c-devel-2.7.0-3.i586.rpm
- /usr/src/packages/RPMS/i586/xerces-c-doc-2.7.0-3.i586.rpm
or Red Hat Enterprise Linux 4 RPMs:
- /usr/src/redhat/RPMS/i386/xerces-c-2.7.0-3.i386.rpm
- /usr/src/redhat/RPMS/i386/xerces-c-devel-2.7.0-3.i386.rpm
- /usr/src/redhat/RPMS/i386/xerces-c-debuginfo-2.7.0-3.i386.rpm
- /usr/src/redhat/RPMS/i386/xerces-c-doc-2.7.0-3.i386.rpm
(Cleanup: rm -Rf /var/tmp/xerces-c-root /usr/src/redhat/BUILD/xerces-c-src_2_7_0)
or (Fedora Core 3 x86_64)
- /usr/src/redhat/SRPMS/xerces-c-2.7.0-3.src.rpm
- /usr/src/redhat/RPMS/x86_64/xerces-c-2.7.0-3.x86_64.rpm
- /usr/src/redhat/RPMS/x86_64/xerces-c-devel-2.7.0-3.x86_64.rpm
- /usr/src/redhat/RPMS/x86_64/xerces-c-doc-2.7.0-3.x86_64.rpm
- /usr/src/redhat/RPMS/x86_64/xerces-c-debuginfo-2.7.0-3.x86_64.rpm
Install the RPMs with the command: rpm -ivh xerces-c-2.7.0-3.i586.rpm xerces-c-devel-2.7.0-3.i586.rpm xerces-c-doc-2.7.0-3.i586.rpm
Installing the RPMs will place files in:
- Xerces-c RPM:
  - /usr/lib
  - /usr/bin
- Xerces-c doc RPM: /usr/share/xerces-c/
- Xerces-c devel RPM:
  - /usr/include/xercesc/
  - /usr/share/doc/packages/xerces-c-doc/
Because the development libraries and include files are located in the regular system areas expected by the compiler, the only linker flag required is "-lxerces-c"
XML file: sample.xml
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <root>
        <ApplicationSettings
            option_a = "10"
            option_b = "24"
        >
        </ApplicationSettings>
        <OtherStuff
            option_x = "500"
        >
        </OtherStuff>
    </root>
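Before looking at the Xerces-C code, it may help to picture the in-memory tree a DOM parser builds from this file. The following is only a rough sketch using the standard library: the Node struct and the functions buildSampleTree() and childAttribute() are hypothetical stand-ins invented for illustration and are not Xerces-C types or calls. It mirrors the lookup performed later in this tutorial: scan the direct children of "root" for an element with a given tag name and read one of its attributes.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a DOM element node; NOT a Xerces-C type.
struct Node {
    std::string name;                               // tag name, e.g. "root"
    std::map<std::string, std::string> attributes;  // attribute name -> value
    std::vector<Node> children;                     // nested elements, in document order
};

// Build by hand the tree a DOM parser would hold in memory for sample.xml.
Node buildSampleTree() {
    Node settings;
    settings.name = "ApplicationSettings";
    settings.attributes["option_a"] = "10";
    settings.attributes["option_b"] = "24";

    Node other;
    other.name = "OtherStuff";
    other.attributes["option_x"] = "500";

    Node root;
    root.name = "root";
    root.children.push_back(settings);
    root.children.push_back(other);
    return root;
}

// Look only at the direct children of `parent` for the first element named
// `tag` and return the value of `attr`. Returns an empty string if the
// element or the attribute is missing.
std::string childAttribute(const Node& parent, const std::string& tag,
                           const std::string& attr) {
    for (std::vector<Node>::const_iterator child = parent.children.begin();
         child != parent.children.end(); ++child) {
        if (child->name != tag) continue;  // skip elements we are not interested in
        std::map<std::string, std::string>::const_iterator it =
            child->attributes.find(attr);
        return it == child->attributes.end() ? std::string() : it->second;
    }
    return std::string();
}
```

Calling childAttribute(buildSampleTree(), "ApplicationSettings", "option_a") walks the two children of "root" and returns the string "10". The real parser does the same kind of traversal, but over DOMNode objects and UTF-16 XMLCh strings.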
Include file: parser.hpp
    #ifndef XML_PARSER_HPP
    #define XML_PARSER_HPP
    /**
     *  @file
     *  Class "GetConfig" provides the functions to read the XML data.
     *  @version 1.0
     */
    #include <xercesc/dom/DOM.hpp>
    #include <xercesc/dom/DOMDocument.hpp>
    #include <xercesc/dom/DOMDocumentType.hpp>
    #include <xercesc/dom/DOMElement.hpp>
    #include <xercesc/dom/DOMImplementation.hpp>
    #include <xercesc/dom/DOMImplementationLS.hpp>
    #include <xercesc/dom/DOMNodeIterator.hpp>
    #include <xercesc/dom/DOMNodeList.hpp>
    #include <xercesc/dom/DOMText.hpp>

    #include <xercesc/parsers/XercesDOMParser.hpp>
    #include <xercesc/util/PlatformUtils.hpp>
    #include <xercesc/util/XMLString.hpp>
    #include <xercesc/util/XMLUni.hpp>

    #include <string>
    #include <stdexcept>

    // Error codes
    enum {
       ERROR_ARGS = 1,
       ERROR_XERCES_INIT,
       ERROR_PARSE,
       ERROR_EMPTY_DOCUMENT
    };

    class GetConfig
    {
    public:
       GetConfig();
      ~GetConfig();

       // Note: dynamic exception specifications such as throw(std::runtime_error)
       // were removed in C++17. Build with -std=c++14 or earlier.
       void readConfigFile(std::string&) throw(std::runtime_error);

       char *getOptionA() { return m_OptionA; }
       char *getOptionB() { return m_OptionB; }

    private:
       xercesc::XercesDOMParser *m_ConfigFileParser;
       char* m_OptionA;
       char* m_OptionB;

       // Internal class use only. Hold Xerces data in UTF-16 XMLCh type.
       XMLCh* TAG_root;
       XMLCh* TAG_ApplicationSettings;
       XMLCh* ATTR_OptionA;
       XMLCh* ATTR_OptionB;
    };
    #endif
C++ Program file: parser.cpp
    #include <string>
    #include <iostream>
    #include <sstream>
    #include <stdexcept>
    #include <list>

    #include <sys/types.h>
    #include <sys/stat.h>
    #include <unistd.h>
    #include <errno.h>

    #include "parser.hpp"

    using namespace xercesc;
    using namespace std;

    /**
     *  Constructor initializes xerces-C libraries.
     *  The XML tags and attributes which we seek are defined.
     *  The xerces-C DOM parser infrastructure is initialized.
     */
    GetConfig::GetConfig()
    {
       // Start with null pointers so the destructor never releases garbage.
       m_ConfigFileParser = 0;
       m_OptionA = 0;
       m_OptionB = 0;

       try
       {
          XMLPlatformUtils::Initialize();  // Initialize Xerces infrastructure
       }
       catch( XMLException& e )
       {
          char* message = XMLString::transcode( e.getMessage() );
          cerr << "XML toolkit initialization error: " << message << endl;
          XMLString::release( &message );
          // throw exception here to return ERROR_XERCES_INIT
       }

       // Tags and attributes used in XML file.
       // Can't call transcode till after Xerces Initialize()
       TAG_root                = XMLString::transcode("root");
       TAG_ApplicationSettings = XMLString::transcode("ApplicationSettings");
       ATTR_OptionA            = XMLString::transcode("option_a");
       ATTR_OptionB            = XMLString::transcode("option_b");

       m_ConfigFileParser = new XercesDOMParser;
    }

    /**
     *  Class destructor frees memory used to hold the XML tag and
     *  attribute definitions. It also terminates use of the xerces-C
     *  framework.
     */
    GetConfig::~GetConfig()
    {
       // Free memory
       delete m_ConfigFileParser;
       if(m_OptionA)   XMLString::release( &m_OptionA );
       if(m_OptionB)   XMLString::release( &m_OptionB );

       try
       {
          XMLString::release( &TAG_root );
          XMLString::release( &TAG_ApplicationSettings );
          XMLString::release( &ATTR_OptionA );
          XMLString::release( &ATTR_OptionB );
       }
       catch( ... )
       {
          cerr << "Unknown exception encountered in TagNames dtor" << endl;
       }

       // Terminate Xerces
       try
       {
          XMLPlatformUtils::Terminate();  // Called after memory is released
       }
       catch( xercesc::XMLException& e )
       {
          char* message = xercesc::XMLString::transcode( e.getMessage() );
          cerr << "XML toolkit teardown error: " << message << endl;
          XMLString::release( &message );
       }
    }

    /**
     *  This function:
     *  - Tests the access and availability of the XML configuration file.
     *  - Configures the xerces-c DOM parser.
     *  - Reads and extracts the pertinent information from the XML config file.
     *
     *  @param in configFile The text string name of the XML configuration file.
     */
    void GetConfig::readConfigFile(string& configFile)
            throw( std::runtime_error )
    {
       // Test to see if the file is ok.
       struct stat fileStatus;

       errno = 0;
       int iretStat = stat(configFile.c_str(), &fileStatus);
       if( iretStat == -1 )
       {
          if( errno == ENOENT )      // errno declared by include file errno.h
             throw ( std::runtime_error("Path file_name does not exist, or path is an empty string.") );
          else if( errno == ENOTDIR )
             throw ( std::runtime_error("A component of the path is not a directory."));
          else if( errno == ELOOP )
             throw ( std::runtime_error("Too many symbolic links encountered while traversing the path."));
          else if( errno == EACCES )
             throw ( std::runtime_error("Permission denied."));
          else if( errno == ENAMETOOLONG )
             throw ( std::runtime_error("File name is too long."));
          else
             throw ( std::runtime_error("File can not be read."));
       }

       // Configure DOM parser.
       m_ConfigFileParser->setValidationScheme( XercesDOMParser::Val_Never );
       m_ConfigFileParser->setDoNamespaces( false );
       m_ConfigFileParser->setDoSchema( false );
       m_ConfigFileParser->setLoadExternalDTD( false );

       try
       {
          m_ConfigFileParser->parse( configFile.c_str() );

          // no need to free this pointer - owned by the parent parser object
          DOMDocument* xmlDoc = m_ConfigFileParser->getDocument();

          // Get the top-level element: Name is "root". No attributes for "root"
          DOMElement* elementRoot = xmlDoc->getDocumentElement();
          if( !elementRoot ) throw(std::runtime_error( "empty XML document" ));

          // Parse XML file for tags of interest: "ApplicationSettings"
          // Look one level nested within "root". (child of root)
          DOMNodeList*      children = elementRoot->getChildNodes();
          const  XMLSize_t nodeCount = children->getLength();

          // For all nodes, children of "root" in the XML tree.
          for( XMLSize_t xx = 0; xx < nodeCount; ++xx )
          {
             DOMNode* currentNode = children->item(xx);
             if( currentNode->getNodeType() &&  // true is not NULL
                 currentNode->getNodeType() == DOMNode::ELEMENT_NODE ) // is element
             {
                // Found node which is an Element. Re-cast node as element
                DOMElement* currentElement
                            = dynamic_cast< xercesc::DOMElement* >( currentNode );
                if( XMLString::equals(currentElement->getTagName(), TAG_ApplicationSettings))
                {
                   // Already tested node as type element and of name "ApplicationSettings".
                   // Read attributes of element "ApplicationSettings".
                   const XMLCh* xmlch_OptionA
                         = currentElement->getAttribute(ATTR_OptionA);
                   m_OptionA = XMLString::transcode(xmlch_OptionA);

                   const XMLCh* xmlch_OptionB
                         = currentElement->getAttribute(ATTR_OptionB);
                   m_OptionB = XMLString::transcode(xmlch_OptionB);

                   break;  // Data found. No need to look at other elements in tree.
                }
             }
          }
       }
       catch( xercesc::XMLException& e )
       {
          char* message = xercesc::XMLString::transcode( e.getMessage() );
          ostringstream errBuf;
          errBuf << "Error parsing file: " << message << flush;
          XMLString::release( &message );
          throw( std::runtime_error( errBuf.str() ) );  // report the failure to the caller
       }
    }

    #ifdef MAIN_TEST
    /* This main is provided for unit test of the class. */

    int main()
    {
       string configFile="sample.xml"; // stat file. Get ambiguous segfault otherwise.

       GetConfig appConfig;

       try
       {
          appConfig.readConfigFile(configFile);
       }
       catch( std::runtime_error& e )
       {
          cerr << e.what() << endl;
          return ERROR_PARSE;
       }

       cout << "Application option A=" << appConfig.getOptionA() << endl;
       cout << "Application option B=" << appConfig.getOptionB() << endl;

       return 0;
    }
    #endif
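The file test at the top of readConfigFile() can also be factored into a small helper that is easy to call, and to test, before any parser is created. This is a minimal sketch using only POSIX stat() and the standard library; the function names describeStatError() and checkConfigFile() are hypothetical and do not come from Xerces-C or from the class above.

```cpp
#include <cerrno>
#include <string>
#include <sys/stat.h>

// Hypothetical helper: translate an errno value set by stat() into a message.
std::string describeStatError(int err) {
    switch (err) {
        case ENOENT:       return "Path does not exist, or path is an empty string.";
        case ENOTDIR:      return "A component of the path is not a directory.";
        case ELOOP:        return "Too many symbolic links encountered while traversing the path.";
        case EACCES:       return "Permission denied.";
        case ENAMETOOLONG: return "File name is too long.";
        default:           return "File can not be read.";
    }
}

// Hypothetical helper: returns an empty string if `path` can be stat()ed,
// otherwise a description of why the call failed.
std::string checkConfigFile(const std::string& path) {
    struct stat fileStatus;
    errno = 0;  // clear any stale error before the call
    if (stat(path.c_str(), &fileStatus) == -1) {
        return describeStatError(errno);
    }
    return std::string();
}
```

A caller could run checkConfigFile("sample.xml") first and only hand the file to the parser when the returned string is empty. Returning a message instead of throwing is a design choice made for this sketch; the class above throws std::runtime_error instead.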
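Compile:
- RPM installed: g++ -g -Wall -pedantic parser.cpp -DMAIN_TEST -o parser -lxerces-c
or
- Installed to "/opt": g++ -g -Wall -pedantic -I/opt/include -L/opt/lib parser.cpp -DMAIN_TEST -o parser -lxerces-c
The library is listed after the source file because some linkers resolve symbols in command-line order. With compilers that default to C++17 or later, add -std=c++14, because the code uses dynamic exception specifications.
Run: ./parser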
Application option A=10
Application option B=24
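getOptionA() and getOptionB() return the attribute values as C strings ("10" and "24" here), while an application usually wants numbers. The following is a minimal sketch of a conversion step, assuming the options are meant to be decimal integers; optionToInt() is a hypothetical helper written for this example and is not part of Xerces-C. It also rejects a null pointer, which is worth guarding against in case the element was never found.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert an attribute value such as "10" to an int.
// Throws std::runtime_error for NULL, empty, non-numeric or out-of-range text.
int optionToInt(const char* text) {
    if (text == NULL || *text == '\0') {
        throw std::runtime_error("option is missing or empty");
    }
    char* end = NULL;
    errno = 0;  // strtol reports overflow through errno
    const long value = std::strtol(text, &end, 10);
    if (*end != '\0') {
        // Some characters were not consumed, e.g. "10x" or "abc".
        throw std::runtime_error(std::string("option is not an integer: ") + text);
    }
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX) {
        throw std::runtime_error(std::string("option is out of range: ") + text);
    }
    return static_cast<int>(value);
}
```

In the test program this could be used as optionToInt(appConfig.getOptionA()), which yields the integer 10 for sample.xml.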
Professional XML Development with Apache Tools : Xerces, Xalan, FOP, Cocoon, Axis, Xindice
by Theodore W. Leung ISBN #0764543555, Wrox Press