Yolinux.com Tutorial

XML and Gnome libXML2:

This tutorial covers the use of an XML config file and parsing the information using the Gnome libXML2 API.

XML Basics:

The eXtensible Markup Language (XML) was created to store and define complex, hierarchically structured data for exchange and storage. The XML structure begins its hierarchy at a root node and branches out from this document root.

The Document Type Definition (DTD) is optional and defines the data to be presented in an XML document. It is often used to verify the data for completeness and adherence to rules.

XML Schema (XSD) is a newer and more complete data definition language with definable types. XSD competes with DTD as the format for data definition, especially when defining complex relationships and data types.

XML parsers fall into three major categories:

  1. DOM: Import/parse all data into a data structure in memory for query. The data is held as nodes in a data tree which can be traversed. While this is often easier to program than SAX callbacks, it uses more memory and is generally slower, since the whole document must be loaded before it can be queried.
  2. SAX: Parse on the fly to look for the data requested. This is event driven: callbacks written by the programmer are invoked as elements are encountered during parsing. A custom handler is typically written for each document type. This is considered to be the fastest way to parse a file.
  3. XPath: (XML Path Language) Search data with a path expression. Very easy to use. Usage is similar to a query: a node list is returned which matches the XPath expression. The expressions resemble file system paths rather than regular expressions. It is usually implemented as an extension to DOM.


DTD:

Number of children:

  • ? Zero or one element permitted (optional).
  • * Zero or more elements permitted, i.e.: <!ELEMENT name (first, middle*, last?)>
  • + One or more elements permitted.
  • No suffix: exactly one element is required.

Attributes:

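CDATA #REQUIRED Character data attribute which must be present.
CDATA #IMPLIED Character data attribute which is optional.
CDATA Character data
PCDATA Parsed character data (used for element content, not as an attribute type)
NMTOKEN Name token. No white space.
NMTOKENS One or more name tokens separated by white space
ENUMERATION Value chosen from a fixed list, i.e.
<date month="January" day="27" year="2004"/>
ENTITY Name of an unparsed entity declared in the DTD
ENTITIES One or more entity names separated by white space
ID Unique XML name specified: <!ATTLIST xml_name1 xml_name2 ID #REQUIRED>
Attribute xml_name2 of element xml_name1 is required.
IDREF Attribute refers to the ID of another element
IDREFS One or more ID references separated by white space
NOTATION Name of a notation declared in the DTD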

  • XML names may include _-.
  • When HTML text is included use &lt;, &amp;, &gt; and &quot; to represent <, &, >, and " respectively.
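
The character substitutions above can be sketched in plain C. The function name escape_xml_text is made up for this illustration. Note that libxml2 performs this escaping itself when text is added through its tree API, so a helper like this would only be needed when XML is assembled by hand.

```c
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated copy of "text" with <, &, > and " replaced by
   their entity references. The caller frees the result.
   Returns NULL if memory could not be allocated. */
char *escape_xml_text(const char *text)
{
    /* Worst case: every character expands to the six characters of &quot; */
    char *out = malloc(strlen(text) * 6 + 1);
    char *dst = out;

    if (out == NULL)
        return NULL;

    for (; *text != '\0'; text++) {
        const char *rep = NULL;

        switch (*text) {
            case '<': rep = "&lt;";   break;
            case '&': rep = "&amp;";  break;
            case '>': rep = "&gt;";   break;
            case '"': rep = "&quot;"; break;
        }
        if (rep != NULL) {
            size_t n = strlen(rep);
            memcpy(dst, rep, n);
            dst += n;
        } else {
            *dst++ = *text;
        }
    }
    *dst = '\0';
    return out;
}
```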

The XML file and the DTD:

File: testLibXml2.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE AppConfigData [
   <!ELEMENT AppConfigData (DisplayX+)>

   <!ELEMENT DisplayX (AlternateName*,FieldLength,TextFont?)>
      <!ATTLIST DisplayX name CDATA #REQUIRED>
      <!ATTLIST DisplayX type CDATA #REQUIRED>

   <!ELEMENT AlternateName (#PCDATA)>
      <!ATTLIST AlternateName type CDATA #REQUIRED>
   <!ELEMENT FieldLength (#PCDATA)>
   <!ELEMENT TextFont (#PCDATA)>
      <!ATTLIST TextFont type CDATA #REQUIRED>
]>
<AppConfigData>
  <DisplayX name="DisplayText_A" type="Type1">
    <AlternateName type="Type1">DisplayText_a</AlternateName>
    <FieldLength>30</FieldLength>
    <TextFont type="Courier"/>
  </DisplayX>
  <DisplayX name="DisplayText_B" type="Type2">
    <FieldLength>30</FieldLength>
    <TextFont type="Arial"/>
  </DisplayX>
  <DisplayX name="DisplayText_C" type="Type1">
    <AlternateName type="Type1">DisplayText_c</AlternateName>
    <FieldLength>30</FieldLength>
    <TextFont type="Courier"/>
  </DisplayX>
  <DisplayX name="DisplayText_D" type="Type2">
    <FieldLength>30</FieldLength>
    <TextFont type="Courier"/>
  </DisplayX>
</AppConfigData>

Note: The DTD is not required for use with the Gnome LibXml2 API. If using this API to generate XML, the DTD will not be generated.

Parsing the XML file using the Gnome libXML2 API:

Prerequisite (RPM) packages: pkgconfig, libxml2-devel, gnome-libs-devel

#include <stdio.h>
#include <stdlib.h>
#include <gtk/gtk.h>
#include <libxml/xmlmemory.h>
#include <libxml/parser.h>
#include <libxml/tree.h>

int main(int argc, char **argv)
{

  xmlNode *cur_node, *child_node;
  xmlChar *fieldLength, *alternateName;
  xmlChar *DisplayXName, *DisplayXType, *altProp, *textFont;   // xmlGetProp() returns xmlChar *

  // --------------------------------------------------------------------------
  // Open XML document
  // --------------------------------------------------------------------------

  xmlDocPtr doc;
  doc = xmlParseFile("testLibXml2.xml");

  if (doc == NULL)
  {
        printf("error: could not parse file testLibXml2.xml\n");
        return EXIT_FAILURE;
  }
  
  // --------------------------------------------------------------------------
  // XML root.
  // --------------------------------------------------------------------------

  /*Get the root element node */
  xmlNode *root = NULL;
  root = xmlDocGetRootElement(doc);
  
  // --------------------------------------------------------------------------
  // Must have root element, a name and the name must be "AppConfigData"
  // --------------------------------------------------------------------------
  
  if( !root || 
      !root->name ||
      xmlStrcmp(root->name, (const xmlChar *) "AppConfigData") ) 
  {
     xmlFreeDoc(doc);
     return EXIT_FAILURE;   // non-zero exit status signals the failure
  }

  // --------------------------------------------------------------------------
  // AppConfigData children: For each DisplayX
  // --------------------------------------------------------------------------

  for(cur_node = root->children; cur_node != NULL; cur_node = cur_node->next)
  {
     if ( cur_node->type == XML_ELEMENT_NODE  &&
          !xmlStrcmp(cur_node->name, (const xmlChar *) "DisplayX" ) )
     {
        printf("Element: %s \n", cur_node->name); 
        DisplayXName = xmlGetProp(cur_node, (const xmlChar *) "name");
        if(DisplayXName) printf("         name=%s\n", DisplayXName);
        DisplayXType = xmlGetProp(cur_node, (const xmlChar *) "type");
        if(DisplayXType) printf("         type=%s\n", DisplayXType);

        // For each child of DisplayX: i.e. AlternateName, FieldLength
        for(child_node = cur_node->children; child_node != NULL; child_node = child_node->next)
        {
           if ( child_node->type == XML_ELEMENT_NODE  &&
                !xmlStrcmp(child_node->name, (const xmlChar *)"FieldLength") )
           {
              printf("   Child=%s\n", child_node->name);
              fieldLength = xmlNodeGetContent(child_node);
              if(fieldLength) printf("         Length: %s\n", fieldLength);
              xmlFree(fieldLength);
           }
           if ( child_node->type == XML_ELEMENT_NODE  &&
                !xmlStrcmp(child_node->name, (const xmlChar *)"AlternateName") )
           {
              printf("   Child=%s\n", child_node->name);
              alternateName = xmlNodeGetContent(child_node);
              if(alternateName) printf("         Name: %s\n", alternateName);
              altProp = xmlGetProp(child_node, (const xmlChar *) "type");
              if(altProp) printf("               type=%s\n", altProp);
              xmlFree(altProp);
              xmlFree(alternateName);
           }
           if ( child_node->type == XML_ELEMENT_NODE  &&
                !xmlStrcmp(child_node->name, (const xmlChar *)"TextFont") )
           {
              printf("   Child=%s\n", child_node->name);
              textFont = xmlGetProp(child_node, (const xmlChar *) "type");
              if(textFont) printf("         type=%s\n", textFont);
              xmlFree(textFont);
           }
        }
        xmlFree(DisplayXType);
        xmlFree(DisplayXName);
     }
  }
 
  // --------------------------------------------------------------------------

  /*free the document */
  xmlFreeDoc(doc);

  /*
   *Free the global variables that may
   *have been allocated by the parser.
   */
  xmlCleanupParser();

  return 0;
}

Compile: gcc -g -Wall `xml2-config --cflags --libs` `gnome-config --cflags --libs gnome gnomeui xml` -o testLibXml2 testLibXml2.c

[Potential Pitfall]: The order of the directory paths referenced matters. Reference the libxml2 include path directories before the gnome directory paths. The following will result in a compilation error:

gcc -g -Wall `gnome-config --cflags --libs gnome gnomeui xml` `xml2-config --cflags --libs` -o testLibXml2 testLibXml2.c
This is due to different structure definitions of xmlDocPtr (struct _xmlDoc) and xmlNodePtr (struct _xmlNode) in libxml/tree.h. The reference to the subdirectory libxml/ should have differentiated the two versions of the include file but that is not the case with the GNU compiler. The proper file is /usr/include/libxml2/libxml/tree.h and not the file /usr/include/gnome-xml/tree.h.

Components:

  • LibXML: xml2-config --cflags --libs
    (Reference this first.)
  • Gtk: pkg-config --cflags --libs gtk+-2.0
  • Gnome: gnome-config --cflags --libs gnome gnomeui xml

Results:
$ testLibXml2
Element: DisplayX
         name=DisplayText_A
         type=Type1
   Child=AlternateName
         Name: DisplayText_a
               type=Type1
   Child=FieldLength
         Length: 30
   Child=TextFont
         type=Courier
Element: DisplayX
         name=DisplayText_B
         type=Type2
   Child=FieldLength
         Length: 30
   Child=TextFont
         type=Arial
Element: DisplayX
         name=DisplayText_C
         type=Type1
   Child=AlternateName
         Name: DisplayText_c
               type=Type1
   Child=FieldLength
         Length: 30
   Child=TextFont
         type=Courier
Element: DisplayX
         name=DisplayText_D
         type=Type2
   Child=FieldLength
         Length: 30
   Child=TextFont
         type=Courier

Terms:

  • XSL family: has various subsets to describe XML encoded data.
    W3C: XSL family
    • XSL: (Extensible Stylesheet Language) describes XML encoded data.
      W3C: XSL
    • XSLT: (XSL Transformations) maps XML document from one form to another. XSLT stylesheets are not procedural and often include a template to define output.
      W3C: XSLT
    • XSL-FO: (XSL Formatting Objects) define visual formatting of XML document.
      XML.com: Using XSL-FO
    • XPath: (XML Path Language) non-XML language used to find data (XML query) within an XML document.
      i.e.
      • Find root element: /*
      • Find all elements: //*
      W3C: XPath 1.0, W3C: XPath 2.0
  • XQuery: XML query language which includes XPath and procedural programming features.
    W3C XQuery
  • XPointer: address components of XML document. i.e. element(el1/2/1)
    Sun patent.
