Programming Guide

This page has sections on the following topics:

SAX1 Programming Guide

Constructing a parser

In order to use Xerces-C to parse XML files, you will need to create an instance of the SAXParser class. The example below shows the code you need in order to create an instance of SAXParser. The DocumentHandler and ErrorHandler instances required by the SAX API are provided using the HandlerBase class supplied with Xerces-C.

    int main (int argc, char* args[]) {

        try {
            XMLPlatformUtils::Initialize();
        }
        catch (const XMLException& toCatch) {
            cout << "Error during initialization! :\n"
                 << toCatch.getMessage() << "\n";
            return 1;
        }

        const char* xmlFile = "x1.xml";
        SAXParser* parser = new SAXParser();
        parser->setDoValidation(true);    // optional
        parser->setDoNamespaces(true);    // optional

        // HandlerBase derives from both DocumentHandler and ErrorHandler,
        // so one object can serve as both handlers without any cast.
        HandlerBase* handler = new HandlerBase();
        parser->setDocumentHandler(handler);
        parser->setErrorHandler(handler);

        try {
            parser->parse(xmlFile);
        }
        catch (const XMLException& toCatch) {
            cout << "\nFile not found: '" << xmlFile << "'\n"
                 << "Exception message is: \n"
                 << toCatch.getMessage() << "\n";
            return -1;
        }

        delete parser;
        delete handler;
        return 0;
    }

Using the SAX API

The SAX API for XML parsers was originally developed for Java. Please be aware that there is no standard SAX API for C++, and that use of the Xerces-C SAX API does not guarantee client code compatibility with other C++ XML parsers.

The SAX API presents a callback-based API to the parser. An application that uses SAX provides an instance of a handler class to the parser. When the parser detects XML constructs, it calls the methods of the handler class, passing them information about the construct that was detected. The most commonly used handler classes are DocumentHandler, which is called when XML constructs are recognized, and ErrorHandler, which is called when an error occurs.
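The callback flow described above can be sketched without Xerces-C at all. The following is only an illustration of the pattern, not the library's API: ToyDocumentHandler, ToyParser, ElementCollector and collectElementNames are hypothetical names, and the toy parser recognizes only plain <name> and </name> tags (no attributes, comments or self-closing tags).

```cpp
#include <string>
#include <vector>

// Hypothetical handler interface (NOT the Xerces-C DocumentHandler).
// The default callbacks do nothing, so subclasses override only what they need.
class ToyDocumentHandler {
public:
    virtual ~ToyDocumentHandler() {}
    virtual void startElement(const std::string&) {}
    virtual void endElement(const std::string&) {}
};

// Application-side handler: records the name of every element that starts.
class ElementCollector : public ToyDocumentHandler {
public:
    std::vector<std::string> seen;
    void startElement(const std::string& name) override { seen.push_back(name); }
};

// Hypothetical toy parser: scans for <name> and </name> and fires callbacks.
class ToyParser {
public:
    ToyParser() : handler_(nullptr) {}
    void setDocumentHandler(ToyDocumentHandler* handler) { handler_ = handler; }

    void parse(const std::string& text) {
        if (!handler_) return;
        std::string::size_type pos = 0;
        while ((pos = text.find('<', pos)) != std::string::npos) {
            std::string::size_type end = text.find('>', pos);
            if (end == std::string::npos) break;       // unterminated tag: stop
            std::string tag = text.substr(pos + 1, end - pos - 1);
            if (!tag.empty() && tag[0] == '/')
                handler_->endElement(tag.substr(1));   // closing tag
            else if (!tag.empty())
                handler_->startElement(tag);           // opening tag
            pos = end + 1;
        }
    }

private:
    ToyDocumentHandler* handler_;
};

// Convenience wrapper: register a handler, parse, return what it collected.
std::vector<std::string> collectElementNames(const std::string& xml) {
    ElementCollector collector;
    ToyParser parser;
    parser.setDocumentHandler(&collector);
    parser.parse(xml);
    return collector.seen;
}
```

The application never walks the document itself; it hands the parser a handler object and reacts to whatever the parser reports, which is the same division of labour the SAX handler classes rely on.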
The header files for the various SAX handler classes are in '<xerces-c1_4_0>/include/sax'.

As a convenience, Xerces-C provides the class HandlerBase, a single class that is publicly derived from all the Handler classes. HandlerBase's default implementation of each handler callback method does nothing. A convenient way to get started with Xerces-C is to derive your own handler class from HandlerBase and override just those methods in HandlerBase that you are interested in customizing. This simple example shows how to create a handler that prints element names and fatal error messages. The source code for the sample applications shows additional examples of how to write handler classes.

This is the header file MySAXHandler.hpp:

    #include <sax/HandlerBase.hpp>

    class MySAXHandler : public HandlerBase {
    public:
        MySAXHandler();
        void startElement(const XMLCh* const, AttributeList&);
        void fatalError(const SAXParseException&);
    };

This is the implementation file MySAXHandler.cpp:

    #include "MySAXHandler.hpp"
    #include <iostream.h>

    MySAXHandler::MySAXHandler()
    {
    }

    void MySAXHandler::startElement(const XMLCh* const name,
                                    AttributeList& attributes)
    {
        // transcode() is a user application defined function which
        // converts unicode strings to usual 'char *'. Look at
        // the sample program SAXCount for an example implementation.
        cout << "I saw element: " << transcode(name) << endl;
    }

    void MySAXHandler::fatalError(const SAXParseException& exception)
    {
        cout << "Fatal Error: " << transcode(exception.getMessage())
             << " at line: " << exception.getLineNumber()
             << endl;
    }

The XMLCh and AttributeList types are supplied by Xerces-C and are documented in the include files. Examples of their usage appear in the source code to the sample applications.

SAX2 Programming Guide

Constructing an XML Reader

In order to use Xerces-C to parse XML files, you will need to create an instance of the SAX2XMLReader class. The example below shows the code you need in order to create an instance of SAX2XMLReader.
The ContentHandler and ErrorHandler instances required by the SAX API are provided using the DefaultHandler class supplied with Xerces-C.

    int main (int argc, char* args[]) {

        try {
            XMLPlatformUtils::Initialize();
        }
        catch (const XMLException& toCatch) {
            cout << "Error during initialization! :\n"
                 << toCatch.getMessage() << "\n";
            return 1;
        }

        const char* xmlFile = "x1.xml";
        SAX2XMLReader* parser = XMLReaderFactory::createXMLReader();
        parser->setFeature(XMLString::transcode("http://xml.org/sax/features/validation"), true);   // optional
        parser->setFeature(XMLString::transcode("http://xml.org/sax/features/namespaces"), true);   // optional

        // DefaultHandler derives from both ContentHandler and ErrorHandler,
        // so one object can serve as both handlers without any cast.
        DefaultHandler* handler = new DefaultHandler();
        parser->setContentHandler(handler);
        parser->setErrorHandler(handler);

        try {
            parser->parse(xmlFile);
        }
        catch (const XMLException& toCatch) {
            cout << "\nFile not found: '" << xmlFile << "'\n"
                 << "Exception message is: \n"
                 << toCatch.getMessage() << "\n";
            return -1;
        }

        delete parser;
        delete handler;
        return 0;
    }

Using the SAX2 API

The SAX2 API for XML parsers was originally developed for Java. Please be aware that there is no standard SAX2 API for C++, and that use of the Xerces-C SAX2 API does not guarantee client code compatibility with other C++ XML parsers.

The SAX2 API presents a callback-based API to the parser. An application that uses SAX2 provides an instance of a handler class to the parser. When the parser detects XML constructs, it calls the methods of the handler class, passing them information about the construct that was detected. The most commonly used handler classes are ContentHandler, which is called when XML constructs are recognized, and ErrorHandler, which is called when an error occurs. The header files for the various SAX2 handler classes are in '<xerces-c1_4_0>/include/sax2'.

As a convenience, Xerces-C provides the class DefaultHandler, a single class that is publicly derived from all the Handler classes.
DefaultHandler's default implementation of each handler callback method does nothing. A convenient way to get started with Xerces-C is to derive your own handler class from DefaultHandler and override just those methods in DefaultHandler that you are interested in customizing. This simple example shows how to create a handler that prints element names and fatal error messages. The source code for the sample applications shows additional examples of how to write handler classes.

This is the header file MySAX2Handler.hpp:

    #include <sax2/DefaultHandler.hpp>

    class MySAX2Handler : public DefaultHandler {
    public:
        MySAX2Handler();
        void startElement(
            const XMLCh* const uri,
            const XMLCh* const localname,
            const XMLCh* const qname,
            const Attributes& attrs
        );
        void fatalError(const SAXParseException&);
    };

This is the implementation file MySAX2Handler.cpp:

    #include "MySAX2Handler.hpp"
    #include <iostream.h>

    MySAX2Handler::MySAX2Handler()
    {
    }

    void MySAX2Handler::startElement(const XMLCh* const uri,
                                     const XMLCh* const localname,
                                     const XMLCh* const qname,
                                     const Attributes& attrs)
    {
        // transcode() is a user application defined function which
        // converts unicode strings to usual 'char *'. Look at
        // the sample program SAX2Count for an example implementation.
        cout << "I saw element: " << transcode(qname) << endl;
    }

    void MySAX2Handler::fatalError(const SAXParseException& exception)
    {
        cout << "Fatal Error: " << transcode(exception.getMessage())
             << " at line: " << exception.getLineNumber()
             << endl;
    }

The XMLCh and Attributes types are supplied by Xerces-C and are documented in the include files. Examples of their usage appear in the source code to the sample applications.

Xerces SAX2 Supported Features

The behavior of the SAX2XMLReader is dependent on the values of the following features.
All of the features below can be set using the
DOM Programming Guide

Java and C++ DOM comparisons

The C++ DOM API is very similar in design and use to the Java DOM API bindings. As a consequence, converting existing Java code that uses the DOM to C++ is a straightforward process. This section outlines the differences between the Java and C++ bindings.

Accessing the API from application code

    // C++
    #include <dom/DOM.hpp>

    // Java
    import org.w3c.dom.*;

The header file <dom/DOM.hpp> includes all the individual headers for the DOM API classes.

Class Names

The C++ class names are prefixed with "DOM_". The intent is to prevent conflicts between DOM class names and other names that may already be in use by an application or by other libraries that a DOM-based application must link with.

The use of C++ namespaces would also have solved this conflict problem, but for the fact that many compilers do not yet support them.

    DOM_Document   myDocument;   // C++
    DOM_Node       aNode;
    DOM_Text       someText;

    Document       myDocument;   // Java
    Node           aNode;
    Text           someText;

If you wish to use the Java class names in C++, then you need to typedef them in C++. This is not advisable for the general case - conflicts really do occur - but can be very useful when converting a body of existing Java code to C++.

    typedef DOM_Document  Document;
    typedef DOM_Node      Node;

    Document   myDocument;   // Now C++ usage is
                             // indistinguishable from Java
    Node       aNode;

Objects and Memory Management

The C++ DOM implementation uses automatic memory management, implemented using reference counting. As a result, the C++ code for most DOM operations is very similar to the equivalent Java code, right down to the use of factory methods in the DOM document class for nearly all object creation, and the lack of any explicit object deletion.
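To make the reference-counting idea concrete, here is a minimal sketch of a handle class whose copies all refer to one shared implementation object. It is not the Xerces-C implementation: ToyNode and ToyNodeImpl are hypothetical names, and std::shared_ptr is assumed here as a stand-in for whatever counting scheme the library uses internally.

```cpp
#include <memory>
#include <string>

// Hypothetical implementation object that the handles refer to.
struct ToyNodeImpl {
    std::string name;
    explicit ToyNodeImpl(const std::string& n) : name(n) {}
};

// Hypothetical handle class. Copying a handle copies the reference,
// not the node; the node is freed when the last handle goes away.
class ToyNode {
public:
    ToyNode() {}                                   // a "null" handle
    explicit ToyNode(const std::string& name)
        : impl_(std::make_shared<ToyNodeImpl>(name)) {}

    bool isNull() const { return !impl_; }
    const std::string& getNodeName() const { return impl_->name; }

    // Identity comparison: do both handles refer to the same node?
    bool operator==(const ToyNode& other) const { return impl_ == other.impl_; }

    // Number of handles currently sharing the node (0 for a null handle).
    long useCount() const { return impl_.use_count(); }

private:
    std::shared_ptr<ToyNodeImpl> impl_;
};
```

With a design like this, application code declares handles by value and never calls delete on a node, which is why the C++ code can look so much like the Java code.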
Consider the following code snippets:

    // This is C++
    DOM_Node       aNode;
    aNode = someDocument.createElement("ElementName");
    DOM_Node docRootNode = someDocument.getDocumentElement();
    docRootNode.appendChild(aNode);

    // This is Java
    Node       aNode;
    aNode = someDocument.createElement("ElementName");
    Node docRootNode = someDocument.getDocumentElement();
    docRootNode.appendChild(aNode);

The Java and the C++ are identical on the surface, except for the class names, and this similarity remains true for most DOM code.

However, Java and C++ handle objects in somewhat different ways, making it important to understand a little bit of what is going on beneath the surface.

In Java, the variable In C++ the variable

Key points to remember when using the C++ DOM classes:
DOMString

Class DOMString provides the mechanism for passing string data to and from the DOM API. DOMString is not intended to be a completely general string class, but rather to meet the specific needs of the DOM API.

The design derives from two primary sources: from the DOM's CharacterData interface and from class

Main features are:
When a string is passed into a method of the DOM, when setting the value of a Node, for example, the string is cloned so that any subsequent alteration or reuse of the string by the application will not alter the document contents. Similarly, when strings from the document are returned to an application via the DOM API, the string is cloned so that the document can not be inadvertently altered by subsequent edits to the string.
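The cloning behavior described above can be sketched as follows. This is an illustration only, not Xerces-C code: ToyString and ToyTextNode are hypothetical classes, with a shared buffer standing in for the string's underlying storage. The point is that the node clones on the way in and on the way out, so edits made through the application's handle never reach the document.

```cpp
#include <memory>
#include <string>

// Hypothetical string handle: copies of a ToyString share one buffer,
// while clone() creates a separate buffer with the same content.
class ToyString {
public:
    explicit ToyString(const char* text)
        : buf_(std::make_shared<std::string>(text)) {}

    ToyString clone() const { return ToyString(buf_->c_str()); }
    void appendData(const char* more) { buf_->append(more); }

    // Content comparison versus identity comparison.
    bool equals(const ToyString& other) const { return *buf_ == *other.buf_; }
    bool operator==(const ToyString& other) const { return buf_ == other.buf_; }

    const std::string& str() const { return *buf_; }

private:
    std::shared_ptr<std::string> buf_;
};

// Hypothetical node that clones strings crossing its API boundary.
class ToyTextNode {
public:
    ToyTextNode() : value_("") {}

    // Clone on the way in: later edits to 'value' do not alter the node.
    void setNodeValue(const ToyString& value) { value_ = value.clone(); }

    // Clone on the way out: edits to the result do not alter the node.
    ToyString getNodeValue() const { return value_.clone(); }

private:
    ToyString value_;
};
```

In this sketch, appending to a string after passing it to setNodeValue() changes only the application's copy, and appending to a string obtained from getNodeValue() leaves the node's stored value untouched.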
Equality Testing

The DOMString equality operators (and all of the rest of the DOM class conventions) are modeled after the Java equivalents. The equals() method compares the content of the string, while the == operator checks whether the string reference variables (the application program variables) refer to the same underlying string in memory. This is also true of DOM_Node, DOM_Element, etc., in that operator == tells whether the variables in the application are referring to the same actual node or not. It's all very Java-like.
Here is an example of how the equality operators work:

    DOMString a = "Hello";
    DOMString b = a;
    DOMString c = a.clone();
    if (b == a)           //  This is true
    if (a == c)           //  This is false
    if (a.equals(c))      //  This is true
    b = b + " World";
    if (b == a)           //  Still true, and the string's
                          //    value is "Hello World"
    if (a.equals(c))      //  false.  a is "Hello World";
                          //    c is still "Hello".

Downcasting

Application code sometimes must cast an object reference from DOM_Node to one of the classes deriving from DOM_Node (DOM_Element, for example). The syntax for doing this in C++ is different from that in Java.

    // This is C++
    DOM_Node       aNode = someFunctionReturningNode();
    DOM_Element    el    = (DOM_Element &) aNode;

    // This is Java
    Node       aNode = someFunctionReturningNode();
    Element    el    = (Element) aNode;

The C++ cast is not type-safe; the Java cast is checked for compatible types at runtime. If necessary, a type-check can be made in C++ using the node type information:

    // This is C++
    DOM_Node       aNode = someFunctionReturningNode();
    DOM_Element    el;    // by default, el will == null.

    if (aNode.getNodeType() == DOM_Node::ELEMENT_NODE)
        el = (DOM_Element &) aNode;
    else
        // aNode does not refer to an element.
        // Do something to recover here.

Subclassing

The C++ DOM classes, DOM_Node, DOM_Attr, DOM_Document, etc., are not designed to be subclassed by an application program.

As an alternative, the DOM_Node class provides a User Data field for use by applications as a hook for extending nodes by referencing additional data or objects. See the API description for DOM_Node for details.

Copyright © 2000 The Apache Software Foundation. All Rights Reserved.