Implementation of DOM (Core) Interface in C++


This DOM implementation is based on expat, the XML parser written by James Clark. expat is an event-driven parser: it reports what it finds in the document through callbacks.

Contents

The directory interface contains the DOM interface, written in C++. It can be regarded as a C++ language binding of the DOM interface, obtained by translating the Java language binding into C++. Users of the DOM can consult it to see which functions are available, how they must be called, what they return, and so on.

The directory include contains the actual header files that are used to compile C++DOM and that must be included by any program that uses it. They are essentially the same files as those in the directory interface, with the additions that the implementation requires. There are also some extra files for representing the structure of a DTD; these are not mentioned in the DOM specification and are specific to our implementation.

The directory src contains the implementation of the objects of the DOM interface. It also contains some files related to the parsing of XML and DTD files.

The directory lib contains DOM.a and expat.a, two archive files that contain the object code necessary for linking.

The directory sample contains a sample program that shows how to use C++DOM. It takes the names of an XML file and a DTD file from the command line, builds a DOM representation of the document in memory, then prints it out by traversing the structure.
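The traversal step of such a program can be sketched as follows. This is only an illustration: the Node struct and the functions makeNode, print and toString are hypothetical stand-ins invented for this example and are not the C++DOM classes; the sample directory shows the real API.

```cpp
#include <cassert>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a DOM node (not the C++DOM class hierarchy).
struct Node {
    enum Kind { Element, Text } kind;
    std::string value;                           // tag name or text content
    std::vector<std::unique_ptr<Node>> children;
};

// Creates a node of the given kind with no children.
std::unique_ptr<Node> makeNode(Node::Kind kind, const std::string& value) {
    auto n = std::make_unique<Node>();
    n->kind = kind;
    n->value = value;
    return n;
}

// Depth-first traversal that writes the tree out as indented markup.
void print(const Node& node, std::ostream& out, int depth = 0) {
    const std::string indent(depth * 2, ' ');
    if (node.kind == Node::Text) {
        out << indent << node.value << '\n';
        return;
    }
    out << indent << '<' << node.value << ">\n";
    for (const auto& child : node.children) {
        assert(child != nullptr);                // the tree must not contain holes
        print(*child, out, depth + 1);
    }
    out << indent << "</" << node.value << ">\n";
}

// Convenience wrapper that returns the printed tree as a string.
std::string toString(const Node& root) {
    std::ostringstream out;
    print(root, out);
    return out.str();
}
```

The sketch assumes a C++14 compiler (for std::make_unique). A real program would obtain the tree from the parser instead of building it by hand.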

The directory dtdparse contains a DTD parser that is built using flex and yacc. The files lex.yy.c and y.tab.c from this directory are copied to the directory src, where they are included in another file.

Recompilation of C++DOM

First, go to the directory expat and type make. This creates the archive file expat.a, which contains the object code necessary for linking, and puts it into the directory lib.

Then type make in the directory DOM. This creates an object file for each class and archives them into DOM.a in the directory lib.

If the DTD parser is also recompiled (there is no need to do this unless you modify it), the files lex.yy.c and y.tab.c must be copied to src, and the line #include "lex.yy.c" inside y.tab.c must be changed to #include "src/lex.yy.c".

Using C++DOM

Include DOM.h from the directory include and link with DOM.a and expat.a from the directory lib. The program in the directory sample shows how it is used.

Extensions to the DOM Specification with Respect to DTD

In order to represent the DTD with objects in memory, we have used some additional objects that are not defined by the DOM specification. These include ElementType, AttrType, ContentType and PCDATA.

The DocumentType object keeps a hash (namedNodeMap) of all ElementType-s and a hash of all AttrType-s. An ElementType has a pointer to a ContentType object (which describes what an element of this type can contain) and its own multiplicity (*, ?, +). An AttrType contains a pointer to the ElementType to which it belongs, its type, its default value, etc. A PCDATA object serves to represent #PCDATA.
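The relationships between these objects can be sketched roughly as below. The field names, the helpers declareElement and declareAttr, the use of std::unordered_map, and the "element/attribute" key for attribute types are all assumptions made for this illustration; the actual declarations are in the directory include.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

struct ContentType;                          // content model, left out of this sketch

// Hypothetical layout of an element declaration.
struct ElementType {
    std::string name;
    const ContentType* content = nullptr;    // what elements of this type may contain
    char multiplicity = '\0';                // '*', '?', '+' or '\0' for none
};

// Hypothetical layout of an attribute declaration.
struct AttrType {
    std::string name;
    const ElementType* owner = nullptr;      // the ElementType it belongs to
    std::string type;                        // for example "CDATA" or "ID"
    std::string defaultValue;
};

struct DocumentType {
    std::unordered_map<std::string, ElementType> elementTypes;
    std::unordered_map<std::string, AttrType> attrTypes;  // key: "element/attribute" (assumed)

    // Registers an element type under its name and returns it.
    ElementType& declareElement(const std::string& name) {
        ElementType& e = elementTypes[name];
        e.name = name;
        return e;
    }

    // Registers an attribute type for an element type that is already declared.
    AttrType& declareAttr(const std::string& elementName, const std::string& attrName,
                          const std::string& type, const std::string& defaultValue) {
        auto it = elementTypes.find(elementName);
        assert(it != elementTypes.end());    // the owning element must exist
        AttrType& a = attrTypes[elementName + "/" + attrName];
        a.name = attrName;
        a.owner = &it->second;               // stays valid: rehashing does not move elements
        a.type = type;
        a.defaultValue = defaultValue;
        return a;
    }
};
```

Storing the owner as a pointer into the element map is safe here because std::unordered_map keeps references to its elements valid when it grows.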

A ContentType object represents in memory a DTD content group of the form (c1, c2, ...)*, where the comma may instead be a '|' (pipe) and the '*' may instead be '+', '?' or nothing. It has a list of children, where each child can be another ContentType object, an ElementType object, or #PCDATA. One field records whether the children are separated by ',' or by '|', and another field records the multiplicity.
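A minimal sketch of such a structure is given below, together with a function that writes it back out in DTD syntax. The names ContentItem, element, pcdata, group and toDtd are hypothetical and chosen only for this example; for brevity one struct plays the role of ContentType, ElementType reference and #PCDATA, which is a simplification of the design described above.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical sketch; not taken from the C++DOM headers.
struct ContentItem {
    enum Kind { Group, ElementRef, PCData } kind = Group;
    std::string name;                        // element name, for ElementRef
    char separator = ',';                    // ',' or '|', for Group
    char multiplicity = '\0';                // '*', '+', '?' or '\0' for none
    std::vector<std::unique_ptr<ContentItem>> children;  // for Group
};

// Builds a reference to an element, optionally with a multiplicity.
std::unique_ptr<ContentItem> element(const std::string& name, char mult = '\0') {
    auto item = std::make_unique<ContentItem>();
    item->kind = ContentItem::ElementRef;
    item->name = name;
    item->multiplicity = mult;
    return item;
}

// Builds the #PCDATA leaf.
std::unique_ptr<ContentItem> pcdata() {
    auto item = std::make_unique<ContentItem>();
    item->kind = ContentItem::PCData;
    return item;
}

// Builds an empty group; children are appended by the caller.
std::unique_ptr<ContentItem> group(char separator, char mult) {
    auto item = std::make_unique<ContentItem>();
    item->separator = separator;
    item->multiplicity = mult;
    return item;
}

// Writes the content model back out in DTD syntax.
std::string toDtd(const ContentItem& item) {
    std::string text;
    switch (item.kind) {
    case ContentItem::PCData:
        text = "#PCDATA";
        break;
    case ContentItem::ElementRef:
        text = item.name;
        break;
    case ContentItem::Group:
        text = "(";
        for (std::size_t i = 0; i < item.children.size(); ++i) {
            assert(item.children[i] != nullptr);
            if (i > 0) text += item.separator;
            text += toDtd(*item.children[i]);
        }
        text += ")";
        break;
    }
    if (item.multiplicity != '\0') text += item.multiplicity;
    return text;
}
```

For example, a group built with group('|', '*') whose children are pcdata() and element("em") is written out as (#PCDATA|em)*.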

What is not implemented

Not everything in the DOM specification has been implemented (for lack of time, because we didn't need it, because we didn't understand it very well, or for some other reason). Entities, notations and entity references have been left out almost entirely. Some other things mentioned in the DOM specification, such as readonly nodes, some exceptions and normalization, have not been implemented either.

Although the DTD of a document is represented in memory, the implementation does not yet perform validation. We think that this can be done by pattern matching of the DTD tree against the XML tree (by traversing them at the same time).
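One possible way to do such matching for a single element is sketched below: the sequence of child element names is matched against a content model by tracking the set of positions each part of the model can reach. This is not code from C++DOM and not necessarily the approach the authors had in mind; Model, elem, seq, choice, match, matchOnce and isValid are hypothetical names, and a full validator would repeat the check for every element of the XML tree.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical content model node (a simplified ContentType).
struct Model {
    enum Kind { Name, Seq, Choice } kind;
    std::string name;                 // used when kind == Name
    char mult;                        // '\0', '?', '*' or '+'
    std::vector<Model> children;      // used by Seq and Choice
};

Model elem(const std::string& n, char mult = '\0') { return Model{Model::Name, n, mult, {}}; }
Model seq(std::vector<Model> parts, char mult = '\0') { return Model{Model::Seq, "", mult, std::move(parts)}; }
Model choice(std::vector<Model> alts, char mult = '\0') { return Model{Model::Choice, "", mult, std::move(alts)}; }

using Names = std::vector<std::string>;
using Positions = std::set<std::size_t>;

Positions matchOnce(const Model& m, const Names& names, std::size_t pos);

// All positions reachable from pos after matching m, honouring its multiplicity.
Positions match(const Model& m, const Names& names, std::size_t pos) {
    Positions result;
    if (m.mult == '?' || m.mult == '*') result.insert(pos);   // zero occurrences
    const bool repeat = (m.mult == '*' || m.mult == '+');
    Positions frontier = matchOnce(m, names, pos);
    while (!frontier.empty()) {
        Positions next;
        for (std::size_t p : frontier) {
            if (!result.insert(p).second) continue;           // already explored
            if (repeat) {
                Positions more = matchOnce(m, names, p);
                next.insert(more.begin(), more.end());
            }
        }
        frontier = next;
    }
    return result;
}

// Positions reachable after exactly one occurrence of m, ignoring its multiplicity.
Positions matchOnce(const Model& m, const Names& names, std::size_t pos) {
    assert(pos <= names.size());
    Positions ends;
    switch (m.kind) {
    case Model::Name:
        if (pos < names.size() && names[pos] == m.name) ends.insert(pos + 1);
        break;
    case Model::Choice:
        for (const Model& alt : m.children) {
            Positions e = match(alt, names, pos);
            ends.insert(e.begin(), e.end());
        }
        break;
    case Model::Seq:
        ends.insert(pos);
        for (const Model& part : m.children) {
            Positions next;
            for (std::size_t p : ends) {
                Positions e = match(part, names, p);
                next.insert(e.begin(), e.end());
            }
            ends = next;
        }
        break;
    }
    return ends;
}

// True if the whole list of child names is accepted by the model.
bool isValid(const Model& m, const Names& childNames) {
    return match(m, childNames, 0).count(childNames.size()) > 0;
}
```

With the model seq({elem("title"), elem("para", '+')}), which corresponds to (title, para+), the child list title, para, para is accepted and the child list title alone is rejected. The sketch assumes a compiler that accepts std::vector of an incomplete type inside the struct (guaranteed from C++17).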

Another thing that we think is necessary for validation (but is not implemented) is that each DOM object should have a pointer to the type that it belongs to: each Element should have a pointer to its ElementType, each Attr to its AttrType, and so on. This would let the implementor check easily whether the insertion of a new node violates the rules of the DTD, which would make it easy to validate the editing of an XML document.
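The idea of a type pointer can be illustrated with the sketch below, in which an element refuses a child that its type does not allow. It is not part of C++DOM: the Element and ElementType structs here are hypothetical, and the content model is reduced to a plain set of permitted child names, ignoring order and multiplicity.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified element declaration.
struct ElementType {
    std::string name;
    std::set<std::string> allowedChildren;   // stands in for a full content model
};

// Hypothetical element that remembers its declaration.
struct Element {
    const ElementType* type;                 // the proposed pointer to the type
    std::vector<std::unique_ptr<Element>> children;

    explicit Element(const ElementType& t) : type(&t) {}

    // Inserts the child only if this element's type permits it.
    bool appendChild(std::unique_ptr<Element> child) {
        assert(child != nullptr);
        if (type->allowedChildren.count(child->type->name) == 0)
            return false;                    // insertion would violate the DTD
        children.push_back(std::move(child));
        return true;
    }
};
```

Given a section type that allows only para children, appending a para element succeeds, while appending another section is rejected and leaves the tree unchanged.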

Download

View: DOM/, and download: DOM.tgz

Copying

C++DOM is an implementation of the DOM (Core) interface in C++. Copyright (C) 1999 Dashamir Hoxha & Aurel Cami.

C++DOM is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

C++DOM is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with C++DOM; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA