NAME

XML::LibXML::Reader - interface to libxml2 pull parser

SYNOPSIS

use XML::LibXML::Reader;
my $reader = XML::LibXML::Reader->new(location => "file.xml")
       or die "cannot read file.xml\n";
while ($reader->read) {
  processNode($reader);
}
sub processNode {
    my $reader = shift;
    printf "%d %d %s %d\n", ($reader->depth,
                             $reader->nodeType,
                             $reader->name,
                             $reader->isEmptyElement);
}

or

my $reader = XML::LibXML::Reader->new(location => "file.xml")
       or die "cannot read file.xml\n";
$reader->preservePattern('//table/tr');
$reader->finish;
print $reader->document->toString(1);

DESCRIPTION

This is a Perl interface to libxml2's pull-parser implementation xmlTextReader (http://xmlsoft.org/html/libxml-xmlreader.html). This feature requires at least libxml2-2.6.21. Pull parsers (such as StAX in Java, or XmlReader in C#) use an iterator approach to parse XML documents. They are easier to program than event-based parsers (SAX) and much more lightweight than tree-based parsers (DOM), which load the complete tree into memory.

The Reader acts as a cursor going forward on the document stream and stopping at each node on the way. At every point, the DOM-like methods of the Reader object allow one to examine the current node (name, namespace, attributes, etc.).
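As a minimal sketch of examining the current node, the following reads a small document from a string, stops on one element, and inspects its name, namespace, and attributes. The sample XML and the namespace URI urn:example:x are made up for illustration.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;

# Sample document and namespace URI are assumptions made for this sketch.
my $xml = '<doc xmlns:x="urn:example:x"><x:item id="7" lang="en"/></doc>';
my $reader = XML::LibXML::Reader->new(string => $xml)
    or die "cannot create reader\n";

$reader->read;    # cursor is now on <doc>
$reader->read;    # cursor is now on <x:item>

# Properties of the current node.
my %info = (
    name      => $reader->name,            # qualified name
    localName => $reader->localName,       # name without the prefix
    prefix    => $reader->prefix,
    namespace => $reader->namespaceURI,
    depth     => $reader->depth,           # nesting level below the root
    empty     => $reader->isEmptyElement,  # true for <x:item .../>
);

# Walk the attributes of the current element, then return to the element.
my %attr;
if ($reader->moveToFirstAttribute == 1) {
    do {
        $attr{ $reader->name } = $reader->value;
    } while ($reader->moveToNextAttribute == 1);
    $reader->moveToElement;
}

printf "%s has %d attributes\n", $info{name}, scalar keys %attr;
```

The call to moveToElement matters: while the cursor sits on an attribute, name and value refer to that attribute and not to the owning element.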

The user's code keeps control of the progress and simply calls the read() function repeatedly to progress to the next node in document order. Other functions provide means for skipping complete sub-trees, or for skipping nodes until a specific element is reached.
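The two styles of skipping can be sketched as follows, assuming a small made-up document held in a string: nextElement() moves forward to the next element with a given name, and next() steps over the sub-tree of the current element.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;

# Sample document made up for this sketch (no whitespace between elements).
my $xml = '<library>'
        . '<book id="1"><title>Dune</title><notes><n>long</n></notes></book>'
        . '<book id="2"><title>Emma</title></book>'
        . '</library>';

# 1. Jump from one <title> element to the next, ignoring everything else.
my $reader = XML::LibXML::Reader->new(string => $xml)
    or die "cannot create reader\n";
my @titles;
while ($reader->nextElement('title') == 1) {
    $reader->read;    # step onto the text child of <title>
    push @titles, $reader->value
        if $reader->nodeType == XML_READER_TYPE_TEXT;
}

# 2. Skip a complete sub-tree with next().
my $skipper = XML::LibXML::Reader->new(string => $xml)
    or die "cannot create reader\n";
$skipper->nextElement('book');               # cursor on the first <book>
my $first = $skipper->getAttribute('id');
$skipper->next;                              # step over the first <book> sub-tree
my $second = $skipper->getAttribute('id');   # cursor on the second <book>

print "titles: @titles\n";
print "skipped from book $first to book $second\n";
```

The loops compare the return value with 1 because these methods return 0 at the end of the document and -1 on error, and -1 would count as true in a plain boolean test.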

At any time, only a very limited portion of the document is kept in memory, which makes the API more memory-efficient than DOM. However, it is also possible to mix the Reader with DOM. At any point the user may copy the current node (optionally expanded into a complete sub-tree) from the processed document to another DOM tree, or instruct the Reader to collect a sub-document in the form of a DOM tree consisting of selected nodes.
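A minimal sketch of mixing the Reader with DOM, assuming a made-up log document: entries with a particular attribute value are copied, as expanded sub-trees, into a separate DOM document built with XML::LibXML. The element and attribute names are illustrative only.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::Reader;

# Sample log document and element names are assumptions made for this sketch.
my $xml = '<log>'
        . '<entry level="info">started</entry>'
        . '<entry level="error">disk full</entry>'
        . '</log>';
my $reader = XML::LibXML::Reader->new(string => $xml)
    or die "cannot create reader\n";

# Separate DOM tree that collects the interesting nodes.
my $out  = XML::LibXML::Document->new('1.0', 'UTF-8');
my $root = $out->createElement('errors');
$out->setDocumentElement($root);

while ($reader->nextElement('entry') == 1) {
    my $level = $reader->getAttribute('level');
    next unless defined $level && $level eq 'error';

    # copyCurrentNode(1) expands the current node into a complete sub-tree.
    my $copy = $reader->copyCurrentNode(1);
    $root->appendChild($out->importNode($copy));
}

print $out->toString(1);
```

Only the copied entries end up in the second tree, so memory use grows with the number of selected nodes and not with the size of the input.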

The Reader API also supports namespaces, xml:base, entity handling, and DTD validation. Schema and RelaxNG validation support will probably be added in some later revision of the Perl interface.

The naming of methods compared to libxml2 and C# XmlTextReader has been changed slightly to match the conventions of XML::LibXML. Some functions have been changed or added with respect to the C interface.

CONSTRUCTOR

Depending on the XML source, the Reader object can be created with either of:

my $reader = XML::LibXML::Reader->new( location => "file.xml", ... );
my $reader = XML::LibXML::Reader->new( string => $xml_string, ... );
my $reader = XML::LibXML::Reader->new( IO => $file_handle, ... );
my $reader = XML::LibXML::Reader->new( FD => fileno(STDIN), ... );
my $reader = XML::LibXML::Reader->new( DOM => $dom, ... );

where ... are (optional) reader options described below in "Reader options" or various parser options described in XML::LibXML::Parser. The constructor recognizes the following XML sources:

Source specification
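As a hedged illustration of two of the sources shown in the constructor calls above, the sketch below parses the same made-up document once from a string and once from a file name. The helper element_names and the temporary file are assumptions introduced for the example, not part of the module.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use XML::LibXML::Reader;

my $xml = '<greeting><to>World</to></greeting>';

# Helper written for this sketch: collect the names of all start tags.
sub element_names {
    my ($reader) = @_;
    my @names;
    while ($reader->read == 1) {
        push @names, $reader->name
            if $reader->nodeType == XML_READER_TYPE_ELEMENT;
    }
    return @names;
}

# Source 1: an in-memory string.
my $from_string = XML::LibXML::Reader->new(string => $xml)
    or die "cannot parse string\n";
my @names_from_string = element_names($from_string);

# Source 2: a file name (here a temporary file created for the example).
my ($fh, $path) = tempfile(SUFFIX => '.xml', UNLINK => 1);
print {$fh} $xml;
close $fh or die "cannot close $path: $!\n";
my $from_file = XML::LibXML::Reader->new(location => $path)
    or die "cannot read $path\n";
my @names_from_file = element_names($from_file);

print "string: @names_from_string\n";
print "file:   @names_from_file\n";
```

Both readers are driven by the same loop; only the named argument passed to new() differs.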

Reader options

METHODS CONTROLLING PARSING PROGRESS

METHODS EXTRACTING INFORMATION

METHODS EXTRACTING DOM NODES

METHODS PROCESSING ATTRIBUTES

OTHER METHODS

DESTRUCTION

XML::LibXML takes care of the reader object destruction when the last reference to the reader object goes out of scope. The document tree is preserved, though, if either of $reader->document or $reader->preserveNode was used and references to the document tree exist.
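The following sketch, based on a made-up document, shows a preserved tree outliving its reader: the reader is confined to an inner block, while the document obtained from it is still used afterwards.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;

# Sample document made up for this sketch.
my $xml = '<html><body><p>intro</p>'
        . '<table><tr><td>1</td></tr><tr><td>2</td></tr></table>'
        . '</body></html>';

my $doc;
{
    my $reader = XML::LibXML::Reader->new(string => $xml)
        or die "cannot create reader\n";
    $reader->preservePattern('//table/tr');   # mark matching nodes to be kept
    $reader->finish;                          # run to the end of the document
    $doc = $reader->document;                 # DOM tree holding the preserved nodes
}   # $reader goes out of scope here; $doc still references the tree

my @rows = $doc->findnodes('//table/tr');
printf "%d preserved rows\n", scalar @rows;
```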

NODE TYPES

The reader interface provides the following constants for node types (the constant symbols are exported by default or if tag :types is used).

XML_READER_TYPE_NONE                    => 0
XML_READER_TYPE_ELEMENT                 => 1
XML_READER_TYPE_ATTRIBUTE               => 2
XML_READER_TYPE_TEXT                    => 3
XML_READER_TYPE_CDATA                   => 4
XML_READER_TYPE_ENTITY_REFERENCE        => 5
XML_READER_TYPE_ENTITY                  => 6
XML_READER_TYPE_PROCESSING_INSTRUCTION  => 7
XML_READER_TYPE_COMMENT                 => 8
XML_READER_TYPE_DOCUMENT                => 9
XML_READER_TYPE_DOCUMENT_TYPE           => 10
XML_READER_TYPE_DOCUMENT_FRAGMENT       => 11
XML_READER_TYPE_NOTATION                => 12
XML_READER_TYPE_WHITESPACE              => 13
XML_READER_TYPE_SIGNIFICANT_WHITESPACE  => 14
XML_READER_TYPE_END_ELEMENT             => 15
XML_READER_TYPE_END_ENTITY              => 16
XML_READER_TYPE_XML_DECLARATION         => 17
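A short sketch of dispatching on these constants, assuming a made-up input string: each node visited by read() is classified by comparing nodeType with the exported symbols.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;    # node type constants are exported by default

# Sample document made up for this sketch.
my $reader = XML::LibXML::Reader->new(
    string => '<a><!-- note --><b>text</b><c/></a>'
) or die "cannot create reader\n";

my @events;
while ($reader->read == 1) {
    my $type = $reader->nodeType;
    if ($type == XML_READER_TYPE_ELEMENT) {
        push @events, 'start:' . $reader->name;
    }
    elsif ($type == XML_READER_TYPE_END_ELEMENT) {
        push @events, 'end:' . $reader->name;
    }
    elsif ($type == XML_READER_TYPE_TEXT) {
        push @events, 'text:' . $reader->value;
    }
    elsif ($type == XML_READER_TYPE_COMMENT) {
        push @events, 'comment';
    }
}

print join(', ', @events), "\n";
```

An empty element such as <c/> is reported once as XML_READER_TYPE_ELEMENT and has no matching XML_READER_TYPE_END_ELEMENT node; isEmptyElement can be used to detect this case.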

STATES

The following constants represent the values returned by readState(). They are exported by default, or if tag :states is used:

XML_READER_NONE      => -1
XML_READER_START     =>  0
XML_READER_ELEMENT   =>  1
XML_READER_END       =>  2
XML_READER_EMPTY     =>  3
XML_READER_BACKTRACK =>  4
XML_READER_DONE      =>  5
XML_READER_ERROR     =>  6

SEE ALSO

XML::LibXML::Pattern for information about compiled patterns.

http://xmlsoft.org/html/libxml-xmlreader.html

http://dotgnu.org/pnetlib-doc/System/Xml/XmlTextReader.html

ORIGINAL IMPLEMENTATION

Heiko Klein, <H.Klein@gmx.net> and Petr Pajas

AUTHORS

Matt Sergeant, Christian Glahn, Petr Pajas

VERSION

2.0128

COPYRIGHT

2001-2007, AxKit.com Ltd.

2002-2006, Christian Glahn.

2006-2009, Petr Pajas.

LICENSE

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.