NAME

XML::XForms::Validate - Perl extension for validation of XForms submissions

SYNOPSIS

use XML::XForms::Validate qw(validate);

# For method="post":
$result = validate(input => $filename, xforms => $file, base => '../instances', model => 'form2');
die $result if !ref($result);

# For method="get", method="urlencoded-post" or method="form-data-post":
$result = validate(input => \%parameters, xforms => \$xml_string);
die $result if !ref($result);

# OO usage:
my $validator = XML::XForms::Validate->new(xforms => \$xml_string, model => $model, base => $base);
$result = $validator->validate(input => $input);
die $result if !ref($result);
$result = $validator->normalize($validator->validate(input => $input2));
die $result if !ref($result);

DESCRIPTION

This module validates input data against an XML document containing one or more XForms models. It is able to process all serializations except multipart/related, relying on pre-parsed data for multipart/form-data or application/x-www-form-urlencoded.

Usage is rather simple: Supply input data (usually a submitted XML instance), an XML document containing one or more XForms models, and possibly some optional arguments. The return value is a hash of validated (and possibly modified) result DOM trees, one entry per original instance, or an error message string if validation failed.

Since XForms is a sufficiently complex standard to make perfect validation of submission data impossible in the general case, some assumptions must be made. Most forms should work fine, but it is possible (and easy, if you know how) to create forms that yield submissions which are rejected as invalid. Likewise, some constructions can allow invalid submissions to pass as valid. These limitations are documented in "VALIDATION", so please read that section carefully.

RATIONALE

In a networked scenario, XForms is a client-side technology. Having a Perl module may seem a bit useless, since Perl is usually used on the server side. On the other hand, everyone knows that user input should always be validated, but client-side validation is inherently untrusted.

There are several options for server-side validation of XML data, for example XML Schema or RelaxNG/Schematron. This module, in contrast, tries to deduce the allowed modifications directly from the XForms document that was used to build the input. It makes life easier for simple forms that do not warrant a full-blown XML Schema document. Most importantly, it is able to perform additional checks that are impossible with standalone schema validation, like readonly value enforcement and calculation result checks.

VALIDATION

The submitted data is checked, and a result instance is built according to the following rules. Only if all checks succeed will the submitted instance be declared valid. Note that if a model item property relies on content of a non-relevant instance node, behaviour is undefined, since non-relevant nodes are not submitted.

Comparison to the original instance, relevant MIP check

The element tree must be equal to the original instance. If there are more nodes than in the original, validation fails. If nodes are missing, they are copied from the original instance to the result instance. For these added nodes, the relevant model item property must evaluate to false. If any added nodes are relevant, validation fails. If any non-added nodes are non-relevant, validation fails.

Only elements and attributes are checked (actually, their localName and namespaceURI). Text content is checked later, and all other nodes are ignored.

xforms:insert and xforms:delete are not processed. Instances that contain more or fewer elements due to these actions are therefore regarded as invalid, even though it may be valid to create such instances.
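
The comparison rules above can be illustrated with a deliberately simplified sketch. It assumes each instance has been flattened into a hash of node paths and that the relevant property has already been evaluated per path. The function name compare_to_original and this data model are hypothetical and not part of the module, which works on DOM trees.

```perl
use strict;
use warnings;

# Hypothetical model: an instance is a hash of node path => text value,
# and $relevant maps node path => result of the relevant property.
# Paths without an entry count as relevant (the XForms default).
sub compare_to_original {
    my ($original, $submitted, $relevant) = @_;
    my %result = %$submitted;

    my $is_relevant = sub {
        my ($path) = @_;
        return exists $relevant->{$path} ? $relevant->{$path} : 1;
    };

    for my $path (sort keys %$submitted) {
        # More nodes than in the original: validation fails.
        return "unexpected node $path" unless exists $original->{$path};
        # Submitted (non-added) nodes must be relevant.
        return "non-relevant node $path was submitted"
            unless $is_relevant->($path);
    }

    for my $path (sort keys %$original) {
        next if exists $submitted->{$path};
        # Missing nodes are copied from the original, but only
        # non-relevant nodes may legitimately be missing.
        return "relevant node $path is missing" if $is_relevant->($path);
        $result{$path} = $original->{$path};
    }

    return \%result;    # hash reference on success, string on failure
}
```

The return convention mirrors the one documented for validate: a reference on success, a plain string on failure.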

readonly nodes, unreferenced nodes

If a node is read-only in both the original and the submitted instance, its values must match; otherwise validation fails. This holds even if the node might have been non-readonly at some point during user interaction: that cannot be detected automatically, so the safest approach is to reject all mismatches. In all other cases, modification is allowed freely. Instance nodes not referenced by any form control or setvalue action are treated as readonly.

Whitespace-only text nodes are an exception: if the original instance and the submitted data differ only in the amount of whitespace in a whitespace-only text node, validation continues without error.

Note that readonly checks may not work correctly if binding expressions reference text nodes directly (instead of their parent elements).
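
As a rough illustration of the read-only rule and its whitespace exception, the following sketch models a node as a small hash. The function name readonly_ok and the node structure are assumptions made for this example; the module itself operates on DOM nodes and may differ in detail.

```perl
use strict;
use warnings;

# Hypothetical node model: { value => $text, readonly => 0 or 1 }.
# Returns true if the submitted node passes the read-only check.
sub readonly_ok {
    my ($original, $submitted) = @_;

    # Modification is allowed unless the node is read-only in both instances.
    return 1 unless $original->{readonly} && $submitted->{readonly};

    # Read-only in both: the values must match ...
    return 1 if $original->{value} eq $submitted->{value};

    # ... except when both values consist of whitespace only.
    return 1 if $original->{value}  =~ /\A\s+\z/
             && $submitted->{value} =~ /\A\s+\z/;

    return 0;
}
```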

required, constraint, calculate and type model item properties

Only relevant nodes are checked in this step. Validation fails:

For type, only the built-in data types specified in section 5 of the XForms specification are supported. Even this support is incomplete; see XML::Schema::Type::Builtin. xsi:type attributes are not checked.
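
The following sketch shows how checks of this kind might look. It assumes the usual XForms meaning of each property (a required node must not be empty, a constraint must evaluate to true, a calculated node must carry the calculation result); the exact failure conditions used by the module are not reproduced here. The function check_node and its node model are hypothetical, code references stand in for XPath expressions, and type checking is omitted.

```perl
use strict;
use warnings;

# Hypothetical node model:
#   { value => $text, relevant => 0|1, required => 0|1,
#     constraint => sub { ... }, calculate => sub { ... } }
# Returns an error string, or undef if the node passes.
sub check_node {
    my ($node) = @_;

    # Only relevant nodes are checked in this step.
    return if exists $node->{relevant} && !$node->{relevant};

    my $value = $node->{value};
    return 'required node is empty'
        if $node->{required} && $value eq '';
    return 'constraint evaluated to false'
        if $node->{constraint} && !$node->{constraint}->($value);
    return 'value differs from calculation result'
        if $node->{calculate} && $node->{calculate}->() ne $value;

    return;    # no error
}
```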

XML Schema validation

Schema documents may be specified by using the xsi:schemaLocation or xsi:noNamespaceSchemaLocation attributes on the root node of each original instance. Each instance is validated using its own XML Schema(s).

If the schema option is given, the given XML Schema will be used to validate the submitted data. No result instance is built, and none of the above checks are done. This is useful if the above assumptions and limitations reject valid documents. This can happen if the XForms document uses scripting, expressions that rely on non-relevant nodes, or certain combinations of XForms Actions. On success, the submission data is returned as a DOM tree.

METHODS AND FUNCTIONS

new(%options)

Creates a new validator object which contains preprocessed data structures. Thus, OO usage will need less processing time if multiple validations against one XForms model are done.

validate(%options)

Performs the actual validation. Returns a hash of XML::LibXML::Document objects on success (keyed by instance id, with the empty key '' for the submission subtree), or a plain string containing an error message in English. Since validation errors are not supposed to occur with well-behaved XForms clients, no way to localize these messages is provided.

May be called as function or object method.
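
A caller has to distinguish the two kinds of return value. The helper below is a minimal sketch of that distinction; summarize_result is a hypothetical name, and it assumes that success is signalled by a hash reference, as the ref() checks in the SYNOPSIS suggest.

```perl
use strict;
use warnings;

# Summarize a validate() result: a plain string is an error message,
# a hash reference holds one document per instance id, with the empty
# key '' denoting the submission subtree.
sub summarize_result {
    my ($result) = @_;
    return ("error: $result") unless ref($result) eq 'HASH';
    return map { $_ eq '' ? 'submission subtree' : "instance '$_'" }
           sort keys %$result;
}
```

In real use the hash values would be XML::LibXML::Document objects; the helper only inspects the keys, so it can be tried out with placeholder values.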

normalize($dom, $keep_extra_namespaces)

Normalizes an XML::LibXML::Document (or a hash as returned by validate) by converting it (or each document in the hash) to its canonicalized form and stripping anything that is not an element, attribute, text node, or namespace node. Nodes in the XInclude namespace are stripped as well. Unused namespace nodes are also stripped unless a true value is given as the second parameter. Returns a new XML::LibXML::Document (or hash, respectively). The original DOM tree is left unmodified.

The result should no longer contain any security-relevant or unexpected content, so that it is safe for further processing.

May be called as function or object method, and as a convenience, it will pass through strings unmodified.

OPTIONS

Behavior of the validator is controlled via named options.

For OO usage, the constructor takes the xforms, model and base options. These are ignored on the validate method call.

xforms

An XML document that contains at least one xforms:model element. The value is interpreted like this:

input

The submitted instance. Input type is autodetected using these rules:

The latter two data types are used for multipart/form-data and application/x-www-form-urlencoded serializations. Note that rebuilding the instance from these involves a certain amount of guessing. If any element local-name occurs more than once in the submitted instance, correct association of submitted values with DOM nodes may fail.

The other data types assume text/xml serialization. multipart/related is currently unsupported.
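
Because the form-encoded serializations are matched by element local-name, it can help to check an instance for repeated names before relying on them. The following sketch is one way to do that, assuming the element paths of the instance are available as slash-separated strings; ambiguous_local_names is a hypothetical helper, not part of the module.

```perl
use strict;
use warnings;

# Return the element local-names that occur more than once in a list
# of slash-separated element paths (namespace prefixes are ignored).
sub ambiguous_local_names {
    my @paths = @_;
    my %count;
    for my $path (@paths) {
        # Last path step, without any namespace prefix.
        my ($name) = $path =~ m{([^/:]+)\z};
        $count{$name}++ if defined $name;
    }
    my @duplicates = sort grep { $count{$_} > 1 } keys %count;
    return @duplicates;
}
```

Any name reported by such a check marks elements whose submitted values may be associated with the wrong DOM node.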

base

A base URL for external references. Relative URLs are resolved as per the xml:base specification. This is only used for the src attribute of xforms:instance elements. For security reasons, no external DTD subsets, external entities or XIncludes are processed.

root

A base URL for external references. Host-relative URLs are resolved as per the xml:base specification. This is only used for the src attribute of xforms:instance elements. For security reasons, no external DTD subsets, external entities or XIncludes are processed.

model

The model id to use, in case there are multiple models in the XForms file. If not specified, the first model in document order is used.

The contained instances (including those specified via the src attribute) are considered trusted. External references might be retrieved and XML Schema information is honoured (except when noted otherwise). Never use unchecked user input as original instance data!

submission

The id of a submission element that was used to submit the input. If not given, the first submission element is used.

instance

Override for instance data. If given and defined, the value is interpreted similarly to the xforms option. The default xforms:instance node in the model is replaced by the resulting XML data.

If a hashref is given, keys are instance IDs to replace, and the corresponding values are processed as above.

schema

An XML Schema document that will be used for schema validation of the submitted instance instead of the usual checks. Value is a URL or file name relative to the current working directory.

SECURITY

Since validation is inherently about security, there are a few measures to allow this module to be used with potentially untrusted input:

XForms validation has some inherent limitations. It is difficult to associate original instance nodes with their corresponding submitted instance nodes, especially for text nodes. Furthermore, submissions do not contain non-relevant nodes, thus part of the DOM tree is guessed. See "VALIDATION" above for a detailed description of checks and their individual limitations.

EXPORT

None by default.

The validate and normalize functions can be imported on request. Both can be used as standalone functions or as object methods.

KNOWN BUGS / TODO

SEE ALSO

The XForms 1.0 specification.

XML::LibXML and http://www.libxml.org for supported features, especially regarding XML Schema validation (which was not complete at the time of writing).

XML::Schema for supported data types.

AUTHOR

Jörg Walter, <info@syntax-k.de>

COPYRIGHT AND LICENSE

Copyright (C) 2008 by Jörg Walter

This library is free software; you can redistribute it and/or modify it under the same terms as Perl version 5.8.0 itself.