dxml.dom
This implements a DOM for representing an XML 1.0 document. parseDOM uses a dxml.parser.EntityRange to parse the document, and DOMEntity recursively represents the DOM tree.
See the documentation for dxml.parser and dxml.parser.EntityRange for details on the parser and its configuration options.
For convenience, dxml.parser.EntityType and dxml.parser.simpleXML are publicly imported by this module, since EntityType is required to correctly use DOMEntity, and simpleXML is highly likely to be used when calling parseDOM.
License:
Boost License 1.0.
See Also:
Official Specification for XML 1.0
Examples:
    auto xml = "<!-- comment -->\n" ~
               "<root>\n" ~
               " <foo>some text<whatever/></foo>\n" ~
               " <bar/>\n" ~
               " <baz></baz>\n" ~
               "</root>";
    {
        // Default configuration: comments are kept and empty element
        // tags are reported as EntityType.elementEmpty.
        auto dom = parseDOM(xml);
        assert(dom.type == EntityType.elementStart);
        assert(dom.name.empty);
        assert(dom.children.length == 2);

        assert(dom.children[0].type == EntityType.comment);
        assert(dom.children[0].text == " comment ");

        auto root = dom.children[1];
        assert(root.type == EntityType.elementStart);
        assert(root.name == "root");
        assert(root.children.length == 3);

        auto foo = root.children[0];
        assert(foo.type == EntityType.elementStart);
        assert(foo.name == "foo");
        assert(foo.children.length == 2);

        assert(foo.children[0].type == EntityType.text);
        assert(foo.children[0].text == "some text");

        assert(foo.children[1].type == EntityType.elementEmpty);
        assert(foo.children[1].name == "whatever");

        assert(root.children[1].type == EntityType.elementEmpty);
        assert(root.children[1].name == "bar");

        assert(root.children[2].type == EntityType.elementStart);
        assert(root.children[2].name == "baz");
        assert(root.children[2].children.length == 0);
    }
    {
        // simpleXML: the comment is skipped and empty element tags are
        // reported as EntityType.elementStart with no children.
        auto dom = parseDOM!simpleXML(xml);
        assert(dom.type == EntityType.elementStart);
        assert(dom.name.empty);
        assert(dom.children.length == 1);

        auto root = dom.children[0];
        assert(root.type == EntityType.elementStart);
        assert(root.name == "root");
        assert(root.children.length == 3);

        auto foo = root.children[0];
        assert(foo.type == EntityType.elementStart);
        assert(foo.name == "foo");
        assert(foo.children.length == 2);

        assert(foo.children[0].type == EntityType.text);
        assert(foo.children[0].text == "some text");

        assert(foo.children[1].type == EntityType.elementStart);
        assert(foo.children[1].name == "whatever");
        assert(foo.children[1].children.length == 0);

        assert(root.children[1].type == EntityType.elementStart);
        assert(root.children[1].name == "bar");
        assert(root.children[1].children.length == 0);

        assert(root.children[2].type == EntityType.elementStart);
        assert(root.children[2].name == "baz");
        assert(root.children[2].children.length == 0);
    }
- struct DOMEntity(R);
  DOMEntity!R parseDOM(Config config = Config.init, R)(R range) if(isForwardRange!R && isSomeChar!(ElementType!R));
  DOMEntity!(ER.Input) parseDOM(ER)(ref ER range) if(isInstanceOf!(EntityRange, ER));
- Represents an entity in an XML document as a DOM tree.

parseDOM takes either a range of characters or a dxml.parser.EntityRange and generates a DOMEntity from that XML.

When parseDOM processes the XML, it returns a DOMEntity representing the entire document. Even though the XML document itself isn't technically an entity in the XML document, it's simplest to treat it as if it were an EntityType.elementStart with an empty name. That DOMEntity then contains child entities that recursively define the DOM tree through their children.

For DOMEntities of type EntityType.elementStart, DOMEntity.children gives access to all of the child entities of that start tag. Other DOMEntities have no children.

Note that the type determines which properties of the DOMEntity can be used, and it can determine whether functions which a DOMEntity is passed to are allowed to be called. Each function lists which EntityTypes are allowed, and it is an error to call them with any other EntityType.

If parseDOM is given a range of characters, it in turn passes that to dxml.parser.parseXML to do the actual XML parsing. As such, that overload accepts an optional dxml.parser.Config as a template argument to configure the parser.

If parseDOM is given an EntityRange, the range does not have to be at the start of the document. It can be used to create a DOM for a portion of the document. As when a character range is passed to it, it will return a DOMEntity with the type EntityType.elementStart and an empty name. It will iterate the range until it either reaches the end of the range, or it reaches the end tag which matches the start tag which is the parent of the entity that was the front of the range when it was passed to parseDOM. The EntityRange is passed by ref, so if it was not at the top level when it was passed to parseDOM (and thus still has elements in it when parseDOM returns), the range will then be at the entity after that matching end tag, and the application can continue to process the range after that if it so chooses.

Parameters:
config   The dxml.parser.Config to use with dxml.parser.parseXML if the range passed to parseDOM is a range of characters.
R range  Either a range of characters representing an entire XML document or a dxml.parser.EntityRange which may refer to some or all of an XML document.

Returns: A DOMEntity representing the DOM tree from the point in the document that was passed to parseDOM (the start of the document if a range of characters was passed, and wherever in the document the range was if an EntityRange was passed).

Throws: XMLParsingException if the parser encounters invalid XML.

- alias SliceOfR = R;
- The type used when any slice of the original range of characters is used.

If the range was a string or supports slicing, then SliceOfR is the same type as the range; otherwise, it's the result of calling std.range.takeExactly on it.

    import std.algorithm : filter;
    import std.range : takeExactly;

    static assert(is(DOMEntity!string.SliceOfR == string));

    auto range = filter!(a => true)("some xml");
    static assert(is(DOMEntity!(typeof(range)).SliceOfR ==
                     typeof(takeExactly(range, 42))));
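As a rough illustration of the EntityRange overload of parseDOM described above, the sketch below advances a range to the first child of a chosen element and then builds a DOM for only that element's contents. The helper name domOfChildren is hypothetical, the == comparison on names assumes the range was created from a string, and the asserted positions follow the behavior described in the text above rather than any separately verified guarantee.

```d
import dxml.dom;                        // parseDOM, DOMEntity, EntityType
import dxml.parser : parseXML;
import std.range.primitives : empty;

// Hypothetical helper (not part of dxml): skips forward to the start tag
// named parentName, steps onto its first child, and builds a DOM from there.
auto domOfChildren(ER)(ref ER range, string parentName)
{
    while (!range.empty &&
           !(range.front.type == EntityType.elementStart &&
             range.front.name == parentName))
    {
        range.popFront();
    }
    assert(!range.empty, "start tag not found");

    range.popFront(); // front is now the first entity inside the element
    return parseDOM(range);
}

void main()
{
    auto xml = "<root><head/><body><p>one</p><p>two</p></body><tail/></root>";
    auto range = parseXML(xml);
    auto dom = domOfChildren(range, "body");

    // The returned entity is a nameless elementStart holding <body>'s children.
    assert(dom.type == EntityType.elementStart);
    assert(dom.name.empty);
    assert(dom.children.length == 2);
    assert(dom.children[0].name == "p");

    // The range was passed by ref, so it now sits just after </body>.
    assert(range.front.type == EntityType.elementEmpty);
    assert(range.front.name == "tail");
}
```

Because the range is taken by ref, the caller can keep iterating it after the partial DOM has been built, which is what the final two assertions rely on.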
- const pure nothrow @nogc @property @safe EntityType type();
- The EntityType for this DOMEntity.

The type can never be EntityType.elementEnd, because children already indicates where the contents of the start tag end.

type determines which properties of the DOMEntity can be used, and it can determine whether functions which a DOMEntity is passed to are allowed to be called. Each function lists which EntityTypes are allowed, and it is an error to call them with any other EntityType.

Examples:

    import std.range.primitives;

    auto xml = "<root>\n" ~
               " <!--no comment-->\n" ~
               " <![CDATA[cdata run]]>\n" ~
               " <text>I am text!</text>\n" ~
               " <empty/>\n" ~
               " <?pi?>\n" ~
               "</root>";

    auto dom = parseDOM(xml);
    assert(dom.type == EntityType.elementStart);
    assert(dom.name.empty);
    assert(dom.children.length == 1);

    auto root = dom.children[0];
    assert(root.type == EntityType.elementStart);
    assert(root.name == "root");
    assert(root.children.length == 5);

    assert(root.children[0].type == EntityType.comment);
    assert(root.children[0].text == "no comment");

    assert(root.children[1].type == EntityType.cdata);
    assert(root.children[1].text == "cdata run");

    auto textTag = root.children[2];
    assert(textTag.type == EntityType.elementStart);
    assert(textTag.name == "text");
    assert(textTag.children.length == 1);

    assert(textTag.children[0].type == EntityType.text);
    assert(textTag.children[0].text == "I am text!");

    assert(root.children[3].type == EntityType.elementEmpty);
    assert(root.children[3].name == "empty");

    assert(root.children[4].type == EntityType.pi);
    assert(root.children[4].name == "pi");
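Since type decides which properties may be called, code that handles arbitrary entities usually branches on it first. The following is a minimal sketch of that pattern, assuming the document was parsed from a string; the function name describe and its output format are made up for illustration.

```d
import dxml.dom;           // parseDOM, DOMEntity, EntityType
import std.conv : to;

// Hypothetical helper: touches only the properties that the entity's
// type supports, as listed under each property's "Supported EntityTypes".
string describe(DOMEntity!string entity)
{
    switch (entity.type)
    {
        case EntityType.elementStart:
            return "<" ~ entity.name ~ "> with " ~
                   entity.children.length.to!string ~ " child(ren)";
        case EntityType.elementEmpty:
            return "<" ~ entity.name ~ "/>";
        case EntityType.pi:
            return "pi " ~ entity.name;
        case EntityType.cdata:
        case EntityType.comment:
        case EntityType.text:
            return entity.type.to!string ~ ": " ~ entity.text;
        default:
            // elementEnd never appears in a DOMEntity tree.
            return "unexpected";
    }
}

void main()
{
    auto root = parseDOM("<root><!--hi--><e/></root>").children[0];
    assert(describe(root) == "<root> with 2 child(ren)");
    assert(describe(root.children[0]) == "comment: hi");
    assert(describe(root.children[1]) == "<e/>");
}
```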
- const pure nothrow @nogc @property @safe TextPos pos();
- The position in the original text where the entity starts.

Examples:

    import std.range.primitives;
    import dxml.parser : TextPos;
    import dxml.util : stripIndent;

    // The indentation inside the literals matters here, because the
    // asserted positions are line and column numbers.
    auto xml = "<root>\n" ~
               "    <foo>\n" ~
               "        Foo and bar. Always foo and bar...\n" ~
               "    </foo>\n" ~
               "</root>";

    auto dom = parseDOM(xml);
    assert(dom.type == EntityType.elementStart);
    assert(dom.name.empty);
    assert(dom.pos == TextPos(1, 1));

    auto root = dom.children[0];
    assert(root.type == EntityType.elementStart);
    assert(root.name == "root");
    assert(root.pos == TextPos(1, 1));

    auto foo = root.children[0];
    assert(foo.type == EntityType.elementStart);
    assert(foo.name == "foo");
    assert(foo.pos == TextPos(2, 5));

    auto text = foo.children[0];
    assert(text.type == EntityType.text);
    assert(text.text.stripIndent() == "Foo and bar. Always foo and bar...");
    assert(text.pos == TextPos(2, 10));
- @property SliceOfR name();
- Gives the name of this DOMEntity.

Note that this is the direct name in the XML for this entity and does not contain any of the names of any of the parent entities that this entity has.

Supported EntityTypes: elementStart, elementEnd, elementEmpty, pi

See Also: path

Examples:

    auto xml = "<root>\n" ~
               " <empty/>\n" ~
               " <?pi?>\n" ~
               "</root>";

    auto dom = parseDOM(xml);
    assert(dom.type == EntityType.elementStart);
    assert(dom.name.empty);

    auto root = dom.children[0];
    assert(root.type == EntityType.elementStart);
    assert(root.name == "root");

    assert(root.children[0].type == EntityType.elementEmpty);
    assert(root.children[0].name == "empty");

    assert(root.children[1].type == EntityType.pi);
    assert(root.children[1].name == "pi");
- @property SliceOfR[] path();
- Gives the list of the names of the parent start tags of this DOMEntity.

The name of the current entity (if it has one) is not included in the path.

Note that if parseDOM were given an EntityRange, the path starts where the range started. So, it doesn't necessarily contain the entire path from the start of the XML document.

See Also: name

Examples:

    auto xml = "<root>\n" ~
               " <bar>\n" ~
               " <baz>\n" ~
               " <xyzzy/>\n" ~
               " </baz>\n" ~
               " <frobozz>\n" ~
               " <!-- comment -->\n" ~
               " It's magic!\n" ~
               " </frobozz>\n" ~
               " </bar>\n" ~
               " <foo></foo>\n" ~
               "</root>";

    auto dom = parseDOM(xml);
    assert(dom.type == EntityType.elementStart);
    assert(dom.name.empty);
    assert(dom.path.empty);

    auto root = dom.children[0];
    assert(root.type == EntityType.elementStart);
    assert(root.name == "root");
    assert(root.path.empty);

    auto bar = root.children[0];
    assert(bar.type == EntityType.elementStart);
    assert(bar.name == "bar");
    assert(bar.path == ["root"]);

    auto baz = bar.children[0];
    assert(baz.type == EntityType.elementStart);
    assert(baz.name == "baz");
    assert(baz.path == ["root", "bar"]);

    auto xyzzy = baz.children[0];
    assert(xyzzy.type == EntityType.elementEmpty);
    assert(xyzzy.name == "xyzzy");
    assert(xyzzy.path == ["root", "bar", "baz"]);

    auto frobozz = bar.children[1];
    assert(frobozz.type == EntityType.elementStart);
    assert(frobozz.name == "frobozz");
    assert(frobozz.path == ["root", "bar"]);

    auto comment = frobozz.children[0];
    assert(comment.type == EntityType.comment);
    assert(comment.text == " comment ");
    assert(comment.path == ["root", "bar", "frobozz"]);

    auto text = frobozz.children[1];
    assert(text.type == EntityType.text);
    assert(text.text == "\n It's magic!\n ");
    assert(text.path == ["root", "bar", "frobozz"]);

    auto foo = root.children[1];
    assert(foo.type == EntityType.elementStart);
    assert(foo.name == "foo");
    assert(foo.path == ["root"]);
- @property auto attributes();
- Returns a dynamic array of attributes for a start tag where each attribute is represented as a Tuple!(SliceOfR, "name", SliceOfR, "value", TextPos, "pos").

Supported EntityTypes: elementStart, elementEmpty

Examples:

    import std.algorithm.comparison : equal;
    import std.algorithm.iteration : filter;
    import std.range.primitives;
    import dxml.parser : TextPos;

    {
        auto xml = "<root/>";
        auto root = parseDOM(xml).children[0];
        assert(root.type == EntityType.elementEmpty);
        assert(root.attributes.empty);
    }
    {
        auto xml = "<root a='42' q='29' w='hello'/>";
        auto root = parseDOM(xml).children[0];
        assert(root.type == EntityType.elementEmpty);

        auto attrs = root.attributes;
        assert(attrs.length == 3);

        assert(attrs[0].name == "a");
        assert(attrs[0].value == "42");
        assert(attrs[0].pos == TextPos(1, 7));

        assert(attrs[1].name == "q");
        assert(attrs[1].value == "29");
        assert(attrs[1].pos == TextPos(1, 14));

        assert(attrs[2].name == "w");
        assert(attrs[2].value == "hello");
        assert(attrs[2].pos == TextPos(1, 21));
    }
    // Because the type of name and value is SliceOfR, == with a string
    // only works if the range passed to parseDOM was string.
    {
        auto xml = filter!"true"("<root a='42' q='29' w='hello'/>");
        auto root = parseDOM(xml).children[0];
        assert(root.type == EntityType.elementEmpty);

        auto attrs = root.attributes;
        assert(attrs.length == 3);

        assert(equal(attrs[0].name, "a"));
        assert(equal(attrs[0].value, "42"));
        assert(attrs[0].pos == TextPos(1, 7));

        assert(equal(attrs[1].name, "q"));
        assert(equal(attrs[1].value, "29"));
        assert(attrs[1].pos == TextPos(1, 14));

        assert(equal(attrs[2].name, "w"));
        assert(equal(attrs[2].value, "hello"));
        assert(attrs[2].pos == TextPos(1, 21));
    }
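Because attributes returns a plain array rather than a map, looking up one attribute by name means scanning it. A small sketch of such a lookup follows, assuming a string-based DOMEntity; attributeValue and its fallback parameter are hypothetical names, not something dxml provides.

```d
import dxml.dom;   // parseDOM, DOMEntity, EntityType

// Hypothetical helper: returns the value of the first attribute called
// name, or fallback if there is none or the entity cannot have attributes.
string attributeValue(DOMEntity!string entity, string name,
                      string fallback = null)
{
    immutable type = entity.type;
    if (type != EntityType.elementStart && type != EntityType.elementEmpty)
        return fallback; // attributes is only valid for these two types

    foreach (attr; entity.attributes)
    {
        if (attr.name == name)
            return attr.value;
    }
    return fallback;
}

void main()
{
    auto root = parseDOM("<root a='42' w='hello'/>").children[0];
    assert(attributeValue(root, "a") == "42");
    assert(attributeValue(root, "w") == "hello");
    assert(attributeValue(root, "z") is null);
    assert(attributeValue(root, "z", "none") == "none");
}
```

The type check up front keeps the helper safe to call on comments, text, and other entities for which attributes is not supported.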
- @property SliceOfR text();
- Returns the textual value of this DOMEntity.

In the case of EntityType.pi, this is the text that follows the name, whereas in the other cases, the text is the entire contents of the entity (save for the delimiters on the ends if that entity has them).

Supported EntityTypes: cdata, comment, pi, text

Examples:

    auto xml = "<?xml version='1.0'?>\n" ~
               "<?instructionName?>\n" ~
               "<?foo here is something to say?>\n" ~
               "<root>\n" ~
               " <![CDATA[ Yay! random text >> << ]]>\n" ~
               " <!-- some random comment -->\n" ~
               " <p>something here</p>\n" ~
               " <p>\n" ~
               " something else\n" ~
               " here</p>\n" ~
               "</root>";

    auto dom = parseDOM(xml);

    // "<?instructionName?>\n" ~
    auto pi1 = dom.children[0];
    assert(pi1.type == EntityType.pi);
    assert(pi1.name == "instructionName");
    assert(pi1.text.empty);

    // "<?foo here is something to say?>\n" ~
    auto pi2 = dom.children[1];
    assert(pi2.type == EntityType.pi);
    assert(pi2.name == "foo");
    assert(pi2.text == "here is something to say");

    // "<root>\n" ~
    auto root = dom.children[2];
    assert(root.type == EntityType.elementStart);

    // " <![CDATA[ Yay! random text >> << ]]>\n" ~
    auto cdata = root.children[0];
    assert(cdata.type == EntityType.cdata);
    assert(cdata.text == " Yay! random text >> << ");

    // " <!-- some random comment -->\n" ~
    auto comment = root.children[1];
    assert(comment.type == EntityType.comment);
    assert(comment.text == " some random comment ");

    // " <p>something here</p>\n" ~
    auto p1 = root.children[2];
    assert(p1.type == EntityType.elementStart);
    assert(p1.name == "p");

    assert(p1.children[0].type == EntityType.text);
    assert(p1.children[0].text == "something here");

    // " <p>\n" ~
    // " something else\n" ~
    // " here</p>\n" ~
    auto p2 = root.children[3];
    assert(p2.type == EntityType.elementStart);

    assert(p2.children[0].type == EntityType.text);
    assert(p2.children[0].text == "\n something else\n here");
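Since text is only available on leaf entities, gathering the character data of a whole element means walking its children. The sketch below shows one way this might be done for a string-based DOMEntity; textContent is a hypothetical helper, and it simply concatenates the raw text and cdata values without any further processing.

```d
import dxml.dom;   // parseDOM, DOMEntity, EntityType

// Hypothetical helper: concatenates the text and cdata found under an
// entity, in document order, ignoring comments and processing instructions.
string textContent(DOMEntity!string entity)
{
    switch (entity.type)
    {
        case EntityType.text:
        case EntityType.cdata:
            return entity.text;
        case EntityType.elementStart:
        {
            string result;
            foreach (child; entity.children)
                result ~= textContent(child);
            return result;
        }
        default:
            return "";
    }
}

void main()
{
    auto p = parseDOM("<p>one<!--skip--><b>two</b>three</p>").children[0];
    assert(textContent(p) == "onetwothree");
    assert(textContent(p.children[1]) == "");  // the comment contributes nothing
}
```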
- @property DOMEntity[] children();
- Returns the child entities of the current entity.

They are in the same order that they were in the XML document.

Supported EntityTypes: elementStart

Examples:

    auto xml = "<potato>\n" ~
               " <!--comment-->\n" ~
               " <foo>bar</foo>\n" ~
               " <tag>\n" ~
               " <silly>you</silly>\n" ~
               " <empty/>\n" ~
               " <nocontent></nocontent>\n" ~
               " </tag>\n" ~
               "</potato>\n" ~
               "<!--the end-->";

    auto dom = parseDOM(xml);
    assert(dom.children.length == 2);

    auto potato = dom.children[0];
    assert(potato.type == EntityType.elementStart);
    assert(potato.name == "potato");
    assert(potato.children.length == 3);

    auto comment = potato.children[0];
    assert(comment.type == EntityType.comment);
    assert(comment.text == "comment");

    auto foo = potato.children[1];
    assert(foo.type == EntityType.elementStart);
    assert(foo.name == "foo");
    assert(foo.children.length == 1);

    assert(foo.children[0].type == EntityType.text);
    assert(foo.children[0].text == "bar");

    auto tag = potato.children[2];
    assert(tag.type == EntityType.elementStart);
    assert(tag.name == "tag");
    assert(tag.children.length == 3);

    auto silly = tag.children[0];
    assert(silly.type == EntityType.elementStart);
    assert(silly.name == "silly");
    assert(silly.children.length == 1);

    assert(silly.children[0].type == EntityType.text);
    assert(silly.children[0].text == "you");

    auto empty = tag.children[1];
    assert(empty.type == EntityType.elementEmpty);
    assert(empty.name == "empty");

    auto nocontent = tag.children[2];
    assert(nocontent.type == EntityType.elementStart);
    assert(nocontent.name == "nocontent");
    assert(nocontent.children.length == 0);

    auto endComment = dom.children[1];
    assert(endComment.type == EntityType.comment);
    assert(endComment.text == "the end");
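The recursive shape of children lends itself to simple depth-first searches. As a sketch under the assumption of a string-based DOMEntity, the hypothetical helper findElements below collects every element with a given name, in document order; it is only an illustration of walking the tree, not a dxml facility.

```d
import dxml.dom;   // parseDOM, DOMEntity, EntityType

alias Entity = DOMEntity!string;

// Hypothetical helper: depth-first, pre-order search for elements named name.
Entity[] findElements(Entity entity, string name)
{
    Entity[] found;
    if (entity.type != EntityType.elementStart)
        return found; // only elementStart supports children

    foreach (child; entity.children)
    {
        immutable isElement = child.type == EntityType.elementStart ||
                              child.type == EntityType.elementEmpty;
        if (isElement && child.name == name)
            found ~= child;
        found ~= findElements(child, name);
    }
    return found;
}

void main()
{
    auto dom = parseDOM("<list><item>a</item>" ~
                        "<group><item>b</item><item/></group></list>");
    auto items = findElements(dom, "item");
    assert(items.length == 3);
    assert(items[0].children[0].text == "a");
    assert(items[1].children[0].text == "b");
    assert(items[2].type == EntityType.elementEmpty);
}
```

Passing the nameless document-level entity returned by parseDOM works here because it has the type elementStart, so the same function handles the whole document and any subtree.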