Jonathan M Davis: The Long-Winded D Guy

dxml.dom

This implements a DOM for representing an XML 1.0 document. parseDOM uses a dxml.parser.EntityRange to parse the document, and DOMEntity recursively represents the DOM tree.
See the documentation for dxml.parser and dxml.parser.EntityRange for details on the parser and its configuration options.
For convenience, dxml.parser.EntityType and dxml.parser.simpleXML are publicly imported by this module, since EntityType is required to correctly use DOMEntity, and simpleXML is highly likely to be used when calling parseDOM.
Examples:
import std.range.primitives : empty;

auto xml = "<!-- comment -->\n" ~
           "<root>\n" ~
           "    <foo>some text<whatever/></foo>\n" ~
           "    <bar/>\n" ~
           "    <baz></baz>\n" ~
           "</root>";
{
    auto dom = parseDOM(xml);
    assert(dom.type == EntityType.elementStart);
    assert(dom.name.empty);
    assert(dom.children.length == 2);

    assert(dom.children[0].type == EntityType.comment);
    assert(dom.children[0].text == " comment ");

    auto root = dom.children[1];
    assert(root.type == EntityType.elementStart);
    assert(root.name == "root");
    assert(root.children.length == 3);

    auto foo = root.children[0];
    assert(foo.type == EntityType.elementStart);
    assert(foo.name == "foo");
    assert(foo.children.length == 2);

    assert(foo.children[0].type == EntityType.text);
    assert(foo.children[0].text == "some text");

    assert(foo.children[1].type == EntityType.elementEmpty);
    assert(foo.children[1].name == "whatever");

    assert(root.children[1].type == EntityType.elementEmpty);
    assert(root.children[1].name == "bar");

    assert(root.children[2].type == EntityType.elementStart);
    assert(root.children[2].name == "baz");
    assert(root.children[2].children.length == 0);
}
{
    auto dom = parseDOM!simpleXML(xml);
    assert(dom.type == EntityType.elementStart);
    assert(dom.name.empty);
    assert(dom.children.length == 1);

    auto root = dom.children[0];
    assert(root.type == EntityType.elementStart);
    assert(root.name == "root");
    assert(root.children.length == 3);

    auto foo = root.children[0];
    assert(foo.type == EntityType.elementStart);
    assert(foo.name == "foo");
    assert(foo.children.length == 2);

    assert(foo.children[0].type == EntityType.text);
    assert(foo.children[0].text == "some text");

    assert(foo.children[1].type == EntityType.elementStart);
    assert(foo.children[1].name == "whatever");
    assert(foo.children[1].children.length == 0);

    assert(root.children[1].type == EntityType.elementStart);
    assert(root.children[1].name == "bar");
    assert(root.children[1].children.length == 0);

    assert(root.children[2].type == EntityType.elementStart);
    assert(root.children[2].name == "baz");
    assert(root.children[2].children.length == 0);
}
struct DOMEntity(R);
DOMEntity!R parseDOM(Config config = Config.init, R)(R range)
if(isForwardRange!R && isSomeChar!(ElementType!R));
DOMEntity!(ER.Input) parseDOM(ER)(ref ER range)
if(isInstanceOf!(EntityRange, ER));
Represents an entity in an XML document as a DOM tree.
parseDOM takes either a range of characters or a dxml.parser.EntityRange and generates a DOMEntity from that XML.
When parseDOM processes the XML, it returns a DOMEntity representing the entire document. Even though the XML document itself isn't technically an entity in the XML document, it's simplest to treat it as if it were an EntityType.elementStart with an empty name. That DOMEntity then contains child entities that recursively define the DOM tree through their children.
For DOMEntities of type EntityType.elementStart, DOMEntity.children gives access to all of the child entities of that start tag. Other DOMEntities have no children.
Note that the type determines which properties of the DOMEntity can be used, and it can determine whether functions which a DOMEntity is passed to are allowed to be called. Each function lists which EntityTypes are allowed, and it is an error to call them with any other EntityType.
If parseDOM is given a range of characters, it in turn passes that to dxml.parser.parseXML to do the actual XML parsing. As such, that overload accepts an optional dxml.parser.Config as a template argument to configure the parser.
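As a rough sketch of passing a custom Config, the helper below skips comments but leaves everything else at its default. makeConfig and SkipComments come from dxml.parser and are not described on this page, and topLevelCount is a hypothetical name used only for illustration.

```d
import dxml.dom;
import dxml.parser : makeConfig, SkipComments;

// Hypothetical helper: counts the top-level entities of a document
// while ignoring comments.
size_t topLevelCount(string xml)
{
    // Assumption: makeConfig builds a Config from the given flags and
    // leaves all other settings at their defaults.
    enum config = makeConfig(SkipComments.yes);

    // The Config is passed as a template argument, just like simpleXML.
    return parseDOM!config(xml).children.length;
}
```

With this config, a document consisting of a comment followed by a root element should report a single top-level entity, since the comment never reaches the DOM.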
If parseDOM is given an EntityRange, the range does not have to be at the start of the document, so it can be used to create a DOM for a portion of the document. Just as when a range of characters is passed, it returns a DOMEntity with the type EntityType.elementStart and an empty name. It iterates the range until it either reaches the end of the range or reaches the end tag which matches the start tag that is the parent of the entity that was the front of the range when it was passed to parseDOM. The EntityRange is passed by ref, so if it was not at the top level when it was passed to parseDOM (and thus still has elements in it when parseDOM returns), the range will then be at the entity after that matching end tag, and the application can continue to process the range after that if it so chooses.
Parameters:
config The dxml.parser.Config to use with dxml.parser.parseXML if the range passed to parseDOM is a range of characters.
R range Either a range of characters representing an entire XML document or a dxml.parser.EntityRange which may refer to some or all of an XML document.
Returns: A DOMEntity representing the DOM tree from the point in the document that was passed to parseDOM (the start of the document if a range of characters was passed, and wherever in the document the range was if an EntityRange was passed).
Throws: XMLParsingException if the parser encounters invalid XML.
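Since parseDOM consumes its input eagerly, any XMLParsingException surfaces from the call itself rather than later during traversal. A minimal sketch of guarding a parse follows; parsesCleanly is a hypothetical helper name, and XMLParsingException is imported explicitly from dxml.parser rather than assumed to be re-exported by this module.

```d
import dxml.dom;
import dxml.parser : XMLParsingException;

// Hypothetical helper: reports whether parseDOM accepted the input.
// This only says that the parser did not throw; it is not a full
// validity check of the document.
bool parsesCleanly(string xml)
{
    try
    {
        parseDOM(xml);
        return true;
    }
    catch(XMLParsingException)
    {
        return false;
    }
}
```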
Examples: parseDOM with the default Config and a range of characters.
import std.range.primitives;

auto xml = "<root>\n" ~
           "    <!-- no comment -->\n" ~
           "    <foo></foo>\n" ~
           "    <baz>\n" ~
           "        <xyzzy>It's an adventure!</xyzzy>\n" ~
           "    </baz>\n" ~
           "    <tag/>\n" ~
           "</root>";

auto dom = parseDOM(xml);
assert(dom.type == EntityType.elementStart);
assert(dom.name.empty);
assert(dom.children.length == 1);

auto root = dom.children[0];
assert(root.type == EntityType.elementStart);
assert(root.name == "root");
assert(root.children.length == 4);

assert(root.children[0].type == EntityType.comment);
assert(root.children[0].text == " no comment ");

assert(root.children[1].type == EntityType.elementStart);
assert(root.children[1].name == "foo");
assert(root.children[1].children.length == 0);

auto baz = root.children[2];
assert(baz.type == EntityType.elementStart);
assert(baz.name == "baz");
assert(baz.children.length == 1);

auto xyzzy = baz.children[0];
assert(xyzzy.type == EntityType.elementStart);
assert(xyzzy.name == "xyzzy");
assert(xyzzy.children.length == 1);

assert(xyzzy.children[0].type == EntityType.text);
assert(xyzzy.children[0].text == "It's an adventure!");

assert(root.children[3].type == EntityType.elementEmpty);
assert(root.children[3].name == "tag");
Examples: parseDOM with simpleXML and a range of characters.
import std.range.primitives : empty;

auto xml = "<root>\n" ~
           "    <!-- no comment -->\n" ~
           "    <foo></foo>\n" ~
           "    <baz>\n" ~
           "        <xyzzy>It's an adventure!</xyzzy>\n" ~
           "    </baz>\n" ~
           "    <tag/>\n" ~
           "</root>";

auto dom = parseDOM!simpleXML(xml);
assert(dom.type == EntityType.elementStart);
assert(dom.name.empty);
assert(dom.children.length == 1);

auto root = dom.children[0];
assert(root.type == EntityType.elementStart);
assert(root.name == "root");
assert(root.children.length == 3);

assert(root.children[0].type == EntityType.elementStart);
assert(root.children[0].name == "foo");
assert(root.children[0].children.length == 0);

auto baz = root.children[1];
assert(baz.type == EntityType.elementStart);
assert(baz.name == "baz");
assert(baz.children.length == 1);

auto xyzzy = baz.children[0];
assert(xyzzy.type == EntityType.elementStart);
assert(xyzzy.name == "xyzzy");
assert(xyzzy.children.length == 1);

assert(xyzzy.children[0].type == EntityType.text);
assert(xyzzy.children[0].text == "It's an adventure!");

assert(root.children[2].type == EntityType.elementStart);
assert(root.children[2].name == "tag");
assert(root.children[2].children.length == 0);
Examples: parseDOM with simpleXML and an EntityRange.
import std.range.primitives : empty;
import dxml.parser : parseXML;

auto xml = "<root>\n" ~
           "    <!-- no comment -->\n" ~
           "    <foo></foo>\n" ~
           "    <baz>\n" ~
           "        <xyzzy>It's an adventure!</xyzzy>\n" ~
           "    </baz>\n" ~
           "    <tag/>\n" ~
           "</root>";

auto range = parseXML!simpleXML(xml);
auto dom = parseDOM(range);
assert(range.empty);

assert(dom.type == EntityType.elementStart);
assert(dom.name.empty);
assert(dom.children.length == 1);

auto root = dom.children[0];
assert(root.type == EntityType.elementStart);
assert(root.name == "root");
assert(root.children.length == 3);

assert(root.children[0].type == EntityType.elementStart);
assert(root.children[0].name == "foo");
assert(root.children[0].children.length == 0);

auto baz = root.children[1];
assert(baz.type == EntityType.elementStart);
assert(baz.name == "baz");
assert(baz.children.length == 1);

auto xyzzy = baz.children[0];
assert(xyzzy.type == EntityType.elementStart);
assert(xyzzy.name == "xyzzy");
assert(xyzzy.children.length == 1);

assert(xyzzy.children[0].type == EntityType.text);
assert(xyzzy.children[0].text == "It's an adventure!");

assert(root.children[2].type == EntityType.elementStart);
assert(root.children[2].name == "tag");
assert(root.children[2].children.length == 0);
Examples: parseDOM with an EntityRange which is not at the start of the document.
import std.range.primitives : empty;
import dxml.parser : parseXML, skipToPath;

auto xml = "<root>\n" ~
           "    <!-- no comment -->\n" ~
           "    <foo></foo>\n" ~
           "    <baz>\n" ~
           "        <xyzzy>It's an adventure!</xyzzy>\n" ~
           "    </baz>\n" ~
           "    <tag/>\n" ~
           "</root>";

auto range = parseXML!simpleXML(xml).skipToPath("baz/xyzzy");
assert(range.front.type == EntityType.elementStart);
assert(range.front.name == "xyzzy");

auto dom = parseDOM(range);
assert(range.front.type == EntityType.elementStart);
assert(range.front.name == "tag");

assert(dom.type == EntityType.elementStart);
assert(dom.name.empty);
assert(dom.children.length == 1);

auto xyzzy = dom.children[0];
assert(xyzzy.type == EntityType.elementStart);
assert(xyzzy.name == "xyzzy");
assert(xyzzy.children.length == 1);

assert(xyzzy.children[0].type == EntityType.text);
assert(xyzzy.children[0].text == "It's an adventure!");
Examples: parseDOM at compile time.
import std.range.primitives : empty;

enum xml = "<!-- comment -->\n" ~
           "<root>\n" ~
           "    <foo>some text<whatever/></foo>\n" ~
           "    <bar/>\n" ~
           "    <baz></baz>\n" ~
           "</root>";

enum dom = parseDOM(xml);
static assert(dom.type == EntityType.elementStart);
static assert(dom.name.empty);
static assert(dom.children.length == 2);

static assert(dom.children[0].type == EntityType.comment);
static assert(dom.children[0].text == " comment ");
alias SliceOfR = R;
The type used when any slice of the original range of characters is used. If the range was a string or supports slicing, then SliceOfR is the same type as the range; otherwise, it's the result of calling std.range.takeExactly on it.
import std.algorithm : filter;
import std.range : takeExactly;

static assert(is(DOMEntity!string.SliceOfR == string));

auto range = filter!(a => true)("some xml");

static assert(is(DOMEntity!(typeof(range)).SliceOfR ==
                 typeof(takeExactly(range, 42))));
alias Attribute = Tuple!(SliceOfR, "name", SliceOfR, "value", TextPos, "pos");
The exact instantiation of std.typecons.Tuple used for each element of the range that attributes returns.
See Also: attributes
const pure nothrow @nogc @property @safe EntityType type();
The EntityType for this DOMEntity.
The type can never be EntityType.elementEnd, because the end of children already indicates where the contents of the start tag end.
type determines which properties of the DOMEntity can be used, and it can determine whether functions which a DOMEntity is passed to are allowed to be called. Each function lists which EntityTypes are allowed, and it is an error to call them with any other EntityType.
Examples:
import std.range.primitives;

auto xml = "<root>\n" ~
           "    <!--no comment-->\n" ~
           "    <![CDATA[cdata run]]>\n" ~
           "    <text>I am text!</text>\n" ~
           "    <empty/>\n" ~
           "    <?pi?>\n" ~
           "</root>";

auto dom = parseDOM(xml);
assert(dom.type == EntityType.elementStart);
assert(dom.name.empty);
assert(dom.children.length == 1);

auto root = dom.children[0];
assert(root.type == EntityType.elementStart);
assert(root.name == "root");
assert(root.children.length == 5);

assert(root.children[0].type == EntityType.comment);
assert(root.children[0].text == "no comment");

assert(root.children[1].type == EntityType.cdata);
assert(root.children[1].text == "cdata run");

auto textTag = root.children[2];
assert(textTag.type == EntityType.elementStart);
assert(textTag.name == "text");
assert(textTag.children.length == 1);

assert(textTag.children[0].type == EntityType.text);
assert(textTag.children[0].text == "I am text!");

assert(root.children[3].type == EntityType.elementEmpty);
assert(root.children[3].name == "empty");

assert(root.children[4].type == EntityType.pi);
assert(root.children[4].name == "pi");
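Because type decides which properties may be used, code that handles arbitrary entities usually branches on it first. The sketch below is one way to do that; describe is a hypothetical helper, and it assumes the DOM was built from a string so that name and text are strings.

```d
import dxml.dom;
import std.conv : to;

// Hypothetical helper: builds a one-line description of an entity while
// only touching the properties that are valid for its type.
string describe(DOMEntity!string entity)
{
    switch(entity.type)
    {
        case EntityType.elementStart:
            // name and children are both valid for start tags.
            return "start tag <" ~ entity.name ~ "> with " ~
                   entity.children.length.to!string ~ " children";
        case EntityType.elementEmpty:
            // Empty tags have a name but no children.
            return "empty tag <" ~ entity.name ~ "/>";
        case EntityType.pi:
            // Processing instructions have both a name and text.
            return "pi " ~ entity.name ~ ": " ~ entity.text;
        case EntityType.cdata:
        case EntityType.comment:
        case EntityType.text:
            // These types only have text.
            return entity.type.to!string ~ ": " ~ entity.text;
        default:
            // elementEnd never appears in a DOMEntity.
            return "unexpected entity type";
    }
}
```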
const pure nothrow @nogc @property @safe TextPos pos();
The position in the original text where the entity starts.
Examples:
import std.range.primitives : empty;
import dxml.parser : TextPos;
import dxml.util : stripIndent;

auto xml = "<root>\n" ~
           "    <foo>\n" ~
           "        Foo and bar. Always foo and bar...\n" ~
           "    </foo>\n" ~
           "</root>";

auto dom = parseDOM(xml);
assert(dom.type == EntityType.elementStart);
assert(dom.name.empty);
assert(dom.pos == TextPos(1, 1));

auto root = dom.children[0];
assert(root.type == EntityType.elementStart);
assert(root.name == "root");
assert(root.pos == TextPos(1, 1));

auto foo = root.children[0];
assert(foo.type == EntityType.elementStart);
assert(foo.name == "foo");
assert(foo.pos == TextPos(2, 5));

auto text = foo.children[0];
assert(text.type == EntityType.text);
assert(text.text.stripIndent() ==
       "Foo and bar. Always foo and bar...");
assert(text.pos == TextPos(2, 10));
@property SliceOfR name();
Gives the name of this DOMEntity.
Note that this is the direct name in the XML for this entity and does not include the names of any of its parent entities.
Supported EntityTypes:
elementStart
elementEmpty
pi
See Also: path
Examples:
import std.range.primitives : empty;

auto xml = "<root>\n" ~
           "    <empty/>\n" ~
           "    <?pi?>\n" ~
           "</root>";

auto dom = parseDOM(xml);
assert(dom.type == EntityType.elementStart);
assert(dom.name.empty);

auto root = dom.children[0];
assert(root.type == EntityType.elementStart);
assert(root.name == "root");

assert(root.children[0].type == EntityType.elementEmpty);
assert(root.children[0].name == "empty");

assert(root.children[1].type == EntityType.pi);
assert(root.children[1].name == "pi");
@property SliceOfR[] path();
Gives the list of the names of the parent start tags of this DOMEntity.
The name of the current entity (if it has one) is not included in the path.
Note that if parseDOM is given an EntityRange, the path starts where the range started, so it doesn't necessarily contain the entire path from the start of the XML document.
See Also: name
Examples:
import std.range.primitives : empty;

auto xml = "<root>\n" ~
           "    <bar>\n" ~
           "        <baz>\n" ~
           "            <xyzzy/>\n" ~
           "        </baz>\n" ~
           "        <frobozz>\n" ~
           "            <!-- comment -->\n" ~
           "            It's magic!\n" ~
           "        </frobozz>\n" ~
           "    </bar>\n" ~
           "    <foo></foo>\n" ~
           "</root>";

auto dom = parseDOM(xml);
assert(dom.type == EntityType.elementStart);
assert(dom.name.empty);
assert(dom.path.empty);

auto root = dom.children[0];
assert(root.type == EntityType.elementStart);
assert(root.name == "root");
assert(root.path.empty);

auto bar = root.children[0];
assert(bar.type == EntityType.elementStart);
assert(bar.name == "bar");
assert(bar.path == ["root"]);

auto baz = bar.children[0];
assert(baz.type == EntityType.elementStart);
assert(baz.name == "baz");
assert(baz.path == ["root", "bar"]);

auto xyzzy = baz.children[0];
assert(xyzzy.type == EntityType.elementEmpty);
assert(xyzzy.name == "xyzzy");
assert(xyzzy.path == ["root", "bar", "baz"]);

auto frobozz = bar.children[1];
assert(frobozz.type == EntityType.elementStart);
assert(frobozz.name == "frobozz");
assert(frobozz.path == ["root", "bar"]);

auto comment = frobozz.children[0];
assert(comment.type == EntityType.comment);
assert(comment.text == " comment ");
assert(comment.path == ["root", "bar", "frobozz"]);

auto text = frobozz.children[1];
assert(text.type == EntityType.text);
assert(text.text == "\n            It's magic!\n        ");
assert(text.path == ["root", "bar", "frobozz"]);

auto foo = root.children[1];
assert(foo.type == EntityType.elementStart);
assert(foo.name == "foo");
assert(foo.path == ["root"]);
@property auto attributes();
Returns a dynamic array of attributes for a start tag, where each attribute is represented as a Tuple!(SliceOfR, "name", SliceOfR, "value", TextPos, "pos").
Examples:
import std.algorithm.comparison : equal;
import std.algorithm.iteration : filter;
import std.range.primitives : ElementType, empty;
import dxml.parser : TextPos;

{
    auto xml = "<root/>";
    auto root = parseDOM(xml).children[0];
    assert(root.type == EntityType.elementEmpty);
    assert(root.attributes.empty);

    static assert(is(ElementType!(typeof(root.attributes)) ==
                     typeof(root).Attribute));
}
{
    auto xml = "<root a='42' q='29' w='hello'/>";
    auto root = parseDOM(xml).children[0];
    assert(root.type == EntityType.elementEmpty);

    auto attrs = root.attributes;
    assert(attrs.length == 3);

    assert(attrs[0].name == "a");
    assert(attrs[0].value == "42");
    assert(attrs[0].pos == TextPos(1, 7));

    assert(attrs[1].name == "q");
    assert(attrs[1].value == "29");
    assert(attrs[1].pos == TextPos(1, 14));

    assert(attrs[2].name == "w");
    assert(attrs[2].value == "hello");
    assert(attrs[2].pos == TextPos(1, 21));
}
// Because the type of name and value is SliceOfR, == with a string
// only works if the range passed to parseDOM was a string.
{
    auto xml = filter!"true"("<root a='42' q='29' w='hello'/>");
    auto root = parseDOM(xml).children[0];
    assert(root.type == EntityType.elementEmpty);

    auto attrs = root.attributes;
    assert(attrs.length == 3);

    assert(equal(attrs[0].name, "a"));
    assert(equal(attrs[0].value, "42"));
    assert(attrs[0].pos == TextPos(1, 7));

    assert(equal(attrs[1].name, "q"));
    assert(equal(attrs[1].value, "29"));
    assert(attrs[1].pos == TextPos(1, 14));

    assert(equal(attrs[2].name, "w"));
    assert(equal(attrs[2].value, "hello"));
    assert(attrs[2].pos == TextPos(1, 21));
}
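In practice, attributes are often looked up by name rather than by index. The following is a minimal sketch of such a lookup over the array that attributes returns; attributeOr is a hypothetical helper, and it assumes a DOM built from a string so that == works on name.

```d
import dxml.dom;

// Hypothetical helper: returns the value of the first attribute with the
// given name, or fallback if the tag has no such attribute.
string attributeOr(DOMEntity!string entity, string name, string fallback)
{
    // Assumption: attributes is only meaningful for tags, so anything
    // else is treated as a programming error here.
    assert(entity.type == EntityType.elementStart ||
           entity.type == EntityType.elementEmpty);

    foreach(attr; entity.attributes)
    {
        if(attr.name == name)
            return attr.value;
    }
    return fallback;
}
```

The linear scan keeps the attributes in document order and avoids building a separate lookup table, which is usually fine for the handful of attributes a tag tends to have.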
@property SliceOfR text();
Returns the textual value of this DOMEntity.
In the case of EntityType.pi, this is the text that follows the name, whereas in the other cases, the text is the entire contents of the entity (save for the delimiters on the ends if that entity has them).
Supported EntityTypes:
cdata
comment
pi
text
Examples:
import std.range.primitives : empty;

auto xml = "<?xml version='1.0'?>\n" ~
           "<?instructionName?>\n" ~
           "<?foo here is something to say?>\n" ~
           "<root>\n" ~
           "    <![CDATA[ Yay! random text >> << ]]>\n" ~
           "    <!-- some random comment -->\n" ~
           "    <p>something here</p>\n" ~
           "    <p>\n" ~
           "       something else\n" ~
           "       here</p>\n" ~
           "</root>";
auto dom = parseDOM(xml);

// "<?instructionName?>\n" ~
auto pi1 = dom.children[0];
assert(pi1.type == EntityType.pi);
assert(pi1.name == "instructionName");
assert(pi1.text.empty);

// "<?foo here is something to say?>\n" ~
auto pi2 = dom.children[1];
assert(pi2.type == EntityType.pi);
assert(pi2.name == "foo");
assert(pi2.text == "here is something to say");

// "<root>\n" ~
auto root = dom.children[2];
assert(root.type == EntityType.elementStart);

// "    <![CDATA[ Yay! random text >> << ]]>\n" ~
auto cdata = root.children[0];
assert(cdata.type == EntityType.cdata);
assert(cdata.text == " Yay! random text >> << ");

// "    <!-- some random comment -->\n" ~
auto comment = root.children[1];
assert(comment.type == EntityType.comment);
assert(comment.text == " some random comment ");

// "    <p>something here</p>\n" ~
auto p1 = root.children[2];
assert(p1.type == EntityType.elementStart);
assert(p1.name == "p");

assert(p1.children[0].type == EntityType.text);
assert(p1.children[0].text == "something here");

// "    <p>\n" ~
// "       something else\n" ~
// "       here</p>\n" ~
auto p2 = root.children[3];
assert(p2.type == EntityType.elementStart);

assert(p2.children[0].type == EntityType.text);
assert(p2.children[0].text == "\n       something else\n       here");
@property DOMEntity[] children();
Returns the child entities of the current entity.
They are in the same order that they were in the XML document.
Supported EntityTypes:
elementStart
Examples:
auto xml = "<potato>\n" ~
           "    <!--comment-->\n" ~
           "    <foo>bar</foo>\n" ~
           "    <tag>\n" ~
           "        <silly>you</silly>\n" ~
           "        <empty/>\n" ~
           "        <nocontent></nocontent>\n" ~
           "    </tag>\n" ~
           "</potato>\n" ~
           "<!--the end-->";
auto dom = parseDOM(xml);
assert(dom.children.length == 2);

auto potato = dom.children[0];
assert(potato.type == EntityType.elementStart);
assert(potato.name == "potato");
assert(potato.children.length == 3);

auto comment = potato.children[0];
assert(comment.type == EntityType.comment);
assert(comment.text == "comment");

auto foo = potato.children[1];
assert(foo.type == EntityType.elementStart);
assert(foo.name == "foo");
assert(foo.children.length == 1);

assert(foo.children[0].type == EntityType.text);
assert(foo.children[0].text == "bar");

auto tag = potato.children[2];
assert(tag.type == EntityType.elementStart);
assert(tag.name == "tag");
assert(tag.children.length == 3);

auto silly = tag.children[0];
assert(silly.type == EntityType.elementStart);
assert(silly.name == "silly");
assert(silly.children.length == 1);

assert(silly.children[0].type == EntityType.text);
assert(silly.children[0].text == "you");

auto empty = tag.children[1];
assert(empty.type == EntityType.elementEmpty);
assert(empty.name == "empty");

auto nocontent = tag.children[2];
assert(nocontent.type == EntityType.elementStart);
assert(nocontent.name == "nocontent");
assert(nocontent.children.length == 0);

auto endComment = dom.children[1];
assert(endComment.type == EntityType.comment);
assert(endComment.text == "the end");
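Since children is only supported for EntityType.elementStart, a recursive walk over the tree has to check the type before descending. The sketch below collects element names depth-first; collectNames is a hypothetical helper, and it assumes a DOM built from a string.

```d
import dxml.dom;

// Hypothetical helper: appends the names of all elements at or below the
// given entity to names, in document order.
void collectNames(DOMEntity!string entity, ref string[] names)
{
    if(entity.type == EntityType.elementStart)
    {
        // The entity representing the whole document has an empty name,
        // so it is skipped.
        if(entity.name.length != 0)
            names ~= entity.name;

        // Only start tags have children to recurse into.
        foreach(child; entity.children)
            collectNames(child, names);
    }
    else if(entity.type == EntityType.elementEmpty)
        names ~= entity.name;
}
```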