You can subclass XmlReader to "filter" out undesired nodes, then pass your reader to XmlDocument.Load() instead of letting the document create its own.
Note that this excludes only the value of an offending element, not the element itself. If you put a breakpoint in your Read() loop, you'll find that <foo>bar</foo> arrives in three pieces: <foo> has NodeType Element and no value, "bar" has NodeType Text and an empty LocalName, and </foo> has NodeType EndElement and no value. If "bar" were over the length limit, the "filter" below would turn <foo>bar</foo> into <foo></foo>.
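If you'd rather see those three pieces without a debugger, a small sketch like this lists what XmlReader reports for each node. The NodeDump class and its "NodeType|LocalName|Value" output format are just illustrative choices, not part of any API.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper: describes every node the reader reports, one string per node.
static class NodeDump
{
    public static string[] Describe(string xml)
    {
        var lines = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                // Format: NodeType|LocalName|Value (empty fields are left blank).
                lines.Add($"{reader.NodeType}|{reader.LocalName}|{reader.Value}");
            }
        }
        return lines.ToArray();
    }
}
```

For "<foo>bar</foo>" this yields "Element|foo|", "Text||bar" and "EndElement|foo|": the Text node is the only one carrying the value, so it is the only one a length check can catch.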
To exclude all of <foo>bar</foo> based on the length of "bar", you'd have to look ahead past the start tag before deciding whether to pass it on. That's doable, but maybe not worth your time. Hopefully it's not a requirement here.
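If dropping the whole element does turn out to be a requirement and writing the look-ahead isn't worth it, one swapped-in alternative is to prune after loading. To be clear, this is not a look-ahead reader: it runs on the fully loaded XmlDocument, so it gives up any memory benefit of filtering while reading. LeafPruner and RemoveLongLeaves are hypothetical names, and "offending element" is assumed to mean an element whose only child is a text node.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper: post-load pruning, NOT a streaming look-ahead.
static class LeafPruner
{
    // Removes every element whose only child is a text node longer than maxLen.
    // CDATA sections are not XmlText, so they are not checked here.
    public static void RemoveLongLeaves(XmlDocument doc, int maxLen)
    {
        var doomed = new List<XmlElement>();
        foreach (XmlElement el in doc.GetElementsByTagName("*"))
        {
            if (el.ChildNodes.Count == 1 && el.FirstChild is XmlText text && text.Value.Length > maxLen)
                doomed.Add(el);
        }
        // Remove in a second pass: GetElementsByTagName returns a live list.
        foreach (XmlElement el in doomed)
            el.ParentNode?.RemoveChild(el);
    }
}
```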
An alternative (or addition) to this class might be a version that passes every Value through a Func<string, string>, for example s => (s.Length > MAX_LEN) ? "" : s.
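For that Func<string, string> variant, one way to avoid repeating all the wrapper boilerplate is to derive from the public XmlTextReader class and override only Value. Treat this as a sketch under assumptions rather than a drop-in: XmlTextReader is the older reader type (XmlReader.Create is generally preferred), and TransformingXmlTextReader is a hypothetical name.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical class: passes every Value through a caller-supplied transform,
// e.g. s => (s.Length > MAX_LEN) ? "" : s.
class TransformingXmlTextReader : XmlTextReader
{
    private readonly Func<string, string> _transform;

    public TransformingXmlTextReader(TextReader input, Func<string, string> transform)
        : base(input)
    {
        _transform = transform ?? throw new ArgumentNullException(nameof(transform));
    }

    // Anything read through Value sees the transformed string.
    public override string Value => _transform(base.Value);
}
```

Loading a document through it with doc.Load(...) should leave a too-long <b> value as <b></b>, the same shape the filter produces. Because the override sits on Value itself, attribute values read through Value should be transformed as well.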
Also, for all I know, XmlTextReaderImpl (the actual type of _reader) may cache the whole text anyway and kill your performance, in which case you may have to write your own guts for the thing as well.
```csharp
using System;
using System.IO;
using System.Xml;

public class FilteredXmlReader : XmlReader
{
    // Return true to keep the node the inner reader is positioned on, false to skip it.
    public Func<XmlReader, bool> Filter;
    private readonly XmlReader _reader;

    private FilteredXmlReader(TextReader input, Func<XmlReader, bool> filterProc)
    {
        Filter = filterProc;
        _reader = XmlReader.Create(input);
    }

    public static XmlReader Create(TextReader input, Func<XmlReader, bool> filterProc)
    {
        return new FilteredXmlReader(input, filterProc);
    }

    public override bool Read()
    {
        var b = _reader.Read();
        // Skip rejected nodes, but stop at end of input (b == false);
        // a null Filter keeps everything instead of throwing.
        while (b && Filter != null && !Filter(_reader))
        {
            b = _reader.Read();
        }
        return b;
    }

    #region Wrapper Boilerplate
    public override XmlNodeType NodeType => _reader.NodeType;
    public override string LocalName => _reader.LocalName;
    public override string NamespaceURI => _reader.NamespaceURI;
    public override string Prefix => _reader.Prefix;
    public override string Value => _reader.Value;
    public override int Depth => _reader.Depth;
    public override string BaseURI => _reader.BaseURI;
    public override bool IsEmptyElement => _reader.IsEmptyElement;
    public override int AttributeCount => _reader.AttributeCount;
    public override bool EOF => _reader.EOF;
    public override ReadState ReadState => _reader.ReadState;
    public override XmlNameTable NameTable => _reader.NameTable;
    public override string GetAttribute(string name) => _reader.GetAttribute(name);
    public override string GetAttribute(string name, string namespaceURI) => _reader.GetAttribute(name, namespaceURI);
    public override string GetAttribute(int i) => _reader.GetAttribute(i);
    public override string LookupNamespace(string prefix) => _reader.LookupNamespace(prefix);
    public override bool MoveToAttribute(string name) => _reader.MoveToAttribute(name);
    public override bool MoveToAttribute(string name, string ns) => _reader.MoveToAttribute(name, ns);
    public override bool MoveToElement() => _reader.MoveToElement();
    public override bool MoveToFirstAttribute() => _reader.MoveToFirstAttribute();
    public override bool MoveToNextAttribute() => _reader.MoveToNextAttribute();
    public override bool ReadAttributeValue() => _reader.ReadAttributeValue();
    public override void ResolveEntity() => _reader.ResolveEntity();
    public override void Close() => _reader.Close();
    #endregion Wrapper Boilerplate
}
```
Usage:
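```csharp
var xml = "<test />";
XmlDocument doc = new XmlDocument();
using (XmlReader rdr = FilteredXmlReader.Create(new System.IO.StringReader(xml),
    r => r.Value.Length < 20))   // keep only nodes whose Value is shorter than 20 characters
{
    doc.Load(rdr);               // XmlDocument pulls every node through the filter
}
var filteredXML = doc.OuterXml;
```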