Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
511 views
in Technique[技术] by (71.8m points)

c# - Parse huge OData JSON by streaming certain sections of the json to avoid LOH

I have an OData response as JSON (Which is in few MBs) and the requirement is to stream "certain parts of JSON" without even loading them to memory.

For Example: When I'm reading the property "value[0].Body.Content" in the below JSON (which will be in MBs), I want to Stream this value part without de-serializing it into an Object of type string. So basically read the value part into a fixed size byte array and write that byte array to destination stream (repeating the step until that data is finished processing).

JSON:

{
    "@odata.context": "https://localhost:5555/api/v2.0/$metadata#Me/Messages",
    "value": [
        {
            "@odata.id": "https://localhost:5555/api/v2.0/",
            "@odata.etag": "W/"Something"",
            "Id": "vccvJHDSFds43hwy98fh",
            "CreatedDateTime": "2018-12-01T01:47:53Z",
            "LastModifiedDateTime": "2018-12-01T01:47:53Z",
            "ChangeKey": "SDgf43tsdf",
            "WebLink": "https://localhost:5555/?ItemID=dfsgsdfg9876ijhrf",
            "Body": {
                "ContentType": "HTML",
                "Content": "<html>
<body>Huge Data Here
</body>
</html>
"
            },
            "ToRecipients": [{
                    "EmailAddress": {
                        "Name": "ME",
                        "Address": "me@me.com"
                    }
                }
            ],
            "CcRecipients": [],
            "BccRecipients": [],
            "ReplyTo": [],
            "Flag": {
                "FlagStatus": "NotFlagged"
            }
        }
    ],
    "@odata.nextLink": "http://localhost:5555/rest/jersey/sleep?%24filter=LastDeliveredDateTime+ge+2018-12-01+and+LastDeliveredDateTime+lt+2018-12-02&%24top=50&%24skip=50"
}

Approaches Tried:
1. Newtonsoft

I initially tried using Newtonsoft streaming, but it internally converts the data into string and loads into memory. (This is resulting in LOH shooting up and memory not getting released until compaction happens - We've a memory limit for our worker process and cannot keep this in memory)

**code:**

    using (var jsonTextReader = new JsonTextReader(sr))
    {
        var pool = new CustomArrayPool();
        // Checking if pooling will help with memory
        jsonTextReader.ArrayPool = pool;

        while (jsonTextReader.Read())
        {
            if (jsonTextReader.TokenType == JsonToken.PropertyName
                && ((string)jsonTextReader.Value).Equals("value"))
            {
                jsonTextReader.Read();

                if (jsonTextReader.TokenType == JsonToken.StartArray)
                {
                    while (jsonTextReader.Read())
                    {
                        if (jsonTextReader.TokenType == JsonToken.StartObject)
                        {
                            var Current = JToken.Load(jsonTextReader);
                            // By Now, the LOH Shoots up.
                            // Avoid below code of converting this JToken back to byte array.
                            destinationStream.write(Encoding.ASCII.GetBytes(Current.ToString()));
                        }
                        else if (jsonTextReader.TokenType == JsonToken.EndArray)
                        {
                            break;
                        }
                    }
                }
            }

            if (jsonTextReader.TokenType == JsonToken.StartObject)
            {
                var Current = JToken.Load(jsonTextReader);
                // Do some processing with Current
                destinationStream.write(Encoding.ASCII.GetBytes(Current.ToString()));
            }
        }
    }
  1. OData.Net:

    I was thinking if this is doable using OData.Net Library as it looks like it supports streaming of string fields. But couldn't get far, as I end up with creating a Model for the data, which would mean the value would get converted into one string object of MB's.

    Code

    ODataMessageReaderSettings settings = new ODataMessageReaderSettings();
    IODataResponseMessage responseMessage = new InMemoryMessage { Stream = stream };
    responseMessage.SetHeader("Content-Type", "application/json;odata.metadata=minimal;");
    // ODataMessageReader reader = new ODataMessageReader((IODataResponseMessage)message, settings, GetEdmModel());
    ODataMessageReader reader = new ODataMessageReader(responseMessage, settings, new EdmModel());
    var oDataResourceReader = reader.CreateODataResourceReader();
    var property = reader.ReadProperty();
    


Any idea how to parse this JSON in parts using OData.Net/Newtonsoft and stream value of certain fields?
Is the only way to do this, is to manually parse the stream?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

If you are copying portions of JSON from one stream to another, you can do this more efficiently with JsonWriter.WriteToken(JsonReader) thus avoiding the intermediate Current = JToken.Load(jsonTextReader) and Encoding.ASCII.GetBytes(Current.ToString()) representations and their associated memory overhead:

using (var textWriter = new StreamWriter(destinationStream, new UTF8Encoding(false, true), 1024, true))
using (var jsonWriter = new JsonTextWriter(textWriter) { Formatting = Formatting.Indented, CloseOutput = false })
{
    // Use Formatting.Indented or Formatting.None as required.
    jsonWriter.WriteToken(jsonTextReader);
}

However, Json.NET's JsonTextReader does not have the ability to read a single string value in "chunks" in the same way as XmlReader.ReadValueChunk(). It will always fully materialize each atomic string value. If your strings values are so large that they are going on the large object heap, even using JsonWriter.WriteToken() will not prevent these strings from being completely loaded into memory.

As an alternative, you might consider the readers and writers returned by JsonReaderWriterFactory. These readers and writers are used by DataContractJsonSerializer and translate JSON to XML on-the-fly as it is being read and written. Since the base classes for these readers and writers are XmlReader and XmlWriter, they do support reading and writing string values in chunks. Using them appropriately will avoid allocation of strings in the large object heap.

To do this, first define the following extension methods, that copy a selected subset of JSON value(s) from an input stream to an output stream, as specified by a path to the data to be streamed:

public static class JsonExtensions
{
    public static void StreamNested(Stream from, Stream to, string [] path)
    {
        var reversed = path.Reverse().ToArray();

        using (var xr = JsonReaderWriterFactory.CreateJsonReader(from, XmlDictionaryReaderQuotas.Max))
        {
            foreach (var subReader in xr.ReadSubtrees(s => s.Select(n => n.LocalName).SequenceEqual(reversed)))
            {
                using (var xw = JsonReaderWriterFactory.CreateJsonWriter(to, Encoding.UTF8, false))
                {
                    subReader.MoveToContent();

                    xw.WriteStartElement("root");
                    xw.WriteAttributes(subReader, true);

                    subReader.Read();

                    while (!subReader.EOF)
                    {
                        if (subReader.NodeType == XmlNodeType.Element && subReader.Depth == 1)
                            xw.WriteNode(subReader, true);
                        else
                            subReader.Read();
                    }

                    xw.WriteEndElement();
                }
            }
        }
    }
}

public static class XmlReaderExtensions
{
    public static IEnumerable<XmlReader> ReadSubtrees(this XmlReader xmlReader, Predicate<Stack<XName>> filter)
    {
        Stack<XName> names = new Stack<XName>();

        while (xmlReader.Read())
        {
            if (xmlReader.NodeType == XmlNodeType.Element)
            {
                names.Push(XName.Get(xmlReader.LocalName, xmlReader.NamespaceURI));
                if (filter(names))
                {
                    using (var subReader = xmlReader.ReadSubtree())
                    {
                        yield return subReader;
                    }
                }
            }

            if ((xmlReader.NodeType == XmlNodeType.Element && xmlReader.IsEmptyElement)
                || xmlReader.NodeType == XmlNodeType.EndElement)
            {
                names.Pop();
            }
        }
    }
}

Now, the string [] path argument to StreamNested() is not any sort of path. Instead, it is a path corresponding to the hierarchy of XML elements corresponding to the JSON you want to select as translated by the XmlReader returned by JsonReaderWriterFactory.CreateJsonReader(). The mapping used for this translation is, in turn, documented by Microsoft in Mapping Between JSON and XML. To select and stream only those JSON values matching value[*], the XML path required is //root/value/item. Thus, you can select and stream your desired nested objects by doing:

JsonExtensions.StreamNested(inputStream, destinationStream, new[] { "root", "value", "item" });

Notes:

  • Mapping Between JSON and XML is somewhat complex. It's often easier just to load some sample JSON into an XDocument using the following extension method:

    static XDocument ParseJsonAsXDocument(string json)
    {
        using (var xr = JsonReaderWriterFactory.CreateJsonReader(new MemoryStream(Encoding.UTF8.GetBytes(json)), Encoding.UTF8, XmlDictionaryReaderQuotas.Max, null))
        {
            return XDocument.Load(xr);
        }
    }
    

    And then determine the correct XML path observationally.

  • For a related question, see JObject.SelectToken Equivalent in .NET.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...