Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
962 views
in Technique[技术] by (71.8m points)

c# - MongoDB .NET Driver Group By Time Range

I am a noob in MongoDB and wanted to know how to count total documents inserted into the collection for every 15 minutes interval starting 12 AM UTC until the current UTC time.

Below is a sample document

{
    "_id" : ObjectId("5ade8bfc6b941c7726a54f01"),
    "Country" : "US"
    "Timestamp" : ISODate("2018-04-24T01:44:28.040Z"),
}

Here is the expected output:

{
    "Count": 245,
    "ReceiveDateString": "5/2/2018 12:00:00 AM"
},
{
    "Count": 239,
    "ReceiveDateString": "5/2/2018 12:15:00 AM"
},
{
    "Count": 252,
    "ReceiveDateString": "5/2/2018 12:30:00 AM"
},
{
    "Count": 255,
    "ReceiveDateString": "5/2/2018 12:45:00 AM"
},
{
    "Count": 242,
    "ReceiveDateString": "5/2/2018 1:00:00 AM"
}
.
.
.

and so on until current UTC time.

I can group by minutes as follows:

var filter = Builders<Model>.Filter.Where(r => r.Timestamp > startDate && r.Timestamp < endDate);
var result = Collection.Aggregate()
           .Match(filter)
           .Group(
               r => r.Timestamp.Minute,
               g => new
               {
                   ReceiveDate = g.Select(x => x.Timestamp).First(),
                   Count = g.Count(),
               }
           ).ToEnumerable();

However, I am not able to figure out how to translate the solution provided in Group result by 15 minutes time interval in MongoDb to the MongoDB C# driver query.

Thanks.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

If you are looking the the "exact thing" as the referenced post to do with .NET, then it probably isn't actually going to get implemented like that. You can do that, but you're probably not going to go to all the bother and actually go for one of the other alternatives, unless you have a need for "flexible intervals" to the extent that I do..

Fluent Aggregate

If you have a modern MongoDB 3.6 or greater server available then you can use $dateFromParts in order to reconstruct the date from the "rounded" parts extracted from the date:

DateTime startDate = new DateTime(2018, 5, 1, 0, 0, 0, DateTimeKind.Utc);
DateTime endDate = new DateTime(2018, 6, 1, 0, 0, 0, DateTimeKind.Utc);

var result = Collection.Aggregate()
  .Match(k => k.Timestamp >= startDate && k.Timestamp < endDate)
  .Group(k =>
    new DateTime(k.Timestamp.Year, k.Timestamp.Month, k.Timestamp.Day,
        k.Timestamp.Hour, k.Timestamp.Minute - (k.Timestamp.Minute % 15), 0),
    g => new { _id = g.Key, count = g.Count() }
  )
  .SortBy(d => d._id)
  .ToList();

Statement sent to server:

[
  { "$match" : {
    "Timestamp" : {
      "$gte" : ISODate("2018-05-01T00:00:00Z"),
      "$lt" : ISODate("2018-06-01T00:00:00Z")
    }
  } },
  { "$group" : {
    "_id" : { 
      "$dateFromParts" : {
        "year" : { "$year" : "$Timestamp" },
        "month" : { "$month" : "$Timestamp" },
        "day" : { "$dayOfMonth" : "$Timestamp" },
        "hour" : { "$hour" : "$Timestamp" },
        "minute" : { "$subtract" : [
          { "$minute" : "$Timestamp" },
          { "$mod" : [ { "$minute" : "$Timestamp" }, 15 ] }
        ] },
        "second" : 0
      }
    },
    "count" : { "$sum" : 1 }
  } },
  { "$sort": { "_id": 1 } }
]

If you don't have that feature available, then you can simply leave that off and leave the date "disassembled", but then assemble it again as you process the cursor. Just to simulate with a list:

var result = Collection.Aggregate()
 .Match(k => k.Timestamp >= startDate && k.Timestamp < endDate)
 .Group(k => new
    {
      year = k.Timestamp.Year,
      month = k.Timestamp.Month,
      day = k.Timestamp.Day,
      hour = k.Timestamp.Hour,
      minute = k.Timestamp.Minute - (k.Timestamp.Minute % 15)
    },
    g => new { _id = g.Key, count = g.Count() }
  )
  .SortBy(d => d._id)
  .ToList();

foreach (var doc in result)
{
  //System.Console.WriteLine(doc.ToBsonDocument());
  System.Console.WriteLine(
    new BsonDocument {
      { "_id", new DateTime(doc._id.year, doc._id.month, doc._id.day,
        doc._id.hour, doc._id.minute, 0) },
      { "count", doc.count }
    }
  );
}

Statement sent to the server:

[
  { "$match" : {
    "Timestamp" : {
      "$gte" : ISODate("2018-05-01T00:00:00Z"),
      "$lt" : ISODate("2018-06-01T00:00:00Z")
    }
  } },
  { "$group" : {
    "_id" : {
      "year" : { "$year" : "$Timestamp" },
      "month" : { "$month" : "$Timestamp" },
      "day" : { "$dayOfMonth" : "$Timestamp" },
      "hour" : { "$hour" : "$Timestamp" },
      "minute" : { "$subtract" : [
        { "$minute" : "$Timestamp" }, 
        { "$mod" : [ { "$minute" : "$Timestamp" }, 15 ] }
      ] }
    },
    "count" : { "$sum" : 1 }
  } },
  { "$sort" : { "_id" : 1 } }
]

There's very little difference between the two in terms of the code. It's just that in one case the "casting back" to DateTime actually happens on the server with the $dateFromParts and in the other we just do that exact same casting using the DateTime constructor in code as you iterate each cursor result.

So they really are almost the same with the only real difference being where the "server" does the casting the date returned uses a lot less bytes per document. In fact "5 times" less since all numeric formats here ( including the BSON Date ) are based on 64 bit integers. Even so, all of those numbers are still actually "lighter" than sending back any "string" representation of a date.

LINQ Queryable

Those are the basic forms which really stay the same when mapping onto these different forms:

var query = from p in Collection.AsQueryable()
            where p.Timestamp >= startDate && p.Timestamp < endDate
            group p by new DateTime(p.Timestamp.Year, p.Timestamp.Month, p.Timestamp.Day,
              p.Timestamp.Hour, p.Timestamp.Minute - (p.Timestamp.Minute % 15), 0) into g
            orderby g.Key
            select new { _id = g.Key, count = g.Count() };

Statement sent to the server:

[
  { "$match" : {
    "Timestamp" : {
      "$gte" : ISODate("2018-05-01T00:00:00Z"),
      "$lt" : ISODate("2018-06-01T00:00:00Z")
    }
  } },
  { "$group" : {
    "_id" : {
      "$dateFromParts" : {
        "year" : { "$year" : "$Timestamp" }, 
        "month" : { "$month" : "$Timestamp" },
        "day" : { "$dayOfMonth" : "$Timestamp" }, 
        "hour" : { "$hour" : "$Timestamp" }, 
        "minute" : { "$subtract" : [
          { "$minute" : "$Timestamp" },
          { "$mod" : [ { "$minute" : "$Timestamp" }, 15 ] }
        ] },
        "second" : 0
      }
    },
    "__agg0" : { "$sum" : 1 }
  } },
  { "$sort" : { "_id" : 1 } },
  { "$project" : { "_id" : "$_id", "count" : "$__agg0" } }
]

Or using GroupBy()

var query = Collection.AsQueryable()
    .Where(k => k.Timestamp >= startDate && k.Timestamp < endDate)
    .GroupBy(k =>
      new DateTime(k.Timestamp.Year, k.Timestamp.Month, k.Timestamp.Day,
            k.Timestamp.Hour, k.Timestamp.Minute - (k.Timestamp.Minute % 15), 0),
      (k, s) => new { _id = k, count = s.Count() }
    )
    .OrderBy(k => k._id);

Statement sent to the server:

[
  { "$match" : {
    "Timestamp" : {
      "$gte" : ISODate("2018-05-01T00:00:00Z"),
      "$lt" : ISODate("2018-06-01T00:00:00Z")
    }
  } },
  { "$group" : {
    "_id" : {
      "$dateFromParts" : {
        "year" : { "$year" : "$Timestamp" },
        "month" : { "$month" : "$Timestamp" },
        "day" : { "$dayOfMonth" : "$Timestamp" },
        "hour" : { "$hour" : "$Timestamp" },
        "minute" : { "$subtract" : [ 
          { "$minute" : "$Timestamp" }, 
          { "$mod" : [ { "$minute" : "$Timestamp" }, 15 ] } 
        ] },
        "second" : 0
      }
    },
    "count" : { "$sum" : 1 }
  } },
  { "$sort" : { "_id" : 1 } }
]

As you can see it's all basically the same form


Converting the Original

If you are looking to replicate the original "date math" form as posted, then it currently falls beyond the scope of what you can actually do with either LINQ or the Fluent builders. The only way to get that same sequence is with BsonDocument construction:

DateTime epoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);

var group = new BsonDocument { {
  "$group",
  new BsonDocument {
    { "_id",
    new BsonDocument { {
      "$add", new BsonArray
      {
        new BsonDocument { {
            "$subtract",
            new BsonArray {
              new BsonDocument { { "$subtract", new BsonArray { "$Timestamp", epoch } } },
              new BsonDocument { {
                "$mod", new BsonArray
                {
                 new BsonDocument { { "$subtract", new BsonArray { "$Timestamp", epoch } } },
                 1000 * 60 * 15
               }
             } }
           }
         } },
         epoch
       }
     } }
     },
     {
       "count", new BsonDocument("$sum", 1)
     }
   }
} };

var query = sales.Aggregate()
  .Match(k => k.Timestamp >= startDate && k.Timestamp < endDate)
  .AppendStage<BsonDocument>(group)
  .Sort(new BsonDocument("_id", 1))
  .ToList();

Request sent to server:

[
  { "$match" : {
    "Timestamp" : {
      "$gte" : ISODate("2018-05-01T00:00:00Z"),
      "$lt" : ISODate("2018-06-01T00:00:00Z")
    }
  } },
  { "$group" : {
    "_id" : { 
      "$add" : [
        { "$subtract" : [ 
          { "$subtract" : [ "$Timestamp", ISODate("1970-01-01T00:00:00Z") ] },
          { "$mod" : [ 
            { "$subtract" : [ "$Timestamp", ISODate("1970-01-01T00:00:00Z") ] },
            900000
          ] }
        ] },
        ISODate("1970-01-01T00:00:00Z")
      ]
    },
    "count" : { "$sum" : 1 }
  } },
  { "$sort" : { "_id" : 1 } }
]

The big reason we cannot do this right now is because the current serialize of the statements basically disagrees of the point that the .NET Framework says that subtracting two DateTime values return a TimeSpan, and the MongoDB construct of subtracting two BSON Dates returns the "milliseconds since epoch", which is essentially how the math works.

The "literal" translation of the lamdba expression is essentially:

p =>  epoch.AddMilliseconds(
       (p.Timestamp - epoch).TotalMilliseconds
       - ((p.Timestamp - epoch).TotalMilliseconds % 1000 * 60 * 15))

But the mapping still needs some work in order to either recognize the statements or formalize on which sort of statements are actually intended for this purpose.

Notably MongoDB 4.0 introduces the <a href="h


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...