python - Grouping Messages by Time Intervals

Question


posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

I'm trying to group outgoing messages into 1-second time intervals. I currently calculate the latency with this:

def time_deltas(infile):
    # Read from the file passed in rather than the global INFILE
    entries = (line.split() for line in open(infile, "r"))
    ts = {}
    for e in entries:
        if " ".join(e[2:5]) == "T out: [O]":
            ts[e[8]] = e[0]
        elif " ".join(e[2:5]) == "T in: [A]":
            in_ts, ref_id = e[0], e[7]
            out_ts = ts.pop(ref_id, None)
            if out_ts is None:
                continue  # no matching "T out" line seen; float(None) would raise
            yield (float(out_ts), ref_id[1:-1], (float(in_ts)*1000 - float(out_ts)*1000))

import csv

INFILE = 'C:/Users/klee/Documents/test.txt'

with open('test.csv', 'w') as f:
    csv.writer(f).writerows(time_deltas(INFILE))

However, I want to calculate the number of "T in: [A]" messages sent out per second, and have been trying to adapt this code to do so:

import datetime
import bisect
import collections

data=[ (datetime.datetime(2010, 2, 26, 12, 8, 17), 5594813),
  (datetime.datetime(2010, 2, 26, 12, 7, 31), 5594810),
  (datetime.datetime(2010, 2, 26, 12, 6, 4) , 5594807),
]
interval=datetime.timedelta(seconds=50)
start=datetime.datetime(2010, 2, 26, 12, 6, 4)
grid=[start+n*interval for n in range(10)]
bins=collections.defaultdict(list)
for date,num in data:
    idx=bisect.bisect(grid,date)
    bins[idx].append(num)
for idx,nums in bins.items():
    print('{0} --- {1}'.format(grid[idx],len(nums)))

which can be found here: Python: group results by time intervals

(I realize the units would be off for what I want, but I'm just looking into the general idea...)

I've been mostly unsuccessful thus far and would appreciate any help.

Also, the data looks like this:

082438.577652 - T in: [A] accepted. ordID [F25Q6] timestamp [082438.575880] RefNumber [6018786] State [L]


1 Reply

深蓝 · Answer 1 · 2021-10-23T21:33:59+0000

Assuming you want to group your data into 1-second intervals aligned on the second, you can rely on two facts: your data is ordered by time, and applying int() to the float timestamp truncates it to the whole second, which makes it usable as a grouping key.

The simplest way to do the grouping is to use itertools.groupby:

from itertools import groupby

data = time_deltas(INFILE)  # the generator defined in the question
get_key = lambda x: int(x[0])  # function to get group key from data
bins = [(k, list(g)) for k, g in groupby(data, get_key)]

bins will be a list of tuples where the first value in each tuple is the key (an integer, e.g. 82438 for timestamps 082438.*) and the second value is a list of the data entries that were issued in that second.
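To make the shape of bins concrete, here is a minimal, self-contained sketch. The tuples in sample are made up for illustration and only mimic the (out_ts, ref_id, latency) layout that time_deltas yields; second_of is a hypothetical stand-in for get_key.

```python
from itertools import groupby

# Hypothetical (out_ts, ref_id, latency_ms) tuples, shaped like the output
# of time_deltas(); the values are invented for illustration only.
sample = [
    (82438.575880, "F25Q6", 1.772),
    (82438.801200, "F25Q7", 2.100),
    (82439.010000, "F25Q8", 0.950),
]

def second_of(entry):
    """Truncate the timestamp to the whole second (the grouping key)."""
    return int(entry[0])

# groupby only merges adjacent items, so the input must already be
# ordered by the key for each second to end up in a single group.
sample_bins = [(key, list(group)) for key, group in groupby(sample, second_of)]
counts = [(key, len(entries)) for key, entries in sample_bins]
print(counts)  # [(82438, 2), (82439, 1)]
```

Because groupby starts a new group every time the key changes, unordered input would produce several groups for the same second.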

Example usage:

# print out the number of messages for each second
for sec, data in bins:
    print('{0} --- {1}'.format(sec, len(data)))

# write (sec, msg_per_sec) out to CSV file
import csv
with open("test.csv", "w") as f:
    csv.writer(f).writerows((s, len(d)) for s, d in bins)

# get average message per second
message_counts = [len(d) for s, d in bins]
avg_msg_per_second = float(sum(message_counts)) / len(message_counts)
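The groupby approach relies on the input being ordered by timestamp. If that cannot be guaranteed, one possible alternative (a different technique, not the one used above) is to tally the keys with collections.Counter, which does not care about order. The sample tuples below are invented and deliberately out of order.

```python
from collections import Counter

# Hypothetical, deliberately unordered (out_ts, ref_id, latency_ms) tuples.
unordered = [
    (82439.01, "F25Q8", 0.95),
    (82438.57, "F25Q6", 1.77),
    (82438.80, "F25Q7", 2.10),
]

# Count how many entries fall into each whole second.
per_second = Counter(int(ts) for ts, _ref, _latency in unordered)

# Average over the seconds that saw at least one message.
average = float(sum(per_second.values())) / len(per_second)

for sec in sorted(per_second):
    print('{0} --- {1}'.format(sec, per_second[sec]))
print(average)  # 1.5
```

Note that this only keeps the counts, not the entries themselves, and seconds with no messages are absent rather than counted as zero.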

P.S. In this example, a list was used for bins so that the order of data is maintained. If you need random access to the data, consider using an OrderedDict instead.
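As a rough sketch of the OrderedDict suggestion: a list of (key, entries) pairs can be passed straight to the constructor, which keeps insertion order while allowing lookup by second. The bins_example values are placeholders, not real data.

```python
from collections import OrderedDict

# Placeholder bins in the same (second, entries) shape produced by groupby.
bins_example = [(82438, ["msg-a", "msg-b"]), (82439, ["msg-c"])]

# Keys keep their original order, and each second can be looked up directly.
by_second = OrderedDict(bins_example)

print(len(by_second[82438]))          # 2
print(len(by_second.get(82440, [])))  # 0, no messages in that second
print(list(by_second))                # [82438, 82439]
```

One caveat: if the input to groupby was not fully ordered, the same second can appear in bins more than once, and converting to a dict would silently keep only the last group for that key.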

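Note that it is relatively straightforward to adapt the solution to group by multiples of seconds, provided x[0] is a plain count of seconds. (For HHMMSS-style timestamps such as 082438.577652, dividing by 100 instead drops the seconds digits.) For example, to group by messages per minute (60 seconds), change the get_key function to: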

get_key = lambda x: int(x[0] / 60)  # truncate timestamp to the minute
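The sample line in the question shows timestamps that look like HHMMSS.ffffff (for example 082438.577652). Assuming that reading is correct, dividing the raw number by 60 would not line up with minute boundaries, so one option is to convert to seconds since midnight first. The helper names hhmmss_to_seconds and interval_key below are hypothetical, and the timestamps are invented.

```python
from itertools import groupby

def hhmmss_to_seconds(ts):
    """Convert an 'HHMMSS.ffffff' string to seconds since midnight.

    Assumes the timestamp format inferred from the sample log line.
    """
    whole, _, frac = ts.partition(".")
    whole = whole.zfill(6)  # restore leading zeros if they were dropped
    hours, minutes, seconds = int(whole[:2]), int(whole[2:4]), int(whole[4:6])
    return hours * 3600 + minutes * 60 + seconds + float("0." + (frac or "0"))

def interval_key(width):
    """Build a key function that buckets timestamps into width-second bins."""
    return lambda ts: int(hhmmss_to_seconds(ts) // width)

# Invented, time-ordered timestamps for illustration.
stamps = ["082438.577652", "082459.000000", "082501.250000"]

per_minute = [(k, len(list(g))) for k, g in groupby(stamps, interval_key(60))]
print(per_minute)  # [(504, 2), (505, 1)]
```

The keys here are bin indices since midnight (504 is the minute starting at 08:24:00), so multiplying a key by the interval width gives the bin's start in seconds.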
