Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
417 views
in Technique[技术] by (71.8m points)

clickhouse - How to match IP addresses to a Subnets and get SUM

Can you help me with this SELECT in Clickhouse ?

I want to measure the traffic accounting from some networks. I am using clickhouse db with 2 tables:

select * from network_account_db  

┌─network───────----─┬─source─┬─category─┐
│ 192.168.200.0/29   │ server │ general  │
│ 192.168.200.11/30  │ server │ general  │
│ 192.168.200.22/32  │ server │ general  │
└───────────────----─┴────────┴──────────┘

 select  packetDate,packetDateTime,sampleRatio,srcIp,dstIp,length from traffic  
 
┌─packetDate─┬──────packetDateTime─┬─sampleRatio─┬─────srcIp─┬──────dstIp─┬─length─┐
│ 2021-02-04 │ 2021-02-04 22:15:20 │           1 │ 232998210 │  767413237 │   1280 │
│ 2021-02-04 │ 2021-02-04 22:15:19 │           1 │ 767413237 │  918211986 │     40 │
│ 2021-02-04 │ 2021-02-04 22:15:19 │           1 │ 767413237 │ 1150185814 │  30088 │
│ 2021-02-04 │ 2021-02-04 22:15:19 │           1 │ 767413237 │ 1168387235 │     52 │
│ 2021-02-04 │ 2021-02-04 22:15:19 │           1 │ 767413237 │ 1169107244 │    104 │
│ 2021-02-04 │ 2021-02-04 22:15:19 │           1 │ 767413237 │ 1169107244 │     52 │
│ 2021-02-04 │ 2021-02-04 22:15:19 │           1 │ 767413237 │ 1224157376 │    617 │
│ 2021-02-04 │ 2021-02-04 22:15:19 │           1 │ 767413237 │ 1476066034 │   1425 │
│ 2021-02-04 │ 2021-02-04 22:15:19 │           1 │ 767413237 │ 1600411769 │   4656 │
│ 2021-02-04 │ 2021-02-04 22:15:19 │           1 │ 767413237 │ 1743465996 │     52 │
│ 2021-02-04 │ 2021-02-04 22:15:19 │           1 │ 767413237 │ 1746016762 │    108 │
│ 2021-02-04 │ 2021-02-04 22:15:19 │           1 │ 767413237 │ 1746284673 │    901 │
│ 2021-02-04 │ 2021-02-04 22:15:19 │           1 │ 767413237 │ 3194642526 │   1976 │
│ 2021-02-04 │ 2021-02-04 22:15:19 │           1 │ 767413237 │ 2315259109 │   2403 │
│ 2021-02-04 │ 2021-02-04 22:15:19 │           1 │ 767413237 │ 2540034693 │     52 │
│ 2021-02-04 │ 2021-02-04 22:15:19 │           1 │ 767413237 │ 2540034693 │     52 │

I want to measure the traffic something like

select sum(length * sampleRatio ) AS total, category  
from ( select network as net from network_account_db where source='server'   ) 
where  srcIp=IPv4StringToNum(net)

I need to get, for example:

category=general
total=242422

Can you help me to get the correct SELECT. I have a table with network and another with IP.

UPDATE:

Hi. I need to add another fields into the select like:

SELECT dictGet('network_account_db.reputation_rbl_db', 'category', tuple(srcIp)) AS category, dictGet('network_account_db.reputation_rbl_db', 'source', tuple(srcIp)) AS source,   sum(length * sampleRatio) AS total_bytes,   sum(numberOfPackets * sampleRatio) AS total_pps, IPv4StringToNum(srcIp), IPv4StringToNum(dstIp)   FROM traffic  WHERE dictHas('network_account_db.reputation_rbl_db', tuple(srcIp))  GROUP BY category   order by packetDateTime desc```
but appear an error like:

Column srcIp is not under aggregate function and not in GROUP BY:```

If i add it to GROUP BY i lost some rows in the response.

Is there any way to add this fields with UNION or somethink like this ?

question from:https://stackoverflow.com/questions/66054802/how-to-match-ip-addresses-to-a-subnets-and-get-sum

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Try this direct way:

SELECT
    category,
    sum(length * sampleRatio) AS total
FROM
(
    WITH
        (
            SELECT groupArray((category, ip_num_from, ip_num_to))
            FROM
            (
                /* get the lower range and the higher range of the subnet */
                SELECT
                    category,
                    splitByChar('/', network) AS ip_parts,
                    IPv4CIDRToRange(IPv4StringToNum(ip_parts[1]), toUInt8(ip_parts[2])) AS ip_ranges,
                    toUInt32(ip_ranges.1) AS ip_num_from,
                    toUInt32(ip_ranges.2) AS ip_num_to
                FROM network_account_db
            )
        ) AS networks
    SELECT
        /* find the first subnet which covers IP  */
        arrayFirst(n -> srcIp BETWEEN n.2 AND n.3, networks) AS network,
        network.1 AS category,
        length,
        sampleRatio
    FROM traffic
    /* exclude orphans IPs */
    WHERE category != ''
)
GROUP BY category

Consider using ip-trie-dictionary to ease the searching subnet which covers IP:

CREATE DICTIONARY networks_dict (
  network String,
  source String,
  category String
)
PRIMARY KEY network
SOURCE(CLICKHOUSE(host 'localhost' port 9000 db 'test' table 'network_account_db' user 'default'))
LAYOUT(IP_TRIE())
LIFETIME(3600);
SELECT
    dictGet('test.networks_dict', 'category', tuple(srcIp)) AS category,
    sum(length * sampleRatio) AS total
FROM traffic
WHERE dictHas('test.networks_dict', tuple(srcIp))
GROUP BY category

These ideas were borrowed here Add function to check if an IPv4/6 is in a list of subnets #6808.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...