Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


regex - Python Repeated Capture Groups

I'm attempting to parse a series of SHOW CDP NEIGHBORS DETAIL outputs so I can capture each device and its IP addresses.

The issue I'm running into is that some devices have multiple IP addresses assigned to them. Here is an example output:

Device ID: RTPER1.MFN21Mb.domain.local
Entry address(es): 
  IP address: 200.152.51.3
  IP address: 82.159.177.233
  IP address: 201.152.51.140
  IP address: 84.252.100.3
Platform: Cisco 2821,  Capabilities: Router Switch IGMP 

I wrote this regex to capture the input. According to gskinner it matches all 4 IP addresses, but the capture group holds only the last one (as expected from a repeated group):

Device ID: ([0-9A-Za-z\-.&]+)\s+Entry address\(es\):\s+(?:IP address: ([0-9.]+)\s+)+

So I went online to figure out how to do this. I tried the regex suggested in "Capturing repeating subpatterns in Python regex", but using the regex module did not change the output. I still get only the last IP address in the list and none of the others.

Following the example, I get:

import regex  # third-party module, not the standard-library re

temp = regex.match(r'Device ID: ([0-9A-Za-z\-.&]+)\s+Entry address\(es\):\s+(?:IP address: ([0-9.]+)\s+)+', file)
print(temp)

temp is None.

If I use findall instead, I get just the last IP address, 84.252.100.3.

If I add multiple capture groups, such as

temp = regex.findall(r'Device ID: ([0-9A-Za-z\-.&]+)\s+Entry address\(es\):\s+(?:IP address: ([0-9.]+)\s+)?\s+(?:IP address: ([0-9.]+)\s+)?\s+(?:IP address: ([0-9.]+)\s+)?\s+(?:IP address: ([0-9.]+)\s+)?\s+(?:IP address: ([0-9.]+)\s+)?', file)
print(temp)

it only matches the devices that have multiple IP addresses, and not the others.

Hopefully someone can help.


1 Reply


As far as I'm aware, Python's standard re module does not let you iterate through quantified (repeated) capturing groups the way .NET does; a repeated group keeps only its last match. Consider this (finite) alternative:

Device ID: ([0-9A-Za-z\-.&]+)\s+Entry address\(es\):\s+(?:IP address: ([0-9.]+)\s+)(?:IP address: ([0-9.]+)\s+)?(?:IP address: ([0-9.]+)\s+)?(?:IP address: ([0-9.]+)\s+)?
                                                                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This will capture one IP address in $2 and up to three more in $3, $4, and $5. (I'm using the $ notation idiomatically, of course; in Python these are groups 2 through 5, and any optional group that did not participate is None.) You can add as many optional groups as you want. If you need all of the IP addresses to be present in a single group, i.e. $2, then your only choice is to include the surrounding text with them:

Device ID: ([0-9A-Za-z\-.&]+)\s+Entry address\(es\):\s+((?:IP address: (?:[0-9.]+)\s+)+)
                                                       ^                ^^             ^
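
Once the whole address block is in a single group, a second pass can pull the individual addresses out of it. Here is a rough sketch of that two-step idea using only the standard re module. The names ENTRY, IP, SAMPLE and parse_neighbors are mine, and the second device in the sample is made up to show the single-address case, so treat this as an illustration rather than a tested parser for real CDP output:

```python
import re

# Sample text: the first entry is copied from the question; the second
# entry (a device with one address) is invented for illustration.
SAMPLE = """\
Device ID: RTPER1.MFN21Mb.domain.local
Entry address(es):
  IP address: 200.152.51.3
  IP address: 82.159.177.233
  IP address: 201.152.51.140
  IP address: 84.252.100.3
Platform: Cisco 2821,  Capabilities: Router Switch IGMP

Device ID: SW1.example.local
Entry address(es):
  IP address: 10.0.0.1
Platform: Cisco 2960,  Capabilities: Switch IGMP
"""

# Step 1: group 1 is the device name, group 2 is the whole block of
# "IP address: ..." lines (text included, as in the pattern above).
ENTRY = re.compile(
    r'Device ID: ([0-9A-Za-z\-.&]+)\s+'
    r'Entry address\(es\):\s+'
    r'((?:IP address: [0-9.]+\s+)+)'
)

# Step 2: applied to the captured block only, so every address is returned.
IP = re.compile(r'IP address: ([0-9.]+)')


def parse_neighbors(text):
    """Return a list of (device_id, [ip, ...]) tuples found in text."""
    results = []
    for entry in ENTRY.finditer(text):
        device, block = entry.groups()
        results.append((device, IP.findall(block)))
    return results


for device, ips in parse_neighbors(SAMPLE):
    print(device, ips)
```

I used finditer rather than match because match only succeeds at the very start of the string, which is one likely reason the attempt in the question returned None. If the third-party regex module is an option, its match objects also have a captures() method that returns every repetition of a group, so the original repeated-group pattern may work there without the second pass.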
