Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Regex[Python] Extract from url path parameters

I have URLs from an access log. Examples:

/someService/US/getPersonFromAllAccessoriesByDescription/67814/alloy%20nudge%20w

/someService/NZ/asdNmasdf423-asd342e/getDealerFromSomethingSomething/FS443GH/front%20parking%20sen

I cannot make any assumptions about the service name or the function name.

I'm trying to find a regex that matches only the following in the first log entry:

67814
alloy%20nudge%20w

and in the second:

asdNmasdf423-asd342e
FS443GH
front%20parking%20sen

As a heuristic, I tried [a-zA-Z0-9_%-]{15,}|[A-Z0-9]{5,} to match only long strings, but the function names (getPersonFromAllAccessoriesByDescription, getDealerFromSomethingSomething) were caught as well.

I was thinking about a regex that does the same as [a-zA-Z0-9_%-]{15,} but with the condition that the match must contain at least one digit, so that the function names are skipped.

Thank you

question from:https://stackoverflow.com/questions/65944214/regexpython-extract-from-url-path-parameters

1 Reply

Your heuristic is fine; use

\b(?=[a-zA-Z_%-]*[0-9])[a-zA-Z0-9_%-]{5,}

See proof.

Explanation

--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    [a-zA-Z_%-]*             any character of: 'a' to 'z', 'A' to
                             'Z', '_', '%', '-' (0 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
    [0-9]                    any character of: '0' to '9'
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
  [a-zA-Z0-9_%-]{5,}       any character of: 'a' to 'z', 'A' to 'Z',
                           '0' to '9', '_', '%', '-' (at least 5
                           times (matching the most amount possible))
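
To try this in Python, here is a minimal sketch that runs the pattern over the two example paths from the question. The pattern is written with the leading `\b` word boundary that the first row of the explanation describes, in a raw string so the backslash survives. The helper name `extract_params` is only illustrative and does not appear in the original post.

```python
import re

# Lookahead requires at least one digit somewhere in the token;
# the main class then consumes the whole token (5+ characters).
PATTERN = re.compile(r"\b(?=[a-zA-Z_%-]*[0-9])[a-zA-Z0-9_%-]{5,}")


def extract_params(path):
    """Return the path segments that look like parameters (hypothetical helper)."""
    return PATTERN.findall(path)


urls = [
    "/someService/US/getPersonFromAllAccessoriesByDescription/67814/alloy%20nudge%20w",
    "/someService/NZ/asdNmasdf423-asd342e/getDealerFromSomethingSomething/FS443GH/front%20parking%20sen",
]

for url in urls:
    print(extract_params(url))
# ['67814', 'alloy%20nudge%20w']
# ['asdNmasdf423-asd342e', 'FS443GH', 'front%20parking%20sen']
```

The function names are skipped because they contain no digit, so the lookahead fails at every position inside them. Short segments such as US and NZ are skipped for the same reason, and would fail the {5,} length requirement anyway.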
