Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
717 views
in Technique[技术] by (71.8m points)

parsing - Parse URL in shell script

I have url like:

sftp://user@host.net/some/random/path

I want to extract user, host and path from this string. Any part can be random length.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

[EDIT 2019] This answer is not meant to be a catch-all, works for everything solution it was intended to provide a simple alternative to the python based version and it ended up having more features than the original.


It answered the basic question in a bash-only way and then was modified multiple times by myself to include a hand full of demands by commenters. I think at this point however adding even more complexity would make it unmaintainable. I know not all things are straight forward (checking for a valid port for example requires comparing hostport and host) but I would rather not add even more complexity.


[Original answer]

Assuming your URL is passed as first parameter to the script:

#!/bin/bash

# extract the protocol
proto="$(echo $1 | grep :// | sed -e's,^(.*://).*,1,g')"
# remove the protocol
url="$(echo ${1/$proto/})"
# extract the user (if any)
user="$(echo $url | grep @ | cut -d@ -f1)"
# extract the host and port
hostport="$(echo ${url/$user@/} | cut -d/ -f1)"
# by request host without port    
host="$(echo $hostport | sed -e 's,:.*,,g')"
# by request - try to extract the port
port="$(echo $hostport | sed -e 's,^.*:,:,g' -e 's,.*:([0-9]*).*,1,g' -e 's,[^0-9],,g')"
# extract the path (if any)
path="$(echo $url | grep / | cut -d/ -f2-)"

echo "url: $url"
echo "  proto: $proto"
echo "  user: $user"
echo "  host: $host"
echo "  port: $port"
echo "  path: $path"

I must admit this is not the cleanest solution but it doesn't rely on another scripting language like perl or python. (Providing a solution using one of them would produce cleaner results ;) )

Using your example the results are:

url: user@host.net/some/random/path
  proto: sftp://
  user: user
  host: host.net
  port:
  path: some/random/path

This will also work for URLs without a protocol/username or path. In this case the respective variable will contain an empty string.

[EDIT]
If your bash version won't cope with the substitutions (${1/$proto/}) try this:

#!/bin/bash

# extract the protocol
proto="$(echo $1 | grep :// | sed -e's,^(.*://).*,1,g')"

# remove the protocol -- updated
url=$(echo $1 | sed -e s,$proto,,g)

# extract the user (if any)
user="$(echo $url | grep @ | cut -d@ -f1)"

# extract the host and port -- updated
hostport=$(echo $url | sed -e s,$user@,,g | cut -d/ -f1)

# by request host without port
host="$(echo $hostport | sed -e 's,:.*,,g')"
# by request - try to extract the port
port="$(echo $hostport | sed -e 's,^.*:,:,g' -e 's,.*:([0-9]*).*,1,g' -e 's,[^0-9],,g')"

# extract the path (if any)
path="$(echo $url | grep / | cut -d/ -f2-)"

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...