Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
328 views
in Technique[技术] by (71.8m points)

python - Parsing a tweet to extract hashtags into an array

I am having a heck of a time taking the information in a tweet including hashtags, and pulling each hashtag into an array using Python. I am embarrassed to even put what I have been trying thus far.

For example, "I love #stackoverflow because #people are very #helpful!"

This should pull the 3 hashtags into an array.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

A simple regex should do the job:

>>> import re
>>> s = "I love #stackoverflow because #people are very #helpful!"
>>> re.findall(r"#(w+)", s)
['stackoverflow', 'people', 'helpful']

Note though, that as suggested in other answers, this may also find non-hashtags, such as a hash location in a URL:

>>> re.findall(r"#(w+)", "http://example.org/#comments")
['comments']

So another simple solution would be the following (removes duplicates as a bonus):

>>> def extract_hash_tags(s):
...    return set(part[1:] for part in s.split() if part.startswith('#'))
...
>>> extract_hash_tags("#test http://example.org/#comments #test")
set(['test'])

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...