Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


0 votes
642 views
in Technique[技术] by (71.8m points)

regex - how to extract the first number from each string in a vector in R?

I am new to regex in R. I have a vector and want to extract the first occurrence of a number from each of its strings.

I have a vector called "shootsummary" which looks like this.

> head(shootsummary)
[1] Aaron Alexis, 34, a military veteran and contractor from Texas, opened fire in the Navy installation, killing 12 people and wounding 8 before being shot dead by police.                                         
[2] Pedro Vargas, 42, set fire to his apartment, killed six people in the complex, and held another two hostages at gunpoint before a SWAT team stormed the building and fatally shot him.                           
[3] John Zawahri, 23, armed with a homemade assault rifle and high-capacity magazines, killed his brother and father at home and then headed to Santa Monica College, where he was eventually killed by police.      
[4] Dennis Clark III, 27, shot and killed his girlfriend in their shared apartment, and then shot two witnesses in the building's parking lot and a third victim in another apartment, before being killed by police.
[5] Kurt Myers, 64, shot six people in neighboring towns, killing two in a barbershop and two at a car care business, before being killed by officers in a shootout after a nearly 19-hour standoff.  

The first occurrence of a number in each string is the age of the individual. I want to extract the ages from these strings without mixing them with the other numbers in each line.

I used:

as.numeric(gsub("\\D", "", shootsummary))

It resulted in:

[1]  34128     42     23     27   6419  

I am looking for a result like this, with only the ages extracted and none of the numbers that occur after the age:

[1]  34     42     23     27   64


1 Reply

0 votes
by (71.8m points)

Using stringi would be faster:

library(stringi)
stri_extract_first(shootsummary, regex="\\d+")
#[1] "34" "42" "23" "27" "64"
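
If installing a package is not an option, a base R version can be sketched as follows. This is not from the original answer; it assumes the age is always the first run of digits in each string, and the short `shootsummary` vector here is a hypothetical stand-in for the real data. It also shows why the gsub() attempt glued the numbers together: gsub() deletes every non-digit, so all digit runs in a string end up concatenated.

```r
# Hypothetical, shortened stand-in for the asker's `shootsummary` vector.
shootsummary <- c(
  "Aaron Alexis, 34, a military veteran, killing 12 people and wounding 8.",
  "Pedro Vargas, 42, set fire to his apartment.",
  "Kurt Myers, 64, shot six people after a nearly 19-hour standoff."
)

# as.character() guards against the vector being a factor (an assumption;
# the unquoted printout in the question suggests it might be one).
x <- as.character(shootsummary)

# Removing all non-digits joins every number in a string into one.
glued <- as.numeric(gsub("\\D", "", x))
glued
# [1] 34128    42  6419

# regexpr() finds only the FIRST match in each string.
m <- regexpr("[0-9]+", x)

# Start with NA so strings without any digits keep their position.
ages <- rep(NA_real_, length(x))
ages[m != -1] <- as.numeric(regmatches(x, m))
ages
# [1] 34 42 64
```

Note that the stringi call above returns character strings; wrapping it in as.numeric() gives a numeric vector like the one requested in the question.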
