Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
534 views
in Technique[技术] by (71.8m points)

regex - Extract all numbers from a single string in R

Let's imagine you have a string:

strLine <- "The transactions (on your account) were as follows: 0 3,000 (500) 0 2.25 (1,200)"

Is there a function that strips out the numbers into an array/vector producing the following required solution:

result <- c(0, 3000, -500, 0, 2.25, -1200)?

i.e.

result[3] = -500

Notice, the numbers are presented in accounting form so negative numbers appear between (). Also, you can assume that only numbers appear to the right of the first occurance of a number. I am not that good with regexp so would appreciate it if you could help if this would be required. Also, I don't want to assume the string is always the same so I am looking to strip out all words (and any special characters) before the location of the first number.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)
library(stringr)
x <- str_extract_all(strLine,"\(?[0-9,.]+\)?")[[1]]
> x
[1] "0"       "3,000"   "(500)"   "0"       "2.25"    "(1,200)"

Change the parens to negatives:

x <- gsub("\((.+)\)","-\1",x)
x
[1] "0"      "3,000"  "-500"   "0"      "2.25"   "-1,200"

And then as.numeric() or taRifx::destring to finish up (the next version of destring will support negatives by default so the keep option won't be necessary):

library(taRifx)
destring( x, keep="0-9.-")
[1]    0 3000  -500    0    2.25 -1200

OR:

as.numeric(gsub(",","",x))
[1]     0  3000  -500     0     2.25 -1200

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...