r - Extract a string or value based on a specific word before and a % sign after in R

Question

Posted Mar 6, 2021 in Technique by 深蓝 (71.8m points)

I have a Text column with thousands of rows of paragraphs, and I want to extract the values of " Capacity > x% ".

The operator can be >, <, =, ~, and so on. I basically need the operator and the integer value (e.g. <40%) placed in a new column next to the text, in the same row.

I have tried removing the text before/after, gsub, grep, grepl, str_extract, etc., none with good results.

I am not sure whether the percent sign is throwing it off or I am just not getting the code structure right.

I would appreciate your assistance.

Here is some of the code I have tried (aa is the data frame, TEXT is the column name):

str_extract(string =aa$TEXT, pattern = perl("(?<=LVEF).*(?=%)"))

gsub(".*[Capacity]([^.]+)[%].*", "\\1", aa$TEXT)

genXtract(aa$TEXT, "Capacity", "%")

gsub("%.*$", "%", aa$TEXT)

grep("^Capacity.*%$",aa$TEXT)

Asked by Shawn; translated from Stack Overflow

1 Reply

深蓝 · Answer 1 · 2021-03-06T04:27:50+0000

Since you did not provide a reproducible example, I created one myself and used it here.

We can use sub to extract everything after "Capacity" up to and including the number and the % sign.

# Backslashes must be doubled inside R string literals ("\\d", "\\1")
sub(".*Capacity(.*\\d+%).*", "\\1", aa$TEXT)
#[1] " > 10%"  " < 40%"  " ~ 230%"

Or with str_extract

stringr::str_extract(aa$TEXT, "(?<=Capacity).*\\d+%")
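
One difference between the two approaches is worth keeping in mind: when the pattern does not match, sub returns the input string unchanged, whereas str_extract returns NA. With thousands of paragraphs, some rows probably do not mention "Capacity" at all. Below is a minimal base-R sketch of one way to get NA for those rows and strip the surrounding whitespace. The helper name extract_capacity, the sample vector x, and the operator set [<>=~] are assumptions made for illustration, not part of the original answer.

```r
# Hypothetical helper: returns strings such as ">10%", or NA when there is no match.
extract_capacity <- function(text) {
  # "Capacity", optional spaces, one or more operator characters,
  # optional spaces, digits, then a literal "%"
  pattern <- "Capacity\\s*[<>=~]+\\s*\\d+%"
  m <- regexpr(pattern, text, perl = TRUE)

  # regexpr() gives -1 (or NA for NA input) where nothing matched
  hit <- !is.na(m) & m != -1
  out <- rep(NA_character_, length(text))
  out[hit] <- regmatches(text, m)

  # Drop the keyword, then remove all whitespace
  out <- sub("^Capacity", "", out)
  gsub("\\s+", "", out)
}

# Made-up sample input; the second element has no "Capacity ... %" phrase
x <- c("This is a temp text, Capacity > 10%",
       "No percentage mentioned here",
       "Capacity ~ 230% more text  ahead")

extract_capacity(x)
# returns ">10%", NA, "~230%"
```

If other operators (for example <= or ≥) can appear in the text, the character class would need to be extended accordingly.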

data

aa <- data.frame(TEXT = c("This is a temp text, Capacity > 10%", 
                    "This is a temp text, Capacity < 40%", 
                    "Capacity ~ 230% more text  ahead"), stringsAsFactors = FALSE)
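
The question also asks for the result to sit in a column next to the text, in the same row. A minimal sketch of that step, using the sample data above: it assumes the operator is one or more of <, >, =, ~, and the column names CAPACITY, OPERATOR and VALUE are invented for this example.

```r
aa <- data.frame(TEXT = c("This is a temp text, Capacity > 10%",
                          "This is a temp text, Capacity < 40%",
                          "Capacity ~ 230% more text  ahead"),
                 stringsAsFactors = FALSE)

# Group 1 captures the operator, group 2 the integer value
pattern <- ".*Capacity\\s*([<>=~]+)\\s*(\\d+)%.*"

# sub() leaves non-matching strings untouched, so only keep results for real matches
hit <- grepl(pattern, aa$TEXT, perl = TRUE)

# Operator and value together, e.g. "<40%"
aa$CAPACITY <- ifelse(hit, sub(pattern, "\\1\\2%", aa$TEXT, perl = TRUE), NA_character_)

# Operator and value as separate columns, with the value as an integer
aa$OPERATOR <- ifelse(hit, sub(pattern, "\\1", aa$TEXT, perl = TRUE), NA_character_)
aa$VALUE    <- as.integer(ifelse(hit, sub(pattern, "\\2", aa$TEXT, perl = TRUE), NA_character_))

aa[, c("CAPACITY", "OPERATOR", "VALUE")]
```

Keeping the numeric part in its own integer column makes later filtering (for example, rows where VALUE is below 50) straightforward.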
