Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


nlp - Identifying synonymous rows of a text column in a dataframe using R

Suppose ABC is a dataframe as given below:

ABC <- data.frame(Column1 = c(1.222, 3.445, 5.621, 8.501, 9.302), 
                  Column2 = c(654231, 12347, -2365, 90000, 12897), 
                  Column3 = c('A1', 'B2', 'E3', 'C1', 'F5'), 
                  Column4 = c('I bought it', 'The flower has a beautiful fragrance', 'It was bought by me', 'I have bought it', 'The flower smells good'), 
                  Column5 = c('Good', 'Bad', 'Ok', 'Moderate', 'Perfect'))

My intention is to find synonymous strings in Column4. In this case, 'I bought it', 'It was bought by me' and 'I have bought it' are synonymous or similar strings, and 'The flower has a beautiful fragrance' and 'The flower smells good' convey a similar meaning.

I tried the approach of IVR in the following thread and got stuck: Find similar texts based on paraphrase detection

When I run the HLS.Extract code chunk, I get the following error message:

Error in strsplit(PlainTextDocument(synonyms(word)), ",") : non-character Argument

Using as.character didn't resolve the problem either:

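Syns = function(word) {
    word <- as.character(word)  # added coercion; the error persists
    # Split the synonyms() output on commas, strip punctuation and
    # surrounding whitespace, then blank out entries that still contain a space
    wl = gsub("(.*[[:space:]].*)", "",
              gsub("^c\\(|[[:punct:]]+|^[[:space:]]+|[[:space:]]+$", "",
                   unlist(strsplit(PlainTextDocument(synonyms(word)), ","))))
    wl = wl[wl != ""]  # drop the blanked entries
    return(wl)
}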
1. What is going wrong?

2. Is there a better way to do this in R that also adds a new column holding a group number, say 1 for every string in the first set of synonymous strings and 2 for every string in the next set?

3. Does it work with German text?
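
To make question 2 concrete, here is a minimal sketch of the kind of output being asked for, using base R only. It is not the approach from the linked thread and it does not detect synonyms: it only groups rows by word overlap (Jaccard similarity), so sentences that share meaning but no words would not be matched. The column name SynGroup, the average-linkage clustering and the cut height of 0.9 are all assumptions chosen for this toy data.

```r
# Toy data from the question (only the text column is needed here)
ABC <- data.frame(
  Column4 = c('I bought it', 'The flower has a beautiful fragrance',
              'It was bought by me', 'I have bought it',
              'The flower smells good'),
  stringsAsFactors = FALSE
)

# Split each sentence into a set of unique lower-case words
tokens <- lapply(strsplit(tolower(ABC$Column4), "[^[:alpha:]]+"),
                 function(w) unique(w[w != ""]))

# Jaccard similarity: shared words divided by all words in both sentences
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# Pairwise similarity matrix
n <- length(tokens)
sim <- matrix(0, n, n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    sim[i, j] <- jaccard(tokens[[i]], tokens[[j]])
  }
}

# Hierarchical clustering on the distance 1 - similarity.
# The cut height 0.9 is an arbitrary assumption, not a tuned value.
hc <- hclust(as.dist(1 - sim), method = "average")
ABC$SynGroup <- cutree(hc, h = 0.9)

print(ABC)
```

On this data, rows 1, 3 and 4 end up in one group and rows 2 and 5 in another, because rows from different groups share no words at all. For German text, whether [[:alpha:]] matches umlauts depends on the locale and encoding, so the tokenising step would need checking.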


Fight the dragon too long and you become the dragon yourself; stare into the abyss too long and the abyss stares back…

1 Reply

Waiting for an expert to answer.
