Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

0 votes
247 views
in Technique[技术] by (71.8m points)

python - Column and row concatenation by a common value in another column

In the table below, I want to concatenate the columns Tri_gram_sents and Value within each row, and then join all rows that have the same number in the column sentence.

   Tri_gram_sents                   Value          sentence
  (('<s>', '<s>'), 'ABC')          0.161681         1
  (('<s>', 'ABC'), 'ABC')          0.472973         1
  (('ABC', 'ABC'), 'ABC')          0.305732         1
  (('ABC', 'ABC'), 'ABC')          0.005655         1
  (('ABC', 'ABC'), '</s>')         0.434783         1
  (('ABC', '</s>'), '</s>')        0.008547         1
  (('<s>', '<s>'), 'DEF')          0.111111         2
  (('<s>', 'DEF'), 'DEF')          0.039474         2
  (('DEF', 'DEF'), 'DEF')          0.207317         2
  (('DEF', 'DEF'), 'DEF')          0.074803         2
  (('DEF', 'DEF'), '</s>')         0.037940         2
  (('DEF', '</s>'), '</s>')        0.033163         2
  (('<s>', '<s>'), 'GHI')          0.250000         3
  (('<s>', 'GHI'), 'GHI')          0.103316         3
  (('GHI', 'GHI'), 'GHI')          0.024155         3
  (('GHI', 'GHI'), '</s>')         0.028302         3
  (('GHI', '</s>'), '</s>')        0.117647         3

For the rows above, I should get 3 rows in a new table. My expected output looks like this:

(('<s>', '<s>'), 'ABC') 0.161681 (('<s>', 'ABC'), 'ABC') 0.472973 (('ABC', 'ABC'), 'ABC') 0.305732 (('ABC', 'ABC'), 'ABC') 0.005655 (('ABC', 'ABC'), '</s>') 0.434783 (('ABC', '</s>'), '</s>') 0.008547
(('<s>', '<s>'), 'DEF') 0.111111 (('<s>', 'DEF'), 'DEF') 0.039474 (('DEF', 'DEF'), 'DEF') 0.207317 (('DEF', 'DEF'), 'DEF') 0.074803 (('DEF', 'DEF'), '</s>') 0.037940 (('DEF', '</s>'), '</s>') 0.033163
(('<s>', '<s>'), 'GHI') 0.250000 (('<s>', 'GHI'), 'GHI') 0.103316 (('GHI', 'GHI'), 'GHI') 0.024155 (('GHI', 'GHI'), '</s>') 0.028302 (('GHI', '</s>'), '</s>') 0.117647


1 Reply

0 votes
by (71.8m points)

You can use groupby and join to build the expected output. One way is to create a column to_join from the columns Tri_gram_sents and Value, and then aggregate this column per sentence:

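# Build one "trigram value" string per row; astype(str) also covers trigrams stored as tuples
df['to_join'] = df['Tri_gram_sents'].astype(str) + ' ' + df['Value'].astype(str)
# Join the strings of all rows that share the same sentence number
ser_output = df.groupby('sentence')['to_join'].agg(' '.join)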
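
To try these two steps on their own, here is a minimal sketch. It assumes pandas is installed and that Tri_gram_sents holds strings; the small DataFrame is only an illustrative sample built from a few rows of the question's table.

```python
import pandas as pd

# Illustrative sample: a few rows of the question's table, trigrams stored as strings
df = pd.DataFrame({
    "Tri_gram_sents": [
        "(('<s>', '<s>'), 'ABC')",
        "(('<s>', 'ABC'), 'ABC')",
        "(('<s>', '<s>'), 'DEF')",
        "(('<s>', 'DEF'), 'DEF')",
    ],
    "Value": [0.161681, 0.472973, 0.111111, 0.039474],
    "sentence": [1, 1, 2, 2],
})

# Step 1: build one "trigram value" string per row
df["to_join"] = df["Tri_gram_sents"].astype(str) + " " + df["Value"].astype(str)

# Step 2: join the per-row strings of each sentence, keeping the original row order
ser_output = df.groupby("sentence")["to_join"].agg(" ".join)

print(ser_output.loc[1])
```

The result is a Series indexed by sentence, so ser_output.loc[1] is the joined string for sentence 1.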

Or you can do everything in one line with apply, without creating the extra column:

ser_output = (df.groupby('sentence').apply(
             lambda df_g: ' '.join(df_g['Tri_gram_sents']+' '+df_g['Value'].astype(str))))

and you get ser_output:

sentence
1    (('<s>', '<s>'), 'ABC') 0.161681 (('<s>', 'ABC...
2    (('<s>', '<s>'), 'DEF') 0.111111 (('<s>', 'DEF...
...

where the first element looks as expected:

"(('<s>', '<s>'), 'ABC') 0.161681 (('<s>', 'ABC'), 'ABC') 0.472973 (('ABC', 'ABC'), 'ABC') 0.305732 (('ABC', 'ABC'), 'ABC') 0.005655 (('ABC', 'ABC'), '</s>') 0.434783 (('ABC', '</s>'), '</s>') 0.008547"
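
One detail to watch: astype(str) drops trailing zeros, so a value shown as 0.037940 or 0.250000 in the table would come out as 0.03794 or 0.25. If the output has to match the six-decimal form in the question, a format string is one option. The sketch below is only an illustration under that assumption; the sample rows are taken from the question, and the column name joined is a hypothetical name chosen here.

```python
import pandas as pd

# Illustrative sample with values that end in zeros
df = pd.DataFrame({
    "Tri_gram_sents": [
        "(('DEF', 'DEF'), '</s>')",
        "(('<s>', '<s>'), 'GHI')",
        "(('<s>', 'GHI'), 'GHI')",
    ],
    "Value": [0.037940, 0.250000, 0.103316],
    "sentence": [2, 3, 3],
})

# A format string keeps six decimals, where astype(str) would give '0.03794' and '0.25'
value_str = df["Value"].map("{:.6f}".format)
to_join = df["Tri_gram_sents"].astype(str) + " " + value_str

# Group the combined strings by the sentence column and join them per group;
# reset_index turns the result into a table with one row per sentence
out = (
    to_join.groupby(df["sentence"])
    .agg(" ".join)
    .reset_index(name="joined")  # "joined" is a hypothetical column name
)
print(out)
```

Grouping the Series by df["sentence"] avoids adding a helper column to df, and reset_index gives the "another table" form with one row per sentence.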
