Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
356 views
in Technique[技术] by (71.8m points)

data.table - Best possible way to add the likely event and its probability from a cross table in R

Using the mtcars dataset, I have created a cross table as follows -

tab = with(mtcars, ftable(gear, cyl))
tab

Here is how it looks -

     cyl  4  6  8
gear             
3         1  2 12
4         8  4  0
5         2  1  2

For this crosstable, I have calculated the row-wise probability

tab_prob = tab %>% prop.table(1) %>% round(4) * 100
tab_prob
     cyl     4     6     8
gear                      
3         6.67 13.33 80.00
4        66.67 33.33  0.00
5        40.00 20.00 40.00

I want to add two columns to the original mtcars dataset

  1. Column 1 cyl_exp - Fill in the expected outcome based on cross-table. For example, in mtcars dataset, if the number of gears is 3, this new column (refer to the tab cross table) should have the value 8, since there is 80% probability that if the number of gears is 3, then cyl should be 8.
  2. Column 2 cyl_prob - Write the probability from table tab_prob in this column based on the value in cyl_exp column.

Here is the expected outcome -

head(mtcars)
    mpg cyl disp  hp drat    wt  qsec vs am gear carb cyl_prob cyl_exp
1: 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4    66.67       4
2: 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4    66.67       4
3: 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1    66.67       4
4: 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1    80.00       8
5: 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2    80.00       8
6: 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1    80.00       8

Is there an easy way to accomplish this?

Thanks!

question from:https://stackoverflow.com/questions/65897542/best-possible-way-to-add-the-likely-event-and-its-probability-from-a-cross-table

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

With data.table, I would do it this way:

mtcars <- as.data.table(mtcars, keep.rownames = T)

tab <- mtcars[, .N, by = .(gear, cyl)]
tab[, prob := N/sum(N), by = .(gear)]
tab <- tab[order(-prob, cyl)][!duplicated(gear)]
mtcars[tab, `:=`(cyl_exp = i.cyl, cyl_prob = i.prob), on = .(gear)]

# > head(mtcars)
#                   rn  mpg cyl disp  hp drat    wt  qsec vs am gear carb cyl_exp  cyl_prob
# 1:         Mazda RX4 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4       4 0.6666667
# 2:     Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4       4 0.6666667
# 3:        Datsun 710 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1       4 0.6666667
# 4:    Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1       8 0.8000000
# 5: Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2       8 0.8000000
# 6:           Valiant 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1       8 0.8000000

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...