Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


Selecting a unique value from an R data frame

If I have a table like this:

| FileName | Category| Value | Number |
|:--------:|:-------:|:-----:|:------:|
| File1    | Time    | 123   | 1      |
| File1    | Size    | 456   | 1      |
| File1    | Final   | 789   | 1      |
| File2    | Time    | 312   | 2      |
| File2    | Size    | 645   | 2      |
| File2    | Final   | 978   | 2      |
| File3    | Time    | 741   | 1      |
| File3    | Size    | 852   | 1      |
| File3    | Final   | 963   | 1      |
| File1    | Time    | 369   | 2      |
| File1    | Size    | 258   | 2      |
| File1    | Final   | 147   | 2      |
| File3    | Time    | 741   | 2      |
| File3    | Size    | 734   | 2      |
| File3    | Final   | 942   | 2      |
| File1    | Time    | 997   | 3      |
| File1    | Size    | 245   | 3      |
| File1    | Final   | 985   | 3      |
| File2    | Time    | 645   | 3      |
| File2    | Size    | 285   | 3      |
| File2    | Final   | 735   | 3      |
| File3    | Time    | 198   | 3      |
| File3    | Size    | 165   | 3      |
| File3    | Final   | 753   | 3      |

How can I, in an R script, create a variable that holds, for each FileName, the Value where Category is Time and Number is at its minimum?

(EDIT: Note that the Value column contains missing (NA) entries. The code should ignore those entries, treating them as though they didn't exist, so that New Column doesn't end up filled with NA values.)

Then I'd like to merge this to form a new column on the existing table so that it now looks like this:

| FileName | Category | Value | Number | New Column |
|:--------:|:--------:|:-----:|:------:|------------|
| File1    | Time     | 123   | 1      | 123        |
| File1    | Size     | 456   | 1      | 123        |
| File1    | Final    | 789   | 1      | 123        |
| File2    | Time     | 312   | 2      | 312        |
| File2    | Size     | 645   | 2      | 312        |
| File2    | Final    | 978   | 2      | 312        |
| File3    | Time     | 741   | 1      | 741        |
| File3    | Size     | 852   | 1      | 741        |
| File3    | Final    | 963   | 1      | 741        |
| File1    | Time     | 369   | 2      | 369        |
| File1    | Size     | 258   | 2      | 369        |
| File1    | Final    | 147   | 2      | 369        |
| File3    | Time     | 741   | 2      | 741        |
| File3    | Size     | 734   | 2      | 741        |
| File3    | Final    | 942   | 2      | 741        |
| File1    | Time     | 997   | 3      | 997        |
| File1    | Size     | 245   | 3      | 997        |
| File1    | Final    | 985   | 3      | 997        |
| File2    | Time     | 645   | 3      | 645        |
| File2    | Size     | 285   | 3      | 645        |
| File2    | Final    | 735   | 3      | 645        |
| File3    | Time     | 198   | 3      | 198        |
| File3    | Size     | 165   | 3      | 198        |
| File3    | Final    | 753   | 3      | 198        |

1 Reply

Using data.table (assuming the table is stored as a data.table named DT):

(Edited to reflect @Frank's comments)

    DT[, Benchmark := Value[Category == "Time"][which.min(Number[Category == "Time"])], by = FileName]
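To make the one-liner easy to try, here is a minimal sketch that rebuilds the table from the question and applies it. The way `DT` is constructed is my own assumption about how the data is stored; the `:=` line is the one from this answer.

```r
library(data.table)

# Sample data copied from the question's table (8 blocks of 3 rows each)
DT <- data.table(
  FileName = rep(c("File1", "File2", "File3", "File1",
                   "File3", "File1", "File2", "File3"), each = 3),
  Category = rep(c("Time", "Size", "Final"), times = 8),
  Value    = c(123, 456, 789, 312, 645, 978, 741, 852, 963, 369, 258, 147,
               741, 734, 942, 997, 245, 985, 645, 285, 735, 198, 165, 753),
  Number   = rep(c(1, 2, 1, 2, 2, 3, 3, 3), each = 3)
)

# For each FileName: the Time value at the smallest Number, recycled to all rows
DT[, Benchmark := Value[Category == "Time"][which.min(Number[Category == "Time"])],
   by = FileName]

# One row per file to inspect the result
res <- unique(DT[, .(FileName, Benchmark)])
print(res)
# File1 -> 123, File2 -> 312, File3 -> 741
```

Grouped this way, every row of a file receives the same value. If the Time value of each FileName/Number pair is wanted instead, grouping by both columns with `by = .(FileName, Number)` would be the part to change.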

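Breaking this down (^^^ stands for the expression from the previous step):

    Number[Category == "Time"]

  • Take every Number in the group where Category is "Time"

    which.min(^^^)

  • Find the position of the smallest of those numbers (the first one, if there are ties)

    Benchmark := Value[Category == "Time"][^^^]

  • Set the new column Benchmark to the "Time" Value at that position; the single value is recycled to every row of the group

    by = FileName

  • Do all of this separately for each FileName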
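The question's edit asks that missing entries in Value be ignored. As written, the one-liner would return NA for a file whose lowest-Number Time row has a missing Value. One possible adjustment, sketched here on a small hypothetical table (the data and the helper name `keep` are made up for illustration), is to filter out the NA rows before looking for the minimum:

```r
library(data.table)

# Hypothetical data: File1's Time entry at Number 1 has a missing Value
DT <- data.table(
  FileName = rep(c("File1", "File2"), each = 4),
  Category = rep(c("Time", "Size"), times = 4),
  Value    = c(NA, 456, 369, 258, 312, 645, 645, 285),
  Number   = c(1, 1, 2, 2, 2, 2, 3, 3)
)

DT[, Benchmark := {
  # Consider only Time rows whose Value is present
  keep <- Category == "Time" & !is.na(Value)
  # Value at the smallest remaining Number; the trailing [1L] yields NA
  # (instead of a zero-length result) when a file has no usable Time row
  Value[keep][which.min(Number[keep])][1L]
}, by = FileName]

print(unique(DT[, .(FileName, Benchmark)]))
# File1 -> 369 (the NA at Number 1 is skipped), File2 -> 312
```

The only change from the original expression is the extra `!is.na(Value)` condition, applied to both the Value and the Number subsets so that their positions stay aligned.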
