Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

dataframe - How do I remove rows from a data frame in R, where the rows to drop are determined by the values of another column within each group?

I have a data frame of values, with variable names that correspond to co-ordinates and event time per simulation.

head_data<-structure(list(x = c(987.353265152362, 570.817987386894, 1147.5681499552, 
637.526076016409, 1439.13510253106, 1396.6452808061), y = c(1802.08232812874, 
349.336242713164, 1789.49467712533, 361.611973188148, 1492.44148360367, 
1459.91771610835), id = 1:6, `simulation 1` = c(1100, 600, 1200, 
400, 900, 1000), `simulation 2` = c(1500, 1400, 1600, 1200, 1200, 
1300), `simulation 3` = c(1200, 1100, 1200, 1000, 900, 900), 
    `simulation 4` = c(1300, 800, 1200, 900, 1100, 1100), `simulation 5` = c(1500, 
    1200, 1400, 1100, 1300, 1200), `simulation 6` = c(200, 1400, 
    100, 1100, 600, 600)), row.names = c(NA, 6L), class = "data.frame")

I have rearranged this data using melt and arrange from the reshape2 and dplyr packages.

library(reshape2)
library(dplyr)
data_long <- melt(head_data, id.vars = c('x', 'y', 'id'), value.name = 'time', variable.name = 'sim')
data_long_sort <- data_long %>% arrange(sim, time)

There are 6 values of time per simulation. I want to eliminate the 3 highest values within each simulation, so that I have a table that looks like this:

data_trim<-structure(list(x = c(637.526076016409, 570.817987386894, 1439.13510253106, 
637.526076016409, 1439.13510253106, 1396.6452808061, 1439.13510253106, 
1396.6452808061, 637.526076016409, 570.817987386894, 637.526076016409, 
1439.13510253106, 637.526076016409, 570.817987386894, 1396.6452808061, 
1147.5681499552, 987.353265152362, 1439.13510253106), y = c(361.611973188148, 
349.336242713164, 1492.44148360367, 361.611973188148, 1492.44148360367, 
1459.91771610835, 1492.44148360367, 1459.91771610835, 361.611973188148, 
349.336242713164, 361.611973188148, 1492.44148360367, 361.611973188148, 
349.336242713164, 1459.91771610835, 1789.49467712533, 1802.08232812874, 
1492.44148360367), id = c(4L, 2L, 5L, 4L, 5L, 6L, 5L, 6L, 4L, 
2L, 4L, 5L, 4L, 2L, 6L, 3L, 1L, 5L), sim = structure(c(1L, 1L, 
1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L
), .Label = c("simulation 1", "simulation 2", "simulation 3", 
"simulation 4", "simulation 5", "simulation 6"), class = "factor"), 
    time = c(400, 600, 900, 1200, 1200, 1300, 900, 900, 1000, 
    800, 900, 1100, 1100, 1200, 1200, 100, 200, 600)), row.names = c(1L, 
2L, 3L, 7L, 8L, 9L, 13L, 14L, 15L, 19L, 20L, 21L, 25L, 26L, 27L, 
31L, 32L, 33L), class = "data.frame")

I did this by indexing the rows manually:

data_trim<-data_long_sort[c(1:3,7:9,13:15,19:21,25:27,31:33),]

But I need a more efficient way of doing so for a larger data frame.



1 Reply


Here is a concise answer using the tidyr and dplyr packages:

library(tidyr)
library(dplyr)

data_long_sort <- head_data %>% 
                  pivot_longer(cols=starts_with("sim"), names_to="sim", values_to="time") %>% 
                  arrange(sim,time)

answer <-data_long_sort %>% group_by(sim) %>% slice_head(n=3)

# a more general option when the number of rows per simulation varies:
# a negative n drops that many rows from the end of each (sorted) group
data_long_sort %>% group_by(sim) %>% slice_head(n = -3)
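
If pre-sorting is not wanted, the same selection can probably be done with slice_min(), which picks the lowest values per group directly. The sketch below is only an illustration under assumptions: the data frame toy and its values are hypothetical (not the asker's data), it keeps the 2 lowest times per simulation rather than 3, and it assumes dplyr 1.0.0 or later. A base R variant using ave() and rank() is included for comparison.

```r
library(dplyr)

# Hypothetical toy data: two simulations with four event times each
toy <- data.frame(
  id   = rep(1:4, times = 2),
  sim  = rep(c("simulation 1", "simulation 2"), each = 4),
  time = c(1100, 600, 1200, 400, 1500, 1400, 1600, 1200)
)

# Keep the 2 lowest times per simulation, no prior arrange() needed.
# with_ties = FALSE returns exactly 2 rows per group even if times are tied.
lowest <- toy %>%
  group_by(sim) %>%
  slice_min(time, n = 2, with_ties = FALSE) %>%
  ungroup()

# Base R variant: rank times within each simulation (ties broken by row order)
# and keep the rows whose rank is at most 2. Original row order is preserved.
rnk <- ave(toy$time, toy$sim, FUN = function(t) rank(t, ties.method = "first"))
lowest_base <- toy[rnk <= 2, ]
```

Setting with_ties = FALSE matters for data like the question's, where several rows in a simulation share the same time; with the default (TRUE), slice_min() may return more rows than requested.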
