Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

r - How to add a variable in a data frame using another variable for indices?

Data

I have a data frame "veh" with a variable "Tim":

> dput(veh$Tim)
c(169.7, 169.8, 169.9, 170, 170.1, 170.2, 170.3, 170.4, 170.5, 
170.6, 170.7, 170.8, 170.9, 171, 171.1, 171.2, 171.3, 171.4, 
171.5, 171.6, 171.7, 171.8, 171.9, 172, 172.1, 172.2, 172.3, 
172.4, 172.5, 172.6, 172.7, 172.8, 172.9, 173, 173.1, 173.2, 
173.3, 173.4, 173.5, 173.6, 173.7, 173.8, 173.9, 174, 174.1, 
174.2, 174.3, 174.4, 174.5, 174.6, 174.7, 174.8, 174.9, 175, 
175.1, 175.2, 175.3, 175.4, 175.5, 175.6, 175.7, 175.8, 175.9, 
176, 176.1, 176.2, 176.3, 176.4, 176.5, 176.6, 176.7, 176.8, 
176.9, 177, 177.1, 177.2, 177.3, 177.4, 177.5, 177.6, 177.7, 
177.8, 177.9, 178, 178.1, 178.2, 178.3, 178.4, 178.5, 178.6, 
178.7, 178.8, 178.9, 179, 179.1, 179.2, 179.3, 179.4, 179.5, 
179.6, 179.7, 179.8, 179.9, 180, 180.1, 180.2, 180.3, 180.4, 
180.5, 180.6, 180.7, 180.8, 180.9, 181, 181.1, 181.2, 181.3, 
181.4, 181.5, 181.6, 181.7, 181.8, 181.9, 182, 182.1, 182.2, 
182.3, 182.4, 182.5, 182.6, 182.7, 182.8, 182.9, 183, 183.1, 
183.2, 183.3, 183.4, 183.5, 183.6, 183.7, 183.8, 183.9, 184, 
184.1, 184.2, 184.3, 184.4, 184.5, 184.6, 184.7, 184.8, 184.9, 
185, 185.1, 185.2)

Also, I have a vector "slopezz":

> slopezz
 [1] -2.1920  0.7034  0.6113 -1.2540  0.7513  2.3250  0.0791 -0.9713  1.1010  1.9490
[11] -1.4290  2.2500  0.8775

and another one-column data frame, "x":

> x
            psi
psi1.Tim  171.4
psi2.Tim  171.8
psi3.Tim  175.1
psi4.Tim  175.7
psi5.Tim  176.3
psi6.Tim  177.8
psi7.Tim  178.7
psi8.Tim  180.1
psi9.Tim  181.5
psi10.Tim 182.4
psi11.Tim 183.8
psi12.Tim 184.8

Goal

There are 13 values in "slopezz" and 12 in x$psi. In the data frame "veh", I want to add a new column "slope" that holds the values from "slopezz", with the values in x$psi acting as break points on veh$Tim that decide which slope value applies to each row.

Example:

The first value in "slopezz" is -2.1920 and the first value in x$psi is 171.4. The values in x$psi are values of veh$Tim. So, between 169.7 (the first value in veh$Tim) and 171.4, the "slope" variable should contain the first value, -2.1920. Then, between 171.4 and 171.8, it should contain the second value, 0.7034. And so on.

What I have tried

I can successfully create the new column by using ifelse and putting in the values of x$psi and "slopezz" manually.

## Example:

library(dplyr)
veh <- veh %>% 
  mutate(slope = ifelse(Tim<=171.4,slopezz[1], 
                           ifelse(Tim>171.4 & Tim<=171.8, slopezz[2], ....

The full code is long, so I have not included all of it here.

But is there a better method where I don't have to manually put the Tim values taken from x$psi?


1 Reply


You had the right idea using dput() for veh$Tim; it would have helped had you used it for slopezz and x as well.
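
For anyone who wants to run the code, the three objects can be rebuilt roughly as follows. This is only a sketch reconstructed from the printed output in the question: it assumes "veh" has no columns other than Tim, and it builds Tim as integers divided by 10 so the values match the printed decimals.

```r
# Rebuild the question's objects from its printed output.
# Assumption: veh has only the Tim column; the original may have had more.
veh <- data.frame(Tim = (1697:1852) / 10)   # 169.7, 169.8, ..., 185.2

slopezz <- c(-2.1920, 0.7034, 0.6113, -1.2540, 0.7513, 2.3250, 0.0791,
             -0.9713, 1.1010, 1.9490, -1.4290, 2.2500, 0.8775)

# Row names copied from the printed data frame in the question.
x <- data.frame(
  psi = c(171.4, 171.8, 175.1, 175.7, 176.3, 177.8,
          178.7, 180.1, 181.5, 182.4, 183.8, 184.8),
  row.names = paste0("psi", 1:12, ".Tim")
)
```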

Here's a two-line solution (where ix is a temporary index variable):

ix <- sapply(veh$Tim, function(z) which.max(z <= c(x$psi, Inf)))
veh$slope <- slopezz[ix]
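
To see why this works, it may help to apply the inner function to a few single values. The sketch below uses the break points from the question; the names psi, breaks and first_break are made up for illustration.

```r
# Break points from the question, plus Inf as a catch-all for the last interval.
psi <- c(171.4, 171.8, 175.1, 175.7, 176.3, 177.8,
         178.7, 180.1, 181.5, 182.4, 183.8, 184.8)
breaks <- c(psi, Inf)

# which.max() on a logical vector returns the position of the first TRUE,
# i.e. the first break point that is >= z.
first_break <- function(z) which.max(z <= breaks)

first_break(169.7)  # 1: before the first break point
first_break(171.4)  # 1: exactly on the first break point
first_break(171.5)  # 2: between 171.4 and 171.8
first_break(185.2)  # 13: past the last break point, so only Inf matches
```

Appending Inf is what guarantees that at least one comparison is TRUE, so the 13th slope value is used for everything after 184.8.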

You were a bit ambiguous about what value of slopezz to use when, for example, veh$Tim equals 171.4. The code above uses intervals closed on the right.
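
If the sapply() loop ever becomes slow on a long column, base R's findInterval() should give the same index in one vectorized call. This is a minimal sketch, not the method used above: the Tim values are a small sample picked for illustration, and the left.open argument is not available in very old versions of R.

```r
slopezz <- c(-2.1920, 0.7034, 0.6113, -1.2540, 0.7513, 2.3250, 0.0791,
             -0.9713, 1.1010, 1.9490, -1.4290, 2.2500, 0.8775)
psi <- c(171.4, 171.8, 175.1, 175.7, 176.3, 177.8,
         178.7, 180.1, 181.5, 182.4, 183.8, 184.8)

# Illustrative sample of Tim values, including some that sit on a break point.
Tim <- c(169.7, 171.4, 171.5, 171.8, 175.1, 184.8, 185.2)

# left.open = TRUE makes the intervals (psi[i], psi[i + 1]], closed on the right.
# findInterval() returns 0 for values <= psi[1], hence the + 1.
ix <- findInterval(Tim, psi, left.open = TRUE) + 1
slope <- slopezz[ix]

# Compare against the sapply()/which.max() version.
ix_loop <- sapply(Tim, function(z) which.max(z <= c(psi, Inf)))
all(ix == ix_loop)  # TRUE
```

Dropping left.open = TRUE switches to intervals closed on the left, so a Tim of exactly 171.4 would then pick up slopezz[2] instead of slopezz[1].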

