Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


csv - How to find mean for subset using R?

Using the pre-installed dataset in R, mtcars, I'm trying to find the mean of the "mpg" variable for only Mercedes cars. I am new to R and learning on my own. I've figured out the average for mpg of all cars using the following:

read.csv("mtcars.csv")
mean(mtcars$mpg)

I thought of using something like a GROUP BY to group only the Mercedes cars, but I can't seem to figure it out. I'm sure it's really simple, so I'm a little frustrated that I'm not seeing what to do next.

This is what the file looks like: https://gist.github.com/seankross/a412dfbd88b3db70b74b


Whoever fights dragons for too long becomes a dragon himself; gaze into the abyss for too long, and the abyss will gaze back…

1 Reply


mtcars is a built-in data frame in base R, so there is no need to read it from a CSV file. You can type mtcars in the console to view it.

Here are the first 10 rows of the mtcars data frame:

head(mtcars, 10)
#                    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
# Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
# Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
# Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
# Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
# Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
# Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
# Duster 360        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
# Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
# Merc 230          22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
# Merc 280          19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4

The information you need, the model, is stored in the row names. To access that information, we can use the rownames function.

rownames(mtcars)
#  [1] "Mazda RX4"           "Mazda RX4 Wag"       "Datsun 710"
#  [4] "Hornet 4 Drive"      "Hornet Sportabout"   "Valiant"
#  [7] "Duster 360"          "Merc 240D"           "Merc 230"
# [10] "Merc 280"            "Merc 280C"           "Merc 450SE"         
# [13] "Merc 450SL"          "Merc 450SLC"         "Cadillac Fleetwood" 
# [16] "Lincoln Continental" "Chrysler Imperial"   "Fiat 128"           
# [19] "Honda Civic"         "Toyota Corolla"      "Toyota Corona"      
# [22] "Dodge Challenger"    "AMC Javelin"         "Camaro Z28"         
# [25] "Pontiac Firebird"    "Fiat X1-9"           "Porsche 914-2"      
# [28] "Lotus Europa"        "Ford Pantera L"      "Ferrari Dino"       
# [31] "Maserati Bora"       "Volvo 142E"
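If the data is read from a CSV file instead, as in the question, the model names will probably sit in an ordinary column rather than in the row names. The sketch below imitates that layout from the built-in data; the column name model and the object name cars are assumptions made for illustration, so check names() on your own data.

```r
# Imitate a data frame read from a CSV: model names in a regular column.
# "model" and "cars" are assumed names, not taken from the answer.
cars <- data.frame(model = rownames(mtcars), mtcars,
                   row.names = NULL, stringsAsFactors = FALSE)

# The names are now reached through the column instead of rownames()
head(cars$model, 3)

# The row names are just the default 1..32
head(rownames(cars), 3)
```

With that layout, the matching step that follows works the same way on cars$model as it does on rownames(mtcars).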

Next, we need to find which row names match "Merc". We can use grepl for this: it returns a logical vector with one element per row name, TRUE where the pattern matches and FALSE otherwise. The pattern "^Merc" matches strings that begin with "Merc".

grepl("^Merc", rownames(mtcars))
#  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
# [14]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [27] FALSE FALSE FALSE FALSE FALSE FALSE
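Before subsetting, it can help to check what the pattern actually matched. A minimal sketch, where is_merc is just an illustrative variable name:

```r
# Logical vector: TRUE for row names starting with "Merc"
is_merc <- grepl("^Merc", rownames(mtcars))

# Number of matching rows (each TRUE counts as 1)
sum(is_merc)
# [1] 7

# The row names that matched
rownames(mtcars)[is_merc]

# startsWith() performs the same prefix test without a regular expression
identical(is_merc, startsWith(rownames(mtcars), "Merc"))
```

startsWith is available in base R from version 3.3.0 and avoids regular expressions when only a fixed prefix is needed.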

Finally, we can use the logical vector to subset the rows of the mtcars data frame and then calculate the mean of mpg for that subset.

mtcars_merc <- mtcars[grepl("^Merc", rownames(mtcars)), ]
mean(mtcars_merc$mpg)
# [1] 19.01429
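Since the question mentions GROUP BY, here is a rough sketch of a grouped version using tapply. It assumes the first word of each row name can stand in for the make, which is only approximate (for example, "Hornet 4 Drive" becomes "Hornet"); make and mpg_by_make are illustrative names.

```r
# Assumed grouping key: everything before the first space in the row name
make <- sub(" .*", "", rownames(mtcars))

# Mean mpg for every group at once, similar in spirit to SQL's GROUP BY
mpg_by_make <- tapply(mtcars$mpg, make, mean)

# Pick out the Mercedes group
mpg_by_make[["Merc"]]
# [1] 19.01429
```

This gives the same value as the subset approach above, and mpg_by_make also holds the means for all the other groups.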
