Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


statistics - How do I select specific coefficients in R when I am trying to find the best-fitting model

I am looking for an opinion. I am new to R, and for work I am trying to create a tariff pricing structure using the following variables: exposure, zone, vehicle age and driver's age (both categorical; I was able to create some groups based on age), fuel, and brand of the car (also categorical).

Looking at the data, I noticed some overdispersion, so I went ahead and fitted a Negative Binomial model. I also managed to improve the model a bit using likelihood ratio (chi-squared) tests via the anova function.

However, I did notice something odd. Looking at the brand coefficients (the levels go from 2 to 14), some of the levels are significant at the 5% level while others are not. I did perform a likelihood ratio test, and it tells me that the brand variable as a whole is significant.

How can I tell R that I only want to estimate the model with brands 5, 10 and 12, since the others are not significant, meaning that policyholders with those brands should pay the same as a standard policyholder?

                   Estimate Std. Error z value Pr(>|z|)
(Intercept)        -1.92426    0.10172 -18.916  < 2e-16 ***
zone2C              0.16620    0.05799   2.866  0.00416 **
zone2D              0.42580    0.05946   7.161 8.04e-13 ***
zone2E              0.57356    0.06088   9.421  < 2e-16 ***
zone2F              0.58382    0.13233   4.412 1.03e-05 ***
vehcut2[4,16)       0.09004    0.05096   1.767  0.07724 .
vehcut2[16,101)    -0.19546    0.09267  -2.109  0.03494 *
agecut1[26,31)     -0.51136    0.10015  -5.106 3.29e-07 ***
agecut1[31,41)     -0.59369    0.08502  -6.983 2.89e-12 ***
agecut1[41,51)     -0.58597    0.08455  -6.930 4.21e-12 ***
agecut1[51,61)     -0.67614    0.08734  -7.741 9.85e-15 ***
agecut1[61,71)     -0.70625    0.09992  -7.068 1.57e-12 ***
agecut1[71,81)     -0.76348    0.11806  -6.467 1.00e-10 ***
agecut1[81,101)    -0.96703    0.23006  -4.203 2.63e-05 ***
as.factor(brand)2   0.02324    0.05663   0.410  0.68154
as.factor(brand)3   0.11332    0.07796   1.454  0.14606
as.factor(brand)4  -0.09019    0.11436  -0.789  0.43032
as.factor(brand)5   0.16641    0.08982   1.853  0.06392 .
as.factor(brand)6  -0.14618    0.11194  -1.306  0.19158
as.factor(brand)10  0.24718    0.11889   2.079  0.03761 *
as.factor(brand)11  0.22740    0.13854   1.641  0.10072
as.factor(brand)12 -0.15984    0.07034  -2.272  0.02306 *
as.factor(brand)13  0.21873    0.13721   1.594  0.11092
as.factor(brand)14 -0.25814    0.27270  -0.947  0.34384
fuelE              -0.16247    0.04202  -3.867  0.00011 ***

Thank you!



1 Reply

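You could recode your brand variable so that every brand other than 5, 10 and 12 is pooled into a single reference level:

library(dplyr)

# Keep brands 5, 10 and 12 as their own levels; pool the rest into "Other"
data <- data %>%
  mutate(
    brand = case_when(
      brand == 5  ~ "5",
      brand == 10 ~ "10",
      brand == 12 ~ "12",
      TRUE        ~ "Other"
    ),
    # "Other" is listed first, so it becomes the reference level in the model
    brand = factor(brand, levels = c("Other", "5", "10", "12"))
  )

and then re-run the model.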
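
To illustrate what "re-run the model" could look like, here is a minimal sketch on simulated data. Everything in it is an assumption made for illustration: the data frame dat, the columns claims, exposure and brand_grp, the simulated effect sizes, and the use of log(exposure) as an offset are not taken from the original post. It uses a base-R recode equivalent to the case_when() call, fits the full and the pooled negative binomial models with MASS::glm.nb, and compares them with a likelihood ratio test, which is one way to check whether pooling the remaining brands costs a significant amount of fit.

```r
library(MASS)   # glm.nb() for negative binomial regression

set.seed(1)
n <- 5000

# Simulated stand-in for the real portfolio (all names are hypothetical)
dat <- data.frame(
  brand    = sample(c(1:6, 10:14), n, replace = TRUE),
  fuel     = sample(c("D", "E"), n, replace = TRUE),
  exposure = runif(n, 0.2, 1)
)

# In this simulation only brands 5, 10 and 12 differ from the baseline
beta_brand <- c("5" = 0.3, "10" = 0.4, "12" = -0.3)
eff <- beta_brand[as.character(dat$brand)]  # NA for brands without an effect
eff[is.na(eff)] <- 0
mu <- dat$exposure * exp(-1.5 + eff - 0.15 * (dat$fuel == "E"))
dat$claims <- rnbinom(n, size = 1.5, mu = mu)

# Base-R equivalent of the recode: keep 5, 10, 12 and pool the rest
keep <- c(5, 10, 12)
dat$brand_grp <- factor(
  ifelse(dat$brand %in% keep, as.character(dat$brand), "Other"),
  levels = c("Other", "5", "10", "12")      # "Other" is the reference level
)

# Full model (one coefficient per brand) vs. pooled model (three coefficients)
full    <- glm.nb(claims ~ factor(brand) + fuel + offset(log(exposure)), data = dat)
reduced <- glm.nb(claims ~ brand_grp + fuel + offset(log(exposure)), data = dat)

# Likelihood ratio test between the nested models, plus AIC for reference
print(anova(reduced, full))
print(AIC(reduced, full))
```

If the test does not reject the pooled model, treating the remaining brands as one group is at least consistent with the data; summary(reduced) then shows the three brand effects relative to "Other".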

