08/09, 2020

Models

What is a model?

  • A model is a simplified version of reality that lets us make inferences or predictions about a population
  • A model is an adequate summary of reality
  • A model is a simplification of, or approximation to, reality and therefore will not reflect all of reality (Burnham and Anderson)
  • All models are wrong, but some are useful (George Box)

Let's look at an example

  • How much \(CO_2\) do plants take up?
Plant Type Treatment conc uptake
Qn1 Quebec nonchilled 95 16.0
Qn1 Quebec nonchilled 175 30.4
Qn1 Quebec nonchilled 250 34.8
Qn1 Quebec nonchilled 350 37.2
Qn1 Quebec nonchilled 500 35.3
Qn1 Quebec nonchilled 675 39.2
Qn1 Quebec nonchilled 1000 39.7
Qn2 Quebec nonchilled 95 13.6
Qn2 Quebec nonchilled 175 27.3
Qn2 Quebec nonchilled 250 37.1
Qn2 Quebec nonchilled 350 41.8
Qn2 Quebec nonchilled 500 40.6
Qn2 Quebec nonchilled 675 41.4
Qn2 Quebec nonchilled 1000 44.3
Qn3 Quebec nonchilled 95 16.2
Qn3 Quebec nonchilled 175 32.4
Qn3 Quebec nonchilled 250 40.3
Qn3 Quebec nonchilled 350 42.1
Qn3 Quebec nonchilled 500 42.9
Qn3 Quebec nonchilled 675 43.9
Qn3 Quebec nonchilled 1000 45.5
Qc1 Quebec chilled 95 14.2
Qc1 Quebec chilled 175 24.1
Qc1 Quebec chilled 250 30.3
Qc1 Quebec chilled 350 34.6
Qc1 Quebec chilled 500 32.5
Qc1 Quebec chilled 675 35.4
Qc1 Quebec chilled 1000 38.7
Qc2 Quebec chilled 95 9.3
Qc2 Quebec chilled 175 27.3
Qc2 Quebec chilled 250 35.0
Qc2 Quebec chilled 350 38.8
Qc2 Quebec chilled 500 38.6
Qc2 Quebec chilled 675 37.5
Qc2 Quebec chilled 1000 42.4
Qc3 Quebec chilled 95 15.1
Qc3 Quebec chilled 175 21.0
Qc3 Quebec chilled 250 38.1
Qc3 Quebec chilled 350 34.0
Qc3 Quebec chilled 500 38.9
Qc3 Quebec chilled 675 39.6
Qc3 Quebec chilled 1000 41.4
Mn1 Mississippi nonchilled 95 10.6
Mn1 Mississippi nonchilled 175 19.2
Mn1 Mississippi nonchilled 250 26.2
Mn1 Mississippi nonchilled 350 30.0
Mn1 Mississippi nonchilled 500 30.9
Mn1 Mississippi nonchilled 675 32.4
Mn1 Mississippi nonchilled 1000 35.5
Mn2 Mississippi nonchilled 95 12.0
Mn2 Mississippi nonchilled 175 22.0
Mn2 Mississippi nonchilled 250 30.6
Mn2 Mississippi nonchilled 350 31.8
Mn2 Mississippi nonchilled 500 32.4
Mn2 Mississippi nonchilled 675 31.1
Mn2 Mississippi nonchilled 1000 31.5
Mn3 Mississippi nonchilled 95 11.3
Mn3 Mississippi nonchilled 175 19.4
Mn3 Mississippi nonchilled 250 25.8
Mn3 Mississippi nonchilled 350 27.9
Mn3 Mississippi nonchilled 500 28.5
Mn3 Mississippi nonchilled 675 28.1
Mn3 Mississippi nonchilled 1000 27.8
Mc1 Mississippi chilled 95 10.5
Mc1 Mississippi chilled 175 14.9
Mc1 Mississippi chilled 250 18.1
Mc1 Mississippi chilled 350 18.9
Mc1 Mississippi chilled 500 19.5
Mc1 Mississippi chilled 675 22.2
Mc1 Mississippi chilled 1000 21.9
Mc2 Mississippi chilled 95 7.7
Mc2 Mississippi chilled 175 11.4
Mc2 Mississippi chilled 250 12.3
Mc2 Mississippi chilled 350 13.0
Mc2 Mississippi chilled 500 12.5
Mc2 Mississippi chilled 675 13.7
Mc2 Mississippi chilled 1000 14.4
Mc3 Mississippi chilled 95 10.6
Mc3 Mississippi chilled 175 18.0
Mc3 Mississippi chilled 250 17.9
Mc3 Mississippi chilled 350 17.9
Mc3 Mississippi chilled 500 17.9
Mc3 Mississippi chilled 675 18.9
Mc3 Mississippi chilled 1000 19.9

Could it be the subspecies?

library(ggplot2)
# Notched boxplots of CO2 uptake for each plant origin (Type)
ggplot(CO2, aes(x = Type, y = uptake)) + geom_boxplot(aes(fill = Type), 
    notch = TRUE) + theme_classic()

Could it be the treatment?

ggplot(CO2, aes(x = Treatment, y = uptake)) + geom_boxplot(aes(fill = Treatment), 
    notch = TRUE) + theme_classic()

Could it be the concentration?

ggplot(CO2, aes(x = conc, y = uptake)) + geom_point() + 
    theme_classic()

How do we determine it?

Formula of a model

alguna_funcion(Y ~ X1 + X2 + ... + Xn, data = data.frame)
  • Y: Response variable (\(CO_2\) uptake)
  • ~: Explained by
  • \(X_n\): Explanatory variable n (subspecies, treatment, etc.)
  • data.frame: Data set (CO2)
  • alguna_funcion: The model to test (our simplification of reality)
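
To make the pieces of the formula concrete, here is a minimal sketch using the built-in CO2 data. The object names f and fit are illustrative, and lm() simply stands in for alguna_funcion; any modelling function that accepts a formula would follow the same pattern.

```r
# A formula is an ordinary R object: response on the left of ~, predictors on the right
f <- uptake ~ Type + Treatment

class(f)      # the object is of class "formula"
all.vars(f)   # variable names, response first, then the explanatory variables

# lm() plays the role of alguna_funcion here (an illustrative choice)
fit <- lm(f, data = CO2)
coef(fit)     # one intercept plus one coefficient per non-reference factor level
```

Storing the formula in an object makes it easy to reuse the same model specification with different functions or data sets.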

Some models in R

Model                         Function
t-test                        t.test()
ANOVA                         aov()
Simple linear model           lm()
Generalized linear model      glm()
Additive model                gam()
Nonlinear model               nls()
Linear mixed models           lmer()
Boosted regression trees      gbm()

Which one do we use to study the plants?

Fit1 <- lm(uptake ~ Type, data = CO2)
  • For this exercise we will use a simple linear model
  • With a single categorical predictor, this is equivalent to a one-way ANOVA
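
As a quick check of the equivalence, the sketch below fits the same formula with lm() and aov() and compares the F statistics. It uses only base R and the built-in CO2 data; the names fit_lm and fit_aov are illustrative.

```r
# Same formula, two fitting functions
fit_lm  <- lm(uptake ~ Type, data = CO2)
fit_aov <- aov(uptake ~ Type, data = CO2)

# F statistic for Type from the linear model's ANOVA table
f_lm <- anova(fit_lm)[["F value"]][1]

# F statistic for Type from the aov summary table
f_aov <- summary(fit_aov)[[1]][["F value"]][1]

# Both routes give the same test
all.equal(f_lm, f_aov)
```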

Using broom to get more out of your model (glance)

  • To see overall summary statistics for the model
library(broom)
glance(Fit1)
r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual nobs
0.35 0.34 8.79 43.52 0 1 -300.8 607.6 614.89 6341.44 82 84

Using broom to get more out of your model (tidy)

  • To see the model parameters
tidy(Fit1)
term estimate std.error statistic p.value
(Intercept) 33.54286 1.356945 24.719384 0
TypeMississippi -12.65952 1.919011 -6.596901 0
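
The estimates in this table can be read as group means. The sketch below, using only base R, suggests how: with R's default treatment contrasts the intercept is the mean uptake of the reference level (Quebec), and the TypeMississippi coefficient is the difference between the Mississippi mean and the Quebec mean. The names fit, b and group_means are illustrative.

```r
fit <- lm(uptake ~ Type, data = CO2)
b <- coef(fit)

# Mean uptake per plant origin, computed directly from the data
group_means <- tapply(CO2$uptake, CO2$Type, mean)

# Intercept: mean uptake of the reference level (Quebec)
b[["(Intercept)"]]

# Intercept + TypeMississippi: mean uptake of the Mississippi plants
b[["(Intercept)"]] + b[["TypeMississippi"]]

group_means
```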

Using broom to get more out of your model (augment)

  • To see the model's predictions and residuals
augment(Fit1)
uptake Type .fitted .resid .std.resid .hat .sigma .cooksd
16.0 Quebec 33.54286 -17.5428571 -2.0190449 0.0238095 8.625388 0.0497139
30.4 Quebec 33.54286 -3.1428571 -0.3617181 0.0238095 8.841068 0.0015956
34.8 Quebec 33.54286 1.2571429 0.1446873 0.0238095 8.847000 0.0002553
37.2 Quebec 33.54286 3.6571429 0.4209084 0.0238095 8.838566 0.0021605
35.3 Quebec 33.54286 1.7571429 0.2022333 0.0238095 8.845923 0.0004988
39.2 Quebec 33.54286 5.6571429 0.6510927 0.0238095 8.825228 0.0051698
39.7 Quebec 33.54286 6.1571429 0.7086387 0.0238095 8.820995 0.0061240
13.6 Quebec 33.54286 -19.9428571 -2.2952661 0.0238095 8.559179 0.0642469
27.3 Quebec 33.54286 -6.2428571 -0.7185038 0.0238095 8.820233 0.0062957
37.1 Quebec 33.54286 3.5571429 0.4093992 0.0238095 8.839082 0.0020440
41.8 Quebec 33.54286 8.2571429 0.9503322 0.0238095 8.799269 0.0110138
40.6 Quebec 33.54286 7.0571429 0.8122217 0.0238095 8.812465 0.0080452
41.4 Quebec 33.54286 7.8571429 0.9042954 0.0238095 8.803900 0.0099726
44.3 Quebec 33.54286 10.7571429 1.2380626 0.0238095 8.765042 0.0186927
16.2 Quebec 33.54286 -17.3428571 -1.9960265 0.0238095 8.630502 0.0485869
32.4 Quebec 33.54286 -1.1428571 -0.1315339 0.0238095 8.847196 0.0002110
40.3 Quebec 33.54286 6.7571429 0.7776940 0.0238095 8.815439 0.0073757
42.1 Quebec 33.54286 8.5571429 0.9848599 0.0238095 8.795643 0.0118286
42.9 Quebec 33.54286 9.3571429 1.0769336 0.0238095 8.785334 0.0141437
43.9 Quebec 33.54286 10.3571429 1.1920257 0.0238095 8.771133 0.0173284
45.5 Quebec 33.54286 11.9571429 1.3761731 0.0238095 8.745356 0.0230958
14.2 Quebec 33.54286 -19.3428571 -2.2262108 0.0238095 8.576576 0.0604392
24.1 Quebec 33.54286 -9.4428571 -1.0867986 0.0238095 8.784174 0.0144040
30.3 Quebec 33.54286 -3.2428571 -0.3732274 0.0238095 8.840611 0.0016988
34.6 Quebec 33.54286 1.0571429 0.1216688 0.0238095 8.847331 0.0001805
32.5 Quebec 33.54286 -1.0428571 -0.1200247 0.0238095 8.847352 0.0001757
35.4 Quebec 33.54286 1.8571429 0.2137425 0.0238095 8.845664 0.0005571
38.7 Quebec 33.54286 5.1571429 0.5935466 0.0238095 8.829102 0.0042963
9.3 Quebec 33.54286 -24.2428571 -2.7901623 0.0238095 8.417641 0.0949391
27.3 Quebec 33.54286 -6.2428571 -0.7185038 0.0238095 8.820233 0.0062957
35.0 Quebec 33.54286 1.4571429 0.1677057 0.0238095 8.846612 0.0003430
38.8 Quebec 33.54286 5.2571429 0.6050558 0.0238095 8.828356 0.0044645
38.6 Quebec 33.54286 5.0571429 0.5820374 0.0238095 8.829833 0.0041313
37.5 Quebec 33.54286 3.9571429 0.4554360 0.0238095 8.836932 0.0025295
42.4 Quebec 33.54286 8.8571429 1.0193875 0.0238095 8.791887 0.0126726
15.1 Quebec 33.54286 -18.4428571 -2.1226279 0.0238095 8.601612 0.0549457
21.0 Quebec 33.54286 -12.5428571 -1.4435842 0.0238095 8.734974 0.0254138
38.1 Quebec 33.54286 4.5571429 0.5244913 0.0238095 8.833275 0.0033548
34.0 Quebec 33.54286 0.4571429 0.0526135 0.0238095 8.847980 0.0000338
38.9 Quebec 33.54286 5.3571429 0.6165650 0.0238095 8.827596 0.0046360
39.6 Quebec 33.54286 6.0571429 0.6971295 0.0238095 8.821870 0.0059267
41.4 Quebec 33.54286 7.8571429 0.9042954 0.0238095 8.803900 0.0099726
10.6 Mississippi 20.88333 -10.2833333 -1.1835308 0.0238095 8.772231 0.0170823
19.2 Mississippi 20.88333 -1.6833333 -0.1937384 0.0238095 8.846104 0.0004577
26.2 Mississippi 20.88333 5.3166667 0.6119065 0.0238095 8.827905 0.0045662
30.0 Mississippi 20.88333 9.1166667 1.0492567 0.0238095 8.788531 0.0134261
30.9 Mississippi 20.88333 10.0166667 1.1528396 0.0238095 8.776132 0.0162078
32.4 Mississippi 20.88333 11.5166667 1.3254778 0.0238095 8.752828 0.0214255
35.5 Mississippi 20.88333 14.6166667 1.6822634 0.0238095 8.694104 0.0345123
12.0 Mississippi 20.88333 -8.8833333 -1.0224018 0.0238095 8.791552 0.0127476
22.0 Mississippi 20.88333 1.1166667 0.1285196 0.0238095 8.847238 0.0002014
30.6 Mississippi 20.88333 9.7166667 1.1183119 0.0238095 8.780397 0.0152515
31.8 Mississippi 20.88333 10.9166667 1.2564225 0.0238095 8.762547 0.0192512
32.4 Mississippi 20.88333 11.5166667 1.3254778 0.0238095 8.752828 0.0214255
31.1 Mississippi 20.88333 10.2166667 1.1758580 0.0238095 8.773216 0.0168615
31.5 Mississippi 20.88333 10.6166667 1.2218949 0.0238095 8.767208 0.0182076
11.3 Mississippi 20.88333 -9.5833333 -1.1029663 0.0238095 8.782250 0.0148358
19.4 Mississippi 20.88333 -1.4833333 -0.1707200 0.0238095 8.846557 0.0003554
25.8 Mississippi 20.88333 4.9166667 0.5658697 0.0238095 8.830837 0.0039050
27.9 Mississippi 20.88333 7.0166667 0.8075632 0.0238095 8.812874 0.0079531
28.5 Mississippi 20.88333 7.6166667 0.8766185 0.0238095 8.806572 0.0093715
28.1 Mississippi 20.88333 7.2166667 0.8305816 0.0238095 8.810831 0.0084130
27.8 Mississippi 20.88333 6.9166667 0.7960540 0.0238095 8.813874 0.0077281
10.5 Mississippi 20.88333 -10.3833333 -1.1950400 0.0238095 8.770741 0.0174161
14.9 Mississippi 20.88333 -5.9833333 -0.6886346 0.0238095 8.822508 0.0057831
18.1 Mississippi 20.88333 -2.7833333 -0.3203398 0.0238095 8.842591 0.0012514
18.9 Mississippi 20.88333 -1.9833333 -0.2282661 0.0238095 8.845318 0.0006354
19.5 Mississippi 20.88333 -1.3833333 -0.1592108 0.0238095 8.846762 0.0003091
22.2 Mississippi 20.88333 1.3166667 0.1515380 0.0238095 8.846891 0.0002800
21.9 Mississippi 20.88333 1.0166667 0.1170103 0.0238095 8.847391 0.0001670
7.7 Mississippi 20.88333 -13.1833333 -1.5172980 0.0238095 8.723037 0.0280755
11.4 Mississippi 20.88333 -9.4833333 -1.0914571 0.0238095 8.783623 0.0145278
12.3 Mississippi 20.88333 -8.5833333 -0.9878742 0.0238095 8.795321 0.0119012
13.0 Mississippi 20.88333 -7.8833333 -0.9073097 0.0238095 8.803604 0.0100392
12.5 Mississippi 20.88333 -8.3833333 -0.9648558 0.0238095 8.797760 0.0113530
13.7 Mississippi 20.88333 -7.1833333 -0.8267452 0.0238095 8.811176 0.0083355
14.4 Mississippi 20.88333 -6.4833333 -0.7461807 0.0238095 8.818039 0.0067901
10.6 Mississippi 20.88333 -10.2833333 -1.1835308 0.0238095 8.772231 0.0170823
18.0 Mississippi 20.88333 -2.8833333 -0.3318490 0.0238095 8.842186 0.0013430
17.9 Mississippi 20.88333 -2.9833333 -0.3433582 0.0238095 8.841767 0.0014377
17.9 Mississippi 20.88333 -2.9833333 -0.3433582 0.0238095 8.841767 0.0014377
17.9 Mississippi 20.88333 -2.9833333 -0.3433582 0.0238095 8.841767 0.0014377
18.9 Mississippi 20.88333 -1.9833333 -0.2282661 0.0238095 8.845318 0.0006354
19.9 Mississippi 20.88333 -0.9833333 -0.1131739 0.0238095 8.847439 0.0001562
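
The columns of this table can also be reproduced with base R, which may help clarify what each one means. This is a sketch under the assumption that the same Fit1 model is used; it is refit here as fit so the block is self-contained.

```r
fit <- lm(uptake ~ Type, data = CO2)

# .fitted: the prediction for each row (the group mean in this model)
pred <- fitted(fit)

# .resid: observed value minus fitted value
res_manual <- CO2$uptake - pred
all.equal(unname(res_manual), unname(residuals(fit)))

# .hat, .std.resid and .cooksd have base R counterparts
head(hatvalues(fit))        # leverage; 1/42 for every row, since each group has 42 plants
head(rstandard(fit))        # standardized residuals
head(cooks.distance(fit))   # Cook's distance
```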

Model selection

  • Based on information criteria
  • We will work with AIC
  • \(K\): number of estimated parameters
  • \(\ln{(\hat{L})}\): log-likelihood, a measure of fit; more positive is better, more negative is worse
  • The model with the lowest AIC is preferred

\[AIC = 2 K - 2 \ln{(\hat{L})}\]
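
The formula can be checked by hand against R's AIC() function. The sketch below uses the one-predictor model from earlier (refit as fit); note that for lm() the parameter count \(K\) reported by logLik() includes the residual variance in addition to the regression coefficients.

```r
fit <- lm(uptake ~ Type, data = CO2)

ll <- logLik(fit)        # log-likelihood of the fitted model
K  <- attr(ll, "df")     # intercept, Type effect and residual variance

# AIC = 2K - 2 ln(L)
aic_manual <- 2 * K - 2 * as.numeric(ll)

aic_manual
AIC(fit)                 # should agree with the manual value
```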

Candidate models

Fit1 <- lm(uptake ~ Type, data = CO2)
Fit2 <- lm(uptake ~ Treatment, data = CO2)
Fit3 <- lm(uptake ~ conc, data = CO2)
Fit4 <- lm(uptake ~ Type + Treatment + conc, data = CO2)
Fit5 <- lm(uptake ~ Type + conc + I(log(conc)), data = CO2)

Interpreting models

Model 1

  • uptake ~ Type

Model 2

  • uptake ~ Treatment

Model 3

  • uptake ~ conc

Model 4

  • uptake ~ Type + Treatment + conc

Model 5

  • uptake ~ Type + conc + I(log(conc))

Model selection

Model selection with broom

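library(dplyr)  # provides %>%, mutate(), bind_rows() and arrange()
# Keep R squared and AIC for each candidate model and label the row
Modelo1 <- glance(Fit1) %>% dplyr::select(r.squared, 
    AIC) %>% mutate(Modelo = "Fit1")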
Modelo2 <- glance(Fit2) %>% dplyr::select(r.squared, 
    AIC) %>% mutate(Modelo = "Fit2")
Modelo3 <- glance(Fit3) %>% dplyr::select(r.squared, 
    AIC) %>% mutate(Modelo = "Fit3")
Modelo4 <- glance(Fit4) %>% dplyr::select(r.squared, 
    AIC) %>% mutate(Modelo = "Fit4")
Modelo5 <- glance(Fit5) %>% dplyr::select(r.squared, 
    AIC) %>% mutate(Modelo = "Fit5")

Modelos <- bind_rows(Modelo1, Modelo2, Modelo3, Modelo4, 
    Modelo5) %>% arrange(AIC) %>% mutate(DeltaAIC = AIC - 
    min(AIC))

r.squared AIC Modelo DeltaAIC
0.7488287 531.3074 Fit5 0.00000
0.6839043 550.6198 Fit4 19.31241
0.3467130 607.6014 Fit1 76.29403
0.2353971 620.8180 Fit3 89.51059
0.1017943 634.3456 Fit2 103.03817
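
For comparison, the same ranking can be sketched in base R without broom or dplyr, by keeping the candidate models in a named list. This is an alternative illustration, not the approach used in the slides; the names fits and tab are hypothetical.

```r
# Candidate models collected in a named list
fits <- list(
  Fit1 = lm(uptake ~ Type, data = CO2),
  Fit2 = lm(uptake ~ Treatment, data = CO2),
  Fit3 = lm(uptake ~ conc, data = CO2),
  Fit4 = lm(uptake ~ Type + Treatment + conc, data = CO2),
  Fit5 = lm(uptake ~ Type + conc + I(log(conc)), data = CO2)
)

# One row per model with R squared and AIC
tab <- data.frame(
  Modelo    = names(fits),
  r.squared = sapply(fits, function(m) summary(m)$r.squared),
  AIC       = sapply(fits, AIC),
  row.names = NULL
)

# Sort by AIC and express each AIC relative to the best model
tab <- tab[order(tab$AIC), ]
tab$DeltaAIC <- tab$AIC - min(tab$AIC)
tab
```

A list of models scales more easily than one object per model when the candidate set grows.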

Extra: to be covered in the next course

GLM

Distributions

Error structure

  • family =
  • gaussian (continuous response variable)
  • binomial (response variable is 0 or 1)
  • poisson (response variable is a count: 0, 1, 2, 3, ...)
  • Gamma (continuous response variable, positive values only)
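
As a minimal sketch of how the family argument is used, the block below fits a binomial GLM to the built-in mtcars data, where am is a 0/1 variable. The data set and the formula are assumptions chosen only because they ship with R; they are not part of the course material.

```r
# Binomial GLM: 0/1 response (am) explained by car weight (wt)
fit_bin <- glm(am ~ wt, family = binomial, data = mtcars)

family(fit_bin)$family   # the error distribution that was used
family(fit_bin)$link     # the link function that was used

# Each family comes with a default link function
gaussian()$link
binomial()$link
poisson()$link
Gamma()$link             # note the capital G: gamma() is a different function in R
```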

Generalized linear model (family: binomial)

Survived Pclass Sex Age Fare Cabin Embarked
0 3 male 22 7.2500 NA S
1 3 female 26 7.9250 NA S
1 1 female 35 53.1000 C123 S
0 3 male 35 8.0500 NA S
0 1 male 54 51.8625 E46 S
0 3 male 2 21.0750 NA S

Generalized linear model (family: binomial)

term estimate std.error statistic p.value
(Intercept) 0.6165752 0.0333236 18.502645 0.00e+00
Fare 0.0018864 0.0004542 4.152808 3.73e-05
Sexmale -0.4829290 0.0350604 -13.774197 0.00e+00
null.deviance df.null logLik AIC BIC deviance df.residual nobs
143.8804 643 -327.3111 662.6221 680.4929 104.2004 641 644

\(R^2\): 0.3737256

Generalized linear model (family: binomial)

term estimate std.error statistic p.value
(Intercept) 0.3418628 0.2187141 1.5630579 0.1180390
Fare 0.0138686 0.0055200 2.5124456 0.0119898
Sexmale -2.1310117 0.2700674 -7.8906661 0.0000000
Fare:Sexmale -0.0040442 0.0066630 -0.6069597 0.5438776
null.deviance df.null logLik AIC BIC deviance df.residual nobs
823.027 643 -321.812 651.6239 669.4947 643.6239 640 644

\(R^2\): 0.3370364
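
The estimates in the table with the Fare:Sexmale term can be turned into survival probabilities. The sketch below assumes that this table comes from a binomial GLM with the default logit link and a Fare by Sex interaction; the fitting call is not shown in the slides, so this is an assumption. The helper predict_survival() is hypothetical, and the coefficients are copied from the table.

```r
# Coefficients copied from the table (assumed to be on the logit scale)
b0     <- 0.3418628    # (Intercept)
b_fare <- 0.0138686    # Fare
b_male <- -2.1310117   # Sexmale
b_int  <- -0.0040442   # Fare:Sexmale

# Hypothetical helper: linear predictor, then inverse logit via plogis()
predict_survival <- function(fare, male) {
  eta <- b0 + b_fare * fare + b_male * male + b_int * fare * male
  plogis(eta)
}

p_female <- predict_survival(fare = 10, male = 0)
p_male   <- predict_survival(fare = 10, male = 1)
c(female = p_female, male = p_male)
```

With a fitted model object, predict(model, newdata, type = "response") would do the same calculation.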

One more example

Tree models

##     cp  Accuracy     Kappa AccuracySD   KappaSD
## 1 0.00 0.9385473 0.9067089 0.02772756 0.0420216
## 2 0.44 0.7517576 0.6388303 0.16559470 0.2360850
## 3 0.50 0.5069109 0.2934713 0.15528770 0.2089088
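
The call that produced this table is not shown. As a rough sketch of what fitting a classification tree can look like, the block below uses the rpart package on the built-in iris data. Both choices are assumptions made for illustration, and the accuracy is computed on the training data rather than by resampling, so it is not comparable to the table.

```r
library(rpart)  # recommended package distributed with R

# Classification tree for Species; cp is rpart's complexity parameter
tree <- rpart(Species ~ ., data = iris, method = "class",
              control = rpart.control(cp = 0))

# Predicted class for each row of the training data
pred <- predict(tree, iris, type = "class")

# Training accuracy (optimistic, since the same data were used to fit)
acc <- mean(pred == iris$Species)
acc
```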

For the next class

  • Regular loops and loops with purrr
  • Journal templates for working from R (install rticles)
  • You need to be able to knit to PDF (install tinytex)