06/05, 2020

¿Como decidimos que modelo elegimos para explicar mejor un fenomeno?

Que es lo que no nos permite la inferencia multimodelo

  • Arreglar un estudio mal diseñado
  • Predecir en base a variables no relacionadas
  • Darnos una receta

Para que nos sirve la inferencia multimodelo

  • Seleccionar el o los modelos más parsimoniosos entre varios en competencia
    • No el que predice más
  • Nos entrega estrategia para seleccionar modelos

Que necesitamos para realizar inferencia multimodelo

  • Buen diseño de muestreo/experimental
  • Generación cuidadosa de hipótesis
  • Selección adecuada de variables para distinguir entre hipótesis

Paquete MuMIn

Multi Moldel Inference

data("mtcars")

fit1 <- lm(mpg ~ carb + cyl, data = mtcars)
fit2 <- lm(mpg ~ cyl + wt, data = mtcars)
fit3 <- lm(mpg ~ am + qsec + wt, data = mtcars)
fit4 <- lm(mpg ~ carb + cyl + wt, data = mtcars)
fit5 <- lm(mpg ~ am + carb + cyl + qsec + wt, data = mtcars)
fit6 <- lm(mpg ~ am + carb + cyl + hp + qsec, data = mtcars)

models <- list(fit1, fit2, fit3, fit4, fit5, fit6)

Paquete MuMIn (model.sel)

Select <- model.sel(models)
(Intercept) carb cyl wt am qsec hp df logLik AICc delta weight
9.62 NA NA -3.92 2.94 1.23 NA 5 -72.06 156.43 0.00 0.46
39.69 NA -1.51 -3.19 NA NA NA 4 -74.01 157.49 1.06 0.27
39.60 -0.49 -1.29 -3.16 NA NA NA 5 -72.81 157.93 1.50 0.22
20.04 -0.52 -0.46 -3.05 2.94 0.73 NA 7 -71.03 160.73 4.30 0.05
33.97 -0.80 -1.27 NA 4.19 -0.13 -0.02 7 -74.85 168.36 11.93 0.00
37.81 -0.53 -2.63 NA NA NA NA 4 -80.79 171.06 14.64 0.00
  • ¿Que modelos seleccionamos?

¿Pesos de Akaike?

  • Suman uno
  • A mayor peso, mejor modelo
  • Dependen del número de modelos
Selected <- subset(Select, delta <= 2)
(Intercept) carb cyl wt am qsec hp df logLik AICc delta weight
9.62 NA NA -3.92 2.94 1.23 NA 5 -72.06 156.43 0.00 0.49
39.69 NA -1.51 -3.19 NA NA NA 4 -74.01 157.49 1.06 0.29
39.60 -0.49 -1.29 -3.16 NA NA NA 5 -72.81 157.93 1.50 0.23
(Intercept) carb cyl wt am qsec hp df logLik AICc delta weight
9.62 NA NA -3.92 2.94 1.23 NA 5 -72.06 156.43 0.00 0.63
39.69 NA -1.51 -3.19 NA NA NA 4 -74.01 157.49 1.06 0.37

¿Hemos seleccionado 3 modelos ahora que?

  • Seleccionar el mejor modelo
BestModel <- get.models(Select, 1)[[1]]
broom::glance(BestModel)
r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual
0.85 0.83 2.46 52.75 0 4 -72.06 154.12 161.45 169.29 28

Mejor modelo

broom::tidy(BestModel)
term estimate std.error statistic p.value
(Intercept) 9.62 6.96 1.38 0.18
am 2.94 1.41 2.08 0.05
qsec 1.23 0.29 4.25 0.00
wt -3.92 0.71 -5.51 0.00
  • ¿p no es significativo?
    • ¡¡¡¡No importa!!!!

Promediar modelos

Promediar modelos usando MuMIn

  • Trabajemos con el numero de cilindros
S <- as.data.frame(Selected)
S <- as.data.frame(Selected) %>% select(cyl, weight)
cyl weight
3 NA 0.4854123
2 -1.51 0.2850763
4 -1.29 0.2295115
  • Dos métodos full y subset

Metodo full

\[\hat{\theta} = \sum_{i=1}^R w_i \times \theta_i\]

S_full <- S
S_full$cyl <- ifelse(is.na(S_full$cyl), 0, S_full$cyl)
S_full <- S_full %>% mutate(Theta_i = cyl * weight)
cyl weight Theta_i
0.00 0.4854123 0.0000000
-1.51 0.2850763 -0.4298366
-1.29 0.2295115 -0.2960210
Cyl_hat <- sum(S_full$Theta_i)
## [1] -0.7258576

Método subset

\[\hat{\theta} = \frac{\sum_{i=1}^Rw_i \times \theta_i}{\sum_{i=1}^Rw_i}\]

S_sub <- S %>% filter(!is.na(cyl))

S_sub <- S_sub %>% mutate(Theta_i = cyl * weight)
cyl weight Theta_i
-1.51 0.2850763 -0.4298366
-1.29 0.2295115 -0.2960210
Cyl_hat <- sum(S_sub$Theta_i)/sum(S_sub$weight)
## [1] -1.410561

MuMIn

AverageModel <- model.avg(Select, subset = delta < 2, fit = T)
AverageModel$coefficients
(Intercept) am qsec wt cyl carb
full 25.07 1.43 0.60 -3.54 -0.73 -0.11
subset 25.07 2.94 1.23 -3.54 -1.41 -0.49

Comparemos modelos

Comparemos modelos (cont.)

Multicolinearidad

Presión arterial y multicolinearidad

library(readr)
bloodpress <- read_delim("https://online.stat.psu.edu/onlinecourses/sites/stat501/files/data/bloodpress.txt", 
    "\t", escape_double = FALSE, col_types = cols(Pt = col_skip()), trim_ws = TRUE)
BP Age Weight BSA Dur Pulse Stress
105 47 85.4 1.75 5.1 63 33
115 49 94.2 2.10 3.8 70 14
116 49 95.3 1.98 8.2 72 10
117 50 94.7 2.01 5.8 73 99
112 51 89.4 1.89 7.0 72 95
121 48 99.5 2.25 9.3 71 10
121 49 99.8 2.25 2.5 69 42
110 47 90.9 1.90 6.2 66 8
110 49 89.2 1.83 7.1 69 62
114 48 92.7 2.07 5.6 64 35
114 47 94.4 2.07 5.3 74 90
115 49 94.1 1.98 5.6 71 21
114 50 91.6 2.05 10.2 68 47
106 45 87.1 1.92 5.6 67 80
125 52 101.3 2.19 10.0 76 98
114 46 94.5 1.98 7.4 69 95
106 46 87.0 1.87 3.6 62 18
113 46 94.5 1.90 4.3 70 12
110 48 90.5 1.88 9.0 71 99
122 56 95.7 2.09 7.0 75 99

Presión arterial y multicolinearidad (Cont)

rowname Age Weight BSA Dur Pulse Stress
Age NA 0.41 0.38 0.34 0.62 0.37
Weight 0.41 NA 0.88 0.20 0.66 0.03
BSA 0.38 0.88 NA 0.13 0.46 0.02
Dur 0.34 0.20 0.13 NA 0.40 0.31
Pulse 0.62 0.66 0.46 0.40 NA 0.51
Stress 0.37 0.03 0.02 0.31 0.51 NA

Presión arterial y multiolinearidad (Cont)

Presión arterial y multicolinearidad (Cont)

Model1 <- lm(BP ~ BSA, data = bloodpress)
term estimate std.error statistic p.value
(Intercept) 45.18 9.39 4.81 0
BSA 34.44 4.69 7.34 0

Presión arterial y multicolinearidad (Cont)

Model2 <- lm(BP ~ Weight, data = bloodpress)
term estimate std.error statistic p.value
(Intercept) 2.21 8.66 0.25 0.8
Weight 1.20 0.09 12.92 0.0

Presión arterial y multicolinearidad (Cont)

Model3 <- lm(BP ~ BSA + Weight, data = bloodpress)
term estimate std.error statistic p.value
(Intercept) 5.65 9.39 0.60 0.56
BSA 5.83 6.06 0.96 0.35
Weight 1.04 0.19 5.39 0.00

Presión arterial y multicolinearidad (Cont)

(Intercept) Age BSA Dur Pulse Stress Weight df logLik AICc delta weight
-13.6672 0.7016 4.6274 NA NA NA 0.9058 5 -9.5930 33.4717 0.000 0.7164
-12.8521 0.6834 4.8604 0.0665 NA NA 0.8970 6 -8.4316 35.3247 1.853 0.2836

Presión arterial y multicolinearidad (Cont)

(Intercept) Age BSA Weight Dur Stress Pulse
full -13.6427 0.6984 4.2171 0.922 0.0169 0.0009 -0.0103
subset -13.6427 0.6984 4.6139 0.922 0.0662 0.0041 -0.0571

Todos vs Delta 2

Todos los modelos
(Intercept) Age BSA Weight Dur Stress Pulse
full -13.6427 0.6984 4.2171 0.922 0.0169 0.0009 -0.0103
subset -13.6427 0.6984 4.6139 0.922 0.0662 0.0041 -0.0571
Delta 2
(Intercept) Age BSA Weight Dur
full -13.436 0.6964 4.6935 0.9033 0.0189
subset -13.436 0.6964 4.6935 0.9033 0.0665