In the US by State

May 2, 2024

- Compare how models perform on potentially under-reported data.
- Data are under-reported when the observed value falls below the true value because of some form of measurement error.

- Under-reported data can bias analyses and affect the decisions that follow from them.
- The project compares three models on two responses: Covid-19 and lung cancer counts.

Covid-19 count data is likely under-reported for many reasons:

- The disease presents at varying levels of severity,
- Diagnostic challenges,
- Stigma or social implications,
- Some people are apprehensive about getting tested,
- etc.

Under-reported data can bias estimates to be lower than the true values. It can be thought of as unintentional missing data.

Lung cancer, as a more serious disease, may not be as under-reported and might not benefit from an under-reporting model.

It is reasonable to assume there is low under-reporting of lung cancer in the US:

- Serious illness,
- Robust disease surveillance infrastructure, and
- Test maturity.

- Covid-19 counts for Lower 48 & DC from April 2020
- 23 variables
- Response: Positive cases (count)
- Spatial Component: State (lattice)

Distribution of response

Characteristic | N = 49^{1} |
---|---|
Positive tests | 7,562 (3,618, 21,742) |
Total tests | 81,465 (42,667, 161,181) |
Testing Rate | 0.018 (0.015, 0.027) |
Population Density | 106 (52, 231) |
Air Pollution | 7.40 (6.80, 8.20) |
Obesity | 30.9 (28.7, 34.4) |
Smoking | 16.10 (14.50, 19.00) |
Excessive Drinking | 18.20 (16.40, 19.40) |

^{1} Median (IQR)

Non-exhaustive list of variables and summary statistics.

- Same covariates, new response variable.
- Nevada and Indiana did not meet USCS^{1} publication criteria

- Non-spatial naive model
- Spatial model
- Under-reported spatial hierarchical model
- All models implemented in Nimble (de Valpine et al. 2017)

Model comparison methods

- Watanabe-Akaike information criterion (WAIC)
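As a rough illustration of what WAIC measures, the sketch below computes it from a matrix of pointwise log-likelihoods, using the variance-based penalty. The function names `log_mean_exp` and `waic` and the toy numbers are assumptions made for illustration; the analysis itself relied on NIMBLE, not on this code.

```python
import math
from statistics import variance

def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))

def waic(log_lik):
    """WAIC from log_lik[s][i]: log-likelihood of observation i at posterior draw s."""
    n_obs = len(log_lik[0])
    lppd = 0.0    # log pointwise predictive density
    p_waic = 0.0  # effective number of parameters (variance form)
    for i in range(n_obs):
        column = [draw[i] for draw in log_lik]
        lppd += log_mean_exp(column)
        p_waic += variance(column)  # sample variance across posterior draws
    return -2.0 * (lppd - p_waic)

# Toy input: 3 posterior draws, 2 observations (made-up numbers).
toy = [[-1.0, -2.0], [-1.2, -1.8], [-0.9, -2.1]]
print(round(waic(toy), 3))
```

Lower values indicate better expected out-of-sample fit, which is why the models are ranked by smallest WAIC.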

- Regression model ignoring spatial component
- Multivariable Poisson regression model
- Model selection

- Spatial Poisson regression
- Using a log link on the Poisson mean
- ICAR normal prior on structured spatial effects

\[\begin{align*} y_i \sim \text{Poisson}(&\lambda_i) \\ &\downarrow \\ \log(&\lambda_i) = \alpha + \sum_{j=1}^{8} \beta_j x_{ij} + \phi_i \\ &\phi_i \sim \text{ICAR}(0, \tau) \end{align*}\]
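To make the two pieces of the spatial model concrete, here is a minimal sketch of the Poisson log-likelihood under a log link and of the quadratic part of the ICAR log-density, which penalizes differences between neighbouring states. The function names, the three-region toy lattice, and all numbers are assumptions for illustration only; covariates are left out for brevity, and terms of the ICAR density that do not depend on the spatial effects are omitted.

```python
import math

def poisson_log_lik(y, log_lambda):
    """Poisson log-likelihood with a log link: y_i ~ Poisson(exp(log_lambda_i))."""
    return sum(
        yi * eta - math.exp(eta) - math.lgamma(yi + 1)
        for yi, eta in zip(y, log_lambda)
    )

def icar_log_kernel(phi, edges, tau):
    """Quadratic part of the ICAR log-density:
    -(tau / 2) * sum over neighbour pairs of (phi_i - phi_j)^2."""
    return -0.5 * tau * sum((phi[i] - phi[j]) ** 2 for i, j in edges)

# Toy lattice of three regions in a row: 0 - 1 - 2 (each edge listed once).
edges = [(0, 1), (1, 2)]
alpha = 1.0
phi = [0.2, 0.0, -0.2]                  # spatial effects, centred at zero
y = [4, 3, 2]                           # made-up counts
log_lambda = [alpha + p for p in phi]   # intercept plus spatial effect only

smooth = icar_log_kernel(phi, edges, tau=2.0)
rough = icar_log_kernel([1.0, -1.0, 1.0], edges, tau=2.0)
print(smooth, rough)                    # the smoother surface scores higher
print(poisson_log_lik(y, log_lambda))
```

A surface that varies gently across neighbours receives a higher prior density than one that jumps between them, which is how the prior borrows strength across adjacent states.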

- The idea comes from a paper on correcting under-reported tuberculosis counts in Brazil (Stoner, Economou, and Drummond Marques da Silva 2019)
- Extension of Poisson-logistic regression model
- Adds an under-reporting component

Let \(z_s\) be the observed (under-reported) counts, \(y_s\) be the true unknown counts, \(\pi_s\) be the reporting probability, and \(\lambda_s\) be the Poisson mean.

The hierarchical model can be written as \[\begin{align*} z_{s} | y_{s} \sim \text{Binomial}(\pi_s, &y_{s}) \\ &\downarrow \\ &y_{s} \sim \text{Poisson}(\lambda_{s}) \end{align*}\] where \(\pi_s\) is modeled through a logit link and \(\lambda_s\) through a log link.
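A small simulation can show what this hierarchy implies: thinning a Poisson count with a binomial leaves the observed counts with mean \(\pi_s \lambda_s\), so a model that ignores under-reporting estimates a mean that is too low. The sketch below uses only the Python standard library; the function names and the values of `lam` and `pi` are assumptions chosen for illustration, not estimates from the analysis.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw from Poisson(lam) with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def sample_binomial(n, p, rng):
    """Draw from Binomial(n, p) as a sum of n Bernoulli(p) trials."""
    return sum(rng.random() < p for _ in range(n))

def simulate(lam, pi, n_draws, seed=1):
    """Simulate true counts y and observed counts z obtained by thinning y."""
    rng = random.Random(seed)
    true_counts = [sample_poisson(lam, rng) for _ in range(n_draws)]
    observed = [sample_binomial(y, pi, rng) for y in true_counts]
    return true_counts, observed

true_counts, observed = simulate(lam=20.0, pi=0.6, n_draws=20000)
mean_true = sum(true_counts) / len(true_counts)
mean_obs = sum(observed) / len(observed)
print(f"mean true count:     {mean_true:.2f}")  # close to lam
print(f"mean observed count: {mean_obs:.2f}")   # close to pi * lam
```

Because only the product of the two parameters is visible in the observed counts, separating them requires extra information, such as covariates or an informative prior on the reporting probability.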

- Bayesian Poisson regression
- Model selection with smallest WAIC
- Covid WAIC: 1,962,286.00
- Model includes: uninsured, smoking, and unemployment

- Lung cancer WAIC: 106,727.2
- Model includes: unemployment, population density, uninsured, air pollution, and drug deaths

- Spatial model, Covid WAIC: 628.1434
- Spatial model, lung cancer WAIC: 539.3873

- Under-reporting model, Covid WAIC: 616.0229

- Estimated cases at 5%, 50%, and 95% quantiles of under-reporting

Under-reporting | Cases |
---|---|
Observed | 1,071,003.00 |
Predicted 5% | 1,127,881.50 |
Predicted 50% | 1,237,461.00 |
Predicted 95% | 1,458,510.20 |

- Under-reporting model, lung cancer WAIC: 535.589

- Estimated cases at 5%, 50%, and 95% quantiles of under-reporting
- 75% for state counts because of increased variance

Under-reporting | Cases |
---|---|
Observed | 196,370.00 |
Predicted 5% | 195,568.65 |
Predicted 50% | 196,374.75 |
Predicted 95% | 197,219.53 |
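One way to read the predicted totals is as a ratio to the observed total. The sketch below does that arithmetic for both responses using the figures reported above; the helper name `inflation_factor` is an assumption introduced for illustration.

```python
def inflation_factor(observed, predicted):
    """Ratio of model-predicted cases to observed cases."""
    return predicted / observed

# Totals copied from the tables: observed, then predicted at 5%, 50%, 95%.
covid = {"observed": 1_071_003.00,
         "predicted": [1_127_881.50, 1_237_461.00, 1_458_510.20]}
cancer = {"observed": 196_370.00,
          "predicted": [195_568.65, 196_374.75, 197_219.53]}

factors = {}
for name, totals in (("Covid-19", covid), ("Lung cancer", cancer)):
    factors[name] = [inflation_factor(totals["observed"], p)
                     for p in totals["predicted"]]
    for label, factor in zip(("5%", "50%", "95%"), factors[name]):
        print(f"{name:11s} {label:>3s}: x{factor:.3f}")
```

At the median, the Covid-19 prediction is about 1.155 times the observed total, while the lung cancer prediction is essentially equal to the observed total, consistent with the expectation of little under-reporting for lung cancer.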

Response | Method | WAIC |
---|---|---|
Covid | Simple | 1,962,286.00 |
Covid | Spatial | 628.14 |
Covid | Under-reporting | 616.02 |
Cancer | Simple | 106,727.20 |
Cancer | Spatial | 539.39 |
Cancer | Under-reporting | 535.59 |

de Valpine, P., D. Turek, C. J. Paciorek, C. Anderson-Bergman, D. Temple Lang, and R. Bodik. 2017. “Programming with Models: Writing Statistical Algorithms for General Model Structures with NIMBLE.” *Journal of Computational and Graphical Statistics* 26: 403–17. https://doi.org/10.1080/10618600.2016.1172487.

Stoner, Oliver, Theo Economou, and Gabriela Drummond Marques da Silva. 2019. “A Hierarchical Framework for Correcting Under-Reporting in Count Data.” *Journal of the American Statistical Association* 114 (528): 1481–92. https://doi.org/10.1080/01621459.2019.1573732.

Nathen Byford