bayesplot

Diagnostic plots for Bayesian models

TJ Mahr
UW–Madison Waisman Center

Hello! 👋

I study how children with motor disorders learn to speak and communicate.

Bayesian stats let me handle repeated-measures, time-series data from heterogeneous populations.

My current modeling project

My current project looks at how speech intelligibility (y) changes with age (x). The figure on the left shows a spaghetti plot of model fits and observed data for one child, with the lines fitting the data well. The panel on the right shows three histograms that describe when the lines cross various intelligibility thresholds.

To get my cool model to work, I needed diagnostics…

library(ggplot2)
library(bayesplot)

Plotting functions for visual diagnostics and model criticism
Part of the Stan universe but works with generic MCMC samples
Built on top of ggplot2
Simple functions to make routine visualization easy
https://mc-stan.org/bayesplot/
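To illustrate the "generic MCMC samples" point, here is a minimal sketch that skips Stan entirely. The draws are simulated and the parameter names (alpha, beta, sigma) are made up for the example; the only assumption about bayesplot is that it accepts a numeric matrix with named columns as draws from a single chain.

```r
library(bayesplot)

set.seed(2019)

# Hypothetical posterior draws: 1000 iterations of three made-up parameters.
# A numeric matrix with named columns is treated as draws from one chain.
draws <- cbind(
  alpha = rnorm(1000, mean = 0, sd = 1),
  beta  = rnorm(1000, mean = 2, sd = 0.5),
  sigma = abs(rnorm(1000, mean = 1, sd = 0.2))
)

# One histogram facet per parameter
p_hist <- mcmc_hist(draws)
p_hist
```

The result is an ordinary ggplot2 object, so themes, labels, and extra layers can be added with `+` as usual.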

Scottish Hill races

Try to predict race time from race distance and hill height.

stan_glm(time_min ~ distance_km, data = races, ...)

races
#> # A tibble: 90 x 4
#>    race                   distance_km climb_km time_min
#>    <chr>                        <dbl>    <dbl>    <dbl>
#>  1 Alva Games Hill Race           2.5    0.385     18.6
#>  2 Aonach Mor Uphill Race         4      0.61      22.2
#>  3 Arrochar Alps                 25      2.4      188. 
#>  4 Beinn Lora Hill Race           5      0.34      26.8
#>  5 Ben Aigan Hill Race            6.4    0.326     28.5
#>  6 Ben Lomond Hill Race          12.6    0.98      62.3
#>  7 Ben Nevis Race                14      1.36      85.6
#>  8 Ben Rinnes Hill Race          22.4    1.57     117  
#>  9 Ben Sheann Hill Race           4      0.426     22.9
#> 10 Bennachie Hill Race           12.8    0.55      55.2
#> # ... with 80 more rows

Bayesian models in 15 seconds

Classical regression: line of best fit (maximum likelihood)

Bayesian regression: all plausible lines given data and data-generating process (posterior distribution)

Model is a distribution

Marginal distributions of parameters

Three facets showing histograms of posterior samples for the model’s intercept, main predictor (distance) and the error term sigma.

Uncertainty/compatibility intervals

Plot showing the median and two compatibility intervals for each parameter. We use this to compare the sign and magnitude of model parameters.

Maybe you can do better? Go for it.

library(dplyr)  # provides %>% and glimpse()

mcmc_intervals_data(m1_draws) %>% 
  glimpse()
#> Observations: 3
#> Variables: 9
#> $ parameter   <fct> (Intercept), distance_km, sigma
#> $ outer_width <dbl> 0.9, 0.9, 0.9
#> $ inner_width <dbl> 0.5, 0.5, 0.5
#> $ point_est   <chr> "median", "median", "median"
#> $ ll          <dbl> -11.001341, 5.503966, 11.515591
#> $ l           <dbl> -8.614441, 5.688012, 12.376087
#> $ m           <dbl> -6.994535, 5.805850, 13.016453
#> $ h           <dbl> -5.393244, 5.921104, 13.701015
#> $ hh          <dbl> -3.084268, 6.105477, 14.862735
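If you do want to roll your own interval plot, the data frame above has everything you need. This is a rough sketch, not the plot from the talk: `fake_draws` is a hypothetical stand-in for `m1_draws`, filled with simulated numbers that only loosely resemble the output above.

```r
library(ggplot2)
library(bayesplot)

set.seed(2019)

# Hypothetical stand-in for m1_draws: simulated values, not the real posterior
fake_draws <- cbind(
  `(Intercept)` = rnorm(1000, mean = -7, sd = 2.4),
  distance_km   = rnorm(1000, mean = 5.8, sd = 0.18),
  sigma         = rnorm(1000, mean = 13, sd = 1)
)

# ll/hh bound the outer interval, l/h the inner interval,
# and m is the point estimate (the median by default)
intervals <- mcmc_intervals_data(fake_draws, prob = 0.5, prob_outer = 0.9)

p_custom <- ggplot(intervals, aes(y = parameter)) +
  geom_segment(aes(x = ll, xend = hh, yend = parameter)) +
  geom_segment(aes(x = l, xend = h, yend = parameter), linewidth = 2) +
  geom_point(aes(x = m), size = 3) +
  labs(x = "Estimate", y = NULL)
p_custom
```

The `linewidth` argument assumes a reasonably recent ggplot2; older releases used `size` for line thickness.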

Intervals plus density

Like the previous interval plot but with density curves drawn instead.

Ridgelines help hierarchical models

A set of several partially overlapping density curves with shaded areas showing 80% intervals.
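The density and ridgeline displays can be sketched with `mcmc_areas()` and `mcmc_areas_ridges()`. The draws below are simulated: eight made-up group effects standing in for the kind of parameters a hierarchical model produces, not output from a fitted model.

```r
library(bayesplot)

set.seed(2019)

# Hypothetical group-level effects from a hierarchical model: 8 made-up groups
group_means <- seq(-1.5, 1.5, length.out = 8)
draws <- sapply(group_means, function(mu) rnorm(1000, mean = mu, sd = 0.6))
colnames(draws) <- paste0("b[group_", 1:8, "]")

# Density curves with the central 80% region shaded
p_areas <- mcmc_areas(draws, prob = 0.8)

# Partially overlapping ridgelines, also shading 80% intervals
p_ridges <- mcmc_areas_ridges(draws, prob = 0.8)
p_ridges
```

Ridgelines pack many similar parameters into one panel, which is why they suit models with lots of group-level effects.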

Joint distributions

A scatterplot of posterior draws of the intercept and distance effect with contour lines overlaid.
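A joint-distribution plot like this one can be approximated by adding a ggplot2 contour layer to `mcmc_scatter()`. The draws are simulated with a built-in negative correlation between intercept and slope; that correlation is an assumption of the example, not a result from the hill-race model.

```r
library(ggplot2)
library(bayesplot)

set.seed(2019)

# Hypothetical draws in which intercept and slope are negatively correlated
distance_km <- rnorm(2000, mean = 5.8, sd = 0.18)
intercept <- -7 - 10 * (distance_km - 5.8) + rnorm(2000, sd = 1.5)
draws <- cbind(`(Intercept)` = intercept, distance_km = distance_km)

p_joint <- mcmc_scatter(
  draws,
  pars = c("(Intercept)", "distance_km"),
  alpha = 0.25
) +
  stat_density_2d(color = "black")  # contour lines over the points
p_joint
```

Swapping `mcmc_scatter()` for `mcmc_hex()` should give a hexagonal-bin version of the same view, provided the hexbin package is installed.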

Hex bin

Another 2-d density plot but this one uses hexagonal tiles and uses shading to show density.

Model is generative

Bayesian models are generative

You specify a data-generating process.
Model provides a sample of parameter values for the process that are compatible with the data.

Posterior predictive checks

For each draw from the posterior distribution, have the model re-predict the original dataset.
Does the replicated data look like the original data?
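Here is a minimal sketch of the check. Everything is simulated: `y` is a made-up, right-skewed outcome, and `yrep` is a hand-built stand-in for posterior predictions from a normal model. With a real fit you would get `yrep` from the model itself (for instance from `posterior_predict()` in rstanarm) as a matrix with one row per draw and one column per observation.

```r
library(bayesplot)

set.seed(2019)

# Hypothetical observed data: 90 right-skewed "race times"
y <- rlnorm(90, meanlog = 4, sdlog = 0.6)

# Stand-in for posterior predictions from a normal model: one row per
# simulated dataset, one column per observation
n_draws <- 50
mu <- rnorm(n_draws, mean = mean(y), sd = sd(y) / sqrt(length(y)))
yrep <- t(sapply(mu, function(m) rnorm(length(y), mean = m, sd = sd(y))))

p_box <- ppc_boxplot(y, yrep[1:6, ])   # observed versus 6 replications
p_dens <- ppc_dens_overlay(y, yrep)    # observed versus 50 replications
p_dens
```

Because the fake model is symmetric and the fake data are skewed, the replications should visibly miss the shape of `y`, which is the kind of mismatch these plots are meant to expose.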

Boxplot of observed versus 6 replications

Density of observed versus 50 replications

Density of observed versus 50 replications. The model replications do not agree with the data.

Density from a better model

Density of observed versus 50 replications. The model replications agree with the data.

How well are individual data points predicted?

Plot of the observed data by distance. For each observation, there is a 95% interval showing the model’s range of simulations. As the distance increases, the intervals get farther from the observations, more or less.
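A plot of this kind can be sketched with `ppc_intervals()`. The data below are simulated, and `yrep` comes from a least-squares fit plus noise rather than from MCMC, purely as a stand-in for predictions from a distance-only model. The `prob_outer` argument assumes a reasonably recent bayesplot release.

```r
library(bayesplot)

set.seed(2019)

# Hypothetical race data: time depends on both distance and climb
n <- 90
distance_km <- runif(n, min = 2, max = 25)
climb_km <- runif(n, min = 0.2, max = 2.4)
time_min <- 6 * distance_km + 20 * climb_km + rnorm(n, sd = 3)

# Stand-in for posterior predictions from a distance-only model:
# a least-squares fit plus residual noise, not real MCMC output
fit <- lm(time_min ~ distance_km)
yrep <- t(replicate(200, fitted(fit) + rnorm(n, sd = sigma(fit))))

# Observed points with 50% and 95% predictive intervals, ordered by distance
p_int <- ppc_intervals(time_min, yrep, x = distance_km, prob_outer = 0.95)
p_int
```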

Pointwise prediction error

Instead of showing observed versus simulation, this shows the average of observed minus simulated. The x axis is hill height. A LOESS smooth shows that error increases with hill height.
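The pointwise error is just the observed value minus the average simulated value for each observation. This sketch computes it by hand on simulated data (a distance-only stand-in model that ignores climb) and draws the LOESS smooth with plain ggplot2; bayesplot also offers `ppc_error_scatter_avg_vs_x()` for a ready-made version of this view.

```r
library(ggplot2)

set.seed(2019)

# Hypothetical race data: time depends on both distance and climb
n <- 90
distance_km <- runif(n, min = 2, max = 25)
climb_km <- runif(n, min = 0.2, max = 2.4)
time_min <- 6 * distance_km + 20 * climb_km + rnorm(n, sd = 3)

# Stand-in for posterior predictions from a model that ignores climb:
# a least-squares fit plus residual noise, not real MCMC output
fit <- lm(time_min ~ distance_km)
yrep <- t(replicate(200, fitted(fit) + rnorm(n, sd = sigma(fit))))

# Average prediction error per observation: observed minus mean simulated
errors <- data.frame(
  climb_km = climb_km,
  avg_error = time_min - colMeans(yrep)
)

p_err <- ggplot(errors, aes(x = climb_km, y = avg_error)) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x)
p_err
```

In this made-up example the average error trends upward with climb, a hint that the omitted predictor belongs in the model.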

Model’s distribution comes from a sampling algorithm

Bayesian models are estimated by Markov chain Monte Carlo (MCMC).
Multiple chains sample the posterior distribution in parallel.
Did these chains adequately sample the posterior distribution?

Classic traceplot 🐛

The canonical traceplot. It looks like a hairy caterpillar. It’s good.

Traceplot with bad mixing of chains

Traceplot where one of the chains gets stuck. It’s bad.
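Both kinds of traceplot can be mimicked with simulated chains. In this sketch the parameter name `theta` and the "stuck" behavior of chain 2 are made up; the draws are stored as a 3-D array with dimensions iteration x chain x parameter, which is one of the formats bayesplot accepts.

```r
library(bayesplot)

set.seed(2019)

n_iter <- 500
n_chains <- 4

# Hypothetical draws for one parameter: iterations x chains x parameters
draws <- array(
  rnorm(n_iter * n_chains),
  dim = c(n_iter, n_chains, 1),
  dimnames = list(NULL, NULL, "theta")
)

# Make chain 2 "stuck": it barely moves and sits away from the other chains
draws[, 2, "theta"] <- 2.5 + rnorm(n_iter, sd = 0.05)

p_trace <- mcmc_trace(draws, pars = "theta")
p_trace
```

Deleting the line that overwrites chain 2 gives back the hairy caterpillar.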

New diagnostics are coming

Figure showing the abstract of the new Rhat paper. https://arxiv.org/abs/1903.08008

[wip] Do ranks mix well among chains?

Figure showing mixture of rankings among the chains from the bad traceplot. Chain 2 dominates one end of the rankings.
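The idea behind the rank plot is to pool the draws from all chains, rank them, and then ask whether each chain gets its fair share of low and high ranks. This sketch does the ranking by hand on simulated chains (chain 2 is deliberately stuck at high values) and then calls `mcmc_rank_overlay()`, which assumes a bayesplot release that includes the rank plots.

```r
library(bayesplot)

set.seed(2019)

n_iter <- 500
n_chains <- 4

# Hypothetical draws: chain 2 is stuck near 2.5, the others mix around 0
draws <- array(
  rnorm(n_iter * n_chains),
  dim = c(n_iter, n_chains, 1),
  dimnames = list(NULL, NULL, "theta")
)
draws[, 2, "theta"] <- 2.5 + rnorm(n_iter, sd = 0.05)

# Rank all draws together, then average the ranks within each chain.
# Well-mixed chains would have similar mean ranks.
ranks <- array(rank(draws), dim = dim(draws))
mean_rank_by_chain <- colMeans(ranks[, , 1])
mean_rank_by_chain

p_rank <- mcmc_rank_overlay(draws, pars = "theta")
p_rank
```

`mcmc_rank_hist()` shows the same information as one histogram per chain.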

Plus dozens more plots

https://mc-stan.org/bayesplot/

Acknowledgments

Shoutout to Jonah Gabry, the lead author of the package
Thanks to the rest of the Stan team
My work is supported by NIH grants R01DC009411 and R01DC015653

https://github.com/tjmahr/bayesplot-satrdays-2019