Analysis code: 'Speech development between 30 and 119 months in typical children III: Interaction between speaking rate and intelligibility'

The following is a reproduction of the Supplemental Material (analysis code) for Mahr, Rathouz, & Hustad (2025). All code is in the R programming language.

Publication DOI: 10.1044/2024_JSLHR-24-00356
Supplemental Material DOI: 10.23641/asha.27964125.v2
Analysis notes: Bayesian mixed effects regression; marginal means; brms; Mundlak device

Data

The first set of analyses considered each child’s mean speaking rate and intelligibility for each utterance length.

library(tidyverse)
data_model <- targets::tar_read("data_model_anon")
data_model
#> # A tibble: 2,196 × 14
#>    child speaking_sps artic_sps age_months tocs_level length_longest age_bin
#>    <chr>        <dbl>     <dbl>      <dbl>      <dbl>          <dbl>   <dbl>
#>  1 c464          2.71      2.71         30          3              4      36
#>  2 c464          2.55      2.62         30          4              4      36
#>  3 c017          2.04      2.08         30          3              3      36
#>  4 c315          2.03      2.20         30          3              3      36
#>  5 c468          1.87      1.90         30          3              3      36
#>  6 c154          2.25      2.41         30          3              3      36
#>  7 c338          2.59      2.69         31          3              3      36
#>  8 c110          2.69      2.79         31          3              4      36
#>  9 c110          2.66      2.74         31          4              4      36
#> 10 c431          3.30      3.30         31          3              3      36
#> # ℹ 2,186 more rows
#> # ℹ 7 more variables: intelligibility <dbl>, mwi <dbl>, speaking_sps_3 <dbl>,
#> #   artic_sps_3 <dbl>, age_48 <dbl>, tocs_length <fct>, tocs_level_3 <dbl>

where

child: child identifier
speaking_sps, artic_sps, speaking_sps_3, artic_sps_3: speaking rate and articulation rate in syllables per second. The _3 versions have 3 subtracted so that the values are centered at 3 sps.
age_months, age_bin, age_48: child age in months, rough age group of child (for plotting), age in months minus 48 (so that the values are centered at 48 months)
tocs_level, tocs_length, tocs_level_3: utterance length as a number, factor, number centered at 3 word.
length_longest: the longest utterance reached by the child.
intelligibility, mwi: observed intelligibility and intelligibility compressed into the range .001–.999 for beta regression.

Correlations

data_for_cors <- data_model |> 
  group_by(tocs_length) |> 
  mutate(
    # age-residualized
    r_intel = residuals(lm(intelligibility ~ splines::ns(age_months, 3))),
    r_rate = residuals(lm(speaking_sps_3 ~ splines::ns(age_months, 3))),
  ) |> 
  ungroup()

data_for_cors |> 
  select(tocs_length, age_months, intelligibility, speaking_sps, r_intel, r_rate) |> 
  group_by(tocs_length) |> 
  reframe(
    pick(everything()) |> as.matrix() |> Hmisc::rcorr() |> broom::tidy()
  ) |> 
  mutate(
    estimate = round(estimate, 3),
    p.value = scales::label_pvalue()(p.value)
  ) |> 
  knitr::kable(align = "rllrrr")

tocs_length	column1	column2	estimate	n	p.value
3	intelligibility	age_months	0.688	538	<0.001
3	speaking_sps	age_months	0.330	538	<0.001
3	speaking_sps	intelligibility	0.216	538	<0.001
3	r_intel	age_months	0.000	538	>0.999
3	r_intel	intelligibility	0.571	538	<0.001
3	r_intel	speaking_sps	-0.069	538	0.107
3	r_rate	age_months	0.000	538	>0.999
3	r_rate	intelligibility	-0.042	538	0.329
3	r_rate	speaking_sps	0.942	538	<0.001
3	r_rate	r_intel	-0.074	538	0.087
4	intelligibility	age_months	0.655	478	<0.001
4	speaking_sps	age_months	0.440	478	<0.001
4	speaking_sps	intelligibility	0.281	478	<0.001
4	r_intel	age_months	0.000	478	>0.999
4	r_intel	intelligibility	0.653	478	<0.001
4	r_intel	speaking_sps	-0.071	478	0.120
4	r_rate	age_months	0.000	478	>0.999
4	r_rate	intelligibility	-0.052	478	0.255
4	r_rate	speaking_sps	0.891	478	<0.001
4	r_rate	r_intel	-0.080	478	0.081
5	intelligibility	age_months	0.566	422	<0.001
5	speaking_sps	age_months	0.438	422	<0.001
5	speaking_sps	intelligibility	0.176	422	<0.001
5	r_intel	age_months	0.000	422	>0.999
5	r_intel	intelligibility	0.768	422	<0.001
5	r_intel	speaking_sps	-0.126	422	0.010
5	r_rate	age_months	0.000	422	>0.999
5	r_rate	intelligibility	-0.108	422	0.027
5	r_rate	speaking_sps	0.894	422	<0.001
5	r_rate	r_intel	-0.141	422	0.004
6	intelligibility	age_months	0.514	386	<0.001
6	speaking_sps	age_months	0.466	386	<0.001
6	speaking_sps	intelligibility	0.230	386	<0.001
6	r_intel	age_months	0.000	386	>0.999
6	r_intel	intelligibility	0.826	386	<0.001
6	r_intel	speaking_sps	-0.026	386	0.616
6	r_rate	age_months	0.000	386	>0.999
6	r_rate	intelligibility	-0.024	386	0.639
6	r_rate	speaking_sps	0.883	386	<0.001
6	r_rate	r_intel	-0.029	386	0.570
7	intelligibility	age_months	0.525	372	<0.001
7	speaking_sps	age_months	0.508	372	<0.001
7	speaking_sps	intelligibility	0.214	372	<0.001
7	r_intel	age_months	0.000	372	>0.999
7	r_intel	intelligibility	0.829	372	<0.001
7	r_intel	speaking_sps	-0.082	372	0.113
7	r_rate	age_months	0.000	372	>0.999
7	r_rate	intelligibility	-0.080	372	0.125
7	r_rate	speaking_sps	0.857	372	<0.001
7	r_rate	r_intel	-0.096	372	0.064

Beta regression model

Analyses were carried in the R programming language (vers. 4.4.0, R Core Team, 2024). HMC sampling was performed using Stan (vers. 2.34.1, Stan Development Team, 2024) through the brms R package (vers. 2.21.0, Bürkner, 2017) and the cmdstanr backend (vers. 0.7.1, Gabry, Češnovar, & Johnson, 2024). We compared models with approximate leave-one-out cross validation with the loo R package (vers. 2.7.0, Vehtari, Gelman, & Gabry, 2017).

I wrote a custom function for fitting beta regression models. A user passes in the data and a flavor which flavor determines which model formula to use. Models were seeded using dates (seed) for reproducibility.

library(splines)
library(brms)
fit_beta_model <- function(
  data,
  flavor = NULL,
  priors = NULL,
  formula = NULL,
  file = NULL,
  seed = 20221013,
  adapt_delta = .8,
  iter = 2000
) {
  formulas <- list(
    ri_main_interaction = bf(
      mwi ~ ns(age_48, knots = c(9, 25), Boundary.knots = c(-18, 71)) +
        tocs_length +
        speaking_sps_3 +
        speaking_sps_3:tocs_length +
        (1 | child),
      phi ~ ns(age_48, knots = 17, Boundary.knots = c(-18, 71)),
      family = Beta()
    ),
    ri_no_main_length = bf(
      mwi ~ ns(age_48, knots = c(9, 25), Boundary.knots = c(-18, 71)) +
        speaking_sps +
        (1 | child),
      phi ~ ns(age_48, knots = 17, Boundary.knots = c(-18, 71)),
      family = Beta()
    ),
    ri_no_main_interaction = bf(
      mwi ~ ns(age_48, knots = c(9, 25), Boundary.knots = c(-18, 71)) +
        tocs_length +
        speaking_sps_3 +
        (1 | child),
      phi ~ ns(age_48, knots = 17, Boundary.knots = c(-18, 71)),
      family = Beta()
    ),
    rs_for_rate = bf(
      mwi ~ ns(age_48, knots = c(9, 25), Boundary.knots = c(-18, 71)) +
        tocs_length +
        speaking_sps_3 +
        speaking_sps_3:tocs_length +
        (speaking_sps_3 | child),
      phi ~ ns(age_48, knots = 17, Boundary.knots = c(-18, 71)),
      family = Beta()
    ),
    rs_monotonic_length = bf(
      mwi ~ ns(age_48, knots = c(9, 25), Boundary.knots = c(-18, 71)) +
        mo(tocs_level) * speaking_sps_3 +
        (mo(tocs_level) | child),
      phi ~ ns(age_48, knots = 17, Boundary.knots = c(-18, 71)),
      family = Beta()
    ),
    rs_for_length = bf(
      mwi ~ ns(age_48, knots = c(9, 25), Boundary.knots = c(-18, 71)) +
        tocs_length +
        speaking_sps_3 +
        speaking_sps_3:tocs_length +
        (tocs_length | child),
      phi ~ ns(age_48, knots = 17, Boundary.knots = c(-18, 71)),
      family = Beta()
    ),
    rs_for_length_plus_rate = bf(
      mwi ~ ns(age_48, knots = c(9, 25), Boundary.knots = c(-18, 71)) +
        tocs_length +
        speaking_sps_3 +
        speaking_sps_3:tocs_length +
        (speaking_sps_3 + tocs_length | child),
      phi ~ ns(age_48, knots = 17, Boundary.knots = c(-18, 71)),
      family = Beta()
    ),
    rs_for_length_colon_rate = bf(
      mwi ~ ns(age_48, knots = c(9, 25), Boundary.knots = c(-18, 71)) +
        tocs_length +
        speaking_sps_3 +
        speaking_sps_3:tocs_length +
        (speaking_sps_3:tocs_length | child),
      phi ~ ns(age_48, knots = 17, Boundary.knots = c(-18, 71)),
      family = Beta()
    ),
    rs_for_length_colon_rate_no_intercept = bf(
      mwi ~ ns(age_48, knots = c(9, 25), Boundary.knots = c(-18, 71)) +
        tocs_length +
        speaking_sps_3 +
        speaking_sps_3:tocs_length +
        (0 + speaking_sps_3:tocs_length | child),
      phi ~ ns(age_48, knots = 17, Boundary.knots = c(-18, 71)),
      family = Beta()
    ),
    rs_for_length_by_rate = bf(
      mwi ~ ns(age_48, knots = c(9, 25), Boundary.knots = c(-18, 71)) +
        tocs_length +
        speaking_sps_3 +
        speaking_sps_3:tocs_length +
        (speaking_sps_3 * tocs_length | child),
      phi ~ ns(age_48, knots = 17, Boundary.knots = c(-18, 71)),
      family = Beta()
    )
  )

  files <- c(
    ri_main_interaction = "models/ri_main_interaction",
    ri_no_main_length = "models/ri_no_main_length",
    ri_no_main_interaction = "models/ri_no_main_interaction",
    rs_for_rate = "models/rs_for_rate",
    rs_for_length = "models/rs_for_length",
    rs_for_length_plus_rate = "models/rs_for_length_plus_rate",
    rs_for_length_colon_rate = "models/rs_for_length_colon_rate",
    rs_for_length_colon_rate_no_intercept = "models/rs_for_length_colon_rate_no_intercept",
    rs_for_length_by_rate = "models/rs_for_length_by_rate",
    rs_monotonic_length = "models/rs_monotonic_length"
  )

  formula <- formula %||% formulas[[flavor]]
  file <- file %||% files[[flavor]]


  if (is.null(priors)) {
    priors <- c(
      set_prior("normal(0, 2.5)", class = "b"),
      set_prior("normal(0, 1)", class = "sd")
    )
    has_element <- function(x, y) any(x %in% y)
    has_correlation <- get_prior(formula, data = data) |>
      getElement("class") |>
      has_element("cor")
    if (has_correlation) {
      priors <- c(
        priors,
        set_prior("lkj(2)", class = "cor")
      )
    }
  }


  t <- brm(
    formula = formula,
    data = data,
    prior = priors,
    backend = "cmdstanr",
    seed = seed,
    file = file,
    file_refit = "on_change",
    chains = 4,
    cores = 4,
    iter = iter,
    save_pars = save_pars(all = TRUE),
    control = list(adapt_delta = adapt_delta),
    refresh = 50
  )
  add_criterion(t, c("loo", "loo_R2"))
}

The models compared with in the manuscript were as follows:

loo_comparison <- targets::tar_read("loo_comparison")
labels <- tibble::tribble(
  ~model, ~random_effects,
  "model_rs_length", "1 + length | child", 
  "model_rs_rate", "1 + rate | child", 
  "model_rs_length_plus_rate", "1 + length + rate | child", 
  "model_main", "1 | child", 
  "model_rs_length_colon_rate", "1 + length:rate | child", 
  "model_rs_monotonic_length", "1 + monotonic-length | child", 
)

loo_table_data <- loo_comparison |> 
  inner_join(labels, by = "model") |> 
  mutate(
    across(c(2:3), \(x) round(x, 1)),
    across(c(4:9), \(x) round(x, 0)),
    across(c(loo_r2_mean), \(x) round(x, 2)),
    across(c(loo_r2_sd), \(x) round(x, 3)),
  ) |> 
  relocate(model, random_effects) 

loo_table_data |> 
  knitr::kable()

model	random_effects	elpd_diff	se_diff	elpd_loo	se_elpd_loo	p_loo	se_p_loo	looic	se_looic	loo_r2_mean	loo_r2_sd
model_rs_length	1 + length \| child	0.0	0.0	3814	49	852	20	-7628	98	0.84	0.009
model_rs_length_plus_rate	1 + length + rate \| child	-28.8	5.3	3785	50	784	20	-7571	100	0.82	0.010
model_rs_monotonic_length	1 + monotonic-length \| child	-98.9	9.1	3715	51	523	16	-7431	102	0.79	0.011
model_rs_rate	1 + rate \| child	-101.4	9.8	3713	52	490	15	-7426	104	0.78	0.011
model_main	1 \| child	-107.2	9.7	3707	52	442	15	-7414	104	0.78	0.011
model_rs_length_colon_rate	1 + length:rate \| child	-107.9	9.8	3706	52	479	15	-7413	104	0.78	0.011

All models (except model_rs_monotonic_length) formula includes a categorical effect of utterance length tocs_length, speaking rate centered at 3 sps tocs_length, an adjustment to the rate effect for each length speaing_sps_3:tocs_length, and the 3-df spline for age ns(...). The phi ~ regression lines allows the dispersion of the beta distribution to change with age. What differentiated these models were the variables used in the random (by-child) effects. model_rs_monotonic_length model_rs_monotonic_length model applies the method in Bürkner & Charpentier (2020), using mo() to mark a variable as monotonic.

The winning model on the basis of model comparison was model_rs_length (by-child length effects).

model_rs_length <- targets::tar_read(model_rs_length) 
summary(model_rs_length)
#>  Family: beta 
#>   Links: mu = logit; phi = log 
#> Formula: mwi ~ ns(age_48, knots = c(9, 25), Boundary.knots = c(-18, 71)) + tocs_length + speaking_sps_3 + speaking_sps_3:tocs_length + (tocs_length | child) 
#>          phi ~ ns(age_48, knots = 17, Boundary.knots = c(-18, 71))
#>    Data: data (Number of observations: 2196) 
#>   Draws: 4 chains, each with iter = 3000; warmup = 1500; thin = 1;
#>          total post-warmup draws = 6000
#> 
#> Multilevel Hyperparameters:
#> ~child (Number of levels: 538) 
#>                                Estimate Est.Error l-95% CI u-95% CI Rhat
#> sd(Intercept)                      0.80      0.04     0.71     0.88 1.01
#> sd(tocs_length4)                   0.50      0.09     0.28     0.65 1.02
#> sd(tocs_length5)                   0.38      0.11     0.11     0.55 1.02
#> sd(tocs_length6)                   0.44      0.09     0.22     0.59 1.02
#> sd(tocs_length7)                   0.54      0.09     0.34     0.68 1.02
#> cor(Intercept,tocs_length4)       -0.27      0.10    -0.43    -0.02 1.01
#> cor(Intercept,tocs_length5)       -0.45      0.10    -0.62    -0.23 1.00
#> cor(tocs_length4,tocs_length5)     0.41      0.18    -0.08     0.66 1.01
#> cor(Intercept,tocs_length6)       -0.31      0.13    -0.49    -0.01 1.00
#> cor(tocs_length4,tocs_length6)     0.57      0.13     0.26     0.77 1.00
#> cor(tocs_length5,tocs_length6)     0.53      0.19     0.01     0.77 1.01
#> cor(Intercept,tocs_length7)       -0.26      0.12    -0.44     0.03 1.01
#> cor(tocs_length4,tocs_length7)     0.54      0.12     0.25     0.73 1.00
#> cor(tocs_length5,tocs_length7)     0.45      0.18    -0.03     0.70 1.01
#> cor(tocs_length6,tocs_length7)     0.86      0.10     0.67     0.96 1.01
#>                                Bulk_ESS Tail_ESS
#> sd(Intercept)                       191      280
#> sd(tocs_length4)                    120      163
#> sd(tocs_length5)                    125      181
#> sd(tocs_length6)                    136      160
#> sd(tocs_length7)                    144      163
#> cor(Intercept,tocs_length4)         290      306
#> cor(Intercept,tocs_length5)        2818      915
#> cor(tocs_length4,tocs_length5)      369      270
#> cor(Intercept,tocs_length6)         380      273
#> cor(tocs_length4,tocs_length6)      662      480
#> cor(tocs_length5,tocs_length6)      423      248
#> cor(Intercept,tocs_length7)         232      239
#> cor(tocs_length4,tocs_length7)      718      463
#> cor(tocs_length5,tocs_length7)      384      241
#> cor(tocs_length6,tocs_length7)      567      280
#> 
#> Regression Coefficients:
#>                                              Estimate Est.Error l-95% CI
#> Intercept                                       -0.42      0.11    -0.64
#> phi_Intercept                                    3.82      0.39     3.08
#> nsage_48knotsEQc925Boundary.knotsEQcM18711       3.25      0.15     2.95
#> nsage_48knotsEQc925Boundary.knotsEQcM18712       6.80      0.28     6.26
#> nsage_48knotsEQc925Boundary.knotsEQcM18713       3.63      0.21     3.22
#> tocs_length4                                    -0.10      0.05    -0.20
#> tocs_length5                                    -0.31      0.05    -0.41
#> tocs_length6                                    -0.17      0.06    -0.28
#> tocs_length7                                    -0.51      0.06    -0.63
#> speaking_sps_3                                  -0.08      0.11    -0.31
#> tocs_length4:speaking_sps_3                     -0.05      0.13    -0.30
#> tocs_length5:speaking_sps_3                     -0.45      0.12    -0.69
#> tocs_length6:speaking_sps_3                     -0.25      0.13    -0.51
#> tocs_length7:speaking_sps_3                     -0.34      0.13    -0.59
#> phi_nsage_48knotsEQ17Boundary.knotsEQcM18711     0.13      0.67    -1.22
#> phi_nsage_48knotsEQ17Boundary.knotsEQcM18712     0.53      0.23     0.07
#>                                              u-95% CI Rhat Bulk_ESS Tail_ESS
#> Intercept                                       -0.19 1.00     2038     3248
#> phi_Intercept                                    4.59 1.02      118      212
#> nsage_48knotsEQc925Boundary.knotsEQcM18711       3.55 1.00     2596     3175
#> nsage_48knotsEQc925Boundary.knotsEQcM18712       7.35 1.00     1873     3560
#> nsage_48knotsEQc925Boundary.knotsEQcM18713       4.03 1.00     3657     3703
#> tocs_length4                                    -0.01 1.00     3297     3803
#> tocs_length5                                    -0.22 1.00     3023     4015
#> tocs_length6                                    -0.06 1.00     3206     4399
#> tocs_length7                                    -0.41 1.00     2993     3682
#> speaking_sps_3                                   0.14 1.00     1971     2564
#> tocs_length4:speaking_sps_3                      0.21 1.00     3431     4386
#> tocs_length5:speaking_sps_3                     -0.21 1.00     3030     4074
#> tocs_length6:speaking_sps_3                      0.01 1.00     3197     3824
#> tocs_length7:speaking_sps_3                     -0.08 1.00     2527     3713
#> phi_nsage_48knotsEQ17Boundary.knotsEQcM18711     1.33 1.02      140      352
#> phi_nsage_48knotsEQ17Boundary.knotsEQcM18712     0.97 1.01      290      843
#> 
#> Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).

However, it bears noting that a more appropriate model comparison would compare them using leave-one-child-out cross-validation. This approach was computational prohibitive because it would require fitting 538 versions of each model.

Marginal means

With a model in hand, we compute marginal predictions by simulating new children from each posterior sample. The basic recipe is as follows.

Create a grid of data to make predictions for, giving child an out-of-sample value. To make things easier, I break the data into batches in a list.

newdata <- tidyr::crossing(
  age_48 = (3:7) * 12 - 48,
  tocs_length = c("3", "4", "5", "6", "7"),
  speaking_sps_3 = c(0, .5),
  child = "fake"
) 
newdata
#> # A tibble: 50 × 4
#>    age_48 tocs_length speaking_sps_3 child
#>     <dbl> <chr>                <dbl> <chr>
#>  1    -12 3                      0   fake 
#>  2    -12 3                      0.5 fake 
#>  3    -12 4                      0   fake 
#>  4    -12 4                      0.5 fake 
#>  5    -12 5                      0   fake 
#>  6    -12 5                      0.5 fake 
#>  7    -12 6                      0   fake 
#>  8    -12 6                      0.5 fake 
#>  9    -12 7                      0   fake 
#> 10    -12 7                      0.5 fake 
#> # ℹ 40 more rows

newdata_batches <- newdata |>
  split(~ age_48:speaking_sps_3)

Next I sample 1000 children from each draw. The 1000 childen are 1000 draws from the multivariate normal distribution used in the random effects. First I extract the variance-covariance matrix of the random effects. I use an rvar() which allows me to write code for the 6000 x 5 x 5 array of posterior draws as if it were a 5 x 5 matrix. The random effects are centered at 0 so each row of newdata is given a mean of 0.

cov <- model_rs_length |>
  VarCorr(summary = FALSE) |>
  _$child$cov |>
  posterior::rvar()
cov
#> rvar<6000>[5,5] mean ± sd:
#>              Intercept      tocs_length4   tocs_length5   tocs_length6  
#> Intercept     0.64 ± 0.071  -0.11 ± 0.054  -0.14 ± 0.055  -0.11 ± 0.056 
#> tocs_length4 -0.11 ± 0.054   0.26 ± 0.084   0.09 ± 0.051   0.13 ± 0.056 
#> tocs_length5 -0.14 ± 0.055   0.09 ± 0.051   0.16 ± 0.075   0.10 ± 0.052 
#> tocs_length6 -0.11 ± 0.056   0.13 ± 0.056   0.10 ± 0.052   0.20 ± 0.075 
#> tocs_length7 -0.12 ± 0.062   0.15 ± 0.060   0.10 ± 0.055   0.21 ± 0.070 
#>              tocs_length7  
#> Intercept    -0.12 ± 0.062 
#> tocs_length4  0.15 ± 0.060 
#> tocs_length5  0.10 ± 0.055 
#> tocs_length6  0.21 ± 0.070 
#> tocs_length7  0.30 ± 0.087

means <- rep(0, ncol(cov))

rows_batches <- vapply(newdata_batches, nrow, integer(1))
stopifnot(all(rows_batches == length(means)))

num_children <- 1000
sim_children <- posterior::rdo(mvtnorm::rmvnorm(num_children, means, cov))

Given the simulated children, I walk through each batch of new data, get the linear predictions (predictions in logits using just the fixed effects), add the simulated children to each linear prediction, convert the logits into proportions and then average over the 1000 children for that draw.

l <- newdata_batches |>
  lapply(tidybayes::add_linpred_rvars, model_rs_length, re_formula = NA) |>
  lapply(
    function(x) {
      x$.marginal <- (sim_children + t(x$.linpred)) |>
        brms::inv_logit_scaled() |>
        posterior::rvar_apply(2, posterior::rvar_mean)
      x
    }
  ) |>
  bind_rows()

This part is time-consuming so I load in the precomputed value.

targets::tar_read(marginal_means_model_rs_length)
#> # A tibble: 50 × 6
#>    age_48 tocs_length speaking_sps_3 child        .linpred     .marginal
#>     <dbl> <chr>                <dbl> <chr>      <rvar[1d]>    <rvar[1d]>
#>  1    -12 3                        0 fake    0.231 ± 0.080  0.55 ± 0.018
#>  2    -12 4                        0 fake    0.130 ± 0.080  0.53 ± 0.019
#>  3    -12 5                        0 fake   -0.078 ± 0.077  0.48 ± 0.019
#>  4    -12 6                        0 fake    0.061 ± 0.082  0.51 ± 0.020
#>  5    -12 7                        0 fake   -0.283 ± 0.084  0.43 ± 0.020
#>  6      0 3                        0 fake    1.448 ± 0.065  0.78 ± 0.011
#>  7      0 4                        0 fake    1.347 ± 0.063  0.78 ± 0.010
#>  8      0 5                        0 fake    1.138 ± 0.058  0.75 ± 0.011
#>  9      0 6                        0 fake    1.278 ± 0.064  0.77 ± 0.011
#> 10      0 7                        0 fake    0.933 ± 0.066  0.71 ± 0.013
#> # ℹ 40 more rows

The .marginal column has 6000 marginal means for each row of the prediction grid. Subtracting between two rows would give 6000 differences. Estimates and comparisons were computed by taking the median and 95% quantile intervals of the means and differences.

Logistic regression model

For the within-child effects of rate, we use the item-level data for the 5-, 6-, and 7-word utterances.

data_model_by_item_anon <- targets::tar_read(data_model_by_item_anon)
data_model_by_item_anon
#> # A tibble: 11,282 × 18
#>    child item  speaking_sps artic_sps age_months tocs_level length_longest
#>    <chr> <chr>        <dbl>     <dbl>      <dbl>      <dbl>          <dbl>
#>  1 c130  S5T01         2.47      2.47         34          5              5
#>  2 c130  S5T02         2.26      2.26         34          5              5
#>  3 c130  S5T03         2.55      2.55         34          5              5
#>  4 c130  S5T04         2.77      2.77         34          5              5
#>  5 c130  S5T05         3.01      3.01         34          5              5
#>  6 c130  S5T06         1.89      2.12         34          5              5
#>  7 c130  S5T07         2.39      2.39         34          5              5
#>  8 c130  S5T08         2.20      2.50         34          5              5
#>  9 c130  S5T09         2.83      2.83         34          5              5
#> 10 c227  S5T01         3         3            35          5              5
#> # ℹ 11,272 more rows
#> # ℹ 11 more variables: age_bin <dbl>, intelligibility <dbl>,
#> #   n_words_correct <dbl>, n_words_total <dbl>, mwi <dbl>,
#> #   speaking_sps_3 <dbl>, artic_sps_3 <dbl>, age_48 <dbl>, tocs_length <fct>,
#> #   tocs_level_3 <dbl>, child_mean_rate <dbl>

New here are

item: item identifier
n_words_correct, n_words_total: number of words correctly transcribed by the listeners and number of words total, so the intelligibility is the proportion of words correctly transcribed
child_mean_rate: a child’s average speaking rate over all items

The fitted model was the following:

model_logistic <- targets::tar_read(
  "model_rs_rate_ri_item_no_length_binom_mundlak"
)
model_logistic
#>  Family: binomial 
#>   Links: mu = logit 
#> Formula: n_words_correct | trials(n_words_total) ~ ns(age_48, knots = c(9, 25), Boundary.knots = c(-18, 71)) + speaking_sps_3 + child_mean_rate + (speaking_sps_3 | child) + (1 | item) 
#>    Data: data (Number of observations: 11282) 
#>   Draws: 4 chains, each with iter = 3000; warmup = 1500; thin = 1;
#>          total post-warmup draws = 6000
#> 
#> Multilevel Hyperparameters:
#> ~child (Number of levels: 422) 
#>                               Estimate Est.Error l-95% CI u-95% CI Rhat
#> sd(Intercept)                     0.85      0.03     0.78     0.92 1.00
#> sd(speaking_sps_3)                0.50      0.03     0.44     0.56 1.00
#> cor(Intercept,speaking_sps_3)    -0.15      0.07    -0.28    -0.01 1.00
#>                               Bulk_ESS Tail_ESS
#> sd(Intercept)                     1367     2649
#> sd(speaking_sps_3)                2518     3793
#> cor(Intercept,speaking_sps_3)     1917     2597
#> 
#> ~item (Number of levels: 29) 
#>               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> sd(Intercept)     0.71      0.09     0.55     0.92 1.00     1108     2461
#> 
#> Regression Coefficients:
#>                                            Estimate Est.Error l-95% CI u-95% CI
#> Intercept                                      0.49      0.33    -0.16     1.14
#> nsage_48knotsEQc925Boundary.knotsEQcM18711     2.93      0.20     2.52     3.33
#> nsage_48knotsEQc925Boundary.knotsEQcM18712     4.72      0.66     3.42     6.02
#> nsage_48knotsEQc925Boundary.knotsEQcM18713     3.30      0.27     2.74     3.83
#> speaking_sps_3                                -0.23      0.04    -0.30    -0.16
#> child_mean_rate                               -0.12      0.15    -0.42     0.17
#>                                            Rhat Bulk_ESS Tail_ESS
#> Intercept                                  1.00      879     1846
#> nsage_48knotsEQc925Boundary.knotsEQcM18711 1.00     1064     1742
#> nsage_48knotsEQc925Boundary.knotsEQcM18712 1.00     1098     1950
#> nsage_48knotsEQc925Boundary.knotsEQcM18713 1.00     1810     2818
#> speaking_sps_3                             1.00     4260     4387
#> child_mean_rate                            1.00     1031     2004
#> 
#> Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).

The random effects include a by-child random intercept, by-child rate effects, and by-item intercepts. The same age-spline is used as before for the fixed effects. The other two predictors are the speaking rate on the given utterance and the child’s average speaking rate. The child_mean_rate is meant to absorb child-level reflect a child’s habitual rate so that the speaking_sps_3 can estimate the within child effects of rate.

We can extract the posterior means of each child’s rate coefficient like so:

# Posterior draws of the slopes
child_slopes <- model_logistic |>
  coef(summary = FALSE) |>
  getElement("child") |>
  _[, , c("Intercept", "speaking_sps_3")] |> 
  posterior::rvar()

slope_means <- mean(child_slopes)[, "speaking_sps_3"]

f_percent <- scales::label_percent(.1)
lslopes <- list(
  n_positive = sum(slope_means > 0),
  n_negative = sum(slope_means < 0),
  pct_positive = f_percent(sum(slope_means > 0) / length(slope_means)),
  pct_negative = f_percent(sum(slope_means < 0) / length(slope_means))
)
str(lslopes)
#> List of 4
#>  $ n_positive  : int 107
#>  $ n_negative  : int 315
#>  $ pct_positive: chr "25.4%"
#>  $ pct_negative: chr "74.6%"

The plotted lines used tidybayes::add_epred_rvars() to get child-level predictions when trials is set to 1. We take a grid of 50 points along each child’s observed speaking rate, get posterior expectations for each of those point and plot the mean of those expectations for each child.

data_binom_rate_grid <- data_model_by_item_anon |>
  group_by(child, age_months, age_bin, age_48) |>
  reframe(
    child_mean_rate = mean(speaking_sps_3),
    speaking_sps_3 = c(
      seq(min(speaking_sps_3), max(speaking_sps_3), length.out = 50),
      child_mean_rate
    ),
    item = "fake",
    n_words_total = 1
  )

data_binom_rate_grid <- tidybayes::add_epred_rvars(
  data_binom_rate_grid,
  model_logistic,
  allow_new_levels = TRUE
)

References

Bürkner, P.-C. (2017). brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80(1), 1–28. doi:[10.18637/jss.v080.i01](https://doi.org/10.18637/jss.v080.i01)

Bürkner, P.-C., & Charpentier, E. (2020). Modelling monotonic effects of ordinal predictors in Bayesian regression models. British Journal of Mathematical and Statistical Psychology, 73(3), 420–451. doi:[10.1111/bmsp.12195](https://doi.org/10.1111/bmsp.12195)

Gabry, J., Češnovar, R., & Johnson, A. (2024). cmdstanr: R interface to CmdStan. Retrieved from https://mc-stan.org/cmdstanr/

R Core Team. (2024). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Retrieved from https://www.R-project.org/

Stan Development Team. (2024). Stan modeling language users guide and reference manual. Retrieved from https://mc-stan.org

Vehtari, A., Gelman, A., & Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27, 1413–1432. doi:[10.1007/s11222-016-9696-4](https://doi.org/10.1007/s11222-016-9696-4)