Confusion matrix statistics on late talker diagnoses

Positive predictive values and the like.
caret
r
Author

TJ Mahr

Published

October 6, 2015

Modified

July 28, 2023

How many late talkers are just late bloomers? More precisely, how many children identified as late talkers at 18 months catch up to the normal range one year later? This is an important question. From a clinical perspective, we want to support children with language delays, but it is also inefficient to spend resources fixing a self-correcting problem.

Fernald and Marchman (2012) touch on this question. Children falling below the 20th percentile in vocabulary score at 18 months were labeled “late talkers”. These children, along with a control group of timely talkers, participated in an eyetracking study at 18 months and had their vocabulary measured every 3 months until 30 months of age.

In their sample, 22 of 36 late talkers were late bloomers, catching up to the normal vocabulary range at 30 months, and 42 of 46 timely talkers remained in the normal range of vocabulary development. The authors later report that eyetracking reaction times at 18 months predicted rates of vocabulary growth in both groups. In particular, the late bloomers were significantly faster than the children who did not catch up.

The authors repeatedly report confusion matrix statistics on different subsets of the data, which makes sense: the question of late bloomers is also a question about the positive predictive value of a late-talker diagnosis. In the majority of cases, a “late talker” label at 18 months did not predict continued delay one year later. Therefore, the diagnosis has poor positive predictive value (14/36 = 39%).
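
As a quick check on that arithmetic, here is a minimal sketch in R that computes the positive predictive value directly from the two late-talker counts reported above. The variable names are only illustrative.

```r
# Counts for the 36 late talkers, as reported in the paper
lt_still_delayed <- 14  # true positives: late talker, still delayed at 30 months
lt_bloomed <- 22        # false positives: late talker, caught up by 30 months

# PPV = true positives / all positive calls
ppv <- lt_still_delayed / (lt_still_delayed + lt_bloomed)
round(ppv, 4)
#> [1] 0.3889
```

In other words, roughly 4 in 10 children given the label were still delayed a year later.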

Confusion Matrix Measures in R

I would like to report similar classification quantities in my own analyses, so I figured out how to reproduce their results in R. It’s as simple as calling the confusionMatrix() function from the caret package.

First, let’s re-create their data. We’ll make a long dataframe with one row per child reported in the study. We will have fields for each child’s initial Group (late talking or within-normal-limits at 18 months), their Predicted group (assuming late-talking children remain delayed), and the observed Outcome.

library(dplyr)

# LT: late talking
# WNL: within normal limits
groups <- c("WNL at 18m", "LT at 18m")
outcomes <- c("WNL at 30m", "Delayed at 30m")

# Counts from paper
lt_still_delayed <- 14
lt_bloomed <- 22

wnl_still_wnl <- 42
wnl_delayed <- 4

# Reproduce their data-set (one row per reported child)
wnl_data <- tibble(
  Group = groups[1],
  Predicted = outcomes[1],
  Outcome = rep(outcomes, times = c(wnl_still_wnl, wnl_delayed))
)

lt_data <- tibble(
  Group = groups[2],
  Outcome = rep(outcomes, times = c(lt_bloomed, lt_still_delayed)),
  Predicted = outcomes[2]
)

all_kids <- bind_rows(wnl_data, lt_data) %>%
  mutate(ChildID = seq_along(Outcome)) %>% 
  select(ChildID, Group, Predicted, Outcome) %>% 
  mutate(
    Predicted = factor(Predicted, outcomes),
    Outcome = factor(Outcome, outcomes)
  )

What we have looks like a real data-set now.

all_kids %>% 
  sample_n(8, replace = FALSE) %>% 
  arrange(Group, Predicted, Outcome)
#> # A tibble: 8 × 4
#>   ChildID Group      Predicted      Outcome   
#>     <int> <chr>      <fct>          <fct>     
#> 1      47 LT at 18m  Delayed at 30m WNL at 30m
#> 2      52 LT at 18m  Delayed at 30m WNL at 30m
#> 3      60 LT at 18m  Delayed at 30m WNL at 30m
#> 4       1 WNL at 18m WNL at 30m     WNL at 30m
#> 5      16 WNL at 18m WNL at 30m     WNL at 30m
#> 6      19 WNL at 18m WNL at 30m     WNL at 30m
#> 7      34 WNL at 18m WNL at 30m     WNL at 30m
#> 8      27 WNL at 18m WNL at 30m     WNL at 30m
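
Before computing any statistics, it can be worth checking that the reconstructed rows reproduce the reported cell counts. The sketch below is self-contained and uses only base R. It rebuilds the two columns as plain factor vectors (predicted and outcome are stand-ins for all_kids$Predicted and all_kids$Outcome, introduced here for illustration) and cross-tabulates them.

```r
outcomes <- c("WNL at 30m", "Delayed at 30m")

# 46 children predicted to stay WNL, then 36 predicted to stay delayed
predicted <- factor(rep(outcomes, times = c(46, 36)), levels = outcomes)

# Observed outcomes in the same row order: 42 + 4 for the WNL group,
# then 22 + 14 for the late-talking group
outcome <- factor(
  c(rep(outcomes, times = c(42, 4)), rep(outcomes, times = c(22, 14))),
  levels = outcomes
)

# Rows are predictions, columns are observed outcomes
xtab <- table(Prediction = predicted, Reference = outcome)
xtab
```

The four cells should be 42, 4, 22 and 14, matching the counts taken from the paper. With the data frame built above, the equivalent call would be table(all_kids$Predicted, all_kids$Outcome).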

Next, we just call caret::confusionMatrix() on the predicted values and the reference values.

conf_mat <- caret::confusionMatrix(all_kids$Predicted, all_kids$Outcome)
conf_mat
#> Confusion Matrix and Statistics
#> 
#>                 Reference
#> Prediction       WNL at 30m Delayed at 30m
#>   WNL at 30m             42              4
#>   Delayed at 30m         22             14
#>                                           
#>                Accuracy : 0.6829          
#>                  95% CI : (0.5708, 0.7813)
#>     No Information Rate : 0.7805          
#>     P-Value [Acc > NIR] : 0.9855735       
#>                                           
#>                   Kappa : 0.3193          
#>                                           
#>  Mcnemar's Test P-Value : 0.0008561       
#>                                           
#>             Sensitivity : 0.6562          
#>             Specificity : 0.7778          
#>          Pos Pred Value : 0.9130          
#>          Neg Pred Value : 0.3889          
#>              Prevalence : 0.7805          
#>          Detection Rate : 0.5122          
#>    Detection Prevalence : 0.5610          
#>       Balanced Accuracy : 0.7170          
#>                                           
#>        'Positive' Class : WNL at 30m      
#> 

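Here, we can confirm the positive predictive value of the late-talker diagnosis (true positives / positive calls; see footnote 1) is 14/36 = 0.3889. Note that caret took the first factor level, “WNL at 30m”, as the positive class, so this number is printed as Neg Pred Value in the output above. The negative predictive value of the diagnosis is noteworthy; most children not diagnosed as late talkers did not show a delay one year later (42/46 = 0.913, printed as Pos Pred Value).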
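
If a delay at 30 months is the outcome of interest, one option is to tell caret so explicitly. The sketch below assumes the caret package is installed and uses the positive argument of confusionMatrix() to set the positive class. The vectors predicted and outcome are stand-ins rebuilt from the reported counts so that the block runs on its own, and conf_mat_delayed is a name introduced here for illustration.

```r
outcomes <- c("WNL at 30m", "Delayed at 30m")
predicted <- factor(rep(outcomes, times = c(46, 36)), levels = outcomes)
outcome <- factor(
  c(rep(outcomes, times = c(42, 4)), rep(outcomes, times = c(22, 14))),
  levels = outcomes
)

# Treat "Delayed at 30m" as the positive class instead of the first level
conf_mat_delayed <- caret::confusionMatrix(
  data = predicted,
  reference = outcome,
  positive = "Delayed at 30m"
)

# byClass is a named numeric vector for two-class problems
round(conf_mat_delayed$byClass[c("Pos Pred Value", "Neg Pred Value")], 4)
```

With these counts, the two values should come out to 14/36 ≈ 0.3889 and 42/46 ≈ 0.913, so the printed labels line up with the diagnosis being evaluated.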


Session info
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.0 (2023-04-21 ucrt)
 os       Windows 11 x64 (build 22621)
 system   x86_64, mingw32
 ui       RTerm
 language (EN)
 collate  English_United States.utf8
 ctype    English_United States.utf8
 tz       America/Chicago
 date     2023-07-28
 pandoc   3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
 quarto   1.3.353

─ Packages ───────────────────────────────────────────────────────────────────
 package      * version    date (UTC) lib source
 caret          6.0-94     2023-03-21 [1] CRAN (R 4.3.1)
 class          7.3-21     2023-01-23 [2] CRAN (R 4.3.0)
 cli            3.6.1      2023-03-23 [1] CRAN (R 4.3.0)
 codetools      0.2-19     2023-02-01 [2] CRAN (R 4.3.0)
 colorspace     2.1-0      2023-01-23 [1] CRAN (R 4.3.0)
 data.table     1.14.8     2023-02-17 [1] CRAN (R 4.3.0)
 digest         0.6.33     2023-07-07 [1] CRAN (R 4.3.1)
 dplyr        * 1.1.2      2023-04-20 [1] CRAN (R 4.3.0)
 e1071          1.7-13     2023-02-01 [1] CRAN (R 4.3.1)
 evaluate       0.21       2023-05-05 [1] CRAN (R 4.3.0)
 fansi          1.0.4      2023-01-22 [1] CRAN (R 4.3.0)
 fastmap        1.1.1      2023-02-24 [1] CRAN (R 4.3.0)
 foreach        1.5.2      2022-02-02 [1] CRAN (R 4.3.1)
 future         1.33.0     2023-07-01 [1] CRAN (R 4.3.0)
 future.apply   1.11.0     2023-05-21 [1] CRAN (R 4.3.1)
 generics       0.1.3      2022-07-05 [1] CRAN (R 4.3.0)
 ggplot2        3.4.2      2023-04-03 [1] CRAN (R 4.3.0)
 globals        0.16.2     2022-11-21 [1] CRAN (R 4.3.0)
 glue           1.6.2      2022-02-24 [1] CRAN (R 4.3.0)
 gower          1.0.1      2022-12-22 [1] CRAN (R 4.3.0)
 gtable         0.3.3      2023-03-21 [1] CRAN (R 4.3.0)
 hardhat        1.3.0      2023-03-30 [1] CRAN (R 4.3.1)
 htmltools      0.5.5      2023-03-23 [1] CRAN (R 4.3.0)
 htmlwidgets    1.6.2      2023-03-17 [1] CRAN (R 4.3.0)
 ipred          0.9-14     2023-03-09 [1] CRAN (R 4.3.1)
 iterators      1.0.14     2022-02-05 [1] CRAN (R 4.3.1)
 jsonlite       1.8.7      2023-06-29 [1] CRAN (R 4.3.1)
 knitr          1.43       2023-05-25 [1] CRAN (R 4.3.0)
 lattice        0.21-8     2023-04-05 [2] CRAN (R 4.3.0)
 lava           1.7.2.1    2023-02-27 [1] CRAN (R 4.3.1)
 lifecycle      1.0.3      2022-10-07 [1] CRAN (R 4.3.0)
 listenv        0.9.0      2022-12-16 [1] CRAN (R 4.3.0)
 lubridate      1.9.2      2023-02-10 [1] CRAN (R 4.3.0)
 magrittr       2.0.3      2022-03-30 [1] CRAN (R 4.3.0)
 MASS           7.3-60     2023-05-04 [1] CRAN (R 4.3.0)
 Matrix         1.6-0      2023-07-08 [1] CRAN (R 4.3.1)
 ModelMetrics   1.2.2.2    2020-03-17 [1] CRAN (R 4.3.1)
 munsell        0.5.0      2018-06-12 [1] CRAN (R 4.3.0)
 nlme           3.1-162    2023-01-31 [2] CRAN (R 4.3.0)
 nnet           7.3-19     2023-05-03 [1] CRAN (R 4.3.0)
 parallelly     1.36.0     2023-05-26 [1] CRAN (R 4.3.0)
 pillar         1.9.0      2023-03-22 [1] CRAN (R 4.3.0)
 pkgconfig      2.0.3      2019-09-22 [1] CRAN (R 4.3.0)
 plyr           1.8.8      2022-11-11 [1] CRAN (R 4.3.0)
 pROC           1.18.4     2023-07-06 [1] CRAN (R 4.3.1)
 prodlim        2023.03.31 2023-04-02 [1] CRAN (R 4.3.1)
 proxy          0.4-27     2022-06-09 [1] CRAN (R 4.3.1)
 purrr          1.0.1      2023-01-10 [1] CRAN (R 4.3.0)
 R6             2.5.1      2021-08-19 [1] CRAN (R 4.3.0)
 ragg           1.2.5      2023-01-12 [1] CRAN (R 4.3.0)
 Rcpp           1.0.11     2023-07-06 [1] CRAN (R 4.3.1)
 recipes        1.0.6      2023-04-25 [1] CRAN (R 4.3.1)
 reshape2       1.4.4      2020-04-09 [1] CRAN (R 4.3.0)
 rlang          1.1.1      2023-04-28 [1] CRAN (R 4.3.0)
 rmarkdown      2.23       2023-07-01 [1] CRAN (R 4.3.0)
 rpart          4.1.19     2022-10-21 [2] CRAN (R 4.3.0)
 rstudioapi     0.15.0     2023-07-07 [1] CRAN (R 4.3.1)
 scales         1.2.1      2022-08-20 [1] CRAN (R 4.3.0)
 sessioninfo    1.2.2      2021-12-06 [1] CRAN (R 4.3.0)
 stringi        1.7.12     2023-01-11 [1] CRAN (R 4.3.0)
 stringr        1.5.0      2022-12-02 [1] CRAN (R 4.3.0)
 survival       3.5-5      2023-03-12 [2] CRAN (R 4.3.0)
 systemfonts    1.0.4      2022-02-11 [1] CRAN (R 4.3.0)
 textshaping    0.3.6      2021-10-13 [1] CRAN (R 4.3.0)
 tibble         3.2.1      2023-03-20 [1] CRAN (R 4.3.0)
 tidyselect     1.2.0      2022-10-10 [1] CRAN (R 4.3.0)
 timechange     0.2.0      2023-01-11 [1] CRAN (R 4.3.0)
 timeDate       4022.108   2023-01-07 [1] CRAN (R 4.3.0)
 utf8           1.2.3      2023-01-31 [1] CRAN (R 4.3.0)
 vctrs          0.6.3      2023-06-14 [1] CRAN (R 4.3.1)
 withr          2.5.0      2022-03-03 [1] CRAN (R 4.3.0)
 xfun           0.39       2023-04-20 [1] CRAN (R 4.3.0)
 yaml           2.3.7      2023-01-23 [1] CRAN (R 4.3.0)

 [1] C:/Users/Tristan/AppData/Local/R/win-library/4.3
 [2] C:/Program Files/R/R-4.3.0/library

──────────────────────────────────────────────────────────────────────────────

Footnotes

  1. Technically, caret uses the sensitivity, specificity and prevalence form of the PPV calculation.
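
To make the footnote concrete, here is a small check that the sensitivity, specificity and prevalence form of the PPV gives the same number as the direct ratio of counts. It assumes “Delayed at 30m” is taken as the positive class and uses the standard Bayes-rule expression for PPV; the variable names are illustrative.

```r
# Cell counts with "Delayed at 30m" as the positive class
tp <- 14  # late talkers still delayed at 30 months
fp <- 22  # late talkers who caught up
fn <- 4   # timely talkers delayed at 30 months
tn <- 42  # timely talkers still in the normal range

sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)
prevalence  <- (tp + fn) / (tp + fp + fn + tn)

# PPV via sensitivity, specificity and prevalence
ppv_bayes <- (sensitivity * prevalence) /
  (sensitivity * prevalence + (1 - specificity) * (1 - prevalence))

# PPV as a direct ratio of counts
ppv_direct <- tp / (tp + fp)

all.equal(ppv_bayes, ppv_direct)
#> [1] TRUE
```

The two forms agree here because the prevalence is estimated from the same table; they would diverge only if the prevalence came from somewhere else.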