# Simulate tab completion (what happens when we press Tab in after
# `knitted_` in RStudio). The definition of `tab()` is not important.
"knitted_" %>% tab()
#> knitted_doc_url
#> knitted_when
#> knitted_where
#> knitted_with
I am going to describe my favorite knitr trick: Using lists to simplify inline reporting. Trick might not do it justice. I consider this a best practice for working with knitr.
Plug it in
Inline reporting lets you insert R expressions inside of markdown text. Those expressions are evaluated and their results are plugged in as text. The following example shows a common use case: Reporting descriptive statistics. The gamair::sitka
dataset describes the longitudinal growth of Sitka spruce trees in different ozone conditions, so here are some lines we might report:
```{r}
library(magrittr)
data("sitka", package = "gamair")
<- length(levels(sitka$id.num))
n_trees <- length(unique(sitka$ozone))
n_conditions ```
`r nrow(sitka)` tree size measurements
The dataset contains `r n_trees` trees grown in
from `r n_conditions` ozone treatment conditions.
which produces
The dataset contains 1027 tree size measurements
from 0 trees grown in 2 ozone treatment conditions.
If we update the dataset, the numbers will update automatically when the document is reknitted. Itβs just magical. Besides reporting statistics, I routinely use inline reporting for package versions, dates, file provenance, links to package documentation, and emoji. Here is an example of each:
```{r}
<- format(Sys.Date())
knitted_when <- knitr::current_input()
knitted_where <- packageVersion("knitr")
knitted_with <- downlit::autolink_url("knitr::knit()")
knitted_doc_url ```
`r knitted_when` from ``r knitted_where``
Reported prepared on `r knitted_with` `r emo::ji('smile')`.
with knitr version [`knitr::knit()`](`r knitted_doc_url`)`.
Read more about
::::
`index.rmarkdown`
Reported prepared on 2023-07-27 from
with knitr version 1.43 π.[`knitr::knit()`](https://rdrr.io/pkg/knitr/man/knit.html)`. Read more about
Are your variable names doing a listβs job?
In this last example, I used prefixes in variable names to convey that the data were related. knitted_when
, knitted_where
and knitted_with
are all facts about the knitting process. They are all reported in the narrative text pretty close to each other. The prefix informally bundles them together. The prefix also helps with writing our code because we have to remember less. We can type knitted_
and press Tab and let autocompletion remind us which variables are available.
Butβand here is the key insightβwhat if we change that underscore _
into a dollar sign $
, so to speak? That is, letβs bundle everything into a list and then report the results by accessing list elements.
```{r}
<- list(
knitted when = format(Sys.Date()),
where = knitr::current_input(),
with = packageVersion("knitr"),
doc_url = downlit::autolink_url("knitr::knit()")
)```
`r knitted$when` from ``r knitted$where``
Reported prepared on `r knitted$with` `r emo::ji('happy')`.
with knitr version [`knitr::knit()`](`r knitted$doc_url`)`.
Read more about
::::
`index.rmarkdown`
Reported prepared on 2023-07-27 from
with knitr version 1.43 π. [`knitr::knit()`](https://rdrr.io/pkg/knitr/man/knit.html)`. Read more about
We have structured names, and we still retain our Tab completion:
"knitted$" %>% tab()
#> knitted$when
#> knitted$where
#> knitted$with
#> knitted$doc_url
But we can also glance at our list to see everything about the knitting process all at once.
knitted
#> $when
#> [1] "2023-07-27"
#>
#> $where
#> [1] "index.rmarkdown"
#>
#> $with
#> [1] '1.43'
#>
#> $doc_url
#> [1] "https://rdrr.io/pkg/knitr/man/knit.html"
Basically, using a list formalizes the relationship we had implicitly set out by using our naming convention, but so what? How does this help inline reporting? Lists have all the nice benefits of using a naming convention, plus one important feature: We can create lists programmatically.
Set up model results with tidy()
Letβs say we model the growth of each tree in each ozone condition and want to know how much steeper the growth rate is for the ozone treatment condition. We fit a linear mixed model where we estimate the population averages for the intercept and slope for each ozone condition, and we use random effects to accord each tree its own intercept and slope.
library(lme4)
#> Warning: package 'lme4' was built under R version 4.3.1
#> Loading required package: Matrix
#> Warning: package 'Matrix' was built under R version 4.3.1
#>
#> Attaching package: 'Matrix'
#> The following objects are masked from 'package:tidyr':
#>
#> expand, pack, unpack
# Rescale to get improve convergence
sitka$hund_days <- sitka$days / 100
m <- lmer(
log.size ~ hund_days * ozone + (hund_days | id.num),
sitka
)
summary(m)
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: log.size ~ hund_days * ozone + (hund_days | id.num)
#> Data: sitka
#>
#> REML criterion at convergence: 767
#>
#> Scaled residuals:
#> Min 1Q Median 3Q Max
#> -3.6836 -0.4723 0.0078 0.4237 2.5919
#>
#> Random effects:
#> Groups Name Variance Std.Dev. Corr
#> id.num (Intercept) 0.390928 0.62524
#> hund_days 0.002459 0.04959 -0.23
#> Residual 0.082770 0.28770
#> Number of obs: 1027, groups: id.num, 79
#>
#> Fixed effects:
#> Estimate Std. Error t value
#> (Intercept) 4.25491 0.13100 32.481
#> hund_days 0.33903 0.01278 26.528
#> ozone -0.14101 0.15844 -0.890
#> hund_days:ozone -0.03611 0.01546 -2.336
#>
#> Correlation of Fixed Effects:
#> (Intr) hnd_dy ozone
#> hund_days -0.345
#> ozone -0.827 0.286
#> hund_dys:zn 0.286 -0.827 -0.345
Our job is get to numbers from this summary view into prose. For this example, we want to report that the two groups donβt have a statistically clear difference in their intercepts, as given by the ozone
line in the model summary. We also want to report that growth per 100 days is statistically significantly slower in the ozone group hund_days:ozone
.
First, we tidy()
the model summary using broom.mixed.
library(tidyverse)
library(broom.mixed)
tidy(m, conf.int = TRUE) %>%
filter(effect == "fixed")
#> # A tibble: 4 Γ 8
#> effect group term estimate std.error statistic conf.low conf.high
#> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 fixed <NA> (Intercept) 4.25 0.131 32.5 4.00 4.51
#> 2 fixed <NA> hund_days 0.339 0.0128 26.5 0.314 0.364
#> 3 fixed <NA> ozone -0.141 0.158 -0.890 -0.452 0.170
#> 4 fixed <NA> hund_days:ozone -0.0361 0.0155 -2.34 -0.0664 -0.00581
We are also going to format the numbers. I have my own R package for this job called printy. Below I use it to round numbersβround()
drops 0s off the ends of rounded numbers whereas printy::fmt_fixed_digits()
keeps them. I also use it for formatting minus signs.
text_ready <- tidy(m, conf.int = TRUE) %>%
filter(effect == "fixed") %>%
mutate(
# round the numbers
across(
c(estimate, conf.low, conf.high),
printy::fmt_fix_digits,
2
),
se = printy::fmt_fix_digits(std.error, 3),
# use a minus sign instead of a hyphen for negative numbers
across(
c(estimate, conf.low, conf.high),
printy::fmt_minus_sign
),
ci = glue::glue("[{conf.low}, {conf.high}]")
) %>%
select(term, estimate, se, ci)
#> Warning: There was 1 warning in `mutate()`.
#> βΉ In argument: `across(...)`.
#> Caused by warning:
#> ! The `...` argument of `across()` is deprecated as of dplyr 1.1.0.
#> Supply arguments directly to `.fns` through an anonymous function instead.
#>
#> # Previously
#> across(a:b, mean, na.rm = TRUE)
#>
#> # Now
#> across(a:b, \(x) mean(x, na.rm = TRUE))
text_ready
#> # A tibble: 4 Γ 4
#> term estimate se ci
#> <chr> <chr> <chr> <glue>
#> 1 (Intercept) 4.25 0.131 [4.00, 4.51]
#> 2 hund_days 0.34 0.013 [0.31, 0.36]
#> 3 ozone −0.14 0.158 [−0.45, 0.17]
#> 4 hund_days:ozone −0.04 0.015 [−0.07, −0.01]
We could use dataframe functions to filter()
down to the down the terms and pull()
the values and use a list:
```{r}
<- list()
stats $b_intercept <- text_ready %>%
statsfilter(term == "(Intercept)") %>%
pull(estimate)
```
`r stats$b_intercept` units.
The average log-size in the control condition was
::::
The average log-size in the control condition was 4.25 units.
(The documentation for sitka$log.size
doesnβt say what units the data are in, so Iβm sticking with βunitsβ π.)
split()
makes lists
A much better approach is to use split()
to create a list using the values in a dataframe column. To make the list easier for typing, I use janitor::make_clean_names()
to clean up the term
value.
stats <- text_ready %>%
mutate(term = janitor::make_clean_names(term)) %>%
split(.$term)
Now we have a list of one-row dataframes:
str(stats)
#> List of 4
#> $ hund_days : tibble [1 Γ 4] (S3: tbl_df/tbl/data.frame)
#> ..$ term : chr "hund_days"
#> ..$ estimate: chr "0.34"
#> ..$ se : chr "0.013"
#> ..$ ci : 'glue' chr "[0.31, 0.36]"
#> $ hund_days_ozone: tibble [1 Γ 4] (S3: tbl_df/tbl/data.frame)
#> ..$ term : chr "hund_days_ozone"
#> ..$ estimate: chr "−0.04"
#> ..$ se : chr "0.015"
#> ..$ ci : 'glue' chr "[−0.07, −0.01]"
#> $ intercept : tibble [1 Γ 4] (S3: tbl_df/tbl/data.frame)
#> ..$ term : chr "intercept"
#> ..$ estimate: chr "4.25"
#> ..$ se : chr "0.131"
#> ..$ ci : 'glue' chr "[4.00, 4.51]"
#> $ ozone : tibble [1 Γ 4] (S3: tbl_df/tbl/data.frame)
#> ..$ term : chr "ozone"
#> ..$ estimate: chr "−0.14"
#> ..$ se : chr "0.158"
#> ..$ ci : 'glue' chr "[−0.45, 0.17]"
And we have structured, autocomplete-friendly names too:
Now, we can write up our results with inline reporting:
```{r}
<- text_ready %>%
stats mutate(term = janitor::make_clean_names(term)) %>%
split(.$term)
```
The average log-size in the control condition was `r stats$intercept$estimate` units, 95% Wald CI `r stats$intercept$ci`.
There was not a statistically clear difference between the
ozone conditions for their intercepts (day-0 values), `r stats$ozone$estimate`, `r stats$ozone$ci`.
*B* =
For the control group, the average growth rate was `r stats$hund_days$estimate` log-size units per 100 days,
`r stats$hund_days$ci`. The growth rate for
the ozone treatment group was significantly slower, `r stats$hund_days_ozone$estimate`,
*diff* = `r stats$hund_days_ozone$ci`.
::::
The average log-size in the control condition was [4.00, 4.51].
4.25 units, 95% Wald CI
There was not a statistically clear difference between the
ozone conditions for their intercepts (day-0 values), −0.14, [−0.45, 0.17].
*B* =
For the control group, the average growth rate was
0.34 log-size units per 100 days, [0.31, 0.36]. The growth rate for the ozone treatment group was
−0.04,
significantly slower, *diff* = [−0.07, −0.01].
Isnβt that RMarkdown text just a joy to read? Everything so neatly named and organized, and we got all of that for free by using tidy()
and split()
to make a list.
By the way, we can also make the RMarkdown source even neater by using my WrapRmd RStudio plugin which wraps the text so lines of text are all less than a given width. Some other tools for rewrapping markdown text will insert line breaks inside of spans of inline reporting and break them. I have my RStudio set so that Ctrl+Shift+Alt+/ will rewrap the RMarkdown text.
Splitting splits of splits
I am such a champion of this approach that I wrote my own split function for splitting by multiple variables. In the datasets::mtcars
dataset, suppose we want to report the mean mpg of 6- and 8-cylinder (cyl
) vehicles split by automatic versus manual (am
) vehicles. We compute the stats with some basic dplyring and we prepare names that work better with split()
.
car_means <- mtcars %>%
group_by(cyl, am) %>%
summarise(
n = n(),
mean_mpg = mean(mpg),
.groups = "drop"
) %>%
# make names for spliting
mutate(
a = paste0("am_", am),
c = paste0("cyl_", cyl),
)
car_means
#> # A tibble: 6 Γ 6
#> cyl am n mean_mpg a c
#> <dbl> <dbl> <int> <dbl> <chr> <chr>
#> 1 4 0 3 22.9 am_0 cyl_4
#> 2 4 1 8 28.1 am_1 cyl_4
#> 3 6 0 4 19.1 am_0 cyl_6
#> 4 6 1 3 20.6 am_1 cyl_6
#> 5 8 0 12 15.0 am_0 cyl_8
#> 6 8 1 2 15.4 am_1 cyl_8
Now enter super_split()
:
car_stats <- car_means %>%
printy::super_split(a, c)
# set `max.level` to not print individual tibble structures
str(car_stats, max.level = 3)
#> List of 2
#> $ am_0:List of 3
#> ..$ cyl_4: tibble [1 Γ 6] (S3: tbl_df/tbl/data.frame)
#> ..$ cyl_6: tibble [1 Γ 6] (S3: tbl_df/tbl/data.frame)
#> ..$ cyl_8: tibble [1 Γ 6] (S3: tbl_df/tbl/data.frame)
#> $ am_1:List of 3
#> ..$ cyl_4: tibble [1 Γ 6] (S3: tbl_df/tbl/data.frame)
#> ..$ cyl_6: tibble [1 Γ 6] (S3: tbl_df/tbl/data.frame)
#> ..$ cyl_8: tibble [1 Γ 6] (S3: tbl_df/tbl/data.frame)
Here, we have a list of lists of 1-row dataframes, and we can just use $
to drill down the lists during inline reporting.
`r car_stats$am_0$cyl_4$n` automatic,
The average mpg of the `r car_stats$am_0$cyl_4$mean_mpg`.
four-cylinder cars was
::::
The average mpg of the 3 automatic, four-cylinder cars was 22.9.
For the curious, super_split()
works behind the scenes by exploiting two functions:
-
split()
adds a level of depth to a list by splitting a list into sublists using a variable. -
purrr::map_depth()
applies a function.f
on the lists at a given.depth
.
So the function walks through each variable and applies split()
at successively deeper depths.
The first variable splits the list at depth 0, the second variable splits the sublists at depth 1 (which were created in the prior split), and so on. The business with enquos()
is there to let me refer to the variable names directly.
Session info:
.session_info
#> β Session info βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
#> setting value
#> version R version 4.3.0 (2023-04-21 ucrt)
#> os Windows 11 x64 (build 22621)
#> system x86_64, mingw32
#> ui RTerm
#> language (EN)
#> collate English_United States.utf8
#> ctype English_United States.utf8
#> tz America/Chicago
#> date 2023-07-27
#> pandoc 3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#>
#> β Packages βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
#> package * version date (UTC) lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.3.1)
#> backports 1.4.1 2021-12-13 [1] CRAN (R 4.3.0)
#> boot 1.3-28.1 2022-11-22 [2] CRAN (R 4.3.0)
#> broom 1.0.5 2023-06-09 [1] CRAN (R 4.3.1)
#> broom.mixed * 0.2.9.4 2022-04-17 [1] CRAN (R 4.3.0)
#> cachem 1.0.8 2023-05-01 [1] CRAN (R 4.3.0)
#> cli 3.6.1 2023-03-23 [1] CRAN (R 4.3.0)
#> codetools 0.2-19 2023-02-01 [2] CRAN (R 4.3.0)
#> colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.3.0)
#> crayon 1.5.2 2022-09-29 [1] CRAN (R 4.3.0)
#> digest 0.6.33 2023-07-07 [1] CRAN (R 4.3.1)
#> downlit 0.4.3 2023-06-29 [1] CRAN (R 4.3.0)
#> dplyr * 1.1.2 2023-04-20 [1] CRAN (R 4.3.0)
#> emo 0.0.0.9000 2023-07-27 [1] Github (hadley/emo@3f03b11)
#> evaluate 0.21 2023-05-05 [1] CRAN (R 4.3.0)
#> fansi 1.0.4 2023-01-22 [1] CRAN (R 4.3.0)
#> fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.0)
#> forcats * 1.0.0 2023-01-29 [1] CRAN (R 4.3.0)
#> furrr 0.3.1 2022-08-15 [1] CRAN (R 4.3.0)
#> future 1.33.0 2023-07-01 [1] CRAN (R 4.3.0)
#> generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.0)
#> ggplot2 * 3.4.2 2023-04-03 [1] CRAN (R 4.3.0)
#> globals 0.16.2 2022-11-21 [1] CRAN (R 4.3.0)
#> glue 1.6.2 2022-02-24 [1] CRAN (R 4.3.0)
#> gtable 0.3.3 2023-03-21 [1] CRAN (R 4.3.0)
#> hms 1.1.3 2023-03-21 [1] CRAN (R 4.3.0)
#> htmltools 0.5.5 2023-03-23 [1] CRAN (R 4.3.0)
#> htmlwidgets 1.6.2 2023-03-17 [1] CRAN (R 4.3.0)
#> janitor 2.2.0 2023-02-02 [1] CRAN (R 4.3.0)
#> jsonlite 1.8.7 2023-06-29 [1] CRAN (R 4.3.1)
#> knitr 1.43 2023-05-25 [1] CRAN (R 4.3.0)
#> lattice 0.21-8 2023-04-05 [2] CRAN (R 4.3.0)
#> lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.3.0)
#> listenv 0.9.0 2022-12-16 [1] CRAN (R 4.3.0)
#> lme4 * 1.1-34 2023-07-04 [1] CRAN (R 4.3.1)
#> lubridate * 1.9.2 2023-02-10 [1] CRAN (R 4.3.0)
#> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.0)
#> MASS 7.3-60 2023-05-04 [1] CRAN (R 4.3.0)
#> Matrix * 1.6-0 2023-07-08 [1] CRAN (R 4.3.1)
#> memoise 2.0.1 2021-11-26 [1] CRAN (R 4.3.0)
#> minqa 1.2.5 2022-10-19 [1] CRAN (R 4.3.0)
#> munsell 0.5.0 2018-06-12 [1] CRAN (R 4.3.0)
#> nlme 3.1-162 2023-01-31 [2] CRAN (R 4.3.0)
#> nloptr 2.0.3 2022-05-26 [1] CRAN (R 4.3.0)
#> parallelly 1.36.0 2023-05-26 [1] CRAN (R 4.3.0)
#> pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.0)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.0)
#> printy 0.0.0.9003 2023-07-12 [1] Github (tjmahr/printy@df0d96e)
#> purrr * 1.0.1 2023-01-10 [1] CRAN (R 4.3.0)
#> R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.0)
#> ragg 1.2.5 2023-01-12 [1] CRAN (R 4.3.0)
#> Rcpp 1.0.11 2023-07-06 [1] CRAN (R 4.3.1)
#> readr * 2.1.4 2023-02-10 [1] CRAN (R 4.3.0)
#> rlang 1.1.1 2023-04-28 [1] CRAN (R 4.3.0)
#> rmarkdown 2.23 2023-07-01 [1] CRAN (R 4.3.0)
#> rstudioapi 0.15.0 2023-07-07 [1] CRAN (R 4.3.1)
#> scales 1.2.1 2022-08-20 [1] CRAN (R 4.3.0)
#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.0)
#> snakecase 0.11.0 2019-05-25 [1] CRAN (R 4.3.0)
#> stringi 1.7.12 2023-01-11 [1] CRAN (R 4.3.0)
#> stringr * 1.5.0 2022-12-02 [1] CRAN (R 4.3.0)
#> systemfonts 1.0.4 2022-02-11 [1] CRAN (R 4.3.0)
#> textshaping 0.3.6 2021-10-13 [1] CRAN (R 4.3.0)
#> tibble * 3.2.1 2023-03-20 [1] CRAN (R 4.3.0)
#> tidyr * 1.3.0 2023-01-24 [1] CRAN (R 4.3.0)
#> tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.3.0)
#> tidyverse * 2.0.0 2023-02-22 [1] CRAN (R 4.3.0)
#> timechange 0.2.0 2023-01-11 [1] CRAN (R 4.3.0)
#> tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.3.0)
#> utf8 1.2.3 2023-01-31 [1] CRAN (R 4.3.0)
#> vctrs 0.6.3 2023-06-14 [1] CRAN (R 4.3.1)
#> withr 2.5.0 2022-03-03 [1] CRAN (R 4.3.0)
#> xfun 0.39 2023-04-20 [1] CRAN (R 4.3.0)
#> yaml 2.3.7 2023-01-23 [1] CRAN (R 4.3.0)
#>
#> [1] C:/Users/Tristan/AppData/Local/R/win-library/4.3
#> [2] C:/Program Files/R/R-4.3.0/library
#>
#> ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ