--- title: "Datasets for categorical data analysis" author: "Michael Friendly" date: "`r Sys.Date()`" package: vcdExtra output: rmarkdown::html_vignette: fig_caption: yes bibliography: ["vcd.bib", "vcdExtra.bib"] csl: apa.csl vignette: > %\VignetteIndexEntry{Datasets for categorical data analysis} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, message = FALSE, warning = FALSE, fig.height = 6, fig.width = 7, fig.path = "fig/datasets-", dev = "png", comment = "##" ) # save some typing knitr::set_alias(w = "fig.width", h = "fig.height", cap = "fig.cap") ``` The `vcdExtra` package contains `r nrow(vcdExtra::datasets("vcdExtra"))` datasets, taken from the literature on categorical data analysis, and selected to illustrate various methods of analysis and data display. These are in addition to the `r nrow(vcdExtra::datasets("vcd"))` datasets in the [vcd package](https://cran.r-project.org/package=vcd). To make it easier to find those which illustrate a particular method, the datasets in `vcdExtra` have been classified using method tags. This vignette creates an "inverse table", listing the datasets that apply to each method. It also illustrates a general method for classifying datasets in R packages. ```{r load} library(dplyr) library(tidyr) library(readxl) ``` ## Processing tags Using the result of `vcdExtra::datasets(package="vcdExtra")` I created a spreadsheet, `vcdExtra-datasets.xlsx`, and then added method tags. ```{r read-datasets} dsets_tagged <- read_excel(here::here("inst", "extdata", "vcdExtra-datasets.xlsx"), sheet="vcdExtra-datasets") dsets_tagged <- dsets_tagged |> dplyr::select(-Title, -dim) |> dplyr::rename(dataset = Item) head(dsets_tagged) ``` To invert the table, need to split tags into separate observations, then collapse the rows for the same tag. ```{r split-tags} dset_split <- dsets_tagged |> tidyr::separate_longer_delim(tags, delim = ";") |> dplyr::mutate(tag = stringr::str_trim(tags)) |> dplyr::select(-tags) #' ## collapse the rows for the same tag tag_dset <- dset_split |> arrange(tag) |> dplyr::group_by(tag) |> dplyr::summarise(datasets = paste(dataset, collapse = "; ")) |> ungroup() # get a list of the unique tags unique(tag_dset$tag) ``` ## Make this into a nice table Another sheet in the spreadsheet gives a more descriptive `topic` for corresponding to each tag. ```{r read-tags} tags <- read_excel(here::here("inst", "extdata", "vcdExtra-datasets.xlsx"), sheet="tags") head(tags) ``` Now, join this with the `tag_dset` created above. ```{r join-tags} tag_dset <- tag_dset |> dplyr::left_join(tags, by = "tag") |> dplyr::relocate(topic, .after = tag) tag_dset |> dplyr::select(-tag) |> head() ``` ### Add links to `help()` We're almost there. It would be nice if the dataset names could be linked to their documentation. This function is designed to work with the `pkgdown` site. There are different ways this can be done, but what seems to work is a link to `../reference/{dataset}.html` Unfortunately, this won't work in the actual vignette. ```{r add-links} add_links <- function(dsets, style = c("reference", "help", "rdrr.io"), sep = "; ") { style <- match.arg(style) names <- stringr::str_split_1(dsets, sep) names <- dplyr::case_when( style == "help" ~ glue::glue("[{names}](help({names}))"), style == "reference" ~ glue::glue("[{names}](../reference/{names}.html)"), style == "rdrr.io" ~ glue::glue("[{names}](https://rdrr.io/cran/vcdExtra/man/{names}.html)") ) glue::glue_collapse(names, sep = sep) } ``` ## Make the table {#table} Use `purrr::map()` to apply `add_links()` to all the datasets for each tag. (`mutate(datasets = add_links(datasets))` by itself doesn't work.) ```{r kable} tag_dset |> dplyr::select(-tag) |> dplyr::mutate(datasets = purrr::map(datasets, add_links)) |> knitr::kable() ``` Voila!