---
title: "Data sets in the heplots package"
author: Michael Friendly
date: "`r Sys.Date()`"
package: heplots
output:
  bookdown::html_document2:
    base_format: rmarkdown::html_vignette
    fig_caption: yes
    toc: true
pkgdown:
  as_is: true
bibliography: "HE-examples.bib"
link-citations: yes
csl: apa.csl
vignette: >
  %\VignetteIndexEntry{Data sets in the heplots package}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  message = FALSE,
  warning = FALSE,
  fig.height=5,
  fig.width=5,
#  results='hide',
#  fig.keep='none',
  fig.path='fig/datasets-',
  echo=TRUE,
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup, echo=FALSE}
set.seed(1071)
options(width=80, digits=5, continue=" ")
library(heplots)
library(candisc)
library(ggplot2)
library(dplyr)
```

## Documenting package datasets {-}

Datasets used in package examples are such an important part of making a package understandable and usable, yet they are often overlooked.
In developing the `heplots` package I assembled a large collection of datasets illustrating a variety of multivariate linear models, with some analyses and graphical displays. Each of these has much more than the usual stub examples, which often look like:

```{r eval=FALSE}
data(dataset)
# str(dataset); plot(dataset)
```

But `.Rd`, and now `roxygen`, don't make it easy to work with numerous datasets in a package or, more importantly, to document what they illustrate. I'm showing the work to create this vignette in case these ideas are useful to others.

In this release, I started with a file generated by:

```{r}
vcdExtra::datasets("heplots") |> head(4)
```

Then, in the roxygen documentation, I added `@concept` tags to classify these datasets according to the methods used. (`@concept` entries are indexed with the package, so they can be found with `help.search()`.)
For example, the documentation for the `AddHealth` data contains these lines:

```{r eval=FALSE}
#' @name AddHealth
#' @docType data
...
#' @keywords datasets
#' @concept MANOVA
#' @concept ordered
```

With standard processing, these concepts, along with the keywords, appear in the **Index** section of the manual built by `devtools::build_manual()`. On the `pkgdown` site for this package, they are also searchable in the **search** box.

With a bit of extra processing, I created a file, [datasets.csv](https://raw.githubusercontent.com/friendly/heplots/master/extra/datasets.csv), used below.

## Methods {-}

The main methods used in the example datasets are listed below:

* **MANOVA**: Multivariate analysis of variance
* **MANCOVA**: Multivariate analysis of covariance
* **MMRA**: Multivariate multiple regression
* **cancor**: Canonical correlation (using the [candisc](https://github.com/friendly/candisc/) package)
* **candisc**: Canonical discriminant analysis (using [candisc](https://github.com/friendly/candisc/))
* **repeated**: Repeated measures designs, analyzed from the multivariate perspective
* **robust**: Robust estimation of multivariate linear models (MLMs)

In addition, a few examples illustrate special handling for linear hypotheses concerning factors:

* **ordered**: ordered factors
* **contrasts**: other contrasts

In the table below, the dataset names are linked to their documentation, including graphical output, on the `pkgdown` website.
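Because roxygen turns each `@concept` tag into a `\concept{}` entry in the installed help pages, the datasets tagged with any of the methods above can also be listed from within R. The sketch below assumes `heplots` is installed. It restricts `help.search()` to concept entries in this package and sets `agrep = FALSE`, so that a pattern such as `"MANOVA"` is matched as a regular expression rather than approximately (fuzzy matching might also return `MANCOVA` topics).

```r
# Find heplots help topics whose \concept{} entries match "MANOVA".
# Assumes heplots is installed. fields = "concept" searches only concept
# entries; agrep = FALSE disables approximate (fuzzy) matching.
hs <- help.search("MANOVA", package = "heplots",
                  fields = "concept", agrep = FALSE)

# hs is an "hsearch" object; its matches component has one row per hit
unique(hs$matches[, c("Topic", "Title")])
```

Changing the pattern to another tag, such as `"repeated"` or `"robust"`, lists the datasets for that method.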
## Dataset table {-}

```{r datasets}
library(here)
library(dplyr)
library(tinytable)
#dsets <- read.csv(here::here("extra", "datasets.csv"))  # doesn't work in a vignette
dsets <- read.csv("https://raw.githubusercontent.com/friendly/heplots/master/extra/datasets.csv")
dsets <- dsets |>
  dplyr::select(-X) |>        # drop X, the unnamed row-name column
  arrange(tolower(dataset))   # sort by dataset name, ignoring case

# link dataset to pkgdown doc
refurl <- "http://friendly.github.io/heplots/reference/"
dsets <- dsets |>
  mutate(dataset = glue::glue("[{dataset}]({refurl}{dataset}.html)"))

#knitr::kable(dsets)
tinytable::tt(dsets) |>
  format_tt(markdown = TRUE)
```

## Concept table {-}

This table can be inverted to list the datasets that illustrate each concept:

```{r concepts}
concepts <- dsets |>
  select(dataset, tags) |>
  tidyr::separate_longer_delim(tags, delim = " ") |>      # one row per dataset-tag pair
  arrange(tags, dataset) |>
  summarize(datasets = toString(dataset), .by = tags) |>  # collapse datasets within each tag
  rename(concept = tags)

#knitr::kable(concepts)
tinytable::tt(concepts) |>
  format_tt(markdown = TRUE)
```
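To see what the inversion in the `concepts` chunk does, the same pipeline can be run on a small toy data frame. The dataset names and tags here are hypothetical stand-ins for the `dataset` and `tags` columns of `datasets.csv`, where each dataset's tags are stored as a single space-separated string.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy input: made-up dataset names, each with a
# space-separated string of concept tags
toy <- data.frame(
  dataset = c("A", "B", "C"),
  tags    = c("MANOVA ordered", "MMRA", "MANOVA")
)

toy_concepts <- toy |>
  separate_longer_delim(tags, delim = " ") |>             # one row per dataset-tag pair
  arrange(tags, dataset) |>                               # order by tag, then dataset
  summarize(datasets = toString(dataset), .by = tags) |>  # collapse datasets within each tag
  rename(concept = tags)

toy_concepts
```

Here `MANOVA` collects both `A` and `C`, while `A` also appears under `ordered`: each dataset is listed once for every tag it carries.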