| Title: | Summarize Data for Publication |
|---|---|
| Description: | Tools for formatting and summarizing data for outcomes research. |
| Authors: | Eric Finnesgard [aut, cre] |
| Maintainer: | Eric Finnesgard <[email protected]> |
| License: | LGPL (>= 2) |
| Version: | 0.3.0 |
| Built: | 2026-06-01 11:04:15 UTC |
| Source: | https://github.com/efinite/utile.tools |
Calculates chunk indices of a data object for a given chunk size (number of items per chunk).
calc_chunks(x, size = 10, reverse = FALSE)

| Argument | Description |
|---|---|
| x | A data frame or vector. |
| size | An integer. The number of items (e.g. rows in a tibble) that make up a given chunk. Must be a positive integer. Capped at the number of items in the data. |
| reverse | A logical. Calculate chunks from back to front. |
An iterable list of row indices for each chunk of data.
    # Create chunk map for a data frame
    chunks <- calc_chunks(mtcars, size = 6)
    # Iterate through chunks of data
    for (chunk in chunks) print(paste0(rownames(mtcars[chunk,]), collapse = ', '))
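As a rough illustration of what a chunk map looks like, the sketch below builds chunk indices with base R only. `chunk_indices()` is a hypothetical helper written for this example; it is not the package's implementation and it ignores the `reverse` option.

```r
# Hypothetical helper (not part of utile.tools): split 1..n into
# consecutive index vectors of at most `size` items each.
chunk_indices <- function(n, size = 10) {
  stopifnot(n >= 1, size >= 1)
  idx <- seq_len(n)
  unname(split(idx, ceiling(idx / size)))
}

# mtcars has 32 rows, so a chunk size of 6 gives five full chunks and one short one
chunks <- chunk_indices(nrow(mtcars), size = 6)
lengths(chunks)
```

Each list element can then be used to subset the data, e.g. `mtcars[chunks[[1]], ]`.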
Calculates the duration of time between two provided date objects.
Supports vectorized data (e.g. within dplyr::mutate()).
calc_duration(x, y, units = NULL)

| Argument | Description |
|---|---|
| x | A date or datetime. The start date(s)/timestamp(s). |
| y | A date or datetime. The end date(s)/timestamp(s). |
| units | A character. Units of the returned duration (e.g. 'seconds', 'days', 'years'). |

If 'units' is specified, returns a numeric. If 'units' is unspecified, returns an object of class 'Duration'.
Supports multiple calculations against a single time point (e.g. multiple start dates with a single end date). Note that start and end must otherwise be of the same length.
When the start and end dates are of different types (e.g. x = date, y = datetime), a lossy cast is performed which strips the datetime data of its time components. This avoids the assumption that more time has passed, which would otherwise come with casting the date data to datetime.
    library(lubridate)
    library(purrr)
    # Dates -> duration in years
    calc_duration(
      x = mdy(map_chr(sample(1:9, 5), ~ paste0('01/01/199', .x))),
      y = mdy(map_chr(sample(1:9, 5), ~ paste0('01/01/200', .x))),
      units = 'years'
    )
    # datetimes -> durations
    calc_duration(
      x = mdy_hm(map_chr(sample(1:9, 5), ~ paste0('01/01/199', .x, ' 1', .x, ':00'))),
      y = mdy_hm(map_chr(sample(1:9, 5), ~ paste0('01/01/200', .x, ' 0', .x, ':00')))
    )
    # Mixed date classes -> durations
    calc_duration(
      x = mdy(map_chr(sample(1:9, 5), ~ paste0('01/01/199', .x))),
      y = mdy_hm(map_chr(sample(1:9, 5), ~ paste0('01/01/200', .x, ' 0', .x, ':00')))
    )
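The effect of the lossy cast on mixed date/datetime input can be seen with base R alone. This is only a sketch of the idea, with made-up values; it does not call calc_duration() and may not match how the package performs the cast internally.

```r
start <- as.Date("1999-01-01")
end   <- as.POSIXct("2000-01-01 18:00:00", tz = "UTC")

# Lossy cast: drop the time of day from the datetime, then compare dates
lossy_days <- as.numeric(
  difftime(as.Date(end, tz = "UTC"), start, units = "days")
)

# Alternative: promote the date to a datetime at midnight, which silently
# assumes the start occurred at 00:00 and so counts an extra 18 hours
promoted_days <- as.numeric(
  difftime(end, as.POSIXct("1999-01-01 00:00:00", tz = "UTC"), units = "days")
)

c(lossy = lossy_days, promoted = promoted_days)
```

The first approach yields whole days only, while the second adds a fraction of a day that the date-only value never actually recorded.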
Creates a factory function which returns a different chunk of a given data object with each function call.
chunk_data_(x, size = 10, reverse = FALSE)

| Argument | Description |
|---|---|
| x | A data frame or vector. |
| size | An integer. The number of items (e.g. rows in a tibble) that make up a given chunk. Must be a positive integer. |
| reverse | A logical. Calculate chunks from back to front. |

A factory function which returns a chunk of data from the provided object with each call. Once all data has been returned, the function returns NULL on every subsequent call.
    # Create chunk factory function
    chunked_data <- chunk_data_(mtcars, size = 6)
    # Chunk #1 (rows 1-6)
    paste0(rownames(chunked_data()), collapse = ', ')
    # Chunk #2 (rows 7-12)
    paste0(rownames(chunked_data()), collapse = ', ')
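The "different chunk on each call" behavior can be sketched with a base R closure that keeps its position between calls. `make_chunker()` is a hypothetical name used only for illustration; it omits the `reverse` option and is not the package's code.

```r
# Hypothetical closure-based chunker (illustration only)
make_chunker <- function(x, size = 10) {
  n <- if (is.data.frame(x)) nrow(x) else length(x)
  pos <- 0L  # number of items already returned
  function() {
    if (pos >= n) return(NULL)  # exhausted: NULL on every further call
    idx <- seq(pos + 1L, min(pos + size, n))
    pos <<- pos + length(idx)   # remember progress in the enclosing environment
    if (is.data.frame(x)) x[idx, , drop = FALSE] else x[idx]
  }
}

next_chunk <- make_chunker(letters, size = 10)
first  <- next_chunk()  # items 1-10
second <- next_chunk()  # items 11-20
third  <- next_chunk()  # items 21-26
fourth <- next_chunk()  # nothing left
```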
Calculates the cumulative sum of failures for a series of procedures which can be used to create CUSUM charts.
cusum_failure(xi, p0, p1, by = NULL, alpha = 0.05, beta = 0.05)

| Argument | Description |
|---|---|
| xi | An integer. The dichotomous outcome variable (1 = Failure, 0 = Success) for the i-th procedure. |
| p0 | A double. The acceptable event rate. |
| p1 | A double. The unacceptable event rate. |
| by | A factor. Optional variable to stratify procedures by. |
| alpha | A double. The Type I Error rate. Probability of rejecting the null hypothesis when 'p0' is the true rate. |
| beta | A double. The Type II Error rate. Probability of failing to reject the null hypothesis when it is false. |
An object of class data.frame.
Rogers, C. A., Reeves, B. C., Caputo, M., Ganesh, J. S., Bonser, R. S., & Angelini, G. D. (2004). Control chart methods for monitoring cardiac surgical performance and their interpretation. The Journal of Thoracic and Cardiovascular Surgery, 128(6), 811-819.
    library(purrr)
    library(ggplot2)
    # Data
    df <- data.frame(
      xi = simplify(
        map(
          c(.1,.08,.05,.1,.13,.14,.14,.09,.25),
          ~ rbinom(50,1,.x))),
      p0 = simplify(
        map(
          c(.1,.1,.1,.1,.1,.1,.1,.15,.2),
          ~ rnorm(50,.x,.03))),
      by = rep(
        factor(paste('Subject', c('A','B','C'))),
        times = c(150,150,150))
    )
    # Overall event rate
    p0 <- sum(df$xi) / nrow(df)
    # Create CUSUM plot
    cusum_failure(
      xi = df$xi,
      p0 = p0,
      p1 = p0 * 1.5,
      by = df$by
    ) |>
      ggplot(aes(y = cusum, x = i)) +
      geom_step() +
      geom_line(mapping = aes(y = l0), linetype = 2) +
      geom_line(mapping = aes(y = l1), linetype = 2) +
      ylab("Cumulative Failures") +
      xlab("Case Number") +
      facet_wrap(~ by) +
      theme_bw()
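For intuition, the stratified running count of failures can be reproduced with base R. This sketch uses made-up data and borrows the column names `i` and `cusum` from the plotting example above; it leaves out the boundary lines (`l0`, `l1`) and is not the package's implementation.

```r
xi <- c(1, 0, 1, 1, 0, 1)                        # 1 = failure, 0 = success
by <- factor(c("A", "A", "A", "B", "B", "B"))  # stratification variable

out <- data.frame(
  by    = by,
  i     = ave(seq_along(xi), by, FUN = seq_along),  # case number within stratum
  cusum = ave(xi, by, FUN = cumsum)                 # running failure count
)
out
```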
Calculates the cumulative log likelihood ratio of failure for a series of procedures which can be used to create CUSUM charts.
cusum_loglike(xi, p0, p1, by = NULL, alpha = 0.05, beta = 0.05)

| Argument | Description |
|---|---|
| xi | An integer. The dichotomous outcome variable (1 = Failure, 0 = Success) for the i-th procedure. |
| p0 | A double. The acceptable event rate. |
| p1 | A double. The unacceptable event rate. |
| by | A factor. Optional variable to stratify procedures by. |
| alpha | A double. The Type I Error rate. Probability of rejecting the null hypothesis when 'p0' is the true rate. |
| beta | A double. The Type II Error rate. Probability of failing to reject the null hypothesis when it is false. |
An object of class data.frame.
Rogers, C. A., Reeves, B. C., Caputo, M., Ganesh, J. S., Bonser, R. S., & Angelini, G. D. (2004). Control chart methods for monitoring cardiac surgical performance and their interpretation. The Journal of Thoracic and Cardiovascular Surgery, 128(6), 811-819.
    library(purrr)
    library(ggplot2)
    # Data
    df <- data.frame(
      xi = simplify(
        map(
          c(.1,.08,.05,.1,.13,.14,.14,.09,.25),
          ~ rbinom(50,1,.x))),
      p0 = simplify(
        map(
          c(.1,.1,.1,.1,.1,.1,.1,.15,.2),
          ~ rnorm(50,.x,.03))),
      by = rep(
        factor(paste('Subject', c('A','B','C'))),
        times = c(150,150,150))
    )
    # Overall event rate
    p0 <- sum(df$xi) / nrow(df)
    # Create CUSUM plot
    cusum_loglike(
      xi = df$xi,
      p0 = p0,
      p1 = p0 * 1.5,
      by = df$by
    ) |>
      ggplot(aes(y = cusum, x = i)) +
      geom_step() +
      geom_hline(aes(yintercept = h0), linetype = 2) +
      geom_hline(aes(yintercept = h1), linetype = 2) +
      ylab("Cumulative Log-likelihood Ratio") +
      xlab("Case Number") +
      facet_wrap(~ by) +
      theme_bw()
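A minimal base R sketch of a cumulative log-likelihood ratio follows, assuming the standard Wald sequential test weights and decision limits. `llr_cusum()` is a hypothetical helper; the package may differ in details such as stratification or how the limits are derived.

```r
# Hypothetical helper: cumulative log-likelihood ratio for H0: p = p0 vs H1: p = p1
llr_cusum <- function(xi, p0, p1, alpha = 0.05, beta = 0.05) {
  # Each failure adds log(p1 / p0); each success adds log((1 - p1) / (1 - p0))
  w <- ifelse(xi == 1, log(p1 / p0), log((1 - p1) / (1 - p0)))
  list(
    cusum = cumsum(w),
    h0 = log(beta / (1 - alpha)),  # lower limit (Wald): evidence favors p0
    h1 = log((1 - beta) / alpha)   # upper limit (Wald): evidence favors p1
  )
}

res <- llr_cusum(c(1, 0, 0, 1), p0 = 0.1, p1 = 0.2)
res
```

Failures push the statistic up and successes pull it down, so a run of failures moves the curve toward the upper limit.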
Calculates the cumulative observed-minus-expected failure for a series of procedures which can be used to create CUSUM charts.
cusum_ome(xi, p0, by = NULL)

| Argument | Description |
|---|---|
| xi | An integer. The dichotomous outcome variable (1 = Failure, 0 = Success) for the i-th procedure. |
| p0 | A double. The acceptable event rate. |
| by | A factor. Optional variable to stratify procedures by. |
An object of class data.frame.
Rogers, C. A., Reeves, B. C., Caputo, M., Ganesh, J. S., Bonser, R. S., & Angelini, G. D. (2004). Control chart methods for monitoring cardiac surgical performance and their interpretation. The Journal of Thoracic and Cardiovascular Surgery, 128(6), 811-819.
    library(purrr)
    library(ggplot2)
    # Data
    df <- data.frame(
      xi = simplify(
        map(
          c(.1,.08,.05,.1,.13,.14,.14,.09,.25),
          ~ rbinom(50,1,.x))),
      p0 = simplify(
        map(
          c(.1,.1,.1,.1,.1,.1,.1,.15,.2),
          ~ rnorm(50,.x,.03))),
      by = rep(
        factor(paste('Subject', c('A','B','C'))),
        times = c(150,150,150))
    )
    # Create CUSUM plot
    cusum_ome(
      xi = df$xi,
      p0 = df$p0,
      by = df$by
    ) |>
      ggplot(aes(x = i, y = cusum)) +
      geom_hline(yintercept = 0, linetype = 6, linewidth = 0.5) +
      geom_step() +
      ylab("Cumulative Observed Minus Expected Failures") +
      xlab("Case Number") +
      facet_wrap(~ by) +
      theme_bw()
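The observed-minus-expected statistic itself is a running sum of `xi - p0`. The base R lines below show this for a single unstratified series with made-up values; they are a sketch of the calculation, not the package's code.

```r
xi <- c(1, 0, 0, 1)   # observed outcomes (1 = failure)
p0 <- 0.25            # expected failure probability per procedure

# Each failure adds (1 - p0); each success subtracts p0
ome <- cumsum(xi - p0)
ome
```

A curve drifting above zero indicates more failures than expected; one drifting below zero indicates fewer.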
Calculates the risk-adjusted sequential probability ratio test for a series of procedures which can be used to create CUSUM charts.
cusum_sprt(xi, p0, OR, by = NULL, alpha = 0.05, beta = 0.05)

| Argument | Description |
|---|---|
| xi | An integer. The dichotomous outcome variable (1 = Failure, 0 = Success) for the i-th procedure. |
| p0 | A double. The risk-adjusted acceptable event rate for each individual procedure. |
| OR | A double. An odds ratio reflecting the increase in relative risk of failure. |
| by | A factor. Optional variable to stratify procedures by. |
| alpha | A double. The Type I Error rate. Probability of rejecting the null hypothesis when 'p0' is the true rate. |
| beta | A double. The Type II Error rate. Probability of failing to reject the null hypothesis when it is false. |
An object of class data.frame.
Rogers, C. A., Reeves, B. C., Caputo, M., Ganesh, J. S., Bonser, R. S., & Angelini, G. D. (2004). Control chart methods for monitoring cardiac surgical performance and their interpretation. The Journal of Thoracic and Cardiovascular Surgery, 128(6), 811-819.
    library(purrr)
    library(ggplot2)
    # Data
    df <- data.frame(
      xi = simplify(
        map(
          c(.1,.08,.05,.1,.13,.14,.14,.09,.25),
          ~ rbinom(50,1,.x))),
      p0 = simplify(
        map(
          c(.1,.1,.1,.1,.1,.1,.1,.15,.2),
          ~ rnorm(50,.x,.03))),
      by = rep(
        factor(paste('Subject', c('A','B','C'))),
        times = c(150,150,150))
    )
    # Create CUSUM plot
    cusum_sprt(
      xi = df$xi,
      p0 = df$p0,
      OR = 1.5,
      by = df$by
    ) |>
      ggplot(aes(y = cusum, x = i)) +
      geom_step() +
      geom_hline(aes(yintercept = h0), linetype = 2) +
      geom_hline(aes(yintercept = h1), linetype = 2) +
      ylab("Cumulative Log-likelihood Ratio") +
      xlab("Case Number") +
      facet_wrap(~ by) +
      theme_bw()
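With risk adjustment, each procedure carries its own `p0`, so the log-likelihood weights vary case by case. The sketch below assumes the commonly used odds-ratio form of the weights; `sprt_weights()` is a hypothetical helper and may not match the package's internals.

```r
# Hypothetical helper: per-case log-likelihood ratio weights for an odds ratio OR
sprt_weights <- function(xi, p0, OR) {
  denom <- 1 - p0 + OR * p0
  # Failure: log(OR / denom); success: log(1 / denom)
  ifelse(xi == 1, log(OR / denom), log(1 / denom))
}

xi <- c(1, 0, 1)
p0 <- c(0.5, 0.5, 0.1)  # individual expected failure probabilities
w <- sprt_weights(xi, p0, OR = 2)
cumsum(w)               # the running statistic that would be plotted
```

A failure in a low-risk case (small `p0`) adds more to the statistic than a failure in a high-risk case.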
An augmented version of base::paste() with options
to manage 'NA' values.
paste(..., sep = " ", collapse = NULL, na.rm = FALSE)
paste0(..., collapse = NULL, na.rm = FALSE)

| Argument | Description |
|---|---|
| ... | R objects to be converted to character vectors. |
| sep | A character. A string to separate the terms. |
| collapse | A character. A string to separate the results. |
| na.rm | A logical. Whether to remove NA values from the supplied objects. |
Character vector of concatenated values.
    # Base paste() NA handling behavior
    paste(
      'The', c('red', NA_character_, 'orange'), 'fox jumped', NA_character_, 'over the fence.',
      collapse = ' '
    )
    # Removal of NA values
    paste(
      'The', c('red', NA_character_, 'orange'), 'fox jumped', NA_character_, 'over the fence.',
      collapse = ' ', na.rm = TRUE
    )
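The problem being addressed is that base::paste() converts NA to the literal string "NA". A simplified single-vector illustration in base R follows; how the package's `na.rm` treats multiple arguments is not shown here and is not assumed.

```r
colours <- c("red", NA_character_, "orange")

# Base behavior: the missing value is pasted as the text "NA"
base_result <- base::paste(colours, collapse = ", ")

# Dropping missing values before collapsing avoids the stray "NA"
clean_result <- base::paste(colours[!is.na(colours)], collapse = ", ")

base_result
clean_result
```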
Creates a formatted event-free survival estimate from a survfit object at the specified time point(s).
paste_efs(x, times, percent.sign = TRUE, digits = 1)

| Argument | Description |
|---|---|
| x | A survfit object. |
| times | A numeric. Indicates time-points of interest. Units are whatever was used to create the survival fit. |
| percent.sign | A logical. Indicates whether a percent sign should be printed with frequencies. |
| digits | An integer. Number of digits to round to. |
A named character vector of event-free survival(s).
    library(survival)
    fit <- survfit(Surv(time, status) ~ 1, data = diabetic)
    paste_efs(fit, c(1, 3, 5))
Creates a formatted frequency from count(able) data. Automatically tallies non-numeric data types (nrow or length) and supports vectorized data methods.
paste_freq(x, y, na.rm = TRUE, percent.sign = TRUE, digits = 1)

| Argument | Description |
|---|---|
| x | A data.frame, numeric, or non-numeric. The numerator. |
| y | A data.frame, numeric, or non-numeric. The denominator. Either a single denominator shared by all numerators or one denominator per numerator. |
| na.rm | A logical. Whether to ignore NAs when tallying non-numeric data. |
| percent.sign | A logical. Indicates whether a percent sign should be printed with frequencies. |
| digits | An integer. Number of digits to round to. |
A character vector of count(s) with frequencies.
    # Numeric
    paste_freq(20, 100)
    # data.frame
    df <- data.frame(x = c(1:100), y = TRUE)
    paste_freq(df[1:20,], df)
    # Mixed data types
    paste_freq(20, df)
    # Single denominator for multiple numerators
    paste_freq(c(10,20,30), 100)
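A count-with-percentage string of this kind can be assembled in base R as sketched below. The "count (percent%)" layout and the helper name `fmt_freq()` are assumptions made for illustration; the exact text produced by paste_freq() may differ.

```r
# Hypothetical formatter: "count (percent%)" with a fixed number of decimals
fmt_freq <- function(x, y, digits = 1) {
  pct <- formatC(x / y * 100, format = "f", digits = digits)
  paste0(x, " (", pct, "%)")
}

# A single denominator is recycled across several numerators
freqs <- fmt_freq(c(10, 20, 30), 100)
freqs
```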
Creates a formatted mean with standard deviation from numeric data.
paste_mean(x, less.than.one = FALSE, digits = 1)

| Argument | Description |
|---|---|
| x | A numeric. Data to summarize. |
| less.than.one | A logical. Whether a mean that rounds to 0 should be printed as <1. |
| digits | An integer. Number of digits to round to. |
A character vector of the mean(s) with standard deviation(s).
paste_mean(mtcars$mpg)
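For reference, a mean with standard deviation can be formatted by hand in base R as follows. The "mean ± SD" layout is an assumption for this sketch and is not necessarily the string paste_mean() returns.

```r
x <- mtcars$mpg

# Round both statistics to one decimal place and join them
mean_sd <- paste0(
  formatC(mean(x), format = "f", digits = 1),
  " \u00b1 ",
  formatC(sd(x), format = "f", digits = 1)
)
mean_sd
```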
Creates a formatted median with inter-quartile range from numeric data.
paste_median(x, less.than.one = FALSE, digits = 1)

| Argument | Description |
|---|---|
| x | A numeric. Data to summarize. |
| less.than.one | A logical. Whether a median that rounds to 0 should be printed as <1. |
| digits | An integer. Number of digits to round to. |
A character vector of the median(s) with interquartile range(s).
paste_median(mtcars$mpg)
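The same summary can be approximated by hand with stats::quantile(). The bracketed "median [Q1-Q3]" layout below is an assumption for illustration, and quantile()'s default method may not be the one the package uses.

```r
x <- mtcars$mpg

# Quartiles using quantile()'s default (type 7) method
q <- formatC(quantile(x, c(0.25, 0.5, 0.75)), format = "f", digits = 1)

median_iqr <- paste0(q[2], " [", q[1], "-", q[3], "]")
median_iqr
```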
Creates a human-readable p-value using sensible defaults for 'format.pval()'.
paste_pval(x, digits = 1, p.digits = 4)

| Argument | Description |
|---|---|
| x | A numeric. P-value to format. |
| digits | A numeric. Number of significant digits to round to. |
| p.digits | A numeric. Minimum number of digits to right of the decimal point. |
paste_pval(0.061126e-10)
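Since the function builds on format.pval(), the underlying base R behavior is worth seeing directly. In this sketch the `eps` threshold of 1e-4 is an assumption chosen to mirror `p.digits = 4`; how paste_pval() actually maps its arguments onto format.pval() is not confirmed here.

```r
p <- 0.061126e-10

# Values below `eps` are reported as "less than eps" rather than printed in full
tiny <- format.pval(p, digits = 1, eps = 1e-4)

# Values above `eps` are rounded to `digits` significant digits
moderate <- format.pval(0.0432, digits = 1, eps = 1e-4)

c(tiny, moderate)
```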
Tests the null hypothesis that there is no difference between grouped data.
    test_hypothesis(
      x, y, test, digits, p.digits,
      simulate.p.value, B, workspace, ...
    )
    ## S3 method for class 'numeric'
    test_hypothesis(
      x, y,
      test = c("anova", "kruskal", "wilcoxon"),
      digits = 1, p.digits, ...
    )
    ## S3 method for class 'factor'
    test_hypothesis(
      x, y,
      test = c("chisq", "fisher"),
      digits = 1, p.digits,
      simulate.p.value = FALSE, B = 2000, workspace = 2e+07, ...
    )
    ## S3 method for class 'logical'
    test_hypothesis(
      x, y,
      test = c("chisq", "fisher"),
      digits = 1, p.digits,
      simulate.p.value = FALSE, B = 2000, workspace = 2e+07, ...
    )

| Argument | Description |
|---|---|
| x | A numeric, factor, or logical. Observations. |
| y | A factor or logical. Categorical "by" grouping variable. |
| test | A character. Name of the statistical test to use. See note. |
| digits | An integer. Number of digits to round to. |
| p.digits | An integer. The number of p-value digits to the right of the decimal point. Note that p-values are still rounded using 'digits'. |
| simulate.p.value | A logical. Whether p-values in nominal variable testing should be computed with Monte Carlo simulation. |
| B | An integer. Number of replicates to use in Monte Carlo simulation for nominal testing. |
| workspace | An integer. Size of the workspace used for the Fisher's Exact Test network algorithm. |
| ... | Additional arguments passed to the appropriate S3 method. |
A list containing the statistical test performed, test statistic, and p-value.
The statistical test used depends on the type of 'x' data. Supported tests for numeric data include ANOVA ('anova'), Kruskal-Wallis ('kruskal'), and Wilcoxon Rank Sum ('wilcoxon') tests. For categorical data, supported tests include Pearson's Chi-squared ('chisq') and Fisher's Exact Test ('fisher').
    strata <- as.factor(mtcars$cyl)
    # Numeric data
    test_hypothesis(mtcars$mpg, strata)
    # Logical data
    test_hypothesis(as.logical(mtcars$vs), strata)
    # Factor data
    test_hypothesis(as.factor(mtcars$carb), strata)
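The tests named in the note correspond to standard functions in the stats package. The sketch below calls two of them directly on the same data to show the kind of result being summarized; it does not reproduce test_hypothesis()'s defaults, rounding, or return structure.

```r
strata <- as.factor(mtcars$cyl)

# Numeric observations by group: Kruskal-Wallis rank sum test
kw <- kruskal.test(mtcars$mpg, strata)

# Logical observations by group: Fisher's Exact Test on the contingency table
fe <- fisher.test(table(as.logical(mtcars$vs), strata))

c(kruskal = kw$p.value, fisher = fe$p.value)
```

Both calls return objects of class "htest", from which the test name (`$method`) and p-value (`$p.value`) can be extracted.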