Title: | Quickly Create Elegant Regression Results Tables and Plots when Modelling |
---|---|
Description: | Generate regression results tables and plots in final format for publication. Explore models and export directly to PDF and 'Word' using 'RMarkdown'. |
Authors: | Ewen Harrison [aut, cre], Tom Drake [aut], Riinu Pius [aut] |
Maintainer: | Ewen Harrison <[email protected]> |
License: | MIT + file LICENCE |
Version: | 1.0.8 |
Built: | 2025-02-17 05:18:50 UTC |
Source: | https://github.com/ewenharrison/finalfit |
Quickly create elegant final results tables and plots when modelling.
finalfit model wrappers
glmuni, glmmulti, glmmulti_boot, glmmixed, lmuni, lmmulti, lmmixed, coxphuni, coxphmulti, crruni, crrmulti, svyglmuni, svyglmmulti.
finalfit model extractor
Generic: fit2df
Methods (not called directly): fit2df.glm, fit2df.glmlist, fit2df.glmboot, fit2df.lm, fit2df.lmlist, fit2df.glmerMod, fit2df.lmerMod, fit2df.coxph, fit2df.coxphlist, fit2df.crr, fit2df.crrlist, fit2df.stanfit.
finalfit all-in-one function
Generic: finalfit. finalfit_permute.
Methods (not called directly): finalfit.glm, finalfit.lm, finalfit.coxph.
finalfit plotting functions
coefficient_plot, or_plot, hr_plot, surv_plot, ff_plot.
finalfit helper functions
ff_glimpse, ff_label, ff_merge, ff_interaction.
finalfit prediction functions
boot_predict, finalfit_newdata.
Methods (not called directly): boot_compare.
finalfit missing data functions
missing_glimpse, missing_pattern, missing_compare, missing_plot, missing_pairs.
Not usually called directly. Included in boot_predict. Usually used in combination with a function that takes the output from summary_factorlist(..., fit_id=TRUE) and merges with any number of model dataframes, usually produced with a model wrapper followed by the fit2df() function (see examples).
boot_compare( bs.out, confint_level = 0.95, confint_sep = " to ", comparison = "difference", condense = TRUE, compare_name = NULL, digits = c(2, 3), ref_symbol = 1 )
bs.out |
Output from |
confint_level |
The confidence level to use for the confidence interval. Must be strictly greater than 0 and less than 1. Defaults to 0.95, which corresponds to a 95 percent confidence interval. |
confint_sep |
String separating lower and upper confidence interval |
comparison |
Either "difference" or "ratio". |
condense |
Logical. FALSE gives numeric values, usually for plotting. TRUE gives table for final output. |
compare_name |
Name to be given to comparison metric. |
digits |
Rounding for estimate values and p-values, default c(2,3). |
ref_symbol |
Reference level symbol |
A dataframe of first differences or ratios for bootstrapped distributions of a metric of interest.
finalfit predict functions
# See boot_predict.
Generate model predictions against a specified set of explanatory levels with bootstrapped confidence intervals. Add a comparison by difference or ratio of the first row of newdata with all subsequent rows.
boot_predict( fit, newdata, type = "response", R = 100, estimate_name = NULL, confint_level = 0.95, conf.method = "perc", confint_sep = " to ", condense = TRUE, boot_compare = TRUE, compare_name = NULL, comparison = "difference", ref_symbol = "-", digits = c(2, 3) )
fit |
|
newdata |
Dataframe usually generated with
|
type |
the type of prediction required, see
|
R |
Number of simulations. Note default R=100 is very low. |
estimate_name |
Name to be given to prediction variable y-hat. |
confint_level |
The confidence level to use for the confidence interval. Must be strictly greater than 0 and less than 1. Defaults to 0.95, which corresponds to a 95 percent confidence interval. |
conf.method |
Passed to the type argument of boot::boot.ci(). Defaults to "perc". The allowed types are "perc", "basic", "bca", and "norm". Does not support "stud" or "all" |
confint_sep |
String separating lower and upper confidence interval |
condense |
Logical. FALSE gives numeric values, usually for plotting. TRUE gives table for final output. |
boot_compare |
Include a comparison with the first row of |
compare_name |
Name to be given to comparison metric. |
comparison |
Either "difference" or "ratio". |
ref_symbol |
Reference level symbol |
digits |
Rounding for estimate values and p-values, default c(2,3). |
To use this, first generate newdata for specified levels of explanatory variables using finalfit_newdata. Pass model objects from lm, glm, lmmulti, and glmmulti. The comparison metrics are calculated on the individual bootstrap samples, and the resulting distribution is returned as a mean with confidence intervals. A p-value is generated from the proportion of values on the other side of the null from the mean, e.g. for a ratio greater than 1.0, p is the proportion of bootstrapped values under 1.0, multiplied by two so that it is two-sided.
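The p-value rule described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the bootstrap distribution is simulated (the numbers are made up), and the names `boot_ratio`, `null_value` and the cap at 1 are hypothetical choices made for this example.

```r
# Hypothetical illustration: a simulated bootstrap distribution of a ratio.
# These values are made up; they are not produced by boot_predict().
set.seed(1)
boot_ratio <- rlnorm(1000, meanlog = log(1.3), sdlog = 0.15)

# Point estimate and percentile confidence interval
estimate <- mean(boot_ratio)
conf_int <- quantile(boot_ratio, probs = c(0.025, 0.975))

# Two-sided p-value: proportion of values on the other side of the null
# (1 for a ratio, 0 for a difference) from the mean, multiplied by two
null_value <- 1
p_one_sided <- if (estimate > null_value) {
  mean(boot_ratio < null_value)
} else {
  mean(boot_ratio > null_value)
}
p_value <- min(2 * p_one_sided, 1)  # cap at 1 (an assumption of this sketch)

estimate
conf_int
p_value
```

With a small number of simulations the proportion is coarse, which is one reason the default R = 100 is described as very low.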
A dataframe of predicted values and confidence intervals, with the option of including a comparison of difference between first row and all subsequent rows of newdata.
library(finalfit)
library(dplyr)

# Predict probability of death across combinations of factor levels
explanatory = c("age.factor", "extent.factor", "perfor.factor")
dependent = 'mort_5yr'

# Generate combination of factor levels
colon_s %>%
  finalfit_newdata(explanatory = explanatory, newdata = list(
    c("<40 years", "Submucosa", "No"),
    c("<40 years", "Submucosa", "Yes"),
    c("<40 years", "Adjacent structures", "No"),
    c("<40 years", "Adjacent structures", "Yes")
  )) -> newdata

# Run simulation
colon_s %>%
  glmmulti(dependent, explanatory) %>%
  boot_predict(newdata,
    estimate_name = "Predicted probability of death",
    compare_name = "Absolute risk difference", R=100, digits = c(2,3))

# Plotting
explanatory = c("nodes", "extent.factor", "perfor.factor")
colon_s %>%
  finalfit_newdata(explanatory = explanatory, rowwise = FALSE, newdata = list(
    rep(seq(0, 30), 4),
    c(rep("Muscle", 62), rep("Adjacent structures", 62)),
    c(rep("No", 31), rep("Yes", 31), rep("No", 31), rep("Yes", 31))
  )) -> newdata

colon_s %>%
  glmmulti(dependent, explanatory) %>%
  boot_predict(newdata, boot_compare = FALSE, R=100, condense=FALSE) -> plot

library(ggplot2)
theme_set(theme_bw())
plot %>%
  ggplot(aes(x = nodes, y = estimate, ymin = estimate_conf.low,
      ymax = estimate_conf.high, fill=extent.factor))+
  geom_line(aes(colour = extent.factor))+
  geom_ribbon(alpha=0.1)+
  facet_grid(.~perfor.factor)+
  xlab("Number of positive lymph nodes")+
  ylab("Probability of death")+
  labs(fill = "Extent of tumour", colour = "Extent of tumour")+
  ggtitle("Probability of death by lymph node count")
This was written a few days after the retraction of a paper in JAMA due to an error in recoding the treatment variable (https://jamanetwork.com/journals/jama/fullarticle/2752474). This takes a data frame or tibble, fuzzy matches variable names, and produces crosstables of all matched variables. A visual inspection should reveal any miscoding.
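The idea of fuzzy matching variable names and cross-tabulating the matches can be sketched in base R. This is a minimal illustration, not how check_recode is implemented: the toy data frame, the use of `adist()` and the distance threshold of 1 are all assumptions made for this example.

```r
# Hypothetical toy data: sex2 is a recode of sex with a deliberate error
df <- data.frame(
  sex  = c("Male", "Female", "Male", "Female", "Male"),
  sex2 = c("M", "F", "M", "M", "M"),   # fourth row is miscoded
  age  = c(34, 51, 46, 29, 62),
  stringsAsFactors = FALSE
)

# Fuzzy-match variable names: keep pairs whose names are within
# an edit distance of 1 (threshold chosen for this sketch only)
vars <- names(df)
d <- adist(vars)
pairs <- which(d <= 1 & upper.tri(d), arr.ind = TRUE)

# Cross-tabulate each matched pair; off-pattern cells suggest miscoding
crosstabs <- list()
for (i in seq_len(nrow(pairs))) {
  a <- vars[pairs[i, "row"]]
  b <- vars[pairs[i, "col"]]
  crosstabs[[paste(a, b, sep = " x ")]] <- table(df[[a]], df[[b]], dnn = c(a, b))
}
crosstabs
```

In the resulting table, a "Female" row with a count under "M" is the kind of cell a visual inspection should catch.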
check_recode( .data, dependent = NULL, explanatory = NULL, include_numerics = TRUE, ... )
.data |
Data frame or tibble. |
dependent |
Optional character vector: name(s) of dependent variable(s). |
explanatory |
Optional character vector: name(s) of explanatory variable(s). |
include_numerics |
Logical. Include numeric variables in function. |
... |
Pass other arguments to |
List of length two. The first is an index of variable combinations. The second is a nested list of crosstables as tibbles.
library(finalfit)
library(dplyr)
data(colon_s)
colon_s_small = colon_s %>%
  select(-id, -rx, -rx.factor) %>%
  mutate(
    age.factor2 = forcats::fct_collapse(age.factor,
      "<60 years" = c("<40 years", "40-59 years")),
    sex.factor2 = forcats::fct_recode(sex.factor,
      # Intentional miscode
      "F" = "Male",
      "M" = "Female")
  )

# Check
colon_s_small %>%
  check_recode(include_numerics = FALSE)

out = colon_s_small %>%
  select(-extent, -extent.factor,-time, -time.years) %>%
  check_recode()
out

# Select a tibble and expand
out$counts[[9]]
# Note this variable (node4) appears miscoded in original dataset survival::colon.

# Choose to only include variables that you actually use.
# This uses standard Finalfit grammar.
dependent = "mort_5yr"
explanatory = c("age.factor2", "sex.factor2")
colon_s_small %>%
  check_recode(dependent, explanatory)
Produce a coefficient table and plot from an lm() model.
coefficient_plot( .data, dependent, explanatory, random_effect = NULL, factorlist = NULL, lmfit = NULL, confint_type = "default", confint_level = 0.95, remove_ref = FALSE, breaks = NULL, column_space = c(-0.5, -0.1, 0.5), dependent_label = NULL, prefix = "", suffix = NULL, table_text_size = 4, title_text_size = 13, plot_opts = NULL, table_opts = NULL, ... )
.data |
Dataframe. |
dependent |
Character vector of length 1: name of dependent variable (must be numeric/continuous). |
explanatory |
Character vector of any length: name(s) of explanatory variables. |
random_effect |
Character vector of length 1, name of random effect variable. |
factorlist |
Option to provide output directly from
|
lmfit |
Option to provide output directly from |
confint_type |
For |
confint_level |
The confidence level required. |
remove_ref |
Logical. Remove reference level for factors. |
breaks |
Manually specify x-axis breaks in format |
column_space |
Adjust table column spacing. |
dependent_label |
Main label for plot. |
prefix |
Plots are titled by default with the dependent variable. This adds text before that label. |
suffix |
Plots are titled with the dependent variable. This adds text after that label. |
table_text_size |
Alter font size of table text. |
title_text_size |
Alter font size of title text. |
plot_opts |
A list of arguments to be appended to the ggplot call by "+". |
table_opts |
A list of arguments to be appended to the ggplot table call by "+". |
... |
Other parameters. |
Returns a table and plot produced in ggplot2.
Other finalfit plot functions: ff_plot(), hr_plot(), or_plot(), surv_plot()
library(finalfit)
library(ggplot2)

# Coefficient plot
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = "nodes"
colon_s %>%
  coefficient_plot(dependent, explanatory)

colon_s %>%
  coefficient_plot(dependent, explanatory, table_text_size=4, title_text_size=14,
    plot_opts=list(xlab("Beta, 95% CI"), theme(axis.title = element_text(size=12))))
This is a modified version of survival::colon. These are data from one of the first successful trials of adjuvant chemotherapy for colon cancer. Levamisole is a low-toxicity compound previously used to treat worm infestations in animals; 5-FU is a moderately toxic (as these things go) chemotherapy agent. There are two records per person, one for recurrence and one for death.
data(colon_s)
A data frame with 929 rows and 33 variables
finalfit model wrapper
Using finalfit conventions, produces multivariable Cox Proportional Hazard regression models for a set of explanatory variables against a survival object.
coxphmulti(.data, dependent, explanatory, ...)
.data |
Data frame. |
dependent |
Character vector of length 1: name of survival object in form |
explanatory |
Character vector of any length: name(s) of explanatory variables. |
... |
Other arguments to pass to |
Uses coxph with finalfit modelling conventions. Output can be passed to fit2df.
A multivariable coxph fitted model output. Output is of class coxph.
Other finalfit model wrappers: coxphuni(), crrmulti(), crruni(), glmmixed(), glmmulti_boot(), glmmulti(), glmuni(), lmmixed(), lmmulti(), lmuni(), svyglmmulti(), svyglmuni()
# Cox Proportional Hazards multivariable analysis.
library(finalfit)
library(dplyr)

explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = "Surv(time, status)"
colon_s %>%
  coxphmulti(dependent, explanatory) %>%
  fit2df()
finalfit model wrapper
Using finalfit conventions, produces multiple univariable Cox Proportional Hazard regression models for a set of explanatory variables against a survival object.
coxphuni(.data, dependent, explanatory)
.data |
Data frame. |
dependent |
Character vector of length 1: name of survival object in form |
explanatory |
Character vector of any length: name(s) of explanatory variables. |
Uses coxph with finalfit modelling conventions. Output can be passed to fit2df.
A list of univariable coxph fitted model outputs. Output is of class coxphlist.
Other finalfit model wrappers: coxphmulti(), crrmulti(), crruni(), glmmixed(), glmmulti_boot(), glmmulti(), glmuni(), lmmixed(), lmmulti(), lmuni(), svyglmmulti(), svyglmuni()
# Cox Proportional Hazards univariable analysis.
library(finalfit)
library(dplyr)

explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = "Surv(time, status)"
colon_s %>%
  coxphuni(dependent, explanatory) %>%
  fit2df()
finalfit model wrapper
Using finalfit conventions, produces multivariable Competing Risks Regression models for a set of explanatory variables.
crrmulti(.data, dependent, explanatory, ...)
.data |
Data frame or tibble. |
dependent |
Character vector of length 1: name of survival object in form |
explanatory |
Character vector of any length: name(s) of explanatory variables. |
... |
Other arguments to |
Uses crr with finalfit modelling conventions. Output can be passed to fit2df.
A multivariable crr fitted model of class crr.
Other finalfit model wrappers: coxphmulti(), coxphuni(), crruni(), glmmixed(), glmmulti_boot(), glmmulti(), glmuni(), lmmixed(), lmmulti(), lmuni(), svyglmmulti(), svyglmuni()
library(finalfit)
library(dplyr)
melanoma = boot::melanoma
melanoma = melanoma %>%
  mutate(
    # Cox PH to determine cause-specific hazards
    status_coxph = ifelse(status == 2, 0, # "still alive"
      ifelse(status == 1, 1, # "died of melanoma"
        0)), # "died of other causes is censored"

    # Fine and Gray to determine subdistribution hazards
    status_crr = ifelse(status == 2, 0, # "still alive"
      ifelse(status == 1, 1, # "died of melanoma"
        2)), # "died of other causes"
    sex = factor(sex),
    ulcer = factor(ulcer)
  )

dependent_coxph = c("Surv(time, status_coxph)")
dependent_crr = c("Surv(time, status_crr)")
explanatory = c("sex", "age", "ulcer")

# Create single well-formatted table
melanoma %>%
  summary_factorlist(dependent_crr, explanatory, column = TRUE, fit_id = TRUE) %>%
  ff_merge(
    melanoma %>%
      coxphmulti(dependent_coxph, explanatory) %>%
      fit2df(estimate_suffix = " (Cox PH multivariable)")
  ) %>%
  ff_merge(
    melanoma %>%
      crrmulti(dependent_crr, explanatory) %>%
      fit2df(estimate_suffix = " (competing risks multivariable)")
  ) %>%
  select(-fit_id, -index) %>%
  dependent_label(melanoma, dependent_crr)
finalfit model wrapper
Using finalfit conventions, produces univariable Competing Risks Regression models for a set of explanatory variables.
crruni(.data, dependent, explanatory, ...)
.data |
Data frame or tibble. |
dependent |
Character vector of length 1: name of survival object in form |
explanatory |
Character vector of any length: name(s) of explanatory variables. |
... |
Other arguments to |
Uses crr with finalfit modelling conventions. Output can be passed to fit2df.
A list of univariable crr fitted models of class crrlist.
Other finalfit model wrappers: coxphmulti(), coxphuni(), crrmulti(), glmmixed(), glmmulti_boot(), glmmulti(), glmuni(), lmmixed(), lmmulti(), lmuni(), svyglmmulti(), svyglmuni()
library(finalfit)
library(dplyr)
melanoma = boot::melanoma
melanoma = melanoma %>%
  mutate(
    # Cox PH to determine cause-specific hazards
    status_coxph = ifelse(status == 2, 0, # "still alive"
      ifelse(status == 1, 1, # "died of melanoma"
        0)), # "died of other causes is censored"

    # Fine and Gray to determine subdistribution hazards
    status_crr = ifelse(status == 2, 0, # "still alive"
      ifelse(status == 1, 1, # "died of melanoma"
        2)), # "died of other causes"
    sex = factor(sex),
    ulcer = factor(ulcer)
  )

dependent_coxph = c("Surv(time, status_coxph)")
dependent_crr = c("Surv(time, status_crr)")
explanatory = c("sex", "age", "ulcer")

# Create single well-formatted table
melanoma %>%
  summary_factorlist(dependent_crr, explanatory, column = TRUE, fit_id = TRUE) %>%
  ff_merge(
    melanoma %>%
      coxphmulti(dependent_coxph, explanatory) %>%
      fit2df(estimate_suffix = " (Cox PH multivariable)")
  ) %>%
  ff_merge(
    melanoma %>%
      crrmulti(dependent_crr, explanatory) %>%
      fit2df(estimate_suffix = " (competing risks multivariable)")
  ) %>%
  select(-fit_id, -index) %>%
  dependent_label(melanoma, dependent_crr)
Can be used to add a dependent variable label to the final results dataframe.
dependent_label(df.out, .data, dependent, prefix = "Dependent: ", suffix = "")
df.out |
Dataframe (results table) to be altered. |
.data |
Original dataframe. |
dependent |
Character vector of length 1: quoted name of dependent variable. Can be continuous, a binary factor, or a survival object of form |
prefix |
Prefix for dependent label |
suffix |
Suffix for dependent label |
Returns the label for the dependent variable, if specified.
library(finalfit)
library(dplyr)
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
explanatory_multi = c("age.factor", "obstruct.factor")
random_effect = "hospital"
dependent = 'mort_5yr'

# Separate tables
colon_s %>%
  summary_factorlist(dependent, explanatory, fit_id=TRUE) -> example.summary

colon_s %>%
  glmuni(dependent, explanatory) %>%
  fit2df(estimate_suffix=" (univariable)") -> example.univariable

colon_s %>%
  glmmulti(dependent, explanatory) %>%
  fit2df(estimate_suffix=" (multivariable)") -> example.multivariable

colon_s %>%
  glmmixed(dependent, explanatory, random_effect) %>%
  fit2df(estimate_suffix=" (multilevel)") -> example.multilevel

# Pipe together
example.summary %>%
  finalfit_merge(example.univariable) %>%
  finalfit_merge(example.multivariable) %>%
  finalfit_merge(example.multilevel) %>%
  select(-c(fit_id, index)) %>%
  dependent_label(colon_s, dependent) -> example.final
example.final
Variable labels can be created using ff_label. Some functions strip variable labels (variable attributes), e.g. forcats::fct_recode. Use this function to create a vector of variable labels from a data frame. Then use ff_relabel to relabel variables in data frame.
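The extract-then-relabel workflow can be illustrated with base R alone. This sketch assumes that labels are stored in a "label" attribute on each column; the toy data frame and the loop that restores the labels are hypothetical stand-ins, not the package's own code.

```r
# Hypothetical toy data with labels stored in the "label" attribute
# (assumption of this sketch: one "label" attribute per column)
df <- data.frame(age = c(34, 51, 46), nodes = c(2, 0, 5))
attr(df$age, "label") <- "Age (years)"
attr(df$nodes, "label") <- "Positive nodes"

# 1. Extract labels into a named character vector
labels <- vapply(df, function(x) {
  lab <- attr(x, "label")
  if (is.null(lab)) NA_character_ else lab
}, character(1))

# 2. A transformation that drops attributes, including the label
df$age <- as.numeric(scale(df$age))
is.null(attr(df$age, "label"))

# 3. Restore labels from the saved vector
for (v in names(labels)) {
  if (!is.na(labels[[v]])) attr(df[[v]], "label") <- labels[[v]]
}
attr(df$age, "label")
```

Saving the labels before a recoding step means they can be reapplied afterwards, whichever function removed them.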
extract_variable_label(.data)
.data |
Dataframe containing labelled variables. |
colon_s %>% extract_variable_label
Add column totals to summary_factorlist() output
ff_column_totals( df.in, .data, dependent, na_include_dependent = FALSE, percent = TRUE, digits = c(1, 0), label = NULL, prefix = "", weights = NULL )
finalfit_column_totals( df.in, .data, dependent, na_include_dependent = FALSE, percent = TRUE, digits = c(1, 0), label = NULL, prefix = "", weights = NULL )
df.in |
|
.data |
Data frame used to create |
dependent |
Character. Name of dependent variable. |
na_include_dependent |
Logical. When TRUE, missing data in the dependent variable is included in totals. |
percent |
Logical. Include percentage. |
digits |
Integer length 2. Number of digits for (1) percentage, (2) weighted count. |
label |
Character. Label for total row. |
prefix |
Character. Prefix for column totals, e.g. "N=". |
weights |
Character vector of length 1: name of column to use for weights. |
Data frame.
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = 'mort_5yr'
colon_s %>%
  summary_factorlist(dependent, explanatory) %>%
  ff_column_totals(colon_s, dependent)

# Ensure works with missing data in dependent
colon_s = colon_s %>%
  dplyr::mutate(
    mort_5yr = forcats::fct_na_value_to_level(mort_5yr, level = "(Missing)")
  )
colon_s %>%
  summary_factorlist(dependent, explanatory) %>%
  ff_column_totals(colon_s, dependent)
When producing conditional estimates from a regression model, it is often useful to set variables not of interest to their mode for factors and mean or median for numerics when creating the newdata object, and combine these with all levels for factors of interest.
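The construction described above can be sketched in base R with expand.grid(). This is an illustration of the idea only, not the function's implementation; the toy data and the helper `mode_level` are hypothetical names introduced for the example.

```r
# Hypothetical toy data
df <- data.frame(
  sex   = factor(c("Male", "Female", "Male", "Male")),
  stage = factor(c("I", "II", "II", "II")),
  age   = c(40, 50, 60, 70)
)

# Most frequent level of a factor (the first level wins ties)
mode_level <- function(x) names(which.max(table(x)))

# Expand the factor of interest (sex) to all of its levels;
# hold the other factor at its mode and the numeric at its mean
newdata <- expand.grid(
  sex   = levels(df$sex),
  stage = mode_level(df$stage),
  age   = mean(df$age),
  stringsAsFactors = FALSE
)
newdata
```

Each row of the result differs only in the factor of interest, which is what makes the resulting predictions conditional on the held-fixed values.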
ff_expand(.data, ..., cont = "mean")
finalfit_expand(.data, ..., cont = "mean")
.data |
A data frame or tibble. |
... |
Factors to expand. |
cont |
One of "mean" or "median": the summary estimate for continuous variables. |
A data frame or tibble with the mode for factors and mean/median for continuous variables, with given factors expanded to include all levels.
library(finalfit)
library(dplyr)
colon_s %>%
  select(-hospital) %>%
  ff_expand(age.factor, sex.factor)
Useful when passing finalfit dependent and explanatory lists to base R functions.
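As a rough sketch of the idea, a formula can be assembled from dependent and explanatory character vectors in base R and passed to a base modelling function. This is not the package's code: it uses paste() and as.formula() directly, and the built-in mtcars data stands in for colon_s so that the example runs without the package.

```r
# Character vectors in the finalfit style (mtcars columns used as stand-ins)
dependent   <- "mpg"
explanatory <- c("wt", "hp")

# Build the formula text, then convert it for use with base R modelling
formula_text <- paste(dependent, "~", paste(explanatory, collapse = " + "))
fit <- lm(as.formula(formula_text), data = mtcars)

formula_text
coef(fit)
```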
ff_formula(dependent, explanatory, random_effect = NULL)
finalfit_formula(dependent, explanatory, random_effect = NULL)
dependent |
Optional character vector: name(s) of dependent variable(s). |
explanatory |
Optional character vector: name(s) of explanatory variable(s). |
random_effect |
Optional character vector: name(s) of random effect variable(s). |
Character vector
explanatory = c("age", "nodes", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = "mort_5yr"
ff_formula(dependent, explanatory)

explanatory = c("age", "nodes", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = "mort_5yr"
random_effect = "(age.factor | hospital)"
ff_formula(dependent, explanatory, random_effect)
Everyone has a function like this: str, glimpse, glance, etc. This one is specifically designed for use with finalfit language. It differs in dividing variables into numeric and factor.
ff_glimpse( .data, dependent = NULL, explanatory = NULL, digits = 1, levels_cut = 5 )
finalfit_glimpse( .data, dependent = NULL, explanatory = NULL, digits = 1, levels_cut = 5 )
.data |
Dataframe. |
dependent |
Optional character vector: name(s) of dependent variable(s). |
explanatory |
Optional character vector: name(s) of explanatory variable(s). |
digits |
Significant digits for continuous variable summaries |
levels_cut |
Max number of factor levels to include in factor levels summary (in order to avoid the long printing of variables with many factors). |
Dataframe on summary data.
library(finalfit)
dependent = 'mort_5yr'
explanatory = c("age", "nodes", "age.factor", "extent.factor", "perfor.factor")
colon_s %>%
  finalfit_glimpse(dependent, explanatory)
Combine two factor variables to make an interaction variable. Factor level order is determined by the order in the variables themselves. Note, names of the factor variables should not be quoted. The name of the variable is created from the names of the two factors. The variable is also labelled with a name derived from any pre-existing labels.
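A comparable result can be sketched with base R's interaction(). This is only an analogy under stated assumptions: the toy data are hypothetical, the new column name is built by hand, and the level ordering produced by interaction() may differ from the ordering this function uses.

```r
# Hypothetical toy data with two factors
df <- data.frame(
  sex    = factor(c("Female", "Male", "Male", "Female")),
  perfor = factor(c("No", "No", "Yes", "No"))
)

# Name the new variable from both inputs, joined by a separator
new_name <- paste("sex", "perfor", sep = "_")

# Combine the levels of the two factors, joined by a separator
df[[new_name]] <- interaction(df$sex, df$perfor, sep = "_")

table(df[[new_name]])
```

All four level combinations are kept, including any with a zero count, which can be useful when checking for sparse cells before modelling.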
ff_interaction(.data, ..., levels_sep = "_", var_sep = "_", label_sep = ":")
finalfit_interaction( .data, ..., levels_sep = "_", var_sep = "_", label_sep = ":" )
.data |
Data frame. |
... |
The unquoted names of two factors. |
levels_sep |
Quoted character: how levels are separated in new variable. |
var_sep |
Quoted character: how variable name is separated. |
label_sep |
Quoted character: how variable label is separated |
Original data frame with new variable added via 'dplyr::mutate'.
colon_s %>%
  ff_interaction(sex.factor, perfor.factor) %>%
  summary_factorlist("mort_5yr", "sex.factor_perfor.factor")
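To make the shape of the combined variable concrete, here is a minimal base R sketch that builds a two-factor interaction by hand. It is not the package's implementation: the toy data frame `d` and the helper `combine_factors()` are hypothetical, and the level ordering shown (levels of the first factor varying slowest) is an assumption chosen for illustration.

```r
# Hypothetical toy data; not part of finalfit
d <- data.frame(
  sex = factor(c("Female", "Male", "Male", "Female")),
  perfor = factor(c("No", "No", "Yes", "Yes"))
)

# Hypothetical helper: paste the levels of two factors together and
# order the new levels by the level order of the two inputs
combine_factors <- function(f1, f2, levels_sep = "_") {
  new_levels <- as.vector(
    t(outer(levels(f1), levels(f2), paste, sep = levels_sep))
  )
  factor(paste(f1, f2, sep = levels_sep), levels = new_levels)
}

d$sex_perfor <- combine_factors(d$sex, d$perfor)
levels(d$sex_perfor)
```

The new column can then be used like any other factor, which is what allows the combined variable to be passed to summary_factorlist in the example above.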
Label a variable
ff_label(.var, variable_label)
finalfit_label(.var, variable_label)
.var |
Quoted variable name |
variable_label |
Quoted variable label |
Labelled variable
extract_variable_label
ff_relabel
colon_s$sex.factor %>% ff_label("Sex") %>% str()
Merge a summary_factorlist() table with any number of model results tables.
A function that takes the output from summary_factorlist(..., fit_id=TRUE) and merges with any number of model dataframes, usually produced with a model wrapper followed by the fit2df() function (see examples).
ff_merge(factorlist, fit2df_df, ref_symbol = "-", estimate_name = NULL, last_merge = FALSE)
finalfit_merge(factorlist, fit2df_df, ref_symbol = "-", estimate_name = NULL, last_merge = FALSE)
factorlist |
Output from summary_factorlist(..., fit_id=TRUE). |
fit2df_df |
Output from model wrappers followed by fit2df(). |
ref_symbol |
Reference symbol for model reference levels, typically "-" or "1.0". |
estimate_name |
If you have chosen a new 'estimate name' (e.g. "Odds ratio") when running a model wrapper (e.g. 'glmuni'), then you need to pass this new name to 'finalfit_merge' to generate the correct table. Defaults to OR/HR/Coefficient. |
last_merge |
Logical. Set to TRUE for the final merge in a series to remove index and fit_id columns. |
Returns a dataframe of combined tables.
library(finalfit)
library(dplyr)

explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
explanatory_multi = c("age.factor", "obstruct.factor")
random_effect = "hospital"
dependent = "mort_5yr"

# Create separate tables
colon_s %>%
  summary_factorlist(dependent, explanatory, fit_id=TRUE) -> example.summary

colon_s %>%
  glmuni(dependent, explanatory) %>%
  fit2df(estimate_suffix=" (univariable)") -> example.univariable

colon_s %>%
  glmmulti(dependent, explanatory) %>%
  fit2df(estimate_suffix=" (multivariable)") -> example.multivariable

colon_s %>%
  glmmixed(dependent, explanatory, random_effect) %>%
  fit2df(estimate_suffix=" (multilevel)") -> example.multilevel

# Pipe together
example.summary %>%
  ff_merge(example.univariable) %>%
  ff_merge(example.multivariable) %>%
  ff_merge(example.multilevel, last_merge = TRUE)

# Using finalfit()
colon_s %>%
  finalfit(dependent, explanatory, keep_fit_id = TRUE) %>%
  ff_merge(example.multilevel, last_merge = TRUE)
Generate common metrics for regression model results
ff_metrics(.data)

## S3 method for class 'lm'
ff_metrics(.data)

## S3 method for class 'lmlist'
ff_metrics(.data)

## S3 method for class 'glm'
ff_metrics(.data)

## S3 method for class 'glmlist'
ff_metrics(.data)

## S3 method for class 'lmerMod'
ff_metrics(.data)

## S3 method for class 'glmerMod'
ff_metrics(.data)

## S3 method for class 'coxph'
ff_metrics(.data)

## S3 method for class 'coxphlist'
ff_metrics(.data)
.data |
Model output. |
Model metrics vector for output.
library(finalfit)

# glm
fit = glm(mort_5yr ~ age.factor + sex.factor + obstruct.factor + perfor.factor,
  data=colon_s, family="binomial")
fit %>%
  ff_metrics()

# glmlist
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = "mort_5yr"
colon_s %>%
  glmmulti(dependent, explanatory) %>%
  ff_metrics()

# glmerMod
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
random_effect = "hospital"
dependent = "mort_5yr"
colon_s %>%
  glmmixed(dependent, explanatory, random_effect) %>%
  ff_metrics()

# lm
fit = lm(nodes ~ age.factor + sex.factor + obstruct.factor + perfor.factor,
  data=colon_s)
fit %>%
  ff_metrics()

# lmerMod
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
random_effect = "hospital"
dependent = "nodes"
colon_s %>%
  lmmixed(dependent, explanatory, random_effect) %>%
  ff_metrics()

# coxphlist
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = "Surv(time, status)"
colon_s %>%
  coxphmulti(dependent, explanatory) %>%
  ff_metrics()

# coxph
fit = survival::coxph(survival::Surv(time, status) ~ age.factor + sex.factor +
  obstruct.factor + perfor.factor, data = colon_s)
fit %>%
  ff_metrics()
When producing conditional estimates from a regression model, it is often useful to set variables not of interest to their mode when creating the newdata object.
ff_mode(...)
finalfit_mode(...)
... |
Unquoted factor names. |
The most frequent level in a factor.
library(dplyr)
colon_s %>%
  summarise(age.factor = ff_mode(age.factor))

colon_s %>%
  select(sex.factor, rx.factor, obstruct.factor, perfor.factor) %>%
  summarise(across(everything(), ff_mode))

colon_s %>%
  reframe(across(where(is.factor), ff_mode))
# Note, 4 rows are returned in this example because 4 factor levels within `hospital`
# have the same frequency.
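The note about tied levels can be illustrated with a minimal base R sketch. The helper `factor_mode()` is hypothetical and is not the finalfit implementation; that ties are returned as several values is an assumption drawn from the note in the example above.

```r
# Hypothetical helper, not the finalfit implementation:
# return every level that attains the highest count
factor_mode <- function(x) {
  counts <- table(x)
  factor(names(counts)[counts == max(counts)], levels = levels(x))
}

single <- factor(c("a", "b", "b", "c"))
tied <- factor(c("a", "a", "b", "b", "c"))

factor_mode(single) # one level is most frequent
factor_mode(tied)   # two levels tie, so two values are returned
```

When a summary can return more than one value per column, reframe() is the appropriate dplyr verb, which is why it appears in the last example above rather than summarise().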
Generate newdata while respecting the variable types and factor levels in the primary data frame used to run the model.
ff_newdata(.data, dependent = NULL, explanatory = NULL, rowwise = TRUE, newdata)
finalfit_newdata(.data, dependent = NULL, explanatory = NULL, rowwise = TRUE, newdata)
.data |
Dataframe. |
dependent |
Optional character vector of length 1: name of dependent variable. Not usually specified in bootstrapping model predictions. |
explanatory |
Character vector of any length: name(s) of explanatory variables. |
rowwise |
Logical. Format in which newdata is provided: a list of rows (TRUE) or a list of columns (FALSE). |
newdata |
A list of rows or columns corresponding exactly to the order of explanatory variables. Informative errors are generated if these requirements are not met. |
Generate model predictions against a specified set of explanatory levels with bootstrapped confidence intervals. Add a comparison by difference or ratio of the first row of newdata with all subsequent rows.
A list of multivariable glm fitted model outputs. Output is of class glmlist.
# See boot_predict.
library(finalfit)
library(dplyr)

# Predict probability of death across combinations of factor levels
explanatory = c("age.factor", "extent.factor", "perfor.factor")
dependent = 'mort_5yr'

# Generate combination of explanatory variable levels rowwise
colon_s %>%
  finalfit_newdata(explanatory = explanatory, newdata = list(
    c("<40 years", "Submucosa", "No"),
    c("<40 years", "Submucosa", "Yes"),
    c("<40 years", "Adjacent structures", "No"),
    c("<40 years", "Adjacent structures", "Yes")
  )) -> newdata

# Generate combination of explanatory variable levels colwise.
explanatory = c("nodes", "extent.factor", "perfor.factor")
colon_s %>%
  finalfit_newdata(explanatory = explanatory, rowwise = FALSE, newdata = list(
    rep(seq(0, 30), 4),
    c(rep("Muscle", 62), rep("Adjacent structures", 62)),
    c(rep("No", 31), rep("Yes", 31), rep("No", 31), rep("Yes", 31))
  )) -> newdata
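The difference between the row-wise and column-wise layouts of the newdata list can be seen in a small base R sketch. It uses only base R and hypothetical object names (`vars`, `by_row`, `by_col`); it does not reproduce the type and level checks that the package function performs against the original data frame.

```r
vars <- c("age.factor", "perfor.factor")

# Row-wise: each list element is one row, in the order of `vars`
by_row <- list(c("<40 years", "No"), c("<40 years", "Yes"))
df_row <- as.data.frame(do.call(rbind, by_row), stringsAsFactors = FALSE)
names(df_row) <- vars

# Column-wise: each list element is one column, in the order of `vars`
by_col <- list(c("<40 years", "<40 years"), c("No", "Yes"))
df_col <- as.data.frame(setNames(by_col, vars), stringsAsFactors = FALSE)

# Both layouts describe the same two rows
all(df_row == df_col)
```

The row-wise form is convenient for a handful of hand-written combinations; the column-wise form suits generated sequences such as rep() and seq(), as in the second example above.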
Parse a formula to finalfit grammar
ff_parse_formula(.formula)
.formula |
an object of class "formula" (or one that can be coerced to that class). |
A list containing dependent, explanatory and random effects variables
ff_parse_formula(mort ~ age + sex + (1 | hospital))
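As a rough illustration of what splitting a formula into these three parts involves, the following base R sketch separates the left-hand side, the fixed terms, and any term containing `|`. It is an assumption-laden sketch rather than the package's parser: the list element names and the exact strings returned by ff_parse_formula may differ.

```r
f <- mort ~ age + sex + (1 | hospital)

# Left-hand side of the formula
dependent <- deparse(f[[2]])

# Right-hand side terms; a term containing "|" is treated as a random effect
rhs <- attr(terms(f), "term.labels")
is_random <- grepl("|", rhs, fixed = TRUE)

# Hypothetical element names, chosen to mirror finalfit's argument names
parsed <- list(
  dependent = dependent,
  explanatory = rhs[!is_random],
  random_effect = rhs[is_random]
)
str(parsed)
```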
Include only percentages for factors in summary_factorlist output.
ff_percent_only(.data)
finalfit_percent_only(.data)
.data |
Output from summary_factorlist(). |
Data frame.
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = 'mort_5yr'
colon_s %>%
  summary_factorlist(dependent, explanatory) %>%
  ff_percent_only()
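The transformation can be pictured as stripping the count from each table cell. The sketch below assumes cells of the form "count (percent)", which is an assumption about the summary_factorlist cell format; the values are made up and the regular expression is illustrative, not the package's own code.

```r
# Made-up cells in an assumed "count (percent)" format
cells <- c("68 (97.1)", "2 (2.9)")

# Keep only the text inside the parentheses
percent_only <- sub("^.*\\((.*)\\)$", "\\1", cells)
percent_only
```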
Permute explanatory variables to produce multiple output tables for common regression models.
ff_permute(
  .data,
  dependent = NULL,
  explanatory_base = NULL,
  explanatory_permute = NULL,
  multiple_tables = FALSE,
  include_base_model = TRUE,
  include_full_model = TRUE,
  base_on_top = TRUE,
  ...
)

finalfit_permute(
  .data,
  dependent = NULL,
  explanatory_base = NULL,
  explanatory_permute = NULL,
  multiple_tables = FALSE,
  include_base_model = TRUE,
  include_full_model = TRUE,
  base_on_top = TRUE,
  ...
)
.data |
Data frame or tibble. |
dependent |
Character vector of length 1: quoted name of dependent variable. Can be continuous, a binary factor, or a survival object of the form Surv(time, status). |
explanatory_base |
Character vector of any length: quoted name(s) of base model explanatory variables. |
explanatory_permute |
Character vector of any length: quoted name(s) of explanatory variables to permute through models. |
multiple_tables |
Logical. Multiple model tables as a list, or a single table including multiple models. |
include_base_model |
Logical. Include model using only the explanatory_base variables. |
include_full_model |
Logical. Include model using all explanatory_base and explanatory_permute variables. |
base_on_top |
Logical. Base variables at top of table, or bottom of table. |
... |
Other arguments to finalfit. |
Returns a data frame with the final model table, or a list of data frames when multiple_tables = TRUE.
explanatory_base = c("age.factor", "sex.factor")
explanatory_permute = c("obstruct.factor", "perfor.factor", "node4.factor")

# Linear regression
colon_s %>%
  finalfit_permute("nodes", explanatory_base, explanatory_permute)

# Cox proportional hazards regression
colon_s %>%
  finalfit_permute("Surv(time, status)", explanatory_base, explanatory_permute)

# Logistic regression
# colon_s %>%
#   finalfit_permute("mort_5yr", explanatory_base, explanatory_permute)

# Logistic regression with random effect (glmer)
# colon_s %>%
#   finalfit_permute("mort_5yr", explanatory_base, explanatory_permute,
#     random_effect = "hospital")
Wraps or_plot, hr_plot, and coefficient_plot and sends to the appropriate method depending on the dependent variable type.
ff_plot(.data, dependent, explanatory, ...)
finalfit_plot(.data, dependent, explanatory, ...)
.data |
Data frame. |
dependent |
Character vector of length 1. |
explanatory |
Character vector of any length: name(s) of explanatory variables. |
... |
Arguments passed to or_plot, hr_plot, or coefficient_plot. |
A table and a plot using ggplot2.
Other finalfit plot functions: coefficient_plot(), hr_plot(), or_plot(), surv_plot()
# Coefficient plot
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = "nodes"
colon_s %>%
  ff_plot(dependent, explanatory)

# Odds ratio plot
dependent = "mort_5yr"
colon_s %>%
  ff_plot(dependent, explanatory)

# Hazard ratio plot
dependent = "Surv(time, status)"
colon_s %>%
  ff_plot(dependent, explanatory, dependent_label = "Survival")
Variable labels can be created using ff_label. Some functions strip variable labels (variable attributes), e.g. forcats::fct_recode. Use extract_variable_label to create a vector of variable labels from a data frame before the labels are lost, then use ff_relabel to relabel the variables in the data frame.
ff_relabel(.data, .labels)
finalfit_relabel(.data, .labels)
.data |
Data frame to be relabelled |
.labels |
Vector of variable labels (usually created using extract_variable_label). |
library(magrittr) # provides the %<>% assignment pipe used below

# Label variable
colon_s$sex.factor %>%
  ff_label("Sex") %>%
  str()

# Make factor level "Unknown" NA
colon_s %>%
  dplyr::mutate_if(is.factor, forcats::fct_recode, NULL = "Unknown") %>%
  str()

# Reset data
data(colon_s)

# Extract variable labels
vlabels = colon_s %>% extract_variable_label()

# Run function where labels are lost
colon_s %>%
  dplyr::mutate_if(is.factor, forcats::fct_recode, NULL = "Unknown") %>%
  str()

# Relabel
colon_s %<>% ff_relabel(vlabels)
colon_s %>% str()
Relabel variables from data frame after tidyverse functions
ff_relabel_df(.data, .df)
finalfit_relabel_df(.data, .df)
.data |
Data frame or tibble after application of label-stripping functions. |
.df |
Original data frame which contains labels. |
Data frame or tibble
This will work with finalfit and any fit2df output.
ff_remove_p(.data)
finalfit_remove_p(.data)
.data |
Output from finalfit or fit2df. |
Data frame.
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = 'mort_5yr'
colon_s %>%
  finalfit(dependent, explanatory) %>%
  ff_remove_p()
This looks for a column with a name including "Coefficient", "OR", or "HR" (finalfit defaults) and removes any rows with "-" (the default for the reference level). Can also be combined to produce an or_plot, see below.
ff_remove_ref(.data, only_binary = TRUE)
finalfit_remove_ref(.data, only_binary = TRUE)
.data |
Output from finalfit or a similar results table. |
only_binary |
Logical. Remove reference level only for two-level factors. When set to FALSE, the reference level is removed for all factors. |
Data frame.
# Table example
explanatory = c("age.factor", "age", "sex.factor", "nodes", "obstruct.factor", "perfor.factor")
dependent = 'mort_5yr'
colon_s %>%
  finalfit(dependent, explanatory, add_dependent_label = FALSE) %>%
  ff_remove_ref() %>%
  dependent_label(colon_s, dependent)

# Plot example
explanatory = c("age.factor", "age", "sex.factor", "nodes", "obstruct.factor", "perfor.factor")
dependent = 'mort_5yr'
colon_s %>%
  summary_factorlist(dependent, explanatory, total_col = TRUE, fit_id=TRUE) %>%
  ff_merge(
    glmuni(colon_s, dependent, explanatory) %>%
      fit2df()) %>%
  ff_remove_ref() %>%
  dplyr::select(-`OR`) -> factorlist_plot

colon_s %>%
  or_plot(dependent, explanatory, factorlist = factorlist_plot)
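The column-matching rule described for this function can be sketched in base R. The table `tab` below is hypothetical with made-up estimates, and the sketch deliberately ignores only_binary and any label handling that the package function may perform; it only shows locating the estimate column by name and dropping rows marked with the reference symbol.

```r
# Hypothetical results table with made-up values
tab <- data.frame(
  label = c("Sex", "", "Age", ""),
  levels = c("Female", "Male", "<40 years", "40-59 years"),
  `OR (univariable)` = c("-", "1.10 (0.90-1.30)", "-", "1.20 (0.80-1.60)"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Find the first column whose name includes "Coefficient", "OR", or "HR"
est_col <- grep("Coefficient|OR|HR", names(tab))[1]

# Drop rows where that column holds the reference symbol "-"
tab_no_ref <- tab[tab[[est_col]] != "-", , drop = FALSE]
tab_no_ref
```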
Add row totals to summary_factorlist() output.
This adds a total and missing count to variables. This is useful for continuous variables. Compare this to summary_factorlist(total_col = TRUE) which includes a count for each dummy variable as a factor and mean (sd) or median (iqr) for continuous variables.
ff_row_totals(
  df.in,
  .data,
  dependent,
  explanatory,
  missing_column = TRUE,
  percent = TRUE,
  digits = 1,
  na_include_dependent = FALSE,
  na_complete_cases = FALSE,
  total_name = "Total N",
  na_name = "Missing N"
)

finalfit_row_totals(
  df.in,
  .data,
  dependent,
  explanatory,
  missing_column = TRUE,
  percent = TRUE,
  digits = 1,
  na_include_dependent = FALSE,
  na_complete_cases = FALSE,
  total_name = "Total N",
  na_name = "Missing N"
)
df.in |
Output from summary_factorlist(). |
.data |
Data frame used to create the summary_factorlist() output. |
dependent |
Character. Name of dependent variable. |
explanatory |
Character vector of any length: name(s) of explanatory variables. |
missing_column |
Logical. Include a column of counts of missing data. |
percent |
Logical. Include percentage. |
digits |
Integer length 1. Number of digits for percentage. |
na_include_dependent |
Logical. When TRUE, missing data in the dependent variable is included in totals. |
na_complete_cases |
Logical. When TRUE, missing data counts for variables are for complete cases across all included variables. |
total_name |
Character. Name of total column. |
na_name |
Character. Name of missing column. |
Data frame.
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = 'mort_5yr'
colon_s %>%
  summary_factorlist(dependent, explanatory) %>%
  ff_row_totals(colon_s, dependent, explanatory)
Help making stratified summary_factorlist tables
ff_stratify_helper(df.out, .data)
df.out |
Output from summary_factorlist(). |
.data |
Original data frame used for summary_factorlist(). |
library(dplyr)
explanatory = c("age.factor", "sex.factor")
dependent = "perfor.factor"

# Pick option below
split = "rx.factor"
split = c("rx.factor", "node4.factor")

# Piped function to generate stratified crosstabs table
colon_s %>%
  group_by(!!! syms(split)) %>% # Looks awkward, but avoids unquoted var names
  group_modify(~ summary_factorlist(.x, dependent, explanatory)) %>%
  ff_stratify_helper(colon_s)
An "all-in-one" function that takes a single dependent variable with a vector of explanatory variable names (continuous or categorical variables) to produce a final table for publication including summary statistics. The appropriate model is selected on the basis of dependent variable and whether a random effect is specified.
finalfit.lm method (not called directly)
finalfit.glm method (not called directly)
finalfit.coxph method (not called directly)
finalfit(
  .data,
  dependent = NULL,
  explanatory = NULL,
  explanatory_multi = NULL,
  random_effect = NULL,
  formula = NULL,
  model_args = list(),
  weights = NULL,
  cont_cut = 5,
  column = NULL,
  keep_models = FALSE,
  metrics = FALSE,
  add_dependent_label = TRUE,
  dependent_label_prefix = "Dependent: ",
  dependent_label_suffix = "",
  keep_fit_id = FALSE,
  ...
)

finalfit.lm(
  .data,
  dependent,
  explanatory,
  explanatory_multi = NULL,
  random_effect = NULL,
  model_args = NULL,
  weights = NULL,
  cont_cut = 5,
  column = FALSE,
  keep_models = FALSE,
  metrics = FALSE,
  add_dependent_label = TRUE,
  dependent_label_prefix = "Dependent: ",
  dependent_label_suffix = "",
  keep_fit_id = FALSE,
  ...
)

finalfit.glm(
  .data,
  dependent,
  explanatory,
  explanatory_multi = NULL,
  random_effect = NULL,
  model_args = NULL,
  weights = NULL,
  cont_cut = 5,
  column = FALSE,
  keep_models = FALSE,
  metrics = FALSE,
  add_dependent_label = TRUE,
  dependent_label_prefix = "Dependent: ",
  dependent_label_suffix = "",
  keep_fit_id = FALSE,
  ...
)

finalfit.coxph(
  .data,
  dependent,
  explanatory,
  explanatory_multi = NULL,
  random_effect = NULL,
  model_args = NULL,
  column = TRUE,
  cont_cut = 5,
  keep_models = FALSE,
  metrics = FALSE,
  add_dependent_label = TRUE,
  dependent_label_prefix = "Dependent: ",
  dependent_label_suffix = "",
  keep_fit_id = FALSE,
  ...
)
.data |
Data frame or tibble. |
dependent |
Character vector of length 1: quoted name of dependent variable. Can be continuous, a binary factor, or a survival object of the form Surv(time, status). |
explanatory |
Character vector of any length: quoted name(s) of explanatory variables. |
explanatory_multi |
Character vector of any length: quoted name(s) of a subset of explanatory variables to generate reduced multivariable model (must only contain variables contained in explanatory). |
random_effect |
Character vector of length 1, either (1) the name of a random intercept variable, e.g. "var1" (automatically converted to "(1 | var1)"), or (2) the full random effect specification. |
formula |
an object of class "formula" (or one that can be coerced to that class). Optional instead of standard dependent/explanatory format. Do not include if using dependent/explanatory. |
model_args |
|
weights |
Character vector of length 1: quoted name of weights variable.
Passed to |
cont_cut |
Numeric: number of unique values in continuous variable at which to consider it a factor. |
column |
Logical: Compute margins by column rather than row. |
keep_models |
Logical: include full multivariable model in output when working with reduced multivariable model (explanatory_multi). |
metrics |
Logical: include useful model metrics in output in publication format. |
add_dependent_label |
Add the name of the dependent label to the top left of table. |
dependent_label_prefix |
Add text before dependent label. |
dependent_label_suffix |
Add text after dependent label. |
keep_fit_id |
Keep original model output coefficient label (internal). |
... |
Other arguments to pass to |
Returns a data frame with the final model table.
library(finalfit)
library(dplyr)

# Summary, univariable and multivariable analyses of the form:
# glm(dependent ~ explanatory, family="binomial")
# lmuni(), lmmulti(), lmmixed(), glmuni(), glmmulti(), glmmixed(), glmmulti_boot(),
# coxphuni(), coxphmulti()

data(colon_s) # Modified from survival::colon
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = 'mort_5yr'
colon_s %>%
  finalfit(dependent, explanatory)

# Multivariable analysis with subset of explanatory
# variable set used in univariable analysis
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
explanatory_multi = c("age.factor", "obstruct.factor")
dependent = "mort_5yr"
colon_s %>%
  finalfit(dependent, explanatory, explanatory_multi)

# Summary, univariable and multivariable analyses of the form:
# lme4::glmer(dependent ~ explanatory + (1 | random_effect), family="binomial")
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
explanatory_multi = c("age.factor", "obstruct.factor")
random_effect = "hospital"
dependent = "mort_5yr"
# colon_s %>%
#   finalfit(dependent, explanatory, explanatory_multi, random_effect)

# Include model metrics:
colon_s %>%
  finalfit(dependent, explanatory, explanatory_multi, metrics=TRUE)

# Summary, univariable and multivariable analyses of the form:
# survival::coxph(dependent ~ explanatory)
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = "Surv(time, status)"
colon_s %>%
  finalfit(dependent, explanatory)

# Rather than going all-in-one, any number of subset models can
# be manually added on to a summary_factorlist() table using finalfit_merge().
# This is particularly useful when models take a long time to run or are complicated.
# Note requirement for fit_id=TRUE.
# `fit2df` is a subfunction extracting most common models to a dataframe.
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = 'mort_5yr'
colon_s %>%
  finalfit(dependent, explanatory, metrics=TRUE)

explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
explanatory_multi = c("age.factor", "obstruct.factor")
random_effect = "hospital"
dependent = 'mort_5yr'

# Separate tables
colon_s %>%
  summary_factorlist(dependent, explanatory, fit_id=TRUE) -> example.summary

colon_s %>%
  glmuni(dependent, explanatory) %>%
  fit2df(estimate_suffix=" (univariable)") -> example.univariable

colon_s %>%
  glmmulti(dependent, explanatory) %>%
  fit2df(estimate_suffix=" (multivariable)") -> example.multivariable

# Edited as CRAN slow to run these
# colon_s %>%
#   glmmixed(dependent, explanatory, random_effect) %>%
#   fit2df(estimate_suffix=" (multilevel)") -> example.multilevel

# Pipe together
example.summary %>%
  finalfit_merge(example.univariable) %>%
  finalfit_merge(example.multivariable, last_merge = TRUE)
# finalfit_merge(example.multilevel)
finalfit model extractors
Takes output from finalfit model wrappers and extracts it to a dataframe, convenient for further processing in preparation for a final results table.
fit2df.lm is the model extract method for lm.
fit2df.lmlist is the model extract method for lmuni and lmmulti.
fit2df.glm is the model extract method for standard glm models, which have not used finalfit model wrappers.
fit2df.glmboot is the model extract method for glmmulti_boot models.
fit2df.glmlist is the model extract method for glmuni and glmmulti.
fit2df.svyglmlist is the model extract method for svyglmuni and svyglmmulti.
fit2df.lmerMod is the model extract method for standard lme4::lmer models and for the finalfit::lmmixed model wrapper.
fit2df.glmerMod is the model extract method for standard lme4::glmer models and for the finalfit::glmmixed model wrapper.
fit2df.coxph is the model extract method for survival::coxph.
fit2df.coxphlist is the model extract method for coxphuni and coxphmulti.
fit2df.crr is the model extract method for cmprsk::crr.
fit2df.coxme is the model extract method for coxme::coxme.
fit2df.crrlist is the model extract method for crruni and crrmulti.
fit2df.stanfit is the model extract method for our standard Bayesian hierarchical binomial logistic regression models. These models will be fully documented separately. However, this should work for a single or multilevel Bayesian logistic regression done in Stan, as long as the fixed effects are specified in the parameters block as a vector named beta, of length P, where P is the number of fixed effect parameters, e.g. parameters { vector[P] beta; }
fit2df.mipo is the model extract method for the mipo object created using mice::pool.
fit2df(...)

## S3 method for class 'lm'
fit2df(.data, condense = TRUE, metrics = FALSE, remove_intercept = TRUE,
  explanatory_name = "explanatory", estimate_name = "Coefficient",
  estimate_suffix = "", p_name = "p", digits = c(2, 2, 3),
  confint_level = 0.95, confint_sep = " to ", ...)

## S3 method for class 'lmlist'
fit2df(.data, condense = TRUE, metrics = FALSE, remove_intercept = TRUE,
  explanatory_name = "explanatory", estimate_name = "Coefficient",
  estimate_suffix = "", p_name = "p", digits = c(2, 2, 3),
  confint_level = 0.95, confint_sep = " to ", ...)

## S3 method for class 'glm'
fit2df(.data, condense = TRUE, metrics = FALSE, remove_intercept = TRUE,
  explanatory_name = "explanatory", estimate_name = "OR",
  estimate_suffix = "", p_name = "p", digits = c(2, 2, 3), exp = TRUE,
  confint_type = "profile", confint_level = 0.95, confint_sep = "-", ...)

## S3 method for class 'glmboot'
fit2df(.data, condense = TRUE, metrics = FALSE, remove_intercept = TRUE,
  explanatory_name = "explanatory", estimate_name = "OR",
  estimate_suffix = "", p_name = "p", digits = c(2, 2, 3), exp = TRUE,
  confint_level = 0.95, confint_sep = "-", ...)

## S3 method for class 'glmlist'
fit2df(.data, condense = TRUE, metrics = FALSE, remove_intercept = TRUE,
  explanatory_name = "explanatory", estimate_name = "OR",
  estimate_suffix = "", p_name = "p", digits = c(2, 2, 3), exp = TRUE,
  confint_type = "profile", confint_level = 0.95, confint_sep = "-", ...)

## S3 method for class 'svyglmlist'
fit2df(.data, condense = TRUE, metrics = FALSE, remove_intercept = TRUE,
  explanatory_name = "explanatory", estimate_name = "Coefficient",
  estimate_suffix = "", p_name = "p", digits = c(2, 2, 3), exp = FALSE,
  confint_type = "profile", confint_level = 0.95, confint_sep = "-", ...)

## S3 method for class 'lmerMod'
fit2df(.data, condense = TRUE, metrics = FALSE, remove_intercept = TRUE,
  explanatory_name = "explanatory", estimate_name = "Coefficient",
  estimate_suffix = "", p_name = "p", digits = c(2, 2, 3),
  confint_type = "Wald", confint_level = 0.95, confint_sep = " to ", ...)

## S3 method for class 'glmerMod'
fit2df(.data, condense = TRUE, metrics = FALSE, remove_intercept = TRUE,
  explanatory_name = "explanatory", estimate_name = "OR",
  estimate_suffix = "", p_name = "p", digits = c(2, 2, 3), exp = TRUE,
  confint_type = "Wald", confint_level = 0.95, confint_sep = "-", ...)

## S3 method for class 'coxph'
fit2df(.data, condense = TRUE, metrics = FALSE,
  explanatory_name = "explanatory", estimate_name = "HR",
  estimate_suffix = "", p_name = "p", digits = c(2, 2, 3),
  confint_sep = "-", ...)

## S3 method for class 'coxphlist'
fit2df(.data, condense = TRUE, metrics = FALSE,
  explanatory_name = "explanatory", estimate_name = "HR",
  estimate_suffix = "", p_name = "p", digits = c(2, 2, 3),
  confint_sep = "-", ...)

## S3 method for class 'crr'
fit2df(.data, condense = TRUE, metrics = FALSE,
  explanatory_name = "explanatory", estimate_name = "HR",
  estimate_suffix = "", p_name = "p", digits = c(2, 2, 3),
  confint_sep = "-", ...)

## S3 method for class 'coxme'
fit2df(.data, condense = TRUE, metrics = FALSE,
  explanatory_name = "explanatory", estimate_name = "HR",
  estimate_suffix = "", p_name = "p", digits = c(2, 2, 3),
  confint_sep = "-", ...)

## S3 method for class 'crrlist'
fit2df(.data, condense = TRUE, metrics = FALSE,
  explanatory_name = "explanatory", estimate_name = "HR",
  estimate_suffix = "", p_name = "p", digits = c(2, 2, 3),
  confint_sep = "-", ...)

## S3 method for class 'stanfit'
fit2df(.data, condense = TRUE, metrics = FALSE, remove_intercept = TRUE,
  explanatory_name = "explanatory", estimate_name = "OR",
  estimate_suffix = "", p_name = "p", digits = c(2, 2, 3),
  confint_sep = "-", ...)

## S3 method for class 'mipo'
fit2df(.data, condense = TRUE, metrics = FALSE, remove_intercept = TRUE,
  explanatory_name = "explanatory", estimate_name = "Coefficient",
  estimate_suffix = "", p_name = "p", digits = c(2, 2, 3), exp = FALSE,
  confint_level = 0.95, confint_sep = "-", ...)
... | Other arguments: |
.data | Output from |
condense | Logical: when true, effect estimates, confidence intervals and p-values are pasted conveniently together in a single cell. |
metrics | Logical: when true, useful model metrics are extracted. |
remove_intercept | Logical: remove the results for the intercept term. |
explanatory_name | Name for this column in output. |
estimate_name | Name for this column in output. |
estimate_suffix | Appended to estimate name. |
p_name | Name given to p-value estimate. |
digits | Number of digits to round to: (1) estimate, (2) confidence interval limits, (3) p-value. |
confint_level | The confidence level required. |
confint_sep | String to separate confidence intervals, typically "-" or " to ". |
exp | Currently GLM only. Exponentiate coefficients and confidence intervals. Defaults to TRUE. |
confint_type | One of |
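To make the condense, digits and confint_sep arguments concrete, here is a minimal base R sketch of the kind of pasting they describe. It does not call finalfit: the helper name condense_cell and the exact cell layout are assumptions for illustration, the built-in mtcars data stands in for colon_s, and Wald intervals from confint.default() stand in for whichever interval type the real method computes.

```r
# Hypothetical helper (not part of finalfit): paste estimate, CI and p-value
# into one character cell, mimicking what condense = TRUE describes.
condense_cell <- function(estimate, lower, upper, p,
                          digits = c(2, 2, 3), confint_sep = "-") {
  fmt <- function(x, d) formatC(x, format = "f", digits = d)
  paste0(fmt(estimate, digits[1]), " (",
         fmt(lower, digits[2]), confint_sep, fmt(upper, digits[2]),
         ", p=", fmt(p, digits[3]), ")")
}

fit <- glm(am ~ wt, data = mtcars, family = "binomial")
ci  <- confint.default(fit)               # Wald intervals on the log-odds scale
p   <- coef(summary(fit))[, "Pr(>|z|)"]

# Exponentiate to the odds ratio scale, then condense into one column.
cells  <- condense_cell(exp(coef(fit)), exp(ci[, 1]), exp(ci[, 2]), p)
result <- data.frame(explanatory = names(coef(fit)), OR = cells)[-1, ]  # drop intercept
print(result)
```

Changing confint_sep to " to " or digits to c(3, 3, 4) in the call alters only the formatting of the cell, not the underlying estimates.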
fit2df is a generic (S3) function for extracting model results.
A dataframe of model parameters. When metrics=TRUE, the output is a list of length two: one dataframe of model parameters and one of model metrics.
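Because fit2df is an S3 generic, the method that runs is chosen from the class of the fitted model. The sketch below shows that mechanism with a toy generic in base R. The names extract_df, extract_df.lm and extract_df.glm are hypothetical and are not finalfit functions; they return bare coefficients only.

```r
# Toy S3 generic (hypothetical; illustrates dispatch only).
extract_df <- function(.data, ...) UseMethod("extract_df")

# Method for lm objects: coefficients on their natural scale.
extract_df.lm <- function(.data, estimate_name = "Coefficient", ...) {
  out <- data.frame(explanatory = names(coef(.data)),
                    estimate = unname(coef(.data)))
  names(out)[2] <- estimate_name
  out
}

# Method for glm objects: exponentiated coefficients.
# glm objects have class c("glm", "lm"), so this method is found first.
extract_df.glm <- function(.data, estimate_name = "OR", ...) {
  out <- data.frame(explanatory = names(coef(.data)),
                    estimate = unname(exp(coef(.data))))
  names(out)[2] <- estimate_name
  out
}

lm_fit  <- lm(mpg ~ wt, data = mtcars)
glm_fit <- glm(am ~ wt, data = mtcars, family = "binomial")

lm_df  <- extract_df(lm_fit)   # dispatches to extract_df.lm
glm_df <- extract_df(glm_fit)  # dispatches to extract_df.glm
```

This is why the usage section lists one signature per class: each method can carry its own defaults, such as a different estimate_name.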
library(finalfit)
library(dplyr)
library(survival)
# glm
fit = glm(mort_5yr ~ age.factor + sex.factor + obstruct.factor + perfor.factor,
  data=colon_s, family="binomial")
fit %>%
  fit2df(estimate_suffix=" (multivariable)")
# glmlist
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = "mort_5yr"
colon_s %>%
  glmmulti(dependent, explanatory) %>%
  fit2df(estimate_suffix=" (multivariable)")
# glmerMod
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
random_effect = "hospital"
dependent = "mort_5yr"
colon_s %>%
  glmmixed(dependent, explanatory, random_effect) %>%
  fit2df(estimate_suffix=" (multilevel)")
# glmboot
## Note number of draws set to 100 just for speed in this example
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = "mort_5yr"
colon_s %>%
  glmmulti_boot(dependent, explanatory, R = 100) %>%
  fit2df(estimate_suffix=" (multivariable (BS CIs))")
# lm
fit = lm(nodes ~ age.factor + sex.factor + obstruct.factor + perfor.factor,
  data=colon_s)
fit %>%
  fit2df(estimate_suffix=" (multivariable)")
# lmerMod
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
random_effect = "hospital"
dependent = "nodes"
colon_s %>%
  lmmixed(dependent, explanatory, random_effect) %>%
  fit2df(estimate_suffix=" (multilevel)")
# coxphlist
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = "Surv(time, status)"
colon_s %>%
  coxphuni(dependent, explanatory) %>%
  fit2df(estimate_suffix=" (univariable)")
colon_s %>%
  coxphmulti(dependent, explanatory) %>%
  fit2df(estimate_suffix=" (multivariable)")
# coxph
fit = coxph(Surv(time, status) ~ age.factor + sex.factor + obstruct.factor + perfor.factor,
  data = colon_s)
fit %>%
  fit2df(estimate_suffix=" (multivariable)")
# crr: competing risks
melanoma = boot::melanoma
melanoma = melanoma %>%
  mutate(
    status_crr = ifelse(status == 2, 0,   # "still alive"
      ifelse(status == 1, 1,              # "died of melanoma"
        2)),                              # "died of other causes"
    sex = factor(sex),
    ulcer = factor(ulcer)
  )
dependent = c("Surv(time, status_crr)")
explanatory = c("sex", "age", "ulcer")
melanoma %>%
  summary_factorlist(dependent, explanatory, column = TRUE, fit_id = TRUE) %>%
  ff_merge(
    melanoma %>%
      crrmulti(dependent, explanatory) %>%
      fit2df(estimate_suffix = " (competing risks)")
  ) %>%
  select(-fit_id, -index) %>%
  dependent_label(melanoma, dependent)
Internal function, not called directly.
format_n_percent(n, percent, digits, digits_n = 0, na_include = TRUE)
n | Value |
percent | Value |
digits | Value |
digits_n | Value. Used with weighted frequency counts. |
na_include | When proportion missing, include in parentheses? |
finalfit model wrapper
Using finalfit conventions, produces mixed effects binomial logistic regression models for a set of explanatory variables against a binary dependent.
glmmixed(.data, dependent, explanatory, random_effect, ...)
.data | Dataframe. |
dependent | Character vector of length 1: name of dependent variable (must have 2 levels). |
explanatory | Character vector of any length: name(s) of explanatory variables. |
random_effect | Character vector of length 1, either, (1) name of random intercept variable, e.g. "var1", (automatically converted to "(1 | var1)"); or, (2) the full |
... | Other arguments to pass to |
Uses lme4::glmer with finalfit modelling conventions. Output can be passed to fit2df. This is currently only set up to take a single random effect as a random intercept. It can be updated in future to allow multiple random intercepts, random gradients and interactions on random effects if there is a need.
A list of multivariable lme4::glmer fitted model outputs. Output is of class glmerMod.
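The random_effect convention can be pictured as simple string handling before the formula reaches lme4::glmer. The following is a sketch under assumptions: build_mixed_formula is a hypothetical helper, and treating any string that already contains "|" as a complete random-effects term is a guess at the second form, not confirmed package behaviour.

```r
# Hypothetical helper (not part of finalfit): assemble a mixed model formula
# from character vectors.
build_mixed_formula <- function(dependent, explanatory, random_effect) {
  # A bare variable name becomes a random intercept term, "(1 | var1)".
  # Assumption: a string already containing "|" is passed through unchanged.
  if (!grepl("|", random_effect, fixed = TRUE)) {
    random_effect <- paste0("(1 | ", random_effect, ")")
  }
  rhs <- paste(c(explanatory, random_effect), collapse = " + ")
  as.formula(paste(dependent, "~", rhs))
}

f <- build_mixed_formula("mort_5yr", c("age.factor", "sex.factor"), "hospital")
print(f)
```

A formula built this way could then be handed to lme4::glmer(f, data = ..., family = "binomial"), which is the fitting function the section names.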
Other finalfit model wrappers: coxphmulti(), coxphuni(), crrmulti(), crruni(), glmmulti_boot(), glmmulti(), glmuni(), lmmixed(), lmmulti(), lmuni(), svyglmmulti(), svyglmuni()
library(finalfit)
library(dplyr)
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
random_effect = "hospital"
dependent = "mort_5yr"
colon_s %>%
  glmmixed(dependent, explanatory, random_effect) %>%
  fit2df(estimate_suffix=" (multilevel)")
finalfit model wrapper
Using finalfit conventions, produces a multivariable binomial logistic regression model for a set of explanatory variables against a binary dependent.
glmmulti(.data, dependent, explanatory, family = "binomial", weights = "", ...)
.data | Data frame. |
dependent | Character vector of length 1: name of dependent variable (must have 2 levels). |
explanatory | Character vector of any length: name(s) of explanatory variables. |
family | Character vector quoted or unquoted of the error distribution and link function to be used in the model, see |
weights | Character vector of length 1: name of variable for weighting. 'Prior weights' to be used in the fitting process. |
... | Other arguments to pass to |
Uses glm with finalfit modelling conventions. Output can be passed to fit2df.
A multivariable glm fitted model.
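In base R terms, a multivariable fit specified through character vectors amounts to assembling one formula and calling glm() once. A minimal sketch follows, using the built-in mtcars data rather than colon_s; fit_multi is a hypothetical stand-in and is not the package's implementation.

```r
# Hypothetical stand-in: one model containing every explanatory variable.
fit_multi <- function(.data, dependent, explanatory, family = "binomial", ...) {
  f <- reformulate(explanatory, response = dependent)  # e.g. am ~ wt + hp
  glm(f, data = .data, family = family, ...)
}

multi_fit <- fit_multi(mtcars, "am", c("wt", "hp"))
round(exp(coef(multi_fit)), 2)   # odds ratios, each adjusted for the other term
```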
Other finalfit model wrappers: coxphmulti(), coxphuni(), crrmulti(), crruni(), glmmixed(), glmmulti_boot(), glmuni(), lmmixed(), lmmulti(), lmuni(), svyglmmulti(), svyglmuni()
library(finalfit)
library(dplyr)
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = "mort_5yr"
colon_s %>%
  glmmulti(dependent, explanatory) %>%
  fit2df(estimate_suffix=" (multivariable)")
finalfit model wrapper
Using finalfit conventions, produces a multivariable binomial logistic regression model, with bootstrapped confidence intervals, for a set of explanatory variables against a binary dependent.
glmmulti_boot(.data, dependent, explanatory, R = 1000)
.data | Dataframe. |
dependent | Character vector of length 1: name of dependent variable (must have 2 levels). |
explanatory | Character vector of any length: name(s) of explanatory variables. |
R | Number of draws. |
Uses glm with finalfit modelling conventions. boot::boot is used to draw bootstrapped confidence intervals on fixed effect model coefficients. Output can be passed to fit2df.
A multivariable glm fitted model with bootstrapped confidence intervals. Output is of class glmboot.
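The idea of bootstrapped confidence intervals on model coefficients can be sketched in base R. Note the assumptions: this uses a hand-rolled resampling loop rather than boot::boot, simulated data rather than colon_s, and percentile intervals, which may not be the interval type the package reports.

```r
set.seed(1)
# Simulated binary-outcome data (assumption: stands in for a real dataset).
n  <- 300
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.5)
y  <- rbinom(n, 1, plogis(-0.5 + 0.8 * x1 + 0.5 * x2))
d  <- data.frame(y, x1, x2)

fit <- glm(y ~ x1 + x2, data = d, family = "binomial")

R <- 200  # number of bootstrap draws
boot_coefs <- t(replicate(R, {
  idx <- sample(nrow(d), replace = TRUE)   # resample rows with replacement
  coef(glm(y ~ x1 + x2, data = d[idx, ], family = "binomial"))
}))

# Percentile intervals, exponentiated to the odds ratio scale.
ci <- exp(t(apply(boot_coefs, 2, quantile, probs = c(0.025, 0.975))))
boot_table <- data.frame(OR = exp(coef(fit)), lower = ci[, 1], upper = ci[, 2])
print(round(boot_table, 2))
```

A larger R gives more stable interval limits at the cost of run time, which is why the package examples lower it to 100 purely for speed.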
Other finalfit model wrappers: coxphmulti(), coxphuni(), crrmulti(), crruni(), glmmixed(), glmmulti(), glmuni(), lmmixed(), lmmulti(), lmuni(), svyglmmulti(), svyglmuni()
library(finalfit)
library(dplyr)
## Note number of draws set to 100 just for speed in this example
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = "mort_5yr"
colon_s %>%
  glmmulti_boot(dependent, explanatory, R=100) %>%
  fit2df(estimate_suffix=" (multivariable (BS CIs))")
finalfit model wrapper
Using finalfit conventions, produces multiple univariable binomial logistic regression models for a set of explanatory variables against a binary dependent.
glmuni(.data, dependent, explanatory, family = "binomial", weights = "", ...)
.data | Data frame. |
dependent | Character vector of length 1: name of dependent variable (must have 2 levels). |
explanatory | Character vector of any length: name(s) of explanatory variables. |
family | Character vector quoted or unquoted of the error distribution and link function to be used in the model, see |
weights | Character vector of length 1: name of variable for weighting. 'Prior weights' to be used in the fitting process. |
... | Other arguments to pass to |
Uses glm with finalfit modelling conventions. Output can be passed to fit2df.
A list of univariable glm fitted model outputs. Output is of class glmlist.
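A univariable wrapper produces one model per explanatory variable rather than a single adjusted model. Here is a hedged base R sketch of that idea, again on mtcars. fit_uni is a hypothetical helper; the glmlist class tag mirrors the class named above, but the internal layout of the real object is an assumption.

```r
# Hypothetical helper: fit dependent ~ x separately for each x.
fit_uni <- function(.data, dependent, explanatory, family = "binomial") {
  fits <- lapply(explanatory, function(x) {
    glm(reformulate(x, response = dependent), data = .data, family = family)
  })
  names(fits) <- explanatory
  class(fits) <- c("glmlist", class(fits))  # class tag only; structure assumed
  fits
}

uni_fits <- fit_uni(mtcars, "am", c("wt", "hp"))

# Unadjusted odds ratio for the single term in each model.
sapply(uni_fits, function(m) exp(coef(m))[2])
```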
Other finalfit model wrappers: coxphmulti(), coxphuni(), crrmulti(), crruni(), glmmixed(), glmmulti_boot(), glmmulti(), lmmixed(), lmmulti(), lmuni(), svyglmmulti(), svyglmuni()
library(finalfit)
library(dplyr)
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = "mort_5yr"
colon_s %>%
  glmuni(dependent, explanatory) %>%
  fit2df(estimate_suffix=" (univariable)")
Produce hazard ratio table and plot from a Cox Proportional Hazards analysis, survival::coxph().
hr_plot(
  .data, dependent, explanatory,
  factorlist = NULL, coxfit = NULL, remove_ref = FALSE, breaks = NULL,
  column_space = c(-0.5, 0, 0.5), dependent_label = "Survival",
  prefix = "", suffix = ": HR (95% CI, p-value)",
  table_text_size = 4, title_text_size = 13,
  plot_opts = NULL, table_opts = NULL, ...
)
.data | Dataframe. |
dependent | Character vector of length 1: name of survival object in form |
explanatory | Character vector of any length: name(s) of explanatory variables. |
factorlist | Option to provide output directly from |
coxfit | Option to provide output directly from |
remove_ref | Logical. Remove reference level for factors. |
breaks | Manually specify x-axis breaks in format |
column_space | Adjust table column spacing. |
dependent_label | Main label for plot. |
prefix | Plots are titled by default with the dependent variable. This adds text before that label. |
suffix | Plots are titled with the dependent variable. This adds text after that label. |
table_text_size | Alter font size of table text. |
title_text_size | Alter font size of title text. |
plot_opts | A list of arguments to be appended to the ggplot call by "+". |
table_opts | A list of arguments to be appended to the ggplot table call by "+". |
... | Other parameters passed to |
Returns a table and plot produced in ggplot2.
Other finalfit plot functions: coefficient_plot(), ff_plot(), or_plot(), surv_plot()
# HR plot
library(finalfit)
library(dplyr)
library(ggplot2)
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = "Surv(time, status)"
colon_s %>%
  hr_plot(dependent, explanatory, dependent_label = "Survival")
colon_s %>%
  hr_plot(dependent, explanatory, dependent_label = "Survival",
    table_text_size=4, title_text_size=14,
    plot_opts=list(xlab("HR, 95% CI"), theme(axis.title = element_text(size=12))))
Labels to column names
labels_to_column(.data)
.data | Data frame or tibble. |
Data frame or tibble
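As a rough picture of what turning labels into column names means, the base R sketch below renames each column to its variable label where one is present. It assumes labels are stored in a "label" attribute on each column, and labels_as_names is a hypothetical analogue, not the package function.

```r
# Hypothetical base R analogue: replace each column name with that column's
# "label" attribute where one exists (assumed storage convention).
labels_as_names <- function(.data) {
  labs <- vapply(names(.data), function(nm) {
    lab <- attr(.data[[nm]], "label")
    if (is.null(lab)) nm else as.character(lab)
  }, character(1))
  names(.data) <- unname(labs)
  .data
}

df <- data.frame(sex.factor = factor(c("Female", "Male")), age = c(60, 71))
attr(df$sex.factor, "label") <- "Sex"   # only this column carries a label

names(labels_as_names(df))
```

Columns without a label keep their original name, so the unlabelled age column is left untouched.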
library(finalfit)
library(dplyr)
colon_s %>%
  select(sex.factor) %>%
  labels_to_column()
For use with forcats::fct_relabel.
labels_to_level(.data, .labels)
.data | Data frame or tibble. |
.labels | Output from |
Data frame or tibble
library(finalfit)
library(dplyr)
vlabels = extract_variable_label(colon_s)
colon_s %>%
  select(sex.factor, obstruct.factor) %>%
  tidyr::gather() %>%
  mutate(
    key = forcats::fct_relabel(key, labels_to_level, vlabels)
  )
finalfit model wrapper
Using finalfit conventions, produces mixed effects linear regression models for a set of explanatory variables against a continuous dependent.
lmmixed(.data, dependent, explanatory, random_effect, ...)
.data | Dataframe. |
dependent | Character vector of length 1: name of dependent variable (must be a continuous vector). |
explanatory | Character vector of any length: name(s) of explanatory variables. |
random_effect | Character vector of length 1, either, (1) name of random intercept variable, e.g. "var1", (automatically converted to "(1 | var1)"); or, (2) the full |
... | Other arguments to pass to |
Uses lme4::lmer with finalfit modelling conventions. Output can be passed to fit2df. This is currently only set up to take a single random effect as a random intercept. It can be updated in future to allow multiple random intercepts, random gradients and interactions on random effects if there is a need.
A list of multivariable lme4::lmer fitted model outputs. Output is of class lmerMod.
Other finalfit model wrappers: coxphmulti(), coxphuni(), crrmulti(), crruni(), glmmixed(), glmmulti_boot(), glmmulti(), glmuni(), lmmulti(), lmuni(), svyglmmulti(), svyglmuni()
library(finalfit)
library(dplyr)
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
random_effect = "hospital"
dependent = "nodes"
colon_s %>%
  lmmixed(dependent, explanatory, random_effect) %>%
  fit2df(estimate_suffix=" (multilevel)")
finalfit model wrapper
Using finalfit conventions, produces a multivariable linear regression model for a set of explanatory variables against a continuous dependent.
lmmulti(.data, dependent, explanatory, weights = "", ...)
.data | Dataframe. |
dependent | Character vector of length 1: name of dependent variable (must be a continuous vector). |
explanatory | Character vector of any length: name(s) of explanatory variables. |
weights | Character vector of length 1: name of variable for weighting. 'Prior weights' to be used in the fitting process. |
... | Other arguments to pass to |
Uses lm with finalfit modelling conventions. Output can be passed to fit2df.
A multivariable lm fitted model.
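The weights argument takes a column name as a string rather than a numeric vector. One plausible way to picture this is sketched below in base R. fit_lm_multi is a hypothetical helper, the temporary column name .w is invented for the example, and reading the empty-string default as "no weights" is an assumption based on the usage line.

```r
# Hypothetical helper: multivariable lm with optional prior weights by name.
fit_lm_multi <- function(.data, dependent, explanatory, weights = "") {
  f <- reformulate(explanatory, response = dependent)
  if (weights == "") {
    lm(f, data = .data)                     # assumed meaning of the "" default
  } else {
    .data$.w <- .data[[weights]]            # look the weighting column up by name
    lm(f, data = .data, weights = .w)       # pass it to lm() as prior weights
  }
}

unweighted <- fit_lm_multi(mtcars, "mpg", c("wt", "hp"))
weighted   <- fit_lm_multi(mtcars, "mpg", c("wt", "hp"), weights = "cyl")

rbind(unweighted = coef(unweighted), weighted = coef(weighted))
```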
Other finalfit model wrappers: coxphmulti(), coxphuni(), crrmulti(), crruni(), glmmixed(), glmmulti_boot(), glmmulti(), glmuni(), lmmixed(), lmuni(), svyglmmulti(), svyglmuni()
library(finalfit)
library(dplyr)
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = "nodes"
colon_s %>%
  lmmulti(dependent, explanatory) %>%
  fit2df()
finalfit model wrapper
Using finalfit conventions, produces multiple univariable linear regression models for a set of explanatory variables against a continuous dependent.
lmuni(.data, dependent, explanatory, weights = "", ...)
.data | Dataframe. |
dependent | Character vector of length 1: name of dependent variable (must be a continuous vector). |
explanatory | Character vector of any length: name(s) of explanatory variables. |
weights | Character vector of length 1: name of variable for weighting. 'Prior weights' to be used in the fitting process. |
... | Other arguments to pass to |
Uses lm with finalfit modelling conventions. Output can be passed to fit2df.
A list of univariable lm fitted model outputs. Output is of class lmlist.
Other finalfit model wrappers: coxphmulti(), coxphuni(), crrmulti(), crruni(), glmmixed(), glmmulti_boot(), glmmulti(), glmuni(), lmmixed(), lmmulti(), svyglmmulti(), svyglmuni()
library(finalfit)
library(dplyr)
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = "nodes"
colon_s %>%
  lmuni(dependent, explanatory) %>%
  fit2df()
Internal, not usually called directly
metrics_hoslem(y, yhat, g = 10, digits = c(2, 3))
y | Observed y, usually of the form |
yhat | Predicted y_hat, usually of the form |
g | Number of bins to calculate quantiles. |
digits | Number of decimal places of form |
Character string of chi-sq result, df, and p-value. Significant p-value suggests poor fit.
Adapted from Peter Solymos.
https://github.com/psolymos/ResourceSelection/blob/master/R/hoslem.test.R
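For readers unfamiliar with the statistic, the following base R sketch shows the standard Hosmer-Lemeshow construction: observations are binned by quantiles of the predicted probability, then observed and expected counts are compared with a chi-squared statistic on g - 2 degrees of freedom. hoslem_sketch is a hypothetical name, the data are simulated, and the sketch returns numbers rather than the formatted character string described above.

```r
hoslem_sketch <- function(y, yhat, g = 10) {
  # Bin observations by quantiles of the predicted probability.
  breaks <- unique(quantile(yhat, probs = seq(0, 1, length.out = g + 1)))
  bin <- cut(yhat, breaks = breaks, include.lowest = TRUE)
  # Observed and expected counts of events and non-events in each bin.
  obs1 <- tapply(y, bin, sum)
  exp1 <- tapply(yhat, bin, sum)
  obs0 <- tapply(1 - y, bin, sum)
  exp0 <- tapply(1 - yhat, bin, sum)
  chisq <- sum((obs1 - exp1)^2 / exp1 + (obs0 - exp0)^2 / exp0)
  df <- g - 2
  list(chisq = chisq, df = df, p = pchisq(chisq, df, lower.tail = FALSE))
}

set.seed(42)
x <- rnorm(500)
y <- rbinom(500, 1, plogis(0.3 + x))   # simulated outcome (assumption)
fit <- glm(y ~ x, family = "binomial")

hl <- hoslem_sketch(fit$y, fitted(fit))
str(hl)
```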
fit = glm(mort_5yr~age.factor+extent.factor, data=colon_s, family="binomial")
metrics_hoslem(fit$y, fit$fitted)
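For readers unfamiliar with the Hosmer-Lemeshow test, here is a rough base R sketch of the statistic on simulated data. It is an illustration under stated assumptions (quantile-based bins, g - 2 degrees of freedom) and is not the code used by metrics_hoslem; the function name hoslem_sketch is hypothetical.

```r
# Simulate a binary outcome and fit a logistic regression
set.seed(42)
n <- 500
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + x))
fit <- glm(y ~ x, family = binomial)

# Hypothetical helper: Hosmer-Lemeshow chi-squared statistic
hoslem_sketch <- function(y, yhat, g = 10) {
  # Bin predicted probabilities into g quantile groups
  breaks <- unique(quantile(yhat, probs = seq(0, 1, length.out = g + 1)))
  bins <- cut(yhat, breaks = breaks, include.lowest = TRUE)
  n_bin <- as.vector(table(bins))
  # Observed and expected counts of events (1) and non-events (0) per bin
  obs1 <- as.vector(tapply(y, bins, sum))
  exp1 <- as.vector(tapply(yhat, bins, sum))
  obs0 <- n_bin - obs1
  exp0 <- n_bin - exp1
  chisq <- sum((obs1 - exp1)^2 / exp1 + (obs0 - exp0)^2 / exp0)
  df <- length(n_bin) - 2
  c(chisq = chisq, df = df, p = pchisq(chisq, df = df, lower.tail = FALSE))
}

res <- hoslem_sketch(fit$y, fit$fitted.values)
print(res)
```

A small p-value would indicate that observed and expected event counts disagree across the bins, i.e. poor calibration.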
Compare missing data
missing_compare( .data, dependent, explanatory, p = TRUE, na_include = FALSE, ... )
.data |
Dataframe. |
dependent |
Variable to test missingness against other variables with. |
explanatory |
Variables to have missingness tested against. |
p |
Logical: Include null hypothesis statistical test. |
na_include |
Include missing data in explanatory variables as a factor level. |
... |
Other arguments to |
A dataframe comparing missing data in the dependent variable across explanatory variables. Continuous data are compared with an Analysis of Variance F-test by default. Discrete data are compared with a chi-squared test.
library(finalfit)

explanatory = c("age", "age.factor", "extent.factor", "perfor.factor")
dependent = "mort_5yr"

colon_s %>%
  ff_glimpse(dependent, explanatory)

colon_s %>%
  missing_pattern(dependent, explanatory)

colon_s %>%
  missing_compare(dependent, explanatory)
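The comparison described above can be sketched in base R, assuming the built-in airquality data rather than colon_s. The sketch builds a missing/not-missing indicator for one variable and tests it against a continuous variable (ANOVA F-test) and a discrete variable (chi-squared test). It is an illustration only and does not reproduce the table layout of missing_compare; all object names are hypothetical.

```r
# Indicator of missingness in the variable of interest (Ozone)
ozone_missing <- factor(is.na(airquality$Ozone),
                        levels = c(FALSE, TRUE),
                        labels = c("Not missing", "Missing"))

# Continuous explanatory variable: F-test of Temp by missingness group
aov_fit <- aov(Temp ~ ozone_missing, data = airquality)
f_p <- summary(aov_fit)[[1]][["Pr(>F)"]][1]

# Discrete explanatory variable: chi-squared test of Month by missingness
tab <- table(Month = airquality$Month, Missing = ozone_missing)
chi_p <- chisq.test(tab)$p.value

print(tab)
print(c(f_test_p = f_p, chisq_p = chi_p))
```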
Summary of missing values
missing_glimpse(.data, dependent = NULL, explanatory = NULL, digits = 1)
.data |
Data frame. |
dependent |
Optional character vector: name(s) of dependent variable(s). |
explanatory |
Optional character vector: name(s) of explanatory variable(s). |
digits |
Number of decimal places to show for percentage missing. |
Data frame.
colon_s %>% missing_glimpse()
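As a rough illustration of what a per-variable missing-value summary contains, the base R sketch below tabulates counts and percentages for the built-in airquality data. The column names (var_type, n, missing_n, missing_percent) are assumptions for this sketch and may not match the output of missing_glimpse.

```r
# Count missing values per column and express them as a percentage
n_missing <- colSums(is.na(airquality))
glimpse <- data.frame(
  var_type = vapply(airquality, function(v) class(v)[1], character(1)),
  n = nrow(airquality) - n_missing,
  missing_n = n_missing,
  missing_percent = round(100 * n_missing / nrow(airquality), 1)
)
print(glimpse)
```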
Compare the occurrence of missing values in all variables by each other.
It is suggested to limit the number of variables to a maximum of around six.
Dependent and explanatory are for convenience of variable selection, are optional, and have no other specific function.
missing_pairs( .data, dependent = NULL, explanatory = NULL, use_labels = TRUE, title = NULL, position = "stack", showXAxisPlotLabels = TRUE, showYAxisPlotLabels = FALSE )
.data |
Data frame. |
dependent |
Character vector. Optional name of dependent variable. |
explanatory |
Character vector. Optional name(s) of explanatory variables. |
use_labels |
Use variable label names in plot labelling. |
title |
Character vector. Optional title for plot. |
position |
For discrete variables, choose "stack" or "fill" to show counts or proportions. |
showXAxisPlotLabels |
Show x-axis plot labels. |
showYAxisPlotLabels |
Show y-axis plot labels. |
A plot matrix comparing missing values in all variables against each other.
## Not run:
explanatory = c("age", "nodes", "age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = 'mort_5yr'
colon_s %>%
  missing_pairs(dependent, explanatory)
## End(Not run)
finalfit models
Using finalfit conventions, produces a missing data matrix using md.pattern.
missing_pattern( .data, dependent = NULL, explanatory = NULL, rotate.names = TRUE, ... )
.data |
Data frame. Missing values must be coded |
dependent |
Character vector usually of length 1, name of dependent variable. |
explanatory |
Character vector of any length: name(s) of explanatory variables. |
rotate.names |
Logical. Should the orientation of variable names on the plot be vertical? |
... |
pass other arguments such as |
A matrix with ncol(x)+1 columns, in which each row corresponds to a missing data pattern (1=observed, 0=missing). Rows and columns are sorted in increasing amounts of missing information. The last column and row contain row and column counts, respectively.
library(finalfit)
library(dplyr)

explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = "mort_5yr"

colon_s %>%
  missing_pattern(dependent, explanatory)
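The 1=observed, 0=missing coding of a pattern matrix can be illustrated without any packages. The base R sketch below, which assumes the built-in airquality data, collapses each row into a pattern string and counts how often each pattern occurs. It shows the idea only and does not reproduce the sorting or margins of the md.pattern output.

```r
# 1 = observed, 0 = missing, one column per variable
observed <- (!is.na(airquality)) * 1L

# Collapse each row to a pattern string such as "111111" (fully observed)
pattern_id <- apply(observed, 1, paste, collapse = "")

# Frequency of each missing data pattern, most common first
pattern_counts <- sort(table(pattern_id), decreasing = TRUE)

print(colnames(observed))
print(pattern_counts)
```

The position of each digit in a pattern string corresponds to the column order printed first.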
Create a plot of missing values by observations on the x-axis and variable on the y-axis. Dependent and explanatory are for convenience and are optional.
missing_plot( .data, dependent = NULL, explanatory = NULL, use_labels = TRUE, title = NULL, plot_opts = NULL )
.data |
Data frame. |
dependent |
Character vector. Optional name of dependent variable. |
explanatory |
Character vector. Optional name(s) of explanatory variables. |
use_labels |
Use variable label names in plot labelling. |
title |
Character vector. Optional title for plot. |
plot_opts |
A list of arguments to be appended to the ggplot call by "+". |
Heat map of missing values in dataset.
colon_s %>% missing_plot()
Create predictorMatrix for use with mice
missing_predictorMatrix( .data, drop_from_imputed = NULL, drop_from_imputer = NULL )
.data |
Data frame. |
drop_from_imputed |
Quoted names of variables not to impute. |
drop_from_imputer |
Quoted names of variables not to use in imputation algorithm. |
Matrix formatted for predictorMatrix argument in mice.
library(finalfit)
library(mice)
library(dplyr)

# Create some extra missing data
## Smoking missing completely at random
set.seed(1)
colon_s$smoking_mcar =
  sample(c("Smoker", "Non-smoker", NA),
         dim(colon_s)[1], replace=TRUE,
         prob = c(0.2, 0.7, 0.1)) %>%
  factor() %>%
  ff_label("Smoking (MCAR)")

## Make smoking missing conditional on patient sex
colon_s$smoking_mar[colon_s$sex.factor == "Female"] =
  sample(c("Smoker", "Non-smoker", NA),
         sum(colon_s$sex.factor == "Female"),
         replace = TRUE, prob = c(0.1, 0.5, 0.4))

colon_s$smoking_mar[colon_s$sex.factor == "Male"] =
  sample(c("Smoker", "Non-smoker", NA),
         sum(colon_s$sex.factor == "Male"),
         replace=TRUE, prob = c(0.15, 0.75, 0.1))
colon_s$smoking_mar = factor(colon_s$smoking_mar)%>%
  ff_label("Smoking (MAR)")

explanatory = c("age", "sex.factor", "nodes", "obstruct.factor", "smoking_mar")
dependent = "mort_5yr"

colon_s %>%
  select(dependent, explanatory) %>%
  missing_predictorMatrix(drop_from_imputed = c("obstruct.factor", "mort_5yr")) -> predM

colon_s %>%
  select(dependent, explanatory) %>%
  mice(m = 2, predictorMatrix = predM) %>% # e.g. m=10 when for real
  # Run logistic regression on each imputed set
  with(glm(formula(ff_formula(dependent, explanatory)), family="binomial")) %>%
  pool() %>%
  summary(conf.int = TRUE, exponentiate = TRUE) %>%
  # Jiggle into finalfit format
  mutate(explanatory_name = rownames(.)) %>%
  select(explanatory_name, estimate, `2.5 %`, `97.5 %`, p.value) %>%
  condense_fit(estimate_suffix = " (multiple imputation)") %>%
  remove_intercept() -> fit_imputed
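To show the shape of the matrix that mice expects, here is a hedged base R sketch. In a mice predictorMatrix, rows are the variables being imputed and columns are the predictors, with 1 meaning "use this column to impute this row". The function predictor_matrix_sketch is hypothetical and only mimics the documented arguments; it is not the finalfit source.

```r
# Hypothetical helper: build a predictor matrix by hand
predictor_matrix_sketch <- function(.data,
                                    drop_from_imputed = NULL,
                                    drop_from_imputer = NULL) {
  vars <- names(.data)
  pm <- matrix(1, nrow = length(vars), ncol = length(vars),
               dimnames = list(vars, vars))
  diag(pm) <- 0                      # a variable never predicts itself
  if (!is.null(drop_from_imputed)) {
    pm[drop_from_imputed, ] <- 0     # rows: variables that are not imputed
  }
  if (!is.null(drop_from_imputer)) {
    pm[, drop_from_imputer] <- 0     # columns: variables not used as predictors
  }
  pm
}

pm <- predictor_matrix_sketch(airquality,
                              drop_from_imputed = "Wind",
                              drop_from_imputer = "Day")
print(pm)
```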
Produce an odds ratio table and plot from a glm() or lme4::glmer() model.
or_plot( .data, dependent, explanatory, random_effect = NULL, factorlist = NULL, glmfit = NULL, confint_type = NULL, confint_level = 0.95, remove_ref = FALSE, breaks = NULL, column_space = c(-0.5, 0, 0.5), dependent_label = NULL, prefix = "", suffix = NULL, table_text_size = 4, title_text_size = 13, plot_opts = NULL, table_opts = NULL, ... )
.data |
Data frame. |
dependent |
Character vector of length 1: name of dependent variable (must have 2 levels). |
explanatory |
Character vector of any length: name(s) of explanatory variables. |
random_effect |
Character vector of length 1, name of random effect variable. |
factorlist |
Option to provide output directly from
|
glmfit |
Option to provide output directly from |
confint_type |
One of |
confint_level |
The confidence level required. |
remove_ref |
Logical. Remove reference level for factors. |
breaks |
Manually specify x-axis breaks in format |
column_space |
Adjust table column spacing. |
dependent_label |
Main label for plot. |
prefix |
Plots are titled by default with the dependent variable. This adds text before that label. |
suffix |
Plots are titled with the dependent variable. This adds text after that label. |
table_text_size |
Alter font size of table text. |
title_text_size |
Alter font size of title text. |
plot_opts |
A list of arguments to be appended to the ggplot call by "+". |
table_opts |
A list of arguments to be appended to the ggplot table call by "+". |
... |
Other parameters. |
Returns a table and plot produced in ggplot2.
Other finalfit plot functions: coefficient_plot(), ff_plot(), hr_plot(), surv_plot()
library(finalfit)
library(dplyr)
library(ggplot2)

# OR plot
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = "mort_5yr"
colon_s %>%
  or_plot(dependent, explanatory)

colon_s %>%
  or_plot(dependent, explanatory, table_text_size=4, title_text_size=14,
    plot_opts=list(xlab("OR, 95% CI"), theme(axis.title = element_text(size=12))))
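The numbers behind an odds ratio plot can be derived from any logistic regression fit. The base R sketch below assumes the built-in mtcars data and Wald-type confidence intervals from confint.default(); or_plot may use a different interval type, so treat this as an illustration of the calculation rather than a reproduction of the function.

```r
# Logistic regression: engine shape (vs) by fuel economy (mpg)
fit <- glm(vs ~ mpg, data = mtcars, family = binomial)

# Exponentiate coefficients and Wald confidence limits to get odds ratios
ci <- confint.default(fit, level = 0.95)
or_table <- data.frame(
  term = names(coef(fit)),
  OR = exp(coef(fit)),
  lower = exp(ci[, 1]),
  upper = exp(ci[, 2]),
  row.names = NULL
)

# The intercept is not usually shown in an odds ratio table
or_table <- or_table[or_table$term != "(Intercept)", ]
print(or_table)
```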
Internal function, not called directly
p_tidy(x, digits, prefix = "=")
x |
Numeric vector of values to round |
digits |
Integer of length one: value to round to. |
prefix |
Appended in front of values for use with |
For example, with 3 decimal places the result is 0.100, not 0.1. Note that this function will convert 0.000 to <0.001. All other values are prefixed with "=" by default.
Vector of strings.
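The behaviour described above can be approximated in base R with formatC(). The helper p_tidy_sketch below is hypothetical and written only to illustrate the rule (keep trailing zeros, replace an all-zero result with a "less than" label); it is not the package source.

```r
# Hypothetical helper mimicking the documented behaviour
p_tidy_sketch <- function(x, digits, prefix = "=") {
  out <- formatC(x, digits = digits, format = "f")      # keeps trailing zeros
  zero <- formatC(0, digits = digits, format = "f")     # e.g. "0.000"
  floor_label <- paste0("<", formatC(10^-digits, digits = digits, format = "f"))
  ifelse(out == zero, floor_label, paste0(prefix, out))
}

p_strings <- p_tidy_sketch(c(0.1, 0.0004, 0.0456), digits = 3)
print(p_strings)
# "=0.100" "<0.001" "=0.046"
```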
Remove duplicates and replace
rm_duplicates(.var, fromLast = FALSE, replacement = "")
.var |
Vector. |
fromLast |
Logical. Consider duplication from last to first. |
replacement |
Character for what to replace duplicate with. |
Character vector.
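No example is given for this function, so here is a small base R sketch of the idea using duplicated(). The helper name rm_duplicates_sketch is hypothetical and the real function may handle types differently.

```r
# Hypothetical helper: blank out repeated values, keeping the first
# (or, with fromLast = TRUE, the last) occurrence
rm_duplicates_sketch <- function(.var, fromLast = FALSE, replacement = "") {
  .var <- as.character(.var)
  .var[duplicated(.var, fromLast = fromLast)] <- replacement
  .var
}

labels <- c("Age", "Age", "Sex", "Sex", "Sex")
first_kept <- rm_duplicates_sketch(labels)
last_kept <- rm_duplicates_sketch(labels, fromLast = TRUE)
print(first_kept)  # "Age" ""    "Sex" ""    ""
print(last_kept)   # ""    "Age" ""    ""    "Sex"
```

This is the kind of clean-up that makes a table label column show each variable name once rather than on every row.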
It is common to want to remove cases/rows where all variables in a particular set are missing, e.g. all symptom variables are missing in a health care dataset.
rm_empty_block(.data, ...)
.data |
Dataframe. |
... |
Unquoted variable/column names. |
Data frame.
# Pretend that we want to remove rows that are missing in group1, group2, and group3
# but keep rest of dataset.
colon_s %>%
  dplyr::mutate(
    group1 = rep(c(NA, 1), length.out = 929),
    group2 = rep(c(NA, 1), length.out = 929),
    group3 = rep(c(NA, 1), length.out = 929)
  ) %>%
  rm_empty_block(group1, group2, group3) %>%
  head()
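The row filter itself is simple to express in base R. The sketch below uses a small made-up data frame (all values are illustrative) and drops rows where every variable in the block is missing, while keeping rows that are missing elsewhere.

```r
# Toy data: rows 1 and 3 are missing in both block variables
df <- data.frame(
  id = 1:4,
  group1 = c(NA, 1, NA, 2),
  group2 = c(NA, NA, NA, 3),
  other = c("a", NA, "c", "d")
)

block <- c("group1", "group2")

# TRUE where no variable in the block is observed
all_missing <- rowSums(!is.na(df[block])) == 0
kept <- df[!all_missing, ]
print(kept)
```

Row 2 is kept even though it is missing in group2 and other, because group1 is observed.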
Round values while keeping trailing zeros, e.g. for 3 decimal places the result is 1.200, not 1.2.
round_tidy(x, digits)
x |
Numeric vector of values to round |
digits |
Integer of length one: value to round to. |
Vector of strings.
round_tidy(0.01023, 3)
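The difference between ordinary rounding and rounding that keeps trailing zeros can be seen with base R alone. This sketch uses formatC() as a stand-in; it is not necessarily how round_tidy is implemented.

```r
# round() drops trailing zeros when the number is printed or converted
plain <- as.character(round(1.2, 3))

# formatC() with format = "f" keeps a fixed number of decimal places
rounded <- formatC(c(1.2, 0.01023), digits = 3, format = "f")

print(plain)    # "1.2"
print(rounded)  # "1.200" "0.010"
```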
When producing conditional estimates from a regression model, it is often useful to set variables not of interest to their mode for factors and mean or median for numerics when creating the newdata object.
summary_df(.data, cont = "mean")
.data |
A data frame or tibble. |
cont |
One of "mean" or "median": the summary estimate for continuous variables. |
A data frame or tibble with the mode for factors and mean/median for continuous variables.
library(dplyr)

colon_s %>%
  select(age, sex.factor, obstruct.factor, perfor.factor) %>%
  summary_df()

colon_s %>%
  select(age, sex.factor, obstruct.factor, perfor.factor) %>%
  summary_df(cont = "median")
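The "mode for factors, mean or median for numerics" rule can be sketched in base R as follows. The helper summarise_column and the toy data are hypothetical, and ties in the mode are resolved by taking the first most frequent level, which is an assumption of this sketch.

```r
# Hypothetical helper: one summary value per column
summarise_column <- function(v, cont = "mean") {
  if (is.numeric(v)) {
    if (cont == "median") median(v, na.rm = TRUE) else mean(v, na.rm = TRUE)
  } else {
    counts <- table(v)
    # Most frequent level, returned as a factor with the original levels
    factor(names(counts)[which.max(counts)], levels = levels(factor(v)))
  }
}

df <- data.frame(
  age = c(40, 50, 60, 90),
  sex = factor(c("F", "M", "M", "M"))
)

newdata_mean <- as.data.frame(lapply(df, summarise_column))
newdata_median <- as.data.frame(lapply(df, summarise_column, cont = "median"))
print(newdata_mean)
print(newdata_median)
```

A one-row data frame like this is the usual starting point for a newdata object, with the variables of interest then varied by hand.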
A function that takes a single dependent variable with a vector of explanatory variable names (continuous or categorical variables) to produce a summary table.
summary_factorlist( .data, dependent = NULL, explanatory = NULL, formula = NULL, cont = "mean", cont_nonpara = NULL, cont_cut = 5, cont_range = TRUE, p = FALSE, p_cont_para = "aov", p_cat = "chisq", column = TRUE, total_col = FALSE, orderbytotal = FALSE, digits = c(1, 1, 3, 1, 0), na_include = FALSE, na_include_dependent = FALSE, na_complete_cases = FALSE, na_to_p = FALSE, na_to_prop = TRUE, fit_id = FALSE, add_dependent_label = FALSE, dependent_label_prefix = "Dependent: ", dependent_label_suffix = "", add_col_totals = FALSE, include_col_totals_percent = TRUE, col_totals_rowname = NULL, col_totals_prefix = "", add_row_totals = FALSE, include_row_totals_percent = TRUE, include_row_missing_col = TRUE, row_totals_colname = "Total N", row_missing_colname = "Missing N", catTest = NULL, weights = NULL )
.data |
Dataframe. |
dependent |
Character vector of length 1: name of dependent variable (2 to 5 factor levels). |
explanatory |
Character vector of any length: name(s) of explanatory variables. |
formula |
an object of class "formula" (or one that can be coerced to that class). Optional instead of standard dependent/explanatory format. Do not include if using dependent/explanatory. |
cont |
Summary for continuous explanatory variables: "mean" (standard deviation) or "median" (interquartile range). If "median" then non-parametric hypothesis test performed (see below). |
cont_nonpara |
Numeric vector of form e.g. |
cont_cut |
Numeric: number of unique values in continuous variable at which to consider it a factor. |
cont_range |
Logical. Median is shown with 1st and 3rd quartiles. |
p |
Logical: Include null hypothesis statistical test. |
p_cont_para |
Character. Continuous variable parametric test. One of either "aov" (analysis of variance) or "t.test" for Welch two sample t-test. Note continuous non-parametric test is always Kruskal Wallis (kruskal.test) which in two-group setting is equivalent to Mann-Whitney U / Wilcoxon rank sum test. For continuous dependent and continuous explanatory, the parametric test p-value returned is for the Pearson correlation coefficient. The non-parametric equivalent is the p-value for the Spearman correlation coefficient. |
p_cat |
Character. Categorical variable test. One of either "chisq" or "fisher". |
column |
Logical: Compute margins by column rather than row. |
total_col |
Logical: include a total column summing across factor levels. |
orderbytotal |
Logical: order final table by total column high to low. |
digits |
Number of digits to round to (1) mean/median, (2) standard deviation / interquartile range, (3) p-value, (4) count percentage, (5) weighted count. |
na_include |
Logical: make explanatory variables missing data explicit
( |
na_include_dependent |
Logical: make dependent variable missing data explicit. |
na_complete_cases |
Logical: include only rows with complete data. |
na_to_p |
Logical: include missing as group in statistical test. |
na_to_prop |
Logical: include missing in calculation of column proportions. |
fit_id |
Logical: allows merging via |
add_dependent_label |
Add the name of the dependent label to the top left of table. |
dependent_label_prefix |
Add text before dependent label. |
dependent_label_suffix |
Add text after dependent label. |
add_col_totals |
Logical. Include column total n. |
include_col_totals_percent |
Include column percentage of total. |
col_totals_rowname |
Logical. Row name for column totals. |
col_totals_prefix |
Character. Prefix to column totals, e.g. "N=". |
add_row_totals |
Logical. Include row totals. Note this differs from
|
include_row_totals_percent |
Include row percentage of total. |
include_row_missing_col |
Logical. Include missing data total for each
row. Only used when |
row_totals_colname |
Character. Column name for row totals. |
row_missing_colname |
Character. Column name for missing data totals for each row. |
catTest |
Deprecated. See |
weights |
Character vector of length 1: name of column to use for weights. Explanatory continuous variables are multiplied by weights. Explanatory categorical variables are counted with a frequency weight (sum(weights)). |
This function aims to produce publication-ready summary tables for categorical or continuous dependent variables. It usually takes a categorical dependent variable to produce a cross table of counts and proportions expressed as percentages or summarised continuous explanatory variables. However, it will take a continuous dependent variable to produce mean (standard deviation) or median (interquartile range) for use with linear regression models.
Returns a factorlist dataframe.
fit2df, ff_column_totals, ff_row_totals, ff_label, ff_glimpse, ff_percent_only. For lots of examples, see https://finalfit.org/
library(finalfit)
library(dplyr)

# Load example dataset, modified version of survival::colon
data(colon_s)

# Table 1 - Patient demographics ----
explanatory = c("age", "age.factor", "sex.factor", "obstruct.factor")
dependent = "perfor.factor"
colon_s %>%
  summary_factorlist(dependent, explanatory, p=TRUE)

# summary_factorlist() is also commonly used to summarise any number of
# variables by an outcome variable (say dead yes/no).

# Table 2 - 5 yr mortality ----
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = "mort_5yr"
colon_s %>%
  summary_factorlist(dependent, explanatory)
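For a sense of what a single categorical row block of such a table contains, the base R sketch below builds counts with column percentages and a Fisher's exact test p-value from the built-in mtcars data. The "n (%)" cell layout is an assumption made for illustration and the output is not formatted like a factorlist.

```r
# Cross table: explanatory (cyl) in rows, dependent (am) in columns
tab <- table(cyl = mtcars$cyl, am = mtcars$am)

# Column percentages: each column of the dependent sums to 100
col_pct <- 100 * prop.table(tab, margin = 2)

# Combine into "n (%)" cells
cells <- matrix(
  sprintf("%d (%.1f)", as.vector(tab), as.vector(col_pct)),
  nrow = nrow(tab),
  dimnames = dimnames(tab)
)

# Categorical test; Fisher's exact test suits the small counts here
p_value <- fisher.test(tab)$p.value

print(cells)
print(p_value)
```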
A function that takes a single dependent variable with a vector of explanatory variable names (continuous or categorical variables) to produce a summary table.
summary_factorlist_stratified( .data, ..., split, colname_sep = "|", level_max_length = 10, n_common_cols = 2 )
.data |
Dataframe. |
... |
Arguments to |
split |
Quoted variable name to stratify columns by. |
colname_sep |
Separator for creation of new column name. |
level_max_length |
Maximum length of each factor level name contributing to the column name. |
n_common_cols |
Number of common columns in |
This function aims to produce publication-ready summary tables for
categorical or continuous dependent variables. It usually takes a categorical
dependent variable to produce a cross table of counts and proportions
expressed as percentages or summarised continuous explanatory variables.
However, it will take a continuous dependent variable to produce mean
(standard deviation) or median (interquartile range) for use with linear
regression models.
Stratify a summary_factorlist table (beta testing)
Dataframe.
# Table 1 - Perforation status stratified by sex ----
explanatory = c("age", "obstruct.factor")
dependent = "perfor.factor"

# Single split
colon_s %>%
  summary_factorlist_stratified(dependent, explanatory, split = c("sex.factor"))

# Double split
colon_s %>%
  summary_factorlist_stratified(dependent, explanatory, split = c("sex.factor", "age.factor"))
Produce a survival curve plot and number-at-risk table using survminer::ggsurvplot and finalfit conventions.
surv_plot(.data, dependent, explanatory, ...)
.data |
Dataframe. |
dependent |
Character vector of length 1: Survival object of the form |
explanatory |
Character vector of max length 2: quoted name(s) of explanatory variables. |
... |
Arguments passed to |
Returns a table and plot produced in ggplot2.
Other finalfit plot functions: coefficient_plot(), ff_plot(), hr_plot(), or_plot()
library(finalfit)
library(dplyr)

# Survival plot
data(colon_s)
explanatory = c("perfor.factor")
dependent = "Surv(time, status)"
colon_s %>%
  surv_plot(dependent, explanatory, xlab="Time (days)", pval=TRUE, legend="none")
Wrapper for svyglm. Fit a generalised linear model to data from a complex survey design, with inverse-probability weighting and design-based standard errors.
svyglmmulti(design, dependent, explanatory, ...)
design |
Survey design. |
dependent |
Character vector of length 1: name of dependent variable (must have 2 levels). |
explanatory |
Character vector of any length: name(s) of explanatory variables. |
... |
Other arguments to be passed to |
A list of univariable fitted model outputs. Output is of class svyglmlist.
Other finalfit model wrappers: coxphmulti(), coxphuni(), crrmulti(), crruni(), glmmixed(), glmmulti_boot(), glmmulti(), glmuni(), lmmixed(), lmmulti(), lmuni(), svyglmuni()
# Examples taken from survey::svyglm() help page.
library(finalfit)
library(survey)
library(dplyr)

data(api)
apistrat = apistrat %>%
  mutate(
    api00 = ff_label(api00, "API in 2000 (api00)"),
    ell = ff_label(ell, "English language learners (percent)(ell)"),
    meals = ff_label(meals, "Meals eligible (percent)(meals)"),
    mobility = ff_label(mobility, "First year at the school (percent)(mobility)"),
    sch.wide = ff_label(sch.wide, "School-wide target met (sch.wide)")
  )

# Linear example
dependent = "api00"
explanatory = c("ell", "meals", "mobility")

# Stratified design
dstrat = svydesign(id=~1,strata=~stype, weights=~pw, data=apistrat, fpc=~fpc)

# Univariable fit
fit_uni = dstrat %>%
  svyglmuni(dependent, explanatory) %>%
  fit2df(estimate_suffix = " (univariable)")

# Multivariable fit
fit_multi = dstrat %>%
  svyglmmulti(dependent, explanatory) %>%
  fit2df(estimate_suffix = " (multivariable)")

# Pipe together
apistrat %>%
  summary_factorlist(dependent, explanatory, fit_id = TRUE) %>%
  ff_merge(fit_uni) %>%
  ff_merge(fit_multi) %>%
  select(-fit_id, -index) %>%
  dependent_label(apistrat, dependent)

# Binomial example
## Note: the model family must be specified, and exponentiation requested if desired
dependent = "sch.wide"
explanatory = c("ell", "meals", "mobility")

# Univariable fit
fit_uni = dstrat %>%
  svyglmuni(dependent, explanatory, family = "quasibinomial") %>%
  fit2df(exp = TRUE, estimate_name = "OR", estimate_suffix = " (univariable)")

# Multivariable fit
fit_multi = dstrat %>%
  svyglmmulti(dependent, explanatory, family = "quasibinomial") %>%
  fit2df(exp = TRUE, estimate_name = "OR", estimate_suffix = " (multivariable)")

# Pipe together
apistrat %>%
  summary_factorlist(dependent, explanatory, fit_id = TRUE) %>%
  ff_merge(fit_uni) %>%
  ff_merge(fit_multi) %>%
  select(-fit_id, -index) %>%
  dependent_label(apistrat, dependent)
Wrapper for svyglm. Fit a generalised linear model to data from a complex survey design, with inverse-probability weighting and design-based standard errors.
svyglmuni(design, dependent, explanatory, ...)
design |
Survey design. |
dependent |
Character vector of length 1: name of dependent variable (must have 2 levels). |
explanatory |
Character vector of any length: name(s) of explanatory variables. |
... |
Other arguments to be passed to |
A list of univariable fitted model outputs. Output is of class svyglmlist.
Other finalfit model wrappers: coxphmulti(), coxphuni(), crrmulti(), crruni(), glmmixed(), glmmulti_boot(), glmmulti(), glmuni(), lmmixed(), lmmulti(), lmuni(), svyglmmulti()
# Examples taken from survey::svyglm() help page.
library(finalfit)
library(survey)
library(dplyr)

data(api)
apistrat = apistrat %>%
  mutate(
    api00 = ff_label(api00, "API in 2000 (api00)"),
    ell = ff_label(ell, "English language learners (percent)(ell)"),
    meals = ff_label(meals, "Meals eligible (percent)(meals)"),
    mobility = ff_label(mobility, "First year at the school (percent)(mobility)"),
    sch.wide = ff_label(sch.wide, "School-wide target met (sch.wide)")
  )

# Linear example
dependent = "api00"
explanatory = c("ell", "meals", "mobility")

# Stratified design
dstrat = svydesign(id=~1,strata=~stype, weights=~pw, data=apistrat, fpc=~fpc)

# Univariable fit
fit_uni = dstrat %>%
  svyglmuni(dependent, explanatory) %>%
  fit2df(estimate_suffix = " (univariable)")

# Multivariable fit
fit_multi = dstrat %>%
  svyglmmulti(dependent, explanatory) %>%
  fit2df(estimate_suffix = " (multivariable)")

# Pipe together
apistrat %>%
  summary_factorlist(dependent, explanatory, fit_id = TRUE) %>%
  ff_merge(fit_uni) %>%
  ff_merge(fit_multi) %>%
  select(-fit_id, -index) %>%
  dependent_label(apistrat, dependent)

# Binomial example
## Note: the model family must be specified, and exponentiation requested if desired
dependent = "sch.wide"
explanatory = c("ell", "meals", "mobility")

# Univariable fit
fit_uni = dstrat %>%
  svyglmuni(dependent, explanatory, family = "quasibinomial") %>%
  fit2df(exp = TRUE, estimate_name = "OR", estimate_suffix = " (univariable)")

# Multivariable fit
fit_multi = dstrat %>%
  svyglmmulti(dependent, explanatory, family = "quasibinomial") %>%
  fit2df(exp = TRUE, estimate_name = "OR", estimate_suffix = " (multivariable)")

# Pipe together
apistrat %>%
  summary_factorlist(dependent, explanatory, fit_id = TRUE) %>%
  ff_merge(fit_uni) %>%
  ff_merge(fit_multi) %>%
  select(-fit_id, -index) %>%
  dependent_label(apistrat, dependent)
3154 healthy young men aged 39-59 from the San Francisco area were assessed for their personality type. All were free from coronary heart disease at the start of the research. Eight and a half years later, any change in coronary heart disease status was recorded.
data(wcgs)
A data frame with 3154 observations on the following 15 variables.
id
Subject ID
age
Age: age in years
height
Height: height in inches
weight
Weight: weight in pounds
sbp
Systolic blood pressure: mmHg
dbp
Diastolic blood pressure: mmHg
chol
Cholesterol: mg/100 ml
personality
Personality type/Behavior pattern: a factor with levels A1, A2, B3, B4
personality_2L
Dichotomous personality type / behavior pattern: A = aggressive; B = passive
ncigs
Smoking: Cigarettes/day
smoking
Smoking: No, Yes
arcus
Corneal arcus: No, Yes
chd
Coronary heart disease event: No, Yes
typechd
Type of coronary heart disease: a factor with levels No, MI_SD (MI or sudden death), Silent_MI, Angina
timechd
Observation (follow up) time: Days
The WCGS began in 1960 with 3,524 male volunteers who were employed by 11 California companies. Subjects were 39 to 59 years old and free of heart disease as determined by electrocardiogram. After the initial screening, the study population dropped to 3,154 and the number of companies to 10 because of various exclusions. The cohort comprised both blue- and white-collar employees. At baseline the following information was collected: socio-demographic data including age, education, marital status, income, and occupation; physical and physiological data including height, weight, blood pressure, electrocardiogram, and corneal arcus; biochemical data including cholesterol and lipoprotein fractions; medical and family history and use of medications; and behavioral data including Type A interview, smoking, exercise, and alcohol use. Later surveys added data on anthropometry, triglycerides, Jenkins Activity Survey, and caffeine use. Follow-up continued for an average of 8.5 years with repeat examinations.
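As a minimal sketch of how the variables listed above might be used, the following tabulates coronary heart disease events against a few baseline measures and then fits a logistic regression table. The choice of chd as the dependent variable and of these four explanatory variables is an illustrative assumption, not a recommended model.

```r
library(finalfit)
library(dplyr)

data(wcgs)

# Illustrative choice of outcome and predictors (an assumption for this sketch)
dependent = "chd"
explanatory = c("age", "personality_2L", "smoking", "sbp")

# Summary table of explanatory variables by CHD event
tab_summary = wcgs %>%
  summary_factorlist(dependent, explanatory)
tab_summary

# Univariable and multivariable logistic regression in one table
tab_fit = wcgs %>%
  finalfit(dependent, explanatory)
tab_fit
```

Because chd is a two-level factor, finalfit() fits a logistic regression here; a time-to-event analysis would instead need a survival outcome built from timechd.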
Jewell N. Statistics for Epidemiology. 2004.
Rosenman RH, Brand RJ, Jenkins CD, Friedman M, Straus R, Wurm M. Coronary Heart Disease in the Western Collaborative Group Study: Final Follow-up Experience of 8 1/2 Years. JAMA. 1975;233(8):872-877. doi:10.1001/jama.1975.03260080034016.