Safely construct a simple Wilkinson notation formula from the outcome (dependent variable) name and vector of input (independent variable) names.

mk_formula(
  outcome,
  variables,
  ...,
  intercept = TRUE,
  outcome_target = NULL,
  outcome_comparator = "==",
  env = baseenv(),
  extra_values = NULL,
  as_character = FALSE
)

Arguments

outcome

character scalar, name of outcome or dependent variable.

variables

character vector, names of input or independent variables.

...

not used, force later arguments to bind by name.

intercept

logical, if TRUE allow an intercept term.

outcome_target

scalar, if not NULL the left-hand side of the formula is written as the comparison outcome == outcome_target (using the operator given by outcome_comparator).

outcome_comparator

one of "==", "!=", ">=", "<=", ">", "<"; only used if outcome_target is not NULL.

env

environment to use in formula (unless extra_values is non-empty, in which case this is the parent environment of the formula's environment).

extra_values

if not empty, extra values to be added to a new formula environment whose parent is env.

as_character

if TRUE return formula as a character string.

Value

a formula object (or a character string if as_character is TRUE)

Details

Note: outcome and variables are each intended to be simple variable names or column names (or .). They are not intended to specify interactions, I()-terms, transforms, general expressions or other complex formula terms. The effect is essentially the same as reformulate, but mk_formula tries to avoid the paste currently used in reformulate by calling update.formula (which appears to work over terms). Another reasonable way to do this is just paste(outcome, paste(variables, collapse = " + "), sep = " ~ ").
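
The three routes mentioned above can be compared using base R alone. The following is a minimal sketch, not the package's implementation: the names f_reformulate, f_paste and f_update are illustrative, and the update.formula loop is only an assumption about how a paste-free construction might look.

```r
outcome <- "mpg"
variables <- c("cyl", "disp")

# Route 1: stats::reformulate()
f_reformulate <- reformulate(variables, response = outcome)

# Route 2: paste the pieces together, then convert the string to a formula
f_paste <- as.formula(
  paste(outcome, paste(variables, collapse = " + "), sep = " ~ ")
)

# Route 3 (assumed sketch): start from "outcome ~ 1" and add one term at a
# time with update.formula(), building each piece from symbols, not strings
f_update <- eval(substitute(O ~ 1, list(O = as.name(outcome))))
for (v in variables) {
  step <- eval(substitute(. ~ . + V, list(V = as.name(v))))
  f_update <- update.formula(f_update, step)
}

# All three describe the same model
print(format(f_reformulate))
print(format(f_paste))
print(format(f_update))
```

Because route 3 converts each name with as.name() rather than parsing pasted text, it never re-parses the column names as R code, which is the kind of safety the description above is aiming for.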

Care must be taken with later arguments to functions like lm() whose help states: "All of weights, subset and offset are evaluated in the same way as variables in formula, that is first in data and then in the environment of formula." Also note that env defaults to baseenv() to try to minimize reference leaks caused by the environment captured by the formula ending up stored in the resulting model for lm() and glm(). For behavior closer to as.formula() please set the env argument to parent.frame().
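
The environment issue can be illustrated with base R only. This is a hedged sketch under stated assumptions: make_formula, big, small_env and w are hypothetical names introduced for illustration, and the small environment built here merely imitates what the env and extra_values arguments are described as doing.

```r
# A formula built inside a function captures that function's frame by default.
make_formula <- function() {
  big <- numeric(1e6)             # large local object, unrelated to the model
  as.formula("mpg ~ cyl + disp")  # env defaults to parent.frame(): this frame
}
f_leaky <- make_formula()
# The formula's environment still holds a reference to `big`
leaks <- exists("big", envir = environment(f_leaky), inherits = FALSE)

# Re-homing the formula in baseenv() drops the reference to that frame.
f_clean <- f_leaky
environment(f_clean) <- baseenv()
model_clean <- lm(f_clean, data = mtcars)

# weights is looked up first in data, then in the formula's environment.
# A value that is not a column of data can be supplied through a small
# environment whose parent is baseenv().
small_env <- new.env(parent = baseenv())
assign("w", rep(1, nrow(mtcars)), envir = small_env)
f_weighted <- f_clean
environment(f_weighted) <- small_env
model_weighted <- lm(f_weighted, data = mtcars, weights = w)

print(leaks)
print(coef(model_weighted))
```

With unit weights the weighted fit should agree with the unweighted one; the point of the sketch is only where w is found, not the fit itself.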

Examples

f <- mk_formula("mpg", c("cyl", "disp"))
print(f)
#> mpg ~ cyl + disp
#> <environment: base>
(model <- lm(f, mtcars))
#>
#> Call:
#> lm(formula = f, data = mtcars)
#>
#> Coefficients:
#> (Intercept)          cyl         disp
#>    34.66099     -1.58728     -0.02058
#>
format(model$terms)
#> [1] "mpg ~ cyl + disp"
f <- mk_formula("cyl", c("wt", "gear"), outcome_target = 8, outcome_comparator = ">=")
print(f)
#> (cyl >= 8) ~ wt + gear
#> <environment: base>
(model <- glm(f, mtcars, family = binomial))
#>
#> Call:  glm(formula = f, family = binomial, data = mtcars)
#>
#> Coefficients:
#> (Intercept)           wt         gear
#>    -33.9388       9.3992       0.5893
#>
#> Degrees of Freedom: 31 Total (i.e. Null);  29 Residual
#> Null Deviance:     43.86
#> Residual Deviance: 14.21    AIC: 20.21
format(model$terms)
#> [1] "(cyl >= 8) ~ wt + gear"