Safely construct a simple Wilkinson notation formula from the outcome (dependent variable) name and vector of input (independent variable) names.

mk_formula(
  outcome,
  variables,
  ...,
  intercept = TRUE,
  outcome_target = NULL,
  outcome_comparator = "==",
  env = baseenv(),
  extra_values = NULL,
  as_character = FALSE
)



outcome: character scalar, name of outcome or dependent variable.


variables: character vector, names of input or independent variables.


...: not used, forces later arguments to bind by name.


intercept: logical, if TRUE allow an intercept term.


outcome_target: scalar, if not NULL write outcome==outcome_target in formula.


outcome_comparator: one of "==", "!=", ">=", "<=", ">", "<"; only used if outcome_target is not NULL.


env: environment to use in formula (unless extra_values is non-empty, in which case this is a parent environment).


extra_values: if not empty, extra values to be added to a new formula environment containing env.


as_character: if TRUE return formula as a character string.
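The interaction between env and extra_values described above can be illustrated with base R alone. This is a minimal sketch under assumptions, not the package's implementation: the name threshold is hypothetical, and the formula text is written by hand rather than produced by mk_formula().

```r
# Sketch only: mimics the documented behavior with base R calls.
env <- baseenv()
extra_values <- list(threshold = 8)  # hypothetical extra value

# A new formula environment holding the extra values, with env as its parent.
f_env <- list2env(extra_values, parent = env)

# Outcome compared against a target, in the style of
# outcome_target / outcome_comparator.
f <- as.formula("(cyl >= threshold) ~ wt + gear", env = f_env)

# threshold is not a column of mtcars, so it is looked up in the
# environment of the formula.
mf <- model.frame(f, data = mtcars)
table(mf[[1]])
```

Because the parent of the new environment is env, anything not found among the extra values is still resolved through env.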


a formula object (or a character string if as_character is TRUE)


Note: outcome and variables are each intended to be simple variable names or column names (or .). They are not intended to specify interactions, I()-terms, transforms, general expressions, or other complex formula terms. The effect is essentially the same as reformulate, but this function tries to avoid the paste currently used in reformulate by calling update.formula (which appears to work over terms). Another reasonable way to do this is simply paste(outcome, paste(variables, collapse = " + "), sep = " ~ ").
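For comparison, the two base R alternatives mentioned in the note can be sketched as follows. The variable names are taken from the examples on this page; this only shows that both routes yield the same formula text for simple column names, and says nothing about how mk_formula() is implemented.

```r
outcome <- "mpg"
variables <- c("cyl", "disp")

# Alternative 1: stats::reformulate()
f1 <- reformulate(variables, response = outcome)

# Alternative 2: paste the pieces together, then convert with as.formula()
f2 <- as.formula(
  paste(outcome, paste(variables, collapse = " + "), sep = " ~ "),
  env = baseenv()
)

# Both produce the same formula text.
format(f1)
format(f2)
```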

Care must be taken with later arguments to functions like lm(), whose help states: "All of weights, subset and offset are evaluated in the same way as variables in formula, that is first in data and then in the environment of formula." Also note that env defaults to baseenv() to try to minimize reference leaks, which arise when the environment captured by the formula ends up stored in the resulting model for lm() and glm(). For behavior closer to as.formula(), set the env argument to parent.frame().
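The reference leak mentioned above can be seen with as.formula() alone. The sketch below is illustrative: make_in_function and big are hypothetical names, and the point is only that a formula built inside a function keeps that function's frame (and everything in it) reachable, while a formula built with env = baseenv() does not.

```r
# A formula created inside a function captures the function's frame by default.
make_in_function <- function() {
  big <- numeric(1e6)             # large local object (hypothetical)
  as.formula("mpg ~ cyl + disp")  # env defaults to the caller's frame
}

f_leaky <- make_in_function()
# The large object is still reachable through the formula's environment.
exists("big", envir = environment(f_leaky), inherits = FALSE)

# Setting the environment explicitly avoids holding on to that frame.
f_clean <- as.formula("mpg ~ cyl + disp", env = baseenv())
identical(environment(f_clean), baseenv())
```

A model fit with f_leaky would carry the captured frame along with it, which is the situation the baseenv() default is meant to reduce.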



f <- mk_formula("mpg", c("cyl", "disp"))
print(f)
#> mpg ~ cyl + disp
#> <environment: base>
(model <- lm(f, mtcars))
#>
#> Call:
#> lm(formula = f, data = mtcars)
#>
#> Coefficients:
#> (Intercept)          cyl         disp
#>    34.66099     -1.58728     -0.02058
#>
#> [1] "mpg ~ cyl + disp"
f <- mk_formula("cyl", c("wt", "gear"), outcome_target = 8, outcome_comparator = ">=")
print(f)
#> (cyl >= 8) ~ wt + gear
#> <environment: base>
(model <- glm(f, mtcars, family = binomial))
#>
#> Call:  glm(formula = f, family = binomial, data = mtcars)
#>
#> Coefficients:
#> (Intercept)           wt         gear
#>    -33.9388       9.3992       0.5893
#>
#> Degrees of Freedom: 31 Total (i.e. Null);  29 Residual
#> Null Deviance:     43.86
#> Residual Deviance: 14.21    AIC: 20.21
#> [1] "(cyl >= 8) ~ wt + gear"