Safely construct a simple Wilkinson notation formula from the outcome (dependent variable) name and vector of input (independent variable) names.
mk_formula( outcome, variables, ..., intercept = TRUE, outcome_target = NULL, outcome_comparator = "==", env = baseenv(), extra_values = NULL, as_character = FALSE )
Argument | Description |
---|---|
outcome | character scalar, name of the outcome or dependent variable. |
variables | character vector, names of the input or independent variables. |
... | not used, forces later arguments to bind by name. |
intercept | logical, if TRUE allow an intercept term. |
outcome_target | scalar, if not NULL write outcome==outcome_target in the formula. |
outcome_comparator | one of "==", "!=", ">=", "<=", ">", "<"; only used if outcome_target is not NULL. |
env | environment to use in the formula (unless extra_values is non-empty, in which case this is the parent environment). |
extra_values | if not empty, extra values to be added to a new formula environment whose parent is env. |
as_character | if TRUE return the formula as a character string. |
a formula object (or a character string if as_character is TRUE)
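The following is a minimal usage sketch of the arguments described above. It assumes the function is the one exported by the wrapr package and that wrapr is installed; the guard skips the calls otherwise. The column names come from the built-in mtcars data set.

```r
# Assumption: mk_formula() is available from the wrapr package.
if (requireNamespace("wrapr", quietly = TRUE)) {
  # Plain outcome ~ inputs formula.
  f1 <- wrapr::mk_formula("mpg", c("cyl", "disp"))
  print(f1)

  # Outcome written as a comparison against a target value.
  f2 <- wrapr::mk_formula(
    "cyl", c("wt", "gear"),
    outcome_target = 8,
    outcome_comparator = ">="
  )
  print(f2)
}
```

Arguments after `...` must be passed by name, as in the second call.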
Note: outcome and variables are each intended to be simple variable names or column names (or .). They are not intended to specify interactions, I()-terms, transforms, general expressions, or other complex formula terms.
This has essentially the same effect as reformulate, but tries to avoid the paste currently in reformulate by calling update.formula (which appears to work over terms). Another reasonable way to do this is simply paste(outcome, paste(variables, collapse = " + "), sep = " ~ ").
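For comparison, the two base R alternatives mentioned above can be sketched as follows. This uses only base and stats functions; the variable names `f_reformulate`, `f_string`, and `f_paste` are illustrative. Note that the paste() approach yields a character string, so it is converted with as.formula() here, with env set explicitly to mirror the baseenv() default described on this page.

```r
outcome <- "mpg"
variables <- c("cyl", "disp")

# Alternative 1: stats::reformulate() builds the formula from term labels.
f_reformulate <- reformulate(variables, response = outcome)

# Alternative 2: paste the pieces together, then convert the string.
f_string <- paste(outcome, paste(variables, collapse = " + "), sep = " ~ ")
f_paste <- as.formula(f_string, env = baseenv())

# Both routes describe the same model terms.
print(f_reformulate)
print(f_paste)
same_terms <- identical(deparse(f_reformulate), deparse(f_paste))
```

Both alternatives work well for simple column names; they differ mainly in which environment ends up attached to the formula.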
Care must be taken with later arguments to functions like lm(), whose help states: "All of weights, subset and offset are evaluated in the same way as variables in formula, that is first in data and then in the environment of formula."
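The consequence of that evaluation rule can be sketched with base R alone. In the hypothetical helper below (`fit_with_local_weights` and `case_weights_local` are illustrative names, not part of any package), the weights live only inside the function, so lm() can find them only when the formula's environment is that function's frame.

```r
fit_with_local_weights <- function(use_local_env) {
  # Weights exist only in this function's frame, not as a column of mtcars.
  case_weights_local <- rep(1, nrow(mtcars))
  f_env <- if (use_local_env) environment() else baseenv()
  f <- as.formula("mpg ~ cyl + disp", env = f_env)
  lm(f, data = mtcars, weights = case_weights_local)
}

# Formula environment is the function frame: the weights are found.
fit_ok <- fit_with_local_weights(TRUE)

# Formula environment is baseenv(): the weights are looked up in mtcars,
# then starting from the base environment, and are not found.
fit_failed <- tryCatch(
  fit_with_local_weights(FALSE),
  error = function(e) e
)
print(class(fit_ok))
print(conditionMessage(fit_failed))
```

One way to avoid the problem while keeping a small formula environment is to put the weights in the data frame as a column.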
Also note that env defaults to baseenv() to try to minimize reference leaks produced by the environment captured by the formula ending up stored in the resulting model for lm() and glm(). For behavior closer to as.formula(), please set the env argument to parent.frame().
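The reference leak described above can be illustrated with as.formula() alone. In this sketch the helper names (`make_formula_leaky`, `make_formula_clean`) and the `big_temp` object are hypothetical; the point is only that a formula created inside a function keeps that function's frame, and everything in it, reachable.

```r
make_formula_leaky <- function() {
  big_temp <- numeric(1e6)  # large temporary object in this frame
  # env defaults to parent.frame(), which here is this function's frame.
  as.formula("mpg ~ cyl + disp")
}

make_formula_clean <- function() {
  big_temp <- numeric(1e6)
  # Explicitly attach the base environment instead of the local frame.
  as.formula("mpg ~ cyl + disp", env = baseenv())
}

f_leaky <- make_formula_leaky()
f_clean <- make_formula_clean()

# The leaky formula still carries the large temporary object with it.
leaky_keeps_temp <- exists("big_temp", envir = environment(f_leaky),
                           inherits = FALSE)
clean_is_base <- identical(environment(f_clean), baseenv())
print(leaky_keeps_temp)
print(clean_is_base)
```

A model fitted with the leaky formula stores that formula, so saving the model can also serialize the captured frame.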
#> mpg ~ cyl + disp
#> <environment: base>
#>
#> Call:
#> lm(formula = f, data = mtcars)
#>
#> Coefficients:
#> (Intercept)          cyl         disp
#>    34.66099     -1.58728     -0.02058
#>
#> [1] "mpg ~ cyl + disp"
#> (cyl >= 8) ~ wt + gear
#> <environment: base>
#>
#> Call:  glm(formula = f, family = binomial, data = mtcars)
#>
#> Coefficients:
#> (Intercept)           wt         gear
#>    -33.9388       9.3992       0.5893
#>
#> Degrees of Freedom: 31 Total (i.e. Null);  29 Residual
#> Null Deviance:     43.86
#> Residual Deviance: 14.21    AIC: 20.21
#> [1] "(cyl >= 8) ~ wt + gear"