Fit a stats::glm() model without carrying back large structures, such as per-row copies of the training data, in the result.

See https://win-vector.com/2014/05/30/trimming-the-fat-from-glm-models-in-r/ for a discussion.
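The linked post discusses the underlying problem: a default stats::glm() fit stores several per-row structures (residuals, fitted values, weights, the response, the model frame, the training data, and the QR matrix), so its serialized size grows with the number of training rows. Below is a minimal sketch of that trimming idea in base R. It is not clean_fit_glm()'s actual implementation; the helper names trim_glm_sketch() and make_toy_data() and the exact list of dropped fields are assumptions chosen for illustration.

```r
# Hypothetical helper (illustration only, not clean_fit_glm()'s code):
# drop the per-row components of a stats::glm fit that
# predict(..., newdata = ...) does not use.
trim_glm_sketch <- function(model) {
  per_row <- c("residuals", "fitted.values", "effects", "linear.predictors",
               "weights", "prior.weights", "y", "model", "data", "offset")
  for (field in per_row) {
    model[[field]] <- NULL
  }
  # Keep rank, qraux and pivot (predict() uses the pivot),
  # but drop the n-by-p matrix stored in qr$qr.
  model$qr$qr <- NULL
  model
}

# Hypothetical toy data: k copies of four rows. The classes overlap,
# so the logistic fit has a finite solution.
make_toy_data <- function(k) {
  data.frame(x = rep(1:4, k),
             y = rep(c(FALSE, TRUE, FALSE, TRUE), k))
}

# Build the formula in baseenv() so the fit does not capture
# (and later serialize) a local environment.
f <- stats::as.formula("y ~ x", env = baseenv())
small <- stats::glm(f, data = make_toy_data(1), family = stats::binomial)
big <- stats::glm(f, data = make_toy_data(10000), family = stats::binomial)

size_of <- function(obj) length(serialize(obj, NULL))
sizes <- c(full_small = size_of(small),
           full_big = size_of(big),
           trimmed_small = size_of(trim_glm_sketch(small)),
           trimmed_big = size_of(trim_glm_sketch(big)))
print(sizes)

# Predictions on new data should be unchanged by the trimming.
newdata <- make_toy_data(1)
p_full <- stats::predict(big, newdata = newdata, type = "response")
p_trim <- stats::predict(trim_glm_sketch(big), newdata = newdata,
                         type = "response")
print(all.equal(p_full, p_trim))
```

Exact byte counts depend on the R version, so none are claimed here. The full fit grows with the number of training rows while the trimmed one stays roughly constant. The sketch keeps qr$pivot and the family object because predict() needs them for newdata predictions. Other uses of a glm object, such as residuals() or summary(), would no longer work after trimming.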

clean_fit_glm(
  outcome,
  variables,
  data,
  ...,
  family,
  intercept = TRUE,
  outcome_target = NULL,
  outcome_comparator = "==",
  weights = NULL,
  env = baseenv()
)

Arguments

outcome	character, name of outcome column.
variables	character, names of variable columns.
data	data.frame, training data.
...	not used; forces later arguments to be specified by name.
family	passed to stats::glm()
intercept	logical, if TRUE allow an intercept term.
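outcome_target	scalar; if not NULL, the formula uses the comparison of outcome against outcome_target (for example outcome == outcome_target) as the response instead of outcome itself.
outcome_comparator	one of "==", "!=", ">=", "<=", ">", "<"; only used if outcome_target is not NULL.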
weights	passed to stats::glm()
env	environment to work in.
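The outcome_target, outcome_comparator and intercept arguments describe how the model formula is assembled. Below is a minimal sketch of one way such arguments could be turned into a formula. build_formula_sketch() is a hypothetical helper, not the function's actual implementation. It assumes syntactic column names (no backquoting is done) and gives the formula env as its environment, so the formula does not hold on to the caller's local variables.

```r
# Hypothetical helper (illustration only): assemble a glm formula from the
# argument conventions documented above. Assumes syntactic column names.
build_formula_sketch <- function(outcome, variables,
                                 intercept = TRUE,
                                 outcome_target = NULL,
                                 outcome_comparator = "==",
                                 env = baseenv()) {
  allowed <- c("==", "!=", ">=", "<=", ">", "<")
  if (!(outcome_comparator %in% allowed)) {
    stop("outcome_comparator must be one of: ",
         paste(allowed, collapse = " "))
  }
  lhs <- outcome
  if (!is.null(outcome_target)) {
    # Response becomes a comparison, e.g. "(y >= 3)".
    lhs <- paste0("(", outcome, " ", outcome_comparator, " ",
                  deparse(outcome_target), ")")
  }
  rhs <- paste(variables, collapse = " + ")
  if (!intercept) {
    rhs <- paste(rhs, "- 1")  # "- 1" removes the intercept term
  }
  # as.formula() attaches env as the formula's environment.
  stats::as.formula(paste(lhs, "~", rhs), env = env)
}

f <- build_formula_sketch("y", c("x1", "x2"),
                          outcome_target = 3, outcome_comparator = ">=")
deparse(f)
#> [1] "(y >= 3) ~ x1 + x2"
```

Here deparse(outcome_target) writes the target as R source, so a character target such as "b" appears quoted in the formula.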

Value

list(model=model, summary=summary)

Examples


mk_data_example <- function(k) {
  data.frame(
    x1 = rep(c("a", "a", "b", "b"), k),
    x2 = rep(c(0, 0, 0, 1), k),
    y = rep(1:4, k),
    yC = rep(c(FALSE, TRUE, TRUE, TRUE), k),
    stringsAsFactors = FALSE)
}

res_glm <- clean_fit_glm("yC", c("x1", "x2"),
                         mk_data_example(1),
                         family = binomial)
length(serialize(res_glm$model, NULL))
#> [1] 33777

res_glm <- clean_fit_glm("yC", c("x1", "x2"),
                         mk_data_example(10000),
                         family = binomial)
length(serialize(res_glm$model, NULL))
#> [1] 33777

predict(res_glm$model,
        newdata = mk_data_example(1),
        type = "response")
#>   1   2   3   4 
#> 0.5 0.5 1.0 1.0