`R/variable_importance.R`

`value_variables_C.Rd`

Value variables for predicting a categorical outcome.

value_variables_C(dframe, varlist, outcomename, outcometarget, ..., weights = c(), minFraction = 0.02, smFactor = 0, rareCount = 0, rareSig = 1, collarProb = 0, scale = FALSE, doCollar = FALSE, splitFunction = NULL, ncross = 3, forceSplit = FALSE, catScaling = TRUE, verbose = FALSE, parallelCluster = NULL, use_parallel = TRUE, customCoders = list(c.PiecewiseV.num = vtreat::solve_piecewisec, n.PiecewiseV.num = vtreat::solve_piecewise, c.knearest.num = vtreat::square_windowc, n.knearest.num = vtreat::square_window), codeRestriction = c("PiecewiseV", "knearest", "clean", "isBAD", "catB", "catP"), missingness_imputation = NULL, imputation_map = NULL)

dframe | Data frame to learn treatments from (training data), must have at least 1 row. |
---|---|

varlist | Names of columns to treat (effective variables). |

outcomename | Name of column holding outcome variable. dframe[[outcomename]] must contain only finite, non-missing values. |

outcometarget | Value/level of outcome to be considered "success". dframe[[outcomename]]==outcometarget must hold for at least two rows, and dframe[[outcomename]]!=outcometarget must hold for at least two rows. |

... | no additional arguments, declared to force named binding of later arguments |

weights | optional training weights for each row |

minFraction | optional minimum frequency a categorical level must have to be converted to an indicator column. |

smFactor | optional smoothing factor for impact coding models. |

rareCount | optional integer, allow levels with this count or below to be pooled into a shared rare-level. Defaults to 0 or off. |

rareSig | optional numeric, suppress levels from pooling at this significance value or greater. Defaults to 1, as shown in the usage above. |

collarProb | fraction of the data (pseudo-probability) at which to collar numeric values when doCollar is set. |

scale | optional, if TRUE replace numeric variables with regression ("move to outcome-scale"). |

doCollar | optional, if TRUE collar numeric variables by cutting off after a tail-probability specified by collarProb during treatment design. |

splitFunction | (optional) see vtreat::buildEvalSets. |

ncross | optional scalar >= 2, number of cross-validation rounds to design. |

forceSplit | logical, if TRUE force cross-validated significance calculations on all variables. |

catScaling | optional, if TRUE use glm() linkspace, if FALSE use lm() for scaling. |

verbose | if TRUE print progress. |

parallelCluster | (optional) a cluster object created by package parallel or package snow. |

use_parallel | logical, if TRUE use parallel methods. |

customCoders | additional coders to use for variable importance estimate. |

codeRestriction | codes to restrict to for variable importance estimate. |

missingness_imputation | function of signature f(values: numeric, weights: numeric), simple missing value imputer. |

imputation_map | map from column names to functions of signature f(values: numeric, weights: numeric), simple missing value imputers. |
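The `f(values, weights)` signature described for `missingness_imputation` and `imputation_map` can be illustrated with a small base-R sketch. It assumes the imputer should return a single numeric replacement value and that `imputation_map` is a named list keyed by column name. The page does not spell out either point, so treat both as assumptions. The helper `weighted_mean_imputer` and the column names `x1` and `x2` are hypothetical.

```r
# Hypothetical helper (not part of vtreat): a weighted-mean imputer with the
# f(values, weights) signature described in the argument table.
# Assumption: the function returns one numeric value used to fill missing entries.
weighted_mean_imputer <- function(values, weights) {
  # Fall back to equal weights if none (or mismatched weights) are supplied.
  if (is.null(weights) || length(weights) != length(values)) {
    weights <- rep(1, length(values))
  }
  # Ignore entries where either the value or the weight is missing.
  ok <- !is.na(values) & !is.na(weights)
  if (!any(ok) || sum(weights[ok]) <= 0) {
    return(0)  # nothing usable to average; 0 is an arbitrary fallback
  }
  sum(values[ok] * weights[ok]) / sum(weights[ok])
}

# Assumed shape of imputation_map: a named list, column name -> imputer.
example_imputation_map <- list(
  x1 = weighted_mean_imputer,
  x2 = function(values, weights) 0  # constant imputer for column x2
)

imputed_value <- weighted_mean_imputer(c(1, NA, 3), c(1, 1, 3))
```

Under these assumptions, such a function could be passed as `missingness_imputation = weighted_mean_imputer`, or per column via `imputation_map = example_imputation_map`.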

table of variable valuations
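A minimal usage sketch follows, using the argument names from the signature above. The synthetic data frame, its column names (`x_num`, `x_cat`, `x_noise`, `y`), and the seed are invented for illustration only. The exact columns of the returned table are not listed on this page, so the example simply prints the result.

```r
library(vtreat)

# Synthetic training data (illustrative only): one informative numeric
# column, one informative categorical column, and one pure-noise column.
set.seed(2024)
n <- 200
d <- data.frame(
  x_num = rnorm(n),
  x_cat = sample(c("a", "b", "c"), n, replace = TRUE),
  x_noise = rnorm(n),
  stringsAsFactors = FALSE
)
# Outcome with both classes present; "yes" is treated as the success level.
d$y <- ifelse(d$x_num + (d$x_cat == "a") + rnorm(n) > 0.5, "yes", "no")

# Arguments after `...` must be passed by name.
valuations <- value_variables_C(
  dframe = d,
  varlist = c("x_num", "x_cat", "x_noise"),
  outcomename = "y",
  outcometarget = "yes",
  ncross = 3,
  verbose = FALSE
)

print(valuations)
```

The outcome column here has no missing values and contains each class many times, which satisfies the stated requirements on `outcomename` and `outcometarget`.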