k-fold cross-validation stratified on y; a splitFunction in the sense of vtreat::buildEvalSets.

kWayStratifiedY(nRows, nSplits, dframe, y)

Arguments

nRows

number of rows to split (nRows > 1).

nSplits

number of groups to split into (1 < nSplits < nRows).

dframe

original data frame (ignored).

y

numeric outcome variable; the split tries to give each group a similar distribution of y.
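As a rough illustration of what "a similar distribution of y in each group" can mean, here is a minimal base-R sketch of a stratified splitFunction. It assumes one common strategy: sort the rows by y, then deal out fold labels in random order within each consecutive block of nSplits rows. This is not necessarily how kWayStratifiedY is implemented, and the name stratifiedSplitSketch is hypothetical. It returns a list of splits, each holding train and app row indices, which is the split-plan shape assumed here.

```r
# Hypothetical sketch, NOT vtreat's kWayStratifiedY implementation.
# Uses the splitFunction signature function(nRows, nSplits, dframe, y).
stratifiedSplitSketch <- function(nRows, nSplits, dframe, y) {
  stopifnot(nRows > 1, nSplits > 1, nSplits < nRows, length(y) == nRows)
  ord <- order(y)                       # row indices sorted by outcome
  nBlocks <- ceiling(nRows / nSplits)   # consecutive blocks of nSplits sorted rows
  # One random permutation of fold labels per block, read block by block,
  # truncated in case nRows is not a multiple of nSplits.
  labels <- as.vector(replicate(nBlocks, sample.int(nSplits)))[seq_len(nRows)]
  fold <- integer(nRows)
  fold[ord] <- labels                   # k-th sorted row gets the k-th label
  # Split k applies to fold k and trains on all other rows.
  lapply(seq_len(nSplits), function(k) {
    list(train = which(fold != k), app = which(fold == k))
  })
}

# Usage on the same outcome as the examples below.
set.seed(23255)
y <- sin(1:100)
plan <- stratifiedSplitSketch(length(y), 5, NULL, y)
foldMeans <- sapply(plan, function(s) mean(y[s$app]))
print(foldMeans)
```

When nRows is a multiple of nSplits, every fold receives exactly one row from each block of sorted y values, so any two fold means differ by at most (max(y) - min(y)) divided by the number of blocks; here that is at most 2 / 20 = 0.1.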

Value

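split plan: a list with one element per split, each holding the train and app (application) row indices for that split.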
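To make the split-plan structure concrete, here is a hypothetical checker, similar in spirit to the problemAppPlan calls in the examples but not vtreat's actual code. It assumes each split is a list with train and app integer index vectors, that the app sets should partition 1..nRows, and that each train set is the complement of its app set. Like problemAppPlan in the examples, it returns NULL when no problem is found.

```r
# Hypothetical checker (checkSplitPlanSketch is not a vtreat function).
# Returns NULL for a valid k-way plan, otherwise a short problem description.
checkSplitPlanSketch <- function(nRows, nSplits, plan) {
  if (!is.list(plan) || length(plan) != nSplits) {
    return("plan must be a list with one element per split")
  }
  # Every row must be applied exactly once across all splits.
  apps <- unlist(lapply(plan, function(s) s$app))
  if (!identical(sort(as.integer(apps)), seq_len(nRows))) {
    return("app sets must partition 1..nRows")
  }
  # Within a split, training rows are all rows not being applied to.
  for (s in plan) {
    if (!setequal(s$train, setdiff(seq_len(nRows), s$app))) {
      return("each train set must be the complement of its app set")
    }
  }
  NULL
}

goodPlan <- list(
  list(train = c(3L, 4L), app = c(1L, 2L)),
  list(train = c(1L, 2L), app = c(3L, 4L))
)
badPlan <- list(
  list(train = 1:4, app = 1:2),  # train overlaps its own app set
  list(train = 1:2, app = 3:4)
)
checkSplitPlanSketch(4, 2, goodPlan)  # NULL
checkSplitPlanSketch(4, 2, badPlan)   # "each train set must be the complement of its app set"
```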

Examples

set.seed(23255)
d <- data.frame(y=sin(1:100))
pStrat <- kWayStratifiedY(nrow(d),5,d,d$y)
problemAppPlan(nrow(d),5,pStrat,TRUE)
#> NULL
d$stratGroup <- vtreat::getSplitPlanAppLabels(nrow(d),pStrat)
pSimple <- kWayCrossValidation(nrow(d),5,d,d$y)
problemAppPlan(nrow(d),5,pSimple,TRUE)
#> NULL
d$simpleGroup <- vtreat::getSplitPlanAppLabels(nrow(d),pSimple)
summary(tapply(d$y,d$simpleGroup,mean))
#>      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
#> -0.106875 -0.059623 -0.007774 -0.001272  0.068139  0.099774
summary(tapply(d$y,d$stratGroup,mean))
#>      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
#> -0.011531 -0.010448 -0.002826 -0.001272  0.008797  0.009649
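One way to read the two summaries is to compare the spread (Max. minus Min.) of the per-group means of y. The short computation below simply reuses the printed Min. and Max. values from the example output; it adds no new data.

```r
# Min./Max. of per-group means, copied from the summaries printed above.
simple <- c(min = -0.106875, max = 0.099774)  # kWayCrossValidation groups
strat  <- c(min = -0.011531, max = 0.009649)  # kWayStratifiedY groups

spreadSimple <- unname(simple["max"] - simple["min"])  # 0.206649
spreadStrat  <- unname(strat["max"] - strat["min"])    # 0.021180
ratio <- spreadSimple / spreadStrat                    # about 9.76
print(c(simple = spreadSimple, strat = spreadStrat, ratio = ratio))
```

In this run the stratified groups' means span roughly a tenth of the range seen with unstratified k-way cross-validation.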