R/outOfSample.R
kWayStratifiedY.Rd
k-fold cross-validation stratified on y; a splitFunction in the sense of vtreat::buildEvalSets.
kWayStratifiedY(nRows, nSplits, dframe, y)
Argument | Description |
---|---|
nRows | number of rows to split (>1). |
nSplits | number of groups to split into (>1 and <nRows). |
dframe | original data frame (ignored). |
y | numeric outcome variable; the split tries to keep its distribution similar in each group. |
split plan
set.seed(23255)
d <- data.frame(y=sin(1:100))
pStrat <- kWayStratifiedY(nrow(d),5,d,d$y)
problemAppPlan(nrow(d),5,pStrat,TRUE)
#> NULL
d$stratGroup <- vtreat::getSplitPlanAppLabels(nrow(d),pStrat)
pSimple <- kWayCrossValidation(nrow(d),5,d,d$y)
problemAppPlan(nrow(d),5,pSimple,TRUE)
#> NULL
d$simpleGroup <- vtreat::getSplitPlanAppLabels(nrow(d),pSimple)
summary(tapply(d$y,d$simpleGroup,mean))
#>      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
#> -0.106875 -0.059623 -0.007774 -0.001272  0.068139  0.099774
summary(tapply(d$y,d$stratGroup,mean))
#>      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
#> -0.011531 -0.010448 -0.002826 -0.001272  0.008797  0.009649
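The idea of stratifying on y can be illustrated in base R without vtreat. The sketch below is not vtreat's implementation: `stratifiedSplitSketch` is a hypothetical helper, and the train/app list layout of the returned plan is an assumption about what a split plan looks like. It sorts the rows by y, cuts the sorted rows into consecutive blocks of nSplits rows, and gives each row in a block a different group label, so every group receives values from across the whole range of y.

```r
# Hypothetical helper for illustration only; not part of vtreat.
stratifiedSplitSketch <- function(y, nSplits) {
  n <- length(y)
  stopifnot(nSplits > 1, nSplits < n)
  # Sort row indices by y, breaking ties at random.
  ord <- order(y, sample.int(n))
  # Cut the sorted positions into consecutive blocks of nSplits rows.
  blockId <- (seq_len(n) - 1L) %/% nSplits
  # Within each block, hand out distinct group labels in random order.
  labels <- unlist(
    lapply(split(seq_len(n), blockId),
           function(b) sample.int(nSplits)[seq_along(b)]),
    use.names = FALSE
  )
  group <- integer(n)
  group[ord] <- labels
  # Assumed plan layout: one train/app pair of row indices per group.
  lapply(seq_len(nSplits), function(g) {
    list(train = which(group != g), app = which(group == g))
  })
}

set.seed(23255)
y <- sin(1:100)
plan <- stratifiedSplitSketch(y, 5)
# Mean of y within each held-out (app) group.
groupMeans <- vapply(plan, function(p) mean(y[p$app]), numeric(1))
print(range(groupMeans))
```

When nRows is a multiple of nSplits, this construction bounds the spread of the group means by the range of y divided by the group size, because the ranges of the sorted blocks add up to no more than the overall range of y.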