In [an earlier note](https://github.com/WinVector/Examples/blob/master/dplyr/Dependencies.md) we exhibited a non-signalling result corruption in `dplyr` `0.7.4`. In this note we demonstrate the [`seplyr`](https://winvector.github.io/seplyr/) work-around. Re-establish up our example: ```{r setup} packageVersion("dplyr") my_db <- DBI::dbConnect(RSQLite::SQLite(), ":memory:") d <- dplyr::copy_to( my_db, data.frame( valuesA = c("A", NA, NA), valuesB = c("B", NA, NA), canUseFix1 = c(TRUE, TRUE, FALSE), fix1 = c('Fix_1_V1', "Fix_1_V2", "Fix_1_V3"), canUseFix2 = c(FALSE, FALSE, TRUE), fix2 = c('Fix_2_V1', "Fix_2_V2", "Fix_2_V3"), stringsAsFactors = FALSE), 'd', temporary = TRUE, overwrite = TRUE) knitr::kable(dplyr::collect(d)) ``` [`seplyr`](https://winvector.github.io/seplyr/) has a fix/work-around for the earlier issue: automatically break up the steps into safe blocks ([announcement](http://www.win-vector.com/blog/2017/11/win-vector-llc-announces-new-big-data-in-r-tools/); here we are using the development [`seplyr`](https://winvector.github.io/seplyr/) `0.5.1` version of [`mutate_se()`](https://winvector.github.io/seplyr/reference/mutate_se.html)). ```{r goodresult2} library("seplyr") packageVersion("seplyr") d %.>% mutate_se( ., qae(valuesA := ifelse(is.na(valuesA) & canUseFix1, fix1, valuesA), valuesA := ifelse(is.na(valuesA) & canUseFix2, fix2, valuesA), valuesB := ifelse(is.na(valuesB) & canUseFix1, fix1, valuesB), valuesB := ifelse(is.na(valuesB) & canUseFix2, fix2, valuesB)), printPlan = TRUE) %.>% select_se(., c("valuesA", "valuesB")) %.>% dplyr::collect(.) %.>% knitr::kable(.) ``` We now have a correct result (all cells filled). `seplyr` used safe statement re-ordering to break the calculation into the minimum number of blocks/groups that have no in-block dependencies between statements (note this is more efficient that merely introducing a new mutate each first time a new value is used). We can slow that down and see how the underlying planning functions break the assignments down into a small number of safe blocks (here we are using the development [`wrapr`](https://winvector.github.io/wrapr/) `1.0.2` function [`qae()`](https://winvector.github.io/wrapr/reference/qae.html)). ```{r goodresult2s} packageVersion("wrapr") steps <- qae( valuesA := ifelse(is.na(valuesA) & canUseFix1, fix1, valuesA), valuesA := ifelse(is.na(valuesA) & canUseFix2, fix2, valuesA), valuesB := ifelse(is.na(valuesB) & canUseFix1, fix1, valuesB), valuesB := ifelse(is.na(valuesB) & canUseFix2, fix2, valuesB)) print(steps) plan <- partition_mutate_se(steps) print(plan) d %.>% mutate_seb(., plan) %.>% select_se(., c("valuesA", "valuesB")) %.>% dplyr::collect(.) %.>% knitr::kable(.) ``` Note that the current `CRAN` versions of [`wrapr`](https://CRAN.R-project.org/package=wrapr) and [`seplyr`](https://CRAN.R-project.org/package=seplyr) *already* implement the above work-around. Just some of the conveniences such as `printPlan = TRUE` and `qae()` require the development versions of these packages.