In an earlier note we exhibited a non-signalling result corruption in dplyr 0.7.4. In this note we demonstrate the seplyr work-around.
Re-establishing our example:
packageVersion("dplyr")
## [1] '0.7.4'
my_db <- DBI::dbConnect(RSQLite::SQLite(),
                        ":memory:")
d <- dplyr::copy_to(
  my_db,
  data.frame(
    valuesA = c("A", NA, NA),
    valuesB = c("B", NA, NA),
    canUseFix1 = c(TRUE, TRUE, FALSE),
    fix1 = c("Fix_1_V1", "Fix_1_V2", "Fix_1_V3"),
    canUseFix2 = c(FALSE, FALSE, TRUE),
    fix2 = c("Fix_2_V1", "Fix_2_V2", "Fix_2_V3"),
    stringsAsFactors = FALSE),
  "d",
  temporary = TRUE, overwrite = TRUE)
knitr::kable(dplyr::collect(d))
| valuesA | valuesB | canUseFix1 | fix1 | canUseFix2 | fix2 |
|---|---|---|---|---|---|
| A | B | 1 | Fix_1_V1 | 0 | Fix_2_V1 |
| NA | NA | 1 | Fix_1_V2 | 0 | Fix_2_V2 |
| NA | NA | 0 | Fix_1_V3 | 1 | Fix_2_V3 |
seplyr has a fix/work-around for the earlier issue: it automatically breaks the sequence of assignments into safe blocks (see the announcement; here we are using the development seplyr 0.5.1 version of mutate_se()).
library("seplyr")
## Loading required package: wrapr
packageVersion("seplyr")
## [1] '0.5.1'
d %.>%
  mutate_se(
    .,
    qae(valuesA := ifelse(is.na(valuesA) & canUseFix1,
                          fix1, valuesA),
        valuesA := ifelse(is.na(valuesA) & canUseFix2,
                          fix2, valuesA),
        valuesB := ifelse(is.na(valuesB) & canUseFix1,
                          fix1, valuesB),
        valuesB := ifelse(is.na(valuesB) & canUseFix2,
                          fix2, valuesB)),
    printPlan = TRUE) %.>%
  select_se(., c("valuesA", "valuesB")) %.>%
  dplyr::collect(.) %.>%
  knitr::kable(.)
## $group00001
## valuesA
## "ifelse(is.na(valuesA) & canUseFix1, fix1, valuesA)"
## valuesB
## "ifelse(is.na(valuesB) & canUseFix1, fix1, valuesB)"
##
## $group00002
## valuesA
## "ifelse(is.na(valuesA) & canUseFix2, fix2, valuesA)"
## valuesB
## "ifelse(is.na(valuesB) & canUseFix2, fix2, valuesB)"
| valuesA | valuesB |
|---|---|
| A | B |
| Fix_1_V2 | Fix_1_V2 |
| Fix_2_V3 | Fix_2_V3 |
We now have a correct result (all cells filled).
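As a sanity check on what "correct" means here, the following base-R sketch applies the same four assignments one after another to a local copy of the example data, with no database or dplyr involved. The names `local_d` and `local_steps` are introduced only for this illustration; the expression strings are the ones shown in the printed plan above.

```r
# Local copy of the example data (same values as the table above).
local_d <- data.frame(
  valuesA = c("A", NA, NA),
  valuesB = c("B", NA, NA),
  canUseFix1 = c(TRUE, TRUE, FALSE),
  fix1 = c("Fix_1_V1", "Fix_1_V2", "Fix_1_V3"),
  canUseFix2 = c(FALSE, FALSE, TRUE),
  fix2 = c("Fix_2_V1", "Fix_2_V2", "Fix_2_V3"),
  stringsAsFactors = FALSE)

# The four assignments, in order, as target-name / expression-text pairs.
local_steps <- list(
  valuesA = "ifelse(is.na(valuesA) & canUseFix1, fix1, valuesA)",
  valuesA = "ifelse(is.na(valuesA) & canUseFix2, fix2, valuesA)",
  valuesB = "ifelse(is.na(valuesB) & canUseFix1, fix1, valuesB)",
  valuesB = "ifelse(is.na(valuesB) & canUseFix2, fix2, valuesB)")

# Evaluate each expression against the current state of the data frame,
# so every step sees the results of all earlier steps.
for (i in seq_along(local_steps)) {
  target <- names(local_steps)[[i]]
  local_d[[target]] <- eval(parse(text = local_steps[[i]]), envir = local_d)
}

print(local_d[, c("valuesA", "valuesB")])
```

Strictly sequential evaluation like this is the behavior the database result should agree with: each of the two value columns ends up with no missing entries.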
seplyr used safe statement re-ordering to break the calculation into the minimum number of blocks/groups that have no in-block dependencies between statements (note this is more efficient than merely starting a new mutate each time a newly computed value is first used).
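To make the idea of dependency-free blocks concrete, here is a minimal base-R sketch of one greedy way to assign statements to blocks. It is not seplyr's implementation: `partition_sketch()` and `example_steps` are hypothetical names used only here, and the rule it applies (a statement goes one block after any earlier statement whose result it reads or overwrites, and no earlier than any earlier statement that still reads the old value of its target) is an assumption about what "safe" means in this setting.

```r
# Hypothetical helper, not part of seplyr: returns, for each block number,
# the indices of the steps placed in that block.
partition_sketch <- function(steps) {
  targets <- names(steps)
  # Variables read by each expression (all.vars() skips function names).
  uses <- lapply(steps, function(e) all.vars(parse(text = e)))
  block <- integer(length(steps))
  for (i in seq_along(steps)) {
    b <- 1L
    for (j in seq_len(i - 1L)) {
      if (targets[[j]] %in% c(uses[[i]], targets[[i]])) {
        # Step i reads or overwrites the result of step j: strictly later block.
        b <- max(b, block[[j]] + 1L)
      } else if (targets[[i]] %in% uses[[j]]) {
        # Step j reads the old value of step i's target: not an earlier block.
        b <- max(b, block[[j]])
      }
    }
    block[[i]] <- b
  }
  split(seq_along(steps), block)
}

example_steps <- list(
  valuesA = "ifelse(is.na(valuesA) & canUseFix1, fix1, valuesA)",
  valuesA = "ifelse(is.na(valuesA) & canUseFix2, fix2, valuesA)",
  valuesB = "ifelse(is.na(valuesB) & canUseFix1, fix1, valuesB)",
  valuesB = "ifelse(is.na(valuesB) & canUseFix2, fix2, valuesB)")

blocks <- partition_sketch(example_steps)
print(blocks)
```

On this example the sketch places statements 1 and 3 in the first block and statements 2 and 4 in the second, which is the same two-group split seplyr reports. A naive "new mutate whenever a dependency appears, without re-ordering" strategy would instead cut after statement 1 and again after statement 3, giving three blocks.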
We can slow that down and see how the underlying planning functions break the assignments down into a small number of safe blocks (here we are using qae() from the development version of wrapr).
packageVersion("wrapr")
## [1] '1.0.3'
steps <- qae(
  valuesA := ifelse(is.na(valuesA) & canUseFix1,
                    fix1, valuesA),
  valuesA := ifelse(is.na(valuesA) & canUseFix2,
                    fix2, valuesA),
  valuesB := ifelse(is.na(valuesB) & canUseFix1,
                    fix1, valuesB),
  valuesB := ifelse(is.na(valuesB) & canUseFix2,
                    fix2, valuesB))
print(steps)
## $valuesA
## [1] "ifelse(is.na(valuesA) & canUseFix1, fix1, valuesA)"
##
## $valuesA
## [1] "ifelse(is.na(valuesA) & canUseFix2, fix2, valuesA)"
##
## $valuesB
## [1] "ifelse(is.na(valuesB) & canUseFix1, fix1, valuesB)"
##
## $valuesB
## [1] "ifelse(is.na(valuesB) & canUseFix2, fix2, valuesB)"
plan <- partition_mutate_se(steps)
print(plan)
## $group00001
## valuesA
## "ifelse(is.na(valuesA) & canUseFix1, fix1, valuesA)"
## valuesB
## "ifelse(is.na(valuesB) & canUseFix1, fix1, valuesB)"
##
## $group00002
## valuesA
## "ifelse(is.na(valuesA) & canUseFix2, fix2, valuesA)"
## valuesB
## "ifelse(is.na(valuesB) & canUseFix2, fix2, valuesB)"
d %.>%
  mutate_seb(., plan) %.>%
  select_se(., c("valuesA", "valuesB")) %.>%
  dplyr::collect(.) %.>%
  knitr::kable(.)
| valuesA | valuesB |
|---|---|
| A | B |
| Fix_1_V2 | Fix_1_V2 |
| Fix_2_V3 | Fix_2_V3 |
Note that the current CRAN versions of wrapr and seplyr already implement the above work-around. Only some of the conveniences, such as printPlan = TRUE and qae(), require the development versions of these packages.