In an earlier note we exhibited a non-signalling result corruption in dplyr 0.7.4. In this note we demonstrate the seplyr work-around.
First we re-establish our example:
packageVersion("dplyr")
## [1] '0.7.4'
my_db <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
d <- dplyr::copy_to(
  my_db,
  data.frame(
    valuesA = c("A", NA, NA),
    valuesB = c("B", NA, NA),
    canUseFix1 = c(TRUE, TRUE, FALSE),
    fix1 = c("Fix_1_V1", "Fix_1_V2", "Fix_1_V3"),
    canUseFix2 = c(FALSE, FALSE, TRUE),
    fix2 = c("Fix_2_V1", "Fix_2_V2", "Fix_2_V3"),
    stringsAsFactors = FALSE),
  'd',
  temporary = TRUE, overwrite = TRUE)
knitr::kable(dplyr::collect(d))
valuesA | valuesB | canUseFix1 | fix1 | canUseFix2 | fix2 |
---|---|---|---|---|---|
A | B | 1 | Fix_1_V1 | 0 | Fix_2_V1 |
NA | NA | 1 | Fix_1_V2 | 0 | Fix_2_V2 |
NA | NA | 0 | Fix_1_V3 | 1 | Fix_2_V3 |
seplyr has a fix/work-around for the earlier issue: it automatically breaks up the steps into safe blocks (see the announcement; here we are using the development seplyr 0.5.1 version of mutate_se()).
library("seplyr")
## Loading required package: wrapr
packageVersion("seplyr")
## [1] '0.5.1'
d %.>%
  mutate_se(
    .,
    qae(valuesA := ifelse(is.na(valuesA) & canUseFix1,
                          fix1, valuesA),
        valuesA := ifelse(is.na(valuesA) & canUseFix2,
                          fix2, valuesA),
        valuesB := ifelse(is.na(valuesB) & canUseFix1,
                          fix1, valuesB),
        valuesB := ifelse(is.na(valuesB) & canUseFix2,
                          fix2, valuesB)),
    printPlan = TRUE) %.>%
  select_se(., c("valuesA", "valuesB")) %.>%
  dplyr::collect(.) %.>%
  knitr::kable(.)
## $group00001
## valuesA
## "ifelse(is.na(valuesA) & canUseFix1, fix1, valuesA)"
## valuesB
## "ifelse(is.na(valuesB) & canUseFix1, fix1, valuesB)"
##
## $group00002
## valuesA
## "ifelse(is.na(valuesA) & canUseFix2, fix2, valuesA)"
## valuesB
## "ifelse(is.na(valuesB) & canUseFix2, fix2, valuesB)"
valuesA | valuesB |
---|---|
A | B |
Fix_1_V2 | Fix_1_V2 |
Fix_2_V3 | Fix_2_V3 |
We now have a correct result (all cells filled).
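As a cross-check, the same four assignments can be run one at a time on a local data frame using only base R. This is a minimal sketch and not part of the original workflow: the name d_local is introduced here for illustration, and the data is copied from the example above. Because each statement is applied in sequence, the result is what the database pipeline is expected to reproduce.

```r
# Local copy of the example data (d_local is a name introduced for this sketch).
d_local <- data.frame(
  valuesA = c("A", NA, NA),
  valuesB = c("B", NA, NA),
  canUseFix1 = c(TRUE, TRUE, FALSE),
  fix1 = c("Fix_1_V1", "Fix_1_V2", "Fix_1_V3"),
  canUseFix2 = c(FALSE, FALSE, TRUE),
  fix2 = c("Fix_2_V1", "Fix_2_V2", "Fix_2_V3"),
  stringsAsFactors = FALSE)

# Apply the four assignments strictly in order, so each step sees the
# result of the previous one.
d_local$valuesA <- ifelse(is.na(d_local$valuesA) & d_local$canUseFix1,
                          d_local$fix1, d_local$valuesA)
d_local$valuesA <- ifelse(is.na(d_local$valuesA) & d_local$canUseFix2,
                          d_local$fix2, d_local$valuesA)
d_local$valuesB <- ifelse(is.na(d_local$valuesB) & d_local$canUseFix1,
                          d_local$fix1, d_local$valuesB)
d_local$valuesB <- ifelse(is.na(d_local$valuesB) & d_local$canUseFix2,
                          d_local$fix2, d_local$valuesB)

print(d_local[, c("valuesA", "valuesB")])
```

The two printed columns should agree with the table produced from the database above.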
seplyr used safe statement re-ordering to break the calculation into the minimum number of blocks/groups that have no in-block dependencies between statements (note this is more efficient than merely introducing a new mutate each time a new value is first used).
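The idea of grouping statements into dependency-free blocks can be sketched in base R. The helper below, partition_sketch(), is hypothetical and is not seplyr's implementation; it is a simplified greedy illustration that assumes each statement is given as a target column name plus an expression string. A statement is pushed to a later block if it reads or re-assigns a column written by an earlier statement, and it is never placed before an earlier statement that reads the column it overwrites.

```r
# Hypothetical helper for illustration only; not seplyr's algorithm.
# targets: character vector of assigned column names (in statement order)
# exprs:   character vector of right-hand-side expressions (same order)
partition_sketch <- function(targets, exprs) {
  stopifnot(length(targets) == length(exprs))
  n <- length(targets)
  block <- integer(n)
  for (i in seq_len(n)) {
    reads_i <- all.vars(parse(text = exprs[[i]]))
    b <- 1L
    for (j in seq_len(i - 1L)) {
      reads_j <- all.vars(parse(text = exprs[[j]]))
      # i reads or re-assigns what j wrote: i must land in a later block.
      if (targets[[j]] %in% reads_i || targets[[j]] == targets[[i]]) {
        b <- max(b, block[[j]] + 1L)
      }
      # i overwrites something j reads: i may share j's block, but must
      # not run in an earlier one.
      if (targets[[i]] %in% reads_j) {
        b <- max(b, block[[j]])
      }
    }
    block[[i]] <- b
  }
  # Statement indices grouped by block number.
  split(seq_len(n), block)
}

blocks <- partition_sketch(
  targets = c("valuesA", "valuesA", "valuesB", "valuesB"),
  exprs = c("ifelse(is.na(valuesA) & canUseFix1, fix1, valuesA)",
            "ifelse(is.na(valuesA) & canUseFix2, fix2, valuesA)",
            "ifelse(is.na(valuesB) & canUseFix1, fix1, valuesB)",
            "ifelse(is.na(valuesB) & canUseFix2, fix2, valuesB)"))
print(blocks)
```

For the four statements of this note, the sketch places statements 1 and 3 in the first block and statements 2 and 4 in the second, which is the same grouping as the plan printed above.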
We can slow that down and see how the underlying planning functions break the assignments down into a small number of safe blocks (here we are using the development wrapr 1.0.2 function qae()).
packageVersion("wrapr")
## [1] '1.0.3'
steps <- qae(
  valuesA := ifelse(is.na(valuesA) & canUseFix1,
                    fix1, valuesA),
  valuesA := ifelse(is.na(valuesA) & canUseFix2,
                    fix2, valuesA),
  valuesB := ifelse(is.na(valuesB) & canUseFix1,
                    fix1, valuesB),
  valuesB := ifelse(is.na(valuesB) & canUseFix2,
                    fix2, valuesB))
print(steps)
## $valuesA
## [1] "ifelse(is.na(valuesA) & canUseFix1, fix1, valuesA)"
##
## $valuesA
## [1] "ifelse(is.na(valuesA) & canUseFix2, fix2, valuesA)"
##
## $valuesB
## [1] "ifelse(is.na(valuesB) & canUseFix1, fix1, valuesB)"
##
## $valuesB
## [1] "ifelse(is.na(valuesB) & canUseFix2, fix2, valuesB)"
plan <- partition_mutate_se(steps)
print(plan)
## $group00001
## valuesA
## "ifelse(is.na(valuesA) & canUseFix1, fix1, valuesA)"
## valuesB
## "ifelse(is.na(valuesB) & canUseFix1, fix1, valuesB)"
##
## $group00002
## valuesA
## "ifelse(is.na(valuesA) & canUseFix2, fix2, valuesA)"
## valuesB
## "ifelse(is.na(valuesB) & canUseFix2, fix2, valuesB)"
d %.>%
  mutate_seb(., plan) %.>%
  select_se(., c("valuesA", "valuesB")) %.>%
  dplyr::collect(.) %.>%
  knitr::kable(.)
valuesA | valuesB |
---|---|
A | B |
Fix_1_V2 | Fix_1_V2 |
Fix_2_V3 | Fix_2_V3 |
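To make the block-by-block execution concrete, here is a minimal base R sketch of applying such a plan to a local data frame. The names plan_sketch, apply_plan_sketch() and d_local are introduced here for illustration and are assumptions, not seplyr functions; mutate_seb() works against dplyr and may differ in detail. Within each block, every expression is evaluated against the data as it stood at the start of the block, and the results are assigned only afterwards.

```r
# Local copy of the example data (names here are hypothetical).
d_local <- data.frame(
  valuesA = c("A", NA, NA),
  valuesB = c("B", NA, NA),
  canUseFix1 = c(TRUE, TRUE, FALSE),
  fix1 = c("Fix_1_V1", "Fix_1_V2", "Fix_1_V3"),
  canUseFix2 = c(FALSE, FALSE, TRUE),
  fix2 = c("Fix_2_V1", "Fix_2_V2", "Fix_2_V3"),
  stringsAsFactors = FALSE)

# A plan with the same shape as the one printed above.
plan_sketch <- list(
  group00001 = c(
    valuesA = "ifelse(is.na(valuesA) & canUseFix1, fix1, valuesA)",
    valuesB = "ifelse(is.na(valuesB) & canUseFix1, fix1, valuesB)"),
  group00002 = c(
    valuesA = "ifelse(is.na(valuesA) & canUseFix2, fix2, valuesA)",
    valuesB = "ifelse(is.na(valuesB) & canUseFix2, fix2, valuesB)"))

# Illustration only: evaluate all expressions of a block against the
# current data, then write the new columns back in one step.
apply_plan_sketch <- function(df, plan) {
  for (block in plan) {
    new_cols <- lapply(block, function(e) {
      eval(parse(text = e)[[1]], envir = df)
    })
    df[names(new_cols)] <- new_cols
  }
  df
}

res <- apply_plan_sketch(d_local, plan_sketch)
print(res[, c("valuesA", "valuesB")])
```

Since no statement in a block depends on another statement in the same block, the order of evaluation inside a block does not matter, which is the property the planner is designed to guarantee.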
Note that the current CRAN versions of wrapr and seplyr already implement the above work-around. Only some of the conveniences, such as printPlan = TRUE and qae(), require the development versions of these packages.