Spark 2* union_all has known issues (see https://github.com/WinVector/replyr/blob/master/issues/UnionIssue.md ), and the exposed union_all semantics differ from one data-source back end to another. This function is an attempt to provide a join-based replacement.
replyr_union_all(
  tabA,
  tabB,
  ...,
  useDplyrLocal = TRUE,
  useSparkRbind = TRUE,
  tempNameGenerator = mk_tmp_name_source("replyr_union_all")
)
Argument | Description |
---|---|
tabA | not-NULL table with at least 1 row. |
tabB | not-NULL table with at least 1 row, on the same data source as tabA and with common columns. |
... | force later arguments to be bound by name. |
useDplyrLocal | logical; if TRUE, use dplyr::bind_rows for local data. |
useSparkRbind | logical; if TRUE, try to use rbind on Sparklyr data. |
tempNameGenerator | temp name generator produced by wrapr::mk_tmp_name_source, used to record dplyr::compute() effects. |
table with all rows of tabA and tabB (union_all).
d1 <- data.frame(x = c('a','b'), y = 1, stringsAsFactors = FALSE)
d2 <- data.frame(x = 'c', z = 1, stringsAsFactors = FALSE)
replyr_union_all(d1, d2, useDplyrLocal = FALSE)
#>    y  z x
#> 1  1 NA a
#> 2  1 NA b
#> 3 NA  1 c
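For local data frames, the row-stacking behaviour seen in the example can be illustrated with a minimal base-R sketch. The helper `pad_union_all` below is hypothetical and is not part of replyr; it is not the join-based method the function uses internally. It only shows the resulting semantics: columns missing from one table are filled with NA before the rows are stacked.

```r
# Hypothetical helper (not part of replyr): pad each table with the
# columns it lacks, then stack the rows with base rbind().
# Assumes both inputs are local data frames with at least 1 row.
pad_union_all <- function(a, b) {
  cols <- union(names(a), names(b))
  # Add NA-filled columns for anything present only in the other table.
  for (nm in setdiff(cols, names(a))) a[[nm]] <- NA
  for (nm in setdiff(cols, names(b))) b[[nm]] <- NA
  # Put both tables in the same column order before binding.
  rbind(a[, cols, drop = FALSE], b[, cols, drop = FALSE])
}

d1 <- data.frame(x = c('a', 'b'), y = 1, stringsAsFactors = FALSE)
d2 <- data.frame(x = 'c', z = 1, stringsAsFactors = FALSE)
res <- pad_union_all(d1, d2)
print(res)
```

The sketch returns the same three rows as the example above, with y missing for the row from d2 and z missing for the rows from d1. Its column order (x, y, z) follows the order of first appearance and may differ from the order replyr_union_all produces.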