Methods to reliably use dplyr on remote data sources in R (SQL databases, Spark 2.0.0 and above) in a generic fashion.

Details

replyr is going into maintenance mode. Tracking the shifting dplyr/dbplyr/rlang APIs and data structures after dplyr 0.5 has been hard. Most of what replyr does is now done better by one of the newer, non-monolithic packages named below.

replyr helps with the following:

  • Summarizing remote data (via replyr_summary).

  • Facilitating writing "source generic" code that works similarly on multiple 'dplyr' data sources.

  • Providing big data versions of functions for splitting data, binding rows, pivoting, adding row-ids, ranking, and completing experimental designs.

  • Packaging common data manipulation tasks into operators such as the gapply function.

  • Providing support code for common sparklyr tasks, such as tracking temporary handle IDs.
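As a rough illustration of the summarizing item in the list above, here is a minimal sketch. It assumes the replyr and dplyr packages are installed and that replyr_summary accepts a local data frame in place of a remote table. The data frame d and the name column_summary are hypothetical, chosen only for this example. Because replyr is in maintenance mode, behavior may differ with recent dplyr versions.

```r
# Sketch only: assumes the 'replyr' and 'dplyr' packages are installed.
# The example is skipped quietly when either package is missing.
if (requireNamespace("replyr", quietly = TRUE) &&
    requireNamespace("dplyr", quietly = TRUE)) {

  # Hypothetical example data; a local data frame stands in for a remote table.
  d <- data.frame(
    group = c("a", "a", "b", "b", "b"),
    value = c(1, 2, 3, 4, 5),
    stringsAsFactors = FALSE
  )

  # Ask replyr for a per-column summary of the data.
  column_summary <- replyr::replyr_summary(d)
  print(column_summary)
}
```

The same call is intended to work when d is a handle to a database or Spark table rather than a local data frame, which is the "source generic" idea described in the list.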

replyr is in maintenance mode. Better versions of this functionality have been ported to the following packages: wrapr, cdata, rquery, and seplyr.

To learn more about replyr, please start with the vignette: vignette('replyr','replyr')