Compute per-column summaries and return them as a data.frame. Warning: this can be an expensive operation.

replyr_summary(
  x,
  ...,
  countUniqueNum = FALSE,
  countUniqueNonNum = FALSE,
  cols = NULL,
  compute = TRUE
)

Arguments

x

A tbl, or an item that can be coerced into one.

...

Forces any additional arguments to be bound by name.

countUniqueNum

Logical; if TRUE, include counts of unique non-NA values for numeric columns.

countUniqueNonNum

Logical; if TRUE, include counts of unique non-NA values for non-numeric columns.

cols

If not NULL, the set of columns to restrict the summary to.

compute

Logical; if TRUE, call compute() on x before summarizing.
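The countUniqueNum and countUniqueNonNum flags request counts of unique non-NA values (reported in the nunique column of the result). For a single local column, that quantity can be sketched in base R as below. count_unique_non_na is a hypothetical helper for illustration only, not part of replyr, and replyr's own computation (particularly on database-backed tbls) may differ.

```r
# Hypothetical helper (not part of replyr): number of distinct non-NA values
# in one local column. NaN is dropped too, since is.na(NaN) is TRUE.
count_unique_non_na <- function(col) {
  length(unique(col[!is.na(col)]))
}

count_unique_non_na(c(NA, 2, 3, 3))        # numeric column
count_unique_non_na(c("a", NA, "z", "a"))  # character column
count_unique_non_na(factor(c(3, 5, NA)))   # factor column
```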

Value

A data.frame summarizing the columns, with one row per column.

Details

Can be slow compared to dplyr::summarize_all(), though it serves a different purpose. For numeric columns, NaN values are included in the nna count, as is typical for R (is.na(NaN) is TRUE). Note also that replyr_summary() currently skips "raw" columns.
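To make the NaN handling concrete, here is a minimal base-R sketch of the numeric statistics for one local column. summarize_numeric_column is a hypothetical helper, not replyr's implementation; it only assumes that nna counts everything for which is.na() is TRUE and that min, max, mean, and sd are taken over the remaining values.

```r
# Hypothetical sketch (not replyr's implementation): the nrows, nna, min,
# max, mean and sd statistics for one numeric column.
summarize_numeric_column <- function(col) {
  ok <- col[!is.na(col)]  # drops both NA and NaN
  data.frame(
    nrows = length(col),
    nna   = sum(is.na(col)),  # NaN counts here because is.na(NaN) is TRUE
    min   = if (length(ok) > 0) min(ok) else NA_real_,
    max   = if (length(ok) > 0) max(ok) else NA_real_,
    mean  = if (length(ok) > 0) mean(ok) else NA_real_,
    sd    = if (length(ok) > 1) sd(ok) else NA_real_
  )
}

# nrows 4, nna 2 (one NA plus one NaN), min 2, max 3, mean 2.5
summarize_numeric_column(c(NA, 2, 3, NaN))
```

This sketch returns NA for a column with no usable values; in the Examples below, replyr_summary() on a local data.frame instead reports Inf, -Inf and NaN for the all-NA column s.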

Examples

d <- data.frame(p = c(TRUE, FALSE, NA),
                r = I(list(1, 2, 3)),
                s = NA,
                t = as.raw(3:5),
                w = 1:3,
                x = c(NA, 2, 3),
                y = factor(c(3, 5, NA)),
                z = c('a', NA, 'z'),
                stringsAsFactors = FALSE)

# sc <- sparklyr::spark_connect(version = '2.2.0',
#                               master = "local")
# dS <- replyr_copy_to(sc, dplyr::select(d, -r, -t), 'dS',
#                      temporary = TRUE, overwrite = TRUE)
# replyr_summary(dS)
# sparklyr::spark_disconnect(sc)

if (requireNamespace("RSQLite", quietly = TRUE)) {
  my_db <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
  RSQLite::initExtension(my_db)
  dM <- replyr_copy_to(my_db, dplyr::select(d, -r, -t), 'dM',
                       temporary = TRUE, overwrite = TRUE)
  print(replyr_summary(dM))
  DBI::dbDisconnect(my_db)
}
#>   column index     class nrows nna nunique min max mean        sd lexmin lexmax
#> 1      p     1   integer     3   1      NA   0   1  0.5 0.7071068   <NA>   <NA>
#> 2      s     2   integer     3   3      NA  NA  NA   NA        NA   <NA>   <NA>
#> 3      w     3   integer     3   0      NA   1   3  2.0 1.0000000   <NA>   <NA>
#> 4      x     4   numeric     3   1      NA   2   3  2.5 0.7071068   <NA>   <NA>
#> 5      y     5 character     3   1      NA  NA  NA   NA        NA      3      5
#> 6      z     6 character     3   1      NA  NA  NA   NA        NA      a      z
d$q <- list(1, 2, 3)
replyr_summary(d)
#>   column index     class nrows nna nunique min  max mean        sd lexmin
#> 1      p     1   logical     3   1      NA   0    1  0.5 0.7071068   <NA>
#> 2      r     2      AsIs     3  NA      NA  NA   NA   NA        NA   <NA>
#> 3      s     3   logical     3   3      NA Inf -Inf  NaN        NA   <NA>
#> 4      t     4       raw     3  NA      NA  NA   NA   NA        NA   <NA>
#> 5      w     5   integer     3   0      NA   1    3  2.0 1.0000000   <NA>
#> 6      x     6   numeric     3   1      NA   2    3  2.5 0.7071068   <NA>
#> 7      y     7    factor     3   1      NA  NA   NA   NA        NA      3
#> 8      z     8 character     3   1      NA  NA   NA   NA        NA      a
#> 9      q     9      list     3  NA      NA  NA   NA   NA        NA   <NA>
#>   lexmax
#> 1   <NA>
#> 2   <NA>
#> 3   <NA>
#> 4   <NA>
#> 5   <NA>
#> 6   <NA>
#> 7      5
#> 8      z
#> 9   <NA>