Compute per-column summaries and return them as a data.frame. Warning: this can be an expensive operation.
replyr_summary( x, ..., countUniqueNum = FALSE, countUniqueNonNum = FALSE, cols = NULL, compute = TRUE )
x | tbl or item that can be coerced into such. |
---|---|
... | force additional arguments to be bound by name. |
countUniqueNum | logical, if true include unique non-NA counts for numeric cols. |
countUniqueNonNum | logical, if true include unique non-NA counts for non-numeric cols. |
cols | if not NULL, the set of columns to restrict the summary to. |
compute | logical, if TRUE call compute before working. |
summary of columns.
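To make the shape of the returned summary concrete, here is a minimal base-R sketch that builds one row per column with a subset of the fields shown in the example output (column, index, class, nrows, nna, min, max, mean, sd). The helper name `sketch_summary` is hypothetical and is not part of replyr; it only handles a local data.frame and only approximates what replyr_summary() reports.

```r
# Hypothetical helper (NOT part of replyr): one summary row per column
# of a local data.frame, numeric statistics for numeric columns only.
sketch_summary <- function(d) {
  rows <- lapply(seq_along(d), function(i) {
    v <- d[[i]]
    # Non-missing values, kept only for numeric columns
    ok <- if (is.numeric(v)) v[!is.na(v)] else numeric(0)
    has <- length(ok) > 0
    data.frame(
      column = names(d)[i],
      index  = i,
      class  = class(v)[1],
      nrows  = length(v),
      nna    = sum(is.na(v)),   # is.na() is TRUE for both NA and NaN
      min    = if (has) min(ok) else NA_real_,
      max    = if (has) max(ok) else NA_real_,
      mean   = if (has) mean(ok) else NA_real_,
      sd     = if (length(ok) > 1) stats::sd(ok) else NA_real_,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

d <- data.frame(w = 1:3, x = c(NA, 2, 3), z = c("a", NA, "z"),
                stringsAsFactors = FALSE)
res <- sketch_summary(d)
print(res)
```

The sketch leaves out the nunique, lexmin and lexmax fields and does no remote (database) work, which is where the real function differs most.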
Can be slow compared to dplyr::summarize_all() (but serves a different purpose). Also, for numeric columns, NaN is included in the nna count (as is typical for R, e.g., is.na(NaN) is TRUE). And note: replyr_summary() currently skips "raw" columns.
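The NaN remark can be checked in base R alone. The short sketch below (variable names are illustrative) shows that is.na() flags both NA and NaN, so a missing-value count built on it includes NaN, while is.nan() separates the two.

```r
x <- c(1, NaN, NA, 4)

na_flags <- is.na(x)      # TRUE for both NaN and NA
nna      <- sum(na_flags) # missing-value count that includes NaN
n_nan    <- sum(is.nan(x)) # counts only NaN, not NA

print(na_flags)
print(nna)
print(n_nan)
```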
d <- data.frame(p= c(TRUE, FALSE, NA),
                r= I(list(1,2,3)),
                s= NA,
                t= as.raw(3:5),
                w= 1:3,
                x= c(NA,2,3),
                y= factor(c(3,5,NA)),
                z= c('a',NA,'z'),
                stringsAsFactors=FALSE)
# sc <- sparklyr::spark_connect(version='2.2.0',
#                               master = "local")
# dS <- replyr_copy_to(sc, dplyr::select(d, -r, -t), 'dS',
#                      temporary=TRUE, overwrite=TRUE)
# replyr_summary(dS)
# sparklyr::spark_disconnect(sc)
if (requireNamespace("RSQLite", quietly = TRUE)) {
  my_db <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
  RSQLite::initExtension(my_db)
  dM <- replyr_copy_to(my_db, dplyr::select(d, -r, -t), 'dM',
                       temporary=TRUE, overwrite=TRUE)
  print(replyr_summary(dM))
  DBI::dbDisconnect(my_db)
}
#>   column index     class nrows nna nunique min max mean        sd lexmin lexmax
#> 1      p     1   integer     3   1      NA   0   1  0.5 0.7071068   <NA>   <NA>
#> 2      s     2   integer     3   3      NA  NA  NA   NA        NA   <NA>   <NA>
#> 3      w     3   integer     3   0      NA   1   3  2.0 1.0000000   <NA>   <NA>
#> 4      x     4   numeric     3   1      NA   2   3  2.5 0.7071068   <NA>   <NA>
#> 5      y     5 character     3   1      NA  NA  NA   NA        NA      3      5
#> 6      z     6 character     3   1      NA  NA  NA   NA        NA      a      z

#>   column index     class nrows nna nunique min  max mean        sd lexmin
#> 1      p     1   logical     3   1      NA   0    1  0.5 0.7071068   <NA>
#> 2      r     2      AsIs     3  NA      NA  NA   NA   NA        NA   <NA>
#> 3      s     3   logical     3   3      NA Inf -Inf  NaN        NA   <NA>
#> 4      t     4       raw     3  NA      NA  NA   NA   NA        NA   <NA>
#> 5      w     5   integer     3   0      NA   1    3  2.0 1.0000000   <NA>
#> 6      x     6   numeric     3   1      NA   2    3  2.5 0.7071068   <NA>
#> 7      y     7    factor     3   1      NA  NA   NA   NA        NA      3
#> 8      z     8 character     3   1      NA  NA   NA   NA        NA      a
#> 9      q     9      list     3  NA      NA  NA   NA   NA        NA   <NA>
#>   lexmax
#> 1   <NA>
#> 2   <NA>
#> 3   <NA>
#> 4   <NA>
#> 5   <NA>
#> 6   <NA>
#> 7      5
#> 8      z
#> 9   <NA>