cdata is a general data re-shaper that has the great virtue of adhering to the so-called “Rule of Representation”:

Fold knowledge into data, so program logic can be stupid and robust.

The Art of Unix Programming, Eric S. Raymond, Addison-Wesley, 2003

The point being: it is much easier to reason about data than to try to reason about code, so using data to control your code is often a very good trade-off.

Briefly: cdata supplies data transform operators that:

• Work on local data or with any DBI data source.
• Are powerful generalizations of the operations commonly called pivot and un-pivot.
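To make the pivot/un-pivot connection concrete, here is a minimal sketch on a local data frame. The table `wide` and its `model_id` column are made up for illustration; `unpivot_to_blocks()` and `pivot_to_rowrecs()` are the cdata convenience wrappers for the simple cases, and the row order of the results should not be relied on.

```r
library("cdata")

# small wide table: one row per model (model_id is a made-up key column)
wide <- data.frame(
  model_id = c("m1", "m2"),
  AUC = c(0.6, 0.7),
  R2 = c(0.2, 0.3),
  stringsAsFactors = FALSE)

# un-pivot: move the AUC and R2 columns into key/value rows
tall <- unpivot_to_blocks(
  wide,
  nameForNewKeyColumn = "meas",
  nameForNewValueColumn = "val",
  columnsToTakeFrom = c("AUC", "R2"))
print(tall)

# pivot: rebuild one row per model_id from the key/value rows
wide_again <- pivot_to_rowrecs(
  tall,
  columnToTakeKeysFrom = "meas",
  columnToTakeValuesFrom = "val",
  rowKeyColumns = "model_id")
print(wide_again)
```

The more general `rowrecs_to_blocks()` and `blocks_to_rowrecs()` take a control table instead of single key and value column names, which is what allows several value columns per block.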

A quick example: plot iris petal and sepal dimensions in a faceted graph.

iris <- data.frame(iris)

library("ggplot2")
library("cdata")

#
# build a control table with a "key column" flower_part
# and "value columns" Length and Width
#
controlTable <- wrapr::qchar_frame(
  flower_part, Length      , Width       |
  Petal      , Petal.Length, Petal.Width |
  Sepal      , Sepal.Length, Sepal.Width )

# do the unpivot to convert the row records to block records
iris_aug <- rowrecs_to_blocks(
  iris,
  controlTable,
  columnsToCopy = c("Species"))

ggplot(iris_aug, aes(x=Length, y=Width)) +
  geom_point(aes(color=Species, shape=Species)) +
  facet_wrap(~flower_part, labeller = label_both, scale = "free") +
  ggtitle("Iris dimensions") +
  scale_color_brewer(palette = "Dark2")

More details on the above example can be found here. A tutorial on how to design a controlTable can be found here.
And some discussion of the nature of records in cdata can be found here.
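The same control table can also drive the inverse transform. The sketch below assumes we first add a row id (the `id` column is hypothetical; `iris` has no natural key) so that each block of rows can be reassembled into its original row record.

```r
library("cdata")

# copy of iris with an explicit record key (id is an illustrative name)
iris_id <- data.frame(iris)
iris_id$id <- seq_len(nrow(iris_id))

controlTable <- wrapr::qchar_frame(
  flower_part, Length      , Width       |
  Petal      , Petal.Length, Petal.Width |
  Sepal      , Sepal.Length, Sepal.Width )

# row records -> block records (two rows per flower)
blocks <- rowrecs_to_blocks(
  iris_id,
  controlTable,
  columnsToCopy = c("id", "Species"))

# block records -> row records, using the same control table
back <- blocks_to_rowrecs(
  blocks,
  keyColumns = c("id", "Species"),
  controlTable = controlTable)

# restore the original row order before comparing
back <- back[order(back$id), ]
```

Because one table describes the record shape in both directions, the forward and inverse transforms do not need to be specified separately.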

We can also exhibit a larger example of using cdata to create a scatter-plot matrix, or pair plot:


iris <- data.frame(iris)

library("ggplot2")
library("cdata")

# declare our columns of interest
meas_vars <- qc(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)
category_variable <- "Species"

# build a control with all pairs of variables as value columns
# and pair_key as the key column
controlTable <- data.frame(expand.grid(meas_vars, meas_vars,
                                       stringsAsFactors = FALSE))
# name the value columns value1 and value2
colnames(controlTable) <- qc(value1, value2)
# insert first, or key column
controlTable <- cbind(
  data.frame(pair_key = paste(controlTable[[1]], controlTable[[2]]),
             stringsAsFactors = FALSE),
  controlTable)

# do the unpivot to convert the row records to multiple block records
iris_aug <- rowrecs_to_blocks(
  iris,
  controlTable,
  columnsToCopy = category_variable)

# unpack the key column into two variable keys for the facet_grid
splt <- strsplit(iris_aug$pair_key, split = " ", fixed = TRUE)
iris_aug$v1 <- vapply(splt, function(si) si[[1]], character(1))
iris_aug$v2 <- vapply(splt, function(si) si[[2]], character(1))

ggplot(iris_aug, aes(x=value1, y=value2)) +
  geom_point(aes_string(color=category_variable, shape=category_variable)) +
  facet_grid(v2~v1, labeller = label_both, scale = "free") +
  ggtitle("Iris dimensions") +
  scale_color_brewer(palette = "Dark2") +
  ylab(NULL) +
  xlab(NULL)

The above is now wrapped into a one-line command in WVPlots.

And a quick database example:

library("cdata")
library("rquery")

use_spark <- FALSE

if(use_spark) {
  my_db <- sparklyr::spark_connect(version='2.2.0',
                                   master = "local")
} else {
  my_db <- DBI::dbConnect(RSQLite::SQLite(),
                          ":memory:")
}

# pivot example
d <- wrapr::build_frame(
  "meas", "val" |
  "AUC" , 0.6   |
  "R2"  , 0.2   )
DBI::dbWriteTable(my_db,
                  'd',
                  d,
                  temporary = TRUE)
rstr(my_db, 'd')
# table d SQLiteConnection
#   nrow: 2
# 'data.frame': 2 obs. of 2 variables:
#   $ meas: chr  "AUC" "R2"
#   $ val : num  0.6 0.2

td <- db_td(my_db, "d")
td
# [1] "table(d; meas, val)"

cT <- td %.>%
  build_pivot_control(.,
                      columnToTakeKeysFrom= 'meas',
                      columnToTakeValuesFrom= 'val') %.>%
  execute(my_db, .)
print(cT)
#   meas val
# 1  AUC AUC
# 2   R2  R2

tab <- td %.>%
  blocks_to_rowrecs(.,
                    keyColumns = NULL,
                    controlTable = cT,
                    temporary = FALSE) %.>%
  materialize(my_db, .)

print(tab)
# [1] "table(rquery_mat_20991586465541874367_0000000000; AUC, R2)"

rstr(my_db, tab)
# table rquery_mat_20991586465541874367_0000000000 SQLiteConnection
#   nrow: 1
# 'data.frame': 1 obs. of 2 variables:
#   $ AUC: num 0.6
#   $ R2 : num 0.2

if(use_spark) {
  sparklyr::spark_disconnect(my_db)
} else {
  DBI::dbDisconnect(my_db)
}
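For comparison, the same control-table pattern can be tried on an in-memory data.frame with no database handle. This is a sketch under assumptions: the `model_id` key column and the data values are made up, and we pass an explicit key column rather than `keyColumns = NULL`.

```r
library("cdata")

# local block records: one (meas, val) row per measurement per model
d_local <- data.frame(
  model_id = c("m1", "m1", "m2", "m2"),   # made-up record key
  meas = c("AUC", "R2", "AUC", "R2"),
  val = c(0.6, 0.2, 0.7, 0.3),
  stringsAsFactors = FALSE)

# derive the control table from the key and value columns
cT_local <- build_pivot_control(
  d_local,
  columnToTakeKeysFrom = "meas",
  columnToTakeValuesFrom = "val")
print(cT_local)

# pivot the blocks into one row record per model_id
wide_local <- blocks_to_rowrecs(
  d_local,
  keyColumns = "model_id",
  controlTable = cT_local)
print(wide_local)
```

The control table here plays the same role as `cT` in the database example; only the data handle changes.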

The cdata package is a demonstration of the “coordinatized data” theory and includes an implementation of the “fluid data” methodology. The recommended tutorial is: Fluid data reshaping with cdata. We also have a short free cdata screencast (and another example can be found here).

Install via CRAN:

install.packages("cdata")

Note: cdata is targeted at data with “tame column names” (column names that are valid both in databases, and as R unquoted variable names) and basic types (column values that are simple R types such as character, numeric, logical, and so on).
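One rough way to test the R half of the "tame column names" condition is to compare the names against `make.names()`. The helper name `has_tame_names` is hypothetical and not part of cdata, and this sketch does not check whether a name is also acceptable to a particular database.

```r
# TRUE when every column name is already a unique, syntactically
# valid R name (usable unquoted); database validity is NOT checked
has_tame_names <- function(d) {
  nms <- colnames(d)
  all(nms == make.names(nms, unique = TRUE))
}

has_tame_names(iris)

# a name containing a space is altered by make.names(), so it fails
d_bad <- data.frame(`a b` = 1, check.names = FALSE)
has_tame_names(d_bad)
```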