cdata recommends an operator idiom to apply data transforms.

The idea is simple, yet powerful.

First let’s start with some data.

model_id measure value
1 AUC 0.7
1 R2 0.4
2 AUC 0.8
2 R2 0.5
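
The code that builds this table is not shown here. A minimal sketch, assuming a plain base R data.frame and the variable name d that the later code uses:

```r
# Example data: two measurements (AUC, R2) for each of two models.
# The name `d` matches the later code; the construction itself is an assumption.
d <- data.frame(
  model_id = c(1, 1, 2, 2),
  measure = c("AUC", "R2", "AUC", "R2"),
  value = c(0.7, 0.4, 0.8, 0.5),
  stringsAsFactors = FALSE
)

print(d)
```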

In the above data we have two measurements for each of two individuals (identified by the “model_id” column). Using cdata’s rowrecs_to_blocks_spec() function we can capture a description of this record structure and the details of the transformation.
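
A sketch of what such a specification might look like for the data above. The control table layout and the variable name control_table are assumptions made for illustration; the name transform matches the later code.

```r
library("cdata")

# Control table (illustrative): the "measure" column holds the key that labels
# each row of a block, and the "value" column names the row-record column
# that supplies the value for that row.
control_table <- wrapr::qchar_frame(
  "measure", "value" |
  "AUC"    , AUC     |
  "R2"     , R2      )

# recordKeys names the column(s) identifying which rows form one record.
transform <- rowrecs_to_blocks_spec(
  control_table,
  recordKeys = "model_id")

print(transform)
```

Note that the name transform shadows base R's transform() function in the calling environment; it is kept here only to match the rest of the text.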

Once we have this specification we can transform the data using operator notation.

We can collect the record blocks into rows by a “factor-out” (or aggregation/projection) step.

model_id measure value
1 AUC 0.7
1 R2 0.4
2 AUC 0.8
2 R2 0.5

model_id AUC R2
1 0.7 0.4
2 0.8 0.5
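
The factor-out step that takes the first table to the second can be sketched as follows. This assumes the data frame and specification are built as in the comments; the result name d2 is illustrative.

```r
library("cdata")

# Block-form input: one row per (model_id, measure) pair.
d <- data.frame(
  model_id = c(1, 1, 2, 2),
  measure = c("AUC", "R2", "AUC", "R2"),
  value = c(0.7, 0.4, 0.8, 0.5),
  stringsAsFactors = FALSE
)

# Specification of the record structure (assumed layout).
transform <- rowrecs_to_blocks_spec(
  wrapr::qchar_frame(
    "measure", "value" |
    "AUC"    , AUC     |
    "R2"     , R2      ),
  recordKeys = "model_id")

# t(transform) is the reverse (blocks to row records) specification;
# %//% factors the blocks out into one row per model_id.
d2 <- d %//% t(transform)

print(d2)
```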

We can expand record rows into blocks by a “multiplication” (or join) step.

model_id AUC R2
1 0.7 0.4
2 0.8 0.5

model_id measure value
1 AUC 0.7
2 AUC 0.8
1 R2 0.4
2 R2 0.5
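
The multiplication step can be sketched in the same way. Here the row-record table is typed in directly so the example stands alone; the names d2 and d3 are illustrative.

```r
library("cdata")

# Row-record input: one row per model_id, one column per measurement.
d2 <- data.frame(
  model_id = c(1, 2),
  AUC = c(0.7, 0.8),
  R2 = c(0.4, 0.5)
)

# Specification of the record structure (assumed layout).
transform <- rowrecs_to_blocks_spec(
  wrapr::qchar_frame(
    "measure", "value" |
    "AUC"    , AUC     |
    "R2"     , R2      ),
  recordKeys = "model_id")

# %**% expands each row record into a block of (measure, value) rows.
d3 <- d2 %**% transform

print(d3)
```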

(%//% and %**% are two operators introduced by the cdata package.)

The two specialized operators have an inverse/adjoint relation: applying one and then the other recovers the original data, up to row order.

model_id measure value
1 AUC 0.7
1 R2 0.4
2 AUC 0.8
2 R2 0.5

# identity (up to row order): factor out to row records, then expand back to blocks
d4 <- d %//% t(transform) %**% transform

knitr::kable(d4)
model_id measure value
1 AUC 0.7
2 AUC 0.8
1 R2 0.4
2 R2 0.5

We can also pipe into the spec (and into its adjoint) using the wrapr dot pipe operator (%.>%).

model_id AUC R2
1 0.7 0.4
2 0.8 0.5

model_id measure value
1 AUC 0.7
2 AUC 0.8
1 R2 0.4
2 R2 0.5
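
A sketch of the piped form, assuming the same data and specification as before. It uses wrapr's .() notation, which evaluates its argument first and then pipes the left-hand value into the result; the names d2 and d3 are illustrative.

```r
library("cdata")
library("wrapr")

d <- data.frame(
  model_id = c(1, 1, 2, 2),
  measure = c("AUC", "R2", "AUC", "R2"),
  value = c(0.7, 0.4, 0.8, 0.5),
  stringsAsFactors = FALSE
)

# Specification of the record structure (assumed layout).
transform <- rowrecs_to_blocks_spec(
  wrapr::qchar_frame(
    "measure", "value" |
    "AUC"    , AUC     |
    "R2"     , R2      ),
  recordKeys = "model_id")

# Pipe the block data into the adjoint spec: blocks to row records.
d2 <- d %.>% .(t(transform))
print(d2)

# Pipe the row records into the spec itself: row records back to blocks.
d3 <- d2 %.>% .(transform)
print(d3)
```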

And, of course, the exact same functionality is available for database tables.

have_db <- requireNamespace("DBI", quietly = TRUE) &&
  requireNamespace("RSQLite", quietly = TRUE)

model_id AUC R2
1 0.7 0.4
2 0.8 0.5

d_td %.>%
  .(t(transform)) %.>%
  rquery::execute(db, .) %.>%
  knitr::kable(.)

model_id AUC R2
1 0.7 0.4
2 0.8 0.5

DBI::dbDisconnect(raw_connection)
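
The database fragments above use a connection (raw_connection), an rquery database handle (db), and a remote table description (d_td) whose setup is not shown. A minimal end-to-end sketch, assuming an in-memory SQLite database and a table named "d"; the setup details and the result name res are assumptions, not necessarily the original code.

```r
library("cdata")
library("wrapr")

have_db <- requireNamespace("DBI", quietly = TRUE) &&
  requireNamespace("RSQLite", quietly = TRUE) &&
  requireNamespace("rquery", quietly = TRUE)

d <- data.frame(
  model_id = c(1, 1, 2, 2),
  measure = c("AUC", "R2", "AUC", "R2"),
  value = c(0.7, 0.4, 0.8, 0.5),
  stringsAsFactors = FALSE
)

# Specification of the record structure (assumed layout).
transform <- rowrecs_to_blocks_spec(
  wrapr::qchar_frame(
    "measure", "value" |
    "AUC"    , AUC     |
    "R2"     , R2      ),
  recordKeys = "model_id")

if (have_db) {
  # Assumed setup: an in-memory SQLite database wrapped for rquery.
  raw_connection <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
  db <- rquery::rquery_db_info(
    connection = raw_connection,
    is_dbi = TRUE,
    connection_options = rquery::rq_connection_tests(raw_connection))

  # Copy d into the database; d_td describes the remote table.
  d_td <- rquery::rq_copy_to(db, "d", d, temporary = TRUE, overwrite = TRUE)

  # Same piped idiom as for in-memory data, followed by execution in the database.
  res <- d_td %.>%
    .(t(transform)) %.>%
    rquery::execute(db, .)
  print(res)

  DBI::dbDisconnect(raw_connection)
}
```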