R/RowsColsQ.R
rowrecs_to_blocks_q.Rd
Transform data facts from columns into additional rows using SQL and controlTable.
rowrecs_to_blocks_q( wideTable, controlTable, my_db, ..., columnsToCopy = NULL, tempNameGenerator = mk_tmp_name_source("mvtrq"), strict = FALSE, controlTableKeys = colnames(controlTable)[[1]], checkNames = TRUE, checkKeys = FALSE, showQuery = FALSE, defaultValue = NULL, temporary = FALSE, resultName = NULL, incoming_qualifiers = NULL, outgoing_qualifiers = NULL, temp_qualifiers = NULL, executeQuery = TRUE )
wideTable | name of table containing data to be mapped (db/Spark data) |
---|---|
controlTable | table specifying mapping (local data frame) |
my_db | db handle |
... | force later arguments to be by name. |
columnsToCopy | character array of column names to copy |
tempNameGenerator | a tempNameGenerator from cdata::mk_tmp_name_source() |
strict | logical, if TRUE check control table name forms |
controlTableKeys | character, which column names of the control table are considered to be keys. |
checkNames | logical, if TRUE check names |
checkKeys | logical, if TRUE check wideTable keys |
showQuery | if TRUE print query |
defaultValue | if not NULL literal to use for non-match values. |
temporary | logical, if TRUE make result temporary. |
resultName | character, name for result table. |
incoming_qualifiers | optional named ordered vector of strings carrying additional db hierarchy terms, such as schema. |
outgoing_qualifiers | optional named ordered vector of strings carrying additional db hierarchy terms, such as schema. |
temp_qualifiers | optional named ordered vector of strings carrying additional db hierarchy terms, such as schema. |
executeQuery | logical, if TRUE execute the query and return result. |
long table built by mapping wideTable to one row per group (or query if executeQuery is FALSE)
This is using the theory of "fluid data"n (https://github.com/WinVector/cdata), which includes the principle that each data cell has coordinates independent of the storage details and storage detail dependent coordinates (usually row-id, column-id, and group-id) can be re-derived at will (the other principle is that there may not be "one true preferred data shape" and many re-shapings of data may be needed to match data to different algorithms and methods).
The controlTable defines the names of each data element in the two notations: the notation of the tall table (which is row oriented) and the notation of the wide table (which is column oriented). controlTable[ , 1] (the group label) cross colnames(controlTable) (the column labels) are names of data cells in the long form. controlTable[ , 2:ncol(controlTable)] (column labels) are names of data cells in the wide form. To get behavior similar to tidyr::gather/spread one builds the control table by running an appropriate query over the data.
Some discussion and examples can be found here: https://winvector.github.io/FluidData/FluidData.html and here https://github.com/WinVector/cdata.
if (requireNamespace("DBI", quietly = TRUE) && requireNamespace("RSQLite", quietly = TRUE)) { my_db <- DBI::dbConnect(RSQLite::SQLite(), ":memory:") # un-pivot example d <- data.frame(AUC = 0.6, R2 = 0.2) rquery::rq_copy_to(my_db, 'd', d, overwrite = TRUE, temporary = TRUE) cT <- build_unpivot_control(nameForNewKeyColumn= 'meas', nameForNewValueColumn= 'val', columnsToTakeFrom= c('AUC', 'R2')) tab <- rowrecs_to_blocks_q('d', cT, my_db = my_db) qlook(my_db, tab) DBI::dbDisconnect(my_db) }#> table `mvtrq_88012909446674514404_0000000001` SQLiteConnection #> nrow: 2 #> 'data.frame': 2 obs. of 2 variables: #> $ meas: chr "AUC" "R2" #> $ val : num 0.6 0.2