Map a set of columns to rows (query based, take name of table).

Transform data facts from columns into additional rows using SQL and controlTable.

rowrecs_to_blocks_q(
  wideTable,
  controlTable,
  my_db,
  ...,
  columnsToCopy = NULL,
  tempNameGenerator = mk_tmp_name_source("mvtrq"),
  strict = FALSE,
  controlTableKeys = colnames(controlTable)[[1]],
  checkNames = TRUE,
  checkKeys = FALSE,
  showQuery = FALSE,
  defaultValue = NULL,
  temporary = FALSE,
  resultName = NULL,
  incoming_qualifiers = NULL,
  outgoing_qualifiers = NULL,
  temp_qualifiers = NULL,
  executeQuery = TRUE
)

Arguments

wideTable	name of table containing data to be mapped (db/Spark data)
controlTable	table specifying mapping (local data frame)
my_db	db handle
...	force later arguments to be by name.
columnsToCopy	character array of column names to copy
tempNameGenerator	a tempNameGenerator from cdata::mk_tmp_name_source()
strict	logical, if TRUE check control table name forms
controlTableKeys	character, which column names of the control table are considered to be keys.
checkNames	logical, if TRUE check names
checkKeys	logical, if TRUE check wideTable keys
showQuery	if TRUE print query
defaultValue	if not NULL literal to use for non-match values.
temporary	logical, if TRUE make result temporary.
resultName	character, name for result table.
incoming_qualifiers	optional named ordered vector of strings carrying additional db hierarchy terms, such as schema.
outgoing_qualifiers	optional named ordered vector of strings carrying additional db hierarchy terms, such as schema.
temp_qualifiers	optional named ordered vector of strings carrying additional db hierarchy terms, such as schema.
executeQuery	logical, if TRUE execute the query and return result.

Value

long table built by mapping wideTable to one row per group (or query if executeQuery is FALSE)

Details

This is using the theory of "fluid data"n (https://github.com/WinVector/cdata), which includes the principle that each data cell has coordinates independent of the storage details and storage detail dependent coordinates (usually row-id, column-id, and group-id) can be re-derived at will (the other principle is that there may not be "one true preferred data shape" and many re-shapings of data may be needed to match data to different algorithms and methods).

The controlTable defines the names of each data element in the two notations: the notation of the tall table (which is row oriented) and the notation of the wide table (which is column oriented). controlTable[ , 1] (the group label) cross colnames(controlTable) (the column labels) are names of data cells in the long form. controlTable[ , 2:ncol(controlTable)] (column labels) are names of data cells in the wide form. To get behavior similar to tidyr::gather/spread one builds the control table by running an appropriate query over the data.

Some discussion and examples can be found here: https://winvector.github.io/FluidData/FluidData.html and here https://github.com/WinVector/cdata.

Examples


if (requireNamespace("DBI", quietly = TRUE) &&
  requireNamespace("RSQLite", quietly = TRUE)) {
  my_db <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

  # un-pivot example
  d <- data.frame(AUC = 0.6, R2 = 0.2)
  rquery::rq_copy_to(my_db,
                    'd',
                    d,
                    overwrite = TRUE,
                    temporary = TRUE)
  cT <- build_unpivot_control(nameForNewKeyColumn= 'meas',
                              nameForNewValueColumn= 'val',
                              columnsToTakeFrom= c('AUC', 'R2'))
  tab <- rowrecs_to_blocks_q('d', cT, my_db = my_db)
  qlook(my_db, tab)
  DBI::dbDisconnect(my_db)
}
#> table `mvtrq_88012909446674514404_0000000001` SQLiteConnection 
#>  nrow: 2 
#> 'data.frame':	2 obs. of  2 variables:
#>  $ meas: chr  "AUC" "R2"
#>  $ val : num  0.6 0.2

Map a set of columns to rows (query based, take name of table).

Arguments

Value

Details

See also

Examples