This is my (new)1 “New” also meaning we need the latest versions of wrapr
and seplyr
. favorite way to bulk rename columns in R
, especially when using sparklyr
to work with big data.
There are, of course, many other ways to do this.
Consider the following data:2 Source code for this note can be found here.
class(d)
## [1] "tbl_spark" "tbl_sql" "tbl_lazy" "tbl"
d %.>%
head(.) %.>%
knitr::kable(.)
name | height | mass | hair_color | skin_color | eye_color | birth_year | gender | homeworld | species |
---|---|---|---|---|---|---|---|---|---|
Luke Skywalker | 172 | 77 | blond | fair | blue | 19.0 | male | Tatooine | Human |
C-3PO | 167 | 75 | NA | gold | yellow | 112.0 | NA | Tatooine | Droid |
R2-D2 | 96 | 32 | NA | white, blue | red | 33.0 | NA | Naboo | Droid |
Darth Vader | 202 | 136 | none | white | yellow | 41.9 | male | Tatooine | Human |
Leia Organa | 150 | 49 | brown | light | brown | 19.0 | female | Alderaan | Human |
Owen Lars | 178 | 120 | brown, grey | light | blue | 52.0 | male | Tatooine | Human |
Notice this data is a remote Spark
data handle (not an in-memory data.frame
).
Suppose (for no reason at all) we want all columns with under-bars to be in upper case.
This is easy to achieve:3 People have asked, so I will discuss it (again). A lot of the other methods for renaming columns do not work with Spark
data. For example, using the current CRAN versions of rlang
(0.1.4), dplyr
(0.7.4), dbplyr
(1.1.0), sparklyr
(0.6.4): both rlang::set_names(d, toupper)
, and dplyr::rename(d, HAIR_COLOR=hair_color, SKIN_COLOR=skin_color)
throw when given Spark
data handles. This is a topic of some discussion: 1, 2, 3, 4, and 5.
library("seplyr")
# requires version at least 1.0.1
packageVersion("wrapr")
## [1] '1.0.1'
# requires version at least 0.1.6
packageVersion("seplyr")
## [1] '0.1.6'
# find columns matching our condition
colsToRename <- grepdf(".*_.*", d)
print(colsToRename)
## [1] "hair_color" "skin_color" "eye_color" "birth_year"
# build the right to left (NEW := old) assignment
mapping <- toupper(colsToRename) := colsToRename
print(mapping)
## HAIR_COLOR SKIN_COLOR EYE_COLOR BIRTH_YEAR
## "hair_color" "skin_color" "eye_color" "birth_year"
# apply the transform
d %.>%
rename_se(., mapping) %.>%
head(.) %.>%
knitr::kable(.)
name | height | mass | HAIR_COLOR | SKIN_COLOR | EYE_COLOR | BIRTH_YEAR | gender | homeworld | species |
---|---|---|---|---|---|---|---|---|---|
Luke Skywalker | 172 | 77 | blond | fair | blue | 19.0 | male | Tatooine | Human |
C-3PO | 167 | 75 | NA | gold | yellow | 112.0 | NA | Tatooine | Droid |
R2-D2 | 96 | 32 | NA | white, blue | red | 33.0 | NA | Naboo | Droid |
Darth Vader | 202 | 136 | none | white | yellow | 41.9 | male | Tatooine | Human |
Leia Organa | 150 | 49 | brown | light | brown | 19.0 | female | Alderaan | Human |
Owen Lars | 178 | 120 | brown, grey | light | blue | 52.0 | male | Tatooine | Human |