Selecting Columns
In the previous section we saw how to select certain rows based on a set of conditions. In this section we show how to select certain columns, which we can do with select():
Syntax
dplyr::select(df, var1, var2, var3, ...)
Required arguments
df: The tibble (data frame) with the data.
var1: The name of the column to keep.
Optional arguments
var2, var3, ...: The names of additional columns to keep.
Imagine we wanted to explore the relationship between Degree, Division, and Salary, and did not care about any of the other columns in the employees data set. Using select(), we could create a new data frame with only those columns:
employeesTargetCols <- select(employees, Degree, Division, Salary)
head(employeesTargetCols)
| Degree | Division | Salary |
|---|---|---|
| High School | Operations | 108804 |
| Ph.D | Engineering | 182343 |
| Master's | Engineering | 206770 |
| High School | Sales | 183407 |
| Ph.D | Corporate | 236240 |
| Associate's | Sales | NA |
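The same pattern can be tried on a small, self-contained example. The sketch below assumes the dplyr package is installed; the `toy` tibble and its values are hypothetical stand-ins for the employees data, not part of the original data set. It also illustrates that the columns in the result follow the order in which they are listed in the call to select():

```r
library(dplyr)

# Hypothetical toy data standing in for the employees data set
toy <- tibble(
  ID       = c(1, 2, 3),
  Degree   = c("Ph.D", "Master's", "High School"),
  Division = c("Engineering", "Sales", "Operations"),
  Salary   = c(182343, 206770, NA)
)

# Keep only three columns; the order of the arguments
# determines the order of the columns in the result
toyTargetCols <- select(toy, Salary, Degree, Division)

# Inspect which columns were kept
names(toyTargetCols)
```

Note that select() only changes which columns are present; all rows, including those with missing values such as the NA in Salary, are kept.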
If we want to exclude column(s) by name, we can simply add a minus sign in front of the column names in the call to select():
employeesExcludedCols <- select(employees, -Age, -Retired)
head(employeesExcludedCols)
| ID | Name | Gender | Rating | Degree | Start_Date | Division | Salary |
|---|---|---|---|---|---|---|---|
| 6881 | al-Rahimi, Tayyiba | Female | 10 | High School | 1990-02-23 | Operations | 108804 |
| 2671 | Lewis, Austin | Male | 4 | Ph.D | 2007-02-23 | Engineering | 182343 |
| 8925 | el-Jaffer, Manaal | Female | 10 | Master's | 1991-02-23 | Engineering | 206770 |
| 2769 | Soto, Michael | Male | 10 | High School | 1987-02-23 | Sales | 183407 |
| 2658 | al-Ebrahimi, Mamoon | Male | 8 | Ph.D | 1985-02-23 | Corporate | 236240 |
| 1933 | Medina, Brandy | Female | 7 | Associate's | 1979-02-23 | Sales | NA |
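As a minimal sketch of the exclusion syntax, the example below uses a hypothetical `toy2` tibble (not the employees data) and assumes dplyr is installed. It shows that the excluded columns can also be grouped inside c() with a single minus sign, which gives the same result as negating each name separately:

```r
library(dplyr)

# Hypothetical toy data with two columns we want to drop
toy2 <- tibble(
  ID      = c(1, 2, 3),
  Age     = c(54, 38, 61),
  Retired = c(FALSE, FALSE, TRUE),
  Salary  = c(108804, 182343, NA)
)

# Exclude columns one at a time with a minus sign in front of each name
excludedA <- select(toy2, -Age, -Retired)

# Equivalent: group the excluded columns in c() and negate once
excludedB <- select(toy2, -c(Age, Retired))

# Both approaches keep the same remaining columns
identical(names(excludedA), names(excludedB))
```

Which form to use is largely a matter of taste; the grouped form can be easier to read when many columns are dropped.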