Selecting Columns
In the previous section we saw how to select certain rows based on a set of conditions. In this section we show how to select certain columns, which we can do with select():
Syntax
dplyr::select(df, var1, var2, var3, ...)
Required arguments
df
: The tibble (data frame) with the data.
var1
: The name of the column to keep.
Optional arguments
var2, var3, ...
: The names of additional columns to keep.
Imagine we wanted to explore the relationship between Degree, Division, and Salary, and did not care about any of the other columns in the employees data set. Using select(), we could create a new data frame with only those columns:
employeesTargetCols <- select(employees, Degree, Division, Salary)
head(employeesTargetCols)
Degree | Division | Salary |
---|---|---|
High School | Operations | 108804 |
Ph.D | Engineering | 182343 |
Master's | Engineering | 206770 |
High School | Sales | 183407 |
Ph.D | Corporate | 236240 |
Associate's | Sales | NA |
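For readers without the employees data set at hand, the same idea can be tried on a small made-up tibble. The sketch below assumes the dplyr package is installed; the `toy` data and its values are hypothetical and only loosely mirror the columns above. It also illustrates that the columns in the result appear in the order they are listed in the call, not in their original order.

```r
library(dplyr)

# Hypothetical toy data; NOT the employees data set used in the text
toy <- tibble(
  ID       = c(1, 2, 3),
  Degree   = c("Ph.D", "Master's", "High School"),
  Division = c("Sales", "Engineering", "Operations"),
  Salary   = c(100000, NA, 90000)
)

# Keep only Salary and Degree; the result follows the order of the arguments
toySelected <- select(toy, Salary, Degree)
names(toySelected)
```

All rows are kept, since select() only changes which columns are present.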
If we want to exclude column(s) by name, we can simply add a minus sign in front of the column names in the call to select():
employeesExcludedCols <- select(employees, -Age, -Retired)
head(employeesExcludedCols)
ID | Name | Gender | Rating | Degree | Start_Date | Division | Salary |
---|---|---|---|---|---|---|---|
6881 | al-Rahimi, Tayyiba | Female | 10 | High School | 1990-02-23 | Operations | 108804 |
2671 | Lewis, Austin | Male | 4 | Ph.D | 2007-02-23 | Engineering | 182343 |
8925 | el-Jaffer, Manaal | Female | 10 | Master's | 1991-02-23 | Engineering | 206770 |
2769 | Soto, Michael | Male | 10 | High School | 1987-02-23 | Sales | 183407 |
2658 | al-Ebrahimi, Mamoon | Male | 8 | Ph.D | 1985-02-23 | Corporate | 236240 |
1933 | Medina, Brandy | Female | 7 | Associate's | 1979-02-23 | Sales | NA |
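As a small, self-contained sketch of exclusion, the example below again uses a hypothetical `toy` tibble (not the employees data) and assumes dplyr is installed. It shows the minus-sign form from above next to an equivalent form that negates a vector of column names; both should leave the same columns.

```r
library(dplyr)

# Hypothetical toy data; NOT the employees data set used in the text
toy <- tibble(
  ID      = c(1, 2),
  Age     = c(30, 45),
  Retired = c(FALSE, TRUE),
  Salary  = c(100000, NA)
)

# Drop columns one at a time with a minus sign in front of each name
toyMinus <- select(toy, -Age, -Retired)

# Equivalent: negate a vector of column names in a single argument
toyVector <- select(toy, -c(Age, Retired))

names(toyMinus)
names(toyVector)
```

The vector form can be easier to read when many columns are dropped at once.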