Selecting Columns

In the previous section we saw how to select certain rows based on a set of conditions. In this section we show how to select certain columns, which we can do with select():

Syntax

dplyr::select(df, var1, var2, var3, ...)

Required arguments
- df: The tibble (data frame) with the data.
- var1: The name of the column to keep.
Optional arguments
- var2, var3, ...: The names of additional columns to keep.

Imagine we wanted to explore the relationship between Degree, Division, and Salary, and did not care about any of the other columns in the employees data set. Using select(), we could create a new data frame with only those columns:

employeesTargetCols <- select(employees, Degree, Division, Salary)
head(employeesTargetCols)

Degree	Division	Salary
High School	Operations	108804
Ph.D	Engineering	182343
Master's	Engineering	206770
High School	Sales	183407
Ph.D	Corporate	236240
Associate's	Sales	NA
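The employees data set is not reproduced here, so the following is only a minimal sketch on a hypothetical stand-in tibble called toy. Its Degree, Division, and Salary values are copied from the output above, while the Age values are invented for illustration. It shows that select() keeps only the named columns and returns them in the order they are listed in the call, not the order in which they appear in the original data:

```r
library(dplyr)  # select() lives in dplyr, which library(tidyverse) also loads

# Hypothetical stand-in for the employees data.
# Age values are made up; the other values come from the output above.
toy <- tibble(
  Degree   = c("High School", "Ph.D", "Master's"),
  Age      = c(51, 34, 49),
  Division = c("Operations", "Engineering", "Engineering"),
  Salary   = c(108804, 182343, 206770)
)

# Keep two columns, listed in a different order than in toy
toySelected <- select(toy, Salary, Degree)

names(toySelected)  # "Salary" "Degree"
```

The original toy tibble is left unchanged; select() returns a new tibble, which is why the result is assigned to a new name.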

If we want to exclude one or more columns by name, we can add a minus sign in front of each column name in the call to select():

employeesExcludedCols <- select(employees, -Age, -Retired)
head(employeesExcludedCols)

ID	Name	Gender	Rating	Degree	Start_Date	Division	Salary
6881	al-Rahimi, Tayyiba	Female	10	High School	1990-02-23	Operations	108804
2671	Lewis, Austin	Male	4	Ph.D	2007-02-23	Engineering	182343
8925	el-Jaffer, Manaal	Female	10	Master's	1991-02-23	Engineering	206770
2769	Soto, Michael	Male	10	High School	1987-02-23	Sales	183407
2658	al-Ebrahimi, Mamoon	Male	8	Ph.D	1985-02-23	Corporate	236240
1933	Medina, Brandy	Female	7	Associate's	1979-02-23	Sales	NA
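As a small sketch of the exclusion syntax, the example below uses a hypothetical tibble named toy (the Age and Retired values are invented, since those columns are not shown in the output above). It compares putting a minus sign in front of each column with negating a single c() vector of column names, which dplyr also accepts:

```r
library(dplyr)

# Hypothetical stand-in for the employees data.
# Age and Retired values are made up for illustration.
toy <- tibble(
  ID      = c(6881, 2671),
  Age     = c(51, 34),
  Retired = c(FALSE, FALSE),
  Salary  = c(108804, 182343)
)

# Drop each column with its own minus sign ...
dropSeparately <- select(toy, -Age, -Retired)

# ... or negate a grouped vector of column names
dropGrouped <- select(toy, -c(Age, Retired))

names(dropSeparately)  # "ID" "Salary"
names(dropGrouped)     # "ID" "Salary"
```

Both forms keep every column except the ones named, so the choice between them is mostly a matter of readability when many columns are excluded.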
