Selecting Columns

In the previous section we saw how to select certain rows based on a set of conditions. In this section we show how to select certain columns, which we can do with select():

Syntax

dplyr::select(df, var1, var2, var3, ...)

  • Required arguments

    • df: The tibble (data frame) with the data.

    • var1: The name of the column to keep.

  • Optional arguments

    • var2, var3, ...: The names of additional columns to keep.

Imagine we wanted to explore the relationship between Degree, Division, and Salary, and did not care about any of the other columns in the employees data set. Using select(), we could create a new data frame with only those columns:

employeesTargetCols <- select(employees, Degree, Division, Salary)
head(employeesTargetCols)
Degree       Division     Salary
High School  Operations   108804
Ph.D         Engineering  182343
Master's     Engineering  206770
High School  Sales        183407
Ph.D         Corporate    236240
Associate's  Sales            NA
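
As a self-contained sketch of the same idea, the snippet below builds a small stand-in for the employees data and selects columns from it. The name employeesToy and the choice of three rows are assumptions made for illustration; the values are copied from rows shown in this section. It also illustrates two behaviors of select() that are easy to miss: columns are returned in the order they are listed, and names held in a character vector can be passed with all_of().

```r
library(dplyr)

# Toy stand-in for `employees` (hypothetical object; values copied from
# rows shown in this section, limited to a few columns).
employeesToy <- tibble(
  ID       = c(6881, 2671, 1933),
  Name     = c("al-Rahimi, Tayyiba", "Lewis, Austin", "Medina, Brandy"),
  Degree   = c("High School", "Ph.D", "Associate's"),
  Division = c("Operations", "Engineering", "Sales"),
  Salary   = c(108804, 182343, NA)
)

# Columns come back in the order they are listed in the call,
# not the order they have in the original data frame.
toyTargetCols <- select(employeesToy, Salary, Degree, Division)
names(toyTargetCols)

# Column names stored in a character vector can be passed with all_of().
colsToKeep <- c("Degree", "Division", "Salary")
toyFromVector <- select(employeesToy, all_of(colsToKeep))
names(toyFromVector)
```

Note that select() never changes the number of rows; it only changes which columns are kept.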

If we want to exclude columns by name, we can add a minus sign in front of each column name in the call to select():

employeesExcludedCols <- select(employees, -Age, -Retired)
head(employeesExcludedCols)
ID    Name                 Gender  Rating  Degree       Start_Date  Division     Salary
6881  al-Rahimi, Tayyiba   Female      10  High School  1990-02-23  Operations   108804
2671  Lewis, Austin        Male         4  Ph.D         2007-02-23  Engineering  182343
8925  el-Jaffer, Manaal    Female      10  Master's     1991-02-23  Engineering  206770
2769  Soto, Michael        Male        10  High School  1987-02-23  Sales        183407
2658  al-Ebrahimi, Mamoon  Male         8  Ph.D         1985-02-23  Corporate    236240
1933  Medina, Brandy       Female       7  Associate's  1979-02-23  Sales            NA
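
The exclusion can be written in more than one way. The sketch below compares three spellings on a small stand-in data frame. The object employeesToy is hypothetical, and its Age and Retired values are invented for illustration, since those columns are not displayed in this section.

```r
library(dplyr)

# Toy stand-in for `employees` (hypothetical object).
employeesToy <- tibble(
  ID      = c(6881, 2671),
  Name    = c("al-Rahimi, Tayyiba", "Lewis, Austin"),
  Age     = c(51, 34),        # invented values, for illustration only
  Retired = c(FALSE, FALSE),  # invented values, for illustration only
  Salary  = c(108804, 182343)
)

# Three spellings of the same exclusion:
dropEach    <- select(employeesToy, -Age, -Retired)    # minus sign on each name
dropVector  <- select(employeesToy, -c(Age, Retired))  # minus sign on a group
dropNegated <- select(employeesToy, !c(Age, Retired))  # negation (dplyr 1.0.0 or later)

names(dropEach)  # "ID" "Name" "Salary"
```

All three calls keep every column except Age and Retired. Which spelling to use is mostly a matter of taste.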