Sorting Data
You will often want to sort your data based on one or more of the columns in your data set. This can be done using the arrange() function, which uses the following syntax:
Syntax
dplyr::arrange(df, var1, var2, var3, ...)
Required arguments
df
: The tibble (data frame) with the data you would like to sort.
var1
: The name of the column to use to sort the data.
Optional arguments
var2, var3, ...
: The names of additional columns to use to sort the data. When multiple columns are specified, each additional column is used to break ties in the preceding columns.
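As a minimal sketch of the tie-breaking behavior described above, the example below sorts a small made-up tibble. The staff tibble and its values are hypothetical and are not part of the employees data used in this section; the example assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical toy data, not the employees data set used in this section
staff <- tibble(
  Name = c("Cruz", "Abbot", "Baker", "Diaz"),
  Age  = c(40, 31, 40, 31)
)

# Sort by Age first; Name only decides the order of rows with the same Age
staffSorted <- arrange(staff, Age, Name)
staffSorted$Name
```

Both 31-year-olds come before both 40-year-olds, and within each age the names are in alphabetical order.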
By default, arrange() sorts numeric variables from smallest to largest and character variables alphabetically. You can reverse the order of the sort by surrounding the column name with desc() in the function call.
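The default and reversed orders can be compared side by side on a small example. The scores tibble below is hypothetical and assumes dplyr is installed. It also includes one missing value to illustrate that, for ordinary in-memory data frames, arrange() places NA at the end whether or not desc() is used.

```r
library(dplyr)

# Hypothetical toy data with one missing score
scores <- tibble(
  Player = c("Ann", "Bob", "Cy", "Dee"),
  Score  = c(12, NA, 30, 7)
)

# Numeric column: smallest to largest by default, reversed with desc()
ascending  <- arrange(scores, Score)
descending <- arrange(scores, desc(Score))

# Character column: desc() reverses the alphabetical order
byNameDesc <- arrange(scores, desc(Player))

ascending$Score
descending$Score
byNameDesc$Player
```

This matters for columns such as Salary in the employees data, where retired employees have NA values.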
First, let’s create a new version of the data frame called employeesSortedAge, with the employees sorted from youngest to oldest.
employeesSortedAge <- arrange(employees, Age)
head(employeesSortedAge)
ID | Name | Gender | Age | Rating | Degree | Start_Date | Retired | Division | Salary |
---|---|---|---|---|---|---|---|---|---|
7068 | Dimas, Roman | Male | 25 | 8 | High School | 2017-02-23 | FALSE | Operations | 84252 |
5464 | al-Pirani, Rajab | Male | 25 | 3 | Associate's | 2016-02-23 | FALSE | Operations | 37907 |
7910 | Hopper, Summer | Female | 25 | 7 | Bachelor's | 2017-02-23 | FALSE | Engineering | 100688 |
6784 | al-Siddique, Zaitoona | Female | 25 | 4 | Master's | 2015-02-23 | FALSE | Human Resources | 127618 |
3240 | Steggall, Shai | Female | 25 | 7 | Master's | 2017-02-23 | FALSE | Operations | 117062 |
1413 | Tanner, Sean | Male | 25 | 2 | Associate's | 2016-02-23 | FALSE | Operations | 61869 |
tail(employeesSortedAge)
ID | Name | Gender | Age | Rating | Degree | Start_Date | Retired | Division | Salary |
---|---|---|---|---|---|---|---|---|---|
6798 | Werkele, Jakob | Male | 65 | 7 | Ph.D | 1976-02-23 | TRUE | Engineering | NA |
6291 | Anderson, Collyn | Male | 65 | 6 | High School | 1977-02-23 | FALSE | Operations | 179634 |
8481 | Phillips, Jasmyn | Female | 65 | 5 | High School | 1975-02-23 | TRUE | Sales | NA |
4600 | Olivas, Julian | Male | 65 | 2 | Ph.D | 1976-02-23 | FALSE | Engineering | 204576 |
6777 | Mortimer, Kendall | Female | 65 | 7 | Master's | 1977-02-23 | FALSE | Corporate | 248925 |
2924 | Mills, Tasia | Female | 65 | 8 | High School | 1977-02-23 | FALSE | Operations | 138212 |
We can instead sort the data from oldest to youngest by adding desc() around Age:
employeesSortedAgeDesc <- arrange(employees, desc(Age))
head(employeesSortedAgeDesc)
ID | Name | Gender | Age | Rating | Degree | Start_Date | Retired | Division | Salary |
---|---|---|---|---|---|---|---|---|---|
8060 | al-Morad, Mastoor | Male | 65 | 8 | Ph.D | 1977-02-23 | FALSE | Corporate | 213381 |
9545 | Lloyd, Devante | Male | 65 | 9 | Bachelor's | 1974-02-23 | FALSE | Accounting | 243326 |
7305 | Law, Charisma | Female | 65 | 8 | Associate's | 1976-02-23 | FALSE | Human Resources | 214788 |
4141 | Herrera, Yarabbi | Female | 65 | 8 | High School | 1975-02-23 | FALSE | Operations | 143728 |
2559 | Holiday, Emma | Female | 65 | 7 | Bachelor's | 1975-02-23 | TRUE | Operations | NA |
4407 | Ross, Caitlyn | Female | 65 | 7 | Bachelor's | 1975-02-23 | TRUE | Corporate | NA |
tail(employeesSortedAgeDesc)
ID | Name | Gender | Age | Rating | Degree | Start_Date | Retired | Division | Salary |
---|---|---|---|---|---|---|---|---|---|
1413 | Tanner, Sean | Male | 25 | 2 | Associate's | 2016-02-23 | FALSE | Operations | 61869 |
8324 | Bancroft, Isaiah | Male | 25 | 7 | Master's | 2017-02-23 | FALSE | Corporate | 135935 |
1230 | Kirgis, Arissa | Female | 25 | 8 | Bachelor's | 2015-02-23 | FALSE | Operations | 113573 |
6308 | Barnett, Marquise | Male | 25 | 8 | Master's | 2016-02-23 | FALSE | Operations | 103798 |
3241 | Byrd, Sydny | Female | 25 | 6 | Ph.D | 2016-02-23 | FALSE | Engineering | 126366 |
9249 | Lopez, Karissa | Female | 25 | 8 | Associate's | 2016-02-23 | FALSE | Sales | 75689 |
Now imagine that we want to perform a multi-level sort, where we first sort the employees from oldest to youngest and then, within each age, sort the names alphabetically. We can do this by adding the Name column to our function call:
employeesSortedAgeDescName <- arrange(employees, desc(Age), Name)
head(employeesSortedAgeDescName)
ID | Name | Gender | Age | Rating | Degree | Start_Date | Retired | Division | Salary |
---|---|---|---|---|---|---|---|---|---|
8060 | al-Morad, Mastoor | Male | 65 | 8 | Ph.D | 1977-02-23 | FALSE | Corporate | 213381 |
6291 | Anderson, Collyn | Male | 65 | 6 | High School | 1977-02-23 | FALSE | Operations | 179634 |
3661 | el-Meskin, Asad | Male | 65 | 9 | Bachelor's | 1977-02-23 | FALSE | Engineering | 177504 |
5245 | Gowen, Hannah | Female | 65 | 7 | Bachelor's | 1975-02-23 | FALSE | Accounting | 191765 |
4141 | Herrera, Yarabbi | Female | 65 | 8 | High School | 1975-02-23 | FALSE | Operations | 143728 |
2559 | Holiday, Emma | Female | 65 | 7 | Bachelor's | 1975-02-23 | TRUE | Operations | NA |
tail(employeesSortedAgeDescName)
ID | Name | Gender | Age | Rating | Degree | Start_Date | Retired | Division | Salary |
---|---|---|---|---|---|---|---|---|---|
7068 | Dimas, Roman | Male | 25 | 8 | High School | 2017-02-23 | FALSE | Operations | 84252 |
7910 | Hopper, Summer | Female | 25 | 7 | Bachelor's | 2017-02-23 | FALSE | Engineering | 100688 |
1230 | Kirgis, Arissa | Female | 25 | 8 | Bachelor's | 2015-02-23 | FALSE | Operations | 113573 |
9249 | Lopez, Karissa | Female | 25 | 8 | Associate's | 2016-02-23 | FALSE | Sales | 75689 |
3240 | Steggall, Shai | Female | 25 | 7 | Master's | 2017-02-23 | FALSE | Operations | 117062 |
1413 | Tanner, Sean | Male | 25 | 2 | Associate's | 2016-02-23 | FALSE | Operations | 61869 |
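In the output above, names that begin with a lowercase letter (such as al-Morad) are interleaved with capitalized names. How upper- and lowercase letters are ordered can depend on the installed dplyr version and locale settings, so this may not hold on every setup. One way to make the tie-breaking ignore letter case, sketched here on hypothetical rows modeled on the names above, is to sort on a lowercased copy of the column:

```r
library(dplyr)

# Hypothetical rows modeled on names from the output above
people <- tibble(
  Name = c("Herrera, Yarabbi", "al-Morad, Mastoor",
           "Anderson, Collyn", "Tanner, Sean"),
  Age  = c(65, 65, 65, 25)
)

# Oldest first; ties are broken by Name compared in lowercase,
# so "al-Morad" and "Anderson" are ordered without regard to case
peopleSorted <- arrange(people, desc(Age), tolower(Name))
peopleSorted$Name
```

The Name column itself is left unchanged; tolower() is only used to compute the sort key.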