In [1]:
library(tidyverse)
employees <- read_csv("../../_build/data/employee_data.csv")
employees$Salary <- parse_number(employees$Salary)
employees$Start_Date <- parse_date(employees$Start_Date, format = "%m/%d/%Y")

-- Attaching packages --------------------------------------- tidyverse 1.2.1 --


v ggplot2 3.1.1       v purrr   0.3.2  
v tibble  2.1.1       v dplyr   0.8.0.1
v tidyr   0.8.3       v stringr 1.4.0  
v readr   1.3.1       v forcats 0.4.0  


-- Conflicts ------------------------------------------ tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()


Parsed with column specification:
cols(
  ID = col_double(),
  Name = col_character(),
  Gender = col_character(),
  Age = col_double(),
  Rating = col_double(),
  Degree = col_character(),
  Start_Date = col_character(),
  Retired = col_logical(),
  Division = col_character(),
  Salary = col_character()
)


# Sorting Data

You will often want to sort a data set by one or more of its columns. The `arrange()` function from the `dplyr` package (loaded as part of the tidyverse) does this, using the following syntax:

```{admonition} Syntax
`dplyr::arrange(df, var1, var2, var3, ...)`
+ *Required arguments*
  - `df`: The tibble (data frame) with the data you would like to sort. 
  - `var1`: The name of the column to use to sort the data.
+ *Optional arguments*
  - `var2, var3, ...`: The names of additional columns to use to sort the data. When multiple columns are specified, each additional column is used to break ties in the preceding columns.
```
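
As a minimal sketch of this syntax, the example below sorts a small made-up tibble. The `toy` data and its column names are hypothetical and are not part of the employee data set; the only assumption is that the `dplyr` package is installed.

```r
library(dplyr)

# Hypothetical toy data, made up for illustration only
toy <- tibble(
  team  = c("B", "A", "B", "A"),
  score = c(10, 30, 20, 30),
  id    = c(4, 3, 2, 1)
)

# Sort by a single column: two rows tie on score = 30
by_score <- arrange(toy, score)

# Add a second column to break the tie: among the rows with
# score = 30, the row with the smaller id now comes first
by_score_id <- arrange(toy, score, id)

by_score_id$id
# 4 2 1 3
```

Here `id` only affects the order of rows that share the same `score`; it never overrides the ordering already set by `score`.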

By default, `arrange()` sorts `numeric` variables from smallest to largest and `character` variables alphabetically. You can reverse the order of the sort by surrounding the column name with `desc()` in the function call.
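
The sketch below illustrates the default and reversed orderings on a small made-up tibble (the `toy` data is hypothetical). It also shows one detail worth knowing: `arrange()` places missing values (`NA`) at the end of the result, even when the column is wrapped in `desc()`.

```r
library(dplyr)

# Hypothetical toy data with one missing price
toy <- tibble(
  item  = c("a", "b", "c", "d"),
  price = c(5, NA, 12, 8)
)

# Ascending (default): 5, 8, 12, then the missing value
asc <- arrange(toy, price)

# Descending: 12, 8, 5, and the missing value still comes last
dsc <- arrange(toy, desc(price))

asc$item
# "a" "d" "c" "b"
dsc$item
# "c" "d" "a" "b"
```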

First, let's create a new version of the data frame called `employeesSortedAge`, with the employees sorted from youngest to oldest.

In [2]:
employeesSortedAge <- arrange(employees, Age)
head(employeesSortedAge)

ID,Name,Gender,Age,Rating,Degree,Start_Date,Retired,Division,Salary
7068,"Dimas, Roman",Male,25,8,High School,2017-02-23,False,Operations,84252
5464,"al-Pirani, Rajab",Male,25,3,Associate's,2016-02-23,False,Operations,37907
7910,"Hopper, Summer",Female,25,7,Bachelor's,2017-02-23,False,Engineering,100688
6784,"al-Siddique, Zaitoona",Female,25,4,Master's,2015-02-23,False,Human Resources,127618
3240,"Steggall, Shai",Female,25,7,Master's,2017-02-23,False,Operations,117062
1413,"Tanner, Sean",Male,25,2,Associate's,2016-02-23,False,Operations,61869


In [3]:
tail(employeesSortedAge)

ID,Name,Gender,Age,Rating,Degree,Start_Date,Retired,Division,Salary
6798,"Werkele, Jakob",Male,65,7,Ph.D,1976-02-23,True,Engineering,
6291,"Anderson, Collyn",Male,65,6,High School,1977-02-23,False,Operations,179634.0
8481,"Phillips, Jasmyn",Female,65,5,High School,1975-02-23,True,Sales,
4600,"Olivas, Julian",Male,65,2,Ph.D,1976-02-23,False,Engineering,204576.0
6777,"Mortimer, Kendall",Female,65,7,Master's,1977-02-23,False,Corporate,248925.0
2924,"Mills, Tasia",Female,65,8,High School,1977-02-23,False,Operations,138212.0


We can instead sort the data from oldest to youngest by adding `desc()` around `Age`:

In [4]:
employeesSortedAgeDesc <- arrange(employees, desc(Age))
head(employeesSortedAgeDesc)

ID,Name,Gender,Age,Rating,Degree,Start_Date,Retired,Division,Salary
8060,"al-Morad, Mastoor",Male,65,8,Ph.D,1977-02-23,False,Corporate,213381.0
9545,"Lloyd, Devante",Male,65,9,Bachelor's,1974-02-23,False,Accounting,243326.0
7305,"Law, Charisma",Female,65,8,Associate's,1976-02-23,False,Human Resources,214788.0
4141,"Herrera, Yarabbi",Female,65,8,High School,1975-02-23,False,Operations,143728.0
2559,"Holiday, Emma",Female,65,7,Bachelor's,1975-02-23,True,Operations,
4407,"Ross, Caitlyn",Female,65,7,Bachelor's,1975-02-23,True,Corporate,


In [5]:
tail(employeesSortedAgeDesc)

ID,Name,Gender,Age,Rating,Degree,Start_Date,Retired,Division,Salary
1413,"Tanner, Sean",Male,25,2,Associate's,2016-02-23,False,Operations,61869
8324,"Bancroft, Isaiah",Male,25,7,Master's,2017-02-23,False,Corporate,135935
1230,"Kirgis, Arissa",Female,25,8,Bachelor's,2015-02-23,False,Operations,113573
6308,"Barnett, Marquise",Male,25,8,Master's,2016-02-23,False,Operations,103798
3241,"Byrd, Sydny",Female,25,6,Ph.D,2016-02-23,False,Engineering,126366
9249,"Lopez, Karissa",Female,25,8,Associate's,2016-02-23,False,Sales,75689


Now suppose we want a multi-level sort: first sort the employees from oldest to youngest, and then, within each age, sort the names alphabetically. We can do this by adding the `Name` column to our function call:

In [6]:
employeesSortedAgeDescName <- arrange(employees, desc(Age), Name)
head(employeesSortedAgeDescName)

ID,Name,Gender,Age,Rating,Degree,Start_Date,Retired,Division,Salary
8060,"al-Morad, Mastoor",Male,65,8,Ph.D,1977-02-23,False,Corporate,213381.0
6291,"Anderson, Collyn",Male,65,6,High School,1977-02-23,False,Operations,179634.0
3661,"el-Meskin, Asad",Male,65,9,Bachelor's,1977-02-23,False,Engineering,177504.0
5245,"Gowen, Hannah",Female,65,7,Bachelor's,1975-02-23,False,Accounting,191765.0
4141,"Herrera, Yarabbi",Female,65,8,High School,1975-02-23,False,Operations,143728.0
2559,"Holiday, Emma",Female,65,7,Bachelor's,1975-02-23,True,Operations,


In [7]:
tail(employeesSortedAgeDescName)

ID,Name,Gender,Age,Rating,Degree,Start_Date,Retired,Division,Salary
7068,"Dimas, Roman",Male,25,8,High School,2017-02-23,False,Operations,84252
7910,"Hopper, Summer",Female,25,7,Bachelor's,2017-02-23,False,Engineering,100688
1230,"Kirgis, Arissa",Female,25,8,Bachelor's,2015-02-23,False,Operations,113573
9249,"Lopez, Karissa",Female,25,8,Associate's,2016-02-23,False,Sales,75689
3240,"Steggall, Shai",Female,25,7,Master's,2017-02-23,False,Operations,117062
1413,"Tanner, Sean",Male,25,2,Associate's,2016-02-23,False,Operations,61869
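
Note that the outputs above place names beginning with a lowercase letter, such as "al-Morad", among the uppercase names. That ordering depends on how your version of `dplyr` compares character values. As far as we are aware, newer releases (1.1.0 and later) sort character columns in the C locale by default, which puts all uppercase letters before lowercase ones. One way to get a case-insensitive order regardless of version is to sort on a lower-cased copy of the column. The sketch below uses a few made-up rows (the `toy` data is hypothetical):

```r
library(dplyr)

# Hypothetical names, chosen to mix upper- and lowercase initials
toy <- tibble(name = c("Gowen", "al-Morad", "Anderson", "el-Meskin"))

# arrange() accepts expressions, so we can sort on tolower(name)
# while leaving the name column itself unchanged
ci <- arrange(toy, tolower(name))

ci$name
# "al-Morad" "Anderson" "el-Meskin" "Gowen"
```

Recent `dplyr` versions also document a `.locale` argument to `arrange()` for choosing the collation explicitly; check the documentation for your installed version before relying on it.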
