{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"id": "e6e77b5e",
"metadata": {
"tags": [
"remove-cell"
]
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Registered S3 methods overwritten by 'ggplot2':\n",
" method from \n",
" [.quosures rlang\n",
" c.quosures rlang\n",
" print.quosures rlang\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Registered S3 method overwritten by 'rvest':\n",
" method from\n",
" read_xml.response xml2\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"-- Attaching packages --------------------------------------- tidyverse 1.2.1 --\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"v ggplot2 3.1.1 v purrr 0.3.2 \n",
"v tibble 2.1.1 v dplyr 0.8.0.1\n",
"v tidyr 0.8.3 v stringr 1.4.0 \n",
"v readr 1.3.1 v forcats 0.4.0 \n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"-- Conflicts ------------------------------------------ tidyverse_conflicts() --\n",
"x dplyr::filter() masks stats::filter()\n",
"x dplyr::lag() masks stats::lag()\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Parsed with column specification:\n",
"cols(\n",
" ID = col_double(),\n",
" Name = col_character(),\n",
" Gender = col_character(),\n",
" Age = col_double(),\n",
" Rating = col_double(),\n",
" Degree = col_character(),\n",
" Start_Date = col_character(),\n",
" Retired = col_logical(),\n",
" Division = col_character(),\n",
" Salary = col_character()\n",
")\n"
]
}
],
"source": [
"library(tidyverse)\n",
"employees <- read_csv(\"../../_build/data/employee_data.csv\")\n",
"employees$Salary <- parse_number(employees$Salary)\n",
"employees$Start_Date <- parse_date(employees$Start_Date, format = \"%m/%d/%Y\")"
]
},
{
"cell_type": "markdown",
"id": "8b76cb4b",
"metadata": {},
"source": [
"# Selecting Columns\n",
"\n",
"In the previous section we saw how to select certain *rows* based on a set of conditions. In this section we show how to select certain *columns*, which we can do with `select()`: \n",
"\n",
"```{admonition} Syntax\n",
"`tidyverse::select(df, var1, var2, var3, ...)`\n",
"+ *Required arguments*\n",
" - `df`: The tibble (data frame) with the data. \n",
" - `var1`: The name of the column to keep.\n",
"+ *Optional arguments*\n",
" - `var2, var3, ...`: The name of additional columns to keep.\n",
"```\n",
"\n",
"Imagine we wanted to explore the relationship between `Degree`, `Division`, and `Salary`, and did not care about any of the other columns in the employees data set. Using `select()`, we could create a new data frame with only those columns:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "fb98705b",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"Degree | Division | Salary |
\n",
"\n",
"\tHigh School | Operations | 108804 |
\n",
"\tPh.D | Engineering | 182343 |
\n",
"\tMaster's | Engineering | 206770 |
\n",
"\tHigh School | Sales | 183407 |
\n",
"\tPh.D | Corporate | 236240 |
\n",
"\tAssociate's | Sales | NA |
\n",
"\n",
"
\n"
],
"text/latex": [
"\\begin{tabular}{r|lll}\n",
" Degree & Division & Salary\\\\\n",
"\\hline\n",
"\t High School & Operations & 108804 \\\\\n",
"\t Ph.D & Engineering & 182343 \\\\\n",
"\t Master's & Engineering & 206770 \\\\\n",
"\t High School & Sales & 183407 \\\\\n",
"\t Ph.D & Corporate & 236240 \\\\\n",
"\t Associate's & Sales & NA \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"| Degree | Division | Salary |\n",
"|---|---|---|\n",
"| High School | Operations | 108804 |\n",
"| Ph.D | Engineering | 182343 |\n",
"| Master's | Engineering | 206770 |\n",
"| High School | Sales | 183407 |\n",
"| Ph.D | Corporate | 236240 |\n",
"| Associate's | Sales | NA |\n",
"\n"
],
"text/plain": [
" Degree Division Salary\n",
"1 High School Operations 108804\n",
"2 Ph.D Engineering 182343\n",
"3 Master's Engineering 206770\n",
"4 High School Sales 183407\n",
"5 Ph.D Corporate 236240\n",
"6 Associate's Sales NA"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"employeesTargetCols <- select(employees, Degree, Division, Salary)\n",
"head(employeesTargetCols)"
]
},
{
"cell_type": "markdown",
"id": "77e2163a",
"metadata": {},
"source": [
"If we want to exclude column(s) by name, we can simply add a minus sign in front of the column names in the call to `filter()`:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "262fb18f",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"ID | Name | Gender | Rating | Degree | Start_Date | Division | Salary |
\n",
"\n",
"\t6881 | al-Rahimi, Tayyiba | Female | 10 | High School | 1990-02-23 | Operations | 108804 |
\n",
"\t2671 | Lewis, Austin | Male | 4 | Ph.D | 2007-02-23 | Engineering | 182343 |
\n",
"\t8925 | el-Jaffer, Manaal | Female | 10 | Master's | 1991-02-23 | Engineering | 206770 |
\n",
"\t2769 | Soto, Michael | Male | 10 | High School | 1987-02-23 | Sales | 183407 |
\n",
"\t2658 | al-Ebrahimi, Mamoon | Male | 8 | Ph.D | 1985-02-23 | Corporate | 236240 |
\n",
"\t1933 | Medina, Brandy | Female | 7 | Associate's | 1979-02-23 | Sales | NA |
\n",
"\n",
"
\n"
],
"text/latex": [
"\\begin{tabular}{r|llllllll}\n",
" ID & Name & Gender & Rating & Degree & Start\\_Date & Division & Salary\\\\\n",
"\\hline\n",
"\t 6881 & al-Rahimi, Tayyiba & Female & 10 & High School & 1990-02-23 & Operations & 108804 \\\\\n",
"\t 2671 & Lewis, Austin & Male & 4 & Ph.D & 2007-02-23 & Engineering & 182343 \\\\\n",
"\t 8925 & el-Jaffer, Manaal & Female & 10 & Master's & 1991-02-23 & Engineering & 206770 \\\\\n",
"\t 2769 & Soto, Michael & Male & 10 & High School & 1987-02-23 & Sales & 183407 \\\\\n",
"\t 2658 & al-Ebrahimi, Mamoon & Male & 8 & Ph.D & 1985-02-23 & Corporate & 236240 \\\\\n",
"\t 1933 & Medina, Brandy & Female & 7 & Associate's & 1979-02-23 & Sales & NA \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"| ID | Name | Gender | Rating | Degree | Start_Date | Division | Salary |\n",
"|---|---|---|---|---|---|---|---|\n",
"| 6881 | al-Rahimi, Tayyiba | Female | 10 | High School | 1990-02-23 | Operations | 108804 |\n",
"| 2671 | Lewis, Austin | Male | 4 | Ph.D | 2007-02-23 | Engineering | 182343 |\n",
"| 8925 | el-Jaffer, Manaal | Female | 10 | Master's | 1991-02-23 | Engineering | 206770 |\n",
"| 2769 | Soto, Michael | Male | 10 | High School | 1987-02-23 | Sales | 183407 |\n",
"| 2658 | al-Ebrahimi, Mamoon | Male | 8 | Ph.D | 1985-02-23 | Corporate | 236240 |\n",
"| 1933 | Medina, Brandy | Female | 7 | Associate's | 1979-02-23 | Sales | NA |\n",
"\n"
],
"text/plain": [
" ID Name Gender Rating Degree Start_Date Division \n",
"1 6881 al-Rahimi, Tayyiba Female 10 High School 1990-02-23 Operations \n",
"2 2671 Lewis, Austin Male 4 Ph.D 2007-02-23 Engineering\n",
"3 8925 el-Jaffer, Manaal Female 10 Master's 1991-02-23 Engineering\n",
"4 2769 Soto, Michael Male 10 High School 1987-02-23 Sales \n",
"5 2658 al-Ebrahimi, Mamoon Male 8 Ph.D 1985-02-23 Corporate \n",
"6 1933 Medina, Brandy Female 7 Associate's 1979-02-23 Sales \n",
" Salary\n",
"1 108804\n",
"2 182343\n",
"3 206770\n",
"4 183407\n",
"5 236240\n",
"6 NA"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"employeesExcludedCols <- select(employees, -Age, -Retired)\n",
"head(employeesExcludedCols)"
]
}
],
"metadata": {
"jupytext": {
"cell_metadata_filter": "-all",
"formats": "md:myst",
"text_representation": {
"extension": ".md",
"format_name": "myst",
"format_version": 0.13,
"jupytext_version": "1.11.5"
}
},
"kernelspec": {
"display_name": "R",
"language": "R",
"name": "ir"
},
"language_info": {
"codemirror_mode": "r",
"file_extension": ".r",
"mimetype": "text/x-r-source",
"name": "R",
"pygments_lexer": "r",
"version": "3.6.1"
},
"source_map": [
16,
22,
39,
42,
46
]
},
"nbformat": 4,
"nbformat_minor": 5
}