{ "cells": [ { "cell_type": "code", "execution_count": 1, "id": "e6e77b5e", "metadata": { "tags": [ "remove-cell" ] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Registered S3 methods overwritten by 'ggplot2':\n", " method from \n", " [.quosures rlang\n", " c.quosures rlang\n", " print.quosures rlang\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Registered S3 method overwritten by 'rvest':\n", " method from\n", " read_xml.response xml2\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "-- Attaching packages --------------------------------------- tidyverse 1.2.1 --\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "v ggplot2 3.1.1 v purrr 0.3.2 \n", "v tibble 2.1.1 v dplyr 0.8.0.1\n", "v tidyr 0.8.3 v stringr 1.4.0 \n", "v readr 1.3.1 v forcats 0.4.0 \n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "-- Conflicts ------------------------------------------ tidyverse_conflicts() --\n", "x dplyr::filter() masks stats::filter()\n", "x dplyr::lag() masks stats::lag()\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Parsed with column specification:\n", "cols(\n", " ID = col_double(),\n", " Name = col_character(),\n", " Gender = col_character(),\n", " Age = col_double(),\n", " Rating = col_double(),\n", " Degree = col_character(),\n", " Start_Date = col_character(),\n", " Retired = col_logical(),\n", " Division = col_character(),\n", " Salary = col_character()\n", ")\n" ] } ], "source": [ "library(tidyverse)\n", "employees <- read_csv(\"../../_build/data/employee_data.csv\")\n", "employees$Salary <- parse_number(employees$Salary)\n", "employees$Start_Date <- parse_date(employees$Start_Date, format = \"%m/%d/%Y\")" ] }, { "cell_type": "markdown", "id": "8b76cb4b", "metadata": {}, "source": [ "# Selecting Columns\n", "\n", "In the previous section we saw how to select certain *rows* based on a set of conditions. In this section we show how to select certain *columns*, which we can do with `select()`: \n", "\n", "```{admonition} Syntax\n", "`tidyverse::select(df, var1, var2, var3, ...)`\n", "+ *Required arguments*\n", " - `df`: The tibble (data frame) with the data. \n", " - `var1`: The name of the column to keep.\n", "+ *Optional arguments*\n", " - `var2, var3, ...`: The name of additional columns to keep.\n", "```\n", "\n", "Imagine we wanted to explore the relationship between `Degree`, `Division`, and `Salary`, and did not care about any of the other columns in the employees data set. Using `select()`, we could create a new data frame with only those columns:" ] }, { "cell_type": "code", "execution_count": 2, "id": "fb98705b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
DegreeDivisionSalary
High SchoolOperations 108804
Ph.D Engineering182343
Master's Engineering206770
High SchoolSales 183407
Ph.D Corporate 236240
Associate'sSales NA
\n" ], "text/latex": [ "\\begin{tabular}{r|lll}\n", " Degree & Division & Salary\\\\\n", "\\hline\n", "\t High School & Operations & 108804 \\\\\n", "\t Ph.D & Engineering & 182343 \\\\\n", "\t Master's & Engineering & 206770 \\\\\n", "\t High School & Sales & 183407 \\\\\n", "\t Ph.D & Corporate & 236240 \\\\\n", "\t Associate's & Sales & NA \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "| Degree | Division | Salary |\n", "|---|---|---|\n", "| High School | Operations | 108804 |\n", "| Ph.D | Engineering | 182343 |\n", "| Master's | Engineering | 206770 |\n", "| High School | Sales | 183407 |\n", "| Ph.D | Corporate | 236240 |\n", "| Associate's | Sales | NA |\n", "\n" ], "text/plain": [ " Degree Division Salary\n", "1 High School Operations 108804\n", "2 Ph.D Engineering 182343\n", "3 Master's Engineering 206770\n", "4 High School Sales 183407\n", "5 Ph.D Corporate 236240\n", "6 Associate's Sales NA" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "employeesTargetCols <- select(employees, Degree, Division, Salary)\n", "head(employeesTargetCols)" ] }, { "cell_type": "markdown", "id": "77e2163a", "metadata": {}, "source": [ "If we want to exclude column(s) by name, we can simply add a minus sign in front of the column names in the call to `filter()`:" ] }, { "cell_type": "code", "execution_count": 3, "id": "262fb18f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
IDNameGenderRatingDegreeStart_DateDivisionSalary
6881 al-Rahimi, Tayyiba Female 10 High School 1990-02-23 Operations 108804
2671 Lewis, Austin Male 4 Ph.D 2007-02-23 Engineering 182343
8925 el-Jaffer, Manaal Female 10 Master's 1991-02-23 Engineering 206770
2769 Soto, Michael Male 10 High School 1987-02-23 Sales 183407
2658 al-Ebrahimi, MamoonMale 8 Ph.D 1985-02-23 Corporate 236240
1933 Medina, Brandy Female 7 Associate's 1979-02-23 Sales NA
\n" ], "text/latex": [ "\\begin{tabular}{r|llllllll}\n", " ID & Name & Gender & Rating & Degree & Start\\_Date & Division & Salary\\\\\n", "\\hline\n", "\t 6881 & al-Rahimi, Tayyiba & Female & 10 & High School & 1990-02-23 & Operations & 108804 \\\\\n", "\t 2671 & Lewis, Austin & Male & 4 & Ph.D & 2007-02-23 & Engineering & 182343 \\\\\n", "\t 8925 & el-Jaffer, Manaal & Female & 10 & Master's & 1991-02-23 & Engineering & 206770 \\\\\n", "\t 2769 & Soto, Michael & Male & 10 & High School & 1987-02-23 & Sales & 183407 \\\\\n", "\t 2658 & al-Ebrahimi, Mamoon & Male & 8 & Ph.D & 1985-02-23 & Corporate & 236240 \\\\\n", "\t 1933 & Medina, Brandy & Female & 7 & Associate's & 1979-02-23 & Sales & NA \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "| ID | Name | Gender | Rating | Degree | Start_Date | Division | Salary |\n", "|---|---|---|---|---|---|---|---|\n", "| 6881 | al-Rahimi, Tayyiba | Female | 10 | High School | 1990-02-23 | Operations | 108804 |\n", "| 2671 | Lewis, Austin | Male | 4 | Ph.D | 2007-02-23 | Engineering | 182343 |\n", "| 8925 | el-Jaffer, Manaal | Female | 10 | Master's | 1991-02-23 | Engineering | 206770 |\n", "| 2769 | Soto, Michael | Male | 10 | High School | 1987-02-23 | Sales | 183407 |\n", "| 2658 | al-Ebrahimi, Mamoon | Male | 8 | Ph.D | 1985-02-23 | Corporate | 236240 |\n", "| 1933 | Medina, Brandy | Female | 7 | Associate's | 1979-02-23 | Sales | NA |\n", "\n" ], "text/plain": [ " ID Name Gender Rating Degree Start_Date Division \n", "1 6881 al-Rahimi, Tayyiba Female 10 High School 1990-02-23 Operations \n", "2 2671 Lewis, Austin Male 4 Ph.D 2007-02-23 Engineering\n", "3 8925 el-Jaffer, Manaal Female 10 Master's 1991-02-23 Engineering\n", "4 2769 Soto, Michael Male 10 High School 1987-02-23 Sales \n", "5 2658 al-Ebrahimi, Mamoon Male 8 Ph.D 1985-02-23 Corporate \n", "6 1933 Medina, Brandy Female 7 Associate's 1979-02-23 Sales \n", " Salary\n", "1 108804\n", "2 182343\n", "3 206770\n", "4 183407\n", "5 236240\n", "6 NA" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "employeesExcludedCols <- select(employees, -Age, -Retired)\n", "head(employeesExcludedCols)" ] } ], "metadata": { "jupytext": { "cell_metadata_filter": "-all", "formats": "md:myst", "text_representation": { "extension": ".md", "format_name": "myst", "format_version": 0.13, "jupytext_version": "1.11.5" } }, "kernelspec": { "display_name": "R", "language": "R", "name": "ir" }, "language_info": { "codemirror_mode": "r", "file_extension": ".r", "mimetype": "text/x-r-source", "name": "R", "pygments_lexer": "r", "version": "3.6.1" }, "source_map": [ 16, 22, 39, 42, 46 ] }, "nbformat": 4, "nbformat_minor": 5 }