Functions¶
A fundamental part of R programming is the use of functions, which are very similar to functions in Excel. For example, you may be familiar with the Excel function SUMIF(), which sums the values in a range of cells that meet a given criterion. Let’s return to the small data set from Fig. 3:
Imagine that we wanted to calculate the total amount spent on only those employees in the Operations department. We can accomplish this with SUMIF(), which uses the following syntax:
Syntax
SUMIF(range, criteria, sum_range)
Required arguments
range
: The range of cells where the criteria should be evaluated.
criteria
: The criteria to apply to the cells in range.
Optional arguments
sum_range
: The cells to sum based on the criteria.
To apply this function to the data shown in the figure, we would write =SUMIF(B2:B7, "Operations", A2:A7), which would evaluate to 302,000. Feel free to verify this in Excel yourself.
Note that every time we use the SUMIF() function, we must specify the range and criteria arguments. The sum_range argument is optional, and we only need to use it if we want to sum a different set of cells than those specified in the range argument.
Whenever we introduce a new R function, we will follow the same convention shown above to demonstrate the syntax of the function. The basic syntax of the function will be shown in a light blue box marked “Syntax”, and any required and optional arguments will be described within the box. Additionally, whenever you are learning a new function, we encourage you to search Google for examples, as well as review the official documentation for the function at rdocumentation.org.
Tip
You can look up the syntax for a function on rdocumentation.org, and/or search Google for helpful examples.
Now let’s learn an actual R function. One commonly used function is length(), which (as you might guess) determines the length of an R object. This function uses the following syntax:
Syntax
length(x)
Required arguments
x
: An R object.
We can apply this function to the atomic vectors we created in the previous section to determine how many values each one contains:
length(v4)
[1] 4
length(v5)
[1] 3
length(v6)
[1] 2
length(v7)
[1] 4
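As a further sketch, the vectors below are hypothetical (they are not the v4 to v7 objects from the previous section, and their names are made up for illustration). They suggest that length() counts the elements of a vector, including missing values, rather than the characters inside each element:

```r
# Hypothetical vectors, used only for illustration
ages <- c(31, 45, 28)
departments <- c("Operations", "Sales")
with_missing <- c(5, NA, 7)

length(ages)          # 3 numeric values
length(departments)   # 2 strings; individual letters are not counted
length(with_missing)  # 3, because the NA still occupies a position
```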
Additionally, there is a sum() function we can use to add up all the values of an atomic vector. This function uses the following syntax:
Syntax
sum(x, na.rm=FALSE)
Required arguments
x
: An R object.
Optional arguments
na.rm
: If TRUE, the function will remove any missing values (NAs) in the atomic vector and sum the non-missing values. If FALSE, the function does not remove NAs and will return a value of NA if there is an NA in the atomic vector.
Note that this will not work for v6, because there is no meaningful way to sum characters together; calling sum(v6) produces an error. If we apply it to v5, it will treat the TRUE values like 1 and the FALSE values like 0.
sum(v4)
[1] 14
sum(v5)
[1] 2
sum(v7)
[1] NA
What happened with v7? Recall that this vector contains a missing value (NA). If you review the syntax for the sum() function, you’ll see that it includes an optional argument na.rm, which determines how the function treats missing values. If this argument is set to TRUE, the function ignores missing values and calculates the sum of the non-missing values. If it is set to FALSE, the function will return NA if even a single missing value is present.
However, in our call to the sum() function, we didn’t include this argument at all, so what is going on? Optional arguments often have a default value that is used if you do not specify a value explicitly. In this case, the default value is FALSE, so if we do not specify na.rm=TRUE in our call to sum(), the missing values will not be ignored and the function will return NA.
If we re-run the function but add na.rm = TRUE, we get the expected result:
sum(v7, na.rm = TRUE)
[1] 27
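The behavior described above can be sketched on a pair of hypothetical vectors (costs and flags are illustrative names, not objects from the previous section). The sketch compares the default call with explicit values of na.rm, and also shows how logical and character values are handled:

```r
# Hypothetical vectors, used only for illustration
costs <- c(12, NA, 8, 5)
flags <- c(TRUE, FALSE, TRUE)

sum(costs)                 # NA: the default na.rm = FALSE is used
sum(costs, na.rm = FALSE)  # NA: same as the default, written explicitly
sum(costs, na.rm = TRUE)   # 25: the NA is dropped, leaving 12 + 8 + 5
sum(flags)                 # 2: each TRUE counts as 1, each FALSE as 0

# Summing character values is an error. tryCatch() captures it here
# so that the rest of the script keeps running.
char_result <- tryCatch(sum(c("a", "b")), error = function(e) "error")
char_result                # "error"
```

Writing na.rm = FALSE explicitly gives the same result as leaving the argument out, which is what it means for FALSE to be the default.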
Similar to sum(), there are many other functions that can be applied to atomic vectors:
Syntax
mean(x, na.rm=FALSE)
Required arguments
x
: An R object.
Optional arguments
na.rm
: If TRUE, the function will ignore any missing values (NAs) in the atomic vector. If FALSE, the function does not ignore NAs and will return a value of NA if there is an NA in the atomic vector.
Syntax
min(x, na.rm=FALSE)
max(x, na.rm=FALSE)
Required arguments
x
: An R object.
Optional arguments
na.rm
: If TRUE, the function will ignore any missing values (NAs) in the atomic vector. If FALSE, the function does not ignore NAs and will return a value of NA if there is an NA in the atomic vector.
mean(v4)
[1] 3.5
min(v5)
[1] 0
max(v5)
[1] 1
mean(v7, na.rm = TRUE)
[1] 9
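To see these functions side by side, here is a minimal sketch on hypothetical vectors (temps and answers are illustrative names, not objects defined earlier). It assumes the same na.rm behavior described in the syntax boxes above:

```r
# Hypothetical vectors, used only for illustration
temps <- c(4, NA, 10, 7)
answers <- c(TRUE, FALSE, TRUE, TRUE)

mean(temps)                # NA: the missing value is not ignored by default
mean(temps, na.rm = TRUE)  # 7: (4 + 10 + 7) / 3
min(temps, na.rm = TRUE)   # 4
max(temps, na.rm = TRUE)   # 10

# As with sum(), logical values are treated as 1 and 0,
# so the mean is the share of TRUE values.
mean(answers)              # 0.75
```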
Before moving on to the next section, work through the two exercises below.