“Looping”, “cycling” or “iterating” is a helpful way to automate a multi-step process: you group the parts that need to be repeated and let the program run them again and again. In R, there are 3 types of loops: ‘repeat’, ‘while’ and ‘for’.
The ‘repeat’ loop is the easiest of the 3. All it does is execute the same code over and over again until you ask it to stop. In other languages, its closest relative often goes by the name do while, or something similar. In general, we want our code to complete before the end of the world, so we break out of the otherwise infinite loop by including a ‘break’ statement. (Sometimes, rather than breaking out of the loop, we just want to skip the rest of the current iteration and start the next one; that is the job of ‘next’.) The following example uses ‘break’:
x <- 20
repeat {
  print(x)
  x <- x + 1
  if (x == 30) {
    break
  }
}
# Result
[1] 20
[1] 21
[1] 22
[1] 23
[1] 24
[1] 25
[1] 26
[1] 27
[1] 28
[1] 29
As the above code snippet shows, the block of a ‘repeat’ loop is executed at least once, and the loop terminates as soon as the ‘if’ condition is satisfied. The ‘break’ clause lets us exit or interrupt the cycles within loops.
‘while’ loops are more like backward repeat loops. Instead of executing some code and then checking to see if the loop should end or not, this type of loop checks first and then (maybe) executes. Since the check happens at the very beginning, it is possible that the contents of the loop will never be executed (unlike in a repeat loop).
The following ‘while’ loop produces the same results as the ‘repeat’ example above.
i <- 20
while (i < 30) {
  print(i)
  i <- i + 1
}
In general, it is always possible to convert a ‘repeat’ loop to a ‘while’ loop or a ‘while’ loop to a ‘repeat’ loop, but usually the syntax is much cleaner one way or the other. If you know that the contents must execute at least once, use repeat; otherwise, use while.
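The difference between checking first and checking last can be seen in a small sketch. The starting value of 30 and the counter names `while_runs` and `repeat_runs` are made up for illustration; they are chosen so that the loop condition is already false on entry.

```r
# Hypothetical starting value: the condition (i < 30) is false from the start
i <- 30
while_runs <- 0
while (i < 30) {
  # Never reached, because the check happens before the body
  while_runs <- while_runs + 1
  i <- i + 1
}

# The repeat loop runs its body first and only then checks the exit condition
j <- 30
repeat_runs <- 0
repeat {
  repeat_runs <- repeat_runs + 1
  j <- j + 1
  if (j >= 30) {
    break
  }
}

print(while_runs)   # 0: the while body was skipped entirely
print(repeat_runs)  # 1: the repeat body ran once before the break
```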
The third type of loop is the one to use when you know exactly how many times you want the code to repeat. The for loop accepts an iterator variable and a vector. It repeats the loop, giving the iterator each element from the vector in turn. In the simplest case, the vector contains numbers:
x <- c(1, 9, 3, 5, 8, 7, 2)
count <- 0
for (val in x) {
  if (val %% 2 == 0) count <- count + 1
}
print(count)
[1] 2
In the above example, the loop iterates 7 times because the vector x has 7 elements. In each iteration, the variable ‘val’ takes on the value of the corresponding element of x. Here we have used a counter to count the even numbers in x. We can see that x contains 2 even numbers (8 and 2).
When the R interpreter encounters a ‘break’, it passes control to the instruction immediately after the end of the loop (if any). In the case of nested loops, ‘break’ exits only the innermost loop that contains it.
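For comparison, the same count can be obtained without an explicit loop. This is only a sketch of an alternative; the names `is_even` and `count_vec` are introduced here for illustration.

```r
x <- c(1, 9, 3, 5, 8, 7, 2)

# Comparing the whole vector at once gives one TRUE/FALSE per element
is_even <- x %% 2 == 0

# TRUE counts as 1 and FALSE as 0, so sum() counts the even numbers
count_vec <- sum(is_even)
print(count_vec)   # 2

# Logical indexing shows which elements were counted
print(x[is_even])  # 8 2
```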
# Make a lower triangular matrix (zeroes in upper right corner)
m = 5
n = 5
# A counter to count the assignment
ctr = 0
# Create a 5 x 5 matrix with zeroes
mat = matrix(0, m, n)
for (i in 1:m) {
  for (j in 1:n) {
    if (i == j) {
      break
    } else {
      # you assign the values only when i<>j
      mat[i, j] = i * j
      ctr = ctr + 1
    }
  }
  print(i * j)
}
# Result
[1] 1
[1] 4
[1] 9
[1] 16
[1] 25
# Print how many matrix cell were assigned
print(ctr)
# Result
[1] 10
The above code snippet defines an m x n (5 x 5) matrix of zeros and then enters a nested for loop to fill the locations of the matrix, but only while the column index is smaller than the row index. The purpose was to create a lower triangular matrix, that is, a matrix whose non-zero elements all lie below the main diagonal. The others (here including the diagonal itself) are left untouched at their initialized zero value. As long as the indexes differ, the assignment is performed and the counter is incremented by 1. When the indexes are equal, and thus the condition in the inner loop (which runs over the column index ‘j’) is fulfilled, a ‘break’ command is executed and the innermost loop is interrupted with a direct jump to the instruction following the inner loop. This instruction is the call to print(). Then, control returns to the outer for loop (over the rows, index ‘i’), which moves on to the next row. In the end, the program prints the counter ‘ctr’, which contains the number of elements that were assigned.
‘next’ discontinues the current iteration and shifts to the next cycle of the loop. In other languages, you may find the (slightly confusing) equivalent called “continue”, which means the same: wherever you are, once the condition is met, jump back to the evaluation of the loop.
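As a cross-check, the same matrix can be sketched without explicit loops by using the base R functions outer() and lower.tri(). The names `products` and `mat_vec` are introduced here for illustration, and the loop is repeated in compact form only so that the two results can be compared.

```r
m <- 5
n <- 5

# Loop version from the text, written compactly
mat <- matrix(0, m, n)
for (i in 1:m) {
  for (j in 1:n) {
    if (i == j) break
    mat[i, j] <- i * j
  }
}

# outer() builds the full table of products: products[i, j] equals i * j
products <- outer(1:m, 1:n)

# lower.tri() is TRUE strictly below the diagonal; multiplying by it
# zeroes out the diagonal and everything above it
mat_vec <- products * lower.tri(products)

print(mat_vec)
print(all(mat == mat_vec))  # TRUE: both approaches fill the same cells
```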
m = 5
for (k in 1:m) {
  if (!k %% 2)
    next
  print(k)
}
[1] 1
[1] 3
[1] 5
An if-else statement is a very powerful tool for returning output based on a condition in R. Let’s think about a scenario where, in the transaction data for a product, we have the number of units sold daily for, say, the last 5 years. We want to dig deeper and check on how many days the number of units sold is between 50 and 60, and we want to mark any day with a higher value as an exceptional day. The syntax would look something like this:
# Create vector quantity
quantity <- 100
# Create multiple condition statement
if (quantity < 50) {
  print('Not enough for today')
} else if (quantity <= 60) {
  # Reached only when quantity >= 50, so exactly 50 counts as an average day
  print('Average day')
} else {
  print('Great day!')
}
# Results
[1] "Great day!"
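To answer the “how many days” part of the scenario, the same conditions can be applied to a whole vector of daily figures. The sketch below uses a made-up vector `units_sold` and a hypothetical helper `label_day`; the thresholds 50 and 60 are the ones used in the code above.

```r
# Hypothetical daily sales figures, invented for this illustration
units_sold <- c(45, 52, 60, 75, 58, 49, 90)

# Hypothetical helper: returns the label for a single day's quantity
label_day <- function(quantity) {
  if (quantity < 50) {
    "Not enough for today"
  } else if (quantity <= 60) {
    "Average day"
  } else {
    "Great day!"
  }
}

# vapply() calls the helper once per day and returns a character vector
labels <- vapply(units_sold, label_day, character(1))
print(table(labels))

# Counting the days in the 50 to 60 range directly with a vectorized comparison
average_days <- sum(units_sold >= 50 & units_sold <= 60)
print(average_days)  # 3
```

Note the single ‘&’ in the last comparison: it works element by element on vectors, whereas ‘if’ expects a single TRUE or FALSE.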
Sometimes, you might end up writing a very large nested if-then-else condition, and that can make it hard to find the problem if anything goes wrong inside it. One effective solution is to use the ‘switch()’ function. It lets you evaluate a selected piece of code based on position or name:
# The name 'calculate' is added here only so that the function can be called
calculate <- function(x, y, op) {
  switch(op,
    add = x + y,
    sub = x - y,
    mul = x * y,
    div = x / y,
    stop("Unknown operation!")
  )
}
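Two further behaviours of ‘switch()’ may be worth a quick sketch: selection by position, and several names sharing one result. The function name `letter_kind` and the variable `by_position` are hypothetical and used only for this illustration.

```r
# Selection by position: a number picks the matching alternative
by_position <- switch(2, "first", "second", "third")
print(by_position)  # "second"

# Selection by name: an empty alternative falls through to the next
# non-empty one, and the final unnamed value acts as the default
letter_kind <- function(ch) {
  switch(ch,
    a = ,
    e = ,
    i = ,
    o = ,
    u = "vowel",
    "consonant"
  )
}
print(letter_kind("e"))  # "vowel"
print(letter_kind("z"))  # "consonant"
```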
Loops are a very handy option for any repetitive operation: you just need to specify how many times, or under which conditions, you would like the operation to repeat itself. You assign initial values to a loop control variable, perform the loop and then, once the loop has finished, you typically do something with the results.
For loops are not as important in R as they are in other languages because R is a functional programming language. This means that it’s possible to wrap up for loops in a function and call that function instead of using the for loop directly.
There are a few limitations that any practitioner would highlight about ‘for’ loops and the other types of loop. One is that they tend to be slow compared with vectorized alternatives (which remains true even after recent improvements!). Another major issue is that loops are not very expressive. “A for loop conveys that it’s iterating over something, but doesn’t clearly convey a high-level goal. Instead of using a for loop, it’s better to use a functional. Each functional is tailored for a specific task, so when you recognize the functional you know immediately why it’s being used. Functionals play other roles as well as replacements for for-loops. They are useful for encapsulating common data manipulation tasks like split-apply-combine, for thinking “functionally”, and for working with mathematical functions” [Ref: “Advanced R” by Hadley Wickham]
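A minimal sketch of this point, using a made-up list `values`: the loop has to manage an output vector and an index, while the functional states the goal (“the mean of each element”) in a single call. The names `means_loop` and `means_fun` are introduced here for illustration.

```r
# Hypothetical input, invented for this illustration
values <- list(a = 1:5, b = c(2, 4, 6), c = 10)

# Loop version: allocate the result, then fill it element by element
means_loop <- numeric(length(values))
for (k in seq_along(values)) {
  means_loop[k] <- mean(values[[k]])
}
names(means_loop) <- names(values)

# Functional version: vapply() applies mean() to each element and
# checks that every result is a single number
means_fun <- vapply(values, mean, numeric(1))

print(means_loop)
print(means_fun)
```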
Vectorization: Vectorization is the process of converting repeated operations on simple numbers (referred to as ‘scalars’) into single operations on vectors or matrices. Now, a vector is the elementary data structure in R and is “a single entity consisting of a collection of things”, according to the R base manual. If you combine vectors (of the same length), you obtain a matrix. You can do this vertically or horizontally, using different R instructions. Thus in R, a matrix can be seen as a collection of horizontal or vertical vectors. By extension, you can vectorize repeated operations on vectors. As a result, many of the operations shown above can be made implicit by using vectorization.
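A small sketch of both ideas, with made-up vectors `a` and `b`: an element-wise sum written as a loop and as a single vectorized operation, followed by the two ways of combining vectors into a matrix with cbind() and rbind().

```r
# Hypothetical vectors, invented for this illustration
a <- c(1, 2, 3)
b <- c(10, 20, 30)

# Repeated operations on scalars, one pair at a time
sum_loop <- numeric(length(a))
for (k in seq_along(a)) {
  sum_loop[k] <- a[k] + b[k]
}

# The same work as a single operation on whole vectors
sum_vec <- a + b
print(sum_vec)  # 11 22 33

# Combining vectors of the same length into a matrix
by_col <- cbind(a, b)  # each vector becomes a column: 3 rows, 2 columns
by_row <- rbind(a, b)  # each vector becomes a row: 2 rows, 3 columns
print(dim(by_col))
print(dim(by_row))
```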
The apply() family - In R, the apply() functions form a very powerful and rich family of intrinsically vectorized functions. “The apply command or rather family of commands pertain to the R base package. It is populated with a number of functions (the [s, l, m, r, t, v]apply) to manipulate slices of data in the form of matrices or arrays in a repetitive way, allowing to cross or traverse the data and avoiding explicit use of loop constructs. The functions act on an input matrix or array and apply a chosen named function with one or several optional arguments.”
apply(): ‘apply()’ can be used to apply a function to the rows or columns of a matrix. For example:
In the following example, we first create a matrix of values generated from random numbers and then perform various operations on it with the ‘apply’ functional. (The related ‘lapply’ and ‘sapply’ functionals work on the elements of a list or vector rather than on the rows or columns of a matrix.)
#Let's create a matrix
mat_new <- matrix(data=cbind(rnorm(20, 0), rnorm(20, 2), rnorm(20, 5)), nrow=20, ncol=3)
mat_new
#First few records from the derived matrix
            [,1]      [,2]     [,3]
[1,] -0.96051550 3.1468613 6.072214
[2,] -1.39166772 2.9056725 5.722543
[3,]  0.88049546 4.2234216 4.839496
[4,]  0.17057773 2.8729929 7.126668
[5,] -0.46655639 1.3653404 4.300621
[6,]  0.84594859 1.9774440 4.281742

#Let's apply apply function to calculate the row-wise means
apply(mat_new, 1, mean)
#Results
 [1] 2.752853 2.412182 3.314471 3.390080 1.733135 2.368378 2.761731 2.660597 2.175851
[10] 2.960857 1.782839 1.752489 1.453420 2.243107 2.191123 2.182224 2.444636 2.415256
[19] 3.026896 1.851031

#Let's apply apply function to calculate the column-wise means
apply(mat_new, 2, mean)
#Results
[1] -0.1371763 2.3060624 5.0120875

# Let's find out how many negative numbers each column has got
apply(mat_new, 2, function(y) length(y[y<0]))
#Results
[1] 11 1 0

#Let's get the mean of the positive values in the matrix
apply(mat_new, 2, function(y) mean(y[y>0]))
#Results
[1] 0.6629146 2.4402103 5.0120875
# Only the first 20 values fit into a 5 x 4 matrix, so 1:20 is all that is needed
data_apply <- matrix(1:20, nrow = 5, ncol = 4)
data_apply
#Result
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20
# Now we can use the apply function to find the mean of each row as follows
# (replace mean with median to get the row medians)
apply(data_apply, 1, mean)
#Result
[1] 8.5 9.5 10.5 11.5 12.5
tapply(): ‘tapply()’ splits a vector into groups defined by another variable, usually a factor, and then applies a function to each group:
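The ‘lapply’ and ‘sapply’ functionals mentioned above can be sketched in the same spirit. The list `scores` is made up for illustration; the only difference shown is the shape of the result.

```r
# Hypothetical input list, invented for this illustration
scores <- list(math = c(90, 80, 70), art = c(60, 100))

# lapply() always returns a list with one entry per input element
as_list <- lapply(scores, mean)
print(as_list)

# sapply() does the same work but simplifies the result when it can,
# here to a named numeric vector
as_vec <- sapply(scores, mean)
print(as_vec)
```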
We will be using the ‘mtcars’ dataset:
library(datasets)
tapply(mtcars$wt, mtcars$cyl, median)
    4     6     8
2.200 3.215 3.755
The ‘tapply’ function first groups the cars together based on the number of cylinders they have and then calculates the median weight for each group.
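The two steps can also be written out separately, which may make the grouping easier to see. This is a sketch using the base functions split() and sapply(); the names `wt_by_cyl` and `medians` are introduced here for illustration.

```r
library(datasets)

# Step 1: split the weights into one vector per cylinder count
wt_by_cyl <- split(mtcars$wt, mtcars$cyl)
print(names(wt_by_cyl))  # "4" "6" "8"

# Step 2: apply the function to each group
medians <- sapply(wt_by_cyl, median)
print(medians)

# The values should match the tapply() call from the text
print(all.equal(as.vector(medians),
                as.vector(tapply(mtcars$wt, mtcars$cyl, median))))
```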
mapply() : ‘mapply()’ is a multivariate version of sapply. It will apply the specified function to the first element of each argument first, followed by the second element, and so on. For example:
In the example below, it adds 1 to 11, 2 to 12, and so on.
x <- 1:10
y <- 11:20
mapply(sum, x, y)
 [1] 12 14 16 18 20 22 24 26 28 30
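Because ‘+’ is already vectorized, the sum above is mostly a demonstration; ‘mapply()’ becomes more useful when the function is not vectorized over all of its arguments. The sketch below first confirms the equivalence and then pairs up arguments for rep(); the name `reps` is introduced here for illustration.

```r
x <- 1:10
y <- 11:20

# For a plain sum, the vectorized operator gives the same result
print(identical(mapply(sum, x, y), x + y))  # TRUE

# rep() called pairwise: rep(1, 3), rep(2, 2), rep(3, 1).
# The results have different lengths, so mapply() returns a list
reps <- mapply(rep, 1:3, 3:1)
print(reps)
```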