library(bench)
Optimizations
A lot of hardware is not the answer to poorly written code. Before considering parallelization, you should think about ways to optimize your code sequentially.
Why?
- Not all code can be parallelized.
- Parallelization is costly (waiting time to access a cluster or money).
- The optimization of the sequential code will also benefit the parallel code.
In many cases, writing better code will save you more computing time than parallelization.
In this section, we will cover several principles by playing with the programmatic implementation of the fizz buzz game.
Toy example
Fizz buzz is a children game to practice divisions. Players take turn counting out loud while replacing:
- any number divisible by 3 with the word “Fizz”,
- any number divisible by 5 with the word “Buzz”,
- any number divisible by both 3 and 5 with the word “FizzBuzz”.
Let’s write functions to solve the game and time them to draw some general principles about more efficient code.
We will use bench::mark()
to benchmark our solutions. Let’s load it:
Pre-allocate memory
In this first function, we create an empty object z
of class integer and of length 0
that will hold the result of a loop, then we run the loop and at each iteration, we add a new value to z
:
<- function(n) {
f1 <- integer()
z for(i in 1:n) {
if(i %% 3 == 0 && i %% 5 == 0) {
<- "FizzBuzz"
z[i] else if(i %% 3 == 0) {
} <- "Fizz"
z[i] else if(i %% 5 == 0) {
} <- "Buzz"
z[i] else {
} <- i
z[i]
}
}
z }
The second function is very similar, but this time, we create an empty object z
of class integer and of length matching the final length z
will have after running the loop. This means that we are pre-allocating memory for the full vector before we run the loop instead of growing the vector at each iteration:
<- function(n) {
f2 <- integer(n)
z for(i in 1:n) {
if(i %% 3 == 0 && i %% 5 == 0) {
<- "FizzBuzz"
z[i] else if(i %% 3 == 0) {
} <- "Fizz"
z[i] else if(i %% 5 == 0) {
} <- "Buzz"
z[i] else {
} <- i
z[i]
}
}
z }
Let’s make sure that our functions work by testing it on a small number:
f1(20)
[1] "1" "2" "Fizz" "4" "Buzz" "Fizz"
[7] "7" "8" "Fizz" "Buzz" "11" "Fizz"
[13] "13" "14" "FizzBuzz" "16" "17" "Fizz"
[19] "19" "Buzz"
f2(20)
[1] "1" "2" "Fizz" "4" "Buzz" "Fizz"
[7] "7" "8" "Fizz" "Buzz" "11" "Fizz"
[13] "13" "14" "FizzBuzz" "16" "17" "Fizz"
[19] "19" "Buzz"
Now, let’s benchmark them for a large number:
<- 1e5
n mark(f1(n), f2(n))
Warning: Some expressions had a GC in every iteration; so filtering is
disabled.
# A tibble: 2 × 6
expression min median `itr/sec` mem_alloc `gc/sec`
<bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
1 f1(n) 136ms 139ms 7.14 16.55MB 25.0
2 f2(n) 122ms 125ms 7.99 1.15MB 28.0
f2()
is consistently faster. While in this example the difference is very slight, pre-allocating the object that will hold the result of a loop before running the loop can make a big difference.
Also, note the difference in memory allocation.
Aren’t loops a big ‘no no’ in R?
By now, you might be thinking: “Wait… aren’t loops a big ‘no no’ in R? I’ve always been told that they are slow and that one should always use functional programming! We are talking about optimization in this course and we are using loops?!?”
There are a lot of misconceptions around R loops. They can be very slow if you don’t pre-allocate memory. Otherwise they are almost always faster than functions (the apply()
family or the tidyverse equivalent of the purrr::map()
family). You can choose to use a functional programming approach for style and readability, but not for speed.
Let’s test it.
First we create a function:
<- function(n) {
fb if(n %% 3 == 0 && n %% 5 == 0) {
"FizzBuzz"
else if(n %% 3 == 0) {
} "Fizz"
else if(n %% 5 == 0) {
} "Buzz"
else {
}
n
} }
Then we pass it through sapply()
. We can test that it works on a small number:
sapply(1:20, fb)
[1] "1" "2" "Fizz" "4" "Buzz" "Fizz"
[7] "7" "8" "Fizz" "Buzz" "11" "Fizz"
[13] "13" "14" "FizzBuzz" "16" "17" "Fizz"
[19] "19" "Buzz"
Finally, we compare the timing with that of f2()
:
mark(f2(n), sapply(1:n, fb))
Warning: Some expressions had a GC in every iteration; so filtering is
disabled.
# A tibble: 2 × 6
expression min median `itr/sec` mem_alloc `gc/sec`
<bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
1 f2(n) 130ms 138ms 7.30 1.15MB 29.2
2 sapply(1:n, fb) 171ms 178ms 5.46 3.29MB 23.7
As you can see, the loop is faster.
Avoid unnecessary operations
Example 1
Calling z
as the last command in our function is the same as calling return(z)
.
From the R documentation:
If the end of a function is reached without calling return, the value of the last evaluated expression is returned.
Now, what about using print()
instead?
<- function(n) {
f3 <- integer(n)
z for(i in 1:n) {
if(i %% 3 == 0 && i %% 5 == 0) {
<- "FizzBuzz"
z[i] else if(i %% 3 == 0) {
} <- "Fizz"
z[i] else if(i %% 5 == 0) {
} <- "Buzz"
z[i] else {
} <- i
z[i]
}
}print(z)
}
Let’s benchmark it against f2()
:
mark(f2(n), f3(n))
[1] "1" "2" "Fizz" "4" "Buzz" "Fizz"
[7] "7" "8" "Fizz" "Buzz" "11" "Fizz"
[13] "13" "14" "FizzBuzz" "16" "17" "Fizz"
[19] "19" "Buzz" "Fizz" "22" "23" "Fizz"
[25] "Buzz" "26" "Fizz" "28" "29" "FizzBuzz"
[31] "31" "32" "Fizz" "34" "Buzz" "Fizz"
[37] "37" "38" "Fizz" "Buzz" "41" "Fizz"
...
Warning: Some expressions had a GC in every iteration; so filtering is
disabled.
# A tibble: 2 × 6
expression min median `itr/sec` mem_alloc `gc/sec`
<bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
1 f2(1e+05) 151.88ms 157.65ms 6.30 1.25MB 29.9
2 f3(1e+05) 3.25s 3.25s 0.308 1.04GB 26.8
What happened?
print()
returns its argument, but it additionally prints it to the standard output. This is why the mark()
function printed the output of f3()
before printing the timings.
As you can see, printing takes a long time.
The code in this website is run by Quarto. Since, by default, RStudio will only print the first 1,000 results, the timing you will get for f3()
in RStudio will be much less bad as it won’t include the time it takes to print the remaining 99,000 results.
If you are evaluating f2()
on its own (e.g. f2(20)
), the returned result will also be printed to standard output and both functions will be equivalent. However, if you are using the function in another context, printing becomes an unnecessary and timely operation and f3()
would be a very bad option. f3()
is thus not a good function.
Here is an example in which f3()
would perform a totally unnecessary operation that f2()
avoids:
<- f2(20) a
No unnecessary printing.
<- f3(20) a
[1] "1" "2" "Fizz" "4" "Buzz" "Fizz"
[7] "7" "8" "Fizz" "Buzz" "11" "Fizz"
[13] "13" "14" "FizzBuzz" "16" "17" "Fizz"
[19] "19" "Buzz"
Unnecessary printing.
For 1e5, the difference in timing between running an unnecessary printing vs not is a factor of 21!
Even worse would be to use:
<- function(n) {
f4 for(i in 1:n) {
if(i %% 3 == 0 && i %% 5 == 0) {
print("FizzBuzz")
else if(i %% 3 == 0) {
} print("Fizz")
else if(i %% 5 == 0) {
} print("Buzz")
else {
} print(i)
}
} }
Here the difference in timing is a factor of 50…
Example 2
One modulo operation and equality test can be removed by replacing i %% 3 == 0 && i %% 5 == 0
by i %% 15 == 0
. The difference isn’t huge, but there is a slight speedup:
<- function(n) {
f5 <- integer(n)
z for(i in 1:n) {
if(i %% 15 == 0) {
<- "FizzBuzz"
z[i] else if(i %% 3 == 0) {
} <- "Fizz"
z[i] else if(i %% 5 == 0) {
} <- "Buzz"
z[i] else {
} <- i
z[i]
}
}
z
}
mark(f2(n), f5(n))
Warning: Some expressions had a GC in every iteration; so filtering is
disabled.
# A tibble: 2 × 6
expression min median `itr/sec` mem_alloc `gc/sec`
<bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
1 f2(n) 166ms 167ms 5.81 1.15MB 25.2
2 f5(n) 99ms 101ms 9.92 1.22MB 33.7
Example 3
Louis Arsenault-Mahjoubi—who attended this workshop—found ways to get rid of several operations and get a speedup of 1.7 over f2()
.
First, we can assign 1:n
to z
instead of pre-allocating memory with an empty vector, thus rendering the assignment of i
to z[i]
unnecessary in the last else statement:
<- function(n) {
f_louis1 <- 1:n
z for(i in z) {
if(i %% 3 == 0 && i %% 5 == 0) {
<- "FizzBuzz"
z[i] else if(i %% 3 == 0) {
} <- "Fizz"
z[i] else if(i %% 5 == 0) {
} <- "Buzz"
z[i]
}
}
z }
This function works:
f_louis1(20)
[1] "1" "2" "Fizz" "4" "Buzz" "Fizz"
[7] "7" "8" "Fizz" "Buzz" "11" "Fizz"
[13] "13" "14" "FizzBuzz" "16" "17" "Fizz"
[19] "19" "Buzz"
… and is faster (speedup of 1.3):
mark(f5(n), f_louis1(n))
Warning: Some expressions had a GC in every iteration; so filtering is
disabled.
# A tibble: 2 × 6
expression min median `itr/sec` mem_alloc `gc/sec`
<bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
1 f5(n) 107ms 109ms 9.18 1.15MB 34.9
2 f_louis1(n) 77ms 78ms 12.8 1.15MB 40.2
Then, we can prevent the repetitions of the modulo operations and equality tests by saving them to variables:
<- function(n) {
f_louis2 <- 1:n
z for(i in z) {
<- (i %% 3 == 0)
div3 <- (i %% 5 == 0)
div5 if(div3 && div5) {
<- "FizzBuzz"
z[i] else if(div3) {
} <- "Fizz"
z[i] else if(div5) {
} <- "Buzz"
z[i]
}
}
z }
This gets us an even greater speedup of 1.7:
mark(f5(n), f_louis2(n))
Warning: Some expressions had a GC in every iteration; so filtering is
disabled.
# A tibble: 2 × 6
expression min median `itr/sec` mem_alloc `gc/sec`
<bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
1 f5(n) 106.4ms 110.9ms 9.09 1.15MB 34.5
2 f_louis2(n) 63.3ms 64.5ms 15.4 1.21MB 36.7
But it gets even better: we can actually get rid of the for loop!
<- function(n) {
f_louis3 <- 1:n
z <- (z %% 3 == 0)
div3 <- (z %% 5 == 0)
div5 which(div3)] <- "Fizz"
z[which(div5)] <- "Buzz"
z[which(div3 & div5)] <- "FizzBuzz"
z[
z }
Now we get a speedup of 5.5 compared to our best f2
function:
mark(f5(n), f_louis3(n))
Warning: Some expressions had a GC in every iteration; so filtering is
disabled.
# A tibble: 2 × 6
expression min median `itr/sec` mem_alloc `gc/sec`
<bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
1 f5(n) 100.1ms 110.1ms 8.65 1.15MB 31.1
2 f_louis3(n) 18.5ms 19.2ms 49.7 5.19MB 7.65
You can ensure that we still get the same result:
f_louis3(20)
[1] "1" "2" "Fizz" "4" "Buzz" "Fizz"
[7] "7" "8" "Fizz" "Buzz" "11" "Fizz"
[13] "13" "14" "FizzBuzz" "16" "17" "Fizz"
[19] "19" "Buzz"
Replace costly operations where possible
Now imagine that we have a dataframe called dat
with a first column called datvar
filled with integers.
We want to write a function that will accept our dataframe as argument and play the fizz buzz game on the column datvar
.
One could imagine the following function:
<- function(dat) {
f6 <- integer(length(dat[[1]]))
z for(i in seq_along(dat[[1]])) {
if(dat[[1]][i] %% 3 == 0 && dat[[1]][i] %% 5 == 0) {
<- "FizzBuzz"
z[i] else if(dat[[1]][i] %% 3 == 0) {
} <- "Fizz"
z[i] else if(dat[[1]][i] %% 5 == 0) {
} <- "Buzz"
z[i] else {
} <- dat[[1]][i]
z[i]
}
}
z }
Indexing a column from a dataframe in this fashion is a very costly operation. It is much more efficient to index with the name of the column:
<- function(dat) {
f7 <- integer(length(dat$datvar))
z for(i in seq_along(dat$datvar)) {
if(dat$datvar[i] %% 3 == 0 && dat$datvar[i] %% 5 == 0) {
<- "FizzBuzz"
z[i] else if(dat$datvar[i] %% 3 == 0) {
} <- "Fizz"
z[i] else if(dat$datvar[i] %% 5 == 0) {
} <- "Buzz"
z[i] else {
} <- dat$datvar[i]
z[i]
}
}
z }
Now, let’s create a random dataframe to benchmark f6()
and f7()
:
set.seed(123)
<- data.frame(datvar = round(runif(n, 1, n)))
dat mark(f6(dat), f7(dat))
Warning: Some expressions had a GC in every iteration; so filtering is
disabled.
# A tibble: 2 × 6
expression min median `itr/sec` mem_alloc `gc/sec`
<bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
1 f6(dat) 1.75s 1.75s 0.571 1.26MB 28.6
2 f7(dat) 336.54ms 339.05ms 2.95 1.26MB 26.5
Avoid repetitions of costly operations
This made a big difference (speedup of 5), but notice that we are indexing the column 6 times in our function. Let’s remove the repetition of this operation:
<- function(dat) {
f8 <- dat$datvar
var <- integer(length(var))
z for(i in seq_along(var)) {
if(var[i] %% 3 == 0 && var[i] %% 5 == 0) {
<- "FizzBuzz"
z[i] else if(var[i] %% 3 == 0) {
} <- "Fizz"
z[i] else if(var[i] %% 5 == 0) {
} <- "Buzz"
z[i] else {
} <- var[i]
z[i]
}
}
z }
Let’s benchmark all 3 versions:
mark(f6(dat), f7(dat), f8(dat))
Warning: Some expressions had a GC in every iteration; so filtering is
disabled.
# A tibble: 3 × 6
expression min median `itr/sec` mem_alloc `gc/sec`
<bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
1 f6(dat) 1.66s 1.66s 0.603 1.15MB 29.0
2 f7(dat) 331.71ms 333.81ms 3.00 1.15MB 27.0
3 f8(dat) 128.37ms 128.79ms 7.67 1.25MB 19.2
f8()
gave us another speedup of almost 3 over f7()
. f8()
runs 14 times faster than our initial function!
Indexing from a vector isn’t costly. There is thus no advantage at removing the repetition of that operation.
Your turn:
Show that this last statement is true.