Optimizations

Author

Marie-Hélène Burle

A lot of hardware is not the answer to poorly written code. Before considering parallelization, you should think about ways to optimize your code sequentially.

Why?

Not all code can be parallelized.
Parallelization is costly (waiting time to access a cluster or money).
The optimization of the sequential code will also benefit the parallel code.

In many cases, writing better code will save you more computing time than parallelization.

In this section, we will cover several principles by playing with the programmatic implementation of the fizz buzz game.

Toy example

Fizz buzz is a children game to practice divisions. Players take turn counting out loud while replacing:

any number divisible by 3 with the word “Fizz”,
any number divisible by 5 with the word “Buzz”,
any number divisible by both 3 and 5 with the word “FizzBuzz”.

Let’s write functions to solve the game and time them to draw some general principles about more efficient code.

We will use bench::mark() to benchmark our solutions. Let’s load it:

library(bench)

Pre-allocate memory

In this first function, we create an empty object z of class integer and of length 0 that will hold the result of a loop, then we run the loop and at each iteration, we add a new value to z:

f1 <- function(n) {
  z <- integer()
  for(i in 1:n) {
    if(i %% 3 == 0 && i %% 5 == 0) {
      z[i] <- "FizzBuzz"
    } else if(i %% 3 == 0) {
      z[i] <- "Fizz"
    } else if(i %% 5 == 0) {
      z[i] <- "Buzz"
    } else {
      z[i] <- i
    }
  }
  z
}

The second function is very similar, but this time, we create an empty object z of class integer and of length matching the final length z will have after running the loop. This means that we are pre-allocating memory for the full vector before we run the loop instead of growing the vector at each iteration:

f2 <- function(n) {
  z <- integer(n)
  for(i in 1:n) {
    if(i %% 3 == 0 && i %% 5 == 0) {
      z[i] <- "FizzBuzz"
    } else if(i %% 3 == 0) {
      z[i] <- "Fizz"
    } else if(i %% 5 == 0) {
      z[i] <- "Buzz"
    } else {
      z[i] <- i
    }
  }
  z
}

Let’s make sure that our functions work by testing it on a small number:

f1(20)

 [1] "1"        "2"        "Fizz"     "4"        "Buzz"     "Fizz"    
 [7] "7"        "8"        "Fizz"     "Buzz"     "11"       "Fizz"    
[13] "13"       "14"       "FizzBuzz" "16"       "17"       "Fizz"    
[19] "19"       "Buzz"

f2(20)

 [1] "1"        "2"        "Fizz"     "4"        "Buzz"     "Fizz"    
 [7] "7"        "8"        "Fizz"     "Buzz"     "11"       "Fizz"    
[13] "13"       "14"       "FizzBuzz" "16"       "17"       "Fizz"    
[19] "19"       "Buzz"

Now, let’s benchmark them for a large number:

n <- 1e5
mark(f1(n), f2(n))

Warning: Some expressions had a GC in every iteration; so filtering is
disabled.

# A tibble: 2 × 6
  expression      min   median `itr/sec` mem_alloc `gc/sec`
  <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 f1(n)         136ms    139ms      7.14   16.55MB     25.0
2 f2(n)         122ms    125ms      7.99    1.15MB     28.0

f2() is consistently faster. While in this example the difference is very slight, pre-allocating the object that will hold the result of a loop before running the loop can make a big difference.

Also, note the difference in memory allocation.

Aren’t loops a big ‘no no’ in R?

By now, you might be thinking: “Wait… aren’t loops a big ‘no no’ in R? I’ve always been told that they are slow and that one should always use functional programming! We are talking about optimization in this course and we are using loops?!?”

There are a lot of misconceptions around R loops. They can be very slow if you don’t pre-allocate memory. Otherwise they are almost always faster than functions (the apply() family or the tidyverse equivalent of the purrr::map() family). You can choose to use a functional programming approach for style and readability, but not for speed.

Let’s test it.

First we create a function:

fb <- function(n) {
  if(n %% 3 == 0 && n %% 5 == 0) {
    "FizzBuzz"
  } else if(n %% 3 == 0) {
    "Fizz"
  } else if(n %% 5 == 0) {
    "Buzz"
  } else {
    n
  }
}

Then we pass it through sapply(). We can test that it works on a small number:

sapply(1:20, fb)

 [1] "1"        "2"        "Fizz"     "4"        "Buzz"     "Fizz"    
 [7] "7"        "8"        "Fizz"     "Buzz"     "11"       "Fizz"    
[13] "13"       "14"       "FizzBuzz" "16"       "17"       "Fizz"    
[19] "19"       "Buzz"

Finally, we compare the timing with that of f2():

mark(f2(n), sapply(1:n, fb))

Warning: Some expressions had a GC in every iteration; so filtering is
disabled.

# A tibble: 2 × 6
  expression           min   median `itr/sec` mem_alloc `gc/sec`
  <bch:expr>      <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 f2(n)              130ms    138ms      7.30    1.15MB     29.2
2 sapply(1:n, fb)    171ms    178ms      5.46    3.29MB     23.7

As you can see, the loop is faster.

Avoid unnecessary operations

Example 1

Calling z as the last command in our function is the same as calling return(z).

From the R documentation:

If the end of a function is reached without calling return, the value of the last evaluated expression is returned.

Now, what about using print() instead?

f3 <- function(n) {
  z <- integer(n)
  for(i in 1:n) {
    if(i %% 3 == 0 && i %% 5 == 0) {
      z[i] <- "FizzBuzz"
    } else if(i %% 3 == 0) {
      z[i] <- "Fizz"
    } else if(i %% 5 == 0) {
      z[i] <- "Buzz"
    } else {
      z[i] <- i
    }
  }
  print(z)
}

Let’s benchmark it against f2():

mark(f2(n), f3(n))

 [1] "1"        "2"        "Fizz"     "4"        "Buzz"     "Fizz"    
 [7] "7"        "8"        "Fizz"     "Buzz"     "11"       "Fizz"    
[13] "13"       "14"       "FizzBuzz" "16"       "17"       "Fizz"    
[19] "19"       "Buzz"     "Fizz"     "22"       "23"       "Fizz"    
[25] "Buzz"     "26"       "Fizz"     "28"       "29"       "FizzBuzz"
[31] "31"       "32"       "Fizz"     "34"       "Buzz"     "Fizz"    
[37] "37"       "38"       "Fizz"     "Buzz"     "41"       "Fizz"
...

Warning: Some expressions had a GC in every iteration; so filtering is
disabled.
# A tibble: 2 × 6
  expression      min   median `itr/sec` mem_alloc `gc/sec`
  <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 f2(1e+05)  151.88ms 157.65ms     6.30     1.25MB     29.9
2 f3(1e+05)     3.25s    3.25s     0.308    1.04GB     26.8

What happened?

print() returns its argument, but it additionally prints it to the standard output. This is why the mark() function printed the output of f3() before printing the timings.

As you can see, printing takes a long time.

The code in this website is run by Quarto. Since, by default, RStudio will only print the first 1,000 results, the timing you will get for f3() in RStudio will be much less bad as it won’t include the time it takes to print the remaining 99,000 results.

If you are evaluating f2() on its own (e.g. f2(20)), the returned result will also be printed to standard output and both functions will be equivalent. However, if you are using the function in another context, printing becomes an unnecessary and timely operation and f3() would be a very bad option. f3() is thus not a good function.

Here is an example in which f3() would perform a totally unnecessary operation that f2() avoids:

a <- f2(20)

No unnecessary printing.

a <- f3(20)

 [1] "1"        "2"        "Fizz"     "4"        "Buzz"     "Fizz"    
 [7] "7"        "8"        "Fizz"     "Buzz"     "11"       "Fizz"    
[13] "13"       "14"       "FizzBuzz" "16"       "17"       "Fizz"    
[19] "19"       "Buzz"

Unnecessary printing.

For 1e5, the difference in timing between running an unnecessary printing vs not is a factor of 21!

Even worse would be to use:

f4 <- function(n) {
  for(i in 1:n) {
    if(i %% 3 == 0 && i %% 5 == 0) {
      print("FizzBuzz")
    } else if(i %% 3 == 0) {
      print("Fizz")
    } else if(i %% 5 == 0) {
      print("Buzz")
    } else {
      print(i)
    }
  }
}

Here the difference in timing is a factor of 50…

Example 2

One modulo operation and equality test can be removed by replacing i %% 3 == 0 && i %% 5 == 0 by i %% 15 == 0. The difference isn’t huge, but there is a slight speedup:

f5 <- function(n) {
  z <- integer(n)
  for(i in 1:n) {
    if(i %% 15 == 0) {
      z[i] <- "FizzBuzz"
    } else if(i %% 3 == 0) {
      z[i] <- "Fizz"
    } else if(i %% 5 == 0) {
      z[i] <- "Buzz"
    } else {
      z[i] <- i
    }
  }
  z
}

mark(f2(n), f5(n))

Warning: Some expressions had a GC in every iteration; so filtering is
disabled.

# A tibble: 2 × 6
  expression      min   median `itr/sec` mem_alloc `gc/sec`
  <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 f2(n)         166ms    167ms      5.81    1.15MB     25.2
2 f5(n)          99ms    101ms      9.92    1.22MB     33.7

Example 3

Louis Arsenault-Mahjoubi—who attended this workshop—found ways to get rid of several operations and get a speedup of 1.7 over f2().

First, we can assign 1:n to z instead of pre-allocating memory with an empty vector, thus rendering the assignment of i to z[i] unnecessary in the last else statement:

f_louis1 <- function(n) {
  z <- 1:n
  for(i in z) {
    if(i %% 3 == 0 && i %% 5 == 0) {
      z[i] <- "FizzBuzz"
    } else if(i %% 3 == 0) {
      z[i] <- "Fizz"
    } else if(i %% 5 == 0) {
      z[i] <- "Buzz"
    } 
  }
  z
}

This function works:

f_louis1(20)

 [1] "1"        "2"        "Fizz"     "4"        "Buzz"     "Fizz"    
 [7] "7"        "8"        "Fizz"     "Buzz"     "11"       "Fizz"    
[13] "13"       "14"       "FizzBuzz" "16"       "17"       "Fizz"    
[19] "19"       "Buzz"

… and is faster (speedup of 1.3):

mark(f5(n), f_louis1(n))

Warning: Some expressions had a GC in every iteration; so filtering is
disabled.

# A tibble: 2 × 6
  expression       min   median `itr/sec` mem_alloc `gc/sec`
  <bch:expr>  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 f5(n)          107ms    109ms      9.18    1.15MB     34.9
2 f_louis1(n)     77ms     78ms     12.8     1.15MB     40.2

Then, we can prevent the repetitions of the modulo operations and equality tests by saving them to variables:

f_louis2 <- function(n) {
  z <- 1:n
  for(i in z) {
    div3 <- (i %% 3 == 0)
    div5 <- (i %% 5 == 0)
    if(div3 && div5) {
      z[i] <- "FizzBuzz"
    } else if(div3) {
      z[i] <- "Fizz"
    } else if(div5) {
      z[i] <- "Buzz"
    } 
  }
  z
}

This gets us an even greater speedup of 1.7:

mark(f5(n), f_louis2(n))

Warning: Some expressions had a GC in every iteration; so filtering is
disabled.

# A tibble: 2 × 6
  expression       min   median `itr/sec` mem_alloc `gc/sec`
  <bch:expr>  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 f5(n)        106.4ms  110.9ms      9.09    1.15MB     34.5
2 f_louis2(n)   63.3ms   64.5ms     15.4     1.21MB     36.7

But it gets even better: we can actually get rid of the for loop!

f_louis3 <- function(n) {
  z <- 1:n
  div3 <- (z %% 3 == 0)
  div5 <- (z %% 5 == 0)
  z[which(div3)] <- "Fizz"
  z[which(div5)] <- "Buzz"
  z[which(div3 & div5)] <- "FizzBuzz"
  z
}

Now we get a speedup of 5.5 compared to our best f2 function:

mark(f5(n), f_louis3(n))

Warning: Some expressions had a GC in every iteration; so filtering is
disabled.

# A tibble: 2 × 6
  expression       min   median `itr/sec` mem_alloc `gc/sec`
  <bch:expr>  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 f5(n)        100.1ms  110.1ms      8.65    1.15MB    31.1 
2 f_louis3(n)   18.5ms   19.2ms     49.7     5.19MB     7.65

You can ensure that we still get the same result:

f_louis3(20)

 [1] "1"        "2"        "Fizz"     "4"        "Buzz"     "Fizz"    
 [7] "7"        "8"        "Fizz"     "Buzz"     "11"       "Fizz"    
[13] "13"       "14"       "FizzBuzz" "16"       "17"       "Fizz"    
[19] "19"       "Buzz"

Replace costly operations where possible

Now imagine that we have a dataframe called dat with a first column called datvar filled with integers.

We want to write a function that will accept our dataframe as argument and play the fizz buzz game on the column datvar.

One could imagine the following function:

f6 <- function(dat) {
  z <- integer(length(dat[[1]]))
  for(i in seq_along(dat[[1]])) {
    if(dat[[1]][i] %% 3 == 0 && dat[[1]][i] %% 5 == 0) {
      z[i] <- "FizzBuzz"
    } else if(dat[[1]][i] %% 3 == 0) {
      z[i] <- "Fizz"
    } else if(dat[[1]][i] %% 5 == 0) {
      z[i] <- "Buzz"
    } else {
      z[i] <- dat[[1]][i]
    }
  }
  z
}

Indexing a column from a dataframe in this fashion is a very costly operation. It is much more efficient to index with the name of the column:

f7 <- function(dat) {
  z <- integer(length(dat$datvar))
  for(i in seq_along(dat$datvar)) {
    if(dat$datvar[i] %% 3 == 0 && dat$datvar[i] %% 5 == 0) {
      z[i] <- "FizzBuzz"
    } else if(dat$datvar[i] %% 3 == 0) {
      z[i] <- "Fizz"
    } else if(dat$datvar[i] %% 5 == 0) {
      z[i] <- "Buzz"
    } else {
      z[i] <- dat$datvar[i]
    }
  }
  z
}

Now, let’s create a random dataframe to benchmark f6() and f7():

set.seed(123)
dat <- data.frame(datvar = round(runif(n, 1, n)))
mark(f6(dat), f7(dat))

Warning: Some expressions had a GC in every iteration; so filtering is
disabled.

# A tibble: 2 × 6
  expression      min   median `itr/sec` mem_alloc `gc/sec`
  <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 f6(dat)       1.75s    1.75s     0.571    1.26MB     28.6
2 f7(dat)    336.54ms 339.05ms     2.95     1.26MB     26.5

Avoid repetitions of costly operations

This made a big difference (speedup of 5), but notice that we are indexing the column 6 times in our function. Let’s remove the repetition of this operation:

f8 <- function(dat) {
  var <- dat$datvar
  z <- integer(length(var))
  for(i in seq_along(var)) {
    if(var[i] %% 3 == 0 && var[i] %% 5 == 0) {
      z[i] <- "FizzBuzz"
    } else if(var[i] %% 3 == 0) {
      z[i] <- "Fizz"
    } else if(var[i] %% 5 == 0) {
      z[i] <- "Buzz"
    } else {
      z[i] <- var[i]
    }
  }
  z
}

Let’s benchmark all 3 versions:

mark(f6(dat), f7(dat), f8(dat))

Warning: Some expressions had a GC in every iteration; so filtering is
disabled.

# A tibble: 3 × 6
  expression      min   median `itr/sec` mem_alloc `gc/sec`
  <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 f6(dat)       1.66s    1.66s     0.603    1.15MB     29.0
2 f7(dat)    331.71ms 333.81ms     3.00     1.15MB     27.0
3 f8(dat)    128.37ms 128.79ms     7.67     1.25MB     19.2

f8() gave us another speedup of almost 3 over f7(). f8() runs 14 times faster than our initial function!

Indexing from a vector isn’t costly. There is thus no advantage at removing the repetition of that operation.

Your turn:

Show that this last statement is true.