The algorithm here, fast doubling, is sequential (there is no parallelism) and is quite fast, taking fewer than a hundred arithmetic operations to arrive at the final result. This runs in nanoseconds on a CPU, and the benchmark in the article is mostly measuring communication time rather than actual computation time.
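For reference, a minimal iterative fast-doubling sketch (my own Python, not the article's code; the mod 9837 reduction matches the other snippets in this thread). It walks the bits of n from the most significant down, doing three multiplications per bit, so F(99999999) takes well under a hundred multiplications:

def fib_mod(n, m):
    # fast doubling: keep (a, b) = (F(k), F(k+1)) and double k for each bit of n
    a, b = 0, 1
    for bit in bin(n)[2:]:                 # bits of n, most significant first
        c = (a * ((2 * b - a) % m)) % m    # F(2k)   = F(k) * (2*F(k+1) - F(k))
        d = (a * a + b * b) % m            # F(2k+1) = F(k)^2 + F(k+1)^2
        if bit == '1':
            a, b = d, (c + d) % m          # advance to (F(2k+1), F(2k+2))
        else:
            a, b = c, d                    # stay at (F(2k), F(2k+1))
    return a

print(fib_mod(99999999, 9837))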
> Using the power of GPUs we are able to calculate F99999999 in only 17 milliseconds on my Consumer GPU
FWIW, on my 2020-era laptop, the following:
#include <stdio.h>

int main(void) {
    int a = 1, b = 1, c, n = 99999999;
    while (--n) {
        c = (a + b) % 9837;
        a = b;
        b = c;
    }
    printf("%d\n", a);
    return 0;
}
takes 330 ms measured with "time ./a.out". The problem with Fibonacci is that there are algorithmic ways to speed things up. The following (using the identities at https://en.wikipedia.org/wiki/Fibonacci_sequence to compute Fibonacci numbers recursively in O(log n) arithmetic operations) takes 0.1 ms in Python to compute the same value:
import functools

@functools.cache
def F(n):
    if n <= 2:
        return 1
    m = n // 2  # rounds down
    if n % 2:
        # odd: F(2m+1) = F(m+1)^2 + F(m)^2
        return (F(m+1)**2 + F(m)**2) % 9837
    else:
        # even: F(2m) = (2*F(m+1) - F(m)) * F(m)
        return ((2*F(m+1) - F(m))*F(m)) % 9837

print(F(99999999))
They are calculating F(99999999) mod 9837. This can be done with 54 multiplications of 2x2 matrices (repeated squaring over the 27 bits of the exponent). It takes nanoseconds on a CPU. There's absolutely no point in using the GPU for this.
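For concreteness, a rough sketch of that 2x2 matrix approach (my own Python; fib_mod_matrix and mat_mul are made-up names), based on the identity [[1,1],[1,0]]^n = [[F(n+1),F(n)],[F(n),F(n-1)]] and square-and-multiply on the exponent:

def mat_mul(x, y, m):
    # product of two 2x2 matrices, entries reduced mod m
    return [[(x[0][0]*y[0][0] + x[0][1]*y[1][0]) % m, (x[0][0]*y[0][1] + x[0][1]*y[1][1]) % m],
            [(x[1][0]*y[0][0] + x[1][1]*y[1][0]) % m, (x[1][0]*y[0][1] + x[1][1]*y[1][1]) % m]]

def fib_mod_matrix(n, m):
    result = [[1, 0], [0, 1]]   # identity
    base = [[1, 1], [1, 0]]     # Fibonacci Q-matrix
    while n:
        if n & 1:
            result = mat_mul(result, base, m)
        base = mat_mul(base, base, m)
        n >>= 1
    return result[0][1]         # F(n)

print(fib_mod_matrix(99999999, 9837))

99999999 has 27 bits, so that is at most 27 squarings plus 27 multiplies, i.e. roughly the 54 matrix multiplications mentioned above.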