Multithreading
Now that I have taken single-core counting as far as I can (which is a little
less than 100 times the counting speed of a naive for
loop), it is time to
leverage all the cores of my CPU for absolute best-in-class leading counting
performance.
On the surface, that sounds easy enough:
```rust
pub fn thread_basic(target: u64, sequential: impl Fn(u64) -> u64 + Sync) -> u64 {
    // Ask how many threads the machine can run in parallel,
    // falling back to 2 if that information is unavailable.
    let num_threads = std::thread::available_parallelism()
        .map(usize::from)
        .unwrap_or(2);
    // Split the target evenly; the first `extra` threads count one extra.
    let base_share = target / num_threads as u64;
    let extra = target % num_threads as u64;
    let sequential = &sequential;
    std::thread::scope(|s| {
        let mut threads = Vec::with_capacity(num_threads);
        for thread in 0..num_threads as u64 {
            let share = base_share + (thread < extra) as u64;
            threads.push(s.spawn(move || sequential(share)));
        }
        // Wait for every thread and add up the partial counts.
        threads.into_iter().map(|t| t.join().unwrap()).sum()
    })
}
```
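For reference, here is a minimal sketch of how `thread_basic` might be driven. The `count_to` function below is a hypothetical stand-in for a single-core counter, not one of the optimized versions from the earlier sections, and the target of one million is arbitrary:

```rust
// Hypothetical stand-in for a sequential counting function.
fn count_to(target: u64) -> u64 {
    let mut count = 0;
    for _ in 0..target {
        // black_box discourages the compiler from folding the loop away.
        count = std::hint::black_box(count + 1);
    }
    count
}

fn main() {
    // Each thread counts its share; the shares sum to the full target.
    let total = thread_basic(1_000_000, count_to);
    assert_eq!(total, 1_000_000);
}
```

Any `Fn(u64) -> u64` that is also `Sync` works here, since every spawned thread only borrows the counter immutably through the shared reference.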
But this version doesn’t perform all that impressively, achieving only a 4.8x speed-up on 8 CPU cores even when counting to a generous limit of 69 billion. Surely we can do better than that.