Multithreading

Now that I have taken single-core counting as far as I can (a little less than 100 times the speed of a naive for loop), it is time to leverage all the cores of my CPU for absolute best-in-class counting performance.

On the surface, that sounds easy enough:

pub fn thread_basic(target: u64, sequential: impl Fn(u64) -> u64 + Sync) -> u64 {
    // How many threads the system says it can run in parallel.
    let num_threads = std::thread::available_parallelism()
        .map(usize::from)
        .unwrap_or(2);

    // Split the target evenly; the first `extra` threads count one unit more.
    let base_share = target / num_threads as u64;
    let extra = target % num_threads as u64;
    let sequential = &sequential;

    std::thread::scope(|s| {
        let mut threads = Vec::with_capacity(num_threads);
        for thread in 0..num_threads as u64 {
            let share = base_share + (thread < extra) as u64;
            threads.push(s.spawn(move || sequential(share)));
        }
        // Each thread counts its own share; the total is the sum of all shares.
        threads.into_iter().map(|t| t.join().unwrap()).sum()
    })
}
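
To see how the pieces fit together, here is a small usage sketch. The count_to function below is a hypothetical stand-in for the single-core counter, not the optimized version from the previous sections:

// Hypothetical stand-in for the optimized single-core counter:
// a plain loop that simply counts up to `target`.
fn count_to(target: u64) -> u64 {
    let mut count = 0;
    for _ in 0..target {
        count += 1;
    }
    count
}

fn main() {
    // A target that doesn't divide evenly across threads still comes out
    // exact, because the remainder is spread over the first few threads.
    assert_eq!(thread_basic(1_000_003, count_to), 1_000_003);
}

Since the remainder target % num_threads is handed out one unit at a time to the first extra threads, the shares always sum to exactly target, so nothing is lost or double-counted.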

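A speed-up figure like the one below is just the ratio of wall-clock times between the single-core counter and the threaded version. A minimal way to measure it might look like this (report_speedup is a hypothetical helper, not part of the article's benchmark setup):

use std::time::Instant;

// Hypothetical helper: runs the single-core counter and the threaded version
// once each and prints the ratio of their wall-clock times.
fn report_speedup(target: u64, sequential: impl Fn(u64) -> u64 + Sync) {
    let start = Instant::now();
    let single = sequential(target);
    let single_time = start.elapsed();

    let start = Instant::now();
    let multi = thread_basic(target, sequential);
    let multi_time = start.elapsed();

    // The timing only means something if both versions agree on the count.
    assert_eq!(single, multi);
    println!(
        "{:.1}x speed-up ({:?} single-threaded vs {:?} threaded)",
        single_time.as_secs_f64() / multi_time.as_secs_f64(),
        single_time,
        multi_time
    );
}

Note that the trivial count_to stand-in from the sketch above largely gets optimized away in release builds, so meaningful numbers require plugging in the actual single-core counter.
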
But this version doesn’t perform super impressively, achieving only a 4.8x speed-up on 8 CPU cores even with a generous counting limit of 69 billion. Surely we can do better than that.