BTW I did make the full 64-bit version, using 64-bit registers to go up to a few hundred billion, and including the necessary large-number algos. Best I could do at higher numbers (primes up to 100 billion for instance) was 50% slower than Chris Wallich's. So I gave up on figuring it out myself, and studied his code. He uses various clever tricks to get tiny improvements (5% per trick or so). Every one requires quite a lot of code.
So I asked myself, "Self, do you want to spend months implementing these PITA algo's, just to speed up prime number generation by a few seconds?" - the answer came back, "No".
Anyway if anyone cares I could post this "full" 64-bit version. It actually does the 2 billion even faster than previous, less than 110 ms (so much for the theory that 64-bit is always slower than 32!). Plus a few other minor improvements.
Actually I hope you don't want it because it's been a long time since I looked at it and it'll be some trouble to remember how it works ... but it would give me a project to ease back into programming (took a few months off to smell the roses), and of course I'm always glad to be of use.