The MASM Forum
Microsoft 64 bit MASM => Tools & Toys => Topic started by: hutch on April 14, 2019, 01:02:21 PM

I can't find anything else to do on this release version. I have added command line options for a user defined output file name and for 2 to 10 XOR passes against the original pad. The first argument is a user defined integer that is multiplied by 1 million to set the pad size. The arguments are as follows.
SYNTAX : arg1 = integer in the following form
1 = 1 million bytes, 2 = 2 million bytes, etc ....
arg2 = The file name with no spaces
arg3 = The number of XOR passes, 2 to 10
The public domain application ENT is included in the zip file, and when rpg.exe is run it calls ENT to immediately test the randomness of the output. The preferred range is a CHI SQUARED between 25% and 75%, but the reference material that comes with ENT says 10% to 90% is OK.
The random output is usually OK, but as with any proper random source you get occasional results outside the 10-90% range, which you should discard. The real work in this tool goes into making the pad non-reproducible, and that is done with a massive level of complexity to make unwinding its construction near impossible. This addresses the potential for a single DWORD pad to be broken by a brute force attack on very high powered computer hardware by simply running the full DWORD range.
The internal pad generator uses 5 identical pseudo random generators that construct each pad at randomly variable lengths, and each pad is then XORed against another random pad generated in the same manner. Multiple reseedings occur within each of the 5 random generators to break any patterns that may emerge. The basic random generator is a custom modified shift/xor type that exhibits good randomness in a single pass.
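As a rough illustration only (not the author's actual code), the construction described above, several independently seeded shift/xor generators whose output streams are XORed together, can be sketched in Python. The `xorshift32` shift constants, the 5-generator count, and the seeding from `os.urandom` are assumptions for the sketch:

```python
import os

def xorshift32(seed):
    """A plain shift/xor PRNG (illustrative; not the author's exact variant)."""
    state = seed & 0xFFFFFFFF
    while True:
        state ^= (state << 13) & 0xFFFFFFFF
        state ^= state >> 17
        state ^= (state << 5) & 0xFFFFFFFF
        yield state & 0xFF  # emit the low byte of each state

def make_pad(size, seeds):
    """Build a pad by XORing the byte streams of several independent generators."""
    gens = [xorshift32(s) for s in seeds]
    pad = bytearray(size)
    for i in range(size):
        b = 0
        for g in gens:
            b ^= next(g)
        pad[i] = b
    return bytes(pad)

# seed each generator independently; "| 1" avoids the degenerate all-zero seed
seeds = [int.from_bytes(os.urandom(4), "little") | 1 for _ in range(5)]
pad = make_pad(1_000_000, seeds)
```

The XOR of several independent streams is at least as unpredictable as the best of them, which is the rationale for combining multiple generators.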
It has been tested on 64-bit Windows 10 with pad sizes up to 1.5 billion bytes; above that you run into memory allocation failures. On older machines you will not be able to build pads of that size, as the limit is determined by available memory.

Please don't do this; there are standard functions for generating cryptographically strong random data. I'll create an example if necessary.
Also, XORing multiple times is no more secure than XORing once, assuming your data is sufficiently random (and if it's not, this is exactly why you should be using functions that do produce sufficiently random data.)
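Tedd's point here can be checked mechanically: XORing with pad A and then pad B (both full-length) is identical to XORing once with the combined pad A ^ B, so the second pass adds no strength. A minimal Python demonstration:

```python
import os

data = os.urandom(4096)
pad_a = os.urandom(4096)
pad_b = os.urandom(4096)

def xor(x, y):
    return bytes(a ^ b for a, b in zip(x, y))

twice = xor(xor(data, pad_a), pad_b)   # two XOR passes
once = xor(data, xor(pad_a, pad_b))    # one pass with the combined pad
assert twice == once
```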

Hi Tedd,
Good to hear from you. I am a testing person myself, so I have ENT attached to immediately check the output, but the real reason for the tool is to make reproducing the pad as close to impossible as I can get. The randomness does not improve after the first pass, but the capacity to unwind the method of creating the pad increases in complexity with each pass, given that each pass has pseudo random length variation as it is being constructed.
As you would be aware, with DWORD size keys, even if the quality of randomness is very high, you can brute force the data by exhausting the DWORD range, and with high powered computers this does not take all that long. The other factor with using a crude XOR of the pad against the data is that there is no indicator of success, which makes automated encryption cracking a far more difficult task.
I wrote test code for RDRAND and the seed creation in later hardware long ago, but I did not like the tested randomness. I am also aware of potential backdoors in the standard Intel method, whereas the mechanism I have used cannot have a backdoor put into it.

XOR encryption with a one-time random pad is 100% secure (uncrackable) as long as two conditions are met:
- the random pad is unpredictable, i.e. sufficiently random (ideally, truly random); and
- the pad is at least the length of the data being encrypted (meaning there will be no repeats).
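Under those two conditions the scheme is the classic one-time pad, and decryption is simply a second XOR with the same pad. A minimal sketch, using the OS CSPRNG via `os.urandom` to stand in for a sufficiently random pad:

```python
import os

message = b"attack at dawn"
pad = os.urandom(len(message))       # condition 1: unpredictable pad
assert len(pad) >= len(message)      # condition 2: at least the data length

ciphertext = bytes(m ^ p for m, p in zip(message, pad))
recovered = bytes(c ^ p for c, p in zip(ciphertext, pad))
assert recovered == message          # XOR is its own inverse
```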
We could argue over whether your random data is random enough, but from a quick look it's pseudorandom based on the current timestamp; so, given that you've provided a working program, the random data is as predictable as the timestamp.
On the second point, you're still producing a repeating pattern (a longer one, but still repeating.)
As a simple example, if we use a key of length 5 for the first pass and length 7 for the second pass, the end result is identical to XORing with a single repeating key of length 35 (the XOR of the 5-key and the 7-key until they line up; note this doesn't always multiply: keys of lengths 5 and 10 only produce an effective key of length 10.)
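The 5/7 example can be verified directly: composing two repeating keys of lengths m and n is equivalent to one repeating key of length lcm(m, n), which is why 5 and 7 give 35 but 5 and 10 give only 10. A quick Python check (the key byte values are arbitrary):

```python
from math import lcm

def xor_repeat(data, key):
    """XOR data against a repeating key."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

data = bytes(range(256)) * 4
k5 = bytes([3, 17, 92, 44, 8])
k7 = bytes([1, 2, 4, 8, 16, 32, 64])

two_passes = xor_repeat(xor_repeat(data, k5), k7)
# the single equivalent key: the two keys XORed until they line up
effective = bytes(k5[i % 5] ^ k7[i % 7] for i in range(lcm(5, 7)))  # length 35
assert two_passes == xor_repeat(data, effective)
assert lcm(5, 10) == 10   # lengths 5 and 10 collapse to 10, not 50
```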
As you add more passes, you can increase the effective length of the key used, but it's always going to repeat eventually; so there really is no difference from XORing once with a random key of the same effective length. The only appropriate length is: not less than the length of the data you're encrypting. Also, don't reuse the pad to encrypt anything else, as that's a repeat.
There are standard ways to extract the effective key length used, and after that it's purely a matter of using minimal knowledge of the encrypted data's format to extract parts of the key, which becomes increasingly easy with the number of repeats; so the only safe number of repeats is 0.
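One such standard technique is coincidence counting: slide the ciphertext against itself and count matching bytes; shifts that are multiples of the key length show markedly more matches, because plaintext redundancy lines up under the same key byte. A hedged Python sketch (the plaintext and 6-byte key are arbitrary stand-ins):

```python
plaintext = (b"the quick brown fox jumps over the lazy dog and then "
             b"some longer english text so the byte statistics are "
             b"suitably uneven for the demonstration to work well ") * 8
key = bytes([0x5A, 0xC3, 0x17, 0x88, 0x2E, 0xF1])   # repeating 6-byte key
ct = bytes(b ^ key[i % 6] for i, b in enumerate(plaintext))

def coincidences(data, shift):
    """Count positions where the data matches itself shifted."""
    return sum(a == b for a, b in zip(data, data[shift:]))

# the shift with the most coincidences betrays the repeat period
best = max(range(2, 25), key=lambda s: coincidences(ct, s))
assert best % 6 == 0
```

Once the period is known, each key-byte position can be attacked independently using byte-frequency statistics, which is why the number of repeats dominates the difficulty.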

The basic seeding design is as follows.
rdtsc ; get number
bswap eax ; reverse byte order
mov esi, eax ; write 1st result into esi
cpuid ; serializing instruction, adds delay
pause ; spinlock delay
mov edi, 4 ; load the loop counter
lpst:
rdtsc ; get number
bswap eax ; reverse byte order
xor esi, eax ; xor eax to esi
cpuid ; produce a delay between samples
pause ; spinlock delay
sub edi, 1 ; decrement counter
jnz lpst ; loop back if not zero
The reason for the bswap is to get the fast changing end of the integer returned by rdtsc; the following loop and the delays from cpuid and pause are designed to capitalise on the fact that no two machines have identical timings and/or operations. This is used for multiple reseedings at different parts of the pad generation. Each pass reseeds at randomly different lengths to break up any possible patterns from the 5 individually seeded pseudo random algos.
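For readers without an assembler handy, the idea in the snippet above, byte-swap the counter so the fast-moving bits lead, then fold several delayed samples together with XOR, can be approximated in Python with a high-resolution timer standing in for rdtsc (the sample count of 5 mirrors the loop; the busy delay stands in for cpuid/pause):

```python
import time

def timer_seed(samples=5):
    def one_sample():
        t = time.perf_counter_ns() & 0xFFFFFFFF   # low 32 bits, like rdtsc's eax
        # reverse the byte order, like bswap, so the fast end leads
        return int.from_bytes(t.to_bytes(4, "big"), "little")
    seed = one_sample()
    for _ in range(samples - 1):
        sum(range(1000))          # small busy delay between samples
        seed ^= one_sample()
    return seed

seed = timer_seed()
```

This captures the design, not its strength: timer jitter is far less entropy than it appears, which is part of Tedd's objection.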
The purpose of the additional complexity is to make reconstructing the pad close to impossible. It breaks brute force attacks on the DWORD range because the pad is not created from a single DWORD, and the further complexity is there to defeat partial cracking by multiple passes at the same data followed by correlating the partials.
We agree on your points 1 and 2:
- the random pad is unpredictable, i.e. sufficiently random (ideally, truly random); and
- the pad is at least the length of the data being encrypted (meaning there will be no repeats).
I would add that the pad cannot be reused, but if you create a pad big enough you can keep using different offsets for a very long time with no risk. The main pseudo random generators are a modified shift/xor design that seems to generate reasonable random sequences, and by sequencing the output it passes the tests I have available with ENT. Always the hardest to get right is the CHI SQUARED result.

Example of using CryptGenRandom
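(The attached assembly calling the Win32 API is not reproduced in this thread. As a portable stand-in, the same class of OS CSPRNG is reachable from Python via os.urandom, which on Windows is backed by the system generator that CryptGenRandom also fronts; the file name and chunked-write helper below are illustrative, mirroring rpg's arg1-in-millions convention:)

```python
import os

def write_pad(path, millions):
    """Write `millions` * 1,000,000 CSPRNG bytes to `path` in 1 MB chunks."""
    remaining = millions * 1_000_000
    with open(path, "wb") as f:
        while remaining:
            chunk = min(remaining, 1 << 20)
            f.write(os.urandom(chunk))
            remaining -= chunk

write_pad("randpad.pad", 1)
```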

Seems to work OK and it's fast. CHI SQUARED is borderline.
Entropy = 7.999999 bits per byte.
Optimum compression would reduce the size
of this 268435456 byte file by 0 percent.
Chi square distribution for 268435456 samples is 290.43, and randomly
would exceed this value 10.00 percent of the times.
Arithmetic mean value of data bytes is 127.5026 (127.5 = random).
Monte Carlo value for Pi is 3.141953456 (error 0.01 percent).
Serial correlation coefficient is 0.000014 (totally uncorrelated = 0.0).
Here is the comparison.
K:\basic_code_2\rpg>rpg 268 randpad.pad 2
Pass 1
Pass 2
Writing output file
268000000 bytes written to : randpad.pad
Entropy = 7.999999 bits per byte.
Optimum compression would reduce the size
of this 268000000 byte file by 0 percent.
Chi square distribution for 268000000 samples is 247.74, and randomly
would exceed this value 50.00 percent of the times.
Arithmetic mean value of data bytes is 127.4981 (127.5 = random).
Monte Carlo value for Pi is 3.141929241 (error 0.01 percent).
Serial correlation coefficient is 0.000056 (totally uncorrelated = 0.0).
Yours is a lot faster as it does a lot less work. I am aiming at making the pad non-reproducible.

Here is a direct comparison with RDRAND. It was just a test piece and it is unseeded.
Entropy = 7.999999 bits per byte.
Optimum compression would reduce the size
of this 268000000 byte file by 0 percent.
Chi square distribution for 268000000 samples is 246.24, and randomly
would exceed this value 64.16 percent of the times.
Arithmetic mean value of data bytes is 127.4877 (127.5 = random).
Monte Carlo value for Pi is 3.141803957 (error 0.01 percent).
Serial correlation coefficient is 0.000026 (totally uncorrelated = 0.0).
That's all folks
Press any key to exit ....