# The MASM Forum

## General => The Laboratory => Topic started by: Siekmanski on March 07, 2016, 07:41:33 AM

Title: Fast Bit Reversal algorithm
Post by: Siekmanski on March 07, 2016, 07:41:33 AM
Bit-reversal is needed for a radix 2 FFT so, checked the output again.

When I was looking for repeating patterns in my earlier bit-reversal routine, I noticed this:
I could do bit-reversal without reversing bits.

8:   0  4  2  6                   |   1 5 3  7
16: 0  8  4 12 2 10  6 14  |   1 9 5 13 3 11 7 15
32: 0 16 8 24 4 20 12 28 etc. ( every next row is the next power of two )

Second half = first half + 1 and my first thought was, cool, reduce size by 2 and add 1 to get the rest.
To do it this way, you need a lot of memory read\writes so, not an option.
Looked over it a few times and, EUREKA !

I can create a 4*4 matrix to calculate 32 bit-reversals with 1 index ( 16 + 16+1 )
by transposing 0 8 4 12 ( first quarter of length 16 ) to the first column.
Then fill the rest of the columns for a 512 size ( 16 * 32 values from 16 indices )
And it works, the matrix can be used to calculate all powers of two lengths and writes 4 values at once.

Just by only adding the values with 1 and putting them at the right place via the matrix indices I can get all bit-reversals.

It's easy to create a new matrix from the BitreversalMatrix for all other power of two sizes.
By multiplying ( power of two up ) or dividing the BitreversalMatrix ( power of two down ).
Multiplying is of cource done by shifting ( because we are dealing with power of 2 calculations ).

For example size 1024:
Multiply the Matrix by 2 ( 1024 / 512 ) and add 4 for every next 4*4 Matrix step.
For size 2048 then multiply the Matrix by 4 ( 2048 / 512 ) etc.
Why dividing by 512 ?, because the BitreversalMatrix represents length 512.

Next thing to do:
Initialize the BitreversalMatrix for the size you need.
Put the complete BitreversalMatrix into xmm4 - xmm7.
Copy every next row to xmm3 then shuffle each next index to eax ( no memory reads anymore ).
When you have done all 4 rows, add 4 for to every next 4*4 Matrix step.
So you can keep all code in registers and use the complete data cache ( size is also a power of two ) to do fast bit-reversals.
For example if you have a 32 KB L1 data cache, you can do a 8192 size bit-reversal very fast.

I'll code this soon. Decided to post it at this stage because it is easier to explain the algorithm this way.

Some eaxamples to show the logic of creating the 4*4 matrix:

BitreversalMatrix   dd 00,32,16,48
dd 08,40,24,56
dd 04,36,20,52
dd 12,44,28,60

1024 dd 000, 064, 032, 096   ; First  row = BitreversalMatrix * 2 ( 1024 / 512 = 2 )
dd 016, 080, 048, 112   ; Second row = BitreversalMatrix * 2
dd 008, 072, 040, 104   ; Third  row = BitreversalMatrix * 2
dd 024, 088, 056, 120   ; Fourth row = BitreversalMatrix * 2

Next step refresh the Matrix with these new 16 values:

dd 004, 068, 036, 100   ; First  row + 4
dd 020, 084, 052, 116   ; Second row + 4
dd 012, 076, 044, 108   ; Third  row + 4
dd 028, 092, 060, 124   ; Fourth row + 4

512     ==  BitreversalMatrix

256     dd 00, 16, 08, 24       ; First  row = BitreversalMatrix / 2 ( 256 / 512 = 0.5 )
dd 04, 20, 12, 28       ; Second row = BitreversalMatrix / 2

128     dd 00, 08, 04, 12       ; First  row = BitreversalMatrix / 4 ( 128 / 512 = 0.25 )

64       dd 00, 04, 02, 06       ; First  row = BitreversalMatrix / 8 (  64 / 512 = 0.125 )

32       dd 00, 02, 01, 03       ; First  row = BitreversalMatrix / 16 ( 32 / 512 = 0.0625 )

16  and 8 are less than 32 and are therefore special cases, but can be calculated like this,

lea     esi,BitreversalMatrix
lea     edi,BitreversalTable

movdqa  xmm0,oword ptr [esi]

cmp     BitreversalSize,16
je      size_16
cmp     BitreversalSize,8
jne     done
psrld   xmm0,3
movdqa  oword ptr[edi],xmm0
movdqa  oword ptr[edi+BitreversalSize*2],xmm0
done:
ret
size_16:
psrld   xmm0,2
movdqa  oword ptr[edi],xmm0
movdqa  oword ptr[edi+BitreversalSize*2],xmm0
movdqa  oword ptr[edi+BitreversalSize],xmm0
movdqa  oword ptr[edi+BitreversalSize*2+BitreversalSize],xmm0
ret
Title: Re: Fast Bit Reversal algorithm
Post by: dedndave on March 09, 2016, 11:22:50 PM
you may find this post helpful Marinus

http://www.masmforum.com/board/index.php?topic=12722.msg98224#msg98224 (http://www.masmforum.com/board/index.php?topic=12722.msg98224#msg98224)
Title: Re: Fast Bit Reversal algorithm
Post by: Siekmanski on March 10, 2016, 12:05:31 AM
Hi Dave,

I'll compare the execution speed of those algorithms with mine this weekend.
Title: Re: Fast Bit Reversal algorithm
Post by: dedndave on March 10, 2016, 04:19:38 AM
it wasn't my intention that you should use that code, directly
but, if you use that method with SSE code, it should be quite fast   :t
Title: Re: Fast Bit Reversal algorithm
Post by: Siekmanski on March 10, 2016, 05:36:25 AM
I'm always in the learning mood so, i'll study those examples.   :t
My version does 4 bit-reversals in one go. ( in fact 32 with 1 index pointer )
It doesn't do bit-reversal of single values but creates a table for a power of two size.
For example 128 values from 0 - 127 bit-reversed.
I could use a fast one to construct a time decomposition table for my FFT routine.
Title: Re: Fast Bit Reversal algorithm
Post by: Siekmanski on March 10, 2016, 08:59:39 AM
Hi Dave,

I have studied the bit reversal routines from the thread you posted.
I can not make a fair comparison because you all did a 32 bit bit-reversal of 1 value.

My routine does 128 values and writes them in a memory-buffer.
In my routine it doesn't matter how manny bits you need to reverse.
I still have to implement the Matrix preparation code for every bit-size.
No additional calculations are needed in the main loop, only for matrix shuffles and summations.

The routine in the attachment is fixed for a table of 128 values.
In this example for 128 values it reverses all 7 bits per dword, from value 0 --- > 127

I think it's rather fast including mem-writes.  :biggrin:

Timings on my machine:

Siekmanski bitswap: 82 cycles per 128 values = 0.640625 cycles per value
drizzNew stir-fry bitswap: 3 cycles per value ( the fastest from your posted thread )

Code: [Select]
`Fast Bit-reversal using a 4*4 Matrix by Siekmanski 2016.Please wait 4 seconds before timing begins82 Cycles for 128 values.Bitreversed Table results:000 064 032 096 016 080 048 112 008 072 040 104 024 088 056 120 004 068 036 100020 084 052 116 012 076 044 108 028 092 060 124 002 066 034 098 018 082 050 114010 074 042 106 026 090 058 122 006 070 038 102 022 086 054 118 014 078 046 110030 094 062 126 001 065 033 097 017 081 049 113 009 073 041 105 025 089 057 121005 069 037 101 021 085 053 117 013 077 045 109 029 093 061 125 003 067 035 099019 083 051 115 011 075 043 107 027 091 059 123 007 071 039 103 023 087 055 119015 079 047 111 031 095 063 127Press any key to continue...`
Marinus
Title: Re: Fast Bit Reversal algorithm
Post by: Siekmanski on March 10, 2016, 09:50:38 AM
If have included "drizzNew stir-fry bitswap routine" to create a 128 size table.
Note: his routine includes memory writes now. ( for creating the table )

Code: [Select]
`Fast Bit-reversal using a 4*4 Matrix by Siekmanski 2016.81 Cycles Siekmanski3585 Cycles drizzNew stir-frySiekmanski's table:000 064 032 096 016 080 048 112 008 072 040 104 024 088 056 120 004 068 036 100020 084 052 116 012 076 044 108 028 092 060 124 002 066 034 098 018 082 050 114010 074 042 106 026 090 058 122 006 070 038 102 022 086 054 118 014 078 046 110030 094 062 126 001 065 033 097 017 081 049 113 009 073 041 105 025 089 057 121005 069 037 101 021 085 053 117 013 077 045 109 029 093 061 125 003 067 035 099019 083 051 115 011 075 043 107 027 091 059 123 007 071 039 103 023 087 055 119015 079 047 111 031 095 063 127Drizznew's table:000 064 032 096 016 080 048 112 008 072 040 104 024 088 056 120 004 068 036 100020 084 052 116 012 076 044 108 028 092 060 124 002 066 034 098 018 082 050 114010 074 042 106 026 090 058 122 006 070 038 102 022 086 054 118 014 078 046 110030 094 062 126 001 065 033 097 017 081 049 113 009 073 041 105 025 089 057 121005 069 037 101 021 085 053 117 013 077 045 109 029 093 061 125 003 067 035 099019 083 051 115 011 075 043 107 027 091 059 123 007 071 039 103 023 087 055 119015 079 047 111 031 095 063 127Press any key to continue...`
Title: Re: Fast Bit Reversal algorithm
Post by: Siekmanski on March 10, 2016, 07:33:03 PM
Hi ognil,
Replaced it with the faster drizzNew stir-fry  :t
It is still not a fair comparison, because drizznew's routine is optimised for one 32 bit reversal.

Code: [Select]
`Fast Bit-reversal using a 4*4 Matrix by Siekmanski 2016.76 Cycles Siekmanski2765 Cycles drizzNew stir-frySiekmanski's table:000 064 032 096 016 080 048 112 008 072 040 104 024 088 056 120 004 068 036 100020 084 052 116 012 076 044 108 028 092 060 124 002 066 034 098 018 082 050 114010 074 042 106 026 090 058 122 006 070 038 102 022 086 054 118 014 078 046 110030 094 062 126 001 065 033 097 017 081 049 113 009 073 041 105 025 089 057 121005 069 037 101 021 085 053 117 013 077 045 109 029 093 061 125 003 067 035 099019 083 051 115 011 075 043 107 027 091 059 123 007 071 039 103 023 087 055 119015 079 047 111 031 095 063 127Drizznew's table:000 064 032 096 016 080 048 112 008 072 040 104 024 088 056 120 004 068 036 100020 084 052 116 012 076 044 108 028 092 060 124 002 066 034 098 018 082 050 114010 074 042 106 026 090 058 122 006 070 038 102 022 086 054 118 014 078 046 110030 094 062 126 001 065 033 097 017 081 049 113 009 073 041 105 025 089 057 121005 069 037 101 021 085 053 117 013 077 045 109 029 093 061 125 003 067 035 099019 083 051 115 011 075 043 107 027 091 059 123 007 071 039 103 023 087 055 119015 079 047 111 031 095 063 127Press any key to continue...`
Title: Re: Fast Bit Reversal algorithm
Post by: dedndave on March 14, 2016, 04:24:50 AM
you missed my intention
examine the stir-fry code - see how it works
then, write SSE code that uses a similar technique   :P
Title: Re: Fast Bit Reversal algorithm
Post by: Siekmanski on March 14, 2016, 09:18:49 AM
Hi Dave,

I did understand your intention for me to have a look at Drizz's stir-fry method.  :t
I have studied it, and it can be done with that very clever technique from Drizz in SSE code to calculate 4 DWORD values in bit-reversed order at once.
For that case the technique of Drizz is unbeatable.

But, my goal was to get an array of all the values from 0 to 255 in a bit-reversed order ( or another power of two length array ).

I found a way to calculate a bit-reversed array without the need of bit mangling or bit manipulation.  :biggrin:

It skips all the calculations needed to reverse the bits of each value.
Because of skipping those calculations it has become a very fast routine which exactly fits my need of a fast calculated bit-reversed array.
In this case it is much faster then using Drizz's stir-fry method in SSE code.

The only math in my routine is, start with the first values from the matrix, then increase those values with 1 to get the next values and place them at the right position in the array and repeat this untill all 256 values are done.
It only needs 256/32 steps to get all the 256 DWORD values in bit-reversed order.
With 1 index pointer from the matrix it puts 8 * 4 values at once at the right position in the array.

Note: This algorithm only works for power of two length arrays !
Title: Re: Fast Bit Reversal algorithm
Post by: Raistlin on March 14, 2016, 04:45:02 PM
Code: [Select]
`Note: This algorithm only works for power of two length arrays !`
Which makes perfect sense as data runs are typically 512, 1024, 2048, 4096 and at the extreme 32768 in length
per FFT sub-set. (I have seen 65536 and larger runs in hardware based systems) As far as I understand it the bit-reversal is
calculated once and only once. Then dumped as a lookup table. The data is then dispersed per this lookup-table. The result is fed into
the actual FFT algorithm. Normally a filter or transform is then applied. The result is then reversed (iFFT) and a new
waveform generated that thus incorporates such filters. Power of two use; as per Siekmanski seems intuitive and well conceived.
Please correct me, if I left something major out -  I'am only a novice at this myself.

refs see:
http://www.fftw.org/accuracy/Pentium4-3.60GHz-icc/ (Shows typical data bin sizes for powers of two)
http://www.originlab.com/doc/Origin-Help/FFT-Filter (Filters - general)
http://retouchpro.com/tutorials/?m=show&id=185 (Filters - photographic enhancement)
http://stackoverflow.com/questions/2929401/dsp-filtering-in-the-frequency-domain-via-fft  (Filters - audio)
http://www3.mpifr-bonn.mpg.de/staff/bklein/FFTS/URSI-FFTS.pdf  (hardware based FFT)

Title: Re: Fast Bit Reversal algorithm
Post by: hutch-- on March 14, 2016, 05:56:32 PM
i don't claim to have absorbed all of the code present but a 256 item table is one idea I would explore. It means for any give 8 bits you have a 1 memory access to get its inversion. Does this make sense ?
Title: Re: Fast Bit Reversal algorithm
Post by: K_F on March 14, 2016, 09:53:11 PM
Sorry Hutch, I meant the duplicate entry... I had two duplicate posts appear ?

If you're creating a single filter to work on many times a single bit reversal would do.
But if you're creating many filters (say a running filter) you have the problem of having to bit reverse every filter.

I've got around this by making an offset address table with 2ExpN sizes and bit reversed this table.
The filter calculations then use this table as an index into the untouched/original filter for it's operations.
On changing the filter, you can just reload a saved filter or make a new one without having to bit reverse anything.

This is what I done so far.. I'll clean it up later.
:)
Title: Re: Fast Bit Reversal algorithm
Post by: Siekmanski on March 15, 2016, 12:20:36 AM
Hi guys,

We had this discussion before in the fft thread, do we realy need a fast bit-reversal routine for the fft routine ?
Not realy if you are repeating the same "small" fft length over and over again.
But if you need very large fft's it can take some time to create a bit-reversal table.
Especially if you need many fft's of differrent sizes.

My routine doesn't need a lookup table or do actual bit-reversal for any power of 2 size.
In fact it only needs these 4 numbers ( 00, 02, 01, 03 ) to create a matrix on the fly in code for all power of 2 sizes.
It shifts the 4 numbers into the right values for the size you want and transposes the first row to the first collumn and calculates the stride in that column needed for the places in the table.

The only thing it has to do is setting up the matrix values for the power of 2 size you want.
Then it processes the first row of the matrix then the second row and then updates the first and second row to row 3 and 4 if larger sizes are needed etc.

As soon as I have the time to code it, I will post a complete routine which does it all.
Just posted it at this stage to explain how the algoritm works with some fixed examples.

This technique ( with a bit of tweaking ) can be used to create the cos and sin tables.
To create a cos table and a sin table for a 1024 size fft, you only need to calculate 255 sines to construct both the cos and sin tables.
Title: Re: Fast Bit Reversal algorithm
Post by: K_F on March 15, 2016, 03:01:41 AM
I'm always curious... when you've finished  :biggrin:
Title: Re: Fast Bit Reversal algorithm
Post by: Siekmanski on March 26, 2016, 08:26:45 PM
I finished a routine that calculates 16 to 1024 size tables.
The algorithm works but I don't have enough registers to calculate larger then 1024 size tables. ( at least I think so.... )
I may need to store temporary pre calculated column values inside the table memory to calculate larger power of 2 size tables.

After all the goal is, to use this algorithm to calculate very large tables as fast as possible.

Edit: see Reply #26 for the source code
Title: Re: Fast Bit Reversal algorithm
Post by: nidud on March 26, 2016, 11:04:57 PM
It crashes if I run it from the console, but for some reason seems to work inside Olly.
Code: [Select]
`CPU DisasmAddress   Hex dump          Command                                  Comments00401154  |.  33C0          XOR EAX,EAX00401156  |.  BB 04000000   MOV EBX,40040115B  |.  BA 04000000   MOV EDX,400401160  |.  B9 20000000   MOV ECX,2000401165  |.  8DA424 000000 LEA ESP,[LOCAL.1]0040116C  |.  8D6424 00     LEA ESP,[LOCAL.1]00401170  |>  660F7F0487    /MOVDQA DQWORD PTR DS:[EAX*4+EDI],XMM0   ; FLOAT 1.697596632914509e-311, 1.154365710300918e-31100401175  |.  660F7F8C87 00 |MOVDQA DQWORD PTR DS:[EAX*4+EDI+800],XM0040117E  |.  660FFEC2      |PADDD XMM0,XMM200401182  |.  660FFECA      |PADDD XMM1,XMM200401186  |.  660F7F8487 00 |MOVDQA DQWORD PTR DS:[EAX*4+EDI+400],XM0040118F  |.  660F7F8C87 00 |MOVDQA DQWORD PTR DS:[EAX*4+EDI+0C00],XCPU - main thread, module FastBitReversalEAX 00150788ECX 0000001CEDX 00000003EBX 00000004ESP 0012FFB8EBP 0012FFF0ESI 01D18756EDI 00403080 FastBitReversal.00403080EIP 00401170 FastBitReversal.00401170C 0  ES 0023 32bit 0(FFFFFFFF)P 0  CS 001B 32bit 0(FFFFFFFF)A 0  SS 0023 32bit 0(FFFFFFFF)Z 0  DS 0023 32bit 0(FFFFFFFF)S 0  FS 003B 32bit 7FFDF000(FFF)T 0  GS 0000 NULLD 0O 0  LastErr 00000000 ERROR_SUCCESSEFL 00010202 (NO,NB,NE,A,NS,PO,GE,G)ST0 empty +UNORM 0061 00730072 00000001ST1 empty -UNORM B411 7C90D80A 00650078ST2 empty +UNORM 0024 0013F550 00000017ST3 empty -UNORM B461 001599C8 00000001ST4 empty +UNORM 0305 02020202 03030000ST5 empty 0.0000000000000028100e-4933ST6 empty 0.0000000000050182150e-4933ST7 empty 0.0               3 2 1 0      E S P U O Z D IFST 0000  Cond 0 0 0 0  Err 0 0 0 0 0 0 0 0 (GT)FCW 027F  Prec NEAR,53  Mask    1 1 1 1 1 1Last cmnd 001B:0288EBBDXMM0 00000320 00000120 00000220 00000020XMM1 00000321 00000121 00000221 00000021XMM2 00000002 00000002 00000002 00000002XMM3 00158B40 00158B20 7C9100AD 00150788XMM4 00158AD0 00158AF0 7C91005D 00150778XMM5 00000000 00000060 00000020 00000040XMM6 00000060 00000020 00000040 00000000XMM7 7C913F92 7C980620 7C910460 7C90E920                                P U O Z D IMXCSR 00001F80  FZ 0 DZ 0  Err  0 0 0 0 0 0                Rnd NEAR   Mask 1 1 1 1 1 1`
Title: Re: Fast Bit Reversal algorithm
Post by: jj2007 on March 27, 2016, 06:34:16 AM
Works fine for me on Win7-64, even if launched directly from the archive.
Title: Re: Fast Bit Reversal algorithm
Post by: Siekmanski on March 27, 2016, 03:04:30 PM
Hi Nidud,
Can't find what could cause this....
Title: Re: Fast Bit Reversal algorithm
Post by: jj2007 on March 27, 2016, 07:05:29 PM
Could be unaligned access. @Nidud: If you configure Olly as just-in-time debugger, where does it crash?7
->Options/debugging/just-in-time
Title: Re: Fast Bit Reversal algorithm
Post by: Siekmanski on March 27, 2016, 08:39:55 PM
Memory access is 16 byte aligned.
Title: Re: Fast Bit Reversal algorithm
Post by: jj2007 on March 27, 2016, 09:03:07 PM
In fact, I can't produce a crash, whatever I try ::)
Title: Re: Fast Bit Reversal algorithm
Post by: nidud on March 27, 2016, 10:16:05 PM
If you configure Olly as just-in-time debugger, where does it crash?

The dump shows where the crash is: 00401170  |>
EAX = 00150788 --> DS:[EAX*4+EDI]

It could be a hardware problem. The CPU is <= SSE3.
Title: Re: Fast Bit Reversal algorithm
Post by: jj2007 on March 27, 2016, 10:46:01 PM
It could be a hardware problem. The CPU is <= SSE3.

movdqa is SSE2 (https://msdn.microsoft.com/en-us/library/bytwczae%28v=vs.90%29.aspx). Besides, Olly is powerful but it can't upgrade the hardware :icon_mrgreen:
Title: Re: Fast Bit Reversal algorithm
Post by: nidud on March 27, 2016, 11:42:46 PM
It could be a hardware problem. The CPU is <= SSE3.

movdqa is SSE2 (https://msdn.microsoft.com/en-us/library/bytwczae%28v=vs.90%29.aspx). Besides, Olly is powerful but it can't upgrade the hardware :icon_mrgreen:

:biggrin:

I didn't look into the up-codes used but...

When stepping through the code using MS, EAX (from XMM3) is 40/20/60. Then EBX becomes 0 and EDX 3, so it jumps to NextColumn. At this point XMM4 is undefined.

Code: [Select]
`    dec     ebx    jnz     NextSwap ; Check if we need the next row ?    mov     ebx,4 ; Row index counter    dec     edx    jnz     NextColumn ; Check if we need the next row ?    mov     edx,4 ; Column index counter    movdqa  xmm4,xmm5 ; Get Matrix constant    pshufd  xmm4,xmm4,Shuffle(2,2,2,2)    psrld   xmm4,4 ; Calculate Matrix stepNextColumn:    movdqa  xmm3,xmm5 ; First row indices for matrix calculations    pshufd  xmm5,xmm5,Shuffle(0,3,2,1) ; Prepare next column value    pshufd  xmm3,xmm3,Shuffle(1,1,1,1) ; Get second value and transpose to next column ( incl. adding values for next row )    psrld   xmm3,2 ; Calculate Matrix row step    paddd   xmm3,xmm4    paddd   xmm3,xmm6 ; New row indices for Table positions    jmp     NextIndexNextSwap:    pshufd  xmm3,xmm3,Shuffle(0,3,2,1) ; Get next index from the rowNextIndex:    movd    eax,xmm3 `
As a result EAX is now 7C910232, so I guess it depend on the value of XMM4 in the paddd xmm3,xmm4 above.
Title: Re: Fast Bit Reversal algorithm
Post by: nidud on March 27, 2016, 11:46:13 PM
So, this works:
Code: [Select]
`    pxor    xmm4,xmm4    call    FastBitReversal16_1024`
Title: Re: Fast Bit Reversal algorithm
Post by: Siekmanski on March 28, 2016, 02:26:01 AM
Nidud.

Thanks, for finding this bug, xmm4 is indeed undefined when it jumps to NextColumn.  :t
Title: Re: Fast Bit Reversal algorithm
Post by: jj2007 on May 28, 2016, 07:40:57 PM
Here is a slow but easy-to-use MirrorBits() macro ruthlessly stolen from The Aggregate at University of Kentucky (http://aggregate.org/MAGIC/#Bit%20Reversal) :biggrin:

Code: [Select]
`include \masm32\MasmBasic\MasmBasic.incMirrorBits MACRO arg32  ifdifi <arg32>, <eax> mov eax, arg32  endif  mov edx, eax  and eax, 0aaaaaaaah  and edx, 055555555h  shr eax, 1  add edx, edx ; shl edx,1  or eax, edx  mov edx, eax  and eax, 0cccccccch  and edx, 033333333h  shr eax, 2  lea edx, [4*edx] ; shl edx, 2  or eax, edx  mov edx, eax  and eax, 0f0f0f0f0h  and edx, 0f0f0f0fh  shr eax, 4  shl edx, 4  or eax, edx  mov edx, eax  and eax, 0ff00ff00h  and edx, 0ff00ffh  shr eax, 8  shl edx, 8  or eax, edx  mov edx, eax  shr eax, 16  shl edx, 16  or eax, edx  EXITM <eax>ENDM  Init  push 29  .Repeat mov ecx, Rand(-1) Print Bin\$(ecx), " " Print Bin\$(MirrorBits(ecx)), CrLf\$ dec stack  .Until Sign?EndOfCode`
Code: [Select]
`01110011101010011011100010001101 1011000100011101100101011100111000010010011100011100110110011101 1011100110110011100011100100100000100010111111101000101011011000 0001101101010001011111110100010011011101000111011011100110100010 0100010110011101101110001011101110100111111001110100000001001101 1011001000000010111001111110010111010010000110000100011110110011 1100110111100010000110000100101100111000011011011111010011000100 0010001100101111101101100001110011001001100100010111001100000000 0000000011001110100010011001001100000101111100101101011011111111 1111111101101011010011111010000010000100010101101000011100001011 1101000011100001011010100010000110010000000000000001001101011000 0001101011001000000000000000100101011101011001101111011010001110 0111000101101111011001101011101010010011101011101110101110101000 0001010111010111011101011100100110101101000101100011100110111011 1101110110011100011010001011010101000000010110111111100100100000 0000010010011111110110100000001000100101011010000101111100010001 1000100011111010000101101010010010010110110101010011100001110000 0000111000011100101010110110100101110101111111110011100000101011 1101010000011100111111111010111010110000001000010110001101110101 1010111011000110100001000000110111111010101001110110111110011110 0111100111110110111001010101111110100011100111111110111110100101 1010010111110111111110011100010100101010000111001010100011010010 0100101100010101001110000101010011010111001111101100100000011010 0101100000010011011111001110101100011111101110101101101000110110 0110110001011011010111011111100000111011101111100100110101000001 1000001010110010011111011101110011000110101010111001011101011011 1101101011101001110101010110001111100000111010001101111101101111 1111011011111011000101110000011111110100001001100111100101101011 1101011010011110011001000010111111110000110000101110100111011111 1111101110010111010000110000111101100100111110010100110110001110 01110001101100101001111100100110`