Author Topic: Bits Scaling to 2  (Read 12651 times)

Antariy

  • Member
  • ****
  • Posts: 541
Bits Scaling to 2
« on: January 06, 2013, 12:25:19 PM »
Test for some variations of the proc to scale bits twice (double "width"). I.e., for instance, if we have the binary number: 1010Y then after processing it will be 11001100Y

Interesting to see how it will perform on different machines. The test processing full width - 16 bits to 32 bits, i.e. 16 iterations through the loop.

Code: [Select]
Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
171     cycles for 1
205     cycles for 2 (dec ecx)
173     cycles for 3 (lea edi)
174     cycles for 4 (sbb edx)
169     cycles for 1
202     cycles for 2 (dec ecx)
175     cycles for 3 (lea edi)
176     cycles for 4 (sbb edx)
170     cycles for 1
203     cycles for 2 (dec ecx)
174     cycles for 3 (lea edi)
175     cycles for 4 (sbb edx)

--- ok ---


Also, please note on how the testbed is implemented: this is a kind of a suggestion. Each tested code piece was put in different section of the code, i.e., it starts at new page boundary (similar to align 1000h) and this should help to eliminate the influence of the code layout and placement.
Commented out buffer allocation and walking through all the bytes of the buffer at the start of the test - just to be used in the tests which require data processing.

dedndave

  • Member
  • *****
  • Posts: 8752
  • Still using Abacus 2.0
    • DednDave
Re: Bits Scaling to 2
« Reply #1 on: January 06, 2013, 12:56:01 PM »
prescott w/htt
Code: [Select]
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
165     cycles for 1
197     cycles for 2 (dec ecx)
170     cycles for 3 (lea edi)
168     cycles for 4 (sbb edx)
163     cycles for 1
191     cycles for 2 (dec ecx)
168     cycles for 3 (lea edi)
168     cycles for 4 (sbb edx)
166     cycles for 1
199     cycles for 2 (dec ecx)
167     cycles for 3 (lea edi)
169     cycles for 4 (sbb edx)

Antariy

  • Member
  • ****
  • Posts: 541
Re: Bits Scaling to 2
« Reply #2 on: January 06, 2013, 01:09:54 PM »
Hi, Dave :t

It is interesting that the second variation of the code only differs by counter decrementation instruction (ADD ECX,-1 / DEC ECX), and overall it slower ~2 cycles per iteration.

sinsi

  • Member
  • *****
  • Posts: 1007
Re: Bits Scaling to 2
« Reply #3 on: January 06, 2013, 01:26:12 PM »
AMD Phenom(tm) II X6 1100T Processor (SSE3)
88      cycles for 1
77      cycles for 2 (dec ecx)
87      cycles for 3 (lea edi)
89      cycles for 4 (sbb edx)
86      cycles for 1
76      cycles for 2 (dec ecx)
86      cycles for 3 (lea edi)
89      cycles for 4 (sbb edx)
87      cycles for 1
76      cycles for 2 (dec ecx)
92      cycles for 3 (lea edi)
88      cycles for 4 (sbb edx)
I can walk on water but stagger on beer.

Antariy

  • Member
  • ****
  • Posts: 541
Re: Bits Scaling to 2
« Reply #4 on: January 06, 2013, 01:51:05 PM »
Hi, sinsi :t

But AMD favors the things that Intel dislike, as usually :biggrin:

dedndave

  • Member
  • *****
  • Posts: 8752
  • Still using Abacus 2.0
    • DednDave
Re: Bits Scaling to 2
« Reply #5 on: January 06, 2013, 01:58:18 PM »
search the old forum for the words "stir fry"   :biggrin:
there was a routine in there for reversing the bits in a register
the method used may be applied, here
very cool algo

dedndave

  • Member
  • *****
  • Posts: 8752
  • Still using Abacus 2.0
    • DednDave
Re: Bits Scaling to 2
« Reply #6 on: January 06, 2013, 02:07:04 PM »
here you go, Alex
if you can understand how that algorithm works, i believe you can apply it to what you are doing   :t

http://www.masmforum.com/board/index.php?topic=12722.msg98224#msg98224

Antariy

  • Member
  • ****
  • Posts: 541
Re: Bits Scaling to 2
« Reply #7 on: January 06, 2013, 02:08:59 PM »
No, that's not reversion of the bits, but just making them "twice wide" (i.e. 0101 becomes 00110011, 1101 becomes 11110011 etc).

But it is interesting to see that algo, thank you, Dave :t

dedndave

  • Member
  • *****
  • Posts: 8752
  • Still using Abacus 2.0
    • DednDave
Re: Bits Scaling to 2
« Reply #8 on: January 06, 2013, 02:24:54 PM »
i understand that
still, i think the algo could be modified to "double" the bits

sinsi

  • Member
  • *****
  • Posts: 1007
Re: Bits Scaling to 2
« Reply #9 on: January 06, 2013, 06:35:09 PM »
What number of bits are we looking at? Maximum of 16 I guess but any number? Like 3 bits or 11?
I can walk on water but stagger on beer.

Gunther

  • Member
  • *****
  • Posts: 3518
  • Forgive your enemies, but never forget their names
Re: Bits Scaling to 2
« Reply #10 on: January 06, 2013, 11:01:50 PM »
Hi Alex,

here is my result:

Code: [Select]
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
102 cycles for 1
95 cycles for 2 (dec ecx)
130 cycles for 3 (lea edi)
100 cycles for 4 (sbb edx)
129 cycles for 1
100 cycles for 2 (dec ecx)
130 cycles for 3 (lea edi)
129 cycles for 4 (sbb edx)
128 cycles for 1
95 cycles for 2 (dec ecx)
100 cycles for 3 (lea edi)
128 cycles for 4 (sbb edx)

--- ok ---

Gunther
Get your facts first, and then you can distort them.

FORTRANS

  • Member
  • ****
  • Posts: 946
Re: Bits Scaling to 2
« Reply #11 on: January 07, 2013, 12:36:21 AM »

pre-P4 (SSE1)
129     cycles for 1
126     cycles for 2 (dec ecx)
137     cycles for 3 (lea edi)
128     cycles for 4 (sbb edx)
123     cycles for 1
124     cycles for 2 (dec ecx)
134     cycles for 3 (lea edi)
125     cycles for 4 (sbb edx)
122     cycles for 1
124     cycles for 2 (dec ecx)
134     cycles for 3 (lea edi)
125     cycles for 4 (sbb edx)

--- ok ---

six_L

  • Member
  • **
  • Posts: 95
Re: Bits Scaling to 2
« Reply #12 on: January 07, 2013, 12:45:21 AM »
Quote
Intel(R) Core(TM) i3 CPU       M 370  @ 2.40GHz (SSE4)
76   cycles for 1
77   cycles for 2 (dec ecx)
75   cycles for 3 (lea edi)
75   cycles for 4 (sbb edx)
74   cycles for 1
76   cycles for 2 (dec ecx)
76   cycles for 3 (lea edi)
76   cycles for 4 (sbb edx)
74   cycles for 1
73   cycles for 2 (dec ecx)
75   cycles for 3 (lea edi)
75   cycles for 4 (sbb edx)

--- ok ---


dedndave

  • Member
  • *****
  • Posts: 8752
  • Still using Abacus 2.0
    • DednDave
Re: Bits Scaling to 2
« Reply #13 on: January 07, 2013, 01:26:02 AM »
sinsi - my understanding is that a word gets "doubled" into a dword
some kind of exponentiation, actually

i can't think of when or why, but i have had need to do something like this in the past - lol
i think it can be done much faster, though   :P
a 256-byte LUT would be some improvement
but, i think the "Sexy Stir-Fry" technique could be employed
similar to how this bit-reversing algo works
Code: [Select]
;------------------------------------------------------------

        OPTION  PROLOGUE:None
        OPTION  EPILOGUE:None

BitReverse PROC dwVal:DWORD

;Dword Bit-Swap Algorithm
;Sexy Stir-Fry Technique
;
;The original concept was developed by BitRake and Nexo.
;This version was updated by Drizz and Lingo.

        pop     ecx
        mov     edx,0F0F0F0Fh
        pop     eax
        and     edx,eax
        and     eax,0F0F0F0F0h
        shl     edx,4
        shr     eax,4
        or      eax,edx
        mov     edx,33333333h
        and     edx,eax
        and     eax,0CCCCCCCCh
        shr     eax,2
        lea     eax,[edx*4+eax]
        mov     edx,55555555h
        and     edx,eax
        and     eax,0AAAAAAAAh
        shr     eax,1
        lea     eax,[edx*2+eax]
        bswap   eax
        jmp     ecx

BitReverse ENDP

        OPTION  PROLOGUE:PrologueDef
        OPTION  EPILOGUE:EpilogueDef

;------------------------------------------------------------

Antariy

  • Member
  • ****
  • Posts: 541
Re: Bits Scaling to 2
« Reply #14 on: January 07, 2013, 03:22:41 AM »
Hi, Dave :biggrin:

Well, the bits swapping algo based on shifts to shuffle the bits, but it cannot be applied here. The only "similar" part is that both algos use shifting instructions (SHR/SHL/LEA) to get job done, but here we need literally to scale bit map, not to reverse it, i.e., when you reversing bits you are working with fixed pattern (map) of bits and "just" shifting and combining it, in scaling here you should duplicate the bit map, but with higher "resolution" - probably that's most suitable description. Like if you resizing 1bpp bitmap :biggrin:

Hi, sinsi, yes, any 16 bit (or less) number, which will be scaled (up to 32 bits for 16 bit number).