Author Topic: Bits Scaling to 2  (Read 11729 times)

dedndave

  • Member
  • *****
  • Posts: 8734
  • Still using Abacus 2.0
    • DednDave
Re: Bits Scaling to 2
« Reply #30 on: January 07, 2013, 11:50:36 AM »
wow - didn't think it was that difficult
if that's too much for you, i suggest you stay with console apps   :lol:

but - you have options
for example, you could just make the table in the .DATA section - it's ony 512 bytes in size

another option is to compare the value of the last byte when you call the proc
if it isn't FFh, you call the routine to build the table
that way, it gets built automatically the first time you call it

now - as for 3 or 4 or more bits....
you never said anything about that   ;)
we can't read your mind - lol

Antariy

  • Member
  • ****
  • Posts: 541
Re: Bits Scaling to 2
« Reply #31 on: January 07, 2013, 12:37:11 PM »
another option is to compare the value of the last byte when you call the proc
if it isn't FFh, you call the routine to build the table
that way, it gets built automatically the first time you call it

now - as for 3 or 4 or more bits....
you never said anything about that   ;)
we can't read your mind - lol

Why to check lower byte? That's not the question of how big is table or index, the table just should be a "resized copy" of a smaller 256 byte time. I.e., most space in it will be just a dummy bytes. Example:

original table:

AA BB

CC DD

the processed table

AA 00 BB 00

CC 00 DD 00



But this is example, real table (not the topic algo) will contain over 65000 dummy bytes and only 256 bytes of the original, but "bit map resized" data.
It is required to avoid any index-fixing computations at runtime.

As for number of bits, yes, you are right, that was not said, but the algo contains a bit count parameter (not scalable algo yet, though), it is not unrolled at all etc what means the speed is not main concern, and the test was merely for seeing how such type of the code runs on dfferent CPUs, for asking a new mathematical ways to do this thing, and to conversate with you all (that's not joke) :P

dedndave

  • Member
  • *****
  • Posts: 8734
  • Still using Abacus 2.0
    • DednDave
Re: Bits Scaling to 2
« Reply #32 on: January 07, 2013, 12:57:18 PM »
what i meant was something like this...

Code: [Select]
        OPTION  PROLOGUE:None
        OPTION  EPILOGUE:None

BitDbl  PROC    dwVal:DWORD

        cmp byte ptr waBdLut+511,0FFh
        jz      BitDb0

        call    InitBdLut

BitDb0: movzx   eax,byte ptr [esp+5]
        mov     ecx,offset waBdLut
        dec     eax
        movzx   edx,byte ptr [esp+4]
        mov     eax,[ecx+2*eax]
        mov     ax,[ecx+2*edx]
        ret     4

BitDbl  ENDP

        OPTION  PROLOGUE:PrologueDef
        OPTION  EPILOGUE:EpilogueDef

that will add a few clock cycles
better to just define the table in .DATA   :P

but, if you want to handle a variety of bit-widths, you need some other way

sinsi

  • Member
  • ****
  • Posts: 996
Re: Bits Scaling to 2
« Reply #33 on: January 07, 2013, 05:33:14 PM »
This gives me 57 cycles for 16 bits
Code: [Select]
    push ebx
    push esi
    sub eax,eax
    mov edx,[esp+12]
    mov ecx,[esp+16]
    mov ebx,11b
@@: sub esi,esi
    bt edx,0
    cmovc esi,ebx
    shr edx,1
    or eax,esi
    shl ebx,2
    sub ecx,1
    jnz @b
    pop esi
    pop ebx
    ret 8
I can walk on water but stagger on beer.

Antariy

  • Member
  • ****
  • Posts: 541
Re: Bits Scaling to 2
« Reply #34 on: January 09, 2013, 04:36:06 AM »
Hi sinsi :t

I have incorporated your code as well as added (probably crude) "universal" version to scale number of 2...16 bits by scale factor 16...2 (respectively).

Code: [Select]
Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
165     cycles for 1
198     cycles for 1.2 (dec ecx)
74      cycles for 2 (universal scaling proc) CMOV
108     cycles for 2.1 (universal scaling proc) i386
100     cycles for sinsi's
76      cycles for sinsi's mod
76      cycles for sinsi's mod 2
84      cycles for sinsi's mod 3 (AMD)
98      cycles for sinsi's mod 4 (AMD)
165     cycles for 1
200     cycles for 1.2 (dec ecx)
73      cycles for 2 (universal scaling proc) CMOV
109     cycles for 2.1 (universal scaling proc) i386
99      cycles for sinsi's
76      cycles for sinsi's mod
76      cycles for sinsi's mod 2
82      cycles for sinsi's mod 3 (AMD)
81      cycles for sinsi's mod 4 (AMD)
164     cycles for 1
197     cycles for 1.2 (dec ecx)
74      cycles for 2 (universal scaling proc) CMOV
108     cycles for 2.1 (universal scaling proc) i386
100     cycles for sinsi's
76      cycles for sinsi's mod
76      cycles for sinsi's mod 2
82      cycles for sinsi's mod 3 (AMD)
81      cycles for sinsi's mod 4 (AMD)

--- ok ---

dedndave

  • Member
  • *****
  • Posts: 8734
  • Still using Abacus 2.0
    • DednDave
Re: Bits Scaling to 2
« Reply #35 on: January 09, 2013, 08:43:52 AM »
prescott w/htt
Code: [Select]
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
167     cycles for 1
197     cycles for 1.2 (dec ecx)
79      cycles for 2 (universal scaling proc) CMOV
108     cycles for 2.1 (universal scaling proc) i386
100     cycles for sinsi's
76      cycles for sinsi's mod
76      cycles for sinsi's mod 2
82      cycles for sinsi's mod 3 (AMD)
81      cycles for sinsi's mod 4 (AMD)
167     cycles for 1
197     cycles for 1.2 (dec ecx)
76      cycles for 2 (universal scaling proc) CMOV
108     cycles for 2.1 (universal scaling proc) i386
99      cycles for sinsi's
76      cycles for sinsi's mod
76      cycles for sinsi's mod 2
82      cycles for sinsi's mod 3 (AMD)
85      cycles for sinsi's mod 4 (AMD)
167     cycles for 1
197     cycles for 1.2 (dec ecx)
75      cycles for 2 (universal scaling proc) CMOV
108     cycles for 2.1 (universal scaling proc) i386
100     cycles for sinsi's
76      cycles for sinsi's mod
76      cycles for sinsi's mod 2
82      cycles for sinsi's mod 3 (AMD)
81      cycles for sinsi's mod 4 (AMD)

Gunther

  • Member
  • *****
  • Posts: 3515
  • Forgive your enemies, but never forget their names
Re: Bits Scaling to 2
« Reply #36 on: January 09, 2013, 09:12:57 AM »
Hi Alex,

here the results:

Code: [Select]
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
129 cycles for 1
100 cycles for 1.2 (dec ecx)
86 cycles for 2 (universal scaling proc) CMOV
102 cycles for 2.1 (universal scaling proc) i386
90 cycles for sinsi's
88 cycles for sinsi's mod
98 cycles for sinsi's mod 2
79 cycles for sinsi's mod 3 (AMD)
78 cycles for sinsi's mod 4 (AMD)
100 cycles for 1
129 cycles for 1.2 (dec ecx)
93 cycles for 2 (universal scaling proc) CMOV
84 cycles for 2.1 (universal scaling proc) i386
101 cycles for sinsi's
98 cycles for sinsi's mod
84 cycles for sinsi's mod 2
77 cycles for sinsi's mod 3 (AMD)
80 cycles for sinsi's mod 4 (AMD)
126 cycles for 1
95 cycles for 1.2 (dec ecx)
87 cycles for 2 (universal scaling proc) CMOV
102 cycles for 2.1 (universal scaling proc) i386
83 cycles for sinsi's
83 cycles for sinsi's mod
98 cycles for sinsi's mod 2
49 cycles for sinsi's mod 3 (AMD)
82 cycles for sinsi's mod 4 (AMD)

--- ok ---

Gunther
Get your facts first, and then you can distort them.

Antariy

  • Member
  • ****
  • Posts: 541
Re: Bits Scaling to 2
« Reply #37 on: January 09, 2013, 09:25:59 AM »
Hi Dave and Gunther :biggrin:
Thank you for the tests, I was posting in Gunther's thread and at that moment not noticed this topic updated.

jj2007

  • Member
  • *****
  • Posts: 7542
  • Assembler is fun ;-)
    • MasmBasic
Re: Bits Scaling to 2
« Reply #38 on: January 09, 2013, 09:29:15 AM »
One more :biggrin:
Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
104     cycles for 1
122     cycles for 1.2 (dec ecx)
63      cycles for 2 (universal scaling proc) CMOV
80      cycles for 2.1 (universal scaling proc) i386
87      cycles for sinsi's
89      cycles for sinsi's mod
74      cycles for sinsi's mod 2
100     cycles for sinsi's mod 3 (AMD)
99      cycles for sinsi's mod 4 (AMD)

sinsi

  • Member
  • ****
  • Posts: 996
Re: Bits Scaling to 2
« Reply #39 on: January 09, 2013, 08:01:27 PM »
Funny how the (AMD) ones are slowest on the AMD  :lol:

AMD Phenom(tm) II X6 1100T Processor (SSE3)
87      cycles for 1
78      cycles for 1.2 (dec ecx)
43      cycles for 2 (universal scaling proc) CMOV
46      cycles for 2.1 (universal scaling proc) i386
57      cycles for sinsi's
57      cycles for sinsi's mod
56      cycles for sinsi's mod 2
57      cycles for sinsi's mod 3 (AMD)
56      cycles for sinsi's mod 4 (AMD)

Intel(R) Core(TM) i3 CPU       M 380  @ 2.53GHz (SSE4)
75      cycles for 1
76      cycles for 1.2 (dec ecx)
43      cycles for 2 (universal scaling proc) CMOV
48      cycles for 2.1 (universal scaling proc) i386
65      cycles for sinsi's
55      cycles for sinsi's mod
54      cycles for sinsi's mod 2
48      cycles for sinsi's mod 3 (AMD)
48      cycles for sinsi's mod 4 (AMD)

Intel(R) Core(TM) i7-2700K CPU @ 3.50GHz (SSE4)
63      cycles for 1
59      cycles for 1.2 (dec ecx)
47      cycles for 2 (universal scaling proc) CMOV
51      cycles for 2.1 (universal scaling proc) i386
45      cycles for sinsi's
43      cycles for sinsi's mod
42      cycles for sinsi's mod 2
36      cycles for sinsi's mod 3 (AMD)
33      cycles for sinsi's mod 4 (AMD)

I can walk on water but stagger on beer.

FORTRANS

  • Member
  • ****
  • Posts: 944
Re: Bits Scaling to 2
« Reply #40 on: January 10, 2013, 01:59:54 AM »

pre-P4 (SSE1)
126     cycles for 1
126     cycles for 1.2 (dec ecx)
87      cycles for 2 (universal scaling proc) CMOV
88      cycles for 2.1 (universal scaling proc) i386
102     cycles for sinsi's
97      cycles for sinsi's mod
102     cycles for sinsi's mod 2
97      cycles for sinsi's mod 3 (AMD)
91      cycles for sinsi's mod 4 (AMD)
122     cycles for 1
124     cycles for 1.2 (dec ecx)
84      cycles for 2 (universal scaling proc) CMOV
85      cycles for 2.1 (universal scaling proc) i386
96      cycles for sinsi's
94      cycles for sinsi's mod
99      cycles for sinsi's mod 2
94      cycles for sinsi's mod 3 (AMD)
88      cycles for sinsi's mod 4 (AMD)
122     cycles for 1
124     cycles for 1.2 (dec ecx)
84      cycles for 2 (universal scaling proc) CMOV
85      cycles for 2.1 (universal scaling proc) i386
96      cycles for sinsi's
94      cycles for sinsi's mod
99      cycles for sinsi's mod 2
95      cycles for sinsi's mod 3 (AMD)
88      cycles for sinsi's mod 4 (AMD)

--- ok ---

Mobile Intel(R) Celeron(R) processor     600MHz (SSE2)
106   cycles for 1
110   cycles for 1.2 (dec ecx)
71   cycles for 2 (universal scaling proc) CMOV
68   cycles for 2.1 (universal scaling proc) i386
82   cycles for sinsi's
79   cycles for sinsi's mod
86   cycles for sinsi's mod 2
75   cycles for sinsi's mod 3 (AMD)
76   cycles for sinsi's mod 4 (AMD)
106   cycles for 1
110   cycles for 1.2 (dec ecx)
72   cycles for 2 (universal scaling proc) CMOV
67   cycles for 2.1 (universal scaling proc) i386
82   cycles for sinsi's
79   cycles for sinsi's mod
86   cycles for sinsi's mod 2
75   cycles for sinsi's mod 3 (AMD)
77   cycles for sinsi's mod 4 (AMD)
106   cycles for 1
110   cycles for 1.2 (dec ecx)
71   cycles for 2 (universal scaling proc) CMOV
68   cycles for 2.1 (universal scaling proc) i386
83   cycles for sinsi's
79   cycles for sinsi's mod
86   cycles for sinsi's mod 2
76   cycles for sinsi's mod 3 (AMD)
76   cycles for sinsi's mod 4 (AMD)

--- ok ---

dedndave

  • Member
  • *****
  • Posts: 8734
  • Still using Abacus 2.0
    • DednDave
Re: Bits Scaling to 2
« Reply #41 on: January 10, 2013, 03:48:51 AM »
i had an idea, this morning   :biggrin:
the results aren't as fast, which is to be expected
but the function is very flexible...
Quote
;Flexible Bit Stretcher - DednDave - 1, 2013
;
;  Creates variable-multiples of bits from an input word.
;The repeat count for each input bit is derived as follows:
; (1) high-order count bit from high word of wParam
; (2) low-order count bits from lParam
;These 3 bits allow for individual repeat counts of 0 to 7 for each input bit.

these are the results on my prescott if i set it up to double all the bits...
Code: [Select]
00001h: 496 498 498 497 497
000FFh: 593 594 593 590 613
0AAAAh: 596 598 599 596 599
0FF00h: 593 595 594 592 593
0FFFFh: 640 639 641 640 641

Antariy

  • Member
  • ****
  • Posts: 541
Re: Bits Scaling to 2
« Reply #42 on: January 10, 2013, 01:16:24 PM »
Thanks to all for your tests! :biggrin:

Hi Dave :t
Your test:
Code: [Select]
00001h: 538 540 523 524 530
000FFh: 617 618 618 623 624
0AAAAh: 633 636 647 630 631
0FF00h: 639 624 638 641 645
0FFFFh: 679 674 678 675 674
Press any key to continue ...

dedndave

  • Member
  • *****
  • Posts: 8734
  • Still using Abacus 2.0
    • DednDave
Re: Bits Scaling to 2
« Reply #43 on: January 10, 2013, 01:29:32 PM »
hi Alex
what do you think of the idea, though   :P

i was thinking of ways it might be used
among other things, it could be used to create complex state machines, i guess
it could be used to create variable bit-masks

Antariy

  • Member
  • ****
  • Posts: 541
Re: Bits Scaling to 2
« Reply #44 on: January 10, 2013, 01:39:50 PM »
I have not read the code yet, Dave :biggrin:
From what you said it sounds good.