Topic: SSE DwordtoBinary String

guga

  • Member
  • *****
  • Posts: 1346
  • Assembly is a state of art.
    • RosAsm
SSE DwordtoBinary String
« on: May 03, 2020, 07:24:13 PM »
Hi guys,

I tried out a Dword-to-binary-string converter using SSE2. Could someone benchmark it for me, please? (Many thanks to Peter Cordes for the tip  :thumbsup:)

Code:
; only used for SSSE3
[<16 shuf_broadcast_hi_lo:
        B$ 1,1,1,1, 1,1,1,1     ; broadcast the second 8 bits to the first 8 bytes
        B$ 0,0,0,0, 0,0,0,0]     ; broadcast the first 8 bits to the second 8 bytes

  ; select the relevant bit within each byte, from high to low for printing
[<16 bitmask:  B$ 128, 64, 32, 16,      ; 1<<7,  1<<6, 1<<5, 1<<4
               B$ 8, 4, 2, 1,            ; 1<<3,  1<<2, 1<<1, 1<<0
               B$ 128, 64, 32, 16,       ; 1<<7,  1<<6, 1<<5, 1<<4
               B$ 8, 4, 2, 1]            ; 1<<3,  1<<2, 1<<1, 1<<0

[<16 ascii_ones: '1' #16] ; Number "1" (in Ascii) duplicated 16 times.

Proc numberToBin:
    Arguments @Number, @Output

    movd xmm0 D@Number    ; 32-bit load even though we only care about the low 16 bits.
    mov eax D@Output        ; Output buffer pointer

    ; to print left-to-right, we need the high bit to go in the first (low) byte
    punpcklbw xmm0 xmm0              ; llhh      (from low to high byte elements)
    pshuflw xmm0 xmm0 5               ;  5 hhhhllll
    punpckldq xmm0 xmm0              ; hhhhhhhhllllllll

    ; or with SSSE3:
    ; pshufb  xmm0 X$[shuf_broadcast_hi_lo]  ; SSSE3

    pand  xmm0 X$bitmask ; each input bit is now isolated within the corresponding output byte

    ; compare it against zero
    pxor    xmm1 xmm1
    pcmpeqb xmm0 xmm1          ; -1 in elements that are 0, 0 in elements with any non-zero bit.

    paddb xmm0 X$ascii_ones  ; '1' + (-1 or 0) = '0' or '1'

    mov B$eax+16 0    ; terminating zero
    movups X$eax xmm0

EndP


Example of usage:
Code:
[testing: B$  0 #256]

            call numberToBin 123456, testing


I personally prefer a version that does not require aligned data, but I tested this one first to see whether it works :) . Loading the bitmask and ascii_ones tables with movdqu would probably be better, since that removes the need to align the data.
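To make the movdqu idea concrete, here is a minimal sketch in C with SSE2 intrinsics rather than RosAsm, assuming an x86 target with SSE2. Each intrinsic is meant to mirror one instruction of the routine above. The function name word_to_bin_unaligned is hypothetical, and the tables are plain byte arrays with no alignment request, read through unaligned loads.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Plain byte arrays: no alignment is requested, so they are read with
   unaligned loads (movdqu) below. */
static const uint8_t bitmask[16] = {128, 64, 32, 16, 8, 4, 2, 1,
                                    128, 64, 32, 16, 8, 4, 2, 1};
static const uint8_t ascii_ones[16] = {'1', '1', '1', '1', '1', '1', '1', '1',
                                       '1', '1', '1', '1', '1', '1', '1', '1'};

/* Hypothetical name. Converts the low 16 bits of number to 16 ASCII
   digits plus a terminating zero, so out must hold at least 17 bytes. */
void word_to_bin_unaligned(uint32_t number, char *out)
{
    __m128i v = _mm_cvtsi32_si128((int)number);  /* movd */
    v = _mm_unpacklo_epi8(v, v);                 /* punpcklbw: l l h h */
    v = _mm_shufflelo_epi16(v, 5);               /* pshuflw 5: hhhh llll */
    v = _mm_unpacklo_epi32(v, v);                /* punpckldq: 8 x h, 8 x l */

    /* movdqu loads: no alignment requirement on the tables */
    __m128i mask = _mm_loadu_si128((const __m128i *)bitmask);
    __m128i ones = _mm_loadu_si128((const __m128i *)ascii_ones);

    v = _mm_and_si128(v, mask);                  /* pand: isolate one bit per byte */
    v = _mm_cmpeq_epi8(v, _mm_setzero_si128());  /* pcmpeqb: -1 where the bit is clear */
    v = _mm_add_epi8(v, ones);                   /* paddb: '1' + (-1 or 0) */

    _mm_storeu_si128((__m128i *)out, v);         /* unaligned 16-byte store */
    out[16] = '\0';                              /* terminating zero */
}
```

With a 17-byte buffer, word_to_bin_unaligned(123456, buf) should produce "1110001001000000", which is the low 16 bits (0xE240) of 123456.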

References:
https://stackoverflow.com/questions/40811218/creating-an-x86-assembler-program-that-converts-an-integer-to-a-16-bit-binary-st
https://www.agner.org/optimize
« Last Edit: May 04, 2020, 11:53:21 AM by guga »
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

daydreamer

  • Member
  • *****
  • Posts: 1385
  • building nextdoor
Re: DwordtoBinary String
« Reply #1 on: May 03, 2020, 08:14:31 PM »
Why unaligned? You can use alignas() in C/C++.
For character algorithms, I have seen a combined approach before: a non-SSE2 routine handles the first and last few unaligned bytes, and an aligned SSE2 routine handles the middle. I think that's a good alternative.
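A rough sketch of that head/middle/tail pattern, in C with SSE2 intrinsics and assuming an x86 target with SSE2. The routine count_byte is a made-up example (it counts how often a character occurs in a buffer) and is not taken from any particular library.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example routine: counts how many bytes in buf equal ch. */
size_t count_byte(const char *buf, size_t len, char ch)
{
    size_t count = 0;
    size_t i = 0;

    /* Head: scalar loop until the current address is 16-byte aligned. */
    while (i < len && ((uintptr_t)(buf + i) & 15) != 0) {
        count += (buf[i] == ch);
        i++;
    }

    /* Middle: whole 16-byte blocks, read with aligned loads (movdqa). */
    const __m128i needle = _mm_set1_epi8(ch);
    for (; i + 16 <= len; i += 16) {
        __m128i block = _mm_load_si128((const __m128i *)(buf + i));
        int eq = _mm_movemask_epi8(_mm_cmpeq_epi8(block, needle));
        while (eq) {            /* count the set bits of the 16-bit mask */
            eq &= eq - 1;
            count++;
        }
    }

    /* Tail: scalar loop over the remaining bytes (fewer than 16). */
    for (; i < len; i++) {
        count += (buf[i] == ch);
    }
    return count;
}
```

The scalar head and tail each touch at most 15 bytes, so for long buffers almost all the work goes through the aligned loop.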
Quote from Flashdance
Nick  :  When you give up your dream, you die
*wears a flameproof asbestos suit*
Gone serverside programming p:  :D
I love assembly,because its legal to write
princess:lea eax,luke
:)

guga

  • Member
  • *****
  • Posts: 1346
  • Assembly is a state of art.
    • RosAsm
Re: SSE DwordtoBinary String
« Reply #2 on: May 04, 2020, 11:54:08 AM »
One issue: the function seems to work only for a 16-bit value (Word), not for a Dword.

How can it be extended to work with 32-bit numbers?
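One possible direction, sketched in C with SSE2 intrinsics and assuming an x86 target with SSE2: run the same 16-bit sequence twice, once on the upper half (the value shifted right by 16) and once on the lower half, and store the two 16-byte results back to back. The names bits16_to_ascii and dword_to_bin are hypothetical, and this is untested as RosAsm.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

static const uint8_t bitmask32[16] = {128, 64, 32, 16, 8, 4, 2, 1,
                                      128, 64, 32, 16, 8, 4, 2, 1};

/* Hypothetical helper: expands the low 16 bits of w into 16 ASCII digits.
   Bits above bit 15 are ignored, because only bytes 0 and 1 survive the shuffle. */
static __m128i bits16_to_ascii(uint32_t w)
{
    __m128i v = _mm_cvtsi32_si128((int)w);       /* movd */
    v = _mm_unpacklo_epi8(v, v);                 /* punpcklbw */
    v = _mm_shufflelo_epi16(v, 5);               /* pshuflw 5 */
    v = _mm_unpacklo_epi32(v, v);                /* punpckldq */
    v = _mm_and_si128(v, _mm_loadu_si128((const __m128i *)bitmask32));
    v = _mm_cmpeq_epi8(v, _mm_setzero_si128());  /* -1 where the bit is clear */
    return _mm_add_epi8(v, _mm_set1_epi8('1'));  /* '1' + (-1 or 0) */
}

/* Hypothetical name. Writes 32 digits plus a terminating zero,
   so out must hold at least 33 bytes. */
void dword_to_bin(uint32_t number, char *out)
{
    /* The upper 16 bits are printed first (leftmost), then the lower 16 bits. */
    _mm_storeu_si128((__m128i *)out, bits16_to_ascii(number >> 16));
    _mm_storeu_si128((__m128i *)(out + 16), bits16_to_ascii(number));
    out[32] = '\0';
}
```

For 123456 (0x0001E240) this should give "00000000000000011110001001000000", so the bit that the 16-bit version drops shows up as the last digit of the first half.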