SSE DwordtoBinary String

Started by guga, May 03, 2020, 07:24:13 PM

guga

Hi Guys

I tried out a Dword-to-binary-string converter using SSE2. Could someone benchmark it for me, please? (Many thanks to Peter Cordes for the tip  :thumbsup:)

; only used for SSSE3
[<16 shuf_broadcast_hi_lo:
        B$ 1,1,1,1, 1,1,1,1     ; broadcast the second 8 bits to the first 8 bytes
        B$ 0,0,0,0, 0,0,0,0]     ; broadcast the first 8 bits to the second 8 bytes

  ; select the relevant bit within each byte, from high to low for printing
[<16 bitmask:  B$ 128, 64, 32, 16,      ; 1<<7,  1<<6, 1<<5, 1<<4
               B$ 8, 4, 2, 1,            ; 1<<3,  1<<2, 1<<1, 1<<0
               B$ 128, 64, 32, 16,       ; 1<<7,  1<<6, 1<<5, 1<<4
               B$ 8, 4, 2, 1]            ; 1<<3,  1<<2, 1<<1, 1<<0

[<16 ascii_ones: '1' #16] ; Number "1" (in Ascii) duplicated 16 times.

Proc numberToBin:
    Arguments @Number, @Output

    movd xmm0 D@Number    ; 32-bit load even though we only care about the low 16 bits.
    mov eax D@Output        ; Output buffer pointer

    ; to print left-to-right, we need the high bit to go in the first (low) byte
    punpcklbw xmm0 xmm0              ; llhh      (from low to high byte elements)
    pshuflw xmm0 xmm0 5               ;  5 hhhhllll
    punpckldq xmm0 xmm0              ; hhhhhhhhllllllll

    ; or with SSSE3:
    ; pshufb  xmm0 X$[shuf_broadcast_hi_lo]  ; SSSE3

    pand  xmm0 X$bitmask ; each input bit is now isolated within the corresponding output byte

    ; compare it against zero
    pxor    xmm1 xmm1
    pcmpeqb xmm0 xmm1          ; -1 in elements that are 0,   0 in elements with any non-zero bit.

    paddb xmm0 X$ascii_ones  ; '1' + (-1 or 0) = '0' or '1'

    mov B$eax+16 0    ; terminating zero
    movups X$eax xmm0

EndP



Example of usage:

[testing: B$  0 #256]

            call numberToBin 123456, testing
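
For checking the SIMD output while benchmarking, a plain scalar version can serve as a reference. This is only a sketch in C; the name word_to_bin_ref is hypothetical and not part of the posted code. Like the routine above, it looks only at the low 16 bits of the input, so for 123456 (0x1E240) it produces the digits of 0xE240.

```c
#include <stdint.h>

/* Hypothetical reference helper (not from the thread): writes the low 16 bits
   of value as 16 ASCII digits, most significant bit first, followed by a
   terminating zero. out must have room for 17 bytes. */
void word_to_bin_ref(uint32_t value, char *out)
{
    for (int i = 0; i < 16; i++) {
        /* bit 15 goes to out[0], bit 0 goes to out[15] */
        out[i] = (char)('0' + ((value >> (15 - i)) & 1u));
    }
    out[16] = '\0';
}
```

Calling word_to_bin_ref(123456, buf) should leave "1110001001000000" in buf, which is what the SSE2 routine is expected to write into the testing buffer as well.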



I personally prefer a version that doesn't require the data to be aligned, but I tested this one first to see if it was working :) . So perhaps using movdqu to load the values from the bitmask and ascii_ones tables would be better, since it avoids the need to align the data.
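
As a rough illustration of the unaligned-load idea, here is the same sequence sketched in C with SSE2 intrinsics. This is an assumption-laden translation, not the posted RosAsm code: the function name word_to_bin_sse2 is made up, and the tables are ordinary byte arrays with no alignment attribute, read with _mm_loadu_si128 (which compiles to movdqu).

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Tables as plain byte arrays: no 16-byte alignment is requested. */
static const uint8_t bitmask[16] = {
    128, 64, 32, 16, 8, 4, 2, 1,
    128, 64, 32, 16, 8, 4, 2, 1
};
static const uint8_t ascii_ones[16] = {
    '1', '1', '1', '1', '1', '1', '1', '1',
    '1', '1', '1', '1', '1', '1', '1', '1'
};

/* Hypothetical name. Converts the low 16 bits of number to 16 ASCII digits
   plus a terminating zero; out must have room for 17 bytes. */
void word_to_bin_sse2(uint32_t number, char *out)
{
    __m128i v = _mm_cvtsi32_si128((int)number);   /* movd */
    v = _mm_unpacklo_epi8(v, v);                  /* punpcklbw: l l h h */
    v = _mm_shufflelo_epi16(v, 5);                /* pshuflw 5: hhhh llll */
    v = _mm_unpacklo_epi32(v, v);                 /* punpckldq: 8 x h, 8 x l */

    /* movdqu loads, so the tables may sit at any address */
    v = _mm_and_si128(v, _mm_loadu_si128((const __m128i *)bitmask));
    v = _mm_cmpeq_epi8(v, _mm_setzero_si128());   /* -1 where the bit is 0 */
    v = _mm_add_epi8(v, _mm_loadu_si128((const __m128i *)ascii_ones));

    _mm_storeu_si128((__m128i *)out, v);          /* unaligned store */
    out[16] = '\0';
}
```

Whether the unaligned loads cost anything measurable depends on the CPU, so this is something to benchmark rather than assume.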

References:
https://stackoverflow.com/questions/40811218/creating-an-x86-assembler-program-that-converts-an-integer-to-a-16-bit-binary-st
https://www.agner.org/optimize
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

daydreamer

Quote from: guga on May 03, 2020, 07:24:13 PM
Why unaligned? You can use alignas() in C/C++.
For character algorithms, I have seen a non-SSE2 algorithm used for the first and last few unaligned bytes, combined with an aligned SSE2 algorithm for the middle, and I think that's a good alternative.
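
A minimal sketch of that head/middle/tail pattern in C, using character counting as a stand-in example. The function count_char and the choice of task are assumptions made for illustration; they do not come from the thread.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: counts how many of the len bytes at s equal c. */
size_t count_char(const char *s, size_t len, char c)
{
    size_t count = 0;
    size_t i = 0;

    /* Head: scalar loop until s + i is 16-byte aligned. */
    while (i < len && ((uintptr_t)(s + i) & 15u) != 0) {
        count += (s[i] == c);
        i++;
    }

    /* Middle: whole 16-byte blocks, read with aligned loads (movdqa). */
    const __m128i needle = _mm_set1_epi8(c);
    for (; i + 16 <= len; i += 16) {
        __m128i block = _mm_load_si128((const __m128i *)(s + i));
        unsigned mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(block, needle));
        while (mask != 0) {      /* one set bit per matching byte */
            mask &= mask - 1;
            count++;
        }
    }

    /* Tail: scalar loop for the remaining 0..15 bytes. */
    for (; i < len; i++) {
        count += (s[i] == c);
    }
    return count;
}
```

The caller does not need to align anything; the function finds the aligned region itself, at the price of two short scalar loops.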
my none asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding

guga

One issue: the function seems to work only for 16-bit values (Word) and not for a Dword.

How can it be extended to work with 32-bit numbers?
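
One possible way, sketched here in C with SSE2 intrinsics rather than RosAsm, is to run the 16-bit sequence twice: once on the high word for the first 16 characters and once on the low word for the last 16. The names word_to_digits and dword_to_bin_sse2 are hypothetical, and this is just one option under those assumptions, not necessarily the fastest.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

static const uint8_t bitmask[16] = {
    128, 64, 32, 16, 8, 4, 2, 1,
    128, 64, 32, 16, 8, 4, 2, 1
};
static const uint8_t ascii_ones[16] = {
    '1', '1', '1', '1', '1', '1', '1', '1',
    '1', '1', '1', '1', '1', '1', '1', '1'
};

/* Hypothetical helper: 16 ASCII digits for the low 16 bits of word16.
   Any bits above bit 15 are discarded by the shuffles. */
static __m128i word_to_digits(uint32_t word16)
{
    __m128i v = _mm_cvtsi32_si128((int)word16);   /* movd */
    v = _mm_unpacklo_epi8(v, v);                  /* punpcklbw */
    v = _mm_shufflelo_epi16(v, 5);                /* pshuflw 5 */
    v = _mm_unpacklo_epi32(v, v);                 /* punpckldq */
    v = _mm_and_si128(v, _mm_loadu_si128((const __m128i *)bitmask));
    v = _mm_cmpeq_epi8(v, _mm_setzero_si128());   /* -1 where the bit is 0 */
    return _mm_add_epi8(v, _mm_loadu_si128((const __m128i *)ascii_ones));
}

/* Hypothetical name. Writes 32 digits plus a terminating zero;
   out must have room for 33 bytes. */
void dword_to_bin_sse2(uint32_t number, char *out)
{
    _mm_storeu_si128((__m128i *)out, word_to_digits(number >> 16)); /* bits 31..16 */
    _mm_storeu_si128((__m128i *)(out + 16), word_to_digits(number)); /* bits 15..0 */
    out[32] = '\0';
}
```

With this sketch, dword_to_bin_sse2(123456, buf) should give "00000000000000011110001001000000", since 123456 is 0x0001E240.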