Hi guys,

I tried writing a dword-to-binary-string converter using SSE2. Can someone benchmark it for me, please? (Many thanks to Peter Cordes for the tip.)

Code:

    ; only used for SSSE3
    [<16 shuf_broadcast_hi_lo: B$ 1,1,1,1, 1,1,1,1      ; broadcast the second 8 bits to the first 8 bytes
                               B$ 0,0,0,0, 0,0,0,0]     ; broadcast the first 8 bits to the second 8 bytes

    ; select the relevant bit within each byte, from high to low for printing
    [<16 bitmask: B$ 128, 64, 32, 16,    ; 1<<7, 1<<6, 1<<5, 1<<4
                  B$ 8, 4, 2, 1,         ; 1<<3, 1<<2, 1<<1, 1<<0
                  B$ 128, 64, 32, 16,    ; 1<<7, 1<<6, 1<<5, 1<<4
                  B$ 8, 4, 2, 1]         ; 1<<3, 1<<2, 1<<1, 1<<0

    [<16 ascii_ones: '1' #16]            ; Number "1" (in Ascii) duplicated 16 times.

    Proc numberToBin:
        Arguments @Number, @Output

        movd xmm0 D@Number      ; 32-bit load even though we only care about the low 16 bits.
        mov eax D@Output        ; Output buffer pointer

        ; to print left-to-right, we need the high bit to go in the first (low) byte
        punpcklbw xmm0 xmm0     ; llhh (from low to high byte elements)
        pshuflw xmm0 xmm0 5     ; imm8 5 = 00000101b: hhhhllll
        punpckldq xmm0 xmm0     ; hhhhhhhhllllllll
        ; or with SSSE3:
        ; pshufb xmm0 X$[shuf_broadcast_hi_lo] ; SSSE3

        pand xmm0 X$bitmask
        ; each input bit is now isolated within the corresponding output byte
        ; compare it against zero
        pxor xmm1 xmm1
        pcmpeqb xmm0 xmm1       ; -1 in elements that are 0, 0 in elements with any non-zero bit.

        paddb xmm0 X$ascii_ones ; '1' + (-1 or 0) = '0' or '1'

        mov B$eax+16 0          ; terminating zero
        movups X$eax xmm0
    EndP

Example of usage (only the low 16 bits of the argument are converted, so the output is 16 characters plus the terminating zero):

Code:

    [testing: B$ 0 #256]

    call numberToBin 123456, testing

I personally prefer a version that does not require aligned data, but I tested this one first to see whether it works :). So perhaps loading the bitmask and ascii_ones tables with movdqu would be better, to avoid the alignment requirement.

References:
https://stackoverflow.com/questions/40811218/creating-an-x86-assembler-program-that-converts-an-integer-to-a-16-bit-binary-st
https://www.agner.org/optimize
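
For anyone benchmarking the routine, it may help to have a slow reference to compare the output against. Below is a minimal sketch in portable C that emulates the same four steps byte by byte (broadcast, pand, pcmpeqb, paddb). The function name number_to_bin_ref is hypothetical and not part of the code above; the table simply mirrors bitmask from the post. It is meant as a correctness check only, not as a speed comparison.

```c
#include <stdint.h>

/* Scalar emulation of the SSE2 steps, one loop iteration per byte lane.
   number_to_bin_ref is a hypothetical helper name (an assumption made
   for illustration); "out" must have room for 17 bytes. */
void number_to_bin_ref(uint32_t number, char out[17])
{
    /* Same values as the bitmask table in the post. */
    static const uint8_t bitmask[16] = {
        128, 64, 32, 16, 8, 4, 2, 1,
        128, 64, 32, 16, 8, 4, 2, 1
    };
    uint8_t lo = (uint8_t)(number & 0xFF);        /* bits 0..7  */
    uint8_t hi = (uint8_t)((number >> 8) & 0xFF); /* bits 8..15 */

    for (int i = 0; i < 16; i++) {
        /* punpcklbw / pshuflw / punpckldq: high byte in lanes 0..7,
           low byte in lanes 8..15. */
        uint8_t lane = (i < 8) ? hi : lo;

        /* pand: isolate one bit per lane. */
        lane &= bitmask[i];

        /* pcmpeqb against zero: 0xFF (-1) if the bit is clear, else 0. */
        lane = (lane == 0) ? 0xFF : 0x00;

        /* paddb with '1': '1' + (-1) wraps to '0', '1' + 0 stays '1'. */
        out[i] = (char)(uint8_t)('1' + lane);
    }
    out[16] = '\0'; /* terminating zero, as in the original */
}
```

With the example value 123456 (0x1E240), the low 16 bits are 0xE240, so both this reference and numberToBin should leave 1110001001000000 in the buffer.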