Topic: Fastest way to move 3 bytes into a dword

KeepingRealBusy

  • Member
  • Posts: 426

frktons

  • Member
  • Posts: 491
Re: Fastest way to move 3 bytes into a dword
« Reply #16 on: January 21, 2013, 03:37:13 PM »
My results so far:
Quote
------------------------------------------------------------------------
Intel(R) Core(TM)2 CPU  6600  @ 2.40GHz

Instructions: MMX, SSE1, SSE2, SSE3, SSSE3
------------------------------------------------------------------------
       72,205   cycles for Dave - 48 bytes MOV 4 bytes / AND
       72,142   cycles for Dave - 48 bytes MOV 4 bytes / AND
       72,138   cycles for Dave - 48 bytes MOV 4 bytes / AND
------------------------------------------------------------------------
       34,586   cycles for Siekmanski - 48 bytes SSSE3_24_32
       34,571   cycles for Siekmanski - 48 bytes SSSE3_24_32
       34,575   cycles for Siekmanski - 48 bytes SSSE3_24_32
------------------------------------------------------------------------
       30,106   cycles for Siekmanski - 48 bytes SSSE3_24_32 unrolled
       30,067   cycles for Siekmanski - 48 bytes SSSE3_24_32 unrolled
       30,111   cycles for Siekmanski - 48 bytes SSSE3_24_32 unrolled
------------------------------------------------------------------------
       34,651   cycles for Siekmanski - 48 bytes SSSE3_32_24
       34,758   cycles for Siekmanski - 48 bytes SSSE3_32_24
       34,736   cycles for Siekmanski - 48 bytes SSSE3_32_24
------------------------------------------------------------------------
       28,661   cycles for Siekmanski - 48 bytes SSSE3_32_24 unrolled
       28,645   cycles for Siekmanski - 48 bytes SSSE3_32_24 unrolled
       28,637   cycles for Siekmanski - 48 bytes SSSE3_32_24 unrolled
------------------------------------------------------------------------

I slightly modified the source to avoid printing so many separator lines.
Siekmanski is apparently new on the forum and not used to our "traditional"
way of doing tests.  :lol:

Compliments, Siekmanski, you did a very good job.  :t

Waiting for our masters' versions, though.  :P

Attached is the modified source.

frktons

  • Member
  • Posts: 491
Re: Fastest way to move 3 bytes into a dword
« Reply #17 on: January 21, 2013, 03:55:49 PM »
Quote
I have a Prescott that supports SSE3, Marinus.
It crashes at PSHUFB XMM0,XMM1   :P

87,665 cycles for the first test, though.

You need one more "S" in your SSE level (SSSE3, not just SSE3); otherwise,
no PSHUFB for you.
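PSHUFB belongs to SSSE3, and on a CPU without SSSE3 it raises an invalid-opcode exception, which matches the crash reported above. A routine can check for it first: CPUID with EAX=1 returns feature flags in ECX, where bit 0 is SSE3 and bit 9 is SSSE3. The Python sketch below only models that bit test; the helper names and the sample ECX values are hypothetical, and in the real program the value would come from the CPUID instruction itself.

```python
# Feature bits in ECX after CPUID with EAX=1 (per Intel's documentation):
SSE3_BIT = 0   # SSE3 (what the Prescott has)
SSSE3_BIT = 9  # SSSE3 (adds PSHUFB)


def has_sse3(ecx: int) -> bool:
    return bool((ecx >> SSE3_BIT) & 1)


def has_ssse3(ecx: int) -> bool:
    return bool((ecx >> SSSE3_BIT) & 1)


def can_run_pshufb(ecx: int) -> bool:
    # SSE3 alone is not enough: without SSSE3, PSHUFB is an invalid opcode.
    return has_ssse3(ecx)


# Hypothetical ECX values for illustration, not read from real hardware:
prescott_like = 1 << SSE3_BIT                     # SSE3 only
core2_like = (1 << SSE3_BIT) | (1 << SSSE3_BIT)   # SSE3 + SSSE3

print(can_run_pshufb(prescott_like))  # False
print(can_run_pshufb(core2_like))     # True
```

A test program could use such a check to skip the SSSE3 entries on older CPUs instead of crashing.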

frktons

  • Member
  • Posts: 491
Re: Fastest way to move 3 bytes into a dword
« Reply #18 on: January 22, 2013, 01:55:23 AM »
Somebody messed up my string  :eusa_naughty:
Code:
----------------------------------------------------------------------------
Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz

Instructions: MMX, SSE1, SSE2, SSE3, SSSE3
----------------------------------------------------------------------------
       48,058   cycles for Dave - 48 bytes MOV 4 bytes / AND
       48,069   cycles for Dave - 48 bytes MOV 4 bytes / AND
       48,036   cycles for Dave - 48 bytes MOV 4 bytes / AND
----------------------------------------------------------------------------
       54,914   cycles for Dave - 48 bytes MOV 4 bytes / AND - Unrolled
       58,549   cycles for Dave - 48 bytes MOV 4 bytes / AND - Unrolled
       60,832   cycles for Dave - 48 bytes MOV 4 bytes / AND - Unrolled
----------------------------------------------------------------------------
       55,680   cycles for Dave - 48 bytes MOV 4 bytes / AND - Unrolled 2
       55,653   cycles for Dave - 48 bytes MOV 4 bytes / AND - Unrolled 2
       55,674   cycles for Dave - 48 bytes MOV 4 bytes / AND - Unrolled 2
----------------------------------------------------------------------------
       34,575   cycles for Siekmanski - 48 bytes SSSE3_24_32
       34,625   cycles for Siekmanski - 48 bytes SSSE3_24_32
       34,540   cycles for Siekmanski - 48 bytes SSSE3_24_32
----------------------------------------------------------------------------
       30,030   cycles for Siekmanski - 48 bytes SSSE3_24_32 unrolled
       30,036   cycles for Siekmanski - 48 bytes SSSE3_24_32 unrolled
       30,037   cycles for Siekmanski - 48 bytes SSSE3_24_32 unrolled
----------------------------------------------------------------------------
       34,534   cycles for Siekmanski - 48 bytes SSSE3_32_24
       34,533   cycles for Siekmanski - 48 bytes SSSE3_32_24
       34,575   cycles for Siekmanski - 48 bytes SSSE3_32_24
----------------------------------------------------------------------------
       28,543   cycles for Siekmanski - 48 bytes SSSE3_32_24 unrolled
       28,533   cycles for Siekmanski - 48 bytes SSSE3_32_24 unrolled
       28,538   cycles for Siekmanski - 48 bytes SSSE3_32_24 unrolled
----------------------------------------------------------------------------
Here it is a string with 48 characters inside me
Her itis  stingwit 48chaactrs nsie meeeearaitis  stingwit 48chaactrs nsie meeeeara  stingwit 48chaac
trs nsie meeeearaingwit 48chaactrs nsie meeeearait 48chaactrs nsie meeeeara8chaactrs nsie meeeearaac
trs nsie meeeearas nsie meeeearaie meeeearaeeeearaaracte

Was it Mr. Siekmanski?  ::)

frktons

  • Member
  • Posts: 491
Re: Fastest way to move 3 bytes into a dword
« Reply #19 on: January 22, 2013, 02:38:22 AM »
Well, Siekmanski, this report can help you find the bugs:
Code:
----------------------------------------------------------------------------
Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz

Instructions: MMX, SSE1, SSE2, SSE3, SSSE3
----------------------------------------------------------------------------
       48,125   cycles for Dave - 48 bytes MOV 4 bytes / AND
       48,093   cycles for Dave - 48 bytes MOV 4 bytes / AND
       48,180   cycles for Dave - 48 bytes MOV 4 bytes / AND
 Destination data Here it is a string with 48 characters inside me
----------------------------------------------------------------------------
       54,307   cycles for Dave - 48 bytes MOV 4 bytes / AND - Unrolled
       57,137   cycles for Dave - 48 bytes MOV 4 bytes / AND - Unrolled
       57,148   cycles for Dave - 48 bytes MOV 4 bytes / AND - Unrolled
 Destination data Here it is a string with 48 characters inside me
----------------------------------------------------------------------------
       54,605   cycles for Dave - 48 bytes MOV 4 bytes / AND - Unrolled 2
       44,780   cycles for Dave - 48 bytes MOV 4 bytes / AND - Unrolled 2
       55,807   cycles for Dave - 48 bytes MOV 4 bytes / AND - Unrolled 2
 Destination data Here it is a string with 48 characters inside me
----------------------------------------------------------------------------
       34,573   cycles for Siekmanski - 48 bytes SSSE3_24_32
       34,576   cycles for Siekmanski - 48 bytes SSSE3_24_32
       34,565   cycles for Siekmanski - 48 bytes SSSE3_24_32
 Destination data  strinaracte
----------------------------------------------------------------------------
       30,059   cycles for Siekmanski - 48 bytes SSSE3_24_32 unrolled
       30,071   cycles for Siekmanski - 48 bytes SSSE3_24_32 unrolled
       30,095   cycles for Siekmanski - 48 bytes SSSE3_24_32 unrolled
 Destination data aracte
----------------------------------------------------------------------------
       34,573   cycles for Siekmanski - 48 bytes SSSE3_32_24
       34,611   cycles for Siekmanski - 48 bytes SSSE3_32_24
       34,572   cycles for Siekmanski - 48 bytes SSSE3_32_24
 Destination data HHHHHHHHHHHHiiiiiiiiiiiiaaaaaaaaaaaaaaaaaraHHHHHHHHiiiiiiiiiiiiaaaaaaaaaaaaaaaaara
HHHHiiiiiiiiiiiiaaaaaaaaaaaaaaaaaraiiiiiiiiiiiiaaaaaaaaaaaaaaaaaraiiiiiiiiaaaaaaaaaaaaaaaaaraiiiiaaa
aaaaaaaaaaaaaaraaaaaaaaaaaaaaaaaaraaaaaaaaaaaaaaraaaaaaaaaaraaaaaaraaracte
----------------------------------------------------------------------------
       28,563   cycles for Siekmanski - 48 bytes SSSE3_32_24 unrolled
       28,565   cycles for Siekmanski - 48 bytes SSSE3_32_24 unrolled
       28,603   cycles for Siekmanski - 48 bytes SSSE3_32_24 unrolled
 Destination data Her itis  stingwit 48chaactrs nsie meeeearaitis  stingwit 48chaactrs nsie meeeeara
  stingwit 48chaactrs nsie meeeearaingwit 48chaactrs nsie meeeearait 48chaactrs nsie meeeeara8chaact
rs nsie meeeearaactrs nsie meeeearas nsie meeeearaie meeeearaeeeearaaracte
----------------------------------------------------------------------------

Attached is the program that tests the correctness of the destination data.
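The idea behind the attached test program, checking the destination buffer instead of trusting the cycle counts alone, can be sketched as a scalar reference plus a round-trip check. This is a hedged Python sketch, not frktons' program: it assumes the fourth byte of each dword is zero-filled (as the MOV/AND version produces), and the function names are hypothetical.

```python
def expand_24_to_32(src: bytes) -> bytes:
    """Reference 24->32 conversion: append a zero byte after each 3-byte group."""
    if len(src) % 3:
        raise ValueError("source length must be a multiple of 3")
    out = bytearray()
    for i in range(0, len(src), 3):
        out += src[i:i + 3] + b"\x00"
    return bytes(out)


def pack_32_to_24(src: bytes) -> bytes:
    """Reference 32->24 conversion: drop the high byte of each dword."""
    if len(src) % 4:
        raise ValueError("source length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(src), 4):
        out += src[i:i + 3]
    return bytes(out)


def check_routines(expand, pack, data: bytes) -> bool:
    """True if `expand` matches the reference and `pack` restores the original."""
    wide = expand(data)
    return wide == expand_24_to_32(data) and pack(wide) == data


source = b"Here it is a string with 48 characters inside me"
print(len(source), len(expand_24_to_32(source)))               # 48 64
print(check_routines(expand_24_to_32, pack_32_to_24, source))  # True

# A deliberately broken pack (copies the first 48 wide bytes) is caught:
print(check_routines(expand_24_to_32, lambda w: w[:48], source))  # False
```

A shuffle routine with a wrong mask, like the ones that produced the garbled destination data above, fails this comparison immediately.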

dedndave

  • Member
  • Posts: 8827
  • Still using Abacus 2.0
    • DednDave
Re: Fastest way to move 3 bytes into a dword
« Reply #20 on: January 22, 2013, 02:50:45 AM »
Code:
    mov     eax,[esi]
    and     eax,0FFFFFFh

AND is dependent on MOV being completed
it would help if you could put an independent "do something else" instruction in there
it would also help if 0FFFFFFh were in a register
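The scheduling advice (an independent instruction between the load and the AND, and the mask held in a register) is about how the CPU overlaps work, which a high-level sketch cannot show. What can be shown is what the two instructions compute. This hedged Python model, with a hypothetical function name, treats the 4-byte load as little-endian and applies the 0FFFFFFh mask. It also makes one caveat explicit: the load reads one byte beyond the triple, so for the last triple of a 48-byte buffer (offset 45) it touches byte 48, one past the end.

```python
import struct

MASK_24 = 0x00FFFFFF  # the 0FFFFFFh constant from the snippet


def load_triple_as_dword(buf: bytes, offset: int) -> int:
    """Model of 'mov eax,[esi]' followed by 'and eax,0FFFFFFh'.

    x86 is little-endian, so the 4-byte load puts buf[offset] in the low
    byte; the AND clears the fourth byte, which belongs to the next triple.
    """
    if offset + 4 > len(buf):
        # The real instruction would read past the end of the buffer here;
        # this model refuses instead of reading out of bounds.
        raise IndexError("4-byte load would run past the end of the buffer")
    (dword,) = struct.unpack_from("<I", buf, offset)
    return dword & MASK_24


data = bytes([0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77])
print(hex(load_triple_as_dword(data, 0)))  # 0x332211
print(hex(load_triple_as_dword(data, 3)))  # 0x665544
```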

frktons

  • Member
  • Posts: 491
Re: Fastest way to move 3 bytes into a dword
« Reply #21 on: January 22, 2013, 03:00:59 AM »
Quote from: dedndave on January 22, 2013, 02:50:45 AM
Code:
    mov     eax,[esi]
    and     eax,0FFFFFFh

AND is dependent on MOV being completed
it would help if you could put an independent "do something else" instruction in there
it would also help if 0FFFFFFh were in a register

Well, that's right. Why don't you use the keyboard to do that?  :P

Siekmanski

  • Member
  • Posts: 2329
Re: Fastest way to move 3 bytes into a dword
« Reply #22 on: January 22, 2013, 04:49:37 AM »
It's not easy to understand someone else's code  :lol:
I made it work OK now and swapped the input and output.
Creative coders use backward thinking techniques as a strategy.

frktons

  • Member
  • Posts: 491
Re: Fastest way to move 3 bytes into a dword
« Reply #23 on: January 22, 2013, 05:16:53 AM »
Quote from: Siekmanski on January 22, 2013, 04:49:37 AM
It's not easy to understand someone else's code  :lol:
I made it work OK now and swapped the input and output.

Grab the code I posted in my earlier post and test your code with it; judging
by the output, something is still buggy.
« Last Edit: January 22, 2013, 06:19:56 AM by frktons »

frktons

  • Member
  • Posts: 491
Re: Fastest way to move 3 bytes into a dword
« Reply #24 on: January 22, 2013, 11:21:44 AM »
I added my routine using XMM/SSSE3 instructions. It requires an
SSSE3-capable machine.
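Frank's routine is in the attachment and not reproduced here, but the core of an SSSE3 approach is PSHUFB with a shuffle mask: each mask byte selects a source byte by its low four bits, and a mask byte with bit 7 set writes zero. The Python model below emulates that rule on one 16-byte register. The two masks are hypothetical examples of the 24->32 and 32->24 directions, not masks taken from anyone's posted code. Because one register holds only four complete triples (12 bytes), a 48-byte source takes four such shuffles to produce 64 bytes of dwords.

```python
def pshufb(src: bytes, mask: bytes) -> bytes:
    """Model of SSSE3 PSHUFB on one 16-byte XMM register.

    For each destination byte i: if bit 7 of mask[i] is set the result is 0,
    otherwise it is src[mask[i] & 0x0F].
    """
    assert len(src) == 16 and len(mask) == 16
    return bytes(0 if m & 0x80 else src[m & 0x0F] for m in mask)


Z = 0x80  # mask value that zeroes a destination byte

# Hypothetical 24->32 mask: 12 source bytes -> 4 dwords, fourth byte zeroed.
EXPAND_MASK = bytes([0, 1, 2, Z, 3, 4, 5, Z, 6, 7, 8, Z, 9, 10, 11, Z])

# Hypothetical 32->24 mask: 4 dwords -> 12 packed bytes, top 4 bytes zeroed.
PACK_MASK = bytes([0, 1, 2, 4, 5, 6, 8, 9, 10, 12, 13, 14, Z, Z, Z, Z])

chunk = b"Here it is a"   # 12 bytes = 4 triples
xmm = chunk + bytes(4)    # pad to a full 16-byte register
wide = pshufb(xmm, EXPAND_MASK)
print(wide)                                   # b'Her\x00e i\x00t i\x00s a\x00'
print(pshufb(wide, PACK_MASK)[:12] == chunk)  # True
```

A mask with a wrong index shows up directly as repeated or misplaced characters in the destination, much like the garbled outputs earlier in the thread.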
Quote
----------------------------------------------------------------------------
Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz

Instructions: MMX, SSE1, SSE2, SSE3, SSSE3
----------------------------------------------------------------------------
       70,248   cycles for Dave - 48 bytes MOV 4 bytes / AND
       48,078   cycles for Dave - 48 bytes MOV 4 bytes / AND
       48,171   cycles for Dave - 48 bytes MOV 4 bytes / AND
 Destination data:Here it is a string with 48 characters inside me
----------------------------------------------------------------------------
       50,346   cycles for Dave - 48 bytes MOV 4 bytes / AND - Unrolled
       59,081   cycles for Dave - 48 bytes MOV 4 bytes / AND - Unrolled
       56,535   cycles for Dave - 48 bytes MOV 4 bytes / AND - Unrolled
 Destination data:Here it is a string with 48 characters inside me
----------------------------------------------------------------------------
       47,055   cycles for Dave - 48 bytes MOV 4 bytes / AND - Unrolled 2
       57,040   cycles for Dave - 48 bytes MOV 4 bytes / AND - Unrolled 2
       57,420   cycles for Dave - 48 bytes MOV 4 bytes / AND - Unrolled 2
 Destination data:Here it is a string with 48 characters inside me
----------------------------------------------------------------------------
       25,662   cycles for Frank - 48 bytes XMM/SSSE3 - Unrolled
       38,100   cycles for Frank - 48 bytes XMM/SSSE3 - Unrolled
       38,110   cycles for Frank - 48 bytes XMM/SSSE3 - Unrolled
 Destination data:Here it is a string with 48 characters inside me
----------------------------------------------------------------------------
----------------------------------------------------------------------------

I skipped Siekmanski's code for the time being, pending his
tests with the attached program.

Frank

frktons

  • Member
  • Posts: 491
Re: Fastest way to move 3 bytes into a dword
« Reply #25 on: January 22, 2013, 01:10:50 PM »
And the last entry:
Quote
----------------------------------------------------------------------------
Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz

Instructions: MMX, SSE1, SSE2, SSE3, SSSE3
----------------------------------------------------------------------------
       50,769   cycles for Dave - 48 bytes MOV 4 bytes / AND
       51,653   cycles for Dave - 48 bytes MOV 4 bytes / AND
       52,787   cycles for Dave - 48 bytes MOV 4 bytes / AND
 Destination data:Here it is a string with 48 characters inside me
----------------------------------------------------------------------------
       43,982   cycles for Dave - 48 bytes MOV 4 bytes / AND - Unrolled
       43,484   cycles for Dave - 48 bytes MOV 4 bytes / AND - Unrolled
       44,191   cycles for Dave - 48 bytes MOV 4 bytes / AND - Unrolled
 Destination data:Here it is a string with 48 characters inside me
----------------------------------------------------------------------------
       44,852   cycles for Dave - 48 bytes MOV 4 bytes / AND - Unrolled 2
       48,817   cycles for Dave - 48 bytes MOV 4 bytes / AND - Unrolled 2
       52,075   cycles for Dave - 48 bytes MOV 4 bytes / AND - Unrolled 2
 Destination data:Here it is a string with 48 characters inside me
----------------------------------------------------------------------------
       30,254   cycles for Frank - 48 bytes XMM/SSSE3 - Unrolled
       25,419   cycles for Frank - 48 bytes XMM/SSSE3 - Unrolled
       29,016   cycles for Frank - 48 bytes XMM/SSSE3 - Unrolled
 Destination data:Here it is a string with 48 characters inside me
----------------------------------------------------------------------------
       25,407   cycles for Frank - 48 bytes XMM/SSSE3 - Unrolled II
       28,221   cycles for Frank - 48 bytes XMM/SSSE3 - Unrolled II
       28,249   cycles for Frank - 48 bytes XMM/SSSE3 - Unrolled II
 Destination data:Here it is a string with 48 characters inside me
----------------------------------------------------------------------------
----------------------------------------------------------------------------