Topic: xlat is pretty fast

jj2007

  • Member
  • *****
  • Posts: 12962
  • Assembler is fun ;-)
    • MasmBasic
xlat is pretty fast
« on: September 19, 2022, 08:11:42 AM »
Code:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)

394     cycles for 100 * xlat
458     cycles for 100 * movzx eax, byte ptr[ebx+ecx]

386     cycles for 100 * xlat
457     cycles for 100 * movzx eax, byte ptr[ebx+ecx]

387     cycles for 100 * xlat
457     cycles for 100 * movzx eax, byte ptr[ebx+ecx]

387     cycles for 100 * xlat
458     cycles for 100 * movzx eax, byte ptr[ebx+ecx]

13      bytes for xlat
14      bytes for movzx eax, byte ptr[ebx+ecx]

72      = eax xlat
72      = eax movzx eax, byte ptr[ebx+ecx]

Code:
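align_64
NameA equ xlat ; assign a descriptive name for each test
TestA proc
  mov ebx, offset somestring
  push 99 ; outer counter on the stack: 100 passes
  .Repeat
    xor ecx, ecx ; 256 (note: with ecx = 0 the inner loop body runs only once per pass)
    align 4
    .Repeat
      mov eax, ecx ; xlat overwrites al, so the counter stays in ecx
      xlat ; al = byte ptr [ebx+al]
      dec ecx
    .Until Sign?
    dec stack ; decrement the counter pushed above (stack appears to be a MasmBasic alias for dword ptr [esp])
  .Until Sign?
  pop edx
  ret
TestA endp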

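align_64
NameB equ movzx eax, byte ptr[ebx+ecx]
TestB proc
  mov ebx, offset somestring
  push 99 ; outer counter on the stack: 100 passes
  .Repeat
    xor ecx, ecx ; 256 (note: with ecx = 0 the inner loop body runs only once per pass)
    align 4
    .Repeat
      movzx eax, byte ptr[ebx+ecx] ; eax = zero-extended byte at [ebx+ecx]
      dec ecx
    .Until Sign?
    dec stack ; decrement the counter pushed above
  .Until Sign?
  pop edx
  ret
TestB endp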

zedd151

  • Member
  • *****
  • Posts: 1268
Re: xlat is pretty fast
« Reply #1 on: September 19, 2022, 08:19:55 AM »
Code:
Intel(R) Core(TM)2 Duo CPU     E8400  @ 3.00GHz (SSE4)

500     cycles for 100 * xlat
480     cycles for 100 * movzx eax, byte ptr[ebx+ecx]

500     cycles for 100 * xlat
481     cycles for 100 * movzx eax, byte ptr[ebx+ecx]

500     cycles for 100 * xlat
479     cycles for 100 * movzx eax, byte ptr[ebx+ecx]

501     cycles for 100 * xlat
480     cycles for 100 * movzx eax, byte ptr[ebx+ecx]

13      bytes for xlat
14      bytes for movzx eax, byte ptr[ebx+ecx]

72      = eax xlat
72      = eax movzx eax, byte ptr[ebx+ecx]

--- ok ---


Looks like a mixed bag. A little slower for my machine.
Been a little while since the last algo speed tests
... :biggrin:

HSE

  • Member
  • *****
  • Posts: 2194
  • AMD 7-32 / i3 10-64
Re: xlat is pretty fast
« Reply #2 on: September 19, 2022, 08:28:21 AM »
 :thumbsup:

But Test A should be
Code:
xor eax, eax
...
dec eax
without mov eax, ecx, no?
Equations in Assembly: SmplMath

jj2007

Re: xlat is pretty fast
« Reply #3 on: September 19, 2022, 09:54:12 AM »
Quote from: HSE on September 19, 2022, 08:28:21 AM
:thumbsup:

But Test A should be
Code:
xor eax, eax
...
dec eax
without mov eax, ecx, no?

xlat changes al. So you can't use eax as the loop counter.
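
A minimal sketch of why the counter has to live outside eax (MASM, 32-bit; the table and the proc name LookupDemo are invented for illustration and are not part of the test above). xlat replaces al with the byte at [ebx+al], so a counter kept in eax would be clobbered on every pass, while one in ecx survives:

```asm
.686
.model flat, stdcall

.data
table db "Hello", 0          ; hypothetical lookup table, first byte is 'H' = 72

.code
LookupDemo proc uses ebx
  mov ebx, offset table      ; xlat always takes the table base from ebx
  mov ecx, 4                 ; loop counter lives in ecx, which xlat leaves alone
NextByte:
  mov eax, ecx               ; copy the index; xlat only uses al
  xlat                       ; al = byte ptr [ebx+al], upper 24 bits of eax unchanged
  dec ecx
  jns NextByte               ; body runs for ecx = 4, 3, 2, 1, 0
  ret                        ; last lookup was table[0], so eax = 72
LookupDemo endp
end
```

The extra mov eax, ecx per pass is the price for keeping the index intact.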

HSE

Re: xlat is pretty fast
« Reply #4 on: September 19, 2022, 10:11:53 AM »
 :biggrin: :biggrin: Sorry.

Yet, that mov eax, ecx makes the comparison unfair.

zedd151

Re: xlat is pretty fast
« Reply #5 on: September 19, 2022, 10:27:52 AM »
Quote from: HSE on September 19, 2022, 10:11:53 AM
Yet, that mov eax, ecx makes the comparison unfair.
  :tongue:
... :biggrin:

jj2007

Re: xlat is pretty fast
« Reply #6 on: September 19, 2022, 10:40:55 AM »
Quote from: HSE on September 19, 2022, 10:11:53 AM
:biggrin: :biggrin: Sorry.

Yet, that mov eax, ecx makes the comparison unfair.

Propose a better solution :thumbsup:

HSE

Re: xlat is pretty fast
« Reply #7 on: September 19, 2022, 08:28:50 PM »
Quote from: jj2007 on September 19, 2022, 10:40:55 AM
Propose a better solution :thumbsup:

  :thumbsup: Next week.

Perhaps you can't evaluate xlat out of context.

TimoVJL

  • Member
  • *****
  • Posts: 1112
Re: xlat is pretty fast
« Reply #8 on: September 19, 2022, 08:33:00 PM »
Code:
AMD Athlon(tm) II X2 220 Processor (SSE3)

452     cycles for 100 * xlat
460     cycles for 100 * movzx eax, byte ptr[ebx+ecx]

453     cycles for 100 * xlat
458     cycles for 100 * movzx eax, byte ptr[ebx+ecx]

451     cycles for 100 * xlat
457     cycles for 100 * movzx eax, byte ptr[ebx+ecx]

670     cycles for 100 * xlat
459     cycles for 100 * movzx eax, byte ptr[ebx+ecx]

13      bytes for xlat
14      bytes for movzx eax, byte ptr[ebx+ecx]

72      = eax xlat
72      = eax movzx eax, byte ptr[ebx+ecx]

--- ok ---
May the source be with you

daydreamer

  • Member
  • *****
  • Posts: 2094
  • beer glass
Re: xlat is pretty fast
« Reply #9 on: September 19, 2022, 10:40:43 PM »
Code:
Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz (SSE4)

385     cycles for 100 * xlat
363     cycles for 100 * movzx eax, byte ptr[ebx+ecx]

385     cycles for 100 * xlat
367     cycles for 100 * movzx eax, byte ptr[ebx+ecx]

385     cycles for 100 * xlat
366     cycles for 100 * movzx eax, byte ptr[ebx+ecx]

387     cycles for 100 * xlat
365     cycles for 100 * movzx eax, byte ptr[ebx+ecx]

13      bytes for xlat
14      bytes for movzx eax, byte ptr[ebx+ecx]

72      = eax xlat
72      = eax movzx eax, byte ptr[ebx+ecx]


« Last Edit: September 21, 2022, 06:34:59 PM by hutch-- »
SIMD fan and macro fan
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."

hutch--

  • Administrator
  • Member
  • ******
  • Posts: 9764
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
Re: xlat is pretty fast
« Reply #10 on: September 19, 2022, 11:34:30 PM »
This is the second window that popped up. I don't have one of the Xeons turned on at the moment.

Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz (SSE4)

403     cycles for 100 * xlat
398     cycles for 100 * movzx eax, byte ptr[ebx+ecx]

403     cycles for 100 * xlat
397     cycles for 100 * movzx eax, byte ptr[ebx+ecx]

405     cycles for 100 * xlat
397     cycles for 100 * movzx eax, byte ptr[ebx+ecx]

403     cycles for 100 * xlat
395     cycles for 100 * movzx eax, byte ptr[ebx+ecx]

13      bytes for xlat
14      bytes for movzx eax, byte ptr[ebx+ecx]

72      = eax xlat
72      = eax movzx eax, byte ptr[ebx+ecx]

--- ok ---
hutch at movsd dot com
http://www.masm32.com    :biggrin:  :skrewy:

FORTRANS

  • Member
  • *****
  • Posts: 1190
Re: xlat is pretty fast
« Reply #11 on: September 20, 2022, 12:35:59 AM »
Hi,

   Three systems, two runs each.

Code:
F:\TEMP\TEST>xlattimi
pre-P4 (SSE1)

439     cycles for 100 * xlat
412     cycles for 100 * movzx eax, byte ptr[ebx+ecx]

439     cycles for 100 * xlat
412     cycles for 100 * movzx eax, byte ptr[ebx+ecx]

439     cycles for 100 * xlat
412     cycles for 100 * movzx eax, byte ptr[ebx+ecx]

439     cycles for 100 * xlat
412     cycles for 100 * movzx eax, byte ptr[ebx+ecx]

13      bytes for xlat
14      bytes for movzx eax, byte ptr[ebx+ecx]

72      = eax xlat
72      = eax movzx eax, byte ptr[ebx+ecx]

--- ok ---

pre-P4 (SSE1)

440 cycles for 100 * xlat
415 cycles for 100 * movzx eax, byte ptr[ebx+ecx]

444 cycles for 100 * xlat
413 cycles for 100 * movzx eax, byte ptr[ebx+ecx]

440 cycles for 100 * xlat
424 cycles for 100 * movzx eax, byte ptr[ebx+ecx]

441 cycles for 100 * xlat
413 cycles for 100 * movzx eax, byte ptr[ebx+ecx]

13 bytes for xlat
14 bytes for movzx eax, byte ptr[ebx+ecx]

72 = eax xlat
72 = eax movzx eax, byte ptr[ebx+ecx]

--- ok ---

Intel(R) Pentium(R) M processor 1.70GHz (SSE2)

510     cycles for 100 * xlat
298     cycles for 100 * movzx eax, byte ptr[ebx+ecx]

508     cycles for 100 * xlat
295     cycles for 100 * movzx eax, byte ptr[ebx+ecx]

511     cycles for 100 * xlat
309     cycles for 100 * movzx eax, byte ptr[ebx+ecx]

503     cycles for 100 * xlat
294     cycles for 100 * movzx eax, byte ptr[ebx+ecx]

13      bytes for xlat
14      bytes for movzx eax, byte ptr[ebx+ecx]

72      = eax xlat
72      = eax movzx eax, byte ptr[ebx+ecx]

--- ok ---

Intel(R) Pentium(R) M processor 1.70GHz (SSE2)

510 cycles for 100 * xlat
297 cycles for 100 * movzx eax, byte ptr[ebx+ecx]

509 cycles for 100 * xlat
301 cycles for 100 * movzx eax, byte ptr[ebx+ecx]

509 cycles for 100 * xlat
296 cycles for 100 * movzx eax, byte ptr[ebx+ecx]

510 cycles for 100 * xlat
305 cycles for 100 * movzx eax, byte ptr[ebx+ecx]

13 bytes for xlat
14 bytes for movzx eax, byte ptr[ebx+ecx]

72 = eax xlat
72 = eax movzx eax, byte ptr[ebx+ecx]

--- ok ---



Intel(R) Core(TM) i3-4005U CPU @ 1.70GHz (SSE4)

488     cycles for 100 * xlat
480     cycles for 100 * movzx eax, byte ptr[ebx+ecx]

489     cycles for 100 * xlat
480     cycles for 100 * movzx eax, byte ptr[ebx+ecx]

490     cycles for 100 * xlat
480     cycles for 100 * movzx eax, byte ptr[ebx+ecx]

488     cycles for 100 * xlat
479     cycles for 100 * movzx eax, byte ptr[ebx+ecx]

13      bytes for xlat
14      bytes for movzx eax, byte ptr[ebx+ecx]

72      = eax xlat
72      = eax movzx eax, byte ptr[ebx+ecx]

--- ok ---

Intel(R) Core(TM) i3-4005U CPU @ 1.70GHz (SSE4)

490 cycles for 100 * xlat
486 cycles for 100 * movzx eax, byte ptr[ebx+ecx]

490 cycles for 100 * xlat
481 cycles for 100 * movzx eax, byte ptr[ebx+ecx]

486 cycles for 100 * xlat
483 cycles for 100 * movzx eax, byte ptr[ebx+ecx]

491 cycles for 100 * xlat
480 cycles for 100 * movzx eax, byte ptr[ebx+ecx]

13 bytes for xlat
14 bytes for movzx eax, byte ptr[ebx+ecx]

72 = eax xlat
72 = eax movzx eax, byte ptr[ebx+ecx]

--- ok ---

Regards,

Steve

jj2007

Re: xlat is pretty fast
« Reply #12 on: September 20, 2022, 12:39:50 AM »
Thanks, interesting :rolleyes:

HSE

Re: xlat is pretty fast
« Reply #13 on: September 20, 2022, 04:12:47 AM »
Quote:
I am curious about the performance of a sine/cosine LUT: xlat with bytes vs. another approach with a REAL4 LUT

No, Magnus. The xlat input is a byte and the output is also a byte. You can't retrieve a floating point number.

jj2007

Re: xlat is pretty fast
« Reply #14 on: September 20, 2022, 05:08:14 AM »
Quote from: HSE on September 20, 2022, 04:12:47 AM
Quote:
I am curious about the performance of a sine/cosine LUT: xlat with bytes vs. another approach with a REAL4 LUT

No, Magnus. The xlat input is a byte and the output is also a byte. You can't retrieve a floating point number.

True, but mov eax, [ebx+4*ecx] is equally fast and would work for REAL4
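
A rough sketch of such a REAL4 lookup (MASM, 32-bit; the table sinTable, its four entries and the proc name GetSine are made up for illustration, and nothing here has been timed):

```asm
.686
.model flat, stdcall

.data
; hypothetical table: sin(0), sin(30), sin(60), sin(90) degrees
sinTable REAL4 0.0, 0.5, 0.8660254, 1.0

.code
; in:  ecx = table index (0..3)
; out: eax = raw 32 bits of the REAL4, ST(0) = the same value on the FPU stack
GetSine proc uses ebx
  mov ebx, offset sinTable
  mov eax, [ebx+4*ecx]         ; 32-bit load, index scaled by the 4-byte element size
  fld REAL4 ptr [ebx+4*ecx]    ; alternative: load the entry straight into the FPU
  ret
GetSine endp
end
```

Unlike xlat, the scaled-index form is not limited to 256 one-byte entries, and the index register is left untouched.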