Optimizing some code

Started by RuiLoureiro, June 10, 2014, 06:54:45 PM


dedndave

Quote from: RuiLoureiro on June 11, 2014, 11:07:08 PM

Quote: yep, that was better
What ? Did you see it ? Where is it ? Show us.
We should compare only comparable things.

the test depends on what you're after
most strings are not aligned
well - BSTR's are - and strings that are inside structures probably are
otherwise... you have to devise a test that tests all alignments

just as an example, i attached a test to look at

1) select a single core, and wait 750 ms to bind before testing
2) select loop counts that yield ~0.5 seconds per pass
3) all alignments are tested
(notice that each string is differently aligned)
4) 16 strings are tested - the overall result is divided by 16 and rounded to nearest
5) you get fewer outliers if you open a console window and type the program name than if you click on it
6) the test should show the processor (this one does not)   :P
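The steps above could be sketched roughly as follows in C. This is not the attached test, only an illustration of points 3 and 4: the same string is placed at each of the 16 offsets from a 16-byte boundary, every placement is timed, and the total is divided by 16 and rounded to nearest. The names `time_all_alignments`, `avg16_rounded`, `ALIGNMENTS` and `MAX_LEN` are made up for this sketch, and `clock()` ticks stand in for a cycle counter. Binding to a single core and the 750 ms wait are platform specific and are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define ALIGNMENTS 16      /* one placement per offset 0..15 (hypothetical name) */
#define MAX_LEN    1024    /* longest test string this sketch supports */

typedef size_t (*strlen_fn)(const char *);

/* Divide a total by 16 and round to nearest (ties round up). */
static uint64_t avg16_rounded(uint64_t total)
{
    return (total + ALIGNMENTS / 2) / ALIGNMENTS;
}

/* Time `loops` calls of fn on a string of length len placed at every
   offset 0..15 from a 16-byte-aligned base. Returns the average number
   of clock() ticks per alignment, or UINT64_MAX if len is too large or
   fn returns a wrong length. */
static uint64_t time_all_alignments(strlen_fn fn, size_t len, unsigned loops)
{
    static char raw[ALIGNMENTS + MAX_LEN + 1 + 15];
    /* round the start of raw up to the next 16-byte boundary */
    char *base = (char *)(((uintptr_t)raw + 15) & ~(uintptr_t)15);
    volatile size_t sink = 0;   /* keeps the calls from being optimized away */
    uint64_t total = 0;

    assert(((uintptr_t)base & 15) == 0);
    if (len > MAX_LEN)
        return UINT64_MAX;

    for (int off = 0; off < ALIGNMENTS; off++) {
        char *s = base + off;   /* each offset gives a different alignment */
        memset(s, 'x', len);
        s[len] = '\0';
        if (fn(s) != len)       /* check the result before timing it */
            return UINT64_MAX;

        clock_t t0 = clock();
        for (unsigned i = 0; i < loops; i++)
            sink += fn(s);
        total += (uint64_t)(clock() - t0);
    }
    (void)sink;
    return avg16_rounded(total);
}
```

A call such as `time_all_alignments(strlen, 100, 1000000)` would give one averaged figure for a 100-byte string. Since `clock()` is coarse, the loop count has to be large enough for the ticks to mean anything.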

Gunther

Results for Jochen:

Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)

9731    cycles for 100 * Rui
1146    cycles for 100 * MB
3871    cycles for 100 * strlen32
2730    cycles for 100 * strlen32M
8707    cycles for 100 * Habran
5088    cycles for 100 * ShortLen (Dave)
5891    cycles for 100 * slen (Hutch)

8489    cycles for 100 * Rui
2387    cycles for 100 * MB
3879    cycles for 100 * strlen32
2722    cycles for 100 * strlen32M
8698    cycles for 100 * Habran
5101    cycles for 100 * ShortLen (Dave)
5866    cycles for 100 * slen (Hutch)

18823   cycles for 100 * Rui
2029    cycles for 100 * MB
7357    cycles for 100 * strlen32
7455    cycles for 100 * strlen32M
20309   cycles for 100 * Habran
12587   cycles for 100 * ShortLen (Dave)
17441   cycles for 100 * slen (Hutch)

20010   cycles for 100 * Rui
2033    cycles for 100 * MB
7387    cycles for 100 * strlen32
7442    cycles for 100 * strlen32M
19070   cycles for 100 * Habran
12663   cycles for 100 * ShortLen (Dave)
18717   cycles for 100 * slen (Hutch)

100     = eax Rui
100     = eax MB
100     = eax strlen32
100     = eax strlen32M
100     = eax Habran
100     = eax ShortLen (Dave)
100     = eax slen (Hutch)

--- ok ---


Results for Rui:


Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
STRLEN test:
short - <1,3,6,10,20,30,40,50,70>
long  - <100,200,300,400,500,700,1000>
------------------------------------------------------
920     cycles - std
775     cycles - Rui
562     cycles - habran
378     cycles - Dave
544     cycles - Hutch
406     cycles - JJ
332     cycles - Rui32
246     cycles - Rui32M
209     cycles - JJ2

14480   cycles - std
10105   cycles - Rui
14099   cycles - habran
8649    cycles - Dave
9680    cycles - Hutch
7488    cycles - JJ
4513    cycles - Rui32
4634    cycles - Rui32M
1162    cycles - JJ2

--- ok ---


Gunther
You have to know the facts before you can distort them.

RuiLoureiro

Dave,
          must show the proc.
          it must be clear. otherwise i give him 0.
          if in the school, he must explain all bits.
          This is my rule:
          i don't accept any results unless
          you show your exercise.

Gunther

Rui,

Quote from: RuiLoureiro on June 11, 2014, 11:35:46 PM
Dave,
          must show the proc.
          it must be clear. otherwise i give him 0.
          if in the school, he must explain all bits.
          This is my rule:
          i don't accept any results unless
          you show your exercise.
but slen.zip contains the source. What's your point?


nidud

#34
deleted

RuiLoureiro

Gunther,
              what are you talking about ?
              Could you post here the proc
              that i am talking about ?

Gunther

Rui,

no offense. But the zip archive under post #30 contains the source. Or do you mean another procedure?


RuiLoureiro

No, I am not talking about it.
Do you know this:
This is the place to post assembler algorithms and code design for discussion, optimisation and any other...
You may do any tests you want to do with the code
i posted. I want to do the same. That's the point.
Read my reply 10.

nidud,
          it is not correct to call "Rui32" and "Rui32M" but
          AgnerFog.

:biggrin: :biggrin:
EDIT: Gunther,
                      Could I have a discussion with you about "gambuzinos"?
(A "gambuzino" is a creature that no one has ever seen or caught. Sometimes we tell someone: go hunt "gambuzinos".)

Gunther

Rui,

Quote from: RuiLoureiro on June 12, 2014, 12:46:41 AM
EDIT: Gunther,
                      Could I have a discussion with you about "gambuzinos"?
(A "gambuzino" is a creature that no one has ever seen or caught. Sometimes we tell someone: go hunt "gambuzinos".)

I've read your post #10. I think that I'm talking about algorithms and posting test results (not only in your thread); I'm not talking about gambuzinos, Yetis and other impossibilities.

But anyway, it's your thread. My apologies, I won't post in your threads in the future.


RuiLoureiro

Hi
What did you do wrong, Gunther ?
I never saw anything wrong. It's clear.
My apology.

dedndave

fixed my routine - and added ShowCpu   :P
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
134 135 135 135 134
141 141 141 140 141

FORTRANS

Hi Dave,

   Here are some results.

Pre-Pentium4 (SSE1)
106 106 106 106 106
104 104 104 104 104
Press any key to continue ...

Intel(R) Pentium(R) M processor 1.70GHz (SSE2)
103 104 104 104 103
111 111 111 111 111
Press any key to continue ...


HTH,

Steve N.


Gunther

Dave,

results from slen2.exe:

Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
55 51 52 51 51
77 77 77 77 77
Press any key to continue ...



nidud

#43
deleted

dedndave

Prescott w/ HTT
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
------------------------------------------------------
79459   cycles - 0: standard (scasb)
7603    cycles - 1: AgnerFog
7785    cycles - 2: AgnerFog (unaligned)
9866    cycles - 3: Dave

16232   cycles - 0: standard (scasb)
7615    cycles - 1: AgnerFog
9145    cycles - 2: AgnerFog (unaligned)
10294   cycles - 3: Dave

16081   cycles - 0: standard (scasb)
7759    cycles - 1: AgnerFog
7763    cycles - 2: AgnerFog (unaligned)
10762   cycles - 3: Dave


you might want to increase the loop counts for better stability
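One possible way to pick a loop count, sketched in C under my own assumptions (none of this is from the posted test): keep doubling the count until one pass takes at least a target time, such as the ~0.5 seconds mentioned earlier in the thread. `calibrate_loops`, `strlen_pass` and `pass_fn` are hypothetical names, and `clock()` is used only because it is portable.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

typedef void (*pass_fn)(unsigned long loops);

/* Example pass (hypothetical): call strlen `loops` times on a fixed string. */
static void strlen_pass(unsigned long loops)
{
    static const char text[] = "an example string to measure";
    const char *volatile p = text;   /* volatile pointer: re-read every iteration */
    volatile size_t sink = 0;        /* keeps the calls from being optimized away */

    for (unsigned long i = 0; i < loops; i++)
        sink += strlen(p);
    (void)sink;
}

/* Double the loop count until one pass takes at least target_sec seconds,
   or until doubling again would exceed max_loops. Returns the count used
   for the last pass. */
static unsigned long calibrate_loops(pass_fn run_pass, double target_sec,
                                     unsigned long max_loops)
{
    unsigned long loops = 1;

    for (;;) {
        clock_t t0 = clock();
        run_pass(loops);
        double elapsed = (double)(clock() - t0) / CLOCKS_PER_SEC;

        if (elapsed >= target_sec || loops > max_loops / 2)
            return loops;
        loops *= 2;
    }
}
```

For example, `calibrate_loops(strlen_pass, 0.5, 1UL << 30)` would return a count whose pass runs for roughly half a second or more on the machine at hand. The cap only guards against a pass that never reaches the target.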