Optimizing some code

dedndave · June 11, 2014, 11:14:33 PM

Quote from: RuiLoureiro on June 11, 2014, 11:07:08 PM

Quoteyep, that was better
What ? Did you see it ? Where is it ? Show us.
We should compare only comparable things.

the test depends on what you're after
most strings are not aligned
well - BSTR's are - and strings that are inside structures probably are
otherwise... you have to devise a test that tests all alignments

just as an example, i attached a test to look at

1) select a single core, and wait 750 mS to bind before testing
2) select loop counts that yield ~0.5 seconds per pass
3) all alignments are tested
(notice that each string is differently aligned)
4) 16 strings are tested - the overall result is divided by 16 and rounded to nearest
5) you get fewer outliers if you open a console window and type the program name than if you click on it
6) the test should show the processor (this one does not) :P

Gunther · June 11, 2014, 11:33:41 PM

Results for Jochen:

Code Select


Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)

9731    cycles for 100 * Rui
1146    cycles for 100 * MB
3871    cycles for 100 * strlen32
2730    cycles for 100 * strlen32M
8707    cycles for 100 * Habran
5088    cycles for 100 * ShortLen (Dave)
5891    cycles for 100 * slen (Hutch)

8489    cycles for 100 * Rui
2387    cycles for 100 * MB
3879    cycles for 100 * strlen32
2722    cycles for 100 * strlen32M
8698    cycles for 100 * Habran
5101    cycles for 100 * ShortLen (Dave)
5866    cycles for 100 * slen (Hutch)

18823   cycles for 100 * Rui
2029    cycles for 100 * MB
7357    cycles for 100 * strlen32
7455    cycles for 100 * strlen32M
20309   cycles for 100 * Habran
12587   cycles for 100 * ShortLen (Dave)
17441   cycles for 100 * slen (Hutch)

20010   cycles for 100 * Rui
2033    cycles for 100 * MB
7387    cycles for 100 * strlen32
7442    cycles for 100 * strlen32M
19070   cycles for 100 * Habran
12663   cycles for 100 * ShortLen (Dave)
18717   cycles for 100 * slen (Hutch)

100     = eax Rui
100     = eax MB
100     = eax strlen32
100     = eax strlen32M
100     = eax Habran
100     = eax ShortLen (Dave)
100     = eax slen (Hutch)

--- ok ---

Results for Rui:

Code Select



Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
STRLEN test:
short - <1,3,6,10,20,30,40,50,70>
long  - <100,200,300,400,500,700,1000>
------------------------------------------------------
920     cycles - std
775     cycles - Rui
562     cycles - habran
378     cycles - Dave
544     cycles - Hutch
406     cycles - JJ
332     cycles - Rui32
246     cycles - Rui32M
209     cycles - JJ2

14480   cycles - std
10105   cycles - Rui
14099   cycles - habran
8649    cycles - Dave
9680    cycles - Hutch
7488    cycles - JJ
4513    cycles - Rui32
4634    cycles - Rui32M
1162    cycles - JJ2

--- ok ---

Gunther

RuiLoureiro · June 11, 2014, 11:35:46 PM

Dave,
must show the proc.
it must be clear. otherwise i give him 0.
if in the school, he must explain all bits.
This is my rule:
i don't accept any results unless
you show your exercise.

Gunther · June 12, 2014, 12:14:01 AM

Rui,

Quote from: RuiLoureiro on June 11, 2014, 11:35:46 PM
Dave,
must show the proc.
it must be clear. otherwise i give him 0.
if in the school, he must explain all bits.
This is my rule:
i don't accept any results unless
you show your exercise.

but slen.zip contains the source. What's your point?

Gunther

nidud · June 12, 2014, 12:23:04 AM

deleted

RuiLoureiro · June 12, 2014, 12:40:12 AM

Gunther,
what are you talking about ?
Could you post here the proc
that i am talking about ?

Gunther · June 12, 2014, 12:42:41 AM

Rui,

no offense. But the zip archive under post #30 contains the source. Or do you mean another procedure?

Gunther

RuiLoureiro · June 12, 2014, 12:46:41 AM

No, iam not talking about it.
Do you know this:
This is the place to post assembler algorithms and code design for discussion, optimisation and any other...
You may do any tests you want to do with the code
i posted. I want to do the same. That's the point.
Read my reply 10.

nidud,
it is not correct to call "Rui32" and "Rui32M" but
AgnerFog.

EDIT: Gunther,
Could i have a discussion with you about "gambuzinos" ?
("gambuzino" is a creature that noone never saw him, noone never catch him. Sometimes we say to one: go to hunt "gambuzinos".)

Gunther · June 12, 2014, 04:57:59 AM

Rui,

Quote from: RuiLoureiro on June 12, 2014, 12:46:41 AM
EDIT: Gunther,
Could i have a discussion with you about "gambuzinos" ?
("gambuzino" is a creature that noone never saw him, noone never catch him. Sometimes we say to one: go to hunt "gambuzinos".)

I've read your post #10. I think that I'm talking about algorithms, I'm posting test results (not only in your thread), I'm not talking about gambuzinos, Yetis and other impossibilities.

But anyway, it's your thread. My apology, I won't post into your threads in the future.

Gunther

RuiLoureiro · June 12, 2014, 05:53:36 AM

Hi
What did you do wrong, Gunther ?
I never saw anything wrong. It's clear.
My apology.

dedndave · June 12, 2014, 09:36:12 AM

fixed my routine - and added ShowCpu :P

Code Select

Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
134 135 135 135 134
141 141 141 140 141

FORTRANS · June 12, 2014, 11:23:18 PM

Hi Dave,

Here are some results.

Code Select

Pre-Pentium4 (SSE1)
106 106 106 106 106
104 104 104 104 104
Press any key to continue ...

Intel(R) Pentium(R) M processor 1.70GHz (SSE2)
103 104 104 104 103
111 111 111 111 111
Press any key to continue ...

HTH,

Steve N.

Gunther · June 12, 2014, 11:45:57 PM

Dave,

results from slen2.exe:

Code Select


Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
55 51 52 51 51
77 77 77 77 77
Press any key to continue ...

Gunther

nidud · June 15, 2014, 01:28:58 AM

deleted

dedndave · June 15, 2014, 01:33:34 AM

prescott w/htt

Code Select

Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
------------------------------------------------------
79459   cycles - 0: standard (scasb)
7603    cycles - 1: AgnerFog
7785    cycles - 2: AgnerFog (unaligned)
9866    cycles - 3: Dave

16232   cycles - 0: standard (scasb)
7615    cycles - 1: AgnerFog
9145    cycles - 2: AgnerFog (unaligned)
10294   cycles - 3: Dave

16081   cycles - 0: standard (scasb)
7759    cycles - 1: AgnerFog
7763    cycles - 2: AgnerFog (unaligned)
10762   cycles - 3: Dave

you might want to increase the loop counts for better stability

The MASM Forum

News: