Min And Max function?

Started by Farabi, June 13, 2012, 01:25:00 PM

RuiLoureiro

Quote from: dedndave on June 15, 2012, 11:50:17 PM
Quote from: jj2007 on June 15, 2012, 11:36:13 PM
numbers are always a bit different because the pseudo random generator produces a new set every time

seems like they ought to test the same array   :redface:
Dave,
           Could you show what you get?

RuiLoureiro

Here is Myminmax
On exit st(0) = MIN
     and st(1) = MAX

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
Myminmax    proc    p:dword, n:dword

        mov     ecx, [esp+8]    ;n
        sub     ecx, 1
        mov     edx, [esp+4]    ;p
        fld     real4 ptr [edx+ecx*4]       ; set st(1) to MAX value
        fld     st(0)                       ; set st(0) to MIN value
        sub     ecx, 1                      ; point to the next value
  L0:
        fld     real4 ptr [edx+ecx*4]
        fcomi   st, st(1)                   ; compare st(1)=MIN with st(0)
        jae     L1
        fxch    st(1)
        jb      L2
               
  L1:   fcomi   st, st(2)                   ; compare st(2)=MAX with st(0)
        jbe     L2
        fxch    st(2)

  L2:   fstp    st
        sub     ecx, 1
        jns     L0                          ; if ecx>0 or ecx=0 loop to L0
        ret     8
Myminmax    endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

dedndave

Quote from: RuiLoureiro on June 16, 2012, 12:18:27 AM
Dave,
           Could you show what you get ?

sure   :biggrin:

prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
Getting min & max for 10000000 REAL8 values, version B:
105308 µs for FPU
32522 µs for ArrayMinMax, REAL4
46646 µs for ArrayMinMax, REAL8
27849 µs for SSE2, REAL8
101967 µs for fltminmax
21209 µs for Mymin
20266 µs for Mymin
43820 µs for Mymin+Mymax

103278 µs for FPU
29161 µs for ArrayMinMax, REAL4
48483 µs for ArrayMinMax, REAL8
27742 µs for SSE2, REAL8
101931 µs for fltminmax
20891 µs for Mymin
22931 µs for Mymin
39875 µs for Mymin+Mymax

Results:
ArrayMinMax=    -888.887786/999.998822
r4MinMax=       -888.887695/999.998962
r4MinMax=       -888.887634/999.998535
r8MinMax=       -888.887485/999.998558
SSE2Min=        -888.887614/999.998955

jj2007

Quote from: dedndave on June 15, 2012, 11:50:17 PM
Quote from: jj2007 on June 15, 2012, 11:36:13 PM
numbers are always a bit different because the pseudo random generator produces a new set every time

seems like they ought to test the same array   :redface:

RandFill proc uses ebx
  mov MbRndSeed, Mirror$("Ciao")

... but it doesn't make your algos faster :biggrin:
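
To illustrate the point about testing the same array: a minimal C sketch of a fill routine with a fixed seed, so that every run (and every algorithm) sees identical data. This is not the MasmBasic RandFill code; the function name, the LCG constants and the value range are assumptions chosen for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fill a[0..n-1] with pseudo-random floats.
   The same seed always produces the same sequence, so timings and
   results from different runs refer to the same array. */
static void fill_fixed_seed(float *a, size_t n, uint32_t seed)
{
    uint32_t s = seed;
    for (size_t i = 0; i < n; ++i) {
        s = s * 1664525u + 1013904223u;          /* simple 32-bit LCG step */
        /* top 24 bits -> [0, 1), then scale to roughly [-1000, 1000) */
        a[i] = (float)(s >> 8) / 16777216.0f * 2000.0f - 1000.0f;
    }
}
```

Calling fill_fixed_seed twice with the same seed gives two identical arrays, which is all that is needed for comparable min/max results.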

KeepingRealBusy

Here is my laptop:


Intel(R) Pentium(R) 4 CPU 3.20GHz (SSE2)
Getting min & max for 10000000 REAL8 values, version B:
72043 µs for FPU
28304 µs for ArrayMinMax, REAL4
43843 µs for ArrayMinMax, REAL8
34622 µs for SSE2, REAL8
69628 µs for fltminmax
20043 µs for Mymin
19272 µs for Mymin
38684 µs for Mymin+Mymax

72595 µs for FPU
25563 µs for ArrayMinMax, REAL4
41723 µs for ArrayMinMax, REAL8
39040 µs for SSE2, REAL8
72373 µs for fltminmax
19587 µs for Mymin
21988 µs for Mymin
43626 µs for Mymin+Mymax

Results:
ArrayMinMax=    -888.887786/999.998822
r4MinMax=       -888.887695/999.998962
r4MinMax=       -888.887634/999.998535
r8MinMax=       -888.887485/999.998558
SSE2Min=        -888.887614/999.998955


Dave.

RuiLoureiro

Thank you all for testing it
Jochen,
Quote
        but it doesn't make your algos faster
:biggrin:
          No, the result would be «00009 µs for Mymin»
          More ... or ... less !  :greensml:

MichaelW,
          Take a look at your reply #37
Quote
I converted qword's code to a procedure and applied Dave's modifications
(except the last) to my code, and did a cycle count comparison.
          Your version NEVER compares the first value (index 0),
          because you did this:

          sub ecx, 1
          jnz L0            ; when ECX=0, exit without comparing element 0

          With the last value as the seed we MUST use jns and NOT jnz:

          sub ecx, 1
          jns L0            ; when ECX>0 or ECX=0, loop and COMPARE
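
The difference can be modelled in C. This is only a sketch of the index arithmetic, not of the FPU code: both functions seed the minimum with the last element, and the names min_jns and min_jnz are made up for the example. The model checks the loop condition before the body while the asm checks it after, so the two agree for arrays of at least three elements.

```c
/* jns-style exit: the loop also runs for index 0 */
static float min_jns(const float *a, int n)
{
    float m = a[n - 1];                  /* seed = last element */
    for (int i = n - 2; i >= 0; --i)     /* visits n-2 ... 0 */
        if (a[i] < m) m = a[i];
    return m;
}

/* jnz-style exit: the loop stops when the index reaches 0,
   so a[0] is never compared */
static float min_jnz(const float *a, int n)
{
    float m = a[n - 1];                  /* seed = last element */
    for (int i = n - 2; i > 0; --i)      /* visits n-2 ... 1 */
        if (a[i] < m) m = a[i];
    return m;
}
```

For an array whose smallest value sits at index 0, for example {1, 5, 3, 4}, min_jns finds 1 while min_jnz misses it and reports 3.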

KeepingRealBusy,
Hi Dave,
          Take a look at your reply #31:
Quote
Always seed max and min with the first value, predecrement the index, 
then loop with jnz (since the first element was the basis of min/max) instead of jns

          No. Here we need to use jns instead of jnz.
          The problem is this part: "the first element was the basis of min/max".
          In this code the seed element is the last one, at ECX=length_of_array-1,
          and not the one at ECX=0.
          In any case, if we scan from the first (ECX=0) to the last (ECX=length_of_array-1),
          we compare ECX with length_of_array:
          add   ecx, 1
          cmp   ecx, length_of_array
          jne   L?

KeepingRealBusy

Rui,

My statement is correct. When I said First, I meant First (element index 0 - whatever the array pointer points to). When the index is used as both an index and a count, the first COMPARE is the last element against the low (if min testing) or against the high (if max testing); the index/count is then decremented, and index 0 is skipped because its value was initially put into min or max (or both if testing min/max together).

Dave.
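
The scheme described in this post can be sketched in C as follows. It is a model under assumptions, not anyone's posted code: the function name is invented, and only the min case is shown. The seed is element 0, the index starts at the last element and doubles as the remaining count, and reaching 0 ends the loop because element 0 is already in the result.

```c
#include <stddef.h>

/* Seed with a[0]; i is both index and count, running n-1 ... 1 */
static float min_seed_first(const float *a, size_t n)
{
    float m = a[0];                      /* first element is the basis */
    for (size_t i = n - 1; i != 0; --i)  /* index 0 is never revisited */
        if (a[i] < m) m = a[i];
    return m;
}
```

Because this model tests the index before the first load, it also returns a[0] for n = 1; a loop that tests only at the bottom needs an extra check for that case.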

RuiLoureiro

#52
Quote from: KeepingRealBusy on June 16, 2012, 02:25:17 AM
Rui,

My statement is correct. When I said First, I meant First (element index 0 - whatever the array pointer points to). When the index is used as both an index and a count, the first COMPARE is the last element against the low (if min testing) or against the high (if max testing); the index/count is then decremented, and index 0 is skipped because its value was initially put into min or max (or both if testing min/max together).
Dave
Dave,
        Yes, this statement is completely correct.
        Now I understand! ;)
        But I would say it is out of context, because we are using
        the last value as the first min/max and not «element index 0».
        That is the problem in MichaelW's version: he changed jns to jnz
        but did not change the seed to the first element.
        Following your suggestion, we can write a proc with one
        instruction less (the first sub ecx, 1 is no longer needed).

Here are the new procedures

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
Mymin   proc    p:dword, n:dword

        mov     ecx, [esp+8]    ;n (assumes n >= 2; with n = 1 the loop runs outside the array)
        ;sub     ecx, 1
        mov     edx, [esp+4]    ;p
        ;fld     real4 ptr [edx+ecx*4]       ; set st(0) to MIN value
        fld     real4 ptr [edx]             ; set st(0) to MIN value (element 0)
        sub     ecx, 1                      ; ecx = n-1, index of the last element
  L0:
        fld     real4 ptr [edx+ecx*4]
        fcomi   st, st(1)                   ; compare st(1)=MIN with st(0)
        jae     L1
        fstp    st(1)                       ; remove the old MIN; the new value is now the MIN
        sub     ecx, 1
       ;jns     L0                          ; if ecx>0 or ecx=0 loop to L0       
        jnz     L0                          ; if ecx>0 loop to L0
        ret     8
  L1:   fstp    st
        sub     ecx, 1
       ;jns     L0                          ; if ecx>0 or ecx=0 loop to L0       
        jnz     L0                          ; if ecx>0 loop to L0
        ret     8
Mymin   endp

Mymax   proc    p:dword, n:dword

        mov     ecx, [esp+8]    ;n
        ;sub     ecx, 1
        mov     edx, [esp+4]    ;p
        ;fld     real4 ptr [edx+ecx*4]       ; set st(0) to MAX value
        fld     real4 ptr [edx]             ; set st(0) to MAX value (element 0)
        sub     ecx, 1                      ; ecx = n-1, index of the last element
  L0:
        fld     real4 ptr [edx+ecx*4]       ; load the next value
        fcomi   st, st(1)                   ; compare st(1)=MAX with st(0)
        jbe     L1
       
        fstp    st(1)                       ; remove the old MAX; the new value is now the MAX
        sub     ecx, 1
        ;jns     L0                          ; if ecx>0 or ecx=0 loop to L0       
        jnz     L0                          ; if ecx>0 loop to L0
        ret     8               
  L1:
        fstp    st
        sub     ecx, 1
        ;jns     L0                          ; if ecx>0 or ecx=0 loop to L0       
        jnz     L0                          ; if ecx>0 loop to L0
        ret     8
Mymax   endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef


This is the new minmax


OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
Myminmax    proc    p:dword, n:dword

        mov     ecx, [esp+8]    ;n
        ;sub     ecx, 1
        mov     edx, [esp+4]    ;p
       ;fld     real4 ptr [edx+ecx*4]       ; set st(1) to MAX value
        fld     real4 ptr [edx]             ; set st(1) to MAX value       
        fld     st(0)                       ; set st(0) to MIN value
        sub     ecx, 1                      ; point to the next value
  L0:
        fld     real4 ptr [edx+ecx*4]
        fcomi   st, st(1)                   ; compare st(1)=MIN with st(0)
        jae     L1
        fxch    st(1)
        jb      L2
               
  L1:   fcomi   st, st(2)                   ; compare st(2)=MAX with st(0)
        jbe     L2
        fxch    st(2)

  L2:   fstp    st
        sub     ecx, 1
        ;jns     L0                          ; if ecx>0 or ecx=0 loop to L0
        jnz     L0                          ; if ecx>0 loop to L0       
        ret     8
Myminmax    endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
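
For checking the results, the control flow of this Myminmax can be modelled in C. This is a sketch, not a translation of the FPU stack handling: the function name and the output parameters are invented, NaN inputs are ignored, and the n = 1 guard is an addition that the posted asm does not have.

```c
#include <stddef.h>

/* Model of the new Myminmax: seed both values with a[0],
   then walk the index from n-1 down to 1. */
static void minmax_model(const float *a, size_t n, float *lo, float *hi)
{
    float mn = a[0], mx = a[0];
    size_t i = n - 1;
    if (i != 0) {                    /* guard added here; not in the asm */
        do {
            float v = a[i];
            if (v < mn)              /* below MIN: new MIN, skip the MAX test */
                mn = v;
            else if (v > mx)         /* L1: only reached when v >= MIN */
                mx = v;
        } while (--i != 0);          /* sub ecx, 1 / jnz L0 */
    }
    *lo = mn;
    *hi = mx;
}
```

The else branch mirrors the jb L2 in the asm: a value that lowers the minimum cannot also raise the maximum, so the second comparison is skipped for it.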


KeepingRealBusy

Rui,

Under those conditions your code is correct and you need the jns to compare the first element.

You are setting min and max to the last value, and then comparing the next-to-last value and each earlier one in turn against the min and max values.

Dave.

RuiLoureiro

Dave,
       Did you see the last code I posted?
       It seems to be better. The first Min and Max is now
       the «element index 0», so I used jnz because
       we don't need to compare that element again
       (if ecx=0, exit).

KeepingRealBusy

Rui,

I downloaded one version and tested the .exe (see above), but I have not yet examined all of the code, so I'm not sure that I have your latest version.

Dave.

jj2007

Quote from: RuiLoureiro on June 16, 2012, 02:48:54 AM
This is the new minmax

Pretty fast, second best on my CPU:

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
Getting min & max for 10000000 REAL4 and REAL8 values:
47 ms for FPU
70 ms for ArrayMinMax, REAL4
77 ms for ArrayMinMax, REAL8
31 ms for SSE2JJ, REAL8
45 ms for fltminmax
36 ms for Myminmax, REAL4

47 ms for FPU
70 ms for ArrayMinMax, REAL4
78 ms for ArrayMinMax, REAL8
31 ms for SSE2JJ, REAL8
46 ms for fltminmax
36 ms for Myminmax, REAL4

Results:
ArrayMinMax=    -888.887818/999.998966
r4MinMax=       -888.887817/999.998962
r4MinMax=       -888.887817/999.998962
r8MinMax=       -888.887818/999.998966
SSE2Min=        -888.887818/999.998966

51       bytes for fltminmax
42       bytes for Myminmax
29       bytes for SSE2JJ
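
The last-digit differences between the REAL4 and REAL8 lines in the results are consistent with single-precision rounding, assuming the REAL4 array holds the same values converted from REAL8. A minimal C sketch (the helper name is made up for the example):

```c
/* Round a REAL8 (double) value to REAL4 (float) precision */
static float to_real4(double x)
{
    return (float)x;
}
```

Near 1000 a float can only step in multiples of 1/16384, so to_real4(999.998966) is 999.9989624..., which prints as 999.998962 with six decimals, and to_real4(-888.887818) is -888.8878173..., which prints as -888.887817.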

RuiLoureiro

Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
Getting min & max for 10000000 REAL4 and REAL8 values:
120 ms for FPU
37 ms for ArrayMinMax, REAL4
54 ms for ArrayMinMax, REAL8
28 ms for SSE2a, REAL8
26 ms for SSE2b, REAL8
112 ms for fltminmax
24 ms for Mymin
30 ms for Mymin
47 ms for Mymin+Mymax
39 ms for Myminmax

109 ms for FPU
36 ms for ArrayMinMax, REAL4
81 ms for ArrayMinMax, REAL8
37 ms for SSE2a, REAL8
29 ms for SSE2b, REAL8
109 ms for fltminmax
24 ms for Mymin
24 ms for Mymin
50 ms for Mymin+Mymax
36 ms for Myminmax

Results:
ArrayMinMax=    -888.887818/999.998966
r4MinMax=       -888.887817/999.998962
r4MinMax=       -888.887817/999.998962
r8MinMax=       -888.887818/999.998966
SSE2Min=        -888.887818/999.998966

Jochen, one of the two Mymin lines is really Mymax!

dedndave

prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
Getting min & max for 10000000 REAL4 and REAL8 values:
109 ms for FPU
31 ms for ArrayMinMax, REAL4
48 ms for ArrayMinMax, REAL8
28 ms for SSE2a, REAL8
29 ms for SSE2b, REAL8
103 ms for fltminmax
20 ms for Mymin
23 ms for Mymin
42 ms for Mymin+Mymax
33 ms for Myminmax

104 ms for FPU
31 ms for ArrayMinMax, REAL4
46 ms for ArrayMinMax, REAL8
28 ms for SSE2a, REAL8
26 ms for SSE2b, REAL8
105 ms for fltminmax
20 ms for Mymin
20 ms for Mymin
44 ms for Mymin+Mymax
32 ms for Myminmax

Results:
ArrayMinMax=    -888.887818/999.998966
r4MinMax=       -888.887817/999.998962
r4MinMax=       -888.887817/999.998962
r8MinMax=       -888.887818/999.998966
SSE2Min=        -888.887818/999.998966

jj2007

I've streamlined it a bit, and added code sizes :biggrin:

(oh, and I forgot: it's now exactly the same array - have you noticed how fast...  :eusa_boohoo:)