Test results for AVX and AVX-512 needed

Started by Gunther, December 21, 2017, 11:43:25 AM

Gunther

Hi aw27,

Quote from: aw27
Learning assembly language is important, even on the day compilers are able to do better than humans in every case. That day will arrive.
Right. Every programmer should know how computers work internally, how hardware is accessed, and so on. We'll see whether that day comes.
Quote
I still remember the days when most people believed it was impossible for a machine to win at chess against a Grand Master, because machines have no global vision, could not recognize patterns, had no sense of position, and were not able to think in strategic terms; they could only use brute force. They were wrong: most chess programs nowadays easily beat every chess Grand Master.
That's more complicated than it seems at first glance. Here is one of the best rating lists of chess engines. It's updated at least weekly and is very precise. I think your statement is true for the top scorers: Stockfish (by the way, Asmfish is a Stockfish derivative), Komodo, Houdini, Shredder etc.

I'm not a top correspondence chess player; my cc ELO is around 2300. By comparison, the world number one has an ELO of 2688, because we don't have the usual ELO inflation. Our ELO numbers are calculated a bit differently, but that approach has proven itself. I use chess engines daily, and with a little luck I will qualify for the semifinals of the European Championship. However, I had to learn a lot the hard way. It's wrong to think: I'm using a chess engine now and will beat everyone else. You have to feed the chess engine with your own strategic ideas and then check that you have not overlooked any tactical finesse. That's the art. The only thing chess engines do is brute force. It has often happened to me that the engine suggests moves that ruin the pawn structure. That's poison for the entire game. So you have to look for an alternative, and you often need a second opinion. But that's a big field and if anyone wants to keep discussing these questions, we should do that in a separate thread in the Soap Box. I still have a lot to say about chess engines.

By the way: what does your machine say now with the updated software in floatasm.zip?

Gunther
You have to know the facts before you can distort them.

aw27

@Felipe
We are in the era of self-driving cars! Any cheap robot can kick that stupid cat out of the window, reducing its many lives by one!

felipe

:biggrin:

Btw:
Quote from: Gunther on December 23, 2017, 04:50:27 AM
But that's a big field and if anyone wants to keep discussing these questions, we should do that in a separate thread in the Soap Box. I still have a lot to say about chess engines.

:t

six_L

Quote
Calculating the sum of a float array in 5 different variants.
That'll take a little while. Please be patient ...

Simple C implementation:
------------------------
sum1              = 8390656.00
Elapsed Time      = 62.32 Seconds

C implementation with 4 accumulators:
-------------------------------------
sum2              = 8390656.00
Elapsed Time      = 44.33 Seconds
Performance Boost = 141%

Assembly language with 4 XMM accumulators:
------------------------------------------
sum3              = 8390656.00
Elapsed Time      = 5.80 Seconds
Performance Boost = 1075%

Assembly Language with 4 YMM accumulators:
------------------------------------------
sum4              = 8390656.00
Elapsed Time      = 2.91 Seconds
Performance Boost = 2145%

Your current CPU doesn't support the AVX-512 instruction set.
You'll need at least the Knights Landing or Skylake architecture.

The application terminates now.
Quote
Calculating the sum of a float array in 5 different variants.
That'll take a little while. Please be patient ...

Simple C with assembly code generated by VS:
--------------------------------------------
sum1              = 8390656.00
Elapsed Time      = 62.45 Seconds

C and 4 accumulators with assembly code generated by VS:
--------------------------------------------------------
sum2              = 8390656.00
Elapsed Time      = 20.63 Seconds
Performance Boost = 303%

Assembly language with 4 XMM accumulators:
------------------------------------------
sum3              = 8390656.00
Elapsed Time      = 5.19 Seconds
Performance Boost = 1204%

Assembly Language with 4 YMM accumulators:
------------------------------------------
sum4              = 8390656.00
Elapsed Time      = 2.62 Seconds
Performance Boost = 2381%

Your current CPU doesn't support the AVX-512 instruction set.
You'll need at least the Knights Landing or Skylake architecture.

The application terminates now.
Say you, Say me, Say the codes together for ever.
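
The first two variants in the output above can be sketched in C as follows. This is only an illustration, not the code from floatasm.zip: the function names are made up, and the array contents are an assumption. The printed sum 8390656 equals 1 + 2 + ... + 4096, so filling the array with those values would reproduce it exactly in single precision.

```c
#include <stddef.h>

/* Simple variant: one accumulator, so every add waits for the previous one. */
float sum_simple(const float *a, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i)
        s += a[i];
    return s;
}

/* Four independent accumulators: the four adds per iteration do not
   depend on each other, so the CPU can overlap them. */
float sum_4acc(const float *a, size_t n)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    /* Remaining 0..3 elements when n is not a multiple of 4. */
    for (; i < n; ++i)
        s0 += a[i];

    return (s0 + s1) + (s2 + s3);
}
```

Note that the two functions add the elements in a different order, so for general float data the results can differ in the last digits. With small integer values like the ones assumed here, every partial sum is exactly representable and both agree.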

Gunther

Thank you six_L for running the software and providing the results. What's your environment? I assume at least Windows 7-64. The processor would be interesting: Intel or AMD?

Gunther
You have to know the facts before you can distort them.

Gunther

What I mean is this:
Quote from: felipe
Haha, and here we go again, right?  ;)

:lol:

I would say, if that's totally true, then machines and computers will do everything better some day, but I think that's not correct. It's just a simple generalization. Humans will always be smarter than machines, even if we don't realize it.  :biggrin:

Btw, I always question the real importance of chess. Maybe it's a stupid game. Humans have given machines the role of doing stupid and brutal things, to a large extent. So they can win a chess game, but a cat can piss on a computer.  :lol:

or that:
Quote from: aw27
@Felipe
We are in the era of self-driving cars! Any cheap robot can kick that stupid cat out of the window, reducing its many lives by one!

I am not the senior teacher here, just a simple forum member in the last row. Would it not be better to discuss such deep philosophical questions in separate threads in the Soap Box or in the Colosseum?

Gunther
You have to know the facts before you can distort them.

felipe



Calculating the sum of a float array in 5 different variants.
That'll take a little while. Please be patient ...

Simple C with assembly code generated by VS:
--------------------------------------------
sum1              = 8390656.00
Elapsed Time      = 75.10 Seconds

C and 4 accumulators with assembly code generated by VS:
--------------------------------------------------------
sum2              = 8390656.00
Elapsed Time      = 25.73 Seconds
Performance Boost = 292%

Assembly language with 4 XMM accumulators:
------------------------------------------
sum3              = 8390656.00
Elapsed Time      = 6.63 Seconds
Performance Boost = 1132%

Your current CPU doesn't support the AVX instruction set.
You'll need at least the Sandy Bridge or Ivy Bridge architecture.

The application terminates now.

Your current CPU doesn't support the AVX-512 instruction set.
You'll need at least the Knights Landing or Skylake architecture.

The application terminates now.



Windows 8.1... and an old Bay Trail  :redface:

six_L

Say you, Say me, Say the codes together for ever.

felipe

Yeah, that's what I was saying with this:

Quote from: felipe on December 23, 2017, 05:12:33 AM
Btw:
Quote from: Gunther on December 23, 2017, 04:50:27 AM
But that's a big field and if anyone wants to keep discussing these questions, we should do that in a separate thread in the Soap Box. I still have a lot to say about chess engines.
:t


nidud

#39
deleted

FORTRANS

Hi Gunther,

   i3, Win 8.1, notebook.


Calculating the sum of a float array in 5 different variants.
That'll take a little while. Please be patient ...

Simple C implementation:
------------------------
sum1              = 8390656.00
Elapsed Time      = 108.83 Seconds

C implementation with 4 accumulators:
-------------------------------------
sum2              = 8390656.00
Elapsed Time      = 73.05 Seconds
Performance Boost = 149%

Assembly language with 4 XMM accumulators:
------------------------------------------
sum3              = 8390656.00
Elapsed Time      = 9.08 Seconds
Performance Boost = 1199%

Assembly Language with 4 YMM accumulators:
------------------------------------------
sum4              = 8390656.00
Elapsed Time      = 4.59 Seconds
Performance Boost = 2369%

Your current CPU doesn't support the AVX-512 instruction set.
You'll need at least the Knights Landing or Skylake architecture.

The application terminates now.

Calculating the sum of a float array in 5 different variants.
That'll take a little while. Please be patient ...

Simple C with assembly code generated by VS:
--------------------------------------------
sum1              = 8390656.00
Elapsed Time      = 107.92 Seconds

C and 4 accumulators with assembly code generated by VS:
--------------------------------------------------------
sum2              = 8390656.00
Elapsed Time      = 36.27 Seconds
Performance Boost = 298%

Assembly language with 4 XMM accumulators:
------------------------------------------
sum3              = 8390656.00
Elapsed Time      = 9.08 Seconds
Performance Boost = 1189%

Assembly Language with 4 YMM accumulators:
------------------------------------------
sum4              = 8390656.00
Elapsed Time      = 4.58 Seconds
Performance Boost = 2357%

Your current CPU doesn't support the AVX-512 instruction set.
You'll need at least the Knights Landing or Skylake architecture.

The application terminates now.


HTH,

Steve N.
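
The Performance Boost figures in these outputs appear to be the time of the simple C variant divided by the time of the faster variant, expressed in percent. For the run above, 108.83 / 73.05 is about 1.49, which matches the printed 149%. A minimal sketch, assuming that formula (it is inferred from the numbers, not confirmed by the program's source; the function name is hypothetical):

```c
/* Boost in percent: baseline time divided by the variant's time, times 100.
   Assumed formula, inferred from the printed results. */
double performance_boost(double baseline_seconds, double variant_seconds)
{
    return 100.0 * baseline_seconds / variant_seconds;
}
```

Small deviations from the printed values are to be expected, because the program presumably computes the boost from unrounded timings while the output shows only two decimals.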

Gunther

Felipe,

thank you for testing floatasm. It's simply the VS-generated code that aw27 provided. I think I should rearrange the if statement. My fault; please excuse me.
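
One possible way to arrange such a check so that a CPU without AVX gets only one message and one "The application terminates now." line is sketched below. This is a guess at the structure, not the floatasm source: all names are hypothetical, and the two flags are assumed to come from a CPUID query that is not shown here.

```c
#include <stddef.h>

/* Hypothetical names; the real floatasm source is not shown in this thread. */
enum simd_level { SIMD_SSE, SIMD_AVX, SIMD_AVX512 };

/* Pick the highest usable level. AVX-512 is only considered when AVX
   is present, so a CPU without AVX never reaches the AVX-512 test. */
enum simd_level highest_level(int has_avx, int has_avx512)
{
    if (!has_avx)
        return SIMD_SSE;
    if (!has_avx512)
        return SIMD_AVX;
    return SIMD_AVX512;
}

/* Exactly one message per run: the first missing instruction set,
   or NULL when nothing is missing. */
const char *missing_message(enum simd_level level)
{
    switch (level) {
    case SIMD_SSE:
        return "Your current CPU doesn't support the AVX instruction set.";
    case SIMD_AVX:
        return "Your current CPU doesn't support the AVX-512 instruction set.";
    default:
        return NULL;
    }
}
```

The caller would run every variant up to the returned level, print the single message if there is one, and then terminate once.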

six_L,

special thanks for your detailed environment information. Where can I find Raistlin's software?

nidud,

Quote from: nidud
I have hardware with support up to AVX-2 but AVX-512 is now implemented in Asmc. Good to see hardware is available for testing.

Wow, impressive link. It seems that you've included the complete instruction set, including the new mask registers.  :t

Steve (aka FORTRANS),

I am looking forward to hearing from you again. We had a long break. I very much hope that we will work together in as comradely a way as we used to. In this sense: thank you for testing. Not bad for a small i3.

To sum up, so far all testers are driving on the Intel rail. Is AMD out of fashion?

Gunther
You have to know the facts before you can distort them.

HSE

Perfect now :t

Float:
Calculating the sum of a float array in 5 different variants.
That'll take a little while. Please be patient ...

Simple C implementation:
------------------------
sum1              = 8390656.00
Elapsed Time      = 85.06 Seconds

C implementation with 4 accumulators:
-------------------------------------
sum2              = 8390656.00
Elapsed Time      = 56.41 Seconds
Performance Boost = 151%

Assembly language with 4 XMM accumulators:
------------------------------------------
sum3              = 8390656.00
Elapsed Time      = 7.21 Seconds
Performance Boost = 1180%

Your current CPU doesn't support the AVX instruction set.
You'll need at least the Sandy Bridge or Ivy Bridge architecture.

The application terminates now.

Your current CPU doesn't support the AVX-512 instruction set.
You'll need at least the Knights Landing or Skylake architecture.

The application terminates now.


Floatassembly:
Calculating the sum of a float array in 5 different variants.
That'll take a little while. Please be patient ...

Simple C with assembly code generated by VS:
--------------------------------------------
sum1              = 8390656.00
Elapsed Time      = 84.75 Seconds

C and 4 accumulators with assembly code generated by VS:
--------------------------------------------------------
sum2              = 8390656.00
Elapsed Time      = 28.91 Seconds
Performance Boost = 293%

Assembly language with 4 XMM accumulators:
------------------------------------------
sum3              = 8390656.00
Elapsed Time      = 7.19 Seconds
Performance Boost = 1178%

Your current CPU doesn't support the AVX instruction set.
You'll need at least the Sandy Bridge or Ivy Bridge architecture.

The application terminates now.

Your current CPU doesn't support the AVX-512 instruction set.
You'll need at least the Knights Landing or Skylake architecture.

The application terminates now.
Equations in Assembly: SmplMath

Gunther

Hi HSE,

good to see that. Please excuse the inconvenience. But where the hell did the instruction set number 13 come from? I have no answer, to be honest.

Gunther
You have to know the facts before you can distort them.

felipe