Author Topic: Intel SPMD Program Compiler  (Read 1151 times)

AW

  • Member
  • *****
  • Posts: 2442
  • Let's Make ASM Great Again!
Re: Intel SPMD Program Compiler
« Reply #15 on: October 12, 2019, 03:02:48 AM »
This is an example of using ISPC (Intel SPMD Program Compiler) with MASM.
This is the straightforward method. There are other ways: for instance, the ISPC compiler can emit ASM, but that output needs a cleanup before it is ready to assemble.

I selected the Simple sample from the distribution (the simplest of them all) and am building for SSE2 so everybody will be able to run it.
Output:

Code: [Select]
0: simple(0.000000) = 0.000000
1: simple(1.000000) = 1.000000
2: simple(2.000000) = 4.000000
3: simple(3.000000) = 1.732051
4: simple(4.000000) = 2.000000
5: simple(5.000000) = 2.236068
6: simple(6.000000) = 2.449490
7: simple(7.000000) = 2.645751
8: simple(8.000000) = 2.828427
9: simple(9.000000) = 3.000000
10: simple(10.000000) = 3.162278
11: simple(11.000000) = 3.316625
12: simple(12.000000) = 3.464102
13: simple(13.000000) = 3.605551
14: simple(14.000000) = 3.741657
15: simple(15.000000) = 3.872983

PS: To build, you will need to download ispc.exe from the Intel link above.


TimoVJL

  • Member
  • ***
  • Posts: 476
Re: Intel SPMD Program Compiler
« Reply #16 on: October 12, 2019, 06:15:27 AM »
Since ispc creates a dispatcher for the different instruction sets, it's an easy way to have them all in the same program.
On x64 the speed gain is only 2 to 3 times.

Sadly, some IDEs don't even let you select an instruction set per function.

In poide it's possible to change the options for an object file by editing the build macro.
« Last Edit: October 12, 2019, 08:07:49 PM by TimoVJL »
May the source be with you

AW

Re: Intel SPMD Program Compiler
« Reply #17 on: October 12, 2019, 07:09:00 AM »
Quote
On x64 the speed gain is only 2 to 3 times.

Speedups with these targets on the mandelbrot sample:
--target=avx2-i32x16    > 12x
--target=avx1-i32x16    almost 10x
--target=sse2-i32x8     5x to 6x

Not sure if your AMD supports AVX1, but at least you can use --target=sse2-i32x8

TimoVJL

Re: Intel SPMD Program Compiler
« Reply #18 on: October 12, 2019, 06:45:18 PM »
OK. I have a bad AMD CPU :sad:

How does AMD Ryzen perform?
« Last Edit: October 12, 2019, 08:16:10 PM by TimoVJL »

AW

Re: Intel SPMD Program Compiler
« Reply #19 on: October 12, 2019, 07:28:49 PM »
 :biggrin:

In VS 2019 x64, Release Mode

Using the Custom Build Tool for the ispc file:
Command Line:
ispc -O2 "%(Filename).ispc" -o "$(ProjectDir)%(Filename).obj" -h "$(ProjectDir)%(Filename)_ispc.h" --target=sse2-i32x8 --opt=fast-math --target-os=windows
Output: $(ProjectDir)%(Filename).obj

So, for SSE2 (double-pumped) we get:

Code: [Select]
C++ compiler: Microsoft Visual C/C++ Version: 1923 Toolset: 192328106
@time of ISPC run:                      [57.462] million cycles
@time of ISPC run:                      [55.335] million cycles
@time of ISPC run:                      [58.917] million cycles
[mandelbrot ispc]:              [55.335] million cycles
Wrote image file mandelbrot-ispc.ppm
@time of serial run:                    [297.770] million cycles
@time of serial run:                    [297.318] million cycles
@time of serial run:                    [297.424] million cycles
[mandelbrot serial]:            [297.318] million cycles
Wrote image file mandelbrot-serial.ppm
                                (5.37x speedup from ISPC)

If instead the Custom Build Tool is set to:
Command Line:
ispc -O2 "%(Filename).ispc" -o "$(ProjectDir)%(Filename).obj" -h "$(ProjectDir)%(Filename)_ispc.h" --target=sse2-i32x8,avx1-i32x16,avx2-i32x16 --opt=fast-math --target-os=windows
Output: $(ProjectDir)%(Filename).obj;$(ProjectDir)%(Filename)_sse2.obj;$(ProjectDir)%(Filename)_avx.obj;$(ProjectDir)%(Filename)_avx2.obj

It will try to pick the best target for the machine it runs on. On mine it selects AVX2 (double-pumped), so:

Code: [Select]
C++ compiler: Microsoft Visual C/C++ Version: 1923 Toolset: 192328106
@time of ISPC run:                      [29.066] million cycles
@time of ISPC run:                      [25.350] million cycles
@time of ISPC run:                      [25.265] million cycles
[mandelbrot ispc]:              [25.265] million cycles
Wrote image file mandelbrot-ispc.ppm
@time of serial run:                    [296.230] million cycles
@time of serial run:                    [296.588] million cycles
@time of serial run:                    [299.479] million cycles
[mandelbrot serial]:            [296.230] million cycles
Wrote image file mandelbrot-serial.ppm
                                (11.72x speedup from ISPC)

TimoVJL

Re: Intel SPMD Program Compiler
« Reply #20 on: October 13, 2019, 01:07:30 AM »
I can't continue with this kind of thing for now, as I don't have access to proper hardware, but maybe a bit later.
A small test with Visual Studio 2017 was a nightmare. Why is it so slow and difficult? Is it made for masochists?

AW

Re: Intel SPMD Program Compiler
« Reply #21 on: October 13, 2019, 01:36:00 AM »
Quote
A small test with Visual Studio 2017 was a nightmare. Why is it so slow and difficult? Is it made for masochists?

You need 32 GB of RAM and a slightly faster CPU; then it takes 1 second to load.

 :biggrin:


AW

Re: Intel SPMD Program Compiler
« Reply #22 on: October 18, 2019, 01:56:06 AM »
Poor man's Mandelbrot  :sad:

Code: [Select]
..............................................................................
......................................................#.......................
..................................................########....................
.................................................##########...................
....................................##.#....##################.###............
....................................####################################......
................................########################################......
...............................############################################...
.............###.#####.#.......###########################################....
.........#.#################..############################################....
.....#.#..################################################################....
######################################################################........
.....#.#..################################################################....
.........#.#################..############################################....
.............###.#####.#.......###########################################....
...............................############################################...
................................########################################......
....................................####################################......
....................................##.#....##################.###............
.................................................##########...................
..................................................########....................
......................................................#.......................

Code (MSVC, original here):

Code: [Select]
#include <complex.h>
#include <stdio.h>

int main() {
   int max_row = 22, max_column = 78, max_iteration = 20;
   for (int row = 0; row < max_row; ++row) {
      for (int column = 0; column < max_column; ++column) {
         _Fcomplex z = {};
         _Fcomplex  c = { (float)column * 2 / max_column - 1.5f , (float)row * 2 / max_row - 1.0f };
         int iteration = 0;
         while ((cabsf(z) < 2) && (++iteration < max_iteration))
         {
            z = cpowf(z, { 2.0f, 0 });
            z = { real(z) + real(c) , imag(z) + imag(c) };
         }
         printf("%c", iteration == max_iteration ? '#' : '.');
      }
      printf("\n");
   }
}


TimoVJL

Re: Intel SPMD Program Compiler
« Reply #23 on: October 18, 2019, 02:55:33 AM »
Converted to C99:
Code: [Select]
#include <complex.h>
#include <stdio.h>

int main(void) {
   int max_row = 22, max_column = 78, max_iteration = 20;
   for (int row = 0; row < max_row; ++row) {
      for (int column = 0; column < max_column; ++column) {
         //_Fcomplex z = {};
         float _Complex z = 0 + 0 * I;
         //_Fcomplex  c = { (float)column * 2 / max_column - 1.5f , (float)row * 2 / max_row - 1.0f };
         float _Complex c = ((float)column * 2 / max_column - 1.5f) + ((float)row * 2 / max_row - 1.0f) * I;
         int iteration = 0;
         while ((cabsf(z) < 2) && (++iteration < max_iteration))
         {
            //z = cpowf(z, { 2.0f, 0 });
            z = cpowf(z, 2.0f + 0 * I);
            //z = { real(z) + real(c) , imag(z) + imag(c) };
            z = (crealf(z) + crealf(c)) + (cimagf(z) + cimagf(c)) * I;
         }
         printf("%c", iteration == max_iteration ? '#' : '.');
      }
      printf("\n");
   }
}
And a bit more:
Code: [Select]
#include <complex.h>
#include <stdio.h>

int main(void) {
   int max_row = 22, max_column = 78, max_iteration = 20;
   for (int row = 0; row < max_row; ++row) {
      for (int column = 0; column < max_column; ++column) {
         float _Complex z = 0;
         float _Complex c = ((float)column * 2 / max_column - 1.5f) + ((float)row * 2 / max_row - 1.0f) * I;
         int iteration = 0;
         while ((cabsf(z) < 2) && (++iteration < max_iteration))
            z = z * z + c;
         printf("%c", iteration == max_iteration ? '#' : '.');
      }
      printf("\n");
   }
}

AW

Re: Intel SPMD Program Compiler
« Reply #24 on: October 18, 2019, 07:41:26 AM »
Smaller?

Code: [Select]
#include <complex.h>
#include <stdio.h>

int main(void)
{
   float _Complex z, c; int iteration, max_row = 22, max_column = 78, max_iteration = 20;
   for (int row = 0; row < max_row; ++row) {
      for (int column = 0; column < max_column; ++column) {
         for (iteration = 0 , z=0, c = ((float)column * 2 / max_column - 1.5f) + ((float)row * 2 / max_row - 1.0f) * I; (cabsf(z) < 2) && (++iteration < max_iteration) ;)
            z = z * z + c;
         printf("%c", iteration == max_iteration ? '#' : '.');
      }
      printf("\n");
   }
}
Builds to 23 KB (Intel compiler)

Microsoft way (same number of source lines):

Code: [Select]
#include <complex.h>
#include <stdio.h>

int main() {
   _Fcomplex z, c; int iteration, max_row = 22, max_column = 78, max_iteration = 20;
   for (int row = 0; row < max_row; ++row) {
      for (int column = 0; column < max_column; ++column) {
         for (iteration = 0, z = {}, c = { (float)column * 2 / max_column - 1.5f , (float)row * 2 / max_row - 1.0f }; (cabsf(z) < 2) && (++iteration < max_iteration);)
            z = { z._Val[0] * z._Val[0] - z._Val[1] * z._Val[1] + c._Val[0], z._Val[0] * z._Val[1] + z._Val[1] * z._Val[0] + c._Val[1] };
         printf("%c", iteration == max_iteration ? '#' : '.');
      }
      printf("\n");
   }
}
Builds to 9 KB (VS 2019)
« Last Edit: October 18, 2019, 08:46:04 AM by AW »

AW

Re: Intel SPMD Program Compiler
« Reply #25 on: October 18, 2019, 06:09:11 PM »
New attempt. Microsoft 12 lines  :eusa_clap:, C99 13 lines  :eusa_pray:.

C99 (32-bit: Pelles builds to 36 KB, 64-bit: Pelles builds to 42 KB):
Code: [Select]
#include <complex.h>
#include <stdio.h>

int main(void) {
   float _Complex z, c;
   for (int row = 0, max_row = 22, max_iteration = 20; row < max_row; ++row) {
      for (int column = 0, iteration, max_column = 78; column < max_column; ++column) {
         for (z = 0, c = ((float)column * 2 / max_column - 1.5f) + ((float)row * 2 / max_row - 1.0f) * I, iteration = 0; (cabsf(z) < 2) && (++iteration < max_iteration); z = z * z + c);
         printf("%c", iteration == max_iteration ? '#' : '.');
      }
      printf("\n");
   }
}

MSVC (32-bit: VS 2019 builds to 9 KB, 64-bit: VS 2019 builds to 11 KB):
Code: [Select]
#include <complex.h>
#include <stdio.h>

int main() {
   for (int row = 0, max_row = 22, max_iteration = 20; row < max_row; ++row) {
      for (int column = 0, iteration=0, max_column = 78; column < max_column; ++column, iteration = 0) {
         for (_Fcomplex z = { 0,0 }, c = { (float)column * 2 / max_column - 1.5f , (float)row * 2 / max_row - 1.0f }; (cabsf(z) < 2) && (++iteration < max_iteration); z = { z._Val[0] * z._Val[0] - z._Val[1] * z._Val[1] + c._Val[0], z._Val[0] * z._Val[1] + z._Val[1] * z._Val[0] + c._Val[1] });
         printf("%c", iteration == max_iteration ? '#' : '.');
      }
      printf("\n");
   }
}

TimoVJL

Re: Intel SPMD Program Compiler
« Reply #26 on: October 18, 2019, 07:22:47 PM »
If I try to compile that example as C:
Code: [Select]
>cl -GS- -MD PoorManFractalMS_AW.c
Microsoft (R) C/C++ Optimizing Compiler Version 19.23.28106.4 for x86
Copyright (C) Microsoft Corporation.  All rights reserved.

PoorManFractalMS_AW.c
PoorManFractalMS_AW.c(7): error C2059: syntax error: '{'
So was it compiled as .cpp?
Code: [Select]
>clang-cl -GS- -MD PoorManFractalMS_AW.c
PoorManFractalMS_AW.c(7,166): error: expected expression
  ..., (float)row * 2 / max_row - 1.0f }; (cabsf(z) < 2) && (++iteration < max_iteration); z = { z._Val[0] * z._Val[0...

                                                                                               ^
PoorManFractalMS_AW.c(7,166): error: expected ')'
PoorManFractalMS_AW.c(7,8): note: to match this '('
                        for (_Fcomplex z = { 0,0 }, c = { (float)column * 2 / max_column - 1.5f , (float)row * ...
                            ^
PoorManFractalMS_AW.c(7,287): warning: for loop has empty body [-Wempty-body]
  ...* z._Val[0] - z._Val[1] * z._Val[1] + c._Val[0], z._Val[0] * z._Val[1] + z._Val[1] * z._Val[0] + c._Val[1] });
                                                                                                                  ^
PoorManFractalMS_AW.c(7,287): note: put the semicolon on a separate line to silence this warning
1 warning and 2 errors generated.
And if size matters: Pelles C 9, 5/6 KB.
EDIT:
C11 made the <complex.h> header optional.

AW

Re: Intel SPMD Program Compiler
« Reply #27 on: October 18, 2019, 08:27:39 PM »
Quote
so it was compiled as .cpp ?

These 12 lines should build as C with MSVC:  :thumbsup:

#include <complex.h>
#include <stdio.h>

int main() {
   for (int row = 0, max_row = 22, max_iteration = 20; row < max_row; ++row) {
      for (int column = 0, iteration=0, max_column = 78; column < max_column; ++column, iteration = 0) {
         for (_Fcomplex z = { 0,0 }, x = { 0,0 }, c = { (float)column * 2 / max_column - 1.5f , (float)row * 2 / max_row - 1.0f }; (cabsf(z) < 2) && (++iteration < max_iteration); x._Val[0] = z._Val[0] * z._Val[0] - z._Val[1] * z._Val[1] + c._Val[0], x._Val[1] = z._Val[0] * z._Val[1] + z._Val[1] * z._Val[0] + c._Val[1], z._Val[0]=x._Val[0], z._Val[1] = x._Val[1]);
         printf("%c", iteration == max_iteration ? '#' : '.');
      }
      printf("\n");
   }
}


Quote
clang-cl -GS- -MD PoorManFractalMS_AW.c

clang-cl uses msvc include files, right?

Quote
and if size matters, Pelles C 9 5/6 kb

I am talking without tricks.
Without tricks it will go to 36KB/42KB.  :sad:

Edit: Fixed a small bug. An intermediate complex variable is required because z was overwritten here:
z._Val[0] = z._Val[0] * z._Val[0] - z._Val[1] * z._Val[1] + c._Val[0], z._Val[1] = z._Val[0] * z._Val[1] + z._Val[1] * z._Val[0] + c._Val[1]

« Last Edit: October 18, 2019, 10:45:38 PM by AW »

AW

Re: Intel SPMD Program Compiler
« Reply #28 on: October 18, 2019, 09:51:00 PM »
In MSVC, built with tricks (size 2560 bytes).  :biggrin:
Same code built with Clang-Cl hosted in VS (size 2048) in attachment 2.

Edit: Updated the files to fix a small bug. An intermediate complex variable is required because z was overwritten here:
z._Val[0] = z._Val[0] * z._Val[0] - z._Val[1] * z._Val[1] + c._Val[0], z._Val[1] = z._Val[0] * z._Val[1] + z._Val[1] * z._Val[0] + c._Val[1]


« Last Edit: October 18, 2019, 10:53:00 PM by AW »

TimoVJL

Re: Intel SPMD Program Compiler
« Reply #29 on: October 18, 2019, 11:03:20 PM »
Without headers and with ucrtbase.lib:
Code: [Select]
cl -GS- -O2 -Zl PMF_AW1.c -FePMF_AW1_ms.exe -link -nocoffgrpinfo -fixed
msvc 2019: 1 536 bytes / 2 560 bytes
Code: [Select]
#pragma comment(lib, "ucrtbase.lib")

typedef struct _C_float_complex {
   float _Val[2];
} _C_float_complex;
typedef _C_float_complex _Fcomplex;

__declspec(dllimport) int __cdecl putchar(int c);
__declspec(dllimport) float __cdecl cabsf(_Fcomplex z);

void __cdecl mainCRTStartup(void)
{
   __declspec(dllimport) void __cdecl exit(int status);
   int __cdecl main(void);
   exit(main());
}

int _fltused;

int main(void) {
   for (int row = 0, max_row = 22, max_iteration = 20; row < max_row; ++row) {
      for (int column = 0, iteration=0, max_column = 78; column < max_column; ++column, iteration = 0) {
         //for (_Fcomplex z = { 0,0 }, c = { (float)column * 2 / max_column - 1.5f , (float)row * 2 / max_row - 1.0f }; (cabsf(z) < 2) && (++iteration < max_iteration); z._Val[0] = z._Val[0] * z._Val[0] - z._Val[1] * z._Val[1] + c._Val[0], z._Val[1] = z._Val[0] * z._Val[1] + z._Val[1] * z._Val[0] + c._Val[1]);
         for (_Fcomplex z = { 0,0 }, x = { 0,0 }, c = { (float)column * 2 / max_column - 1.5f , (float)row * 2 / max_row - 1.0f }; (cabsf(z) < 2) && (++iteration < max_iteration); x._Val[0] = z._Val[0] * z._Val[0] - z._Val[1] * z._Val[1] + c._Val[0], x._Val[1] = z._Val[0] * z._Val[1] + z._Val[1] * z._Val[0] + c._Val[1], z._Val[0]=x._Val[0], z._Val[1] = x._Val[1]);
         putchar(iteration == max_iteration ? '#' : '.');
      }
      putchar('\n');
   }
}
clang 9: 1 536 bytes / 2 560 bytes
Code: [Select]
bin\Clang-cl -GS- -O2 -Zl PMF_AW1.c -FePMF_AW1_cl.exe -link -nocoffgrpinfo -dynamicbase:no -fixed -map
without M$ headers or libs.
« Last Edit: October 19, 2019, 12:56:06 AM by TimoVJL »