largeadware question and gigabytes of memory

Started by daydreamer, December 12, 2019, 01:36:28 AM


daydreamer

#15
Thanks AW. Close to 4 GB, about the same as with the Cg app running on a 64-bit OS.

I also tested it so I can learn the difference between x86 and x64.

This is a backwards way of thinking about getting prime numbers. The test by division wasn't suitable for parallel SIMD, but this approach suits it better.
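
A minimal scalar sketch of that "backwards" idea, assuming it means marking products of odd numbers as composite instead of testing each candidate by division (which is what the SSE loop later in the thread appears to do). The function name `primes_by_marking` is hypothetical and not taken from the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: collect primes up to `limit` by marking odd products
// i*j as composite, rather than dividing each candidate by smaller numbers.
std::vector<unsigned> primes_by_marking(unsigned limit) {
    std::vector<bool> composite(static_cast<std::size_t>(limit) + 1, false);

    // Every odd composite has an odd factor i with i*i <= limit.
    for (unsigned long long i = 3; i * i <= limit; i += 2) {
        for (unsigned long long j = i; i * j <= limit; j += 2) {
            composite[static_cast<std::size_t>(i * j)] = true;
        }
    }

    std::vector<unsigned> primes;
    if (limit >= 2) primes.push_back(2);   // the only even prime
    for (unsigned long long n = 3; n <= limit; n += 2) {
        if (!composite[static_cast<std::size_t>(n)]) {
            primes.push_back(static_cast<unsigned>(n));
        }
    }
    return primes;
}
```

The inner loop only multiplies and stores, with no data-dependent branch, which is why this form maps onto SIMD multiplies more easily than a divide-and-test loop.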
my none asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding

daydreamer


Reading the two assembly listings, I have a question: is LEA slow compared to MUL in 64-bit mode? Or is the 32-bit compiler more evolved at finding faster ways of coding than the 64-bit one?

hutch--

LEA was slow on a PIV processor but from the Core2 series upwards it would be faster than a MUL where the two capacities overlap.
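
As an illustration of where LEA and MUL overlap, assuming this refers to small constant multipliers that fit LEA's base + index*scale form (scale 2, 4 or 8): the helpers below are hypothetical and only show the arithmetic. Which instruction a compiler actually emits depends on the compiler and target.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers. Each body has the shape base + index*scale, which a
// single LEA can encode, for example: lea eax,[ecx+ecx*4] for x*5.
inline std::uint32_t times3(std::uint32_t x) { return x + (x << 1); }  // x + x*2
inline std::uint32_t times5(std::uint32_t x) { return x + (x << 2); }  // x + x*4
inline std::uint32_t times9(std::uint32_t x) { return x + (x << 3); }  // x + x*8
```

Multipliers outside this set (7 or 11, say) need more than one LEA or a real multiply, so the overlap is limited to a handful of constants.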

daydreamer

Quote from: hutch-- on January 06, 2020, 10:35:36 PM
LEA was slow on a PIV processor but from the Core2 series upwards it would be faster than a MUL where the two capacities overlap.
What about shuffle vs. bit shift?
Is MOVD xmmreg, general-purpose reg an alternative to saving a register on the stack? Or to saving a register in 64-bit?


daydreamer

Quote from: hutch-- on January 06, 2020, 10:35:36 PM
LEA was slow on a PIV processor but from the Core2 series upwards it would be faster than a MUL where the two capacities overlap.
Are 64-bit LEAs and MULs faster than 32-bit ones?
The 64-bit listing was more readable. Which registers are safe to use in the 64-bit calling convention, and which registers need to be saved?

Can I switch the option between case sensitive and not case sensitive more than once? When using SSE it's annoying that I must write the XMM registers in capitals, but making it not case sensitive breaks the Windows includes and produces lots of errors.


daydreamer

There is a problem with the maincompos proc, which I call from a worker thread: the Windows program just exits. Removing invoke maincompos,i makes it work.
It's not finished. I'm going to compare calculation speed with mul, fmul, mulps and mulpd, and also the pros and cons of real8, real10, real4 and dword calculations.
How do I check that the OS and CPUID support AVX in 32-bit mode?

I also have a different version that runs on a very slow 15 MHz CPU, only for 1-1000.

LiaoMi

Quote from: daydreamer on February 14, 2020, 06:38:13 AM
How do I check that the OS and CPUID support AVX in 32-bit mode?

Hi daydreamer,

there were many examples on the forum
http://masm32.com/board/index.php?topic=3191.msg39457#msg39457
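
For reference, a minimal sketch of the usual three-step check, which is the same in 32-bit and 64-bit mode: CPUID leaf 1 ECX bit 27 (OSXSAVE), ECX bit 28 (AVX), then XGETBV with ECX=0 to confirm the OS has enabled XMM and YMM state (XCR0 bits 1 and 2). This is not the code from the linked post. The function names are hypothetical, and the hardware query assumes GCC or Clang on x86 (`<cpuid.h>` and inline assembly); on other compilers or CPUs it simply reports false.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#endif

// Pure decision logic: takes CPUID.1:ECX and the XCR0 value.
inline bool avx_usable_from(std::uint32_t leaf1_ecx, std::uint64_t xcr0) {
    const bool osxsave   = ((leaf1_ecx >> 27) & 1u) != 0;  // OS uses XSAVE/XGETBV
    const bool avx       = ((leaf1_ecx >> 28) & 1u) != 0;  // CPU implements AVX
    const bool ymm_state = (xcr0 & 0x6u) == 0x6u;          // OS saves XMM and YMM
    return osxsave && avx && ymm_state;
}

// Hardware query (assumption: GCC/Clang on x86; otherwise returns false).
inline bool avx_usable() {
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) return false;
    if (((ecx >> 27) & 1u) == 0) return false;  // XGETBV would fault without OSXSAVE
    unsigned lo = 0, hi = 0;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return avx_usable_from(ecx, (static_cast<std::uint64_t>(hi) << 32) | lo);
#else
    return false;
#endif
}
```

Checking the AVX bit alone is not enough: if the OS does not save the YMM registers on a context switch, AVX instructions fault even though the CPU supports them.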

daydreamer

Quote from: LiaoMi on February 15, 2020, 09:09:55 AM
Hi daydreamer,

there were many examples on the forum
http://masm32.com/board/index.php?topic=3191.msg39457#msg39457

Thanks. Something is wrong in the maincompos above, and it is hard to find why it exits the Windows program, so I started with the avxvers proc.


LiaoMi

Quote from: daydreamer on February 15, 2020, 12:39:24 PM
Thanks. Something is wrong in the maincompos above, and it is hard to find why it exits the Windows program, so I started with the avxvers proc.

Please compile your test case, if it's not too much trouble.

daydreamer

#24
It now writes some primes to prime.txt.
I'm going to try SIMT too.

daydreamer

Quote from: hutch-- on January 06, 2020, 10:35:36 PM
LEA was slow on a PIV processor but from the Core2 series upwards it would be faster than a MUL where the two capacities overlap.
On the PIV you are recommended to use ADD and SUB instead of INC and DEC.
But maybe I can take advantage of the fact that ADD/SUB affect most flags, so you don't need to use CMP as after INC/DEC. Now that it is unrolled, it uses ADD ecx,2 / CMP ecx,max / JB @@L1.

hutch--

ADD and SUB set the zero flag, that is why you don't need a CMP to control a loop exit when value is 0. From long ago Intel recommend ADD and SUB over INC and DEC and they tend to know what they are talking about.

daydreamer

Quote from: hutch-- on February 28, 2020, 09:53:28 AM
ADD and SUB set the zero flag, that is why you don't need a CMP to control a loop exit when value is 0. From long ago Intel recommend ADD and SUB over INC and DEC and they tend to know what they are talking about.
But this is a very "odd" solution with odd numbers, and it's at least 4 times faster.
Maybe the right solution with ADD and SUB, because they set all the flags CMP does, would be a little faster.

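mov ecx,3                   ; outer odd factor, 3..255

@@L1:
    mov edx,3               ; inner odd factor, 3..255

@@L2:
    ;movss XMM3,XMM1        ; previous result
    cvtsi2ss XMM1,edx       ; inner factor as float in the low lane
    cvtsi2ss XMM2,ecx       ; outer factor as float in the low lane
    shufps XMM1,XMM1,0      ; broadcast all
    shufps XMM2,XMM2,0
    addps XMM1,xminc        ; xminc is defined elsewhere (presumably per-lane increments)
    mulps XMM1,XMM2         ; four products in parallel
    cvtps2pi MM0,XMM1       ; low two products to two dwords in MM0
    movq xmi,MM0            ; spill both dwords to memory
    cvtss2si edi,XMM1       ; lowest product to edi

    lea ebx,[buffer+edi*2]
    mov [ebx],edi           ; store first result
    ;store second result in even results
    lea esi,[xmi+4]
    mov esi,[esi]           ; second product from the spilled pair
    lea ebx,[ebx+50000+esi*2]
    mov [ebx],esi

    ;@@LI3:
    add edx,2
    cmp edx,256
    jb @@L2

    add ecx,2
    cmp ecx,256
    jb @@L1
    EMMS                    ; clear MMX state before any x87 FPU code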


daydreamer

What API should I use in 64-bit to allocate memory with "nothrow" (as in a C++ "new" allocation)?
Without nothrow, the program stops with an exception when the allocation fails; with nothrow it returns a null pointer, so you can check afterwards and make a smaller allocation.
Are there several allocation functions in the 32-bit WinAPI?
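
The C++ side of that pattern can be sketched as follows: try the requested size and halve it on each null return until a lower bound is reached. `alloc_shrinking` and `nothrow_alloc` are hypothetical names, not from the thread. The allocator is passed in as a parameter so that any function returning a null pointer on failure can be plugged in.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>

// Hypothetical helper: try `want` bytes, halving on failure, and give up once
// the size drops below `floor`. Returns {pointer, size actually obtained},
// or {nullptr, 0} if nothing could be allocated.
template <typename Alloc>
std::pair<void*, std::size_t> alloc_shrinking(std::size_t want,
                                              std::size_t floor,
                                              Alloc alloc) {
    for (std::size_t n = want; n > 0 && n >= floor; n /= 2) {
        if (void* p = alloc(n)) {
            return {p, n};
        }
    }
    return {nullptr, 0};
}

// Non-throwing C++ allocation: yields nullptr instead of throwing std::bad_alloc.
// Release the result with ::operator delete(p).
inline void* nothrow_alloc(std::size_t n) {
    return ::operator new(n, std::nothrow);
}
```

A call such as `alloc_shrinking(wanted_bytes, minimum_bytes, nothrow_alloc)` then returns either a usable block with its real size, or a null pointer that the caller can test.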

hutch--

I wonder why you assume that a C++ definition is duplicated in the Win API. In both 32 and 64 bit you have the HeapAlloc family of allocations, the GlobalAlloc family of allocations and the VirtualAlloc family of allocations, pick what you need from the documentation for any of these.