Topic: largeadware question and gigabytes of memory

daydreamer

  • Member
  • *****
  • Posts: 1156
  • I also want a stargate
Re: largeadware question and gigabytes of memory
« Reply #15 on: December 28, 2019, 02:34:10 AM »
Thanks AW. It gets close to 4 GB, about the same as with the Cg app running on a 64-bit OS.

I tested it so I can also learn the difference between x86 and x64.

This is a backwards way of thinking about getting prime numbers. I chose it also because the test by division wasn't suitable for parallel SIMD, while this approach is better suited to it.
« Last Edit: December 31, 2019, 09:35:47 PM by daydreamer »
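One reading of the "backwards" approach, going by the later posts about composites: instead of testing each candidate by division, multiply pairs of odd numbers, mark every product as composite, and treat whatever stays unmarked as prime. The multiplications in the inner loop are independent of each other, which is what makes the idea SIMD-friendly. A minimal C++ sketch under that assumption; the name odd_primes_below and the std::vector<bool> table are made up for illustration and are not the poster's code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the odd primes below `limit` by marking odd composites.
// Every odd composite is a product of two odd factors >= 3, so the
// nested loops mark all of them; whatever stays unmarked is prime.
// (Hypothetical helper, not taken from the thread.)
std::vector<std::uint32_t> odd_primes_below(std::uint32_t limit) {
    std::vector<bool> composite(limit, false);
    for (std::uint64_t a = 3; a * a < limit; a += 2) {
        // b starts at a: smaller partners were covered by earlier a values.
        for (std::uint64_t b = a; a * b < limit; b += 2) {
            composite[static_cast<std::size_t>(a * b)] = true;
        }
    }
    std::vector<std::uint32_t> primes;
    for (std::uint32_t n = 3; n < limit; n += 2) {
        if (!composite[n]) primes.push_back(n);
    }
    return primes;
}
```

Calling odd_primes_below(30) walks a = 3 and a = 5 only, marking 9, 15, 21, 27 and 25. The prime 2 is left out on purpose because only odd numbers are handled.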
Quote from Flashdance
Nick  :  When you give up your dream, you die
*wears a flameproof asbestos suit*
Gone serverside programming p:  :D
I love assembly,because its legal to write
princess:lea eax,luke
:)

daydreamer

Re: largeadware question and gigabytes of memory
« Reply #16 on: January 06, 2020, 10:29:08 PM »

Reading the two assembly listings, I have a question: is LEA slow compared to MUL in 64-bit mode? Or is the 32-bit compiler more evolved at finding faster ways of coding than the 64-bit one?

hutch--

  • Administrator
  • Member
  • ******
  • Posts: 7059
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
Re: largeadware question and gigabytes of memory
« Reply #17 on: January 06, 2020, 10:35:36 PM »
LEA was slow on a PIV processor, but from the Core2 series upwards it is faster than a MUL where the capabilities of the two overlap.
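For readers wondering where the two overlap: LEA evaluates an address expression of the form base + index*scale + displacement, with the scale limited to 1, 2, 4 or 8, so a single LEA can stand in for a multiply by 2, 3, 4, 5, 8 or 9. A small C++ model of the base-plus-scaled form, written only to show the arithmetic; the function names are hypothetical and nothing here measures speed.

```cpp
#include <cstdint>

// Models "lea r, [x + x*scale]": one register used as base and as
// scaled index. The hardware only allows a scale of 1, 2, 4 or 8.
// Unsigned arithmetic wraps the same way a 32-bit LEA result does.
constexpr std::uint32_t lea_base_plus_scaled(std::uint32_t x, std::uint32_t scale) {
    return x + x * scale;
}

// Multipliers reachable with the base-plus-scaled form.
constexpr std::uint32_t times3(std::uint32_t x) { return lea_base_plus_scaled(x, 2); }
constexpr std::uint32_t times5(std::uint32_t x) { return lea_base_plus_scaled(x, 4); }
constexpr std::uint32_t times9(std::uint32_t x) { return lea_base_plus_scaled(x, 8); }

static_assert(times5(7) == 35, "x + x*4 is x*5");
```

Any other constant needs a MUL/IMUL or a chain of LEA, shift and add instructions.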
hutch at movsd dot com
http://www.masm32.com    :biggrin:  :skrewy:

daydreamer

Re: largeadware question and gigabytes of memory
« Reply #18 on: January 25, 2020, 06:21:48 AM »
Quote from hutch--
LEA was slow on a PIV processor, but from the Core2 series upwards it is faster than a MUL where the capabilities of the two overlap.

What about shuffle vs. bit shift?
Is MOVD xmmreg, general-purpose reg an alternative to saving a register on the stack? Or to saving a register in 64-bit mode?


daydreamer

composites V3
« Reply #19 on: February 02, 2020, 08:12:05 PM »
Quote from hutch--
LEA was slow on a PIV processor, but from the Core2 series upwards it is faster than a MUL where the capabilities of the two overlap.

Are 64-bit LEAs and MULs faster than 32-bit ones?
The 64-bit listing was more readable. Which registers are safe to use in the 64-bit calling convention, and which need to be saved?

Can I use the case-sensitive/not case-sensitive option more than once? When using SSE, it's annoying that I must write the XMM registers in capitals, but making everything not case-sensitive breaks the Windows includes and produces lots of errors.


daydreamer

composite versions???
« Reply #20 on: February 14, 2020, 06:38:13 AM »
There is a problem with the maincompos proc, which I call from a worker thread: the Windows program just exits. If I remove invoke maincompos,i it works.
It's not finished. I'm going to compare calculation speed with mul, fmul, mulps and mulpd, and also the pros and cons of real8, real10, real4 and dword calculations.
How do I check that the OS and CPUID support AVX in 32-bit mode?

I also have a different version that runs on a very slow 15 MHz CPU, only for 1-1000.

LiaoMi

  • Member
  • ****
  • Posts: 672
Re: composite versions???
« Reply #21 on: February 15, 2020, 09:09:55 AM »
Quote from daydreamer
How do I check that the OS and CPUID support AVX in 32-bit mode?

Hi daydreamer,

there are many examples on the forum  :icon_idea:
http://masm32.com/board/index.php?topic=3191.msg39457#msg39457
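For reference, the usual test has three parts, and the instructions involved (CPUID and XGETBV) behave the same in 32-bit and 64-bit mode: CPUID leaf 1 must report OSXSAVE (ECX bit 27) and AVX (ECX bit 28), and XCR0, read with XGETBV with ECX = 0, must have both the SSE state bit (bit 1) and the AVX state bit (bit 2) set by the OS. The C++ sketch below only decodes the two register values, so it is not the code from the linked thread; the constant and function names are made up, and reading CPUID and XCR0 is left to the caller.

```cpp
#include <cstdint>

// CPUID leaf 1, ECX feature bits.
constexpr std::uint32_t kCpuid1EcxOsxsave = 1u << 27; // OS enabled XSAVE/XGETBV
constexpr std::uint32_t kCpuid1EcxAvx     = 1u << 28; // CPU implements AVX

// XCR0 bits, as returned by XGETBV with ECX = 0.
constexpr std::uint64_t kXcr0SseState = 1u << 1; // OS saves XMM registers
constexpr std::uint64_t kXcr0AvxState = 1u << 2; // OS saves upper YMM halves

// Decides whether AVX instructions may be executed.
// `cpuid1_ecx` is ECX after CPUID with EAX = 1. `xcr0` must only be read
// when OSXSAVE is set (XGETBV faults otherwise); pass 0 if it was not read.
// (Hypothetical helper for illustration.)
constexpr bool avx_usable(std::uint32_t cpuid1_ecx, std::uint64_t xcr0) {
    return (cpuid1_ecx & kCpuid1EcxOsxsave) != 0
        && (cpuid1_ecx & kCpuid1EcxAvx) != 0
        && (xcr0 & (kXcr0SseState | kXcr0AvxState))
               == (kXcr0SseState | kXcr0AvxState);
}
```

Checking the CPUID AVX bit alone is not enough: without OS support for the YMM state, AVX instructions raise an invalid-opcode exception.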

daydreamer

Re: composite versions???
« Reply #22 on: February 15, 2020, 12:39:24 PM »
Quote from LiaoMi
Hi daydreamer,
there are many examples on the forum  :icon_idea:
http://masm32.com/board/index.php?topic=3191.msg39457#msg39457

Thanks. Something is wrong in the above maincompos and it is hard to find why it exits the Windows program, so I started with the avxvers proc.


LiaoMi

Re: composite versions???
« Reply #23 on: February 15, 2020, 09:46:20 PM »
Quote from daydreamer
Thanks. Something is wrong in the above maincompos and it is hard to find why it exits the Windows program, so I started with the avxvers proc.

Compile your test case if it’s not difficult for you  :icon_idea:

daydreamer

Re: largeadware question and gigabytes of memory
« Reply #24 on: February 16, 2020, 10:20:34 PM »
It now writes some primes to prime.txt.
I'm going to try SIMT as well.
« Last Edit: February 17, 2020, 04:43:24 AM by daydreamer »

daydreamer

Re: largeadware question and gigabytes of memory
« Reply #25 on: February 28, 2020, 07:52:13 AM »
Quote from hutch--
LEA was slow on a PIV processor, but from the Core2 series upwards it is faster than a MUL where the capabilities of the two overlap.

On the PIV you are recommended to use ADD and SUB instead of INC and DEC.
But maybe I can take advantage of the fact that ADD/SUB affect most flags, so the CMP that I used after INC/DEC isn't needed. Now that the loop is unrolled it uses ADD ecx,2 / CMP ecx,max / JB @@L1.

hutch--

Re: largeadware question and gigabytes of memory
« Reply #26 on: February 28, 2020, 09:53:28 AM »
ADD and SUB set the zero flag, which is why you don't need a CMP to control a loop exit when the value reaches 0. Intel has long recommended ADD and SUB over INC and DEC, and they tend to know what they are talking about.
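The difference shows up in the shape of the loop. A counter that runs down to zero can branch straight on the flags left by the decrement (SUB followed by JNZ), while a counter that runs up to a limit, as in ADD ecx,2 / CMP ecx,max / JB, needs the CMP. INC and DEC also update the zero flag; what they leave untouched is the carry flag. A C++ sketch of the count-down shape, with a made-up function name; what a compiler actually emits for it is not guaranteed.

```cpp
#include <cstddef>
#include <cstdint>

// Sums `count` values while walking the counter down to zero, so the
// loop exit test is simply "did the decrement reach zero?".
// (Hypothetical example, not code from the thread.)
std::uint64_t sum_counting_down(const std::uint32_t* data, std::size_t count) {
    std::uint64_t total = 0;
    for (std::size_t i = count; i != 0; --i) {
        total += data[i - 1]; // elements are visited last to first
    }
    return total;
}
```

This only works when the order of iteration does not matter or can be reversed.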

daydreamer

Re: largeadware question and gigabytes of memory
« Reply #27 on: February 29, 2020, 02:14:10 AM »
Quote from hutch--
ADD and SUB set the zero flag, which is why you don't need a CMP to control a loop exit when the value reaches 0. Intel has long recommended ADD and SUB over INC and DEC, and they tend to know what they are talking about.

This is a very "odd" solution using odd numbers, but it's at least 4 times faster.
Maybe the right solution with ADD and SUB, since they set all the flags that CMP sets, would be a little faster.
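Code:
mov ecx,3                       ; outer odd factor: 3,5,...,255

@@L1:
mov edx,3                       ; inner odd factor restarts at 3

@@L2:
    ;movss XMM3,XMM1 ;previous result
    cvtsi2ss XMM1,edx           ; inner factor as float
    cvtsi2ss XMM2,ecx           ; outer factor as float
    shufps XMM1,XMM1,0 ;broadcast all
    shufps XMM2,XMM2,0
    addps XMM1,xminc            ; add the per-lane increments from xminc
    mulps XMM1,XMM2             ; four products at once
    cvtps2pi MM0,XMM1           ; low two products as dword integers
    movq xmi,MM0                ; spill both integers to memory
    cvtss2si edi,XMM1           ; first product in edi

    lea ebx,[buffer+edi*2]
    mov [ebx],edi               ; store first result
    ;store second result in even results
    lea esi,[xmi+4]
    mov esi,[esi]               ; second product from the spilled pair
    lea ebx,[ebx+50000+esi*2]
    mov [ebx],esi

    ;@@LI3:
add edx,2                       ; next odd inner factor
cmp edx,256
jb @@L2

add ecx,2                       ; next odd outer factor
cmp ecx,256
jb @@L1
EMMS                            ; clear MMX state before any x87 code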

daydreamer

Re: largeadware question and gigabytes of memory
« Reply #28 on: March 04, 2020, 02:12:06 AM »
What API should I use in 64-bit to allocate memory the way C++ "new" with "nothrow" does?
When an allocation without nothrow fails, the program stops with an exception. With nothrow it returns a null pointer, so you can check afterwards and make a smaller allocation.
Are there several allocation functions in the 32-bit WinAPI?
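The check-and-retry pattern described here, sketched in standard C++ with the nothrow form of operator new. The Block struct and both function names are hypothetical. On the Windows side, HeapAlloc (when HEAP_GENERATE_EXCEPTIONS is not specified), GlobalAlloc and VirtualAlloc all report failure by returning NULL, so the same loop shape should carry over; the sketch itself makes no Win32 calls.

```cpp
#include <cstddef>
#include <new>

// Result of an allocation attempt: the pointer and the size obtained.
struct Block {
    void*       ptr;
    std::size_t size;
};

// Tries `wanted` bytes first and halves the request after each failure.
// Gives up with {nullptr, 0} once the request drops below `minimum`.
// (Hypothetical helper for illustration.)
Block allocate_with_fallback(std::size_t wanted, std::size_t minimum) {
    if (minimum == 0) minimum = 1; // a zero floor would never terminate
    for (std::size_t size = wanted; size >= minimum; size /= 2) {
        void* p = ::operator new(size, std::nothrow); // null on failure, no exception
        if (p != nullptr) return {p, size};
    }
    return {nullptr, 0};
}

// Frees a block obtained from allocate_with_fallback and clears it.
void release(Block& b) {
    ::operator delete(b.ptr); // deleting a null pointer is a no-op
    b = {nullptr, 0};
}
```

The caller reads block.size afterwards to learn how much memory it really got.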

hutch--

Re: largeadware question and gigabytes of memory
« Reply #29 on: March 04, 2020, 02:17:13 AM »
I wonder why you assume that a C++ definition is duplicated in the Win API. In both 32 and 64 bit you have the HeapAlloc family of allocations, the GlobalAlloc family of allocations and the VirtualAlloc family of allocations, pick what you need from the documentation for any of these.