The MASM Forum

General => The Campus => Topic started by: daydreamer on December 12, 2019, 01:36:28 AM

Title: largeadware question and gigabytes of memory
Post by: daydreamer on December 12, 2019, 01:36:28 AM
Hi,
this is a 64-bit .exe.
I got an error when trying to declare a static array: the maximum was something like 07fffffffffh, and my array was too big.
Do static data and dynamic allocation need /LARGEADDRESSAWARE even though this is 64-bit mode?
I also wonder whether I need different alloc/free calls in 64-bit mode than before.
Is there some method to check the maximum memory, so that if I have 64 GB I can subtract x GB (maybe 2-5 GB) before I allocate, so Windows doesn't become unstable or crash?
What is the typical memory frequency on your 64-bit CPU compared to the clock frequency? If I am working with a memory-intensive program, am I only guaranteed memory speed?
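A minimal sketch of the "leave some headroom for Windows" idea from the question above, in C rather than MASM. The helper names `allocation_budget` and `total_physical_bytes` and the `GIB` macro are made up for illustration; the only real API used is `GlobalMemoryStatusEx`, and that part is compiled on Windows only.

```c
#include <stdint.h>

#define GIB (1024ULL * 1024ULL * 1024ULL)

#ifdef _WIN32
#include <windows.h>
/* Total installed physical RAM in bytes, or 0 if the query fails. */
static uint64_t total_physical_bytes(void)
{
    MEMORYSTATUSEX ms;
    ms.dwLength = sizeof ms;    /* must be set before the call */
    return GlobalMemoryStatusEx(&ms) ? ms.ullTotalPhys : 0;
}
#endif

/* Bytes the program may use after leaving `headroom` bytes for the OS.
   Returns 0 when the machine has less RAM than the requested headroom. */
uint64_t allocation_budget(uint64_t total, uint64_t headroom)
{
    return total > headroom ? total - headroom : 0;
}
```

On a 64 GB machine, `allocation_budget(64 * GIB, 4 * GIB)` gives 60 GB. How much headroom is actually enough is a judgment call that this sketch does not answer.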
Title: Re: largeadware question and gigabytes of memory
Post by: hutch-- on December 12, 2019, 03:20:02 AM
magnus,

With MASM (ml64) you use the /LARGEADDRESSAWARE option to ensure you only use fully compatible 64 bit code. When you get an error that says you cannot use that option, it will be a code construction that cannot be used in win64. It is usually things like trying to load a large "immediate" value into a memory variable, if you have the problem, load the value into a register first then move the register into the 64 bit variable.

As long as you keep the app fully compatible with 64 bit Windows, you can access very large amounts of memory, if you go the other route you limit the app to 32 bit address range for no real gain, you may as well write in 32 bit.
Title: Re: largeadware question and gigabytes of memory
Post by: aw27 on December 12, 2019, 03:44:28 AM
07fffffffffh = 549,755,813,887 bytes, one byte short of 512 GB
Quite a lot, it will not work on most systems.
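The arithmetic can be checked with a few lines of C. This is only a sketch: the macro and function names are made up, and it assumes the limit in the error message is an inclusive maximum offset.

```c
#include <stdint.h>

/* Limit quoted in the error message, assumed to be an inclusive maximum. */
#define MAX_STATIC_OFFSET 0x7fffffffffULL

/* Number of bytes addressable from 0 up to and including max_offset. */
uint64_t span_bytes(uint64_t max_offset)
{
    return max_offset + 1;
}

/* Whole binary gigabytes (2^30 bytes) contained in `bytes`. */
uint64_t to_gib(uint64_t bytes)
{
    return bytes >> 30;
}
```

`span_bytes(MAX_STATIC_OFFSET)` is 2^39, and `to_gib` of that is 512.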
Title: Re: largeadware question and gigabytes of memory
Post by: daydreamer on December 25, 2019, 03:09:09 AM
Quote from: hutch-- on December 12, 2019, 03:20:02 AM

As long as you keep the app fully compatible with 64 bit Windows, you can access very large amounts of memory, if you go the other route you limit the app to 32 bit address range for no real gain, you may as well write in 32 bit.
I wish that you and others who have high-end 64-bit machines with loads of RAM could test it on 64 GB systems, with a 64-bit version of a number-crunching 32-bit program. It is inspired by 64-bit CG apps that take advantage of all available RAM, so it can hold lots of photorealistic textures while number-crunching with algorithms that use the right math to render photorealistically.
Title: Re: largeadware question and gigabytes of memory
Post by: hutch-- on December 25, 2019, 06:40:46 AM
magnus,

You are missing something here, if you code in 32 bit Windows code, you have a 2 gig limit and about another gig if you use the /LARGEADDRESSAWARE flag. If you write 64 bit Windows code that uses the /LARGEADDRESSAWARE flag, you can use up to 192 gigabytes of memory if you have that much installed and make truly massive amounts of data.

Since you live in a northern hemisphere cold country, I hope you have a great white Christmas.
Title: Re: largeadware question and gigabytes of memory
Post by: jj2007 on December 25, 2019, 09:00:04 AM
In 32-bit code, adding the negative addresses with /LARGEADDRESSAWARE is an interesting adventure, because all code that relies on addresses being positive will produce nice surprises.
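A small C sketch of the kind of surprise meant here. The function names are hypothetical, and the addresses are modelled as plain `uint32_t` values so that it builds anywhere. It assumes the usual two's-complement wraparound when converting to `int32_t`, which mainstream compilers provide.

```c
#include <stdint.h>

/* Buggy check: treats a 32-bit address as signed, so every address at or
   above 80000000h looks negative and is rejected. */
int looks_valid_signed(uint32_t addr)
{
    return (int32_t)addr > 0;
}

/* Sign-agnostic check: only the null address is rejected. */
int looks_valid_unsigned(uint32_t addr)
{
    return addr != 0;
}

/* Buggy ordering: a signed compare puts high addresses before low ones. */
int below_signed(uint32_t a, uint32_t b)
{
    return (int32_t)a < (int32_t)b;
}

/* Correct ordering for addresses. */
int below_unsigned(uint32_t a, uint32_t b)
{
    return a < b;
}
```

With `a = 0x7FFF0000` and `b = 0x80001000`, `below_unsigned(a, b)` is true but `below_signed(a, b)` is false. In assembly the same mistake is using JL/JG where JB/JA is needed.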

I wonder, though, why 64-bit code would need /LARGEADDRESSAWARE? 2^63 is 9.223e18... :rolleyes:
Title: Re: largeadware question and gigabytes of memory
Post by: hutch-- on December 25, 2019, 11:09:09 AM
> I wonder, though, why 64-bit code would need /LARGEADDRESSAWARE? 2^63 is 9.223e18...

So it does not get restricted to 2 gig. Without it, it's one of those half arsed options for code that contains 32 bit constructions that are not valid in 64 bit mode. It's win32 on steroids as far as extra registers go, but knackered win64. :eusa_hand:
Title: Re: largeadware question and gigabytes of memory
Post by: jj2007 on December 25, 2019, 12:01:24 PM
Right, that builds and runs indeed fine but is somehow against the 64-bit spirit:
include \Masm32\MasmBasic\Res\JBasic.inc ; ## console demo, builds in 32- or 64-bit mode with UAsm, ML, AsmC ##
Init
  mov eax, Chr$("Hello World")
  PrintLine rax
  MsgBox 0, "Wow, that works indeed!!!!", "Hi", MB_OK
EndOfCode

OPT_64 1 ; put 0 for 32 bit, 1 for 64 bit assembly
OPT_DebugL /LARGEADDRESSAWARE:NO
Title: Re: largeadware question and gigabytes of memory
Post by: TimoVJL on December 25, 2019, 06:10:47 PM
Quote
The /LARGEADDRESSAWARE option tells the linker that the application can handle addresses larger than 2 gigabytes. In the 64-bit compilers, this option is enabled by default. In the 32-bit compilers, /LARGEADDRESSAWARE:NO is enabled if /LARGEADDRESSAWARE is not otherwise specified on the linker line.
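The linker option ends up as one bit in the COFF header of the executable (the flag the Windows headers call IMAGE_FILE_LARGE_ADDRESS_AWARE, value 0020h). A hedged C sketch that checks the bit in an in-memory copy of the file follows. The function and macro names are made up, and only the minimum of the PE layout is parsed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same value as IMAGE_FILE_LARGE_ADDRESS_AWARE in the Windows headers. */
#define PE_LARGE_ADDRESS_AWARE 0x0020u

static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 if the flag is set, 0 if it is clear, -1 if `img` does not
   look like a PE image. `img` holds the first `len` bytes of the file. */
int pe_large_address_aware(const uint8_t *img, size_t len)
{
    if (len < 0x40 || img[0] != 'M' || img[1] != 'Z')
        return -1;
    uint32_t pe = rd32(img + 0x3C);            /* e_lfanew */
    if (pe > len || len - pe < 24)             /* signature + COFF header */
        return -1;
    if (memcmp(img + pe, "PE\0\0", 4) != 0)
        return -1;
    /* Characteristics is the last field of the 20-byte COFF header. */
    uint16_t characteristics = rd16(img + pe + 22);
    return (characteristics & PE_LARGE_ADDRESS_AWARE) != 0;
}
```

Reading the first few hundred bytes of an .exe into a buffer and passing it to `pe_large_address_aware` should show whether /LARGEADDRESSAWARE or /LARGEADDRESSAWARE:NO was in effect at link time.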
Title: Re: largeadware question and gigabytes of memory
Post by: daydreamer on December 25, 2019, 10:59:08 PM
Quote from: hutch-- on December 25, 2019, 06:40:46 AM
You are missing something here, if you code in 32 bit Windows code, you have a 2 gig limit and about another gig if you use the /LARGEADDRESSAWARE flag. If you write 64 bit Windows code that uses the /LARGEADDRESSAWARE flag, you can use up to 192 gigabytes of memory if you have that much installed and make truly massive amounts of data.

Since you live in a northern hemisphere cold country, I hope you have a great white Christmas.
Thanks. White Christmases are not so common here close to Copenhagen. I hope you also have a great Christmas.
I only have 20 GB, and C code so far that uses 2 GB; I am also testing dynamic allocation in x64.

@JJ
The coolest-looking pointer bug I made was many years ago in a DirectDraw software renderer that blended sea and atmosphere textures. Part of the sphere accidentally used the sea texture for the outer atmosphere, which looked like cool, transparent, wrinkled plastic wrapped around part of the sphere.
Title: Re: largeadware question and gigabytes of memory
Post by: hutch-- on December 25, 2019, 11:20:20 PM
He he, we had an almost sunny day today (Christmas Day) as it lightly rained last night and the wind blew the smoke away. For all of the heatwave predictions, we have not had very hot conditions here in Sydney, some evenings have been a bit cold. It would really help if we had a couple of days of really heavy rain but it does not look like it's going to happen any time soon.
Title: Re: largeadware question and gigabytes of memory
Post by: aw27 on December 26, 2019, 09:06:32 PM
A 32-bit application on a 64-bit OS can use the /LARGEADDRESSAWARE linker switch to address 4GB.
This is what Raymond Chen says.
https://devblogs.microsoft.com/oldnewthing/20050601-24/?p=35483

"Use" does not mean allocate, I have never been able to allocate more than 2GB on a 32-bit application running on a 64-bit OS when using /LARGEADDRESSAWARE, either with HeapAlloc or VirtualAlloc. When using /LARGEADDRESSAWARE I can allocate a bit more than without /LARGEADDRESSAWARE, but never more than 2GB. We can address but we can not allocate, the system DLLs continue to be placed above the 2GB address as before.

Now, about 64-bit applications:
Some people use the /LARGEADDRESSAWARE switch when linking a 64-bit application. This is not necessary because the default for 64-bit applications is /LARGEADDRESSAWARE. What I have seen some people doing (here) is use /LARGEADDRESSAWARE:NO on a 64-bit application.
Although legal, when used blindly, as it tends to be, it causes problems.
Title: Re: largeadware question and gigabytes of memory
Post by: hutch-- on December 26, 2019, 09:14:04 PM
With a 32 bit app with the /LARGEADDRESSAWARE flag set, you can allocate almost 2gig in one allocation but you can allocate another block depending on the extra size you get from using the flag after some of the high address space is used by some hardware.
Title: Re: largeadware question and gigabytes of memory
Post by: aw27 on December 26, 2019, 09:36:37 PM
Quote from: hutch-- on December 26, 2019, 09:14:04 PM
With a 32 bit app with the /LARGEADDRESSAWARE flag set, you can allocate almost 2gig in one allocation but you can allocate another block depending on the extra size you get from using the flag after some of the high address space is used by some hardware.
Yes, it is true.  :thumbsup:
Title: Re: largeadware question and gigabytes of memory
Post by: aw27 on December 27, 2019, 06:36:14 AM
Testing how far we can go with /LARGEADDRESSAWARE in 32-bit by top-down allocation in chunks of 10000 pages.

(https://www.dropbox.com/s/1odmuckvc1mienc/memalloc.png?dl=1)
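The test program itself is not shown in the thread, so the following is only a guess at its general shape, written in C. It assumes the chunks are merely reserved (not committed) with `VirtualAlloc` and `MEM_TOP_DOWN`. The `malloc` branch is a stand-in so the sketch builds on other systems, and `probe_chunks` is a made-up name.

```c
#include <stddef.h>
#include <stdlib.h>

#ifdef _WIN32
#include <windows.h>
/* Reserve address space only, starting from the highest free address. */
static void *reserve_chunk(size_t bytes)
{
    return VirtualAlloc(NULL, bytes, MEM_RESERVE | MEM_TOP_DOWN, PAGE_NOACCESS);
}
static void release_chunk(void *p)
{
    VirtualFree(p, 0, MEM_RELEASE);
}
#else
/* Portable stand-in (assumption): plain heap allocation. */
static void *reserve_chunk(size_t bytes) { return malloc(bytes); }
static void release_chunk(void *p) { free(p); }
#endif

/* Grabs up to `max_chunks` chunks of `chunk_bytes` each, returns how many
   succeeded, and releases everything again before returning. */
size_t probe_chunks(size_t chunk_bytes, size_t max_chunks)
{
    void **held = calloc(max_chunks, sizeof *held);
    size_t n = 0;
    if (held == NULL)
        return 0;
    while (n < max_chunks && (held[n] = reserve_chunk(chunk_bytes)) != NULL)
        n++;
    for (size_t i = 0; i < n; i++)
        release_chunk(held[i]);
    free(held);
    return n;
}
```

With 4096-byte pages, a chunk of 10000 pages is 40,960,000 bytes, so a 32-bit build would call something like `probe_chunks(10000 * 4096, 200)` and multiply the result by the chunk size.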
Title: Re: largeadware question and gigabytes of memory
Post by: daydreamer on December 28, 2019, 02:34:10 AM
Thanks AW, close to 4 GB is about the same as I get with a CG app running on a 64-bit OS.

I tested this so I can also learn the difference between x86 and x64.

This is a backwards way of thinking about getting prime numbers: testing by division wasn't suitable for parallel SIMD, but this approach is more suitable.
Title: Re: largeadware question and gigabytes of memory
Post by: daydreamer on January 06, 2020, 10:29:08 PM

Reading the two assembly listings, I have some questions: is LEA slow compared to MUL in 64-bit mode? Or is the 32-bit compiler more evolved at finding faster ways of coding than the 64-bit one?
Title: Re: largeadware question and gigabytes of memory
Post by: hutch-- on January 06, 2020, 10:35:36 PM
LEA was slow on a PIV processor but from the Core2 series upwards it would be faster than a MUL where the two capacities overlap.
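A short C illustration of where the two overlap: LEA can add a base to an index scaled by 2, 4 or 8, so multiplications by small constants such as 3, 5 and 9 can be written as one shift and one add. The function names are made up, and whether a given compiler actually emits LEA for these is not guaranteed.

```c
#include <stdint.h>

/* x * 5 the way "lea eax,[ecx+ecx*4]" computes it: base + index * 4. */
uint32_t times5_lea_style(uint32_t x)
{
    return x + (x << 2);
}

/* x * 9 the way "lea eax,[ecx+ecx*8]" computes it: base + index * 8. */
uint32_t times9_lea_style(uint32_t x)
{
    return x + (x << 3);
}

/* Reference versions using a real multiplication. */
uint32_t times5_mul(uint32_t x) { return x * 5u; }
uint32_t times9_mul(uint32_t x) { return x * 9u; }
```

Both forms give the same result for every 32-bit input, including those that wrap around, so which one is faster is purely a question of the target CPU.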
Title: Re: largeadware question and gigabytes of memory
Post by: daydreamer on January 25, 2020, 06:21:48 AM
Quote from: hutch-- on January 06, 2020, 10:35:36 PM
LEA was slow on a PIV processor but from the Core2 series upwards it would be faster than a MUL where the two capacities overlap.
What about shuffle vs. bit shift?
Is MOVD xmmreg, general-purpose reg an alternative to saving on the stack? Or to saving in a register in 64-bit?

Title: composites V3
Post by: daydreamer on February 02, 2020, 08:12:05 PM
Quote from: hutch-- on January 06, 2020, 10:35:36 PM
LEA was slow on a PIV processor but from the Core2 series upwards it would be faster than a MUL where the two capacities overlap.
Are 64-bit LEAs and MULs faster than 32-bit ones?
The 64-bit listing was more readable. Which registers are safe in the 64-bit calling convention, and which registers need to be saved?

Can I use the option for case-sensitive/not case-sensitive more than once? When using SSE it's annoying that I must write the XMM registers in capitals, but making it not case-sensitive breaks the Windows includes and produces lots of errors.

Title: composite versions???
Post by: daydreamer on February 14, 2020, 06:38:13 AM
There is a problem with the maincompos proc, which I call from a worker thread: the Windows program just exits. Removing invoke maincompos,i makes it work.
It's not finished. I'm going to compare calculation speed with mul, fmul, mulps and mulpd, and also the pros and cons of real8, real10, real4 and dword calculations.
how do I check OS and cpuid supports avx in 32bit mode?

I also have a different version that runs on a very slow 15 MHz CPU, only for 1-1000.
Title: Re: composite versions???
Post by: LiaoMi on February 15, 2020, 09:09:55 AM
Quote from: daydreamer on February 14, 2020, 06:38:13 AM
how do I check OS and cpuid supports avx in 32bit mode?

Hi daydreamer,

there were many examples on the forum  :icon_idea:
http://masm32.com/board/index.php?topic=3191.msg39457#msg39457
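For reference, the usual check has two parts: CPUID leaf 1 must report both OSXSAVE (ECX bit 27) and AVX (ECX bit 28), and XCR0, read with XGETBV, must show that the OS saves both the SSE state (bit 1) and the AVX state (bit 2). Below is a sketch of just the decision logic in C. The function name is made up, and the caller is assumed to have obtained the two register values with CPUID and XGETBV.

```c
#include <stdint.h>

#define CPUID1_ECX_OSXSAVE (1u << 27)  /* OS has enabled XSAVE/XGETBV */
#define CPUID1_ECX_AVX     (1u << 28)  /* CPU implements AVX */
#define XCR0_SSE_STATE     (1u << 1)   /* OS saves XMM registers */
#define XCR0_AVX_STATE     (1u << 2)   /* OS saves upper halves of YMM */

/* cpuid1_ecx: ECX after CPUID with EAX=1.
   xcr0: EDX:EAX after XGETBV with ECX=0. It is only meaningful when
   OSXSAVE is set, so XGETBV must not be executed before that is known. */
int avx_usable(uint32_t cpuid1_ecx, uint64_t xcr0)
{
    const uint32_t need_cpu = CPUID1_ECX_OSXSAVE | CPUID1_ECX_AVX;
    const uint64_t need_os  = XCR0_SSE_STATE | XCR0_AVX_STATE;

    if ((cpuid1_ecx & need_cpu) != need_cpu)
        return 0;
    return (xcr0 & need_os) == need_os;
}
```

The same bit tests apply in 32-bit and 64-bit mode. In assembly they become two `bt`/`test` sequences around the CPUID and XGETBV instructions.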
Title: Re: composite versions???
Post by: daydreamer on February 15, 2020, 12:39:24 PM
Quote from: LiaoMi on February 15, 2020, 09:09:55 AM
Quote from: daydreamer on February 14, 2020, 06:38:13 AM
how do I check OS and cpuid supports avx in 32bit mode?

Hi daydreamer,

there were many examples on the forum  :icon_idea:
http://masm32.com/board/index.php?topic=3191.msg39457#msg39457
Thanks. Something is wrong in the above maincompos and it is hard to find why it exits the Windows program, so I started with the avxvers proc.

Title: Re: composite versions???
Post by: LiaoMi on February 15, 2020, 09:46:20 PM
Quote from: daydreamer on February 15, 2020, 12:39:24 PM
thanks,there is something wrong and hard to find why it exits windows program in the above maincompos,so I started with avxvers proc

Compile your test case if it's not difficult for you  :icon_idea:
Title: Re: largeadware question and gigabytes of memory
Post by: daydreamer on February 16, 2020, 10:20:34 PM
now writes some primes to prime.txt
gonna try SIMT also
Title: Re: largeadware question and gigabytes of memory
Post by: daydreamer on February 28, 2020, 07:52:13 AM
Quote from: hutch-- on January 06, 2020, 10:35:36 PM
LEA was slow on a PIV processor but from the Core2 series upwards it would be faster than a MUL where the two capacities overlap.
On the PIV it is recommended to use ADD and SUB instead of INC and DEC.
But maybe I can take advantage of ADD/SUB affecting most flags, so that no CMP is needed after INC/DEC. Now that it is unrolled, the loop uses ADD ecx,2 / CMP ecx,max / JB @@L1.
Title: Re: largeadware question and gigabytes of memory
Post by: hutch-- on February 28, 2020, 09:53:28 AM
ADD and SUB set the zero flag, that is why you don't need a CMP to control a loop exit when value is 0. From long ago Intel recommend ADD and SUB over INC and DEC and they tend to know what they are talking about.
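In C terms the difference is between a loop that counts up and compares against a bound, and one that counts down to zero so that the subtraction itself decides the exit. A small sketch with hypothetical function names follows. The instruction sequences named in the comments are what a compiler could produce, not a promise.

```c
#include <stddef.h>
#include <stdint.h>

/* Count-up form: the index is compared against the bound on every pass
   (roughly ADD / CMP / JB). */
uint32_t sum_count_up(const uint32_t *v, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Count-down form: the loop ends when the counter reaches 0, so the
   decrement can set the zero flag and no separate CMP is needed
   (roughly SUB / JNZ). */
uint32_t sum_count_down(const uint32_t *v, size_t n)
{
    uint32_t s = 0;
    while (n != 0) {
        n -= 1;
        s += v[n];
    }
    return s;
}
```

Both functions return the same sum. The second visits the elements in reverse order, which matters only if the loop body depends on order.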
Title: Re: largeadware question and gigabytes of memory
Post by: daydreamer on February 29, 2020, 02:14:10 AM
Quote from: hutch-- on February 28, 2020, 09:53:28 AM
ADD and SUB set the zero flag, that is why you don't need a CMP to control a loop exit when value is 0. From long ago Intel recommend ADD and SUB over INC and DEC and they tend to know what they are talking about.
But this is a very "odd" solution with odd numbers, although it's at least 4 times faster.
Maybe the right solution with ADD and SUB, because they set all the flags CMP does, would be a little faster.

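    mov ecx,3                   ; outer factor: odd numbers from 3

@@L1:
    mov edx,3                   ; inner factor: odd numbers from 3

@@L2:
    ;movss XMM3,XMM1            ; previous result
    cvtsi2ss XMM1,edx           ; inner factor as float
    cvtsi2ss XMM2,ecx           ; outer factor as float
    shufps XMM1,XMM1,0          ; broadcast all
    shufps XMM2,XMM2,0
    addps XMM1,xminc            ; add the per-lane offsets held in xminc
    mulps XMM1,XMM2             ; four products at once
    cvtps2pi MM0,XMM1           ; low two products as integers
    movq xmi,MM0
    cvtss2si edi,XMM1           ; first product

    lea ebx,[buffer+edi*2]
    mov [ebx],edi               ; store the first product in the buffer
    ;store second result in even results
    lea esi,[xmi+4]
    mov esi,[esi]               ; second product
    lea ebx,[ebx+50000+esi*2]
    mov [ebx],esi

    ;@@LI3:
    add edx,2
    cmp edx,256
    jb @@L2

    add ecx,2
    cmp ecx,256
    jb @@L1
    EMMS                        ; leave MMX state before any x87 code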
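For comparison, a scalar C sketch of the same "mark the composites" idea: every product of two odd numbers from 3 upwards is flagged, and whatever odd number stays unflagged is prime. It is not a translation of the SSE loop above. The function name and table layout are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Counts the primes below `limit` by flagging odd composites i*j
   (i and j odd, both >= 3). Returns 0 if the table cannot be allocated. */
size_t count_primes_below(size_t limit)
{
    if (limit < 3)
        return 0;                       /* no primes below 2 */

    unsigned char *composite = calloc(limit, 1);
    if (composite == NULL)
        return 0;

    for (size_t i = 3; i * i < limit; i += 2)
        for (size_t j = i; i * j < limit; j += 2)
            composite[i * j] = 1;       /* product of two odd factors */

    size_t count = 1;                   /* the even prime 2 */
    for (size_t n = 3; n < limit; n += 2)
        count += !composite[n];

    free(composite);
    return count;
}
```

There is no division anywhere, only multiplications and stores, which is what makes the approach a candidate for SIMD: several values of j can be multiplied by the same i at once.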

Title: Re: largeadware question and gigabytes of memory
Post by: daydreamer on March 04, 2020, 02:12:06 AM
What API should I use in 64-bit to allocate memory with "nothrow" behaviour (like the C++ "new" allocation)?
Allocating without nothrow stops the program with an exception when it fails; nothrow returns a null pointer, so you can check afterwards and make a smaller allocation.
Are there several allocators in the 32-bit WinAPI?
Title: Re: largeadware question and gigabytes of memory
Post by: hutch-- on March 04, 2020, 02:17:13 AM
I wonder why you assume that a C++ definition is duplicated in the Win API. In both 32 and 64 bit you have the HeapAlloc family of allocations, the GlobalAlloc family of allocations and the VirtualAlloc family of allocations, pick what you need from the documentation for any of these.
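On the "nothrow" point: `HeapAlloc` returns NULL on failure unless HEAP_GENERATE_EXCEPTIONS is specified, and `VirtualAlloc` also returns NULL on failure, so the check-and-retry pattern works with either. Below is a sketch of that pattern in portable C, with the allocator passed in as a function pointer. `alloc_shrinking` and `capped_malloc` are made-up names, and `capped_malloc` only simulates a system that cannot satisfy requests above 1 MB.

```c
#include <stddef.h>
#include <stdlib.h>

typedef void *(*alloc_fn)(size_t);

/* Demo allocator (assumption): refuses anything above 1 MB. */
#define DEMO_CAP ((size_t)1 << 20)
static void *capped_malloc(size_t n)
{
    return n > DEMO_CAP ? NULL : malloc(n);
}

/* Tries `want` bytes and halves the request after each failure, never
   going below `floor`. On success stores the size obtained in *got and
   returns the block; otherwise sets *got to 0 and returns NULL. */
void *alloc_shrinking(alloc_fn alloc, size_t want, size_t floor, size_t *got)
{
    for (size_t size = want; size >= floor && size > 0; size /= 2) {
        void *p = alloc(size);
        if (p != NULL) {
            *got = size;
            return p;
        }
    }
    *got = 0;
    return NULL;
}
```

Asking for 8 MB with a 4 KB floor through `capped_malloc` ends up with a 1 MB block. On Windows the function pointer would wrap a `HeapAlloc` or `VirtualAlloc` call instead.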
Title: Re: largeadware question and gigabytes of memory
Post by: daydreamer on March 04, 2020, 02:35:08 AM
Quote from: hutch-- on March 04, 2020, 02:17:13 AM
I wonder why you assume that a C++ definition is duplicated in the Win API. In both 32 and 64 bit you have the HeapAlloc family of allocations, the GlobalAlloc family of allocations and the VirtualAlloc family of allocations, pick what you need from the documentation for any of these.
ok,thanks