Use the MOV instruction to obtain the value in the array

Started by wanker742126, September 04, 2024, 05:35:02 PM


johnsa

You are wanting something that simply isn't possible.
You're either violating PE and address-space constraints or trying to hack 32-bit code-gen into 64-bit mode.

var[base+idx*scale] addressing is fully supported in 32-bit mode.

It should NEVER be used in 64-bit code: even when the relocation is generated, it won't be compatible with PE unless you link with /LARGEADDRESSAWARE:NO, which is a terrible idea for many reasons.

You won't find this addressing form generated by any 64-bit compiler (e.g. C/C++) for this reason. It also prevents the addressing from being fully RIP-relative. The accepted practice in 64-bit code is to use LEA and then register_base+idx*scale.

lea rdi, myArray        ; RIP relative.
mov eax, [rdi+rcx*4]

I believe NASM can generate RIP-relative references like:
mov eax, [rel myArray + rcx*4]
but I'm not sure that actually solves the LARGEADDRESSAWARE problem, as it's simply not encodable in x64. I think the only form that would work is one with a fixed constant, like:
mov eax, [rel myArray + 10]
as the ONLY addressing mode with RIP that x64 can handle is [RIP + ofs].

See:
https://stackoverflow.com/questions/48124293/can-rip-be-used-with-another-register-with-rip-relative-addressing
https://stackoverflow.com/questions/34058101/referencing-the-contents-of-a-memory-location-x86-addressing-modes
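
The LEA pattern described in this post can be sketched as a complete routine. This is only a minimal illustration in ml64/UASM syntax: the procedure name GetElement and the sample array contents are hypothetical, and the Win64 convention (first integer argument in RCX, result in EAX) is an assumption. No bounds checking is done.

```asm
.data
myArray dd 10, 20, 30, 40           ; hypothetical sample data, 32-bit elements

.code
; In:  RCX = zero-based element index (assumed Win64 convention)
; Out: EAX = myArray[RCX]
GetElement proc
        lea     rdx, myArray        ; symbol reference, assembled RIP-relative
        mov     eax, [rdx+rcx*4]    ; base + index*scale, no symbol in the operand
        ret
GetElement endp
end
```

Only the LEA refers to the symbol, so the indexed MOV carries no relocation at all. RDX is used as the base here because it is a volatile register under that convention.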


lucho

Quote:
var[base+idx*scale] addressing is fully supported in 32-bit mode.

It should NEVER be used in 64-bit code: even when the relocation is generated, it won't be compatible with PE unless you link with /LARGEADDRESSAWARE:NO, which is a terrible idea for many reasons.

I have the following instruction in my 64-bit heapsort implementation, tested and working in both ELF (Unix-like) and PE (Windows) variants without problems:
CMP R8,[RDI+8*R10+8]
which both GAS and UASM encode as
4E 3B 44 D7 08
Here, of course, RDI is the base, R10 is the index, 8 is the scale, and the other 8 is the offset (or displacement - don't know which is the correct term here).
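
For readers who want to check those five bytes, here is a field-by-field breakdown, written as comments around a db line that emits the same bytes. It is a sketch for illustration only. The Intel manuals call the trailing constant a displacement (here an 8-bit one, disp8).

```asm
.code
; CMP R8,[RDI+8*R10+8]  ->  4E 3B 44 D7 08
;
; 4E = 0100 1110b   REX prefix: W=1 (64-bit operand size), R=1, X=1, B=0
; 3B                opcode: CMP r64, r/m64
; 44 = 01 000 100b  ModRM: mod=01 (disp8 follows),
;                   reg=000 (+REX.R -> R8), rm=100 (a SIB byte follows)
; D7 = 11 010 111b  SIB: scale=11 (x8), index=010 (+REX.X -> R10),
;                   base=111 (REX.B=0 -> RDI)
; 08                disp8 = 8
        db 4Eh, 3Bh, 44h, 0D7h, 08h
end
```

Nothing in this encoding refers to a symbol, so no relocation is involved. That is consistent with the instruction behaving the same in ELF and PE builds.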

johnsa

Yep, that form is fine:

[base+idx*scale+constantofs]

The problem is that if you want to reference a symbol, like an array, then you can't use a constant in 64-bit code. It needs to be RIP-relative, which you can't encode together with an index register, unless you use /LARGEADDRESSAWARE:NO so that the symbol's relative distance is restricted in the address space. Also, you can't omit the base register, so it needs a pair.


lucho

My "Hello, world!" for Windows includes the following instruction:
LEA R9,WRITTEN
which "UASM -win64 -q -mf -Fl -Sa -zcw -Zd -Zi8" and LINK encode as
4c 8d 0d d3 3f 00 00
which Cygwin's "objdump -d" disassembles as
lea 0x3fd3(%rip),%r9
Whether I give the linker /LARGEADDRESSAWARE or /LARGEADDRESSAWARE:NO, it produces the above machine code.

By the way, what's the real disadvantage of being "large address unaware" besides the 4 GB limit for a single process?
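
The LEA encoding quoted in this post can be read field by field, which may explain why the linker switch makes no difference for a plain symbol reference. The sketch below uses a hypothetical stand-in for WRITTEN and a hypothetical procedure name. The disp32 value depends on the layout of each particular image.

```asm
.data?
WRITTEN dd ?                        ; hypothetical stand-in for the variable in that program

.code
; LEA R9,WRITTEN  ->  4C 8D 0D D3 3F 00 00
;
; 4C = 0100 1100b   REX prefix: W=1, R=1, X=0, B=0
; 8D                opcode: LEA r64, m
; 0D = 00 001 101b  ModRM: mod=00, reg=001 (+REX.R -> R9),
;                   rm=101 (in 64-bit mode: RIP + disp32, no SIB byte)
; D3 3F 00 00       disp32 = 00003FD3h, little-endian, counted from the
;                   address of the next instruction
Demo proc
        lea     r9, WRITTEN         ; RIP-relative: holds a distance, not an address
        ; A form such as [WRITTEN+rcx*4] has to use a SIB byte with an
        ; absolute 32-bit displacement instead, because the RIP-relative
        ; ModRM form has no room for an index register.
        ret
Demo endp
end
```

Since the operand stores only a distance from the instruction to the symbol, the same bytes work wherever the image is loaded. The absolute SIB form is the case that needs the image confined to low addresses.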

TimoVJL

Quote from: lucho on March 29, 2025, 07:46:27 PM
Whether I give the linker /LARGEADDRESSAWARE or /LARGEADDRESSAWARE:NO, it produces the above machine code.

By the way, what's the real disadvantage of being "large address unaware" besides the 4 GB limit for a single process?
With 64-bit code, a bare /LARGEADDRESSAWARE is useless, as it is the linker's default for a 64-bit exe.

jj2007

Quote from: lucho on March 29, 2025, 07:46:27 PM
what's the real disadvantage of being "large address unaware" besides the 4 GB limit for a single process?

Having more than 4GB of address space is the only valid argument for not sticking with 32-bit code.

sinsi

Quote from: jj2007 on March 29, 2025, 10:51:30 PM
Having more than 4GB of address space is the only valid argument for not sticking with 32-bit code.
More registers?

jj2007

I can count the occasions when I ran out of registers on the fingers of one hand. And even then it wasn't a tight loop.

lucho

Having 16 instead of 8 integer registers and the lack of the 4 GB RAM limit are very important.
The 4 GB limit for a single process (or maybe a single thread, I don't know) is not so important.

jj2007

Show me one piece of your software where eight registers were not enough. And another one where 4GB of address space were not enough.

guga

Quote from: jj2007 on March 29, 2025, 10:51:30 PM
Quote from: lucho on March 29, 2025, 07:46:27 PM
what's the real disadvantage of being "large address unaware" besides the 4 GB limit for a single process?

Having more than 4GB of address space is the only valid argument for not sticking with 32-bit code.


Couldn't agree more. :thumbsup:

lucho

Quote from: jj2007 on April 01, 2025, 11:47:11 PM
Show me one piece of your software where eight registers were not enough.
Here you are:
; Multiply the 64-bit unsigned numbers in registers EDX:ECX (Y1:Y0) and EBX:EAX
; (Z1:Z0) and return the 128-bit result in registers EDX:ECX:EBX:EAX (A:B:C:D).
; Algorithm: Peter Norton, "Advanced Assembly Language", 1991, pp. 229-230

.code
_um64x64 proc uses ESI EDI EBP
        MOV     ESI,EAX ; Z0
        MOV     EDI,EDX ; Y1
        PUSH    EDX     ; save Y1
        MUL     ECX     ; Z0 * Y0
        XCHG    EAX,ESI ; ESI = D = Low (Z0 * Y0), EAX = Z0
        XCHG    EDX,EDI ; EDI = C = High(Z0 * Y0), EDX = Y1
        MUL     EDX     ; Z0 * Y1, high word is at most 0xFFFFFFFE
        XOR     EBP,EBP ; A = 0
        ADD     EDI,EAX ; C = High(Z0 * Y0) + Low(Z0 * Y1)
        ADC     EDX,EBP ; High(Z0 * Y1), the sum is at most 0xFFFFFFFF, CF=0
        XCHG    EDX,ECX ; ECX = B = High(Z0 * Y1), EDX = Y0
        MOV     EAX,EBX ; Z1
        MUL     EDX     ; Z1 * Y0
        ADD     EDI,EAX ; C = High(Z0 * Y0) + Low (Z0 * Y1) + Low(Z1 * Y0)
        ADC     ECX,EDX ; B = High(Z0 * Y1) + High(Z1 * Y0)
        ADC     EBP,EBP ; A
        XCHG    EAX,EDI ; EAX = C
        XCHG    EAX,EBX ; EAX = Z1, EBX = C
        POP     EDX     ; restore Y1
        MUL     EDX     ; Z1 * Y1
        ADD     ECX,EAX ; B = High(Z0 * Y1) + High(Z1 * Y0) + Low(Z1 * Y1)
        ADC     EDX,EBP ; A = High(Z1 * Y1) + carries
        XCHG    EAX,ESI ; EAX = D
        RET
_um64x64 endp
end
Compare the above with the elegant 64-bit implementation of the same algorithm.
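
The 64-bit implementation referred to is not reproduced in this thread. As a rough point of comparison, in 64-bit mode a single MUL already yields the full 128-bit product, so a routine can be as short as the sketch below. The register assignment (Win64 convention, operands in RCX and RDX) is an assumption, the procedure name is reused only for illustration, and the result comes back in RDX:RAX rather than in four dword registers.

```asm
.code
; In:  RCX = Y, RDX = Z (unsigned 64-bit operands, assumed Win64 convention)
; Out: RDX:RAX = 128-bit product Y * Z
um64x64 proc
        mov     rax, rcx        ; RAX = Y
        mul     rdx             ; RDX:RAX = RAX * RDX
        ret
um64x64 endp
end
```

No callee-saved registers are touched, so the ESI/EDI/EBP juggling and the stack spill of the 32-bit version have no counterpart here.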

Quote:
And another one where 4GB of address space were not enough.
When I claim that 4 GB of address space is not enough, I mean that the total RAM usage of today's software that resides at the same time in memory exceeds 4 GB, not that 4 GB are not enough for a single process or thread, which is rarely so indeed.

jj2007

Ok, you won: if I ever need a 128-bit result in four dword registers, I'll switch to 64-bit code for that particular task :thumbsup:

With dramatically reduced specifications for the required accuracy, I usually find a simple solution:
include \masm32\MasmBasic\MasmBasic.inc
num64_1 QWORD 1111111111111111111
num64_2 QWORD 2222222222222222222
.DATA?
result64 REAL10 ?
  Init
  fild num64_1
  fild num64_2
  deb 4, "On FPU", ST(0), ST(1)
  fmul
  fstp result64
  Print Str$("\nThe result is %Jf", result64)
EndOfCode

On FPU
ST(0)           2222222222222222222.
ST(1)           1111111111111111111.

The result is 2.469135802469135802e+36

Which is, of course, much less accurate than the 2,4691358024691358019753086419753e+36 provided by Windows Calc...

Quote from: lucho on April 03, 2025, 01:40:25 AM
When I claim that 4 GB of address space is not enough, I mean that the total RAM usage of today's software that resides at the same time in memory exceeds 4 GB

Sorry, but total RAM use of all running processes is irrelevant. They all have their separate address space. Total RAM used by all processes together can exceed 4GB due to swapping and paging.

sinsi

I'm old-school enough to remember that you NEVER used memory, always registers, and that carries on for me today.

Having 4GB of address space means having 2GB of usable space (usually). My database is not there yet (just over 1GB) but it's grown to that size in just over 2 years so I am (fingers crossed) future-proof.

It's no big deal, 32- or 64-bit. There are a few nifty little tricks in 64-bit but if you're satisfied with 32-bit then don't change. The MASM64 SDK is all but useless and unfortunately the 32-bit SDK is missing anything from the last 10-15 years (Windows 7 was released in 2009).

jj2007

Quote from: sinsi on April 03, 2025, 07:28:48 PM
the 32-bit SDK is missing anything from the last 10-15 years

True but not a major obstacle. You can add the handful of important new functions by hand.