Help with QueryPerformanceCounter and 64 bit numbers

Started by Lonewolff, April 12, 2018, 03:15:46 PM


Siekmanski

You can find it in dx9macros.inc ( part of my direct3d9 sources )

FLT4 MACRO float_number:REQ
LOCAL float_num
   .data
    align 4
    float_num real4 float_number
   .code
   EXITM <float_num>
ENDM

FLT8 MACRO float_number:REQ
LOCAL float_num
   .data
    align 8
    float_num real8 float_number
   .code
   EXITM <float_num>
ENDM
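
The thread never shows these macros in use. Here is a minimal sketch, assuming a MASM32-style 32-bit build; the FLT8 macro is copied from above, while `result` and the constants 1.5 and 2.0 are made up for illustration:

```asm
.686
.model flat, stdcall
option casemap :none

include \masm32\include\kernel32.inc
includelib \masm32\lib\kernel32.lib

FLT8 MACRO float_number:REQ
LOCAL float_num
   .data
    align 8
    float_num real8 float_number
   .code
   EXITM <float_num>
ENDM

.data?
result REAL8 ?                ; hypothetical output variable

.code
start:
  fld  FLT8(1.5)              ; load an 8-aligned REAL8 constant created in .data
  fmul FLT8(2.0)              ; st(0) = 1.5 * 2.0
  fstp result                 ; store 3.0 and pop the FPU stack
  invoke ExitProcess, 0
end start
```

Each use of FLT8 creates its own constant in .data, so the macro call can stand wherever a REAL8 memory operand is expected.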

Creative coders use backward thinking techniques as a strategy.

LordAdef

Out of curiosity: was it just to have a customized macro name?

Siekmanski

Not really. I don't use masm32rt.inc or masm32\macros\macros.asm in my sources.
FP4 is not a MASM standard. I don't know who came up with this kind of macro first.
I have been using the FLT4 and FLT8 macros for about 20 years now, and they are properly aligned.

jj2007

FP4 is aligned, too:
      FP4 MACRO value
        LOCAL vname
        .data
        align 4
          vname REAL4 value
        .code
        EXITM <vname>
      ENDM


I find this debate a bit academic. Without the Masm32 SDK, nobody would even know that MASM exists, or that writing Windows programs in Assembler is possible 8)

Certainly, this snippet works with UAsm. But it doesn't work with Masm. If, however, you enable the macros.asm line, it assembles with both. So what exactly is the added value of built-in FP? macros in UAsm?
.486                                      ; create 32 bit code
.model flat, stdcall                      ; 32 bit memory model
option casemap :none                      ; case sensitive

include \masm32\include\kernel32.inc
include \masm32\include\msvcrt.inc
; include \masm32\macros\macros.asm ; assembles with MASM and UAsm

includelib \masm32\lib\kernel32.lib
includelib \masm32\lib\msvcrt.lib

.code
txFormat db "A double: %1.15f", 0
start:
  mov edi, offset FP8(0.0)
  fldpi
  fstp REAL8 PTR [edi]
  invoke crt_printf, addr txFormat, REAL8 PTR [edi]
  invoke ExitProcess, 0

end start

Siekmanski

Quote: FP4 is aligned, too:

True, but FP8 isn't.

      FP8 MACRO value
        LOCAL vname
        .data
        align 4   <---- shouldn't this be align 8 ?
          vname REAL8 value
        .code
        EXITM <vname>
      ENDM

Quote: I find this debate a bit academic. Without the Masm32 SDK, nobody would even know that MASM exists, or that writing Windows programs in Assembler is possible 8)

You are totally right, but the question was: "So, how about FLT4, where is this macro from?" ( reply #59 )

aw27

One problem is that we can't include macros.asm without including the others.
It will not take much effort to start finding nuisances. For example, in JJ's carefully chosen example, it is enough to change the calling convention to PASCAL to break the whole thing.

While in UASM we can simply declare our own prototypes and use the built-in FP4/FP8 macros.


jj2007

Quote from: Siekmanski on May 17, 2018, 11:09:15 PM
Quote: FP4 is aligned, too:

True, but FP8 isn't.
...
        align 4   <---- shouldn't this be align 8 ?

Alignment is overrated IMHO 8)

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)

465     cycles for 100 * align8
484     cycles for 100 * align4
487     cycles for 100 * misaligned

500     cycles for 100 * align8
494     cycles for 100 * align4
464     cycles for 100 * misaligned
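
The benchmark source is not shown in the thread. As a rough sketch of how the three cases could be laid out (not the code that produced the numbers above), filler bytes after an `align 8` force each REAL8 onto a known boundary. All variable names here are hypothetical:

```asm
.686
.model flat, stdcall
option casemap :none

include \masm32\include\kernel32.inc
includelib \masm32\lib\kernel32.lib

.data
align 8
valAlign8  REAL8 1.0          ; address is a multiple of 8
           db 4 dup(0)        ; 4 filler bytes: next address is 4 mod 8
valAlign4  REAL8 2.0          ; 4-aligned, but not 8-aligned
           db 0               ; 1 filler byte: next address is odd
valMisal   REAL8 3.0          ; not even 2-aligned

.code
start:
  fld  valAlign8              ; the loads a timing loop would repeat
  fstp st(0)                  ; discard the value again
  fld  valAlign4
  fstp st(0)
  fld  valMisal
  fstp st(0)
  invoke ExitProcess, 0
end start
```

An unaligned REAL8 only costs extra when it straddles a cache line boundary on many CPUs, which may be one reason the three cases are so close in some of the results.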

Siekmanski

It seems it is, but can we trust cycle counters on modern PCs?  :biggrin:

Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4)

561     cycles for 100 * align8
566     cycles for 100 * align4
565     cycles for 100 * misaligned

567     cycles for 100 * align8
564     cycles for 100 * align4
559     cycles for 100 * misaligned

571     cycles for 100 * align8
567     cycles for 100 * align4
559     cycles for 100 * misaligned

562     cycles for 100 * align8
562     cycles for 100 * align4
559     cycles for 100 * misaligned

563     cycles for 100 * align8
562     cycles for 100 * align4
564     cycles for 100 * misaligned

12      bytes for align8
12      bytes for align4
12      bytes for misaligned


--- ok ---
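
For comparison with cycle counts, wall-clock timing with QueryPerformanceCounter (the topic's original subject) needs a 64-bit subtraction, which 32-bit code can do with a sub/sbb pair. This is a minimal sketch, assuming the MASM32 kernel32.inc prototypes; the variable names are made up:

```asm
.686
.model flat, stdcall
option casemap :none

include \masm32\include\kernel32.inc
includelib \masm32\lib\kernel32.lib

.data?
qpcFreq  dq ?                 ; ticks per second
qpcStart dq ?                 ; counter before the test
qpcEnd   dq ?                 ; counter after the test

.code
start:
  invoke QueryPerformanceFrequency, addr qpcFreq
  invoke QueryPerformanceCounter, addr qpcStart
  ; ... code under test goes here ...
  invoke QueryPerformanceCounter, addr qpcEnd

  mov eax, dword ptr [qpcEnd]       ; low dword of the end value
  mov edx, dword ptr [qpcEnd+4]     ; high dword of the end value
  sub eax, dword ptr [qpcStart]     ; subtract the low dwords
  sbb edx, dword ptr [qpcStart+4]   ; subtract the high dwords with borrow
  ; edx:eax now holds the elapsed ticks; divide by qpcFreq to get seconds

  invoke ExitProcess, 0
end start
```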

daydreamer

Do you also have a UASM and MASM macro to replace the ugly, messy sequence
mov eax,immediate integer
movd (x)mm0,eax

with a single MOVD (x)mm0,immediate integer macro?
64-bit, 128-bit etc. variants would be nice, too.
my none asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding

jj2007

Quote from: daydreamer on May 18, 2018, 03:43:21 AM
Do you also have a UASM and MASM macro to replace the ugly, messy sequence
mov eax,immediate integer
movd (x)mm0,eax

with a single MOVD (x)mm0,immediate integer macro?
64-bit, 128-bit etc. variants would be nice, too.

No problem:

include \masm32\MasmBasic\MasmBasic.inc         ; download

movx MACRO xmmArg, immArg
  if (opattr immArg) ne 36 ; atImmediate
       .err <** needs an immediate arg **>
  endif
  push immArg
  movd xmmArg, dword ptr [esp]
  add esp, 4
ENDM

  Init
  movx xmm2, 12345678h
  deb 1, "Result:", x:xmm2
EndOfCode


Doesn't trash eax, and works with ordinary non-MasmBasic code, too.
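
The same push-and-load idea can be extended to the 64-bit case asked about. This is a hedged sketch only: `movx64` is a hypothetical name, not part of MasmBasic or the Masm32 SDK, and it assumes an assembler that accepts SSE2 instructions. It omits the immediate check of the macro above:

```asm
.686
.xmm
.model flat, stdcall
option casemap :none

include \masm32\include\kernel32.inc
includelib \masm32\lib\kernel32.lib

movx64 MACRO xmmArg, immHigh, immLow
  push immHigh                      ; high dword ends up at [esp+4]
  push immLow                       ; low dword ends up at [esp]
  movq xmmArg, qword ptr [esp]      ; load 64 bits, upper half of the register is zeroed
  add esp, 8                        ; release the two stack slots
ENDM

.code
start:
  movx64 xmm2, 11223344h, 55667788h ; low qword of xmm2 = 1122334455667788h
  invoke ExitProcess, 0
end start
```

Like the 32-bit version, it leaves eax untouched and only uses the stack as scratch space.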

HSE

Always making noise!!

AMD A6-3500 APU with Radeon(tm) HD Graphics (SSE3)

759     cycles for 100 * align8
1628    cycles for 100 * align4
755     cycles for 100 * misaligned

760     cycles for 100 * align8
1642    cycles for 100 * align4
755     cycles for 100 * misaligned

760     cycles for 100 * align8
1629    cycles for 100 * align4
764     cycles for 100 * misaligned

761     cycles for 100 * align8
1629    cycles for 100 * align4
755     cycles for 100 * misaligned

765     cycles for 100 * align8
1642    cycles for 100 * align4
755     cycles for 100 * misaligned

12      bytes for align8
12      bytes for align4
12      bytes for misaligned

--- ok ---


I don't know anything about processor architecture, but I think the AMD FPU is a RISC chip. In RISC processors, alignment is apparently critical.

It looks like the assembler aligns qwords to 8 by default. Or not  :biggrin: Again, what is happening here?
Equations in Assembly: SmplMath

zedd151

Win 10 Home, 64 bit, 1.6 GHz



AMD A6-9220e RADEON R4, 5 COMPUTE CORES 2C+3G   (SSE4)

962     cycles for 100 * align8
1023    cycles for 100 * align4
939     cycles for 100 * misaligned

1003    cycles for 100 * align8
1020    cycles for 100 * align4
979     cycles for 100 * misaligned

991     cycles for 100 * align8
1047    cycles for 100 * align4
1012    cycles for 100 * misaligned

939     cycles for 100 * align8
1104    cycles for 100 * align4
948     cycles for 100 * misaligned

949     cycles for 100 * align8
1064    cycles for 100 * align4
948     cycles for 100 * misaligned

12      bytes for align8
12      bytes for align4
12      bytes for misaligned



zedd151

a little while later...


AMD A6-9220e RADEON R4, 5 COMPUTE CORES 2C+3G   (SSE4)

877     cycles for 100 * align8
943     cycles for 100 * align4
877     cycles for 100 * misaligned

947     cycles for 100 * align8
943     cycles for 100 * align4
878     cycles for 100 * misaligned

877     cycles for 100 * align8
1042    cycles for 100 * align4
878     cycles for 100 * misaligned

876     cycles for 100 * align8
1035    cycles for 100 * align4
875     cycles for 100 * misaligned

876     cycles for 100 * align8
1064    cycles for 100 * align4
883     cycles for 100 * misaligned

12      bytes for align8
12      bytes for align4
12      bytes for misaligned




this computer doesn't seem to like align 4   
HSE's really doesn't like it.   :P

LordAdef

These benchmarks are sometimes rather interesting, since we often can't reach a common conclusion.

But aligning is so simple that I don't mind doing it anyway

jj2007

Quote from: LordAdef on May 18, 2018, 07:49:34 AM
But aligning is so simple that I don't mind doing it anyway

If the code gets any faster with alignment, it makes sense in an innermost loop with a million iterations. Otherwise it bloats your exe, pollutes the data cache, and thus may slow down the whole program.