reason to switch to 64 Bit Assembler

Started by habran, February 10, 2013, 08:03:46 PM


frktons

Quote from: Gunther on February 11, 2013, 06:25:49 PM
Frank,

I can do that next weekend; please post your code.

Gunther

Here you are. The code tests only REP STOSQ vs MOVNTDQ.
You can add tests for MOVAPS, MOVDQA, etc. if you like.

Frank
There are only two days a year when you can't do anything: one is called yesterday, the other is called tomorrow, so today is the right day to love, believe, do and, above all, live.

Dalai Lama

qWord

Quote from: habran on February 11, 2013, 03:17:47 PM
do you want to say that I lied :icon_eek:
yes, the purpose of my post was to defame you  :dazzled:

Quote from: habran on February 11, 2013, 03:17:47 PM
however, I don't believe in your testing because, looking at the C source, everyone can see that there is much more
work for the processor, and more memory access, in C than in ASM
good point  :t

BTW, this is what PellesC creates from your C code:
sub_140001000   proc near
                mov     rax, rcx
                mov     rcx, rax
                cmp     rcx, rdx
                jz      short locret_140001026
                test    r8d, r8d
                jz      short locret_140001026

loc_140001010:
                mov     r9b, [rdx]
                mov     [rcx], r9b
                sub     r8d, 1
                jz      short locret_140001026
                add     rcx, 1
                add     rdx, 1
                jmp     short loc_140001010

locret_140001026:                       
                retn
sub_140001000   endp
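
For readers following along, a byte-wise copy of roughly this shape would be consistent with the listing above. It is a reconstruction from the disassembly, not the actual C source posted in the thread; the name `byte_copy` and the parameter names are hypothetical.

```c
/* Hypothetical reconstruction: copies count bytes from src to dest one byte
   at a time and returns dest, mirroring the two early exits in the listing. */
void *byte_copy(void *dest, const void *src, unsigned int count)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    if (d == s)          /* cmp rcx, rdx / jz: same buffer, nothing to do */
        return dest;
    while (count--)      /* 32-bit counter, as r8d in the listing */
        *d++ = *s++;     /* one byte load and one byte store per pass */
    return dest;
}
```

Each pass through the loop moves a single byte, so the generated code is small but does one load and one store per byte copied.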
MREAL macros - when you need floating point arithmetic while assembling!

habran

qWord,
Quote
BTW, this is what PellesC creates from your C code:
Holy cow!!! :exclaim: :icon_exclaim: :icon_eek:
are you pulling my leg :shock: actually, are you pulling both my legs!???
If that is true, why can I not build 64 bit JWASM with it?
give me a proper explanation or I am off to that C64 forum

Quote
yes, the purpose of my post was to defame you   :dazzled:
I was not aware that I am famous, am I really  :greenclp:
if I am really a celebrity, maybe I need a bodyguard, someone like Frank (I mean Farmer, not frktons  :biggrin:), played by Kevin Michael Costner, or Arnold Alois Schwarzenegger 8) (Terminator)
Cod-Father

dedndave

how about Bullseye from DareDevil



funniest bad guy ever   :lol:

habran

dedndave, you are a genius  :t :eusa_clap:
what are you doing in this forum!!!???
you could strike it rich somewhere else :bgrin:
You have DEFAMED me

habran

qWord, PellesC produced almost perfect code; it should look like this:

sub_140001000   proc near
                mov     rax, rcx
                cmp     rcx, rdx
                jz      short locret_140001026
                test    r8d, r8d
                jz      short locret_140001026
loc_140001010:
                mov     r9b, [rdx]
                mov     [rcx], r9b
                add     rcx, 1
                add     rdx, 1
                sub     r8d, 1
                jnz     short loc_140001010
locret_140001026:                       
                retn
sub_140001000   endp
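
The difference between the two listings is where the loop branch sits: the compiler output tests in the middle and jumps back unconditionally, while the hand-tuned version has a single conditional branch at the bottom. In C terms that is the difference between a `while` loop and a guarded `do ... while`. A minimal sketch, with the hypothetical name `byte_copy_rotated`:

```c
/* Hypothetical sketch of the "rotated" loop: the zero-count check is done
   once up front, then each iteration ends in a single decrement-and-branch,
   like the sub r8d, 1 / jnz pair in the hand-written listing. */
void *byte_copy_rotated(void *dest, const void *src, unsigned int count)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    if (d == s || count == 0)
        return dest;
    do {
        *d++ = *s++;
    } while (--count != 0);
    return dest;
}
```

Whether a given compiler actually emits the shorter loop for this form depends on its optimizer settings, so treat this as an illustration of the loop shape only.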

Gunther

Frank,

Quote from: frktons on February 11, 2013, 08:17:09 PM
Here you are. The code tests only REP STOSQ vs MOVNTDQ.
You can add tests for MOVAPS, MOVDQA, etc. if you like.

Frank

I'll first study your code and see what needs to be done. Thank you for uploading the source.  :t

Gunther
You have to know the facts before you can distort them.

Magnum

I think a better word might be "celebrated".

Main Entry:    celebrated  [sel-uh-brey-tid]
Part of Speech:    adjective
Definition:    distinguished, famous
Synonyms:    acclaimed, big, eminent, famed, glorious, great, high-powered, illustrious, immortal, important, large, laureate, lionized, notable, number one, numero uno, outstanding, popular, preeminent, prominent, renowned, revered, storied, up there, w. k., well-known

de·fame  (dĭ-fām′)

TRANSITIVE VERB:
de·famed, de·fam·ing, de·fames

    To damage the reputation, character, or good name of by slander or libel. See Synonyms at malign.
    Archaic To disgrace.
Take care,
                   Andy

Ubuntu-mate-18.04-desktop-amd64

http://www.goodnewsnetwork.org

habran

thank you Magnum, :t
now I know who I am:
distinguished, high-powered, immortal, numero uno, macho-man 8)

I also want to say (no joking this time):
This forum has gathered the most prominent assembler programmers, and if we decide HERE that:
we should not hold with our teeth
something that is already obsolete
but embrace 64 bit
other assembler programmers
will have this to swallow
and our example follow



dedndave

i could write some 64-bit code, but i'd have to get you guys to test it for me   :(

habran

dedndave, I promise you I will be proud to do that for you :t

habran

qWord, this is what I get when I compile your function in C with MSVC205:

xmemcpy:
0000000140063090  mov         qword ptr [rsp+18h],r8
0000000140063095  mov         qword ptr [rsp+10h],rdx
000000014006309A  mov         qword ptr [rsp+8],rcx
000000014006309F  sub         rsp,38h
00000001400630A3  mov         rax,qword ptr [cb]
00000001400630A8  shr         rax,3
00000001400630AC  mov         dword ptr [cnt1],eax
00000001400630B0  mov         rax,qword ptr [cb]
00000001400630B5  and         rax,7
00000001400630B9  mov         dword ptr [cnt2],eax
00000001400630BD  mov         rax,qword ptr [dest]
00000001400630C2  mov         qword ptr [p1],rax
00000001400630C7  mov         rax,qword ptr [src]
00000001400630CC  mov         qword ptr [p2],rax
00000001400630D1  jmp         xmemcpy+5Fh (1400630EFh)
00000001400630D3  mov         rax,qword ptr [p1]
00000001400630D8  add         rax,8
00000001400630DC  mov         qword ptr [p1],rax
00000001400630E1  mov         rax,qword ptr [p2]
00000001400630E6  add         rax,8
00000001400630EA  mov         qword ptr [p2],rax
00000001400630EF  mov         eax,dword ptr [cnt1]
00000001400630F3  mov         ecx,dword ptr [cnt1]
00000001400630F7  sub         ecx,1
00000001400630FA  mov         dword ptr [cnt1],ecx
00000001400630FE  test        eax,eax
0000000140063100  je          xmemcpy+84h (140063114h)
0000000140063102  mov         rax,qword ptr [p1]
0000000140063107  mov         rcx,qword ptr [p2]
000000014006310C  mov         rcx,qword ptr [rcx]
000000014006310F  mov         qword ptr [rax],rcx
0000000140063112  jmp         xmemcpy+43h (1400630D3h)
0000000140063114  mov         rax,qword ptr [p1]
0000000140063119  mov         qword ptr [p3],rax
000000014006311E  mov         rax,qword ptr [p2]
0000000140063123  mov         qword ptr [rsp],rax
0000000140063127  xor         eax,eax
0000000140063129  cmp         eax,1
000000014006312C  je          xmemcpy+0DBh (14006316Bh)
000000014006312E  mov         eax,dword ptr [cnt2]
0000000140063132  and         eax,4
0000000140063135  test        eax,eax
0000000140063137  je          xmemcpy+0DBh (14006316Bh)
0000000140063139  mov         rax,qword ptr [p3]
000000014006313E  mov         rcx,qword ptr [rsp]
0000000140063142  mov         ecx,dword ptr [rcx]
0000000140063144  mov         dword ptr [rax],ecx
0000000140063146  mov         rax,qword ptr [p3]
000000014006314B  add         rax,4
000000014006314F  mov         qword ptr [p3],rax
0000000140063154  mov         rax,qword ptr [rsp]
0000000140063158  add         rax,4
000000014006315C  mov         qword ptr [rsp],rax
0000000140063160  mov         eax,dword ptr [cnt2]
0000000140063164  sub         eax,4
0000000140063167  mov         dword ptr [cnt2],eax
000000014006316B  jmp         xmemcpy+0F7h (140063187h)
000000014006316D  mov         rax,qword ptr [p3]
0000000140063172  add         rax,1
0000000140063176  mov         qword ptr [p3],rax
000000014006317B  mov         rax,qword ptr [rsp]
000000014006317F  add         rax,1
0000000140063183  mov         qword ptr [rsp],rax
0000000140063187  mov         eax,dword ptr [cnt2]
000000014006318B  mov         ecx,dword ptr [cnt2]
000000014006318F  sub         ecx,1
0000000140063192  mov         dword ptr [cnt2],ecx
0000000140063196  test        eax,eax
0000000140063198  je          xmemcpy+11Ah (1400631AAh)
000000014006319A  mov         rax,qword ptr [p3]
000000014006319F  mov         rcx,qword ptr [rsp]
00000001400631A3  movzx       ecx,byte ptr [rcx]
00000001400631A6  mov         byte ptr [rax],cl
00000001400631A8  jmp         xmemcpy+0DDh (14006316Dh)
00000001400631AA  mov         rax,qword ptr [dest]
00000001400631AF  add         rsp,38h
00000001400631B3  ret
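
The listing keeps every local on the stack, so the structure of the routine can be read off it: `cb >> 3` qword moves, then an optional dword move when bit 2 of the remainder is set, then single bytes. A hedged C sketch of that structure follows. It is inferred from the disassembly, not taken from the original source; the name `xmemcpy_sketch` is hypothetical, and fixed-size `memcpy` calls stand in for the pointer casts the original presumably used, to keep the example free of alignment and aliasing problems.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reconstruction of the three-stage copy seen in the listing. */
void *xmemcpy_sketch(void *dest, const void *src, size_t cb)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    size_t cnt1 = cb >> 3;   /* number of whole 8-byte blocks */
    size_t cnt2 = cb & 7;    /* 0..7 leftover bytes */

    while (cnt1--) {         /* stage 1: 8 bytes per pass */
        uint64_t q;
        memcpy(&q, s, sizeof q);
        memcpy(d, &q, sizeof q);
        d += 8;
        s += 8;
    }
    if (cnt2 & 4) {          /* stage 2: at most one 4-byte move */
        uint32_t w;
        memcpy(&w, s, sizeof w);
        memcpy(d, &w, sizeof w);
        d += 4;
        s += 4;
        cnt2 -= 4;
    }
    while (cnt2--)           /* stage 3: 0..3 single bytes */
        *d++ = *s++;
    return dest;
}
```

For example, a 23-byte copy takes two qword passes, one dword move and three byte moves.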

I can not believe that it takes only 2 ticks

dedndave

i think you must be measuring that wrong
i think there is something wrong with the 2 cycle measurement   :P
maybe the timer code isn't doing what you think it is or something


i have a friend, not too far away...

he has a win 7-64 ultimate new-fangled machine at home, now
he uses it mostly for running his business
http://www.mesabattingcages.com/

he will let me test whatever i like, but i would hate to mess up his machine
or even be near it if it messes up   :P

qWord

Quote from: habran on February 12, 2013, 06:37:57 AM
I can not believe that it takes only 2 ticks
you are obviously not able to configure your compiler! Also, looking at the code of my testbench (and yes ... you can't compile it because there are some dependencies I have not included), you will see that I used a high loop count, which masks the cost of memory access.
; MSVC 2010
sub_140008A60   proc near

                mov     r10d, r8d
                and     r8d, 7
                mov     r9, rcx
                shr     r10d, 3
                test    r10d, r10d
                jz      short loc_140008A94
                db      66h, 66h, 66h, 66h
                nop     word ptr [rax+rax+00000000h]

loc_140008A80:
                mov     rax, [rdx]
                add     r9, 8
                add     rdx, 8
                dec     r10d
                mov     [r9-8], rax
                jnz     short loc_140008A80

loc_140008A94:
                test    r8b, 4
                jz      short loc_140008AAC
                mov     eax, [rdx]
                add     r9, 4
                add     rdx, 4
                mov     [r9-4], eax
                add     r8d, 0FFFFFFFCh

loc_140008AAC:
                test    r8d, r8d
                jz      short loc_140008AD1
                sub     rdx, r9
                db      66h, 66h, 66h, 66h
                nop     dword ptr [rax+rax+00000000h]

loc_140008AC0:
                movzx   eax, byte ptr [rdx+r9]
                inc     r9
                dec     r8d
                mov     [r9-1], al
                jnz     short loc_140008AC0

loc_140008AD1:
                mov     rax, rcx
                retn
sub_140008A60   endp


in the attachment a testbench with loop count = 1

BTW: if you are not interested in a serious discussion, you may simply say that instead of this bullsh**  parody.

habran

yes qWord, now you are talking... :t
that looks more realistic than before and doesn't contradict what I said before
my xmemcpy is OK for transferring one or two lines of characters, but for larger data transfers your function is absolute
I have always admired your laser-sharp mind and programming skills :eusa_clap: