Recent posts

#1
The Workshop / Re: Survey for 64-bit coders
Last post by Vortex - July 05, 2025, 08:09:29 PM
Even with Intel syntax support, it's not easy to use the GAS assembler. Its macro engine is poor. GAS is better suited as a backend tool for other programming languages.
#2
The Workshop / Re: Survey for 64-bit coders
Last post by lucho - July 05, 2025, 06:19:36 PM
Quote from: ognil on June 13, 2025, 10:21:36 PM
Hi Lachko (Лъчко), :smiley:
Hi, Ogi (Оги)! :smiley:

By the way, because of its multi-architectural nature, the GNU Assembler is actually not a single assembler:

Quote
GNU as is really a family of assemblers.
Members of this family share a common output format (ELF) and directives. Everything else – even the comment separation character – is architecture-dependent.
#3
The Laboratory / Re: Invoke, call, jump. Simple...
Last post by daydreamer - July 05, 2025, 04:13:00 PM
@NoCforMe
You are right. Thanks to this discussion, I want to find the code that uses all 8 GP registers to make it run faster, and then try to make an SSE2 version using 8 XMM registers instead. I have a vague memory that it's a decryption or encryption algorithm.

But we are updating our knowledge of how many clock cycles things take on newer CPUs, instead of staying stuck on old information from older CPUs.
The simplest way to inline code is to create a macro version of the proc you want to use.

@Timo
I have experience using a network of computers for 3D animation and 3D art.
On old single-core CPUs, a machine busy with 3D work became very slow for anything else, so I used the slowest computer for surfing and maybe for working on 2D textures.
But with my latest multicore computer, that isn't necessary any more.
Even a 3D program that can use all cores usually has a setting so that it uses fewer cores.
An old laptop doesn't have the disadvantage the newest laptops have: nobody is interested in stealing a computer that is several years old, so you can take it along when travelling and use it outside.
A text editor plus an assembler or compiler, and a 2D paint program to create .ico files for your GUI program: for that you only need a crappy old computer.



#4
The Laboratory / Re: Invoke, call, jump. Simple...
Last post by NoCforMe - July 05, 2025, 09:34:53 AM
All in all, I think that once again folks are deluding themselves that this whole thing of how long it takes to call a function (pushed parameters vs. registers) actually makes any difference.

Think about it: a critical function will more than likely spend most of its time inside that function, crunching data and consuming cycles. If you want to optimize things, then the place to do that is inside the function, not outside.

The amount of time spent passing parameters to the function is going to be a tiny fraction of the time spent in the function.

The only time it would make much difference is if the function is fairly short and gets called very frequently, in which case it might make more sense to simply inline the function rather than worry about how long it takes to pass those parameters.
#5
FreeBASIC / Inline assembly and FreeBASIC
Last post by Vortex - July 05, 2025, 04:48:07 AM
Hello,

A quick example of using inline assembly with FreeBASIC. The function below converts a string to lowercase in place, using a 256-byte lookup table:

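' Win64 only: RCX holds the address of the String descriptor (String parameters are ByRef by default)
Function LowerCase Naked (s As String) As ZString Ptr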

     Asm

.data


lcase_table:

    .byte 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    .byte 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    .byte 0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0
    .byte 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    .byte 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    .byte 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    .byte 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    .byte 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

.text

    mov     rdx,QWORD PTR [rcx]
    mov     r10,OFFSET lcase_table
    sub     rdx,1

_loop:

    add     rdx,1
    movzx   rax,BYTE PTR [rdx]
    movzx   r8,BYTE PTR [r10+rax]
    shl     r8,5
    add     BYTE PTR [rdx],r8b
    jnz     _loop
    mov     rax,QWORD PTR [rcx]
    ret
      
    End Asm
   
End Function


Dim As String t="THIS is a TEST."

Print *LowerCase(t)   ' prints: this is a test.
#6
The Laboratory / Re: Invoke, call, jump. Simple...
Last post by TimoVJL - July 05, 2025, 12:12:19 AM
And hopefully people learn something about CPUs.
They usually just don't work in similar ways.
The Intel CPU 7 family is very interesting.
The AMD Ryzen that I have is just too similar to what Jochen has.
Earlier, Jochen had an Intel i5 and later got a similar laptop with the same CPU.


#7
The Laboratory / Re: Invoke, call, jump. Simple...
Last post by daydreamer - July 04, 2025, 08:17:48 PM
Quote from: NoCforMe on July 04, 2025, 03:12:16 AM
Quote from: jj2007 on July 03, 2025, 08:11:59 PM
Quote from: NoCforMe on July 03, 2025, 05:19:16 PM
I restrict my use of registers for passing parameters to the regular general-purpose ones (EAX/EBX/ECX/EDX), not FPU or XMM registers.

Why so restrictive?

I hardly ever use the FPU in my programs, and have never messed around w/XMM. Most of my code is in integer-land.

Nothing wrong with using either one of those register sets to pass parameters, of course.
Still, performance-wise: when calling a proc that performs integer or floating-point math on arrays, use ECX to pass the length of the array(s), one or more other registers to point to the start of the arrays, and maybe return a failure value in EAX from the PROC, for example if it detects a divide by zero or anything else that prevents it from completing the math operation on the array(s).

#8
The Laboratory / Re: Invoke, call, jump. Simple...
Last post by NoCforMe - July 04, 2025, 03:12:16 AM
Quote from: jj2007 on July 03, 2025, 08:11:59 PM
Quote from: NoCforMe on July 03, 2025, 05:19:16 PM
I restrict my use of registers for passing parameters to the regular general-purpose ones (EAX/EBX/ECX/EDX), not FPU or XMM registers.

Why so restrictive?

I hardly ever use the FPU in my programs, and have never messed around w/XMM. Most of my code is in integer-land.

Nothing wrong with using either one of those register sets to pass parameters, of course.
#9
The Laboratory / Re: Invoke, call, jump. Simple...
Last post by zedd - July 04, 2025, 12:42:19 AM
From the laptop
Intel(R) Celeron(R) N5105 @ 2.00GHz (SSE4)

549     cycles for 100 * proc aligned 16
484     cycles for 100 * proc aligned 16+3
550     cycles for 100 * aligned push+pop
482     cycles for 100 * aligned reg32

551     cycles for 100 * proc aligned 16
484     cycles for 100 * proc aligned 16+3
551     cycles for 100 * aligned push+pop
482     cycles for 100 * aligned reg32

550     cycles for 100 * proc aligned 16
485     cycles for 100 * proc aligned 16+3
552     cycles for 100 * aligned push+pop
482     cycles for 100 * aligned reg32

551     cycles for 100 * proc aligned 16
493     cycles for 100 * proc aligned 16+3
562     cycles for 100 * aligned push+pop
493     cycles for 100 * aligned reg32

564     cycles for 100 * proc aligned 16
496     cycles for 100 * proc aligned 16+3
561     cycles for 100 * aligned push+pop
485     cycles for 100 * aligned reg32

15      bytes for proc aligned 16
19      bytes for proc aligned 16+3
24      bytes for aligned push+pop
20      bytes for aligned reg32


--- ok ---
#10
The Laboratory / Re: Invoke, call, jump. Simple...
Last post by zedd - July 04, 2025, 12:37:26 AM
Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz (SSE4)

344    cycles for 100 * proc aligned 16
256    cycles for 100 * proc aligned 16+3
391    cycles for 100 * aligned push+pop
387    cycles for 100 * aligned reg32

345    cycles for 100 * proc aligned 16
261    cycles for 100 * proc aligned 16+3
392    cycles for 100 * aligned push+pop
380    cycles for 100 * aligned reg32

345    cycles for 100 * proc aligned 16
265    cycles for 100 * proc aligned 16+3
403    cycles for 100 * aligned push+pop
381    cycles for 100 * aligned reg32

341    cycles for 100 * proc aligned 16
260    cycles for 100 * proc aligned 16+3
382    cycles for 100 * aligned push+pop
381    cycles for 100 * aligned reg32

382    cycles for 100 * proc aligned 16
260    cycles for 100 * proc aligned 16+3
374    cycles for 100 * aligned push+pop
389    cycles for 100 * aligned reg32

15      bytes for proc aligned 16
19      bytes for proc aligned 16+3
24      bytes for aligned push+pop
20      bytes for aligned reg32


--- ok ---