Recent Posts

Pages: [1] 2 3 ... 10
1
The Campus / Re: Student in technikal college
« Last post by K_F on Today at 07:46:34 AM »
Are you using Google Translate?
The i-86 does not physically have ports, but it can read and write I/O ports with the IN and OUT instructions.

Some reading material..  :t

https://8051-microcontrollers.blogspot.com/2014/12/8086-io-ports-important-points-to-be.html

http://www.allsyllabus.com/aj/note/ECE/8086%20Microprocessor%20&%20Peripherals/unit%205/I%20O%20Interfacing%20Techniques.php

2
The Laboratory / Re: Simple floating point macros.
« Last post by RuiLoureiro on Today at 07:32:03 AM »
Hi HSE !
            I have no problem with you. We can joke about these things. It is fun!
Good luck with your work  :t
3
Romper Room / Re: How programming works.....
« Last post by Siekmanski on Today at 07:20:13 AM »
Did you just quote the holy book?

Other historical examples of violent and unjust acts supported by biblical teachings include:
the Inquisition; the Crusades; the burning of witches; religious wars; pogroms against Jews; persecution of homosexuals; forceful conversions of heathens; slavery; beatings of children; brutal treatment of the mentally ill; suppression of scientists; and whippings, mutilations, and violent executions of persons convicted of crimes.
Those acts were a regular part of the Christian world for centuries.

Q: Is religion evil ?

Some Cruelties and Contradictions in the Bible.

The list is so long I had to zip it.
4
The Laboratory / Re: Simple floating point macros.Silpe
« Last post by HSE on Today at 07:15:51 AM »
Hi Rui!

         I guess that HSE is kidding with your idea of fpinit.

Just trying to guess what Hutch is making.

 fldz in fpinit is a problem if you don't need it.

There are two types of macros:

1) Those that don't need a non-empty st(0) and leave an additional non-empty st()

    fpmul MACRO arg1,arg2       ;; multiply arg1 and arg2 together
      fld arg1
      fld arg2
      fmulp
    ENDM

2) Those that need a non-empty st(0) and don't change the number of non-empty st()

    fpadd MACRO arg             ;; add a number
      fld arg
      faddp
    ENDM

Making two sets of macros is a possible solution, mmm
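The difference between the two kinds of macro comes down to their net effect on the x87 register stack. Here is a minimal sketch in C, assuming a simplified model in which each instruction is reduced to how many registers it needs and how many it pushes or pops. The names `x87_op`, `depth_change` and `final_depth` are hypothetical and only for illustration; they are not part of any macro set discussed here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: each x87 instruction is reduced to its effect on
   the number of occupied stack registers. */
enum x87_op { OP_FLD, OP_FADDP, OP_FMULP, OP_FSTP };

int depth_change(enum x87_op op)
{
    switch (op) {
    case OP_FLD:   return +1;  /* push a value */
    case OP_FADDP:             /* st(1) = st(1) + st(0), then pop */
    case OP_FMULP:             /* st(1) = st(1) * st(0), then pop */
    case OP_FSTP:  return -1;  /* store st(0), then pop */
    }
    return 0;
}

/* Walks a macro body starting with start_depth occupied registers.
   Returns the final depth, or -1 if an instruction would find too few
   operands or would exceed the 8 available registers. */
int final_depth(const enum x87_op *ops, size_t n, int start_depth)
{
    int depth = start_depth;
    assert(start_depth >= 0 && start_depth <= 8);
    for (size_t i = 0; i < n; i++) {
        /* faddp/fmulp need two operands, fstp needs one, fld none */
        int needed = (ops[i] == OP_FLD) ? 0 : (ops[i] == OP_FSTP) ? 1 : 2;
        if (depth < needed) return -1;
        depth += depth_change(ops[i]);
        if (depth > 8) return -1;
    }
    return depth;
}

/* Bodies of the two macros shown above. */
const enum x87_op fpmul_body[] = { OP_FLD, OP_FLD, OP_FMULP };
const enum x87_op fpadd_body[] = { OP_FLD, OP_FADDP };
```

In this model fpmul started on an empty stack ends with one occupied register, while fpadd keeps the depth unchanged but cannot start on an empty stack.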

Meanwhile I'm trying to solve some problems calculating the adaptation value of vectors for a Genetic Algorithm; it is really slow with so many debugging messages.    :(
5
The Laboratory / Re: Other floating point macros.
« Last post by RuiLoureiro on Today at 07:05:08 AM »
where is the error?
the result is ok when _LoopFrequency=1,2,3
error:
_LoopFrequency=4, result=1.0   <<<<<<<<<< ???????
_LoopFrequency=5, result=6.0
_LoopFrequency=6, result=1.0   <<<<<<<<<< ???????
_LoopFrequency=7, result=6.0
...
Code:
_floating_pointAdd Proc uses rbx _LoopFrequency:QWORD
   
   fninit                      ;; clear FPU registers and flags
   fldz                        ;; zero st(0)

   mov   rbx,_LoopFrequency

   fld   _One_real8
   fld   _One_real8
   faddp
@@:
   fld   _One_real8
   faddp
   fld   _One_real8
   faddp
   fld   _One_real8
   faddp
   fld   _One_real8
   faddp
   fld   _One_real8
   faddp
   
   sub   rbx,1
   jnz   @B
   
   fld   _One_real8
   fsubp

   fstp   result
   
   invoke   RtlZeroMemory,ADDR szBuffer, sizeof szBuffer
   invoke   FpuFLtoA64, ADDR result,40,ADDR szBuffer,SRC1_REAL Or SRC2_DIMM       
   invoke   SetWindowText,hEdithWnd,addr szBuffer
   
   ret
   
_floating_pointAdd Endp
Hi all,
        Did you test this prog? Is it true that we get

                _LoopFrequency=4, result=1.0   <<<<<<<<<< ???????
                _LoopFrequency=5, result=6.0
                _LoopFrequency=6, result=1.0   <<<<<<<<<< ???????
                _LoopFrequency=7, result=6.0

I am not able to run it, but I wrote the same code for the console and got none of these results; the code also looks correct just from reading it. So where is the problem? Do you know?

The file is in reply #1
Thanks  :t

My results (given by my ConverterDF):

  FloatingPointAdd - 1
          6.0              2 + 5 - 1
  FloatingPointAdd - 2
          11.0             2 + 10 - 1
  FloatingPointAdd - 3
          16.0             2 + 15 - 1
  FloatingPointAdd - 4
          21.0             2 + 20 - 1
  FloatingPointAdd - 5
          26.0             2 + 25 - 1
  FloatingPointAdd - 6                <<<<<--- _LoopFrequency
          31.0             2 + 30 - 1
  FloatingPointAdd - 7
          36.0             2 + 35 - 1

          ************** END *****************
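The arithmetic behind those console results can be cross-checked with a few lines of C. This is only a sketch of what the posted instruction sequence adds up to, not a model of the FPU or of the window code around it. The name `expected_result` is hypothetical.

```c
#include <assert.h>

/* Models the arithmetic of the posted procedure only:
   1.0 + 1.0 before the loop, five additions of 1.0 per iteration,
   then one subtraction of 1.0. */
double expected_result(unsigned loop_frequency)
{
    /* The posted loop decrements rbx before testing it, so a count of 0
       would not mean "zero iterations"; it is excluded here. */
    assert(loop_frequency >= 1);

    double acc = 1.0 + 1.0;                 /* fld / fld / faddp */
    for (unsigned i = 0; i < loop_frequency; i++) {
        for (int k = 0; k < 5; k++)
            acc += 1.0;                     /* fld _One_real8 / faddp */
    }
    acc -= 1.0;                             /* fld _One_real8 / fsubp */
    return acc;
}
```

For counts 1 to 7 this gives 6, 11, 16, 21, 26, 31 and 36, the same as the list above. That suggests the values of 1.0 reported for counts 4 and 6 do not come from this add sequence by itself.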
6
Romper Room / Re: How programming works.....
« Last post by nidud on Today at 06:50:19 AM »
So to be religious is to believe in evil to justify doing evil which is evil. Sounds evil but nevertheless logical.
7
The Laboratory / Re: Simple floating point macros.Silpe
« Last post by RuiLoureiro on Today at 06:47:23 AM »
:biggrin:

Why do I get the impression that this last post was not all that serious ?  :P
Hutch,
         I guess that HSE is kidding with your idea of fpinit. It is not usual to start the FPU with finit and then load 0 into st(0). What happens if we use it, then fpmul, then fstp var? We exit and the FPU is not clean: 0.0 is still in st(0). That is the only issue; there is no other problem, and all the macros seem to work correctly.
Note: when I have my new i7 I will test all possible cases.
8
Romper Room / Re: How programming works.....
« Last post by AW on Today at 05:52:56 AM »
Saying hello from London, the most religion and race tolerant place in the World. Arghhh.  :icon_rolleyes:

9
The Laboratory / Re: Simple floating point macros.
« Last post by hutch-- on Today at 05:26:56 AM »
Ray,

The target market for 64 bit MASM is different from that of the 32 bit version; it is not recommended for beginners at all, but for folks who already know how to write 32 bit MASM code. The difference with macros is the reference material, and it's easy enough to specify a "fstp variable" when the data needs to be placed in a variable, but the more efficient form without redundant loads and stores is the direction that many who use legacy code like this want.

I am just about clapped out and ready to sleep but I will have a look at your suggestion when I get up later today.
10
ASMC Development / Re: Asmc source and binaries
« Last post by nidud on Today at 05:25:10 AM »
Added an AVX implementation of memcpy() using a switch with overlapping moves. The overhang for small counts is basically removed by using this method.
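The idea of overlapping moves can be sketched in C for one size class. This is an illustration of the technique under stated assumptions, not the code from this post: the function name `copy_9_to_16` is hypothetical, and fixed-size memcpy() calls stand in for the unaligned 8-byte register loads and stores.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copies n bytes, 9 <= n <= 16, with two 8-byte moves whose ranges may
   overlap in the middle. Both loads are done before either store, so
   the result is still correct if source and destination overlap. */
void copy_9_to_16(void *dst, const void *src, size_t n)
{
    uint64_t head, tail;
    assert(n >= 9 && n <= 16);

    memcpy(&head, src, 8);                                 /* first 8 bytes */
    memcpy(&tail, (const unsigned char *)src + n - 8, 8);  /* last 8 bytes  */

    memcpy(dst, &head, 8);
    memcpy((unsigned char *)dst + n - 8, &tail, 8);
}
```

For n = 11, for instance, bytes 3 to 7 are written twice with the same values. No byte loop or remainder handling is needed for any count in the range.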

The AVX 32 byte version:
Code:
    .code

    mov rax,rcx

    .if r8 <= 32

        option switch:notest

        .switch r8

          .case 0
            ret

          .case 1
            mov cl,[rdx]
            mov [rax],cl
            ret

          .case 2,3,4
            mov cx,[rdx]
            mov dx,[rdx+r8-2]
            mov [rax+r8-2],dx
            mov [rax],cx
            ret

          .case 5,6,7,8
            mov ecx,[rdx]
            mov edx,[rdx+r8-4]
            mov [rax+r8-4],edx
            mov [rax],ecx
            ret

          .case 9,10,11,12,13,14,15,16
            mov rcx,[rdx]
            mov rdx,[rdx+r8-8]
            mov [rax],rcx
            mov [rax+r8-8],rdx
            ret

          .case 17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32
            movdqu xmm0,[rdx]
            movdqu xmm1,[rdx+r8-16]
            movups [rax],xmm0
            movups [rax+r8-16],xmm1
            ret
        .endsw
    .endif

    vmovdqu ymm1,[rdx]
    vmovdqu ymm2,[rdx+r8-32]
    .if r8 > 64

        mov ecx,eax
        neg ecx
        and ecx,32-1
        add rdx,rcx
        mov r9,r8
        sub r9,rcx
        add rcx,rax
        and r9b,-32

        .if rcx > rdx

            .repeat
                sub r9,32
                vmovdqu ymm0,[rdx+r9]
                vmovdqa [rcx+r9],ymm0
            .untilz
            vmovdqu [rax],ymm1
            vmovdqu [rax+r8-32],ymm2
            ret
        .endif

        lea rcx,[rcx+r9]
        lea rdx,[rdx+r9]
        neg r9
        .repeat
            vmovdqu ymm0,[rdx+r9]
            vmovdqa [rcx+r9],ymm0
            add r9,32
        .untilz
    .endif
    vmovdqu [rax],ymm1
    vmovdqu [rax+r8-32],ymm2
    ret

    end
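
For counts above 64 the routine above keeps the first and last 32 bytes in ymm1 and ymm2 and runs an aligned loop over the middle. The pointer arithmetic that sets up that loop can be sketched in C as follows. `copy_plan` and `plan_copy` are hypothetical names used only for illustration, and the sketch computes the split without copying anything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct copy_plan {
    size_t head;      /* bytes before the first 32-byte aligned destination address */
    size_t body;      /* bytes moved by the aligned loop, a multiple of 32 */
    int    backward;  /* nonzero when the loop runs from the end toward the start */
};

struct copy_plan plan_copy(uintptr_t dst, uintptr_t src, size_t n)
{
    struct copy_plan p;
    assert(n > 64);                               /* this path is only taken for counts above 64 */

    p.head = (size_t)((0 - dst) & 31);            /* neg ecx / and ecx,32-1 */
    p.body = (n - p.head) & ~(size_t)31;          /* sub r9,rcx / and r9b,-32 */
    p.backward = (dst + p.head) > (src + p.head); /* .if rcx > rdx */
    return p;
}
```

The head is always less than 32 bytes, and so is whatever remains after the body. The two unaligned stores of the saved first and last 32 bytes therefore cover both ends.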

The AVX 64 byte version:
Code:
    .code

    mov rax,rcx

    .if r8 <= 64

        option switch:notest

        .switch r8

          .case 0
            ret

          .case 1
            mov cl,[rdx]
            mov [rax],cl
            ret

          .case 2,3,4
            mov cx,[rdx]
            mov dx,[rdx+r8-2]
            mov [rax+r8-2],dx
            mov [rax],cx
            ret

          .case 5,6,7,8
            mov ecx,[rdx]
            mov edx,[rdx+r8-4]
            mov [rax+r8-4],edx
            mov [rax],ecx
            ret

          .case 9,10,11,12,13,14,15,16
            mov rcx,[rdx]
            mov rdx,[rdx+r8-8]
            mov [rax],rcx
            mov [rax+r8-8],rdx
            ret

          .case 17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32
            movdqu xmm0,[rdx]
            movdqu xmm1,[rdx+r8-16]
            movups [rax],xmm0
            movups [rax+r8-16],xmm1
            ret

          .case 33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,\
                49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64
            vmovdqu ymm0,[rdx]
            vmovdqu ymm1,[rdx+r8-32]
            vmovups [rax],ymm0
            vmovups [rax+r8-32],ymm1
            ret
        .endsw
    .endif

    vmovdqu ymm2,[rdx]
    vmovdqu ymm3,[rdx+32]
    vmovdqu ymm4,[rdx+r8-32]
    vmovdqu ymm5,[rdx+r8-64]

    .if r8 > 128

        mov ecx,eax
        neg ecx
        and ecx,64-1
        add rdx,rcx
        mov r9,r8
        sub r9,rcx
        add rcx,rax
        and r9b,-64

        .if rcx > rdx

            .repeat
                sub r9,64
                vmovdqu ymm0,[rdx+r9]
                vmovdqu ymm1,[rdx+r9+32]
                vmovdqa [rcx+r9],ymm0
                vmovdqa [rcx+r9+32],ymm1
            .untilz
            vmovdqu [rax],ymm2
            vmovdqu [rax+32],ymm3
            vmovdqu [rax+r8-32],ymm4
            vmovdqu [rax+r8-64],ymm5
            ret
            db 13 dup(0x90)
        .endif

        lea rcx,[rcx+r9]
        lea rdx,[rdx+r9]
        neg r9
        .repeat
            vmovdqu ymm0,[rdx+r9]
            vmovdqu ymm1,[rdx+r9+32]
            vmovdqa [rcx+r9],ymm0
            vmovdqa [rcx+r9+32],ymm1
            add r9,64
        .untilz
    .endif
    vmovdqu [rax],ymm2
    vmovdqu [rax+32],ymm3
    vmovdqu [rax+r8-32],ymm4
    vmovdqu [rax+r8-64],ymm5
    ret

    end


total [1 .. 4], 1++
    25764 cycles 2.asm: switch 32 AVX
    25788 cycles 1.asm: switch 32 SSE
    27684 cycles 3.asm: switch 64 AVX
    47541 cycles 0.asm: msvcrt.memcpy()

total [15 .. 17], 1++
    30200 cycles 2.asm: switch 32 AVX
    30364 cycles 3.asm: switch 64 AVX
    33621 cycles 1.asm: switch 32 SSE
    64903 cycles 0.asm: msvcrt.memcpy()

total [63 .. 65], 1++
    32243 cycles 3.asm: switch 64 AVX
    32869 cycles 2.asm: switch 32 AVX
    49630 cycles 1.asm: switch 32 SSE
    90890 cycles 0.asm: msvcrt.memcpy()

total [127 .. 129], 1++
    38979 cycles 3.asm: switch 64 AVX
    41102 cycles 2.asm: switch 32 AVX
    71012 cycles 1.asm: switch 32 SSE
   131579 cycles 0.asm: msvcrt.memcpy()

total [511 .. 513], 1++
    84769 cycles 3.asm: switch 64 AVX
    86763 cycles 2.asm: switch 32 AVX
   126420 cycles 1.asm: switch 32 SSE
   226737 cycles 0.asm: msvcrt.memcpy()

total [1023 .. 1025], 1++
   129894 cycles 3.asm: switch 64 AVX
   156393 cycles 2.asm: switch 32 AVX
   240375 cycles 1.asm: switch 32 SSE
   420802 cycles 0.asm: msvcrt.memcpy()