Recent Posts

Pages: [1] 2 3 ... 10
1
The Workshop / Re: Multiply two QWORDs
« Last post by jj2007 on Today at 03:59:03 AM »
           versions 3 ( _v3 ) should be much faster... ( than quick! )

Then post it here. And please, not hidden in an archive with a dozen files.
2
The Workshop / Re: Multiply two QWORDs
« Last post by RuiLoureiro on Today at 03:27:33 AM »
>>> ...for those algos that do produce one, of which Rui's Multiply64by64 is the fastest on my CPU:
           versions 3 ( _v3 ) should be much faster... ( than quick! )


This «u64_mul (Rui)» was not written by me. I got it from somewhere, but I don't remember who wrote it.
3
The Workshop / Re: Multiply two QWORDs
« Last post by AW on Today at 03:26:07 AM »
This is a True Masm (TM) 64-bit version of mult64to128. The results are 128 bits, as expected: not crippled to 64 bits, not computed with floating point (as I have seen so far), and not produced by incorrect carry-less functions.

Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz

64-bit True Masm (TM), 128-bit Results
978 cycles for 100 * mult64to128 (AW)
val1=0x1122334455667788
val2=0x99aabbccddeeff00
val1*val2=0xa48ddeb93f93d70479983e499807800


 :badgrin:
4
The Laboratory / Re: Randomise stack test piece.
« Last post by jj2007 on Today at 03:17:49 AM »
... randomly picks mousex,mousey,keyboard presses etc to get some real random numbers

Just take the low byte of rdtsc; it's really random:
Code: [Select]
  xor edi, edi
  .Repeat
    invoke Sleep, 1     ; leave the time slice
    cpuid               ; serialise
    rdtsc
    print str$(al), " "
    inc edi
  .Until edi>=500
Code: [Select]
6 248 42 163 222 8 0 15 86 47 15 216 73 15 176 124 241 188 122 89 79 248 52 7 249 222 125 156 68 90 43 45 151 237 227 50 52 26 32 3 97 165 67
42 241 112 219 32 148 37 89 14 236 197 59 121 24 116 14 203 5 17 115 79 73 1 243 23 54 137 149 147 39 88 36 147 218 100 43 109 81 169 188
92 104 62 138 59 6 220 254 44 125 19 132 53 73 154 70 164 254 176 118 169 132 109 118 45 15 219 159 97 87 198 115 113 124 193 249 143 169 40
220 98 236 207 122 29 219 2 173 20 62 248 140 126 201 143 10 184 38 168 83 45 216 100 49 164 2 195 60 66 242 25 142 2 83 70 19 186 212 197
214 173 128 52 60 209 78 114 125 118 17 130 10 189 105 139 211 13 50 82 253 194 37 86 66 120 68 193 0 203 19 222 143 34 37 100 105 180 254 21
61 157 117 221 8 96 20 217 129 52 244 230 29 77 182 131 24 253 127 150 217 147 220 212 224 16 139 20 87 222 83 1 160 39 51 170 0 71 28 107
100 167 47 183 19 174 97 227 204 118 198 233 13 40 41 94 70 132 206 122 213 144 86 47 8 120 193 84 179 221 43 139 202 160 220 146 124 150 179
158 177 241 177 17 72 202 44 243 32 189 105 80 57 27 181 174 131 247 226 31 72 3 14 44 147 181 114 149 113 122 120 106 163 85 158 75 203
210 162 168 43 129 226 252 161 26 184 238 74 149 17 213 39 215 63 37 88 99 54 243 54 34 23 191 251 75 207 172 230 44 198 134 6 232 98 69 32
196 149 66 176 11 15 49 108 136 211 144 58 158 239 224 88 78 139 242 122 93 149 69 94 110 193 211 139 218 71 71 107 131 106 11 153 96 10 39
240 136 180 75 210 226 255 200 121 196 165 224 109 95 170 15 147 83 124 149 216 90 211 83 216 181 44 5 165 252 185 35 111 47 58 81 191 78 225
233 255 178 8 11 30 250 197 49 85 87 132 163 73 16 74 188 199 178 246 210 241 214 130 32 109 48 196 106 5 163 210 97 143 126 186 228 111 205
229 110 233 57 118 216 208 146 97 148 234 170 118 31 110 68 37 26 68 215 186 154 166 5
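For anyone who wants to put a number on "really random", here is a rough sketch in C of a chi-square check on bytes like the ones printed above. It is only an illustration: the name chi_square_16 and the choice of 16 buckets (the high nibble of each byte) are assumptions of this sketch and not part of the posted code, and a low statistic does not prove that a source is random.

```c
#include <stddef.h>
#include <stdint.h>

/* Chi-square statistic of n byte samples against a uniform distribution
   over 16 buckets, where the bucket is the high nibble of the byte.
   Hypothetical helper for illustration; n must be greater than zero.
   Values near the bucket count minus one (15) are what a uniform
   source tends to give; much larger values suggest a skewed source. */
double chi_square_16(const uint8_t *samples, size_t n)
{
    size_t counts[16] = {0};
    for (size_t i = 0; i < n; i++)
        counts[samples[i] >> 4]++;

    double expected = (double)n / 16.0;   /* expected hits per bucket */
    double chi2 = 0.0;
    for (int b = 0; b < 16; b++) {
        double d = (double)counts[b] - expected;
        chi2 += d * d / expected;
    }
    return chi2;
}
```

To try it, copy the 500 printed values into a uint8_t array and pass the array and its length to chi_square_16.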
5
The Workshop / Re: Multiply two QWORDs
« Last post by jj2007 on Today at 03:02:12 AM »
@Nidud: Yes, the new SSE version does not produce the expected result, so I added _mul128 instead. I also added a print of the high qword for those algos that do produce one, of which Rui's Multiply64by64 is the fastest on my CPU:
Code: [Select]
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)

914     cycles for 100 * MultQQ
2111    cycles for 100 * Multiply64by64 (Rui)
2159    cycles for 100 * u64_mul (Rui)
1207    cycles for 100 * PCLMULQDQ
2832    cycles for 100 * MultPclmulqdq2
4267    cycles for 100 * doMul (aw27)
962     cycles for 100 * muld (Nidud)
2306    cycles for 100 * _mul128 (Nidud)

910     cycles for 100 * MultQQ
2115    cycles for 100 * Multiply64by64 (Rui)
2163    cycles for 100 * u64_mul (Rui)
1209    cycles for 100 * PCLMULQDQ
2829    cycles for 100 * MultPclmulqdq2
4266    cycles for 100 * doMul (aw27)
962     cycles for 100 * muld (Nidud)
2300    cycles for 100 * _mul128 (Nidud)

910     cycles for 100 * MultQQ
2120    cycles for 100 * Multiply64by64 (Rui)
2166    cycles for 100 * u64_mul (Rui)
1211    cycles for 100 * PCLMULQDQ
2833    cycles for 100 * MultPclmulqdq2
4272    cycles for 100 * doMul (aw27)
964     cycles for 100 * muld (Nidud)
2296    cycles for 100 * _mul128 (Nidud)

926     cycles for 100 * MultQQ
2118    cycles for 100 * Multiply64by64 (Rui)
2169    cycles for 100 * u64_mul (Rui)
1207    cycles for 100 * PCLMULQDQ
2833    cycles for 100 * MultPclmulqdq2
4273    cycles for 100 * doMul (aw27)
962     cycles for 100 * muld (Nidud)
2287    cycles for 100 * _mul128 (Nidud)

62      bytes for MultQQ
108     bytes for Multiply64by64 (Rui)
126     bytes for u64_mul (Rui)
46      bytes for PCLMULQDQ
52      bytes for MultPclmulqdq2
253     bytes for doMul (aw27)
68      bytes for muld (Nidud)
146     bytes for _mul128 (Nidud)

MultQQ                 6760860027809745732
Multiply64by64 (Rui)   6760860027809745732  - high QWORD: 1728378107
u64_mul (Rui)          6760860027809745732  - high QWORD: 1728378107
PCLMULQDQ              7817399311675693060
MultPclmulqdq2         7817399311675693060
doMul (aw27)           6760860027809745732  - high QWORD: 1728378107
muld (Nidud)           6760860027809745732
_mul128 (Nidud)        6760860027809745732
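The two PCLMULQDQ rows print a different value because PCLMULQDQ is a carry-less multiply: the shifted partial products are combined with XOR instead of ADD. A minimal C sketch of that operation follows, limited to the low 64 bits. The name clmul_lo64 is made up for this example, and the sketch is a software model, not the code that was benchmarked.

```c
#include <stdint.h>

/* Low 64 bits of the carry-less product of a and b.
   Each set bit i of b contributes (a << i), exactly as in ordinary
   binary multiplication, but the contributions are XORed together,
   so no carries propagate between bit positions. */
uint64_t clmul_lo64(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 64; i++) {
        if ((b >> i) & 1)
            r ^= a << i;
    }
    return r;
}
```

With a = b = 3 the integer product is 9, while clmul_lo64(3, 3) is 5, because the two middle partial-product bits cancel under XOR instead of carrying. The two operations agree only when no two partial products overlap in a set bit, for example when one operand is a power of two.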
6
The Laboratory / Re: Randomise stack test piece.
« Last post by daydreamer on Today at 02:53:48 AM »
Cool.
Wouldn't it be useful to have a file with lots of random seeds for a card game, plus an array representing all 52 cards, and to randomize it with a program that runs in the background and randomly picks mousex, mousey, keyboard presses etc. to get some real random numbers into the pseudorandom function?

7
The Workshop / Re: Multiply two QWORDs
« Last post by RuiLoureiro on Today at 02:12:16 AM »
Hi Jochen,
              works fine :t
6FA84000
5B98FAA3
C2C48146
00000783
*** STOP Multiply64by64_v1 ---***
6FA84000
5B98FAA3
C2C48146
00000783
*** STOP Multiply64by64_v2 ---***
6FA84000
5B98FAA3
C2C48146
00000783
*** STOP Multiply64by64_v3 ---***
6FA84000
5B98FAA3
C2C48146
00000783
*** STOP Multiply64_64_v1 ---***
6FA84000
5B98FAA3
C2C48146
00000783
*** STOP  Multiply64_64_v2 ---***
6FA84000
5B98FAA3
C2C48146
00000783
*** STOP Multiply64_64_v3 ---***
6FA84000
5B98FAA3
00000000
00000000
*** STOP MultQQ ---***
6FA84000
5B98FAA3
C2C48146
00000783
*** STOP doMul ---***
6FA84000
5B98FAA3
00000000
00000000
*** STOP muld ---***
**************** E N D ****************
8
The Workshop / Re: Multiply two QWORDs
« Last post by nidud on Today at 12:41:19 AM »
I made some changes to the SSE version.

  mulpd xmm1, xmm2
  cvtsd2si ecx, xmm1      ; a0 * b1
  add edx, ecx
  movhlps xmm1, xmm1
  cvtsd2si ecx, xmm1      ; a1 * b0
  add edx, ecx

However, it will still fail and be really slow compared to the conventional method.
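One plausible reason for the failure (an assumption here, since the post does not state the cause) is precision: a double has a 53-bit significand, while the product of two 32-bit values can need 64 bits, so the low bits of a0 * b1 and a1 * b0 can be rounded away before cvtsd2si sees them. The C sketch below shows only that rounding effect. The function names are hypothetical, and it does not model the 32-bit range limit of cvtsd2si with a 32-bit destination.

```c
#include <stdint.h>

/* Exact 32x32 -> 64-bit product using integer arithmetic. */
uint64_t mul32_exact(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}

/* Same product computed in double precision, as mulpd would.
   The volatile store forces rounding to a real 64-bit double even on
   compilers that keep intermediates in wider x87 registers. */
uint64_t mul32_via_double(uint32_t a, uint32_t b)
{
    volatile double p = (double)a * (double)b;
    return (uint64_t)p;
}
```

For small operands both functions agree. For 0xFFFFFFFF * 0xFFFFFFFF the exact product is 0xFFFFFFFE00000001, which does not fit in 53 bits, so the double version returns a rounded value.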
Code: [Select]
_mul128 proc Multiplier:qword, Multiplicand:qword, Highproduct:ptr

    mov eax,dword ptr Multiplier
    mov edx,dword ptr Multiplier[4]
    mov ecx,dword ptr Multiplicand[4]

    .if !edx && !ecx

        .if Highproduct

            mov ecx,Highproduct
            mov [ecx],edx
            mov [ecx+4],edx
        .endif
        mul dword ptr Multiplicand
    .else

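        push    ebx
        push    esi
        push    edi
        push    ebp
        push    eax                         ; a0 (low dword of Multiplier)
        push    edx                         ; a1 (high dword of Multiplier)
        push    edx                         ; a1 again, one copy per use below
        mov     ebx,dword ptr Multiplicand  ; b0
        mul     ebx                         ; edx:eax = a0 * b0
        mov     esi,edx
        mov     edi,eax                     ; esi:edi = a0 * b0
        pop     eax                         ; a1
        mul     ecx                         ; edx:eax = a1 * b1
        mov     ebp,edx
        xchg    ebx,eax                     ; ebp:ebx = a1 * b1, eax = b0
        pop     edx                         ; a1
        mul     edx                         ; edx:eax = b0 * a1
        add     esi,eax
        adc     ebx,edx
        adc     ebp,0                       ; add a1 * b0 at bit 32, carry upward
        pop     eax                         ; a0
        mul     ecx                         ; edx:eax = a0 * b1
        add     esi,eax
        adc     ebx,edx
        adc     ebp,0                       ; add a0 * b1 at bit 32, carry upward
        mov     ecx,ebp                     ; ecx:ebx = high 64 bits
        mov     edx,esi
        mov     eax,edi                     ; edx:eax = low 64 bits
        pop     ebp                         ; restore the frame pointer first
        mov     edi,Highproduct             ; then it is safe to read the argument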

        .if edi

            mov [edi],ebx
            mov [edi+4],ecx
        .endif
        pop     edi
        pop     esi
        pop     ebx

    .endif
    ret

_mul128 endp
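For cross-checking results, the same schoolbook scheme (four 32x32 partial products, with the two cross terms added at bit 32) can be written as a small C reference model. This is only a sketch and not a translation of the routine above: the struct u128_parts and the function mul64x64_to_128 are names invented for this example.

```c
#include <stdint.h>

/* Hypothetical result type: low and high 64 bits of a 128-bit product. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} u128_parts;

/* Unsigned 64x64 -> 128-bit multiply from four 32x32 -> 64 products. */
u128_parts mul64x64_to_128(uint64_t a, uint64_t b)
{
    uint64_t a0 = (uint32_t)a, a1 = a >> 32;
    uint64_t b0 = (uint32_t)b, b1 = b >> 32;

    uint64_t p00 = a0 * b0;   /* weight 2^0  */
    uint64_t p01 = a0 * b1;   /* weight 2^32 */
    uint64_t p10 = a1 * b0;   /* weight 2^32 */
    uint64_t p11 = a1 * b1;   /* weight 2^64 */

    /* Column at bit 32: three 32-bit terms, so the sum cannot
       overflow 64 bits; its upper half is the carry into bit 64. */
    uint64_t mid = (p00 >> 32) + (uint32_t)p01 + (uint32_t)p10;

    u128_parts r;
    r.lo = (mid << 32) | (uint32_t)p00;
    r.hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
    return r;
}
```

The lo field corresponds to what the routine returns in edx:eax, and the hi field to the two dwords stored through Highproduct.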
9
The Workshop / Re: Multiply two QWORDs
« Last post by hutch-- on Today at 12:10:57 AM »

Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz (SSE4)

1159    cycles for 100 * MultQQ
2008    cycles for 100 * Multiply64by64 (Rui)
2190    cycles for 100 * u64_mul (Rui)
665     cycles for 100 * PCLMULQDQ
2458    cycles for 100 * MultPclmulqdq2
4529    cycles for 100 * doMul (aw27)
1006    cycles for 100 * muld (Nidud)
4222    cycles for 100 * mulq_sse (Nidud)

994     cycles for 100 * MultQQ
1975    cycles for 100 * Multiply64by64 (Rui)
2115    cycles for 100 * u64_mul (Rui)
722     cycles for 100 * PCLMULQDQ
2478    cycles for 100 * MultPclmulqdq2
4550    cycles for 100 * doMul (aw27)
1004    cycles for 100 * muld (Nidud)
4209    cycles for 100 * mulq_sse (Nidud)

1000    cycles for 100 * MultQQ
1971    cycles for 100 * Multiply64by64 (Rui)
2109    cycles for 100 * u64_mul (Rui)
601     cycles for 100 * PCLMULQDQ
2639    cycles for 100 * MultPclmulqdq2
4493    cycles for 100 * doMul (aw27)
1003    cycles for 100 * muld (Nidud)
4229    cycles for 100 * mulq_sse (Nidud)

998     cycles for 100 * MultQQ
1972    cycles for 100 * Multiply64by64 (Rui)
2098    cycles for 100 * u64_mul (Rui)
598     cycles for 100 * PCLMULQDQ
2443    cycles for 100 * MultPclmulqdq2
5134    cycles for 100 * doMul (aw27)
1005    cycles for 100 * muld (Nidud)
4219    cycles for 100 * mulq_sse (Nidud)

62      bytes for MultQQ
108     bytes for Multiply64by64 (Rui)
126     bytes for u64_mul (Rui)
46      bytes for PCLMULQDQ
45      bytes for MultPclmulqdq2
253     bytes for doMul (aw27)
68      bytes for muld (Nidud)
112     bytes for mulq_sse (Nidud)

DestQ   11111111111234566980
DestQ   11111111111234566980
DestQ   11111111111234566980
DestQ   10801413638766757892
DestQ   10801413638766757892
DestQ   11111111111234566980
DestQ   11111111111234566980
DestQ   506253686751116096

--- ok ---
10
The Workshop / Re: Multiply two QWORDs
« Last post by LiaoMi on Today at 12:08:16 AM »
Code: [Select]
Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz (SSE4)

878     cycles for 100 * MultQQ
1695    cycles for 100 * Multiply64by64 (Rui)
1850    cycles for 100 * u64_mul (Rui)
515     cycles for 100 * PCLMULQDQ
2320    cycles for 100 * MultPclmulqdq2
3976    cycles for 100 * doMul (aw27)
879     cycles for 100 * muld (Nidud)
3668    cycles for 100 * mulq_sse (Nidud)

868     cycles for 100 * MultQQ
1859    cycles for 100 * Multiply64by64 (Rui)
2035    cycles for 100 * u64_mul (Rui)
546     cycles for 100 * PCLMULQDQ
2134    cycles for 100 * MultPclmulqdq2
4049    cycles for 100 * doMul (aw27)
956     cycles for 100 * muld (Nidud)
3637    cycles for 100 * mulq_sse (Nidud)

922     cycles for 100 * MultQQ
1729    cycles for 100 * Multiply64by64 (Rui)
1849    cycles for 100 * u64_mul (Rui)
537     cycles for 100 * PCLMULQDQ
2103    cycles for 100 * MultPclmulqdq2
4215    cycles for 100 * doMul (aw27)
885     cycles for 100 * muld (Nidud)
3635    cycles for 100 * mulq_sse (Nidud)

884     cycles for 100 * MultQQ
1901    cycles for 100 * Multiply64by64 (Rui)
1853    cycles for 100 * u64_mul (Rui)
531     cycles for 100 * PCLMULQDQ
2127    cycles for 100 * MultPclmulqdq2
3960    cycles for 100 * doMul (aw27)
981     cycles for 100 * muld (Nidud)
3643    cycles for 100 * mulq_sse (Nidud)

62      bytes for MultQQ
108     bytes for Multiply64by64 (Rui)
126     bytes for u64_mul (Rui)
46      bytes for PCLMULQDQ
45      bytes for MultPclmulqdq2
253     bytes for doMul (aw27)
68      bytes for muld (Nidud)
112     bytes for mulq_sse (Nidud)

DestQ   11111111111234566980
DestQ   11111111111234566980
DestQ   11111111111234566980
DestQ   10801413638766757892
DestQ   10801413638766757892
DestQ   11111111111234566980
DestQ   11111111111234566980
DestQ   506253686751116096

--- ok ---