The MASM Forum

Miscellaneous => The Orphanage => Topic started by: raymond on November 28, 2023, 11:30:19 AM

Title: Python anyone?
Post by: raymond on November 28, 2023, 11:30:19 AM
Got into a brief discussion recently in another forum about possibilities of assembly vs other languages. Following is one answer I got:

Quoterayfil wrote:
"Show me any other language which could extract the square root of any number with a precision of 1000 decimal digits within a split second."

Following Python snippet can extract the square root with a precision of 10000 decimal digits in 0.01 second.

    CODE
    from decimal import *
    from time import time

    def sq(n, pr):
   start = time()
   getcontext().prec = pr
   print(Decimal(n).sqrt()
   print(time() - start)

    sq(5, 10000)

Mathematica is even faster than Python.

I would be VERY curious to see a finished program which could deliver such a feat, i.e. 0.01 second for 10,000 digits.

Is there anyone
a) sufficiently familiar with Python to understand the offered snippet,
b) capable of generating a working exe to confirm its speed and its output

If proven, its algo would be extremely interesting.

Thanks
Title: Re: Python anyone?
Post by: jack on November 28, 2023, 12:55:11 PM
hi Raymond
on my PC python takes 0.035 seconds
I have been playing with MP floating point arithmetic for a while in FreeBasic, my algorithms are very simple, the following little program builds an mp float from a 10000 characters long string and and takes the square root, squares it and gets the relative error, the average time per loop is 0.011 seconds
mind you that there's overhead in building the number from string and then squaring it to get the relative error
the same algorithms in C took 0.0054 seconds
dim as decfloat x, y
dim as long i
dim as double t

t=timer
for i=1 to 9
x=trim(str(i))+string(NUMBER_OF_DIGITS-1, trim(str(i))) ' build a 10000 digits number from string
y=sqr(x) 'square root
y=(y*y-x)/x 'calculate the relative error
print fp2str(y, 20)
next
t=timer-t
print t/9
<obviously the decimal floating point package is not included, if you are interested I can upload it and post a link>
Title: Re: Python anyone?
Post by: TimoVJL on November 28, 2023, 05:43:04 PM
QuoteStefan Krah's mpdecimal package (libmpdec): a complete implementation of the General Decimal Arithmetic Specification that will – with minor restrictions – also conform to the IEEE 754-2008 Standard for Floating-Point Arithmetic. Starting from Python-3.3, libmpdec is the basis for Python's decimal module.
mpdec (https://www.bytereef.org/mpdecimal/)

This fixed version run, but don't print nothing  :sad:
from decimal import *
from time import time

def sq(n, pr):
  start = time()
  getcontext().prec = pr
  print(Decimal(n).sqrt())
  print(time() - start)

sq(5, 10000)
With x64 version:
0.18700003623962402

Title: Re: Python anyone?
Post by: jack on November 28, 2023, 06:43:20 PM
for people on Windows I recommend WinPython (https://winpython.github.io/) the download is huge but includes the most popular libraries like SciPy and many others
Title: Re: Python anyone?
Post by: jj2007 on November 28, 2023, 08:34:51 PM
Quote from: jack on November 28, 2023, 06:43:20 PMI recommend WinPython (https://winpython.github.io/)

See Using a Masm DLL from Python (https://masm32.com/board/index.php?topic=6356.0) or this (https://masm32.com/board/index.php?msg=45015):

include \masm32\MasmBasic\MasmBasic.inc      ; download
includelib \Python34\libs\python34.lib       ; adjust path according to your installation
  Py_Initialize PROTO C
  Py_Finalize PROTO C
  PyRun_SimpleString PROTO C :DWORD

  Init
  call Py_Initialize
  fn PyRun_SimpleString, "print('Hello World')"
  call Py_Finalize
  Exit
end start

That was over 8 years ago, and now intense googling has not produced any python*.lib. They seem to fumble a lot with this language; version 3.8, for example, is the last one that works on Windows 7: "Last WinPython version that is said to still work on Windows 7 should be WinPython64-3.8.9.0"
Title: Re: Python anyone?
Post by: HSE on November 28, 2023, 09:11:30 PM
In the phone program is a little slow:
0.221851 

:biggrin:
Title: Re: Python anyone?
Post by: GoneFishing on November 29, 2023, 12:43:24 AM
python 3.9(64-bit)
Quote0.04675626754760742

Quote from: jj2007 on November 28, 2023, 08:34:51 PMThat was over 8 years ago, and now intense googling has not produced any python*.lib.

You may try to download Windows Installer from the official site https://www.python.org/downloads/ (https://www.python.org/downloads/)
python*.lib resides in libs folder
Title: Re: Python anyone?
Post by: jj2007 on November 29, 2023, 01:54:15 AM
Quote from: GoneFishing on November 29, 2023, 12:43:24 AMdownload Windows Installer from the official site https://www.python.org/downloads/
python*.lib resides in libs folder

I've done that in the meantime, using this archive (https://www.python.org/ftp/python/3.8.9/python-3.8.9-embed-win32.zip). There's a problem, though:
include \masm32\include\masm32rt.inc

.data
hDll dd ?
hPyList_New dd ?
.code
start:
  mov hDll, rv(LoadLibrary, "\Python\python3.dll")
  print hex$(eax), 9, "LoadLib", 13, 10
  mov hPyList_New, rv(GetProcAddress, hDll, "PyList_New")
  print hex$(rv(GetLastError)), 9, "last error", 13, 10
  print hex$(hPyList_New), 9, "PyList_New", 13, 10
  exit
end start

Output:
6DD50000        LoadLib
000000C1        last error
00000000        PyList_New

Googling 0xc1 loadlibrary python yields thousands of hits. Error 0xC1 = dec 193 means bad exe format. So it loads the library correctly as 32-bit code but fails to get the address of PyList_New, which is according to Timo's TLPEView, part of python3.dll :cool:

You can solve the problem with
  invoke SetCurrentDirectory, chr$("\Python\")
Apparently, python3.dll loads libraries not from its own folder but rather 1. from the executable's folder and then 2. from Windows\System32, where it finds 64-bit DLLs. It's probably a feature :cool:
Title: Re: Python anyone?
Post by: GoneFishing on November 29, 2023, 02:53:00 AM
That ftp link was useful :thumbsup:
I had to download 32-bit version,then I copied python*.dlls to my test project folder
and with a minor adjustment:
mov hDll, rv(LoadLibrary, "python3.dll")
all worked fine:
Quote73EE0000        LoadLib
00000000        last error
715A3A30        PyList_New
Title: Re: Python anyone?
Post by: jj2007 on November 29, 2023, 03:54:09 AM
Quote from: GoneFishing on November 29, 2023, 02:53:00 AMI copied python*.dlls to my test project folder

And all worked fine, sure. But you would have to do that for all your projects, thus wasting an awful lot of disk space.

Next version of MasmBasic will have this macro, so that you can access the DLL from any folder:
SetDllFolder MACRO arg:=<0>
  ifndef sdfOld$
    .DATA?
    sdfOld$ db 260 dup(?)    ; MAX_PATH
    .CODE
  endif
  ifdif <arg>, <0>
    .if 1
        push repargA(arg)
        invoke GetCurrentDirectory, 260, addr sdfOld$
        call SetCurrentDirectory
  else
        invoke SetCurrentDirectory, addr sdfOld$
      .endif
  endif
ENDM

Usage:
SetDllFolder "\Python"
mov hDll, rv(LoadLibrary, "Python3")
mov func1, rv(GetProcAddress, hDll, "PyList_New")
...
SetDllFolder
Title: Re: Python anyone?
Post by: GoneFishing on November 29, 2023, 04:43:20 AM
I prefer pure MASM32.
And you're right python wants to know the path to its folder which contains not only dlls but also  python*.zip with all needed modules.
Your MasmBasic HelloWorld proggie outputs a lot of internal configuration info when it doesn't find any modules:
QuotePython path configuration:
  PYTHONHOME = (not set)
  PYTHONPATH = (not set)
  program name = 'python'
  isolated = 0
  environment = 1
  user site = 1
  import site = 1
  sys._base_executable = 'C:\\masm32\\projects\\python\\39x32\\t2.exe'
  sys.base_prefix = ''
  sys.base_exec_prefix = ''
  sys.platlibdir = 'lib'
  sys.executable = 'C:\\masm32\\projects\\python\\39x32\\t2.exe'
  sys.prefix = ''
  sys.exec_prefix = ''
  sys.path = [
    'C:\\masm32\\projects\\python\\39x32\\python39.zip',
    '.\\DLLs',
    '.\\lib',
    'C:\\masm32\\projects\\python\\39x32',
  ]
Title: Re: Python anyone?
Post by: raymond on November 29, 2023, 04:56:32 AM
Quote from: jack on November 28, 2023, 12:55:11 PMhi Raymond
on my PC python takes 0.035 seconds

Many thanks to all those having shown interest in this subject.

Jack indicates that the quoted 0.01 second may be quite possibly achievable.

My main interest was to find out if there is a significantly different way of extracting a square root compared to all those I have learned in the past.

The most promising one for speed would have to be using logarithms. But then, are there means to convert those rapidly back and forth with sufficient precision? Or is it something totally different? A self-contained working program would certainly be useful to run under ollydbg and get some of the details.
Title: Re: Python anyone?
Post by: jack on November 29, 2023, 06:02:48 AM
my algorithm uses log and exp evaluated using double precision as a first approximation, thereafter I use the Newton-Raphson method, something like this
x0 = x scaled so it's between 1 and 10 but less than 10
ex = the exponent in base 10 of x
y0 = first approximation
y0 = exp(log(x0)/2)*exp(2.302585092994046*ex/2) ; the second exp is mp-exp evaluated only to 16 digits
prec=32
then do the Newton-Raphson calculations in a loop doubling the precision each time in the loop
the calculations inside the loop are only evaluated to prec precision
Paul Dixon posted an implementation of the square root in PowerBasic with inline asm https://forum.powerbasic.com/forum/user-to-user-discussions/programming/816462-arbitrary-length-number-math?p=816479#post816479 (https://forum.powerbasic.com/forum/user-to-user-discussions/programming/816462-arbitrary-length-number-math?p=816479#post816479)
but the Newton-Raphson  method is several orders of magnitude faster than his approach
Title: Re: Python anyone?
Post by: jack on November 29, 2023, 06:38:30 AM
just in case that someone would like to have a look at my humble implementation of multiple precision decimal floating-point I attach the code in FreeBasic DecFloat-FB (https://u.pcloud.link/publink/show?code=XZLlhy0ZqQ0UKHkOkTQ9kVAIgwqebmjFm5y0)
Title: Re: Python anyone?
Post by: GoneFishing on November 29, 2023, 06:41:37 AM
hi Raymond,
QuoteA self-contained working program would certainly be useful
In theory the small python script in your post can be converted to exe format.
I've used Pyinstaller for that purpose but the result doesn't look "debuggable" because of its size :
exe = 1018KB + dependency folder = 10.2MB

BTW performance improved to 0.031305789947509766

Title: Re: Python anyone?
Post by: jj2007 on November 29, 2023, 07:27:04 AM
Quote from: GoneFishing on November 29, 2023, 04:43:20 AMI prefer pure MASM32.

The SetDllFolder macro is pure Masm32. You are working with MASM, where the first M stands for "MACRO". Get used to it.

Has anybody succeeded in running Ray's code?

from decimal import *
from time import time

def sq(n, pr):
 start = time()
 getcontext().prec = pr
 print(Decimal(n).sqrt()
 print(time() - start)
 sq(5, 10000)

C:\Python>python.exe jjTest.py
  File "jjTest.py", line 8
    print(time() - start)
    ^
SyntaxError: invalid syntax

This is after eliminating the unexpected indent crap.

Edit: I found the reason, and will never seriously touch a language that has such horrible error handling.
Title: Re: Python anyone?
Post by: GoneFishing on November 29, 2023, 07:42:13 AM
Quote from: jj2007 on November 29, 2023, 07:27:04 AMHas anybody succeeded in running Ray's code?

As you could guess I've succeeded:
Quoteprint(Decimal(n).sqrt())

Unindent last line:

from decimal import *
from time import time

def sq(n, pr):
 start = time()
 getcontext().prec = pr
 print(Decimal(n).sqrt())
 print(time() - start)
sq(5, 10000)
Title: Re: Python anyone?
Post by: jack on November 29, 2023, 07:54:22 AM
just one last post of mine, here's a 64-bit exe plus source for testing
<the exe should work as is but I discovered a bug in my fp to string conversion that I need to track down>

<edit; got rid of non-working code and compiled without -march=native, should work on older CPU>
Title: Re: Python anyone?
Post by: jj2007 on November 29, 2023, 09:09:46 AM
Quote from: jack on November 29, 2023, 07:54:22 AMhere's a 64-bit exe

Well done, it compiles "out of the box" and it's a factor 5 faster than the Python version :thumbsup:
Title: Re: Python anyone?
Post by: TimoVJL on November 29, 2023, 09:19:04 AM
Quote from: jack on November 29, 2023, 07:54:22 AMjust one last post of mine, here's a 64-bit exe plus source for testing
<the exe should work as is but I discovered a bug in my fp to string conversion that I need to track down>
Crash in Windows 7  :sad:

Just a problem with an old AMD
00000001`3f520a51 c5f829b424d0000000 vmovaps xmmword ptr [rsp+0D0h],xmm6 ss:00000000`0052fca0=00000000000000000000000000000000
Title: Re: Python anyone?
Post by: jack on November 29, 2023, 09:30:05 AM
thanks for testing jj and TimoVJL
@TimoVJL, would try again?
I recompiled without the -march=native so it should work on older CPU's but probably not as fast
Title: Re: Python anyone?
Post by: TimoVJL on November 29, 2023, 09:47:47 AM
Same problem00000001`3fe4f891 c5f829b424d0000000 vmovaps xmmword ptr [rsp+0D0h],xmm6 ss:00000000`0055f7f0=00000000000000000000000000000000
Title: Re: Python anyone?
Post by: jack on November 29, 2023, 10:12:28 AM
TimoVJL, one last try please, I recompiled on a Windows 7 VM
Title: Re: Python anyone?
Post by: TimoVJL on November 29, 2023, 10:17:58 AM
time per loop =  0.0642072000756571  :thumbsup:
Title: Re: Python anyone?
Post by: jack on November 29, 2023, 10:22:52 AM
hi TimoVJL, looks like I forgot to change the time message but at least it runs
how does the time compare with python?
Title: Re: Python anyone?
Post by: TimoVJL on November 29, 2023, 10:29:01 AM
from post #2

With x64 version:
0.18700003623962402
Title: Re: Python anyone?
Post by: jj2007 on November 29, 2023, 08:53:58 PM
Attached a list of the 1600 functions exported by Python38.dll (direct link (https://www.python.org/ftp/python/3.8.9/python-3.8.9-embed-win32.zip))
Note you get a similar list from Python3.dll, which grabs about 800 functions from *38.dll; they probably keep it for backwards compatibility.

I noticed strange behaviour of PySys_WriteStdout: it writes 0D0D0Ah sequences, see Assembly uses Python (https://masm32.com/board/index.php?topic=11506.0) (warning, that's MasmBasic :biggrin: )

PySys_WriteStdout(cfm$("%i\t%s\n"), ecx, PyList_GetItem(edi, ecx))
I'm not sure whether it's Python's fault, or if the console plays tricks.
Title: Re: Python anyone?
Post by: raymond on November 30, 2023, 04:09:27 AM
Quote from: GoneFishing on November 29, 2023, 06:41:37 AMhi Raymond,
QuoteA self-contained working program would certainly be useful
In theory the small python script in your post can be converted to exe format.
I've used Pyinstaller for that purpose but the result doesn't look "debuggable" because of its size :
exe = 1018KB + dependency folder = 10.2MB

WOW Talk about BLOATING to simply extract a square root!!!! :skrewy:  :dazzled:  :eusa_dance:  :eusa_boohoo:
Title: Re: Python anyone?
Post by: raymond on November 30, 2023, 08:14:34 AM
Quote from: jack on November 29, 2023, 07:54:22 AMjust one last post of mine, here's a 64-bit exe plus source for testing
<the exe should work as is but I discovered a bug in my fp to string conversion that I need to track down>

<edit; got rid of non-working code and compiled without -march=native, should work on older CPU>

D/L, unzipped & ran without problem (except for one minor detail: it "prints" some extra 100 or so blank pages).

0.022 sec on my old 10-year old laptop under Win10. Almost 100 times faster than my BCD algo at 1.5 second. All 10,000 digits of the square root of 5 seem to be the same with both programs.

Is there something similar as the 32-bit Ollydbg to study this 64-bit program?
Title: Re: Python anyone?
Post by: jj2007 on November 30, 2023, 08:24:44 AM
Quote from: raymond on November 30, 2023, 08:14:34 AMIs there something similar as the 32-bit Ollydbg to study this 64-bit program?

x64Dbg (https://x64dbg.com/) has the typical Olly interface, and it works fine.
Title: Re: Python anyone?
Post by: jack on November 30, 2023, 10:53:01 AM
raymond
what algorithm did you use for the square root?
Title: Re: Python anyone?
Post by: daydreamer on November 30, 2023, 08:34:58 PM
Quote from: raymond on November 29, 2023, 04:56:32 AMMy main interest was to find out if there is a significantly different way of extracting a square root compared to all those I have learned in the past.

The most promising one for speed would have to be using logarithms. But then, are there means to convert those rapidly back and forth with sufficient precision? Or is it something totally different? A self-contained working program would certainly be useful to run under ollydbg and get some of the details.
I don't know if Taylor series has sufficient precision for log and exp functions?
I have SIMD trigo functions that calculate 4*sincos fast
y0 = exp(log(x0)/2)*exp(2.302585092994046*ex/2) ;possibility for SIMT or SIMD calculates *0.5 and exp functions in parallel?

Title: Re: Python anyone?
Post by: jack on December 01, 2023, 01:45:11 AM
Quotey0 = exp(log(x0)/2)*exp(2.302585092994046*ex/2)
I wrote it that way at the time because I was playing with nth roots, but for the square root it's simpler to use sqrt(x0)*exp(2.302585092994046*ex/2), the reason for splitting the number is so that you can deal with huge numbers beyond the range of double or even extended
so for example if x=17, then x0=1.7 and ex=1
but suppose x=5e10000 then x0=5 and ex=10000 so exp(2.302585092994046*ex/2) = 1.00000000000158e5000 so you need to use MPexp to evaluate that expression
I tried the inverse square root Newton-Rapson method but with my very simple multiplication algorithm it was slower.
Title: Re: Python anyone?
Post by: raymond on December 01, 2023, 06:24:42 AM
Quote from: jack on November 30, 2023, 10:53:01 AMraymond
what algorithm did you use for the square root?

When I first learned about this algo in the early 1960s, I dubbed it the "abacus" algo because it seemed as if it could have been developed in ancient times when the abacus was the prime instrument to perform calculations.

In those days, IBM was marketing mechanical calculators to perform primarily the usual math functions: add, subtract, multiply, divide. The user's manual also described a procedure to extract square roots without using any mult/div operations except for one single preliminary multiplication. That is the procedure which I adapted in the late 1980's for a TRS80 to achieve 10,000 decimal precision of square roots with only 64kb total memory (using what was named at the time as "machine language" :skrewy: )!

Here's a brief description of that algo.
Let's use the square root of 2 for example
;1st step is to multiply the source by 5
 2.0000....*5 = 10.0000.....
;2nd step subdivide the result into chunks of 2 digits from the decimal delimiter, pad front with a 0 if necessary
10 00 00 00 ...........
;start by subtracting 05 from the first double digits
05
__
05 00 00 00 ......
;continue incrementing the '05' by 10 for the subsequent subtractions until the result would be negative
15
__
;would be negative for this example
;remove the terminating 5 and replace it with a 05, then shift by 1 digit right and repeat subtractions
05 00 00 00 ..........
01 05
______________
03 95 00 00 ....
01 15
______________
02 80 00 00 ....
01 25
______________
01 55 00 00 .........
01 35
______________
00 20 00 00 ......
01 45
;would become negative for this next digit
;remove the terminating 5 and replace it with a 05, then shift by 1 digit right and repeat subtractions
00 20 00 00 ...........
00 14 05
_____________
00 05 95 00 ..........
00 14 15
;would become negative for this next digit
;remove the terminating 5 and replace it with a 05, then shift by 1 digit right and repeat subtractions
00 05 95 00 ..........
00 01 41 05
_____________
00 04 53 95 ..........
00 01 41 15
_____________
00 03 12 80 ..........
00 01 41 25
_____________
00 01 71 55 ..........
00 01 41 35
_____________
00 00 30 20
00 01 41 45
;would become negative for this next digit
;remove the terminating 5 and replace it with a 05, then shift by 1 digit right and repeat subtractions
00 00 30 20 00 00 ........
00 00 14 14 05
________________
00 00 16 05 95 00 .......
00 00 14 14 15
________________
00 00 01 91 80 00 .........
00 00 14 14 25
;would become negative for this next digit
;remove the terminating 5 and replace it with a 05, then shift by 1 digit right and repeat subtractions
00 00 01 91 80 00 .........
00 00 01 41 42 05

;and you can continue to repeat this ad infinitum......... limited to the amount of available memory.

You also have to keep track of where the decimal delimiter would be in the result.

This somehow makes me feel old. :dazzled:
Title: Re: Python anyone?
Post by: jack on December 01, 2023, 09:04:47 AM
 :biggrin:
interesting algorithm
Title: Re: Python anyone?
Post by: daydreamer on December 01, 2023, 06:32:22 PM
SSE rcpsqrt low-res LUT I use in make circles very fast in ddraw, documentation suggest use newton- raphson to perform better precision
my dos. Com file is inspired by pixelshader way of make circles per pixel checking every pixel if inside or outside circle radius r= sqrt(x*x+Y*y)
Fastest I use in dos. Com is reduce the slow fsqrt every pixel vs 100 radius, with
R=x*x+Y*y
.If R < 10000 ; check vs 100*100
Mov color,circle color
.else
Mov color,background color
.endif