The MASM Forum

Projects => ObjAsm => Topic started by: HSE on January 30, 2024, 07:36:03 AM

Title: Comparing K-Means and Others Algorithms for Data Clustering in Assembly.
Post by: HSE on January 30, 2024, 07:36:03 AM
Hi all!

A few weeks ago I found the article Comparing K-Means and Others Algorithms for Data Clustering (https://www.codeproject.com/Articles/5375470/Comparing-K-Means-and-Others-Algorithms-for-Data-C) by Nicolás Descartes, which comes with C# source code.

It looked very interesting to translate into Assembly. It is still a work in progress, but it looks good.

The C# code overuses Collections somewhat, because they are very easy to use in C#. I removed the most obvious exaggerations (perhaps they were meant to make the algorithm clearer, I'm not sure).

- The K-Means strategy needs an initial random choice, so you will probably have to run it several times to find the best solution.

- The hierarchical strategy needs a termination criterion; in these cases it is the number of clusters.
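To illustrate why K-Means is run several times, here is a minimal Python sketch (not HSE's PowerBasic or assembly code) of plain Lloyd-style K-Means wrapped in a restart loop that keeps the run with the lowest within-cluster sum of squared distances (SSE). The function names, the SSE criterion and the restart count are assumptions made for the example.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_once(points, k, rng, max_iter=100):
    """One Lloyd's-algorithm run starting from k randomly chosen points.
    Returns (labels, centroids, sse)."""
    centroids = rng.sample(points, k)            # the random initial choice
    labels = None
    for _ in range(max_iter):
        # Assignment step: every point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:                 # nothing moved: converged
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                          # keep the old centroid if empty
                centroids[c] = tuple(sum(axis) / len(members)
                                     for axis in zip(*members))
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, sse

def kmeans_best_of(points, k, restarts=10, seed=0):
    """Repeat K-Means from different random starts; keep the lowest SSE."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        run = kmeans_once(points, k, rng)
        if best is None or run[2] < best[2]:
            best = run
    return best


data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids, sse = kmeans_best_of(data, k=2)
print(round(sse, 4))   # 2.6667
```

Which run wins depends on the random starts, so comparing a score such as SSE across runs is what makes "run it several times" systematic.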

These two don't really need collections, but I followed the author here. I used a K-Means method I wrote in PowerBasic for DOS three weeks ago :biggrin:
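The hierarchical strategy with the number of clusters as its termination criterion can indeed be written with nothing more than plain lists. This minimal Python sketch of bottom-up (agglomerative) clustering assumes single linkage (distance between the closest members of two clusters), which may not be the linkage used in the article; `agglomerative` is a hypothetical name.

```python
def agglomerative(points, k):
    """Bottom-up hierarchical clustering with single linkage.
    Starts with one cluster per point and repeatedly merges the two closest
    clusters; terminates when only k clusters remain."""
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters = [[i] for i in range(len(points))]   # each cluster: point indices

    def dist(i, j):
        # Euclidean distance between points i and j
        return sum((a - b) ** 2 for a, b in zip(points[i], points[j])) ** 0.5

    while len(clusters) > k:                     # termination: number of clusters
        best = None                              # (distance, index_a, index_b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single linkage: distance between the closest pair of members
                d = min(dist(i, j) for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])          # merge cluster b into a
        del clusters[b]                          # b > a, so index a stays valid
    return clusters


print(agglomerative([(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)], k=3))
# [[0, 1], [2, 3], [4]]
```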

- Density-based spatial clustering of applications with noise (DBSCAN) really benefits from Collections, and it also needs a couple of sorted vectors.
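For comparison with the Collections-based assembly version, here is a minimal Python sketch of the classic DBSCAN procedure. The growing `seeds` list plays roughly the role of a collection; neighbourhood queries are brute force, so the sorted vectors of the assembly implementation are not reproduced. `dbscan`, `eps` and `min_pts` are illustrative names, not taken from the source.

```python
def dbscan(points, eps, min_pts):
    """Classic DBSCAN. Returns one label per point: 0, 1, ... for clusters,
    -1 for noise. Neighbourhood queries are brute force (O(n^2))."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def region_query(i):
        # indices of all points within eps of point i (including i itself)
        return [j for j, q in enumerate(points)
                if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps * eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbours = region_query(i)
        if len(neighbours) < min_pts:
            labels[i] = NOISE                 # may later be claimed as a border point
            continue
        labels[i] = cluster                   # i is a core point: start a cluster
        seeds = list(neighbours)              # growable work list
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster           # border point: joins, does not expand
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbours = region_query(j)
            if len(j_neighbours) >= min_pts:  # another core point: keep expanding
                seeds.extend(j_neighbours)
        cluster += 1
    return labels


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(pts, eps=1.5, min_pts=3))   # [0, 0, 0, 1, 1, 1, -1]
```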

When using these vectors, .VecForEach/.VecNext run into problems if you use more than one kind of vector.

Anyway, it is a good challenge, and I still have to plug some memory leaks  :biggrin:

Any suggestion or improvement is welcome!

Regards, HSE

Source code and 64-bit binary updated 9 March 2024 on GitHub (https://github.com/ASMHSE/Clusters-in-Assembly/tree/main)
Title: Re: Comparing K-Means and Others Algorithms for Data Clustering in Assembly.
Post by: Biterider on January 30, 2024, 08:33:36 AM
Hi HSE
Very interesting and cool project, I'll read the CP article first to get a deeper understanding.  :thumbsup:
 
Quote: "When using these vectors, .VecForEach/.VecNext run into problems if you use more than one kind of vector."
After reading the code, I think I understand the problem. The vectors have different element sizes, and the macros don't take this into account, with disastrous results. The good news is that we can do better.  :cool:
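The failure mode described here can be reproduced outside ObjAsm: a generic loop over a raw buffer has to advance by the element size of the particular vector it walks. The Python sketch below is only an analogy, with `struct` formats standing in for element types such as REAL8 and DWORD; `vec_for_each` is a hypothetical helper, not the ObjAsm macro.

```python
import struct

def vec_for_each(buffer, count, item_format):
    """Yield `count` items from a raw byte buffer. The stride is the size of
    one item of this vector (taken from its struct format), so vectors with
    different element types can share the same loop code."""
    stride = struct.calcsize(item_format)
    for index in range(count):
        (value,) = struct.unpack_from(item_format, buffer, index * stride)
        yield value


real8 = struct.pack("<3d", 1.5, 2.5, 3.5)   # like a vector of REAL8 (8 bytes each)
dwords = struct.pack("<3I", 10, 20, 30)     # like a vector of DWORD (4 bytes each)

print(list(vec_for_each(real8, 3, "<d")))   # [1.5, 2.5, 3.5]
print(list(vec_for_each(dwords, 3, "<I")))  # [10, 20, 30]
# Reusing the DWORD element type on the REAL8 vector reads the wrong bytes:
print(list(vec_for_each(real8, 3, "<I")))   # [0, 1073217536, 0]
```

The last line is the kind of "disastrous result" a loop produces when it assumes another vector's element size; the cure is for the loop to know the element type of the vector it is iterating.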

Biterider
Title: Re: Comparing K-Means and Others Algorithms for Data Clustering in Assembly.
Post by: jj2007 on January 30, 2024, 10:37:43 AM
Quote from: HSE on January 30, 2024, 07:36:03 AM: "Any suggestion or improvement is welcome!"

Use the F1 key :biggrin:
Title: Re: Comparing K-Means and Others Algorithms for Data Clustering in Assembly.
Post by: HSE on January 30, 2024, 11:10:52 AM
Hi Biterider!

Quote from: Biterider on January 30, 2024, 08:33:36 AM: "Very interesting and cool project, I'll read the CP article first to get a deeper understanding.  :thumbsup:"

Thanks  :thumbsup:

Quote from: Biterider on January 30, 2024, 08:33:36 AM: "The good news is that we can do better.  :cool:"

Yes. My first idea just failed  :biggrin:  :biggrin:, and it isn't necessary right now.

In comparison, .ColForEach/.ColNext were critical, and they work "impeccably".

HSE
Title: Re: Comparing K-Means and Others Algorithms for Data Clustering in Assembly.
Post by: HSE on January 30, 2024, 11:16:14 AM
Quote from: jj2007 on January 30, 2024, 10:37:43 AM: "Use the F1 key :biggrin:"

Good idea  :thumbsup:

It's not in my rudimentary template (nor is the help  :biggrin: ).

Thanks
Title: Re: Comparing K-Means and Others Algorithms for Data Clustering in Assembly.
Post by: Biterider on January 31, 2024, 05:29:14 AM
Hi HSE
I'm trying to compile the "Clusters" application, but I get an error that I can't fix:

forced error: [back end: x86-32/64,FPU][#CG6] function is undefined or not supported by current back end: ceil
I downloaded the latest version of SmplMath from GitHub, but that didn't solve the problem. Also, there seems to be a missing file called IfElseM.inc, which I had to borrow from a previous installation.

Without being able to compile the above application, I tried to solve the TVector problem, assuming that the problem was due to different element sizes, and prepared a modification that might solve it.

Would you give it a try?

Regards, Biterider
Title: Re: Comparing K-Means and Others Algorithms for Data Clustering in Assembly.
Post by: HSE on January 31, 2024, 07:12:23 AM
Hi Biterider!

Quote from: Biterider on January 31, 2024, 05:29:14 AM: "Also, there seems to be a missing file called IfElseM.inc, which I had to borrow from a previous installation."

:biggrin:  :biggrin: Thanks

In math.inc you can replace "IfElseM.inc" with "FlowControl.inc", but I think nothing from it is used.


Quote from: Biterider on January 31, 2024, 05:29:14 AM: "forced error: [back end: x86-32/64,FPU][#CG6] function is undefined or not supported by current back end: ceil"

In math_functions.inc you can add:
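fslv_fnc_ceil macro
    IFE fslv_volatile_gprs AND FSVGPR_EAX           ; preserve EAX/RAX if it is not flagged as volatile
        T_EXPR(<push eax>,<mov [rsp+8],rax>)
    ENDIF
    fstcw WORD ptr T_EXPR([esp-2],[rsp])            ; save the current FPU control word
    movzx eax,WORD ptr T_EXPR([esp-2],[rsp])
    and eax,0F3FFh                                  ; clear the rounding-control field (bits 10-11)
    or eax,0800h                                    ; RC = 10b: round toward +infinity
    mov WORD ptr T_EXPR([esp-4],[rsp+2]),ax
    fldcw WORD ptr T_EXPR([esp-4],[rsp+2])          ; activate round-up mode
    frndint                                         ; ST(0) = ceil(ST(0))
    fldcw WORD ptr T_EXPR([esp-2],[rsp])            ; restore the original control word
    IFE fslv_volatile_gprs AND FSVGPR_EAX
        T_EXPR(<pop eax>,<mov rax,[rsp+8]>)
    ENDIF
endm
default_fnc_dscptr2 <ceil>,nArgs=1,fpu=-1,x64=-1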
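The ceil macro works by saving the FPU control word, setting the rounding-control field (bits 10 and 11) to 10b (round toward +infinity), executing frndint and restoring the original word. The Python sketch below mirrors both parts under stated assumptions: `set_rc_round_up` shows the bit arithmetic on a control word (037Fh is the value FINIT loads), and `ceil_via_rounding_mode` uses the `decimal` module's rounding mode as a stand-in for the FPU's. Both function names are hypothetical. Masking the field before setting it guarantees round-up even if another mode was active.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_HALF_EVEN, getcontext, localcontext

RC_MASK = 0x0C00      # rounding-control field: bits 10-11 of the x87 control word
RC_UP = 0x0800        # RC = 10b: round toward +infinity

def set_rc_round_up(control_word):
    """Return the control word with RC forced to 'round up'.
    Clearing the field first matters: OR-ing 0800h alone would turn
    RC = 01b (round down) into 11b (truncate)."""
    return (control_word & ~RC_MASK & 0xFFFF) | RC_UP

def ceil_via_rounding_mode(x):
    """Ceiling by temporarily switching the rounding mode, rounding to an
    integer and restoring the previous mode (like fldcw / frndint / fldcw)."""
    with localcontext() as ctx:                   # current context saved here
        ctx.rounding = ROUND_CEILING              # round toward +infinity
        result = Decimal(x).to_integral_value()   # rounds using ctx.rounding
    return result                                 # previous context restored


print(hex(set_rc_round_up(0x037F)))   # 0xb7f
print(hex(set_rc_round_up(0x077F)))   # 0xb7f  (the previous mode was round down)
print(ceil_via_rounding_mode(2.3), ceil_via_rounding_mode(-2.3))   # 3 -2
print(getcontext().rounding == ROUND_HALF_EVEN)                    # True
```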

Currently I use the complete SmplMath package, with DoubleDouble precision and complex numbers, but the package is bigger and preprocessing takes longer. Perhaps I should post a Full version that nobody will use  :biggrin:  :biggrin:


Previously, .VecForEach didn't know about the first kind of vector; now the same happens with the last one.

In .VecNext it apparently must be:
      inc dword ptr @CatStr(<??VecForEach_Index_>, %??VecForEach_ID)
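The line above builds the index variable's name from the loop ID, so each loop increments its own counter. A toy Python emulation of that naming scheme follows. The stack that pairs nested loops, the class and method names, and the emitted lines (apart from the `??VecForEach_Index_` prefix taken from the post) are assumptions, and the compare/jump code real macros would emit is omitted.

```python
class LoopMacros:
    """Toy emulation of .VecForEach/.VecNext expansion: every loop gets a
    unique ID, a stack pairs each .VecNext with its own .VecForEach, and the
    index variable name is built from that ID."""

    def __init__(self):
        self.next_id = 0    # assembly-time counter for new loop IDs
        self.open_ids = []  # IDs of loops that are still open (innermost last)
        self.lines = []     # emitted pseudo-assembly

    def vec_for_each(self):
        loop_id = self.next_id
        self.next_id += 1
        self.open_ids.append(loop_id)
        self.lines.append(f"mov dword ptr ??VecForEach_Index_{loop_id}, 0")

    def vec_next(self):
        loop_id = self.open_ids.pop()   # the innermost open loop
        self.lines.append(f"inc dword ptr ??VecForEach_Index_{loop_id}")


m = LoopMacros()
m.vec_for_each()   # outer loop gets ID 0
m.vec_for_each()   # inner loop gets ID 1
m.vec_next()       # closes the inner loop: increments index 1
m.vec_next()       # closes the outer loop: increments index 0
for line in m.lines:
    print(line)
```

With a single shared index name instead, the inner .VecNext would advance the outer loop's counter as well.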
Regards, HSE
Title: Re: Comparing K-Means and Others Algorithms for Data Clustering in Assembly.
Post by: Biterider on January 31, 2024, 08:04:51 AM
Thanks HSE
Now I can compile the application.  :thumbsup:
Could you show me the places where you wanted to use the TVector macros?

Biterider
Title: Re: Comparing K-Means and Others Algorithms for Data Clustering in Assembly.
Post by: Biterider on January 31, 2024, 08:06:47 AM
Quote from: HSE on January 31, 2024, 07:12:23 AM: "Perhaps I should post a Full version that nobody will use  :biggrin:  :biggrin:"
At least you can count on one  :thumbsup:

Biterider
Title: Re: Comparing K-Means and Others Algorithms for Data Clustering in Assembly.
Post by: HSE on January 31, 2024, 08:13:35 AM
Quote from: Biterider on January 31, 2024, 08:04:51 AM: "Could you show me the places where you wanted to use the TVector macros?"

It was for testing purposes, because for debugging I used the classic Randy loop.

StrategyForClusters.inc, lines 597 and 629. For this sorted vector, the type is TVectorNameS.
Title: Re: Comparing K-Means and Others Algorithms for Data Clustering in Assembly.
Post by: HSE on January 31, 2024, 08:40:44 AM
Quote from: jj2007 on January 30, 2024, 10:37:43 AM: "Use the F1 key :biggrin:"

Updated in the first post with F1 and Alt+F1 keys.

The help is a PDF generated from Word, and it's very big. I will have to build it from TeX  :biggrin:
Title: Re: Comparing K-Means and Others Algorithms for Data Clustering in Assembly.
Post by: Biterider on January 31, 2024, 08:49:22 AM
Quote from: HSE on January 31, 2024, 08:13:35 AM: "StrategyForClusters.inc, lines 597 and 629."
Looking at these lines, I think I have it now.
The best way to provide the missing information is to use something like this (for vectors only):

.VecForEach [ebx]::Real8VectorS
  ...
.VecNext

If there are no objections, I will have a go at it.  :biggrin:

Biterider

Title: Re: Comparing K-Means and Others Algorithms for Data Clustering in Assembly.
Post by: HSE on January 31, 2024, 08:55:17 AM
Quote from: Biterider on January 31, 2024, 08:49:22 AM: "If there are no objections,"

Mmm  :rolleyes: ... no

 :thumbsup:
Title: Re: Comparing K-Means and Others Algorithms for Data Clustering in Assembly.
Post by: jj2007 on January 31, 2024, 09:53:34 AM
Quote from: HSE on January 31, 2024, 08:40:44 AM: "The help is a PDF generated from Word, and it's very big"

You are right, it's far too big. Add the two attached files to your folder, and when the user presses VK_F1, do an invoke WinExec, chr$("clusters.rtf"), SW_RESTORE :cool:
Title: Re: Comparing K-Means and Others Algorithms for Data Clustering in Assembly.
Post by: NoCforMe on January 31, 2024, 11:55:58 AM
Side comment:

PDF from Word? Gag, puke, retch, barf. Have you ever looked at one of those abominations? Boy, Micro$oft really screwed up with that conversion! Yikes!

We now return you to your regularly scheduled thread.
Title: Re: Comparing K-Means and Others Algorithms for Data Clustering in Assembly.
Post by: jj2007 on January 31, 2024, 12:48:04 PM
Quote from: NoCforMe on January 31, 2024, 11:55:58 AM: "PDF from Word? Gag, puke, retch, barf"

:joking:  :rofl:  :greenclp:

Pdf is crap, indeed. I managed to convert pdf to rtf - just run the attached exe to see HSE's help file in all its beauty (btw the exe was definitely not built with MasmBasic, hehe - look at the size) :cool:

From 311,187 to 4,608 bytes - the power of Assembly!
Title: Re: Comparing K-Means and Others Algorithms for Data Clustering in Assembly.
Post by: NoCforMe on January 31, 2024, 01:22:35 PM
Quote from: jj2007 on January 31, 2024, 12:48:04 PM: "Pdf is crap, indeed."
Just to be clear, PDFs from Word docs are crap. Other PDFs are fine.

I used to own a print shop (ca. 2005). I was so happy when customers would bring in their jobs to print as PDFs; I could just place it in Adobe InDesign, ship it to my platemaker and off I'd go, perfect every time.
Title: Re: Comparing K-Means and Others Algorithms for Data Clustering in Assembly.
Post by: HSE on January 31, 2024, 10:33:59 PM
Thanks JJ!

Quote from: jj2007 on January 31, 2024, 12:48:04 PM: "From 311,187 to 4,608 bytes - the power of Assembly!"

Not bad, except that I don't like RTF files very much  :biggrin:

But if the PDF file is still big when built with TeX, I could store the RTF internally. I did that with a help file in TXT format in MathArtH.
Title: Re: Comparing K-Means and Others Algorithms for Data Clustering in Assembly.
Post by: HSE on February 01, 2024, 01:14:17 AM
With LaTeX, the resulting PDF file is 54 KB, a lot more reasonable, and it looks good  :thumbsup:

Updated in the first post.
Title: Re: Comparing K-Means and Others Algorithms for Data Clustering in Assembly.
Post by: Biterider on February 01, 2024, 07:10:14 AM
Hi HSE
I have created a test version with your source files. I have changed the iteration macros and added a variable to the object definition that is filled at compile time.
When I run the binary with these changes, it seems to give reasonable results on DC.
Perhaps you can interpret this better than I can.  :rolleyes:

Biterider
Title: Re: Comparing K-Means and Others Algorithms for Data Clustering in Assembly.
Post by: HSE on February 01, 2024, 09:01:37 AM
Hi Biterider!

Quote from: Biterider on February 01, 2024, 07:10:14 AM: "Perhaps you can interpret this better than I can.  :rolleyes:"

Looks perfect  :thumbsup:

Thanks, HSE