
## DBScan, Homogeneity and Completeness algorithm

Started by guga, May 11, 2024, 11:39:49 PM

#### guga

Hi guys

Has anyone succeeded in creating a DBSCAN algorithm for a given data set?

I'm trying to create a DBSCAN clustering function to analyse a chunk of data (in dwords, bytes, or whatever input is needed).

The dbscan is defined as in:
https://medium.com/@balajicena1995/dbscan-clustering-2a577d384e61
https://github.com/gyaikhom/dbscan/blob/master/dbscan.c
https://dev.to/rajaniraiyn/dbscan-clustering-algorithm-demystified-1d5o
https://www.codeproject.com/Articles/5129186/Step-by-Step-Guide-to-Implement-Machine-Learning-8
https://en.wikipedia.org/wiki/DBSCAN

The goal is to determine whether a certain chunk of data is homogeneous, I mean, whether the data inside a given chunk contains similar values that can be used to identify or distinguish it from others (for use in image processing, watermarking, object detection, etc.).

Homogeneity and Completeness are defined in:
https://scikit-learn.org/stable/auto_examples/cluster/plot_dbscan.html
and
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.homogeneity_score.html#sklearn.metrics.homogeneity_score

But I didn't understand how this can be implemented. The C version on GitHub is way too complex for me to start from for a simple algorithm.

ChatGPT gave me some ideas on how to do it in MASM, but I got nowhere.

In the example I provided, I tried to analyse the following sequence:
{0, 1, 1, 1, 1, 1, 1, 1, 25, 15, 96, 19, 1, 1, 2, 25}

Here 1 is the most common value, so the data should yield some measure of homogeneity and completeness.

When I started, ChatGPT gave me this scheme:

Quote:
To determine if pixel values in a sequence of images represent the same pixel or object, especially when the majority of pixels are around a certain value and a few are outliers, you can use clustering algorithms or similarity measures. Here's an approach you can take:

Preprocess the Data: Normalize the pixel values to a common scale if necessary. This ensures that all values are comparable.

Cluster Pixel Values: Use a clustering algorithm such as K-means or DBSCAN to group similar pixel values together. This can help identify clusters of pixels that represent the same object or feature in the images. Set the number of clusters based on the expected number of objects or features in the images.

Identify Outliers: Pixels that do not belong to any cluster or belong to very small clusters can be considered outliers. These outliers may represent noise or background pixels that are different from the main object or feature.

Evaluate Results: Inspect the clusters and outliers to determine if they correspond to different objects or features in the images. Visualizing the clusters can help you understand the grouping of pixel values.

Adjust Parameters: If the results are not satisfactory, adjust the parameters of the clustering algorithm (e.g., number of clusters, distance metric) and re-run the analysis.

Post-processing: Depending on the application, you may need to further process the clustered pixel values to refine the identification of objects or features.

By using clustering algorithms and analyzing the distribution of pixel values in the images, you can determine if the values represent the same pixel or object, and identify outliers that may represent noise or background pixels.

Quote:
To demonstrate how clustering algorithms can be used to analyze the distribution of pixel values in a sequence of images, let's consider a simplified example with a set of 20 images. We'll use the K-means clustering algorithm to group similar pixel values together and identify outliers that may represent noise or background pixels.

Assuming each image has the same size and we want to analyze a specific pixel (e.g., pixel at coordinates (x, y)), here's how you can approach it:

Load the 20 images into memory.

Extract Pixel Values:
For each image, extract the pixel value at the specified coordinates (x, y). You will have 20 pixel values for this specific pixel location.

Normalize Pixel Values:
Normalize the pixel values to a common scale if necessary.

Apply K-means Clustering:
Use the K-means clustering algorithm to cluster the pixel values. Set the number of clusters based on the expected number of objects or features in the images.

Identify Outliers:
Pixels that do not belong to any cluster or belong to very small clusters can be considered outliers. These outliers may represent noise or background pixels.

Here's a Python example using the scikit-learn library to perform K-means clustering on a set of 20 images:
```python
import numpy as np
from sklearn.cluster import KMeans

# Assume pixel_values is a 1D array containing pixel values at location (x, y) for 20 images
# This is a simplified example, and in practice, you would extract pixel values from actual images
pixel_values = np.array([120, 130, 140, 150, 160, 200, 210, 220, 230, 240,
                         150, 140, 135, 130, 125, 195, 205, 210, 215, 220])

# Reshape the array to have a single feature (required by K-means)
pixel_values = pixel_values.reshape(-1, 1)

# Apply K-means clustering with 2 clusters
kmeans = KMeans(n_clusters=2, random_state=0).fit(pixel_values)

# Get the cluster labels
cluster_labels = kmeans.labels_

# Get the cluster centers
cluster_centers = kmeans.cluster_centers_

# Identify outliers
outliers = pixel_values[np.where(cluster_labels != np.argmax(np.bincount(cluster_labels)))]

print("Cluster Centers:", cluster_centers.flatten())
print("Outliers:", outliers.flatten())
```
I then asked it to create a similar function in MASM32, using DBSCAN instead of K-means.

This is the resulting function ChatGPT gave me:
```asm
.DATA
    pixel_values DWORD 120, 130, 140, 150, 160, 200, 210, 220, 230, 240, \
                       150, 140, 135, 130, 125, 195, 205, 210, 215, 220
    num_pixels DWORD 20
    epsilon DWORD 15   ; Adjust epsilon as needed
    minPts DWORD 3      ; Adjust minPts as needed
    cluster_labels DWORD num_pixels DUP(0)
    cluster_count DWORD 0

.CODE
    main PROC
        ; Load addresses of data
        mov esi, OFFSET pixel_values
        mov edi, OFFSET cluster_labels

        ; DBSCAN algorithm
        mov ecx, num_pixels
    outer_loop:
        cmp DWORD PTR [edi], 0
        jne skip_iteration   ; Skip if point is already assigned to a cluster
        mov eax, ecx
        dec eax
        mov ebx, eax
        mov edx, epsilon
        shl edx, 2           ; Multiply epsilon by 4 (size of DWORD)
        mov esi, OFFSET pixel_values
        lea esi, [esi + eax * 4]
    inner_loop:
        cmp DWORD PTR [edi + eax * 4], 0
        jne next_iteration   ; Skip if point is already assigned to a cluster
        mov esi, OFFSET pixel_values
        lea esi, [esi + ebx * 4]
        sub esi, OFFSET pixel_values
        mov edx, epsilon
        shl edx, 2           ; Multiply epsilon by 4 (size of DWORD)
        mov eax, ebx
        mov ecx, eax
        call distance       ; Calculate distance between points
        cmp eax, edx        ; Compare distance with epsilon
        jge next_iteration  ; Skip if distance is greater than epsilon
        inc ecx             ; Increment count of neighbors
        cmp ecx, minPts
        jl next_iteration   ; Skip if minimum number of points is not reached
        mov eax, ebx
        inc cluster_count   ; Assign new cluster ID
    assign_cluster:
        mov DWORD PTR [edi + eax * 4], cluster_count
        jmp next_iteration
    skip_iteration:
        dec ecx
    next_iteration:
        dec ebx
        jns inner_loop
        inc ecx
        cmp ecx, num_pixels
        jl outer_loop

        ; Output cluster labels
        mov esi, OFFSET cluster_labels
        mov ecx, num_pixels
    print_clusters:
        mov eax, [esi]
        call WriteDec
        call Crlf
        add esi, 4
        loop print_clusters

        ; Exit program
        call ExitProcess
    main ENDP

    distance PROC
        ; Calculate Euclidean distance between two points
        ; Input: ESI - offset of first point, EDI - offset of second point
        ; Output: EAX - distance
        mov eax, DWORD PTR [esi]
        sub eax, DWORD PTR [edi]
        imul eax, eax
        mov edx, DWORD PTR [esi + 4]
        sub edx, DWORD PTR [edi + 4]
        imul edx, edx
        add eax, edx
        mov edx, DWORD PTR [esi + 8]
        sub edx, DWORD PTR [edi + 8]
        imul edx, edx
        add eax, edx
        mov edx, DWORD PTR [esi + 12]
        sub edx, DWORD PTR [edi + 12]
        imul edx, edx
        add eax, edx
        call Sqrt
        ret
    distance ENDP
END main
```
But this only calculates the cluster_labels. I then asked it to create the other functions, such as cluster_centers and outliers, which led me to:
Quote:
Yes, the function I provided is just to calculate the cluster_labels based on the DBSCAN algorithm. To find the cluster_centers and outliers, you can modify the code to calculate them after assigning the cluster labels. Here's how you can do it:

Cluster Centers: After assigning cluster labels, you can calculate the cluster centers by averaging the pixel values of all points in each cluster.

Outliers: Outliers are points that do not belong to any cluster or belong to very small clusters (noise points). You can identify outliers by checking for points with a cluster label of 0 or labels that are not assigned to a core point (i.e., points with fewer than minPts neighbors).
```asm
.DATA
    ...
    cluster_centers DWORD 0, 0  ; Store the cluster centers (assuming 2 clusters)
    outliers DWORD num_pixels DUP(0)  ; Store the indices of outliers
    num_outliers DWORD 0  ; Counter for the number of outliers

.CODE
    ...
    update_cluster_centers PROC
        ; Calculate cluster centers
        mov esi, OFFSET pixel_values
        mov edi, OFFSET cluster_labels
        xor ecx, ecx
        xor edx, edx
    calculate_centers_loop:
        cmp DWORD PTR [edi + ecx * 4], cluster_count
        jne not_in_cluster
        add esi, DWORD PTR [esi + ecx * 4]  ; Add pixel value to sum
        inc edx  ; Increment count of points in cluster
    not_in_cluster:
        inc ecx
        cmp ecx, num_pixels
        jl calculate_centers_loop
        div edx  ; Calculate average pixel value (cluster center)
        mov DWORD PTR [cluster_centers + cluster_count * 4], eax
        inc cluster_count
        ret
    update_cluster_centers ENDP

    identify_outliers PROC
        ; Identify outliers
        mov esi, OFFSET cluster_labels
        xor ecx, ecx
    find_outliers_loop:
        cmp DWORD PTR [esi], 0  ; Check if point is not assigned to any cluster
        jne not_outlier
        mov DWORD PTR [outliers + num_outliers * 4], ecx  ; Add index to outliers
        inc num_outliers
    not_outlier:
        inc ecx
        add esi, 4
        cmp ecx, num_pixels
        jl find_outliers_loop
        ret
    identify_outliers ENDP

    main PROC
        ...
        call update_cluster_centers
        call identify_outliers
        ...
    main ENDP
```
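For comparison, the two steps described in the quote (average the values of each cluster, then collect the points whose label is 0) can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `cluster_centers_and_outliers` and the sample labels are invented here, and label 0 is taken to mean "not assigned to any cluster", as in the quoted text.

```python
from collections import defaultdict

def cluster_centers_and_outliers(values, labels, noise_label=0):
    """Average the values of each cluster and collect the indices marked as noise."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    outliers = []
    for index, (value, label) in enumerate(zip(values, labels)):
        if label == noise_label:
            outliers.append(index)      # unassigned point: record its index
            continue
        sums[label] += value            # running sum per cluster
        counts[label] += 1              # running count per cluster
    centers = {label: sums[label] / counts[label] for label in sums}
    return centers, outliers

# Hypothetical labels, invented for illustration (not output of the MASM code).
values = [120, 130, 140, 200, 210, 55]
labels = [1, 1, 1, 2, 2, 0]
centers, outliers = cluster_centers_and_outliers(values, labels)
print(centers)    # {1: 130.0, 2: 205.0}
print(outliers)   # [5]
```

Keeping one sum and one count per cluster label is the part that the single-accumulator loop in the assembly listing would need if it were ported.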

But this is incorrect and crashed, since it is not pointing to a valid address in the distance function. I then asked it to explain in detail how this thing works... but got more confused than ever. This is what ChatGPT wrote:

Quote:
Let's walk through the function step by step with the given pixel_values and assuming epsilon is 10 and minPts is 2 for this example:

Initialization:
pixel_values = {120, 130, 140, 150, 160}
cluster_labels = {0, 0, 0, 0, 0}
num_pixels = 5
epsilon = 10
minPts = 2

Outer Loop (for pixel 5, value 160):

Compare cluster label of pixel 5 (cluster_labels[4]) with 0. It is 0.

Set eax to 4 (index of pixel 5).

Set ebx to 4.

Calculate edx (distance threshold): 10 << 2 = 40.

Calculate address of pixel 5 (esi): pixel_values + 4 * 4 = pixel_values + 16.

Inner Loop (for pixel 4, value 150):

Compare cluster label of pixel 5 with 0. It is 0.

Calculate address of pixel 4 (esi): pixel_values + 3 * 4 = pixel_values + 12.

Calculate distance between pixel 5 and pixel 4. Assume distance is 10.

Compare distance (10) with epsilon (40). Distance is less than epsilon.

Increment neighbor count (ecx) to 1.

Continue inner loop for pixel 3.

Continue outer loop for pixel 4.

Continue the same process for pixels 3, 2, and 1.

Output Cluster Labels:
The cluster_labels array after processing may look like: {1, 1, 1, 1, 1} indicating all pixels belong to the same cluster.

Please note that the actual calculation of the distance and clustering logic is not shown here but would be necessary to implement the complete DBSCAN algorithm.
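The walkthrough above can be checked with a small Python sketch that does only the neighbour-counting step for the same five values. Assumptions: the distance between two 1D values is their absolute difference, a point counts itself as a neighbour, and the distance is compared with epsilon itself (10), not with the value 40 produced by the shift, which looks like byte-offset scaling rather than a distance threshold. The name `neighbour_counts` is made up for this example.

```python
def neighbour_counts(values, eps):
    """For each value, count how many values (itself included) lie within eps."""
    return [sum(1 for other in values if abs(value - other) <= eps)
            for value in values]

pixel_values = [120, 130, 140, 150, 160]
eps = 10
min_pts = 2

counts = neighbour_counts(pixel_values, eps)
is_core = [count >= min_pts for count in counts]

print(counts)    # [2, 3, 3, 3, 2]
print(is_core)   # [True, True, True, True, True]
```

With these settings every value has at least minPts neighbours, so all five are core points and chain into one cluster, which agrees with the {1, 1, 1, 1, 1} labelling in the quote.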

But I'm failing badly to understand this whole thing.

Can someone help me create such a thing? Or fix the MASM functions ChatGPT wrote?
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

#### zedd151

The few times I had used chatgpt to write code, it used the irvine32 libraries often. I gave up on that soon after.
Seeing in your code snippet from chatgpt, I see 'call Crlf' -- is that from irvine32 library? There is no header there so I don't know for sure.

#### guga

Don't know whether ChatGPT uses Irvine's code or not. ChatGPT has several flaws when it comes to writing something in MASM (or NASM, FASM, RosAsm), but it gives some idea of how to start or, in some cases, does 50% of the work. The problem is that in this example I'm not able to follow the logic it is showing, because it seems a bit different from the ones explained on Wikipedia and in the other links.

It seems to be a simple loop over each value that needs to be compared to the others, but I'm failing badly to understand the logic and the math behind this.

And porting the algo at https://github.com/gyaikhom/dbscan/blob/master/dbscan.c to MASM seems a bit complex to me for a task that looks simple at first.

#### zedd151

Have you tried godbolt.org, to help with the conversion/translation to assembly? (Assuming you have a usable source in another language)
Once converted, could more easily be optimized or change parts of it, to better suit your needs.

#### NoCforMe

Would it be possible for you to provide a link that explains what DBScan is in plain English to someone who's unfamiliar with all that jargon? I really can't make head nor tail of it using those links you provided.

Assembly language programming should be fun. That's why I do it.

#### NoCforMe

I suppose the Wikipedia entry (hate Wikipedia!) is as good as anything I've found so far:
Quote:
It is a density-based clustering non-parametric algorithm: given a set of points in some space, it groups together points that are closely packed (points with many nearby neighbors), and marks as outliers points that lie alone in low-density regions (those whose nearest neighbors are too far away).

* I had to edit that entry, as it was badly written, as is so much stuff there.

Here's another article, on Medium by a Portuguese writer who writes better English than most native speakers, in pretty plain language.

Enough with the jargon! Tell me what it is, how it works and what it's used for.

#### guga

Quote from: sudoku on May 12, 2024, 03:01:02 AM
Have you tried godbolt.org, to help with the conversion/translation to assembly? (Assuming you have a usable source in another language)
Once converted, could more easily be optimized or change parts of it, to better suit your needs.

Yeah, I'm doing it. But the C version uses 3D points and I want to use only 1D points: linear values from a sequence of images. So, taking the pixel at position x = 10, y = 10 in all images, each subsequent image contains a specific pixel value (RGB) at that position. I need to compute the DBSCAN of each channel separately, i.e. DBSCAN of one pixel at a specific position across all subsequent images.
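A minimal sketch of DBSCAN on 1D values, written in Python rather than assembly so the logic is easy to follow before porting. It is not the gyaikhom C code and not ChatGPT's output; the names `dbscan_1d` and `region_query`, the label conventions (0 = unvisited, -1 = noise) and the parameters eps = 2, min_pts = 3 are assumptions chosen for illustration. It would be run once per colour channel.

```python
NOISE = -1
UNVISITED = 0

def region_query(values, index, eps):
    """Indices of all values within eps of values[index] (the point itself included)."""
    return [j for j, other in enumerate(values) if abs(values[index] - other) <= eps]

def dbscan_1d(values, eps, min_pts):
    """Label each 1D value with a cluster id (1, 2, ...) or NOISE."""
    labels = [UNVISITED] * len(values)
    cluster_id = 0
    for i in range(len(values)):
        if labels[i] != UNVISITED:
            continue
        neighbours = region_query(values, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE               # may become a border point later
            continue
        cluster_id += 1                     # i is a core point: start a new cluster
        labels[i] = cluster_id
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id      # noise reachable from a core point: border
            if labels[j] != UNVISITED:
                continue                    # already handled
            labels[j] = cluster_id
            j_neighbours = region_query(values, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is core too: keep expanding
    return labels

# The sequence from the first post, treated as one channel.
values = [0, 1, 1, 1, 1, 1, 1, 1, 25, 15, 96, 19, 1, 1, 2, 25]
labels = dbscan_1d(values, eps=2, min_pts=3)
print(labels)
# [1, 1, 1, 1, 1, 1, 1, 1, -1, -1, -1, -1, 1, 1, 1, -1]
```

With these parameters the values 0, 1 and 2 form a single cluster and 25, 15, 96 and 19 come out as noise; the two 25s stay noise only because min_pts is 3, so a different choice of eps or min_pts would change the result.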

#### guga

Quote from: NoCforMe on May 12, 2024, 03:03:29 AM
Would it be possible for you to provide a link that explains what DBScan is in plain English to someone who's unfamiliar with all that jargon? I really can't make head nor tail of it using those links you provided.

https://en.wikipedia.org/wiki/DBSCAN

#### guga

Quote from: NoCforMe on May 12, 2024, 03:11:35 AM
I suppose the Wikipedia entry (hate Wikipedia!) is as good as anything I've found so far:
Quote:
It is a density-based clustering non-parametric algorithm: given a set of points in some space, it groups together points that are closely packed (points with many nearby neighbors), and marks as outliers points that lie alone in low-density regions (those whose nearest neighbors are too far away).

* I had to edit that entry, as it was badly written, as is so much stuff there.

Here's another article, on Medium by a Portuguese writer who writes better English than most native speakers, in pretty plain language.

Enough with the jargon! Tell me what it is, how it works and what it's used for.

I'm not a native English speaker. I'll read that article.

https://towardsdatascience.com/how-dbscan-works-and-why-should-i-use-it-443b4a191c80

If you speak Portuguese, I can talk to you in my native language, but it will be confusing and hard for others to understand.

Btw, I'm using it to try to find a better way to identify a watermark in a set of images. I was trying Median or MAD algorithms, but it seems DBSCAN may work better.

#### HSE

Hi Guga!

Quote from: guga on May 11, 2024, 11:39:49 PM
Someone succeeded to create a DBScan algorithm on a given data set

An implementation is present in Comparing K-Means and Others Algorithms for Data Clustering in Assembly.

Regards, HSE.
Equations in Assembly: SmplMath

#### guga

Tks HSE. Can you fix the link? It's not pointing to any address. I managed to convert the C file, but it would be nice to take a look at the link you provided.

```asm
;;
    Homogeneity metric of a cluster labeling given a ground truth.

    A clustering result satisfies homogeneity if all of its clusters contain only data points which are members of a single class.
    This metric is independent of the absolute values of the labels: a permutation of the class or cluster label values won't change the score value in any way.
    This metric is not symmetric: switching label_true with label_pred will return the completeness_score which will be different in general.

    Mathematically, homogeneity has the connotation of invariance, as all components of the equation have the same degree of value whether or not each
    of these components are scaled to different values, for example, by multiplication or addition. Cumulative distribution fits this description.
    "The state of having identical cumulative distribution function or values".

    The function uses a logarithm sum of good clusters found on a chain of data. This forces the result to be between 0 and 1 and enforces homogeneity.
    In general, the higher the value (closer to 1), the more homogeneous the data is.

    Parameters:
        points(in) - A pointer to data formed by an array of point_t structures
        MaxClasses(in) - The maximum amount of usable classes (Cluster_Id) available. This parameter takes into account the Ids bigger than 0.
        num_points(in) - The total amount of elements in the data chain.
        pHomogeneity(out) - Pointer to a variable that will hold the value of the homogeneity. The size of the variable must be a Real8 (8 bytes).

    Return Value:
        The function does not return any value. All values are returned in the pHomogeneity parameter.
        The range of the result is between 0 and 1

    Remarks:
        The mathematical equation for this is as follows:

        homogeneity = 1 + (log(cluster_id(0)/TotalCluster_id0) +
                           log(cluster_id(1)/TotalCluster_id1) +
                           log(cluster_id(2)/TotalCluster_id2) +
                           log(cluster_id(3)/TotalCluster_id3) +
                           ....
                           log(cluster_id(N)/TotalCluster_idN)
                          ) / NumPoints

        But we exclude Noise (negative Ids) and also 0 Ids. Excluding the 0 id is needed because it can also be interpreted as Noise, and also there's no log(0).
        A 0 id means the data is closer to a cluster, but it can still be considered as noise. A negative value (-2) is pure noise, and 0 can represent a 'Border' Noise.

        So, in fact, our equation results only in:

        homogeneity = 1 + (log(cluster_id(1)/TotalCluster_id1) +
                           log(cluster_id(2)/TotalCluster_id2) +
                           log(cluster_id(3)/TotalCluster_id3) +
                           ....
                           log(cluster_id(N)/TotalCluster_idN)
                          ) / NumPoints

    In other words, it takes a summation of the log of ClusterId divided by the total amount of elements related to that Id. After summing them, it divides by the
    total amount of elements of the data chunk (num_points) and finally adds 1 to this result. This will grant that the resultant value will always be between
    0 and 1.

    Example: say you have a data chunk formed by a sequence of data (53 elements = points of x, y, z) where you already calculated the Cluster_ids
    associated with each point of the data chain. Like this:

        x     y     z     cluster_id
        -----------------------------
        1.00  3.00  1.00: 0
        1.00  4.00  1.00: 0
        1.00  5.00  1.00: 0
        1.00  6.00  1.00: 0
        2.00  2.00  1.00: 2
        2.00  3.00  0.00: 1
        2.00  4.00  0.00: 1
        2.00  5.00  0.00: 1
        2.00  6.00  0.00: 1
        2.00  7.00  1.00: 3
        3.00  1.00  1.00: 2
        3.00  2.00  1.00: 2
        3.00  3.00  1.00: 2
        3.00  4.00  0.00: 1
        3.00  5.00  0.00: 1
        3.00  6.00  1.00: 3
        3.00  7.00  1.00: 3
        4.00  1.00  1.00: 2
        4.00  2.00  1.00: 2
        4.00  3.00  0.00: 1
        4.00  4.00  0.00: 1
        4.00  5.00  1.00: -2
        4.00  6.00  0.00: 1
        4.00  7.00  1.00: 3
        4.00  8.00  1.00: 3
        5.00  1.00  1.00: 2
        5.00  2.00  0.00: 1
        5.00  3.00  0.00: 1
        5.00  4.00  0.00: 1
        5.00  5.00  0.00: 1
        5.00  6.00  0.00: 1
        5.00  7.00  1.00: 3
        5.00  8.00  1.00: 3
        6.00  1.00  1.00: 2
        6.00  2.00  0.00: 1
        6.00  3.00  1.00: 3
        6.00  4.00  1.00: 3
        6.00  5.00  1.00: 3
        6.00  6.00  1.00: 3
        6.00  7.00  1.00: 3
        7.00  1.00  1.00: 2
        7.00  2.00  0.00: 1
        7.00  3.00  0.00: 1
        7.00  4.00  0.00: 1
        7.00  5.00  1.00: 3
        8.00  1.00  1.00: 2
        8.00  2.00  1.00: 2
        8.00  3.00  0.00: 1
        8.00  4.00  1.00: 3
        8.00  5.00  1.00: 3
        8.00  6.00  1.00: 3
        9.00  2.00  1.00: 2
        9.00  3.00  1.00: 2

        From the above data chain (Points) we have the following clusters (groups) from -2 to 3.
        We need to count how many elements we have on each Cluster_id:

            ClusterId = -2 = 1 elements (Pure Noise)
            ClusterId = 0 = 4 elements  (Border Noise)
            ClusterId = 1 = 19 elements
            ClusterId = 2 = 13 elements
            ClusterId = 3 = 16 elements

            Total elements = 53 (including Noise and border noise)

            Ids that are negative or 0, such as -2 and 0, represent Noise, therefore they are not included in the homogeneity equation.

            So, it will result in:
                homogeneity = 1 + (log(1/19)+log(2/13)+log(3/16))/53 = 0.8775430643464206899449251...
;;

[TmpLogValue: R$ 0]

Proc calculate_homogeneity_Log:
    Arguments @points, @MaxClasses, @num_points, @pHomogeneity
    Local @ClusterCount
    Uses ecx, esi, edi

    xorpd xmm0 xmm0

    ...If D@MaxClasses > 1 ; Skip class = 0 (We added earlier to 1)
        mov esi D@points
        mov edi 1 ; start with Cluster_Id = 1
        .Do
            mov D@ClusterCount 0
            mov ecx D@num_points
            Do
                If D$esi+cluster_idDis = edi
                    inc D@ClusterCount
                End_If
                add esi Size_Of_point_t
                dec ecx
            Loop_Until ecx = 0
            mov esi D@points
            cvtsi2sd xmm1 edi ; converts a signed integer to double
            cvtsi2sd xmm2 D@ClusterCount | divsd xmm1 xmm2
            movsd X$TmpLogValue xmm1 | movsd xmm2 xmm0
            call Sse2_log TmpLogValue, SSE_EXP_REAL8
            addsd xmm0 xmm2
            inc edi
        .Loop_Until edi >= D@MaxClasses
        cvtsi2sd xmm1 D@num_points | divsd xmm0 xmm1
    ...End_If

    mov eax 1 | cvtsi2sd xmm1 eax ; converts a signed integer to double
    addsd xmm0 xmm1
    mov eax D@pHomogeneity
    movsd X$eax xmm0

EndP
```
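The arithmetic in the comment can be reproduced with a short Python sketch. It follows the formula exactly as the post states it (natural log of cluster id divided by the cluster's element count, summed over ids greater than 0, divided by the number of points, plus 1). It is not scikit-learn's `homogeneity_score`, which is entropy-based and needs ground-truth labels as well as predicted ones. The function name `log_homogeneity` is made up for this example.

```python
import math
from collections import Counter

def log_homogeneity(cluster_ids):
    """1 + sum(ln(id / count_of_id) for ids > 0) / number_of_points, as in the post."""
    counts = Counter(cid for cid in cluster_ids if cid > 0)   # skip noise (-2) and 0
    total = sum(math.log(cid / count) for cid, count in counts.items())
    return 1.0 + total / len(cluster_ids)

# Cluster sizes taken from the example in the comment: 53 points in total.
cluster_ids = [-2] * 1 + [0] * 4 + [1] * 19 + [2] * 13 + [3] * 16
score = log_homogeneity(cluster_ids)
print(round(score, 6))   # 0.877543
```

Because the cluster id itself sits inside the logarithm, this value depends on how the clusters happen to be numbered, which is worth keeping in mind when comparing results between runs.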
Btw... does it work for DBSCAN too?

#### guga

Oh, got it... here, right?

https://masm32.com/board/index.php?topic=11649.0
https://github.com/ASMHSE/Clusters-in-Assembly/tree/main

It does contain DBSCAN. Very good work. I'll take a further look later and try to understand it.

#### HSE

Quote from: guga on May 15, 2024, 02:47:12 AM
Oh, got it..here, right ?

Exactly

#### six_L

A very interesting topic.

the parameters ("epsilon and MinPts") of DBSCAN can have a significant impact on the clustering results. Careful parameter tuning is often required for optimal performance.

Hi,guga
Do you use the DBSCAN to analyse some image?
Say you, Say me, Say the codes together for ever.

#### guga

Quote from: six_L on May 16, 2024, 07:14:27 PM
A very interesting topic.

the parameters ("epsilon and MinPts") of DBSCAN can have a significant impact on the clustering results. Careful parameter tuning is often required for optimal performance.

Hi,guga
Do you use the DBSCAN to analyse some image?

Hi six_L, yes, I'm trying to port the JavaScript example from that link on GitHub to assembly to handle images. The JavaScript example segments the image using the DBSCAN algorithm. I think I succeeded in porting it, but it is incredibly slow. Maybe using a table of precalculated Euclidean distances can help, but I haven't been able to precalculate it correctly. I'm failing to understand the math behind it.
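One possible way to avoid recomputing distances, sketched in Python under assumptions: each point is taken to be an 8-bit RGB pixel (this may not match the JavaScript example's data layout), the table holds squared differences for every possible channel difference 0..255, and the comparison is done on squared distances so no square root is needed. The names `SQUARED`, `squared_distance_rgb` and `within_eps` are hypothetical.

```python
# SQUARED[d] == d * d for every possible 8-bit channel difference (0..255).
SQUARED = [d * d for d in range(256)]

def squared_distance_rgb(p, q):
    """Squared Euclidean distance between two (r, g, b) pixels via table lookups."""
    return (SQUARED[abs(p[0] - q[0])]
            + SQUARED[abs(p[1] - q[1])]
            + SQUARED[abs(p[2] - q[2])])

def within_eps(p, q, eps):
    """True if the Euclidean distance is <= eps; compares squares to skip sqrt."""
    return squared_distance_rgb(p, q) <= eps * eps

a = (120, 130, 140)
b = (123, 134, 140)
print(squared_distance_rgb(a, b))   # 9 + 16 + 0 = 25
print(within_eps(a, b, 5.0))        # True: 25 <= 25
print(within_eps(a, b, 4.9))        # False: 25 > 24.01
```

In assembly the same idea would be a 256-entry dword table plus one precomputed eps squared, so the inner loop needs only subtractions, lookups, additions and a compare.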

Btw, I attached the files I succeeded in porting (the simpler one). The source code is embedded in the exe (for RosAsm), and I also included the asm file and the example.dat it needs for the calculations. I also attached the source code split by title to make it easier to follow. The DBSCAN code itself is in part01.asm. Top.asm and SSE_Macros.asm are only the macros I used for that file. The other asm files are simply additional functions for the console and fast logarithm functions.

The output is this (where -2 = pure noise and 0 means noise that is closer to the clusters, I suppose):
Quote:
Epsilon: 1.000000
Minimum points: 2
Homogeneity: 0.000000
Homogeneity(log): 0.877543
Number of points: 53
x    y    z    cluster_id
-----------------------------
1.00  3.00  1.00: 0
1.00  4.00  1.00: 0
1.00  5.00  1.00: 0
1.00  6.00  1.00: 0
2.00  2.00  1.00: 2
2.00  3.00  0.00: 1
2.00  4.00  0.00: 1
2.00  5.00  0.00: 1
2.00  6.00  0.00: 1
2.00  7.00  1.00: 3
3.00  1.00  1.00: 2
3.00  2.00  1.00: 2
3.00  3.00  1.00: 2
3.00  4.00  0.00: 1
3.00  5.00  0.00: 1
3.00  6.00  1.00: 3
3.00  7.00  1.00: 3
4.00  1.00  1.00: 2
4.00  2.00  1.00: 2
4.00  3.00  0.00: 1
4.00  4.00  0.00: 1
4.00  5.00  1.00: -2
4.00  6.00  0.00: 1
4.00  7.00  1.00: 3
4.00  8.00  1.00: 3
5.00  1.00  1.00: 2
5.00  2.00  0.00: 1
5.00  3.00  0.00: 1
5.00  4.00  0.00: 1
5.00  5.00  0.00: 1
5.00  6.00  0.00: 1
5.00  7.00  1.00: 3
5.00  8.00  1.00: 3
6.00  1.00  1.00: 2
6.00  2.00  0.00: 1
6.00  3.00  1.00: 3
6.00  4.00  1.00: 3
6.00  5.00  1.00: 3
6.00  6.00  1.00: 3
6.00  7.00  1.00: 3
7.00  1.00  1.00: 2
7.00  2.00  0.00: 1
7.00  3.00  0.00: 1
7.00  4.00  0.00: 1
7.00  5.00  1.00: 3
8.00  1.00  1.00: 2
8.00  2.00  1.00: 2
8.00  3.00  0.00: 1
8.00  4.00  1.00: 3
8.00  5.00  1.00: 3
8.00  6.00  1.00: 3
9.00  2.00  1.00: 2
9.00  3.00  1.00: 2

Press enter to exit...

About the epsilon variable, it seems that there is a way to pre-calculate it so that it maximizes the homogeneity result, but I'm still stuck on the precalculated Euclidean distance table.
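One common heuristic for picking epsilon is the sorted k-distance curve: for every point take the distance to its k-th nearest neighbour, sort those distances, and look for the "knee" where they jump. The sketch below only computes the curve for the 1D sequence from the first post; it does not choose epsilon automatically and it is not claimed to maximise the homogeneity value. `k_distances` is a hypothetical helper name and k = 2 is an arbitrary choice.

```python
def k_distances(values, k):
    """Distance from each value to its k-th nearest other value, sorted ascending."""
    result = []
    for i, value in enumerate(values):
        others = sorted(abs(value - other)
                        for j, other in enumerate(values) if j != i)
        result.append(others[k - 1])   # k-th nearest neighbour, the point itself excluded
    return sorted(result)

values = [0, 1, 1, 1, 1, 1, 1, 1, 25, 15, 96, 19, 1, 1, 2, 25]
curve = k_distances(values, k=2)
print(curve)
# [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 6, 6, 6, 10, 71]
```

Here the curve stays at 0 or 1 for the dense values and then jumps to 6 and beyond, which suggests trying an epsilon somewhere between 1 and 6 for this data.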