Quote from: NoCforMe on Today at 08:32:10 AM
Remember high-school algebra?
Given 2 points, (x1,y1) & (x2,y2):
m = (y2 - y1) / (x2 - x1)
m is the slope.
m = (y2 - y1) / (x2 - x1)
Quote from: NoCforMe on Today at 06:57:57 AM
Hint: at least two ways to do that:
2. Check the value of y each time through the loop to see if it's at the bottom of the screen.
Quote from: NoCforMe on Today at 07:29:05 AM
Stupid dog. Patient turtle.
dec cx
jmp DrawLine
that jump should be jnz maybe, rather than jmp???

cmp cx, 0
je EndLine
further up in the code:

.model small
Stack SEGMENT STACK
    DW 256 DUP(?)
Stack ENDS

.data
red   db 028h
x     DW ?
y     DW ?
slope DW ?

.code
_main proc
    mov ax, @data
    mov ds, ax

    mov ax, 0A000h      ; video memory
    mov es, ax
    xor di, di

    mov ah, 0           ; 320 x 200
    mov al, 013h
    int 10h

    mov x, 0            ; Starting Coordinates
    mov y, 0
    mov cx, 100         ; 100 pixel line
    mov slope, 3        ; Slope of the line

DrawLine:
    mov ax, y           ; Calculate offset in video memory
    mov bx, 320         ; Y * 320 + X
    mul bx
    add ax, x
    mov di, ax          ; Set DI to our offset value
    mov al, red
    mov es:[di], al     ; Write a red pixel to memory

    cmp cx, 0
    je EndLine

    mov ax, y           ; Add the slope to the line
    add ax, slope
    mov y, ax
    inc x
    dec cx
    jmp DrawLine

EndLine:
    mov ah, 0           ; wait for key press
    int 16h

    mov ax, 04c00h      ; exit
    int 21h
_main endp
end _main
Main:
....
call GetMinPoints DBSCAN_DIM_1D
mov D$MinPoints eax
call GetEpsilon D$points, D$Num_Points, Epsilon, DBSCAN_DIM_1D
call dbscan D$points, D$Num_Points, Epsilon, D$MinPoints, Homogeneity, HomogeneityLog
....
;;
Automatically find epsilon
http://www.sefidian.com/2022/12/18/how-to-determine-epsilon-and-minpts-parameters-of-dbscan-clustering/
https://mrinalyadav7.medium.com/dbscan-algorithm-c894701306d5
https://hyperskill.org/learn/step/31529
;;
[DBSCAN_DIM_1D 1]
[DBSCAN_DIM_2D 2]
[DBSCAN_DIM_3D 3]
Proc GetMinPoints:
Arguments @DimensionFlags
mov eax D@DimensionFlags
shl eax 1
EndP
Proc GetEpsilon:
Arguments @points, @num_points, @pOutput, @Flag
Local @k_distances, @kFactor
Uses ebx, esi, edi, ecx, edx
xor eax eax
On D@num_points = 0, ExitP
C_call 'msvcrt.calloc' D@num_points, 8
mov D@k_distances eax
mov eax D@Flag
.If_Or eax = DBSCAN_DIM_1D, eax = 0
mov eax 2 ; dimensions*2 = 1*2
.Else_If eax = DBSCAN_DIM_2D
mov eax 4 ; dimensions*2 = 2*2
.Else_If eax = DBSCAN_DIM_3D
mov eax 6 ; dimensions*2 = 3*2
.Else_If_And eax > 3, eax < 7
mov eax 7
.Else_If eax > 10
inc eax
.End_If
mov D@kFactor eax
; Calculate k-distances for each point
call calculate_k_distances D@points, D@num_points, D@kFactor, D@k_distances
call find_knee_point D@k_distances, D@num_points
mov eax D@pOutput | fstp R$eax
C_call 'msvcrt.free' D@k_distances
EndP
; Function to calculate k-distance for each point
Proc calculate_k_distances:
Arguments @points, @num_points, @kFactor, @k_distances
local @Distances, @PointsIPos
Uses ebx, esi, edi, ecx, edx
xor eax eax
On D@num_points = 0, ExitP
C_call 'msvcrt.calloc' D@num_points, 8
mov D@Distances eax
mov ebx eax
mov esi D@points
mov D@PointsIPos esi
xor edi edi
.Do
mov esi D@points
xor ecx ecx
Do
call euclidean_dist D@PointsIPos, esi
fstp R$ebx+ecx*8
add esi Size_Of_point_t
inc ecx
Loop_Until ecx >= D@num_points ; jl
C_call 'msvcrt.qsort' ebx, D@num_points, 8, compare_doubles
; k-th nearest neighbor distance
mov eax D@kFactor
mov edx D@k_distances
fld R$ebx+eax*8 | fstp R$edx+edi*8
add D@PointsIPos Size_Of_point_t
inc edi
.Loop_Until edi >= D@num_points
C_call 'msvcrt.free' D@Distances
EndP
Proc compare_doubles:
Arguments @Arg1, @Arg2
mov eax D@Arg1 | movsd XMM0 X$eax
mov eax D@Arg2 | movsd XMM1 X$eax
xor eax eax
SSE_D_If xmm1 > xmm0
mov eax 0-1
SSE_D_Else_If xmm1 < xmm0
mov eax 1
SSE_D_End_If
EndSTD
[KneeRate: R$ 0.1]
Proc find_knee_point:
Arguments @k_Distance, @Num_Points
Local @KneeIndex
Uses ebx, esi, ecx, edx
mov ebx D@k_Distance
mov esi D@Num_Points
C_call 'msvcrt.qsort' ebx, esi, 8, compare_doubles
fild D@Num_Points | fmul R$KneeRate | fistp D@KneeIndex
mov eax D@KneeIndex | fld R$ebx+eax*8
EndP
Quote
ε: The value for ε can then be chosen by using a k-distance graph, plotting the distance to the k = minPts-1 nearest neighbor ordered from the largest to the smallest value. Good values of ε are where this plot shows an "elbow": if ε is chosen much too small, a large part of the data will not be clustered; whereas for a too high value of ε, clusters will merge and the majority of objects will be in the same cluster. In general, small values of ε are preferable, and as a rule of thumb, only a small fraction of points should be within this distance of each other.
Quote
The choice of k is not strictly fixed and can vary based on the characteristics of the data and the specific requirements of the clustering task. Here's a more detailed explanation of how to determine an appropriate k value for different dimensions:
1D Data
For 1D data, clusters are linear, and local density is easier to determine. Therefore, a small k value is usually sufficient.
k = 2: Using the nearest neighbor (1st neighbor) can be enough to define local density in 1D data.
2D Data
For 2D data, clusters are more complex, and you generally need a slightly larger k to capture the local density effectively.
k = 4: This is a commonly used heuristic. A point and its three nearest neighbors can adequately define a local dense region in 2D space.
3D Data
For 3D data, the complexity increases due to the additional dimension, requiring a larger k value to capture the local density accurately.
k = 6: This value is chosen to include a point and its five nearest neighbors, which helps define the local density in 3D space.
k = 7-10: Depending on the noise level and the expected cluster structure, you might need a larger k to ensure robustness.
Choosing k Based on Dimensions
The heuristic k = dimensions + 1 is a starting point, but it may need to be adjusted based on empirical observations and the specific dataset. The values mentioned (e.g., k = 6 for 3D) come from practical experience and empirical studies in clustering literature, where a slightly larger k helps to better capture the density in higher dimensions.
Example Adjustments in Code
Here is how you can set k based on the dimensionality of your data and adjust it if necessary:
#include <math.h>
#include <stdlib.h>
#include <stdio.h>

// Structure to represent a point in 3D (adjust for higher dimensions if needed)
typedef struct {
    double x, y, z; // Add more dimensions as necessary
} point_t;

// Forward declarations (compare_doubles is used by qsort before it is defined;
// dbscan_run is provided elsewhere)
int compare_doubles(const void *a, const void *b);
int dbscan_run(point_t *points, double epsilon, int min_pts, int num_points);

// Function to calculate Euclidean distance between two points
double euclidean_distance(point_t *a, point_t *b) {
    return sqrt(pow(a->x - b->x, 2) + pow(a->y - b->y, 2) + pow(a->z - b->z, 2)); // Adjust for dimensions
}

// Function to calculate k-distance for each point
void calculate_k_distances(point_t *points, int num_points, int k, double *k_distances) {
    for (int i = 0; i < num_points; ++i) {
        double distances[num_points];
        for (int j = 0; j < num_points; ++j) {
            distances[j] = euclidean_distance(&points[i], &points[j]);
        }
        // Sort distances to find the k-th nearest neighbor
        qsort(distances, num_points, sizeof(double), compare_doubles);
        k_distances[i] = distances[k]; // k-th nearest neighbor distance
    }
}

// Comparison function for qsort
int compare_doubles(const void *a, const void *b) {
    double arg1 = *(const double*)a;
    double arg2 = *(const double*)b;
    if (arg1 < arg2) return -1;
    if (arg1 > arg2) return 1;
    return 0;
}

// Function to find the "knee" point in k-distances plot
double find_knee_point(double *k_distances, int num_points) {
    // Sort the k-distances to create the k-distances plot
    qsort(k_distances, num_points, sizeof(double), compare_doubles);
    // Simple heuristic: return the distance at the 90th percentile
    return k_distances[(int)(0.9 * num_points)];
}

int main() {
    // Assuming points is already populated
    point_t *points = ...; // Replace with actual points initialization
    int num_points = ...;  // Replace with actual number of points
    int dimensions = 3;    // Adjust based on the dimensionality of your data
    int k = dimensions + 3; // Adjusted heuristic for k (example: dimensions + 3)
    double k_distances[num_points];

    // Calculate k-distances for each point
    calculate_k_distances(points, num_points, k, k_distances);

    // Find a suitable epsilon using the knee point of the k-distances plot
    double epsilon = find_knee_point(k_distances, num_points);

    // Determine minPts, often set as 2 * dimensions
    int minPts = 2 * dimensions;

    // Now run DBSCAN with the calculated epsilon and minPts
    int num_clusters = dbscan_run(points, epsilon, minPts, num_points);

    // Continue with your process...
    return 0;
}
Quote
Summary
1D Data: k = 2 is typically sufficient.
2D Data: k = 4 is a commonly used heuristic.
3D Data: k = 6 or k = 7-10 based on noise and cluster structure.
Higher Dimensions: k = dimensions + 1 or slightly higher based on empirical observation.
These values are starting points, and you may need to adjust k based on your specific dataset and the results you observe.
Epsilon: 1.000000
Minimum points: 2
Homogeneity: 0.000000
Homogeneity(log): 0.877543
Number of points: 53
x y z cluster_id
-----------------------------
1.00 3.00 1.00: 0
1.00 4.00 1.00: 0
1.00 5.00 1.00: 0
1.00 6.00 1.00: 0
2.00 2.00 1.00: 2
2.00 3.00 0.00: 1
2.00 4.00 0.00: 1
2.00 5.00 0.00: 1
2.00 6.00 0.00: 1
2.00 7.00 1.00: 3
3.00 1.00 1.00: 2
3.00 2.00 1.00: 2
3.00 3.00 1.00: 2
3.00 4.00 0.00: 1
3.00 5.00 0.00: 1
3.00 6.00 1.00: 3
3.00 7.00 1.00: 3
4.00 1.00 1.00: 2
4.00 2.00 1.00: 2
4.00 3.00 0.00: 1
4.00 4.00 0.00: 1
4.00 5.00 1.00: -2
4.00 6.00 0.00: 1
4.00 7.00 1.00: 3
4.00 8.00 1.00: 3
5.00 1.00 1.00: 2
5.00 2.00 0.00: 1
5.00 3.00 0.00: 1
5.00 4.00 0.00: 1
5.00 5.00 0.00: 1
5.00 6.00 0.00: 1
5.00 7.00 1.00: 3
5.00 8.00 1.00: 3
6.00 1.00 1.00: 2
6.00 2.00 0.00: 1
6.00 3.00 1.00: 3
6.00 4.00 1.00: 3
6.00 5.00 1.00: 3
6.00 6.00 1.00: 3
6.00 7.00 1.00: 3
7.00 1.00 1.00: 2
7.00 2.00 0.00: 1
7.00 3.00 0.00: 1
7.00 4.00 0.00: 1
7.00 5.00 1.00: 3
8.00 1.00 1.00: 2
8.00 2.00 1.00: 2
8.00 3.00 0.00: 1
8.00 4.00 1.00: 3
8.00 5.00 1.00: 3
8.00 6.00 1.00: 3
9.00 2.00 1.00: 2
9.00 3.00 1.00: 2
Press enter to exit...