Write a Python program for the K-means clustering algorithm with k=4

import numpy as np
import matplotlib.pyplot as plt
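# Bounds for the synthetic data (randint excludes the upper bound)
lb_age = 18
ub_age = 60
lb_salary = 10000
ub_salary = 80000
n = 100          # number of data points
max_iter = 500   # number of k-means iterations
age = np.random.randint(lb_age, ub_age, n)
salary = np.random.randint(lb_salary, ub_salary, n)
# plt.scatter(age, salary)
# plt.show()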
# k-means clustering
k = 4
# A1..A4 collect ages and B1..B4 collect salaries, one pair of lists per cluster.
# Element 0 of each list is that cluster's current centroid coordinate.
A1 = [np.random.randint(lb_age, ub_age)]
A2 = [np.random.randint(lb_age, ub_age)]
A3 = [np.random.randint(lb_age, ub_age)]
A4 = [np.random.randint(lb_age, ub_age)]
B1 = [np.random.randint(lb_salary, ub_salary)]
B2 = [np.random.randint(lb_salary, ub_salary)]
B3 = [np.random.randint(lb_salary, ub_salary)]
B4 = [np.random.randint(lb_salary, ub_salary)]
# k-means iterations
for iteration in range(max_iter):
    for i in range(n):
        # Euclidean distance from point i to each of the four centroids
        d1 = np.hypot(A1[0] - age[i], B1[0] - salary[i])
        d2 = np.hypot(A2[0] - age[i], B2[0] - salary[i])
        d3 = np.hypot(A3[0] - age[i], B3[0] - salary[i])
        d4 = np.hypot(A4[0] - age[i], B4[0] - salary[i])
        # Append the point to the lists of its nearest centroid
        d = np.argmin([d1, d2, d3, d4])
        if d == 0:
            A1.append(age[i])
            B1.append(salary[i])
        elif d == 1:
            A2.append(age[i])
            B2.append(salary[i])
        elif d == 2:
            A3.append(age[i])
            B3.append(salary[i])
        elif d == 3:
            A4.append(age[i])
            B4.append(salary[i])
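    # Average only the assigned points (element 0 is the old centroid).
    # A cluster that received no points keeps its old centroid.
    A1_avg = np.mean(A1[1:]) if len(A1) > 1 else A1[0]
    B1_avg = np.mean(B1[1:]) if len(B1) > 1 else B1[0]
    A2_avg = np.mean(A2[1:]) if len(A2) > 1 else A2[0]
    B2_avg = np.mean(B2[1:]) if len(B2) > 1 else B2[0]
    A3_avg = np.mean(A3[1:]) if len(A3) > 1 else A3[0]
    B3_avg = np.mean(B3[1:]) if len(B3) > 1 else B3[0]
    A4_avg = np.mean(A4[1:]) if len(A4) > 1 else A4[0]
    B4_avg = np.mean(B4[1:]) if len(B4) > 1 else B4[0]
    # Reset the lists so that each holds only its new centroid coordinate
    A1, A2, A3, A4 = [A1_avg], [A2_avg], [A3_avg], [A4_avg]
    B1, B2, B3, B4 = [B1_avg], [B2_avg], [B3_avg], [B4_avg]
    # print(A1, A2, A3, A4)
    # print(B1, B2, B3, B4)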
# Final pass: assign every point to its nearest final centroid
for i in range(n):
    d1 = np.hypot(A1[0] - age[i], B1[0] - salary[i])
    d2 = np.hypot(A2[0] - age[i], B2[0] - salary[i])
    d3 = np.hypot(A3[0] - age[i], B3[0] - salary[i])
    d4 = np.hypot(A4[0] - age[i], B4[0] - salary[i])
    d = np.argmin([d1, d2, d3, d4])
    if d == 0:
        A1.append(age[i])
        B1.append(salary[i])
    elif d == 1:
        A2.append(age[i])
        B2.append(salary[i])
    elif d == 2:
        A3.append(age[i])
        B3.append(salary[i])
    elif d == 3:
        A4.append(age[i])
        B4.append(salary[i])
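# print("finalize")
# print(A1, A2, A3, A4)
# print(B1, B2, B3, B4)
# plot: element 0 of each list is the centroid, so only the assigned points are drawn
x1 = np.array(A1[1:])
y1 = np.array(B1[1:])
x2 = np.array(A2[1:])
y2 = np.array(B2[1:])
x3 = np.array(A3[1:])
y3 = np.array(B3[1:])
x4 = np.array(A4[1:])
y4 = np.array(B4[1:])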
plt.title('k-means, k=4')
plt.xlabel('age')
plt.ylabel('salary')
plt.scatter(x1, y1, label='cluster-1')
plt.scatter(x2, y2, label='cluster-2')
plt.scatter(x3, y3, label='cluster-3')
plt.scatter(x4, y4, label='cluster-4')
plt.legend()
plt.show()
        
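Output: a scatter plot of age (x-axis) against salary (y-axis), with the four clusters drawn in different colors.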


Original Title: Implementing K-means Clustering in Python for Data Segmentation

Introduction: Clustering is an unsupervised machine learning technique that groups data points by similarity, usually measured as a distance between them. K-means is one of the simplest and most widely used clustering algorithms. In this blog post, we implement K-means clustering in Python using the numpy and matplotlib libraries.

Code Explanation: The code begins by importing numpy for numerical computing and matplotlib for data visualization. It then defines lower and upper bounds for age (18 and 60) and salary (10000 and 80000) and uses numpy's randint function to generate n = 100 random integer values for each feature. Note that randint excludes the upper bound.
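
The data generation step can be made reproducible by seeding the random number generator. The sketch below is one way to do that, assuming NumPy's newer `default_rng` Generator API rather than the `np.random.randint` call used in the listing. The helper name `make_data` and the seed value are illustrative choices, not part of the original program.

```python
import numpy as np

def make_data(n=100, seed=0):
    """Return an (n, 2) float array whose rows are [age, salary]."""
    rng = np.random.default_rng(seed)  # the seed is an arbitrary illustrative choice
    # Generator.integers, like randint, excludes the upper bound by default.
    age = rng.integers(18, 60, size=n)
    salary = rng.integers(10000, 80000, size=n)
    return np.column_stack([age, salary]).astype(float)

points = make_data()
print(points.shape)  # (100, 2)
```

Calling `make_data()` twice with the same seed returns identical data, which makes clustering runs comparable.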

Next, the code initializes the four centroids by drawing a random age and a random salary within those bounds for each one. It then repeatedly assigns data points to their nearest centroid and updates the centroids for a fixed max_iter = 500 iterations; it does not test for convergence.
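
Stopping at convergence can be sketched as follows: the loop ends once no centroid coordinate moves by more than a small tolerance, or after `max_iter` passes, whichever comes first. This is a minimal vectorized sketch and not the listing's own code. The names `nearest`, `kmeans`, `tol` and the seed are assumptions introduced for illustration.

```python
import numpy as np

def nearest(points, centroids):
    """Index of the closest centroid (Euclidean distance) for every point."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(points, centroids, max_iter=500, tol=1e-9):
    """Basic k-means; returns (centroids, labels, iterations used)."""
    centroids = np.asarray(centroids, dtype=float).copy()
    used = 0
    for used in range(1, max_iter + 1):
        labels = nearest(points, centroids)
        new_centroids = centroids.copy()
        for j in range(len(centroids)):
            members = points[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                new_centroids[j] = members.mean(axis=0)
        shift = np.abs(new_centroids - centroids).max()
        centroids = new_centroids
        if shift <= tol:  # hypothetical tolerance: stop when nothing moved
            break
    return centroids, nearest(points, centroids), used

rng = np.random.default_rng(0)  # illustrative seed
points = np.column_stack([rng.integers(18, 60, 100),
                          rng.integers(10000, 80000, 100)]).astype(float)
# Random starting centroids within the same bounds, as in the listing.
start = np.column_stack([rng.integers(18, 60, 4), rng.integers(10000, 80000, 4)])
centroids, labels, used = kmeans(points, start)
print(used)  # number of passes actually run
```

On data of this size the loop typically stops long before 500 passes, which is the main practical benefit of the check.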

Inside the loop, the code calculates the Euclidean distance between each data point and the centroids, and assigns the data point to the nearest centroid. It then updates the centroids by calculating the average of the data points assigned to each centroid.
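
One assignment and update step can be worked through on a toy example. The four points and two starting centroids below are made up so that the arithmetic is easy to check by hand. The function names `assign` and `update` are illustrative and do not appear in the listing.

```python
import numpy as np

def assign(points, centroids):
    """Index of the nearest centroid (Euclidean distance) for each point."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def update(points, labels, centroids):
    """Mean of the points assigned to each centroid; empty clusters keep the old value."""
    new = centroids.astype(float).copy()
    for j in range(len(centroids)):
        members = points[labels == j]
        if len(members) > 0:
            new[j] = members.mean(axis=0)
    return new

# Toy [age, salary] rows (made-up values)
points = np.array([[20.0, 10000.0], [30.0, 12000.0],
                   [40.0, 50000.0], [50.0, 52000.0]])
centroids = np.array([[25.0, 15000.0], [45.0, 45000.0]])

labels = assign(points, centroids)
new_centroids = update(points, labels, centroids)
print(labels)         # [0 0 1 1]
print(new_centroids)  # rows: [25, 11000] and [45, 51000]
```

With the article's raw data the salary axis dominates the distance, because its range (70,000) is far larger than the age range (42). Rescaling the features before clustering is a common remedy, which the listing does not apply.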

After the loop, the code assigns every data point to its nearest final centroid once more and plots the resulting clusters using matplotlib; each call to scatter draws one cluster in a different color. It also adds a title, axis labels, and a legend.
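
The plotting step can also mark the centroids themselves, which makes the clusters easier to read. The sketch below assumes the points, labels and centroids are held in NumPy arrays rather than in the listing's separate lists. The function name `plot_clusters` and the toy values are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_clusters(points, labels, centroids):
    """Scatter the points cluster by cluster and mark each centroid with an X."""
    fig, ax = plt.subplots()
    for j in range(len(centroids)):
        members = points[labels == j]
        ax.scatter(members[:, 0], members[:, 1], label=f"cluster-{j + 1}")
    ax.scatter(centroids[:, 0], centroids[:, 1],
               marker="x", color="black", label="centroids")
    ax.set_title(f"k-means, k={len(centroids)}")
    ax.set_xlabel("age")
    ax.set_ylabel("salary")
    ax.legend()
    return fig, ax

# Toy values (made up) so the example runs on its own
points = np.array([[20.0, 10000.0], [30.0, 12000.0],
                   [40.0, 50000.0], [50.0, 52000.0]])
labels = np.array([0, 0, 1, 1])
centroids = np.array([[25.0, 11000.0], [45.0, 51000.0]])
fig, ax = plot_clusters(points, labels, centroids)
# plt.show()  # uncomment to display the figure
```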

Conclusion: K-means clustering is a simple yet powerful technique for data segmentation and grouping similar data points together. In this blog post, we implemented K-means clustering in Python using the numpy and matplotlib libraries. We generated random data points for age and salary, initialized centroids, and iteratively updated the centroids and assigned data points to their nearest centroid. Finally, we plotted the clusters for visualization.

Clustering is used in applications such as customer segmentation, image segmentation, and anomaly detection. Experimenting with the number of clusters k, with different centroid initializations, and with visualizations of the results helps in understanding the data before further analysis.
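
One way to experiment with k and with initialization is to compare the within-cluster sum of squared distances (often called inertia) across several values of k, keeping the best of a few random restarts for each. The sketch below is an illustration under stated assumptions: it starts from k randomly chosen data points instead of random values within the bounds, and `kmeans_inertia`, the seed, and the number of restarts are all illustrative choices.

```python
import numpy as np

def kmeans_inertia(points, k, rng, max_iter=100):
    """Run a basic k-means and return the within-cluster sum of squared distances."""
    # Assumption: start from k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(max_iter):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                centroids[j] = members.mean(axis=0)
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return float((dists.min(axis=1) ** 2).sum())

rng = np.random.default_rng(0)  # illustrative seed
points = np.column_stack([rng.integers(18, 60, 100),
                          rng.integers(10000, 80000, 100)]).astype(float)
for k in (2, 3, 4, 5, 6):
    # Keep the lowest inertia over 5 random restarts for each k.
    best = min(kmeans_inertia(points, k, rng) for _ in range(5))
    print(k, round(best))
```

Inertia always tends to fall as k grows, so the usual practice is to look for the value of k after which the improvement flattens out rather than for the minimum.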
