Unsupervised Learning (Clustering-based Customer Segmentation)

K-Means Clustering

Learning Outcomes

1. Understand the shift from supervised to unsupervised learning
2. Explain centroid assignment and updating in K-Means
3. Use the Elbow Method to find the optimal K
4. Apply K-Means++ for better initialization
5. Identify K-Means limitations (outliers & shape bias)

The Paradigm Shift: Unsupervised Learning

Supervised (Before)

We always had a Target Variable ($Y$). We knew if the customer churned. We knew the patient had heart disease. We were just teaching the machine to find the rules.

Unsupervised (Now)

We delete the $Y$ column. We just have raw, unlabeled data. No "Yes/No", no prices.

The Goal:

We ask the machine: "I don't know what I'm looking at. Please organize this data into natural groups based on their similarities."

Imagine you are the owner of a big pizza company in a huge city.

You have 10,000 customers living in different areas. Every time they order pizza, delivery takes time depending on how far they are.

Now you have money to open only 3 new pizza shops.

You start thinking:

“Where should I build these 3 shops so that all customers get pizza as fast as possible?”

If you choose the wrong locations:

  • Some customers will be very far
  • Delivery time will increase
  • Customers may get unhappy

Enter K-Means (Smart Assistant)

K-Means helps you find the best central locations based on data

You give all customer locations (GPS points) to a smart algorithm.

The algorithm does this:

  • It studies where customers are located
  • It groups nearby customers together
  • It finds the center point of each group

Final Result

  • Each group = one cluster
  • Each center = best location for a pizza shop

So your 3 pizza shops are placed exactly where they minimize total delivery distance.
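The pizza-shop idea can be sketched in a few lines. This is a minimal example assuming scikit-learn is available; the customer coordinates are synthetic stand-ins for real GPS data.

```python
# Sketch of the pizza-shop example with scikit-learn's KMeans.
# The "customers" below are simulated, not real GPS points.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
# Simulate 10,000 customers spread around 3 neighborhoods
neighborhoods = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
customers = np.vstack([
    rng.normal(loc=c, scale=1.5, size=(3334, 2)) for c in neighborhoods
])[:10000]

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(customers)

# Each centroid is the suggested location for one pizza shop
print(kmeans.cluster_centers_)
```

Each row of `cluster_centers_` is the mean position of one customer group, i.e. one candidate shop location.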

The K-Means Dance

STEP 01: CHOOSE K
Define the number of clusters.

STEP 02: DROP CENTROIDS
Place starting points randomly on the map.

STEP 03: ASSIGN POINTS
Each data point joins its nearest centroid.

STEP 04: UPDATE CENTER
Move each centroid to the mean of its group.

STEP 05: REPEAT
Loop until centroids stop moving (convergence).
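The five steps above can be written from scratch with nothing but NumPy. This is a bare-bones sketch for teaching purposes (no K-Means++, no multiple restarts), not a production implementation.

```python
import numpy as np

def kmeans(points, k, n_iters=100, seed=0):
    """Bare-bones K-Means following the five steps above (k = STEP 01)."""
    rng = np.random.default_rng(seed)
    # STEP 02: drop centroids by picking k random data points
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # STEP 03: assign each point to its nearest centroid
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # STEP 04: move each centroid to the mean of its group
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # STEP 05: repeat until centroids stop moving (convergence)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Toy usage: two well-separated blobs
rng = np.random.default_rng(7)
blobs = np.vstack([rng.normal(0, 0.5, (30, 2)), rng.normal(8, 0.5, (30, 2))])
centroids, labels = kmeans(blobs, k=2)
```

The empty-cluster guard in STEP 04 simply keeps the old centroid if no point was assigned to it, which is one common way to handle that edge case.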

The Big Question: How Many Pizza Places?

The Problem: How do we know K = 3 is correct? What if we actually need 5 clusters?

The Metric: WCSS (Within-Cluster Sum of Squares)

Measures "Total Delivery Distance": the sum of squared distances between every point and the centroid of its cluster.

$$\mathrm{WCSS} = \sum_{i=1}^{K} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2$$

where $\mu_i$ is the centroid of cluster $C_i$.

The Insight: "The Elbow"

The line drops rapidly, then bends and flattens. The bend point is where adding more clusters stops being valuable.
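The Elbow Method is easy to sketch in code: fit K-Means for a range of K values and record the WCSS each time. Assuming scikit-learn (which exposes WCSS as the fitted model's `inertia_` attribute) and synthetic data with 3 true clusters:

```python
# Elbow Method sketch: WCSS for K = 1..8 on data with 3 true clusters.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
data = np.vstack([rng.normal(loc=c, scale=0.5, size=(100, 2))
                  for c in [(0, 0), (5, 5), (0, 5)]])

wcss = []
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(data)
    wcss.append(km.inertia_)  # inertia_ is scikit-learn's name for WCSS

# WCSS keeps falling as K grows; the "elbow" is where the drop flattens.
for k, w in zip(range(1, 9), wcss):
    print(f"K={k}: WCSS={w:.1f}")
```

Plotting `wcss` against K would show the curve bending around K = 3 for this data, which is the elbow.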

The Random Initialization Flaw

In Step 2, K-Means drops the starting centroids completely at random.

The Nightmare :

What if it accidentally drops all 3 pizza restaurants on the exact same city block?

The algorithm will get horribly confused, misallocate the customers, and fail to find the true city clusters.

The Solution (K-Means++):

A smarter upgrade that changes only the initialization step (Step 2). It drops the first restaurant at random, then picks each later restaurant with probability proportional to its squared distance from the nearest restaurant already placed, so far-away spots are strongly favored and the starting points end up spread out across the city.
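In scikit-learn, K-Means++ is the default initialization, but you can compare it against purely random starts explicitly. A small sketch, using synthetic data and a single initialization (`n_init=1`) so the difference is not averaged away:

```python
# Comparing K-Means++ with purely random initialization in scikit-learn.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
data = np.vstack([rng.normal(loc=c, scale=0.4, size=(50, 2))
                  for c in [(0, 0), (8, 0), (4, 7)]])

smart = KMeans(n_clusters=3, init="k-means++", n_init=1, random_state=0).fit(data)
naive = KMeans(n_clusters=3, init="random", n_init=1, random_state=0).fit(data)

# Lower WCSS (inertia_) means a better clustering
print("k-means++ WCSS:", smart.inertia_)
print("random    WCSS:", naive.inertia_)
```

With random starts, some seeds will land all centroids in one blob and converge to a poor split; K-Means++ makes that much less likely.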

The Shape Limitation: The "Spherical" Bias

The Flaw

K-Means relies on distance from a center point (Centroid). It assumes every cluster in the universe is a perfect circle or sphere.

When It Fails

If data looks like curved bananas, concentric rings, or moons, K-Means draws a rigid straight line and chops the data incorrectly.

The Solution

For complex shapes, switch to density-based algorithms like DBSCAN.
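The moons failure case is easy to reproduce. A sketch assuming scikit-learn, using its `make_moons` generator; the DBSCAN parameters (`eps=0.3`, `min_samples=5`) are illustrative choices for this particular dataset, not universal defaults:

```python
# K-Means vs DBSCAN on two crescent-moon clusters.
from sklearn.cluster import KMeans, DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# K-Means assumes roughly spherical clusters and splits the moons
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# DBSCAN groups by density, so it can follow the curved shapes
db_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)

# DBSCAN labels noise points as -1, so exclude them when counting clusters
print("DBSCAN clusters found:", len(set(db_labels) - {-1}))
```

Note that DBSCAN does not take K at all: the number of clusters falls out of the density parameters.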

Pros & Cons Cheat Sheet

Summary

1. K-Means finds groups in unlabeled data
2. It moves centroids to cluster centers
3. The Elbow Method finds the optimal K
4. K-Means++ improves initialization
5. K-Means is sensitive to outliers and works best for spherical data

Quiz

Why do K-Means results change every run?

A. Too many dimensions

B. Random initialization issue (use K-Means++)

C. Missing R-squared

D. Clusters too circular

Quiz-Answer

B. Random initialization issue (use K-Means++)

K-Means drops its starting centroids at random, so different runs can converge to different local optima. K-Means++ spreads the initial centroids out and makes the results far more stable.