Unsupervised Learning

Somsak Chanaim

International College of Digital Innovation, CMU

October 1, 2025

Unsupervised Learning

Unsupervised Learning is a Machine Learning approach that analyzes data without a target variable (no labels).

In other words, the model must discover the structure or hidden patterns in the data on its own, without being told in advance how the data should be grouped or related. In some cases, we may not even know how many groups should exist in the data.

Principles of Unsupervised Learning

Instead of learning from examples with predefined answers (labels), as in Supervised Learning, an Unsupervised Learning model works by:

  • Analyzing the structure of the data

  • Discovering hidden patterns or relationships

  • Grouping similar data together

  • Reducing data dimensionality to simplify analysis

Main Types of Unsupervised Learning

1. Clustering

  • Clustering automatically groups data so that items within the same group are similar to each other, while being different from items in other groups.

  • Example algorithms include:

    • (k)-means clustering
    • Hierarchical clustering
    • DBSCAN (Density-Based Spatial Clustering)
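To make the comparison concrete, here is a minimal sketch (assuming Python with scikit-learn; the synthetic data and parameter values such as eps=0.7 are made up for illustration) that applies the three listed algorithms to the same dataset:

```python
# Hedged sketch: compare the three clustering algorithms listed above on
# synthetic 2-D data. Dataset and parameters are illustrative only.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN

# 300 points forming 3 ball-shaped groups
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)

kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)
hier_labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)
dbscan_labels = DBSCAN(eps=0.7, min_samples=5).fit_predict(X)  # label -1 = noise

print(kmeans_labels[:10], hier_labels[:10], dbscan_labels[:10])
```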

2. Dimensionality Reduction

  • Used to reduce the number of variables in the data while preserving as much important information as possible.

  • Useful for addressing the “Curse of Dimensionality” and allowing models to learn faster.

  • Example algorithms include:

    • PCA (Principal Component Analysis)

    • t-SNE (t-Distributed Stochastic Neighbor Embedding)

    • UMAP (Uniform Manifold Approximation and Projection)
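As a minimal sketch (assuming Python with scikit-learn, using the built-in Iris dataset purely for illustration), PCA can compress the four Iris features into two components:

```python
# Hedged sketch: reduce 4 features to 2 principal components with PCA.
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X = load_iris().data                        # 150 samples x 4 features
X_std = StandardScaler().fit_transform(X)   # put features on the same scale first

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_std)

print(X_2d.shape)                           # (150, 2)
print(pca.explained_variance_ratio_)        # variance retained by each component
```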

3. Anomaly Detection

  • Used to identify values that deviate from the normal pattern in a dataset, such as detecting fraud or system errors.

  • Example algorithms include:

    • Isolation Forest

    • One-Class SVM

    • Autoencoders (Deep Learning)
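A minimal sketch of the idea (assuming Python with scikit-learn; the transaction amounts are invented for illustration) using Isolation Forest:

```python
# Hedged sketch: flag unusually large "transaction amounts" as anomalies.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(loc=100, scale=10, size=(200, 1))  # typical amounts
outliers = np.array([[400.0], [550.0]])                # two suspicious amounts
X = np.vstack([normal, outliers])

iso = IsolationForest(contamination=0.01, random_state=0)
labels = iso.fit_predict(X)                            # 1 = normal, -1 = anomaly

print(X[labels == -1].ravel())                         # amounts flagged as anomalous
```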

4. Association Rule Learning

  • Used to discover relationships between variables in a dataset.
    Example: Market Basket Analysis, which identifies items that customers often purchase together.

  • Example algorithms include:

    • Apriori

    • FP-Growth
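A minimal Market Basket Analysis sketch (assuming Python with the mlxtend library; the exact function signatures can vary slightly between mlxtend versions, and the transactions are made up):

```python
# Hedged sketch: frequent itemsets and association rules with Apriori (mlxtend).
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]

# One-hot encode the transactions into a boolean item matrix
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit_transform(transactions), columns=te.columns_)

frequent = apriori(onehot, min_support=0.4, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```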

Real-World Applications of Unsupervised Learning

  • Customer Segmentation
    Companies can use clustering to group customers based on purchasing behavior, e.g., loyal customers, new customers, or churned customers.

  • Topic Modeling
    Unsupervised Learning can analyze and group articles or reviews. For example, LDA (Latent Dirichlet Allocation) is used to discover the topics underlying a collection of documents.

  • Fraud Detection
    Anomaly Detection helps identify suspicious transactions, such as credit card use in very different locations within a short period.

  • Dimensionality Reduction
    PCA can be applied to reduce the number of variables before feeding data into a Machine Learning model, simplifying the process.

Advantages and Disadvantages of Unsupervised Learning

Advantages

  • Does not require labeled data, reducing the cost of data preparation.

  • Automatically discovers hidden patterns in the data.

  • Works well with large and complex datasets.

  • Useful for uncovering new knowledge that humans may not easily notice.

Disadvantages

  • Results are often harder to interpret compared to Supervised Learning.

  • Setting the number of clusters or certain parameters requires expertise.

  • May struggle with noisy or poorly prepared data.

Interactive: Clustering

K-Means Clustering

Clustering with \(k\)-means (K-Means Clustering)

\(k\)-means is one of the most popular clustering techniques in Unsupervised Learning.

It works by dividing data into \(k\) groups based on similarity,
with each group represented by its own centroid (the central point).

How \(k\)-means Works

flowchart TD
    A[Start]
    B[Set number of clusters k]
    C[Randomly initialize k centroids]
    D[Compute distances to centroids]
    E[Assign each point to the nearest centroid]
    F[Update centroids as mean of assigned points]
    G{Centroids changed significantly and max iterations not reached?}
    H[Output cluster labels and final centroids]
    I[Stop]

    A --> B
    B --> C
    C --> D
    D --> E
    E --> F
    F --> G
    G -- Yes --> D
    G -- No --> H
    H --> I

  1. Set the number of clusters \(k\):
    The user specifies the number of clusters \(k\) before starting.

  2. Randomly initialize centroids:
    Randomly select \(k\) centroids from the dataset.

  3. Compute distances and assign groups:
    Calculate the distance of each data point to the centroids (commonly using Euclidean Distance)
    and assign each point to the cluster with the nearest centroid.

  4. Update centroid positions:
    Recalculate each centroid by taking the mean of all points in its cluster.

  5. Repeat steps 3 and 4:
    Continue until the centroids no longer change significantly (convergence) or a maximum number of iterations is reached.
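The five steps can be written out directly in code. The following is a from-scratch sketch in Python with NumPy (illustrative only; it omits details such as empty-cluster handling, and in practice a library implementation such as scikit-learn's KMeans is normally used):

```python
# Hedged from-scratch sketch of k-means following the steps above.
import numpy as np

def kmeans(X, k, max_iter=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    # Step 2: randomly pick k data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 3: Euclidean distance from every point to every centroid,
        # then assign each point to the nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: move each centroid to the mean of its assigned points
        # (an empty cluster would need special handling here)
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 5: stop when the centroids barely move
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:
            break
    return labels, centroids

# Tiny made-up dataset with two obvious groups
X = np.array([[1.0, 2.0], [1.5, 1.8], [1.0, 0.6], [8.0, 8.0], [9.0, 11.0], [8.5, 9.0]])
labels, centroids = kmeans(X, k=2)
print(labels, centroids)
```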

Distance Metrics in \(k\)-means Clustering

In the process of \(k\)-means clustering, calculating the distance between data points and centroids is crucial, as it determines which cluster each point belongs to.

By default, \(k\)-means uses Euclidean Distance.
However, other distance metrics can also be applied depending on the nature of the data.

1. Euclidean Distance

\[ d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \]

✅ Easy to use and understand
✅ Suitable for numerical quantitative data

❌ Sensitive to outliers
❌ Not suitable when features are on different scales (apply Standardization first)

2. Manhattan Distance

\[ d(x, y) = \sum_{i=1}^{n} |x_i - y_i| \]

✅ Suitable for data distributed in a grid-like structure (e.g., pixel images)
✅ Less sensitive to outliers compared to Euclidean

❌ Not ideal for data with strong linear relationships or directional patterns

3. Cosine Similarity

\[ \cos(\theta) = \frac{x \cdot y}{||x|| \cdot ||y||} = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2} \cdot \sqrt{\sum_{i=1}^{n} y_i^2}} \]

In general, Cosine Similarity measures the angle-based similarity between vectors rather than the distance.

✅ Suitable for vector-based data such as documents, text, and embeddings
✅ Not affected by vector magnitude (e.g., documents of different lengths)

❌ Not suitable when magnitude itself is important for grouping

Distance Calculation with Different Methods

Example data:

Point x y
a 1 2
b 3 4
c 4 3

Euclidean Distance

\[ \begin{aligned} Dis_{ab}&=\sqrt{2^2+2^2}=\sqrt{8}=2.828\\ Dis_{ac}&=\sqrt{3^2+1^2}=\sqrt{10}=3.162\\ Dis_{bc}&=\sqrt{1^2+1^2}=\sqrt{2}=1.414 \end{aligned} \]

Manhattan Distance

\[ \begin{aligned} Dis_{ab}&= |1-3|+|2-4| = 2+2 = 4\\ Dis_{ac}&= |1-4|+|2-3| = 3+1 = 4\\ Dis_{bc}&= |3-4|+|4-3| = 1+1 = 2 \end{aligned} \]
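The hand calculations above can be checked with a short sketch (assuming Python with NumPy and SciPy):

```python
# Hedged sketch: verify the Euclidean and Manhattan distances for a, b, c,
# and also show the cosine similarity between a and b for comparison.
import numpy as np
from scipy.spatial import distance

a, b, c = np.array([1, 2]), np.array([3, 4]), np.array([4, 3])

print(distance.euclidean(a, b), distance.euclidean(a, c), distance.euclidean(b, c))
# -> 2.828..., 3.162..., 1.414...
print(distance.cityblock(a, b), distance.cityblock(a, c), distance.cityblock(b, c))
# -> 4, 4, 2   (Manhattan / city-block distance)
print(1 - distance.cosine(a, b))   # cosine similarity, about 0.98
```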

Interactive: Distance

Example of \(k\)-means

1. Sample Data

We have a dataset of 5 points in 2 dimensions (X, Y) and want to split them into \(k=2\) clusters.

Data Point X Y
A 2 10
B 2 5
C 8 4
D 5 8
E 7 5

2. Steps of \(k\)-means Calculation

(1) Define \(k\) and randomly select initial centroids

  • Set the number of clusters: \(k = 2\)

  • Randomly choose initial centroids (suppose we pick A and C):

    • Centroid 1 (C1) = A = (2, 10)

    • Centroid 2 (C2) = C = (8, 4)


(2) Calculate the distance from each point to the centroids

We use Euclidean Distance:

\[ d(x, y) = \sqrt{(X_2 - X_1)^2 + (Y_2 - Y_1)^2} \]

Data Point Distance to C1 Distance to C2 Nearest Cluster
A (2,10) 0.00 8.49 C1
B (2,5) 5.00 6.08 C1
C (8,4) 8.49 0.00 C2
D (5,8) 3.61 5.00 C1
E (7,5) 7.07 1.41 C2

First-round result:

  • C1 cluster: {A, B, D}, C2 cluster: {C, E}

(3) Compute new centroids for each cluster

Centroid 1 (C1) {A, B, D}

\[\begin{aligned} C1_x &= \frac{2 + 2 + 5}{3} = \frac{9}{3} = 3.00\\ C1_y &= \frac{10 + 5 + 8}{3} = \frac{23}{3} = 7.67\end{aligned}\]

Centroid 2 (C2) {C, E}

\[\begin{aligned} C2_x &= \frac{8 + 7}{2} = \frac{15}{2} = 7.50\\ C2_y &= \frac{4 + 5}{2} = \frac{9}{2} = 4.50\end{aligned}\]

(4) Repeat steps 2 and 3

  • Recompute distances and reassign points to clusters.

  • Update the centroids, and continue iterating until the centroids no longer change (convergence).
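The same worked example can be reproduced with scikit-learn by supplying A and C as the initial centroids (a sketch, assuming Python with scikit-learn):

```python
# Hedged sketch: rerun the worked example with sklearn's KMeans,
# starting from the same initial centroids C1 = A and C2 = C.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[2, 10], [2, 5], [8, 4], [5, 8], [7, 5]], dtype=float)  # A, B, C, D, E
init = np.array([[2, 10], [8, 4]], dtype=float)                       # C1 = A, C2 = C

km = KMeans(n_clusters=2, init=init, n_init=1, max_iter=100)
labels = km.fit_predict(X)

print(labels)               # expected [0 0 1 0 1]: {A, B, D} and {C, E}
print(km.cluster_centers_)  # expected about [[3.00, 7.67], [7.50, 4.50]]
```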

3. Compute \(k\)-means

Animation of K-means

Set up

K-means (3D)

Choosing the Optimal \(k\) in \(k\)-means using Silhouette Score

Selecting the right number of clusters \(k\) in \(k\)-means is crucial.
If \(k\) is chosen too high or too low, clustering performance may suffer.

Silhouette Score is a metric for evaluating clustering quality, based on:

  • \(a(i)\): the average distance between a point and other points in the same cluster → “cluster cohesion”
  • \(b(i)\): the average distance between a point and points in the nearest neighboring cluster → “cluster separation”

\[ S(i) = \frac{b(i) - a(i)}{\max(a(i), b(i))} \]

  • \(S(i)\) close to 1 → The point is well clustered within its group.

  • \(S(i)\) close to 0 → The point lies near the boundary between clusters.

  • \(S(i)\) close to -1 → The point is likely assigned to the wrong cluster.

How to Use Silhouette Score to Choose \(k\)

The average Silhouette Score indicates how appropriate the chosen \(k\) is.

  1. Run \(k\)-means with multiple values of \(k\) (e.g., \(k = 2, 3, 4, \dots, 10\)).

  2. Calculate the Silhouette Score for each \(k\).

  3. Select the \(k\) that yields the highest average Silhouette Score.
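A sketch of this procedure (assuming Python with scikit-learn; the synthetic data is generated only for illustration):

```python
# Hedged sketch: choose k by the highest average Silhouette Score.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.7, random_state=42)

scores = {}
for k in range(2, 11):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)   # average S(i) over all points

best_k = max(scores, key=scores.get)
print(scores)
print("best k:", best_k)   # should come out near 4 for this synthetic data
```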

Interpreting Silhouette Values by Cluster

Consider an illustrative example with three clusters and their per-point Silhouette values:

  • Cluster 1 (Well-clustered): All three points have high positive Silhouette values (0.60–0.85). This means the points are tightly grouped and clearly separated from other clusters.

  • Cluster 2 (Borderline): One point is exactly at 0 and others are close to 0 (0.05–0.10). This indicates these points lie near the boundary between clusters, so their assignment is uncertain.

  • Cluster 3 (Poor cluster): The only point has a negative Silhouette value (-0.20). This shows the point is closer to another cluster than to its own, suggesting possible misclassification.

Interactive K-means & Silhouette Score

Interactive Silhouette Values

Advantages and Disadvantages of \(k\)-means

Advantages of \(k\)-means

✅ Easy to implement and computationally efficient
✅ Works well when data has clear cluster structures
✅ Scales effectively to large datasets

Disadvantages of \(k\)-means

❌ Requires specifying \(k\) in advance
❌ Sensitive to outliers
❌ Performs poorly when clusters are non-spherical or vary in size

\(k\)-means with Orange Data Mining

Example of \(k\)-means in Orange (1)

Example of \(k\)-means in Orange (2)
