k-means and EM clustering algorithms

The final data mining assignment (which was due in the middle of exams) was to implement the k-means and EM algorithms in C. These are two pretty simple clustering algorithms, with k-means (or renditions of k-means) used in industry all the time. Both algorithms work in 2-dimensions and take an input file of data points separated by a space.

K-means works as follow:

  • Place K points into the space represented by the objects that are being clustered. These points represent initial group centroids.
  • Assign each object to the group that has the closest centroid.
  • When all objects have been assigned, recalculate the positions of the K centroids.
  • Repeat Steps 2 and 3 until the centroids no longer move. This produces a separation of the objects into groups from which the metric to be minimized can be calculated.
Your Ad Here

5 Responses to “k-means and EM clustering algorithms”

  1. Leilane Castro says:

    Hi! I’m working in a research project about video segmentation and like to know if I can use your source code in this. If I change something I’ll let you know, sure. And, in case you say yes, tell me how to thanks you properly in my publication.

    Thanks for your attention.

  2. George A. Papayiannis says:

    Hi Leilane,

    The code is licensed uder GPL v2.
    So go ahead, as long as your confrom to that license.

    On a side note, as I said in the post I wrote them both in one night.
    The EM version surely has problems, k-means is probably right…

    Good luck,
    George

  3. Ravi says:

    Hi George
    Thanks for the excellent work!
    Just wondering how Kmeans can be extended to more then 2 dimensions.

    Thanks

    Ravi

  4. Preeti says:

    Hi, I would like to know how kmeans will be useful in case of incremental clustering. If the information is dynamically changing, then how kmeans or nearest neighbor will handle editing correct cluster. How to handle unlabelled data? if to begin with we have 90% unlabelled data and 10% labelled then how the algorithm looks like? if this ration need to change, 80% and 20% then how will be the said algorithm change? kindly guide in this regards.

    Preeti.

  5. bolt says:

    damn em don`t work at all

Leave a Reply

Line and paragraph breaks automatic.
XHTML allowed: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>