PROBI is a data stream algorithm for the probabilistic Euclidean k-median problem based on [1]. This implementation is a fast, heuristic version of PROBI. It also includes a second algorithm for the probabilistic Euclidean k-means problem.
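
The two objectives can be made concrete with a short C++ sketch. It assumes the common "assigned" formulation: each probabilistic point is a discrete distribution over finitely many locations and is assigned as a whole to one center. Its cost is the expected Euclidean distance to that center for k-median, or the expected squared distance for k-means. The names Point, ProbPoint, expected_cost and clustering_cost are hypothetical illustrations, not part of the PROBI sources, and PROBI's own definitions may differ in detail.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// A deterministic point in R^d.
using Point = std::vector<double>;

// Hypothetical probabilistic point: a discrete distribution over a finite
// set of locations; probs[i] is the probability of locations[i].
struct ProbPoint {
    std::vector<Point> locations;
    std::vector<double> probs;
};

// Squared Euclidean distance between two points of equal dimension.
double dist2(const Point& a, const Point& b) {
    assert(a.size() == b.size());
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double diff = a[i] - b[i];
        s += diff * diff;
    }
    return s;
}

// Expected cost of serving x by center c:
//   squared == false: E[ ||X - c|| ]    (k-median)
//   squared == true:  E[ ||X - c||^2 ]  (k-means)
double expected_cost(const ProbPoint& x, const Point& c, bool squared) {
    assert(x.locations.size() == x.probs.size());
    double e = 0.0;
    for (std::size_t i = 0; i < x.locations.size(); ++i) {
        const double d2 = dist2(x.locations[i], c);
        e += x.probs[i] * (squared ? d2 : std::sqrt(d2));
    }
    return e;
}

// Assigned clustering cost: every probabilistic point is charged the
// expected cost of its cheapest center, and these minima are summed.
double clustering_cost(const std::vector<ProbPoint>& data,
                       const std::vector<Point>& centers, bool squared) {
    double total = 0.0;
    for (const ProbPoint& x : data) {
        double best = std::numeric_limits<double>::infinity();
        for (const Point& c : centers) {
            best = std::min(best, expected_cost(x, c, squared));
        }
        total += best;
    }
    return total;
}
```

For example, a point located at (0, 0) or (2, 0) with probability 1/2 each has expected k-median cost 1 and expected k-means cost 2 with respect to the center (0, 0). For k-means the identity `E||X - c||^2 = ||E[X] - c||^2 + E||X - E[X]||^2` holds (here 1 + 1 = 2), so the expected squared distance depends on the center only through the mean of each probabilistic point. The k-median objective has no such simplification in general.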

Getting PROBI

The PROBI package consists of the actual source code and an example program for computing probabilistic k-means and k-median clusterings. Unzip both folders into the same directory.
A PDF version of this documentation is also included.

The source code provided on this webpage is the intellectual property of its authors. You have permission to download and use this source code for private and academic use. You are not allowed to redistribute this source code without our permission. If you want to use the source code in any other way, please contact us.


Source code and data files provided on this webpage come without any warranty. Use at your own risk!

Building PROBI

The PROBI sample applications can be built by generating the project files with Premake in the probi-environment folder and compiling them. A generated Makefile for GCC is provided for convenience.


Generating a Makefile on Linux:
> premake4 gmake

Four configurations are available:
  • "Debug" and "Release" for k-median
  • "DebugKmeans" and "ReleaseKmeans" for k-means


Compiling with the "Release" configuration:
> make config=release

Note: The debug configurations require header files that are usually available only on Unix-based operating systems.

Data Sets

PROBI was experimentally evaluated on five data sets. Tower, Covertype and Census are from the UCI Machine Learning Repository. BigCross is a subset of the Cartesian product of Tower and Covertype, created by the authors of [2]. The fifth data set, CalTech128, consists of 128-dimensional SIFT descriptors. For further information on the data sets and their availability, please contact us.
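
Read in the usual way, a point of the Cartesian product of two data sets is the concatenation of one point from each set, so its dimension is the sum of the two input dimensions. The C++ sketch below builds such points from a caller-supplied list of index pairs. How the authors of [2] selected the pairs that make up BigCross is not described here, so the selection is an assumption, and concat and cross_subset are hypothetical helpers that are not part of the PROBI sources.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// A point in R^d.
using Point = std::vector<double>;

// Concatenate a point a in R^dA with a point b in R^dB, giving the point
// (a, b) of the Cartesian product, which lives in R^(dA + dB).
Point concat(const Point& a, const Point& b) {
    Point p;
    p.reserve(a.size() + b.size());
    p.insert(p.end(), a.begin(), a.end());
    p.insert(p.end(), b.begin(), b.end());
    return p;
}

// Build the subset of A x B selected by (index in A, index in B) pairs.
// Assumption: the selection rule actually used for BigCross is not given,
// so the caller decides which pairs to include.
std::vector<Point> cross_subset(
        const std::vector<Point>& A, const std::vector<Point>& B,
        const std::vector<std::pair<std::size_t, std::size_t>>& pairs) {
    std::vector<Point> out;
    out.reserve(pairs.size());
    for (const auto& ij : pairs) {
        assert(ij.first < A.size() && ij.second < B.size());
        out.push_back(concat(A[ij.first], B[ij.second]));
    }
    return out;
}
```

Taking every pair (i, j) would yield the full product, whose size is the product of the two data set sizes; selecting a subset keeps the result manageable.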


For more detailed information, see also PROBI: A Heuristic for the probabilistic k-median problem.


  1. C. Lammersen, M. Schmidt, C. Sohler: Probabilistic k-Median Clustering in Data Streams. WAOA 2012.
  2. M. Ackermann, C. Lammersen, M. Märtens, C. Raupach, C. Sohler, K. Swierkot: StreamKM++: A Clustering Algorithm for Data Streams. ALENEX 2010. Also see the webpage of StreamKM++.