BICO
1.0
|
Fast computation of k-means coresets in a data stream. More...
#include <bico.h>
Classes | |
class | BicoNode |
Class representing a node in BICO's tree. More... | |
Public Member Functions | |
Bico (size_t dimension, size_t n, size_t k, size_t p, size_t nMax, DissimilarityMeasure< T > *measure, WeightModifier< T > *weightModifier) | |
Constructs BICO for points of type T T can be an arbitrary type but it has to fulfil the requirements of CFREntry. More... | |
virtual ProxySolution< T > * | compute () |
Returns a coreset of all point read so far. More... | |
virtual Bico< T > & | operator<< (T const &element) |
Read a point Insert the point into BICO's tree. More... | |
void | print (std::ostream &os) |
Write the tree as GraphViz source into a stream. More... | |
Private Member Functions | |
void | insert (BicoNode *node, int level, T const &element) |
Inserts an element into a BicoNode at a certain level. More... | |
void | allocateBucket (int bucket, bool left) |
Allocates a new bucket. More... | |
int | calcBucketNumber (int rnd, double val) |
void | buildBuckets () |
Create initial buckets. More... | |
double | project (T point, int i) |
Projects a point onto a projection line. More... | |
void | rebuild () |
Rebuilds the tree. More... | |
void | rebuildFirstLevel (BicoNode *parent, BicoNode *child) |
void | rebuildTraversMerge (BicoNode *node, int level) |
void | computeTraverse (BicoNode *node, ProxySolution< T > *solution) |
Recursive computation of the coreset. More... | |
double | getT (int level) |
Returns the threshold for a given level. More... | |
double | getR (int level) |
Returns the radius for a given level. More... | |
void | print (std::ostream &os, BicoNode *node) |
Private Attributes | |
size_t | k |
Number of centers. More... | |
size_t | L |
Number of projections. More... | |
std::vector< std::vector < double > > | rndprojections |
Random projection vectors. More... | |
std::vector< std::vector < std::vector< typename BicoNode::FeatureList::iterator > > > | buckets |
Buckets for nearest neighbour search in first level. More... | |
std::vector< std::pair< double, double > > | borders |
Bucket borders. More... | |
std::vector< int > | bucket_radius |
Width of buckets. More... | |
int | nodeIdCounter |
Counter for unique BicoNode object ids. More... | |
std::unique_ptr < DissimilarityMeasure< T > > | measure |
Buffer for DissimilarityMeasure. More... | |
std::unique_ptr < WeightModifier< T > > | weightModifier |
Buffer for WeightModifier. More... | |
size_t | maxNumOfCFs |
Maximum coreset size. More... | |
size_t | curNumOfCFs |
Current coreset size. More... | |
size_t | dimension |
Dimension of the input points. More... | |
double | optEst |
Current estimation of the optimal clustering cost. More... | |
std::vector< double > | maxVal |
Extreme values used for constructing the nearest neighbour buckets. More... | |
BicoNode * | root |
Root node of BICO's tree. More... | |
bool | bufferPhase |
Buffer phase indicator. More... | |
std::vector< T > | buffer |
Buffer phase's buffer. More... | |
int | numOfRebuilds |
Current number of rebuilding. More... | |
Fast computation of k-means coresets in a data stream.
BICO maintains a tree which is inspired by the clustering tree of BIRCH, a SIGMOD Test of Time award-winning clustering algorithm. Each node in the tree represents a subset of these points. Instead of storing all points as individual objects, only the number of points, the sum and the squared sum of the subset's points are stored as key features of each subset. Points are inserted into exactly one node. A detailed description of BICO can be found here:
CluE::Bico< T >::Bico | ( | size_t | dimension, |
size_t | n, | ||
size_t | k, | ||
size_t | p, | ||
size_t | nMax, | ||
DissimilarityMeasure< T > * | measure, | ||
WeightModifier< T > * | weightModifier | ||
) |
Constructs BICO for points of type T T can be an arbitrary type but it has to fulfil the requirements of CFREntry.
dimension | Dimension of the data |
n | Size of the data |
k | Number of desired centeres |
p | Number of random projections used for nearest neighbour search in the first level |
nMax | Maximum coreset size |
measure | Implementation of the squared L2 metric for T |
weightModifier | Class to read and modify weight of T |
|
virtual |
|
virtual |
Read a point Insert the point into BICO's tree.
element | Point of type T |
Implements CluE::StreamingAlgorithm< T >.
void CluE::Bico< T >::print | ( | std::ostream & | os) |
|
private |
|
private |
|
private |
|
private |
|
private |
|
private |
|
private |
|
private |
|
private |
Recursive computation of the coreset.
node | Some node to be processed |
solution | ProxySolution containing the coreset |
|
private |
|
private |
|
private |
|
private |
|
private |
|
private |
|
private |
|
private |
|
private |
|
private |
|
private |
Buffer for DissimilarityMeasure.
|
private |
Buffer for WeightModifier.
|
private |
|
private |
|
private |
|
private |
|
private |
|
private |
|
private |
|
private |
|
private |