kmpp

When dealing with lots of data points, clustering algorithms may be used to group them. The k-means algorithm partitions n data points into k clusters and finds the centroids of these clusters incrementally.

The algorithm assigns data points to the closest cluster, and the centroids of each cluster are re-calculated. These steps are repeated until the centroids do not changing anymore.

The basic k-means algorithm is initialized with k centroids at random positions. This implementation addresses some disadvantages of the arbitrary initialization method with the k-means++ algorithm (see "Further reading" at the end).

Installation

Installing via npm

Install kmpp as Node.js module via NPM:

$ npm install kmpp

Example

var kmpp = require('kmpp');

kmpp([
  [x1, y1, ...],
  [x2, y2, ...],
  [x3, y3, ...],
  ...
], {
  k: 4
});

// =>
// { converged: true,
//   centroids: [[xm1, ym1, ...], [xm2, ym2, ...], [xm3, ym3, ...]],
//   counts: [ 7, 6, 7 ],
//   assignments: [ 2, 2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1 ]
// }

API

`kmpp(points[, opts)`

Exectes the k-means++ algorithm on points.

Arguments:

points (Array): An array-of-arrays containing the points in format [[x1, y1, ...], [x2, y2, ...], [x3, y3, ...], ...]
opts: object containing configuration parameters. Parameters are
- distance (function): Optional function that takes two points and returns the distance between them.
- initialize (Boolean): Perform initialization. If false, uses the initial state provided in centroids and assignments. Otherwise discards any initial state and performs initialization.
- k (Number): number of centroids. If not provided, sqrt(n / 2) is used, where n is the number of points.
- kmpp (Boolean, default: true): If true, uses k-means++ initialization. Otherwise uses naive random assignment.
- maxIterations (Number, default: 100): Maximum allowed number of iterations.
- norm (Number, default: 2): L-norm used for distance computation. 1 is Manhattan norm, 2 is Euclidean norm. Ignored if distance function is provided.
- centroids (Array): An array of centroids. If initialize is false, used as initialization for the algorithm, otherwise overwritten in-place if of the correct size.
- assignments (Array): An array of assignments. Used for initialization, otherwise overwritten.
- counts (Array): An output array used to avoid extra allocation. Values are discarded and overwritten.

Returns an object containing information about the centroids and point assignments. Values are:

converged: true if the algorithm converged successfully
centroids: a list of centroids
counts: the number of points assigned to each respective centroid
assignments: a list of integer assignments of each point to the respective centroid
iterations: number of iterations used

Credits

Jared Harkins improved the performance by reducing the amount of function calls, reverting to Manhattan distance for measurements and improved the random initialization by choosing from points
Ricky Reusser refactored API

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
docs		docs
example		example
lib		lib
test		test
.gitignore		.gitignore
.npmignore		.npmignore
.travis.yml		.travis.yml
CHANGELOG.md		CHANGELOG.md
README.md		README.md
package-lock.json		package-lock.json
package.json		package.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

kmpp

Installation

Installing via npm

Example

API

`kmpp(points[, opts)`

Credits

Further reading

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 5

Uh oh!

Languages

cmtt/kmpp

Folders and files

Latest commit

History

Repository files navigation

kmpp

Installation

Installing via npm

Example

API

kmpp(points[, opts)

Credits

Further reading

License

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 5

Uh oh!

Languages

`kmpp(points[, opts)`

Packages