Commit db61e8d

Added YingYang benchmark. v0.1.5 ready
1 parent 1c0a8ae commit db61e8d

3 files changed: +122 -32 lines changed

docs/src/benchmark_image.png

-172 KB

docs/src/index.md

Lines changed: 13 additions & 12 deletions
@@ -120,9 +120,9 @@ r.converged # whether the procedure converged
 ### Supported KMeans algorithm variations and recommended use cases
 
 - [Lloyd()](https://cs.nyu.edu/~roweis/csc2515-2006/readings/lloyd57.pdf) - Default algorithm but only recommended for very small matrices (switch to `n_threads = 1` to avoid overhead).
-- [Hamerly()](https://www.researchgate.net/publication/220906984_Making_k-means_Even_Faster) - Useful in most cases. If uncertain about your use case, try this!
+- [Hamerly()](https://www.researchgate.net/publication/220906984_Making_k-means_Even_Faster) - Hamerly is good for moderate number of clusters (< 50?) and moderate dimensions (<100?).
 - [Elkan()](https://www.aaai.org/Papers/ICML/2003/ICML03-022.pdf) - Recommended for high dimensional data.
-- [Yinyang()](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/ding15.pdf) - An excellent choice for most cases. Swiss blade for many use cases.
+- [Yinyang()](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/ding15.pdf) - Recommended for large dimensions and/or large number of clusters.
 - [Geometric()](http://cs.baylor.edu/~hamerly/papers/sdm2016_rysavy_hamerly.pdf) - (Coming soon)
 - [MiniBatch()](https://www.eecs.tufts.edu/~dsculley/papers/fastkmeans.pdf) - (Coming soon)
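Reading aid for the hunk above: the revised recommendations can be read as a rough decision rule. The Julia sketch below is only an illustration. `suggest_algorithm` is a hypothetical helper that does not exist in ParallelKMeans.jl, the cutoffs of 50 clusters and 100 dimensions are the tentative values the README itself marks with question marks, and the 1,000-point cutoff for "very small" is purely an assumption.

```julia
# Hypothetical helper, NOT part of ParallelKMeans.jl: turns the README's
# rules of thumb into a lookup. Every threshold here is an assumption.
function suggest_algorithm(n_features::Integer, n_points::Integer, k::Integer)
    # "Very small matrices" is not quantified in the README;
    # 1_000 points is an assumed cutoff for illustration only.
    n_points <= 1_000 && return :Lloyd
    # Hamerly: moderate number of clusters (< 50?) and moderate dimensions (< 100?).
    (k < 50 && n_features < 100) && return :Hamerly
    # Large dimensions and/or many clusters: the README points to Yinyang
    # (Elkan is its other suggestion for high dimensional data).
    return :Yinyang
end

println(suggest_algorithm(30, 1_000_000, 10))    # moderate k and dimensions
println(suggest_algorithm(500, 1_000_000, 200))  # many clusters, many dimensions
println(suggest_algorithm(30, 500, 5))           # very small input
```

The returned symbol would then be mapped by hand to the matching constructor, for example `Hamerly()` or `Yinyang(7)`, and passed as the first argument of `ParallelKMeans.kmeans`, as the notebook cells in this commit do.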

@@ -179,16 +179,17 @@ Currently, the benchmark speed tests are based on the search for optimal number
 
 _________________________________________________________________________________________________________
 
-|1 million (ms)|100k (ms)|10k (ms)|1k (ms)|package |language |
-|:------------:|:-------:|:------:|:-----:|:---------------------:|:---------:|
-| 580079 | 47804 |882.486 |17.424 | Clustering.jl | Julia |
-| 238716 | 20224 | 721.43 |24.581 | mlpack |C++ Wrapper|
-| 22946 | 2844 |177.329 | 6.403 | Lloyd | Julia |
-| 11084 | 1160 | 96.67 | 6.459 | Hamerly | Julia |
-| 13773 | 1457 | 80.484 | 6.854 | Elkan | Julia |
-| 1430000 | 146000 | 5770 | 344 | Sklearn Kmeans | Python |
-| 30100 | 3750 | 613 | 201 |Sklearn MiniBatchKmeans| Python |
-| 218200 | 15510 | 733.7 | 19.47 | Knor | R |
+|1 million sample (secs)|100k sample (secs)|10k sample (secs)|1k sample (secs)|package |language |
+|:---------------------:|:----------------:|:---------------:|:--------------:|:---------------------:|:---------:|
+| 538.53100 | 33.15700 | 0.74238 | 0.01710 | Clustering.jl | Julia |
+| 220.35700 | 20.93600 | 0.82430 | 0.02639 | mlpack |C++ Wrapper|
+| 20.55400 | 2.91300 | 0.17559 | 0.00609 | Lloyd | Julia |
+| 11.51800 | 0.96637 | 0.09990 | 0.00635 | Hamerly | Julia |
+| 14.01900 | 1.13100 | 0.07912 | 0.00646 | Elkan | Julia |
+| 9.97000 | 1.14600 | 0.10834 | 0.00704 | YingYang | Julia |
+| 1,430.00000 | 146.00000 | 5.77000 | 0.34400 | Sklearn Kmeans | Python |
+| 30.10000 | 3.75000 | 0.61300 | 0.20100 |Sklearn MiniBatchKmeans| Python |
+| 218.20000 | 15.51000 | 0.73370 | 0.01947 | Knor | R |
 
 _________________________________________________________________________________________________________
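To make the updated table easier to compare at a glance, the sketch below computes how many times longer each package took than the new YingYang row on the 1 million sample column. The timings are copied from the added table rows. The variable names are illustrative, and the ratios describe only this one benchmark run, so they should not be read as general performance claims (MiniBatch in particular is a different algorithm).

```julia
# Timings in seconds, copied from the "1 million sample (secs)" column
# of the updated table in this commit.
timings_1m = Dict(
    "Clustering.jl"           => 538.531,
    "mlpack"                  => 220.357,
    "Lloyd"                   => 20.554,
    "Hamerly"                 => 11.518,
    "Elkan"                   => 14.019,
    "YingYang"                => 9.970,
    "Sklearn Kmeans"          => 1430.0,
    "Sklearn MiniBatchKmeans" => 30.1,
    "Knor"                    => 218.2,
)

# Use the newly added YingYang row as the baseline.
baseline = timings_1m["YingYang"]

# Ratio of each package's time to the baseline (1.0 means equal).
slowdown = Dict(name => t / baseline for (name, t) in timings_1m)

# Print from fastest to slowest.
for (name, ratio) in sort(collect(slowdown); by = last)
    println(rpad(name, 26), round(ratio; digits = 2), "x")
end
```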

extras/ClusteringJL, Mlpack, & ParallelKMeans Benchmarks Final.ipynb

Lines changed: 109 additions & 20 deletions
@@ -157,7 +157,7 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-" 238.716 s (85 allocations: 2.08 GiB)\n"
+" 220.357 s (85 allocations: 2.08 GiB)\n"
 ]
 }
 ],
@@ -174,7 +174,7 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-" 20.224 s (85 allocations: 212.88 MiB)\n"
+" 20.936 s (85 allocations: 212.88 MiB)\n"
 ]
 }
 ],
@@ -191,7 +191,7 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-" 721.430 ms (85 allocations: 21.30 MiB)\n"
+" 824.295 ms (85 allocations: 21.30 MiB)\n"
 ]
 }
 ],
@@ -208,7 +208,7 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-" 24.581 ms (85 allocations: 2.14 MiB)\n"
+" 26.388 ms (85 allocations: 2.14 MiB)\n"
 ]
 }
 ],
@@ -239,7 +239,7 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-" 580.079 s (32485 allocations: 34.42 GiB)\n"
+" 538.531 s (31820 allocations: 33.24 GiB)\n"
 ]
 }
 ],
@@ -256,7 +256,7 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-" 47.804 s (27599 allocations: 2.90 GiB)\n"
+" 33.157 s (20032 allocations: 2.13 GiB)\n"
 ]
 }
 ],
@@ -273,7 +273,7 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-" 882.486 ms (8650 allocations: 93.42 MiB)\n"
+" 742.375 ms (7614 allocations: 82.82 MiB)\n"
 ]
 }
 ],
@@ -290,7 +290,7 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-" 17.424 ms (1577 allocations: 2.20 MiB)\n"
+" 17.098 ms (1699 allocations: 2.34 MiB)\n"
 ]
 }
 ],
@@ -328,7 +328,7 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-" 22.946 s (43965 allocations: 210.36 MiB)\n"
+" 20.554 s (40737 allocations: 210.10 MiB)\n"
 ]
 }
 ],
@@ -345,7 +345,7 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-" 2.844 s (54383 allocations: 26.01 MiB)\n"
+" 2.913 s (56992 allocations: 26.27 MiB)\n"
 ]
 }
 ],
@@ -362,7 +362,7 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-" 177.329 ms (34604 allocations: 5.56 MiB)\n"
+" 175.590 ms (34558 allocations: 5.56 MiB)\n"
 ]
 }
 ],
@@ -379,7 +379,7 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-" 6.403 ms (10587 allocations: 1.37 MiB)\n"
+" 6.093 ms (10349 allocations: 1.35 MiB)\n"
 ]
 }
 ],
@@ -403,7 +403,7 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-" 11.084 s (52379 allocations: 349.14 MiB)\n"
+" 11.518 s (62467 allocations: 350.25 MiB)\n"
 ]
 }
 ],
@@ -420,7 +420,7 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-" 1.160 s (67677 allocations: 41.87 MiB)\n"
+" 966.373 ms (58405 allocations: 40.85 MiB)\n"
 ]
 }
 ],
@@ -437,7 +437,7 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-" 96.670 ms (58154 allocations: 9.93 MiB)\n"
+" 99.897 ms (61185 allocations: 10.27 MiB)\n"
 ]
 }
 ],
@@ -456,7 +456,7 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-" 6.459 ms (16734 allocations: 2.29 MiB)\n"
+" 6.350 ms (16373 allocations: 2.25 MiB)\n"
 ]
 }
 ],
@@ -487,7 +487,7 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-" 13.773 s (50855 allocations: 700.80 MiB)\n"
+" 14.019 s (55965 allocations: 701.39 MiB)\n"
 ]
 }
 ],
@@ -504,7 +504,7 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-" 1.457 s (69447 allocations: 77.21 MiB)\n"
+" 1.131 s (50298 allocations: 75.12 MiB)\n"
 ]
 }
 ],
@@ -521,7 +521,7 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-" 80.484 ms (46490 allocations: 12.13 MiB)\n"
+" 79.120 ms (49220 allocations: 12.43 MiB)\n"
 ]
 }
 ],
@@ -540,13 +540,102 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-" 6.854 ms (17482 allocations: 2.71 MiB)\n"
+" 6.464 ms (16613 allocations: 2.61 MiB)\n"
 ]
 }
 ],
 "source": [
 "@btime [ParallelKMeans.kmeans(Elkan(), $X_1k, i; tol=1e-6, max_iters=1000, verbose=false).totalcost for i = 2:10];"
 ]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": []
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"## YingYang"
+]
+},
+{
+"cell_type": "code",
+"execution_count": 31,
+"metadata": {},
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+" 9.970 s (23622 allocations: 346.02 MiB)\n"
+]
+}
+],
+"source": [
+"@btime [ParallelKMeans.kmeans(Yinyang(7), $X_1m, i; tol=1e-6, max_iters=1000, verbose=false).totalcost for i = 2:10];"
+]
+},
+{
+"cell_type": "code",
+"execution_count": 32,
+"metadata": {},
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+" 1.146 s (31409 allocations: 37.89 MiB)\n"
+]
+}
+],
+"source": [
+"@btime [ParallelKMeans.kmeans(Yinyang(7), $X_100k, i; tol=1e-6, max_iters=1000, verbose=false).totalcost for i = 2:10];"
+]
+},
+{
+"cell_type": "code",
+"execution_count": 33,
+"metadata": {},
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+" 108.337 ms (24498 allocations: 6.24 MiB)\n"
+]
+}
+],
+"source": [
+"@btime [ParallelKMeans.kmeans(Yinyang(7), $X_10k, i; tol=1e-6, max_iters=1000, verbose=false).totalcost for i = 2:10];"
+]
+},
+{
+"cell_type": "code",
+"execution_count": 34,
+"metadata": {},
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+" 7.044 ms (9805 allocations: 1.53 MiB)\n"
+]
+}
+],
+"source": [
+"@btime [ParallelKMeans.kmeans(Yinyang(7), $X_1k, i; tol=1e-6, max_iters=1000, verbose=false).totalcost for i = 2:10];"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": []
 }
 ],
 "metadata": {
