Implementation of apriori algorithm for association rule mining. Written in Rust π¦ with Python bindings.
First install Rust
curl https://sh.rustup.rs -sSf | sh -s -- -ythen install the package
pip install git+https://github.com/remykarem/apriori-rs.gitTo compile the module yourself (macOS),
cargo rustc --release -- -C link-arg=-undefined -C link-arg=dynamic_lookup && mv target/release/libapriori.dylib ./apriori.soPrepare the data as a list of sets of strings.
>>> from apriori import generate_frequent_itemsets
>>> transactions = [
... set(["bread", "milk", "cheese"]),
... set(["bread", "milk"]),
... set(["milk", "cheese", "bread"]),
... set(["milk", "cheese", "bread"]),
... set(["milk", "cheese", "yoghurt"]),
... set(["milk", "bread"])]Then
>>> itemsets, id2item = generate_frequent_itemsets(transactions, min_support=0.5, max_length=3)
>>> itemsets[1]
{frozenset({2}): 4, frozenset({0}): 5, frozenset({1}): 6}
>>> itemsets[2]
{frozenset({0, 1}): 5, frozenset({1, 2}): 4, frozenset({0, 2}): 3}
>>> itemsets[3]
{frozenset({0, 1, 2}): 3}
>>> id2item
{2: 'cheese', 0: 'bread', 3: 'yoghurt', 1: 'milk'}Use generate_frequent_itemsets_id if your items are indices.
>>> rules, counts = apriori(
... transactions,
... min_support=0.3,
... min_confidence=0.2,
... max_length=3)>>> rules
[{"cheese", "bread"} -> {"milk"},
{"cheese"} -> {"milk"},
{"bread"} -> {"milk"},
{"milk"} -> {"bread"},
{"milk", "cheese"} -> {"bread"},
{"cheese"} -> {"bread", "milk"},
{"cheese"} -> {"bread"},
{"milk"} -> {"cheese"},
{"bread", "milk"} -> {"cheese"},
{"bread"} -> {"milk", "cheese"},
{"bread"} -> {"cheese"},
{"milk"} -> {"cheese", "bread"}]Obtain confidence and lift for a rule.
>>> rules[0]
{"bread", "cheese"} -> {"milk"}
>>> rules[0].confidence
1.0
>>> rules[0].lift
1.0Time taken (s) to generate frequent itemsets for the Online Retail II dataset (https://archive.ics.uci.edu/ml/machine-learning-databases/00502/) given minimum support and maximum length of itemset.
| Min support, length | apriori-rs | efficient-apriori | mlxtend | apyori |
|---|---|---|---|---|
| 0.100, 1 | 0.2s | 0.1s | 0.1s | 0.29s |
| 0.100, 2 | 0.2s | 0.1s | 0.1s | 0.26s |
| 0.100, 3 | 0.2s | 0.1s | 0.1s | 0.25s |
| 0.100, 4 | 0.2s | 0.1s | 0.1s | 0.25s |
| 0.100, 5 | 0.2s | 0.1s | 0.1s | 0.25s |
| 0.050, 1 | 0.2s | 0.1s | 0.1s | 0.25s |
| 0.050, 2 | 0.2s | 0.2s | 0.1s | 0.25s |
| 0.050, 3 | 0.2s | 0.2s | 0.1s | 0.25s |
| 0.050, 4 | 0.2s | 0.2s | 0.1s | 0.25s |
| 0.050, 5 | 0.2s | 0.2s | 0.2s | 0.25s |
| 0.010, 1 | 0.2s | 0.1s | 0.1s | 0.32s |
| 0.010, 2 | 16s | 261s | 73s | 2.1s |
| 0.010, 3 | 15s | 272s | 79s | 2.3s |
| 0.010, 4 | 17s | 284s | 78s | 2.4s |
| 0.010, 5 | 14s | 279s | 92s | 2.4s |
| 0.005, 1 | 0.2s | 0.1s | 0.1s | 0.25s |
| 0.005, 2 | 76s | 1190s | 327s | 5.7s |
| 0.005, 3 | 68s | 1278s | 643s | 20s |
| 0.005, 4 | 81s | 1168s | 638s | 39s |
| 0.005, 5 | 70s | 1217s | 643s | 41s |
Benchmark was carried out on macOS Big Sur (11.6); 2.7 GHz Quad-Core Intel Core i7. Python version 3.8.11.