
discriminative_lexicon_model - Discriminative Lexicon Model in Python

discriminative_lexicon_model is a Python implementation of the Discriminative Lexicon Model (DLM; Baayen et al., 2019), a unified computational model of the mental lexicon that handles both comprehension (form-to-meaning mapping) and production (meaning-to-form mapping) using linear discriminative learning.

Overview

The DLM consists of six core matrices:

  • C (word-form matrix): cue representations of words (e.g., triphones)
  • S (word-meaning matrix): semantic vectors for each word
  • F (comprehension weights): maps forms to meanings, estimated so that CF approximates S
  • G (production weights): maps meanings to forms, estimated so that SG approximates C
  • S-hat: predicted meanings (CF), the model's comprehension output
  • C-hat: predicted forms (SG), the model's production output
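The relationships above can be sketched in plain NumPy. The following is a minimal illustration, not the package's implementation: it builds a binary C matrix from triphone cues, pairs it with toy semantic vectors, and estimates F by least squares so that CF reproduces S.

```python
import numpy as np

def triphones(word):
    """Split '#word#' into overlapping three-symbol cues."""
    w = '#' + word + '#'
    return [w[i:i + 3] for i in range(len(w) - 2)]

words = ['walk', 'walked']
cues = sorted({t for w in words for t in triphones(w)})

# C: one row per word, one column per cue (1 = cue present in the word)
C = np.array([[1.0 if c in triphones(w) else 0.0 for c in cues]
              for w in words])

# S: toy semantic vectors (dimensions: WALK, Past)
S = np.array([[1.0, 0.0],
              [1.0, 1.0]])

# F solves CF ≈ S in the least-squares sense
F = np.linalg.pinv(C) @ S

# S-hat = CF is the model's comprehension output
S_hat = C @ F
```

With as few words as this and linearly independent rows of C, the mapping is exact and S_hat equals S; with realistic lexica the approximation is only as good as the cues allow.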

Installation

discriminative_lexicon_model is available on PyPI:

pip install discriminative_lexicon_model

Note that the PyPI version may be outdated. For the latest version, install directly from GitHub:

pip install git+https://github.com/quantling/discriminative_lexicon_model

Or clone the repository first and install locally:

git clone https://github.com/quantling/discriminative_lexicon_model
cd discriminative_lexicon_model
pip install .

Quick start

import discriminative_lexicon_model as dlm
import pandas as pd

# Define words and semantic vectors
words = ['walk', 'walked', 'walks']
sems = pd.DataFrame(
    {'WALK': [1, 1, 1], 'Present': [1, 0, 1],
     'Past': [0, 1, 0], 'ThirdPerson': [0, 0, 1]},
    index=words
)

# Build the full model in one step
mdl = dlm.LDL(words, sems, allmatrices=True)

# Check comprehension and production accuracy
mdl.accuracy(print_output=True)

# Examine predictions for individual words
dlm.predict_df(pred=mdl.chat, gold=mdl.cmat)
#       Word    Pred  Correct
# 0     walk    walk     True
# 1   walked  walked     True
# 2    walks   walks     True

# Compute linguistic measures
round(dlm.semantic_support('walked', 'ed#', mdl.chat), 3)   # 1.0
round(dlm.functional_load('ed#', mdl.fmat, 'walked', mdl.smat), 3)  # 0.853
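The two measures above can be read off the fitted matrices. As a rough sketch with hypothetical toy values (the package's exact definitions may differ): semantic support is taken here as the predicted support a word's row of C-hat assigns to a cue, and functional load as the correlation between a cue's row of F and the word's gold semantic vector.

```python
import numpy as np
import pandas as pd

# Toy matrices with hypothetical values (chat rows = words, cols = cues;
# fmat rows = cues, cols = semantic dims; smat rows = words, cols = semantic dims)
chat = pd.DataFrame([[0.9, 0.1], [0.2, 0.8]],
                    index=['walk', 'walked'], columns=['lk#', 'ed#'])
fmat = pd.DataFrame([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
                    index=['lk#', 'ed#'], columns=['WALK', 'Present', 'Past'])
smat = pd.DataFrame([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]],
                    index=['walk', 'walked'], columns=['WALK', 'Present', 'Past'])

def semantic_support(word, cue, chat):
    # Assumed reading: the support that the word's predicted form vector
    # (a row of C-hat) assigns to the cue
    return float(chat.loc[word, cue])

def functional_load(cue, fmat, word, smat):
    # Assumed reading: correlation between the cue's comprehension weights
    # (a row of F) and the word's gold semantic vector (a row of S)
    return float(np.corrcoef(fmat.loc[cue], smat.loc[word])[0, 1])
```

Under this reading, a cue carries high functional load for a word when its comprehension weights point in the same direction as the word's meaning.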

You can also build the model step by step:

mdl = dlm.LDL()
mdl.gen_cmat(words)                # C-matrix from triphone cues
mdl.gen_smat(sems)                 # S-matrix from semantic DataFrame
mdl.gen_fmat()                     # F = pinv(t(C)C) t(C)S
mdl.gen_gmat()                     # G = pinv(t(S)S) t(S)C
mdl.gen_shat()                     # S-hat = CF
mdl.gen_chat()                     # C-hat = SG

Matrices can be saved to and loaded from compressed CSV files:

# Save all matrices to a directory
mdl.save_matrices('path/to/directory')

# Load matrices back into a new model
mdl2 = dlm.LDL()
mdl2.load_matrices('path/to/directory')

Individual matrices can also be saved and loaded directly:

dlm.save_mat(mdl.cmat, 'cmat.csv.gz')
cmat = dlm.load_mat('cmat.csv.gz')

Features

  • Full DLM pipeline: cue extraction, matrix estimation, prediction, and evaluation
  • Endstate learning, incremental learning, and frequency-weighted learning
  • Incremental production via the produce algorithm (iterative cue selection with validity constraints)
  • Linguistic measures: semantic support, functional load, production accuracy, uncertainty, vector length
  • Optional GPU acceleration via PyTorch for production and incremental learning
  • Semantic vectors from fastText embeddings or custom DataFrames
  • All matrices stored as xarray.DataArray with labeled dimensions
  • Matrix I/O (save/load as CSV)
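The iterative cue selection behind production can be illustrated with a greedy sketch. This is not the package's produce algorithm (which adds validity constraints and candidate tracking); it only shows the chaining idea: start from a word-initial triphone, repeatedly append the best-supported cue that overlaps the current one, and stop at a word-final cue.

```python
def chain_cues(support, threshold=0.1, max_steps=20):
    """Greedily chain triphone cues into a word form.

    support: dict mapping each cue to its predicted support value,
    as if read from one row of C-hat. Hypothetical helper, for
    illustration only.
    """
    # best word-initial cue
    current = max((c for c in support if c.startswith('#')),
                  key=lambda c: support[c])
    path = [current]
    for _ in range(max_steps):
        if current.endswith('#'):
            break  # reached a word-final cue
        # candidates overlap the current cue on its last two symbols
        cands = [c for c in support
                 if c[:2] == current[1:] and support[c] > threshold]
        if not cands:
            break
        current = max(cands, key=lambda c: support[c])
        path.append(current)
    # splice the overlapping triphones back into a single form
    return path[0] + ''.join(c[-1] for c in path[1:])

# hypothetical support values for 'walked'
support = {'#wa': 0.9, 'wal': 0.9, 'alk': 0.8, 'lke': 0.7,
           'ked': 0.7, 'ed#': 0.9, 'lk#': 0.3}
```

Here the chain prefers 'lke' over the weaker word-final 'lk#', so the greedy path spells out '#walked#'.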

Modules

  • ldl: LDL class that bundles all matrices and methods into a single model object
  • mapping: core functions for cue extraction, matrix generation, production, and incremental learning
  • measures: linguistic measures (semantic support, functional load, uncertainty, etc.)
  • performance: prediction accuracy and evaluation utilities

Dependencies

  • Python >= 3.11
  • numpy, scipy, pandas, xarray, netCDF4, tqdm, fasttext

Optional:

  • PyTorch (for GPU-accelerated production and incremental learning)

Documentation

Full documentation is hosted on Read the Docs.

Citation

If you use this package in your research, please cite:

Baayen, R. H., Chuang, Y.-Y., Shafaei-Bajestan, E., & Blevins, J. P. (2019). The discriminative lexicon: A unified computational model for the lexicon and lexical processing in comprehension and production grounded not in (de)composition but in linear discriminative learning. Complexity, 2019, 1–39.

Authors and Contributors

discriminative_lexicon_model is developed and maintained by Motoki Saito.

License

MIT
