Closed
1241 commits
0e9b440
Merge pull request #147 from databio/dev_Julia
Julia820 Apr 3, 2024
1e12e38
Merge branch 'dev' into dev_search
khoroshevskyi Apr 3, 2024
24668b6
fixed bbclient
khoroshevskyi Apr 3, 2024
bee8b22
Merge remote-tracking branch 'origin/dev' into dev
khoroshevskyi Apr 3, 2024
c806918
updated requirements
khoroshevskyi Apr 3, 2024
c36d061
cleaned tests
khoroshevskyi Apr 3, 2024
3e8b590
Merge remote-tracking branch 'origin/dev_search' into dev_search
khoroshevskyi Apr 4, 2024
6515a45
few fixes and linting
khoroshevskyi Apr 4, 2024
6b279c0
updated docstrings
khoroshevskyi Apr 4, 2024
0310e69
Merge pull request #143 from databio/dev_search
khoroshevskyi Apr 4, 2024
69189f1
fixed python 3.8
khoroshevskyi Apr 4, 2024
117b183
Merge remote-tracking branch 'origin/dev' into dev
khoroshevskyi Apr 4, 2024
870e25c
skip atacformer for now
nleroy917 Apr 4, 2024
cf34587
fix region2vec tests
nleroy917 Apr 4, 2024
07982ca
fix tests
nleroy917 Apr 4, 2024
24cd19a
fix typos
nleroy917 Apr 4, 2024
cf2e671
test skip (for testing)
khoroshevskyi Apr 4, 2024
0768131
fixed tests
khoroshevskyi Apr 4, 2024
fe3740a
python 3.9
khoroshevskyi Apr 4, 2024
2c0cb96
fixed variable
khoroshevskyi Apr 4, 2024
2d6de26
lint
khoroshevskyi Apr 4, 2024
4728857
version
khoroshevskyi Apr 4, 2024
6f650a5
Merge pull request #144 from databio/dev
khoroshevskyi Apr 4, 2024
4657e53
small fix
khoroshevskyi Apr 4, 2024
4aab0c5
try to optimize embedding generation
nleroy917 Apr 5, 2024
68754f2
Change output of search retrieval evaluation from Tuple to Dict
ClaudeHu Apr 10, 2024
c309341
Update on text2bednn
ClaudeHu Apr 10, 2024
653343b
remove tokenizers from geniml... keep them in genimtools
nleroy917 Apr 21, 2024
1fde0e8
Add files via upload
ClaudeHu Apr 21, 2024
acc9dfa
Add files via upload
ClaudeHu Apr 21, 2024
49f688d
update on init
ClaudeHu Apr 21, 2024
e807e2b
update on requirements for training
ClaudeHu Apr 21, 2024
aeee11f
adjust return type
ClaudeHu Apr 21, 2024
5288e86
adjust according to example code of new version of hnswlib
ClaudeHu Apr 21, 2024
ad38597
create AnnDataTokenizer
nleroy917 Apr 22, 2024
96386e6
adjust training vec pairs curation from local storage backend
ClaudeHu Apr 24, 2024
894d39c
nl embed model names for logging
ClaudeHu Apr 24, 2024
69d2218
adust default M for hnsw search
ClaudeHu May 2, 2024
b352c47
tweaks to support new tokenizers
nleroy917 May 10, 2024
5a33450
fix some tests
nleroy917 May 10, 2024
14b1935
fix tests
nleroy917 May 10, 2024
7b04cc2
logo
nsheff May 10, 2024
6a599ff
Merge branch 'master' of github.com:databio/geniml
nsheff May 10, 2024
4ecaf00
Added bioconductor cache for bedfiles
khoroshevskyi May 20, 2024
48eb48a
blaack .
khoroshevskyi May 20, 2024
8c6b39e
apply langchain in search interface for text embedding
ClaudeHu May 20, 2024
77cad1f
fixed #151
khoroshevskyi May 21, 2024
8d4b8e0
requirements update
khoroshevskyi May 21, 2024
5a7141f
tweak things to fit API
nleroy917 May 22, 2024
49a7340
tweaks
nleroy917 May 22, 2024
d43bfd3
Merge pull request #152 from databio/dev_alex
khoroshevskyi May 22, 2024
9c9027b
fix type checking
nleroy917 May 22, 2024
3a62ebb
Merge branch 'dev' into optimize_r2v_encoding
nleroy917 May 22, 2024
4be4df3
updated bbclient cli
khoroshevskyi May 22, 2024
518f443
reverted some changes
khoroshevskyi May 22, 2024
48fae30
first caching zarr
khoroshevskyi May 22, 2024
5a0c2e7
bump genimtools
nleroy917 May 23, 2024
63fea71
Merge remote-tracking branch 'refs/remotes/origin/optimize_r2v_encodi…
nleroy917 May 23, 2024
92f2e65
fix issue with getting special tokens
nleroy917 May 23, 2024
e92f0e4
Fixed #153
khoroshevskyi May 23, 2024
c877e63
do we need .tolist() anymore?
nleroy917 May 23, 2024
b3f78e1
updated supprt of python3.12
khoroshevskyi May 23, 2024
97173ab
updated scipy version
khoroshevskyi May 23, 2024
f06cc4e
added tokens to CLI
khoroshevskyi May 23, 2024
cc33815
updated typing
khoroshevskyi May 23, 2024
d817601
tweaks
nleroy917 May 23, 2024
09ada8f
fix error in plot training graph
ClaudeHu May 24, 2024
1a1f29c
delete embedder package from setup
ClaudeHu May 24, 2024
f003309
tokenizer performance updates
nleroy917 May 24, 2024
2a2e25b
sorted dependencies
khoroshevskyi May 24, 2024
131031b
improved exceptions in bbclient
khoroshevskyi May 24, 2024
97815a8
updated return of zarr
khoroshevskyi May 24, 2024
f995035
update on file backend
ClaudeHu May 24, 2024
a20ee4c
Merge pull request #154 from databio/dev_alex
khoroshevskyi May 24, 2024
bde05d3
add anecdotal search from huggingface data
ClaudeHu May 24, 2024
8483715
fix relative import error
ClaudeHu May 24, 2024
d417aa6
fix relative import error
ClaudeHu May 24, 2024
765b384
fix relative import error
ClaudeHu May 24, 2024
5948d01
hf anecdotal search
ClaudeHu May 24, 2024
dcded89
edit error logging during search
ClaudeHu May 26, 2024
465d89c
prep for MLM training
nleroy917 May 28, 2024
c55cea2
updated tokens endpoint
khoroshevskyi May 28, 2024
35e86df
Merge branch 'dev' into optimize_r2v_encoding
nleroy917 May 28, 2024
7f6743b
optimize masking
nleroy917 May 28, 2024
e3b64eb
Merge remote-tracking branch 'refs/remotes/origin/optimize_r2v_encodi…
nleroy917 May 28, 2024
ecfeb47
add extensive documentation to masking strategy
nleroy917 May 29, 2024
aaedf7b
more mlm documentation
nleroy917 May 29, 2024
4aa99d2
allow loading embeddings from huggingface repo
ClaudeHu May 31, 2024
aaf4b5b
work on training of Atacformer
nleroy917 Jun 1, 2024
7d51028
skip atacformer tests
nleroy917 Jun 3, 2024
6d18f69
fix tests -- tokenizer bugs... oops
nleroy917 Jun 3, 2024
f33a269
fix tests... again
nleroy917 Jun 3, 2024
ad01e71
Merge pull request #149 from databio/optimize_r2v_encoding
nleroy917 Jun 3, 2024
99cd17f
pin genimtools
nleroy917 Jun 3, 2024
04cb428
bump version
nleroy917 Jun 3, 2024
87b90cc
updated typing check in region2vec
khoroshevskyi Jun 3, 2024
ddb7e9a
updated typing in encode function
khoroshevskyi Jun 3, 2024
fcf8fe9
added remove and cache tokens to bbconf
khoroshevskyi Jun 4, 2024
48deaba
Merge branch 'dev_search_merge' into dev_search
ClaudeHu Jun 4, 2024
dfeff80
Merge pull request #159 from databio/dev_search
ClaudeHu Jun 4, 2024
cf55281
lint and docstrings
ClaudeHu Jun 4, 2024
e0c4b7f
lint
khoroshevskyi Jun 4, 2024
e44045d
Pass test in all cases including qdrant
ClaudeHu Jun 4, 2024
6d5ddc6
Update text2bed.py
ClaudeHu Jun 4, 2024
8bc724a
HNSWBackend accepts yaml file payloads
ClaudeHu Jun 4, 2024
ead90b6
local & huggingface load detection
ClaudeHu Jun 4, 2024
8699160
comment in const for hnsw parameters
ClaudeHu Jun 4, 2024
ea8368b
correct docstring format
ClaudeHu Jun 4, 2024
3e0e131
type hint add
ClaudeHu Jun 4, 2024
5c0f5ba
version adjustment for langchain
ClaudeHu Jun 5, 2024
8af4829
lint
ClaudeHu Jun 5, 2024
495faaf
lint
khoroshevskyi Jun 5, 2024
427812c
remove tool tokens not associated with regions from eval
ClaudeHu Jun 5, 2024
494b5fc
remove tool tokens not associated with regions from eval
ClaudeHu Jun 5, 2024
2f02c54
Delete accidentally pushed content
ClaudeHu Jun 5, 2024
1a21a48
adjustment based on dev_guangtao
ClaudeHu Jun 5, 2024
847f6d6
Only lint on PR to master
nsheff Jun 6, 2024
ff9658f
Simplify pytesting
nsheff Jun 6, 2024
fae0918
address issue with RAM in Region2VecExModel encode (#161)
nleroy917 Jun 6, 2024
0029b26
Merge pull request #160 from databio/dev_search_merge
khoroshevskyi Jun 6, 2024
5bb6375
fixed pytest caching
khoroshevskyi Jun 6, 2024
440f979
Merge branch 'master' of github.com:databio/geniml
nsheff Jun 6, 2024
fa9476b
improved opening bed files
khoroshevskyi Sep 18, 2024
7e78cc0
Fixed #164
khoroshevskyi Sep 18, 2024
95cd35b
fixed incorrect constant issue
khoroshevskyi Sep 19, 2024
42f3f03
cleand unused data
khoroshevskyi Sep 19, 2024
2809e8e
Merge pull request #170 from databio/dev_io
khoroshevskyi Sep 19, 2024
27d9d1b
Fixed #171
khoroshevskyi Sep 23, 2024
ad876a8
updated genimtools to gtars
khoroshevskyi Sep 23, 2024
2cde700
updated pytest github
khoroshevskyi Sep 23, 2024
cc32265
updated pytest github 2
khoroshevskyi Sep 23, 2024
22e8ff6
updated pytest github 3
khoroshevskyi Sep 23, 2024
ab77363
Few PR updates
khoroshevskyi Sep 27, 2024
ba48861
cleaning imports
khoroshevskyi Sep 27, 2024
735a129
new bi-vector search backend
ClaudeHu Oct 1, 2024
9e58e80
edit based on PR feedback
ClaudeHu Oct 2, 2024
c8aafaf
update based on feedback
ClaudeHu Oct 2, 2024
f71a279
fixed #175
khoroshevskyi Oct 4, 2024
c4b132e
Merge remote-tracking branch 'origin/dev_deps' into dev_bivec_backend
khoroshevskyi Oct 4, 2024
06e4e21
Merge pull request #172 from databio/dev_deps
khoroshevskyi Oct 7, 2024
e026067
fixed deploying bedbase
khoroshevskyi Oct 9, 2024
61de096
bump version
khoroshevskyi Oct 9, 2024
22bfc1e
Merge pull request #179 from databio/dev_torch_dep
khoroshevskyi Oct 9, 2024
b73c584
updated version of torch
khoroshevskyi Oct 9, 2024
84c94e5
avoid error occur when sorting retrieval result
ClaudeHu Oct 10, 2024
e7d0041
adjust comment
ClaudeHu Oct 10, 2024
2c30b42
prevent potential error from only one file matching a metadata string
ClaudeHu Oct 11, 2024
ee2d136
testing qdrant backend load with uuid
ClaudeHu Oct 11, 2024
e6506d5
add hyphens to uuid
ClaudeHu Oct 11, 2024
6448106
clean unused import
ClaudeHu Oct 11, 2024
364c4cf
gave warning for missing vector
ClaudeHu Oct 11, 2024
bbb0923
correct error statement
ClaudeHu Oct 11, 2024
6d82419
batch search at qdrant client
ClaudeHu Oct 11, 2024
53fcb42
few bivec search improvements
khoroshevskyi Oct 12, 2024
8ec788a
isort
khoroshevskyi Oct 12, 2024
2d88033
change version
ClaudeHu Oct 14, 2024
096974f
Merge branch 'master' into dev_bivec_backend
khoroshevskyi Oct 14, 2024
479dd57
Merge pull request #174 from databio/dev_bivec_backend
ClaudeHu Oct 14, 2024
e7a1722
Fixed offset
khoroshevskyi Oct 15, 2024
01c4a12
single batch for search request
ClaudeHu Oct 16, 2024
ee0e5ee
recover overwritten edit on offset
ClaudeHu Oct 16, 2024
4ae2b17
double check score range
ClaudeHu Oct 16, 2024
74def11
small changes
khoroshevskyi Oct 16, 2024
7a1a81b
fixed #189
khoroshevskyi Oct 17, 2024
34e3b96
remove try except for a solved but with previously unknown reason
ClaudeHu Oct 17, 2024
4e70a4e
reduce number of retrieve_info() to qdrant client
ClaudeHu Oct 17, 2024
29888a1
set max batch size to not crash server
ClaudeHu Oct 18, 2024
e507723
Merge pull request #188 from databio/dev_bivec_backend
khoroshevskyi Oct 18, 2024
364d7b6
fixed incorrect caching of the files
khoroshevskyi Oct 28, 2024
f3a9cfe
Merge pull request #190 from databio/dev_bbcache
khoroshevskyi Oct 31, 2024
13330c1
switch to fastembed?
nsheff Nov 27, 2024
7dc3e03
correct the fastembed embedding fuction, double check in pytest it wo…
ClaudeHu Dec 2, 2024
d315f9f
Merge pull request #196 from databio/fastembed
khoroshevskyi Dec 3, 2024
7cae0fc
fix scembed tokenizer
nleroy917 Dec 11, 2024
f23397a
fix tokenizer creation issue with ScEmbed (#198)
nleroy917 Dec 12, 2024
ded5b72
add data to gitignore
nleroy917 Dec 12, 2024
decdede
fix tokenizers
nleroy917 Dec 13, 2024
41c73ca
remove the data stuff that got made by bbcache
nleroy917 Dec 13, 2024
b174be8
fix things
nleroy917 Dec 14, 2024
89127ea
fix things from merge
nleroy917 Dec 14, 2024
6ace3bc
fix merge issues
nleroy917 Dec 14, 2024
453c7cb
add cell_hierarchy back in
nleroy917 Dec 15, 2024
643af14
add imports back
nleroy917 Dec 15, 2024
82ca8b0
fix merge issues
nleroy917 Dec 15, 2024
4701fbc
fix merge issues again
nleroy917 Dec 15, 2024
a02a07f
add positional encodings
nleroy917 Dec 16, 2024
1d22ded
actually implement in forward method
nleroy917 Dec 16, 2024
edc672d
send positional embeddings to device on lightning start
nleroy917 Dec 16, 2024
5376ac3
fix tests
nleroy917 Dec 16, 2024
e8d2144
try new place for moving to device
nleroy917 Dec 16, 2024
bc4d9a5
check for batche dinput
nleroy917 Dec 16, 2024
bd9fcb7
revert
nleroy917 Dec 17, 2024
6b21044
tie embeddings
nleroy917 Dec 17, 2024
8a90312
comments
nleroy917 Dec 17, 2024
a1dda35
assertions
nleroy917 Dec 17, 2024
ff26773
restructure MLM
nleroy917 Dec 17, 2024
9a4267a
issues with norm_first
nleroy917 Dec 19, 2024
0b5af33
update export + loading
nleroy917 Dec 19, 2024
9cbbb6c
add safetensors
nleroy917 Dec 19, 2024
6f4ddba
update encoder_layer def
nleroy917 Dec 19, 2024
754eb17
Fixed biocfilecache. Added inspect functions.
khoroshevskyi Dec 20, 2024
f1ec0c8
sort tokens
nleroy917 Dec 21, 2024
aeb75e0
tests, imports and lint
khoroshevskyi Dec 22, 2024
612ce32
speeding up tests
khoroshevskyi Dec 22, 2024
11ce521
Merge pull request #200 from databio/bbcache_0_6
khoroshevskyi Jan 3, 2025
94a5f74
Merge pull request #199 from databio/bug_fix_r2v_scembed
nsheff Jan 3, 2025
8561d96
updated requirements
khoroshevskyi Jan 9, 2025
5b5f5dd
remove pos stuff for now
nleroy917 Jan 13, 2025
14bc689
improve reindex time cost
ClaudeHu Feb 12, 2025
10ed147
Some playing around with Rust
khoroshevskyi Feb 17, 2025
1bd1ae8
Updated requirements
khoroshevskyi Feb 18, 2025
313c9e1
Merge pull request #205 from databio/dev_reindex
ClaudeHu Feb 19, 2025
20f7818
Merge branch 'master' into bigbed
khoroshevskyi Feb 26, 2025
7b1fd8e
update so that we are only computing for masked tokens
nleroy917 Feb 26, 2025
bdc3f4a
merge with atacformer_pos
nleroy917 Feb 26, 2025
039db63
moved to gtars
khoroshevskyi Mar 3, 2025
03a9e05
fixed calculation of identifier in RegionSet
khoroshevskyi Mar 3, 2025
f946b93
Fixed RegionSet imports in region2vec
khoroshevskyi Mar 24, 2025
a0ae125
Merge branch 'master' into bigbed
khoroshevskyi Mar 26, 2025
26b9352
few fixes for RegionSet from gtars
khoroshevskyi Mar 26, 2025
673fd19
fixed new RegionSet
khoroshevskyi Apr 9, 2025
7d7fc6f
updated gtars version
khoroshevskyi Apr 9, 2025
c76a44e
updated version
khoroshevskyi Apr 9, 2025
b7b537b
fix tokenizrs in r2v
nleroy917 Apr 14, 2025
eb3f7fd
fix some tests
nleroy917 Apr 14, 2025
b637b61
fix error in loading bedfile
nleroy917 Apr 14, 2025
02cccbe
all tests are passing
nleroy917 Apr 14, 2025
bfaabee
fix the thing
nleroy917 Apr 14, 2025
6cb10e1
remove atacformer
nleroy917 Apr 14, 2025
6a51a8a
fix error with tokenization.main
nleroy917 Apr 14, 2025
6507008
Added deprecation warning for RegionSet
khoroshevskyi Apr 15, 2025
4609820
removed geniml atacformer from setup
khoroshevskyi Apr 15, 2025
2a16219
Merge pull request #211 from databio/bigbed
khoroshevskyi Apr 15, 2025
1b1a974
smarter batching strategy for r2v encode
nleroy917 Apr 28, 2025
002a969
decreased batching
khoroshevskyi Apr 30, 2025
fb84bc9
updated version
khoroshevskyi May 2, 2025
4679f41
Merge pull request #212 from databio/r2v_batch_process_hf
khoroshevskyi May 2, 2025
0ec7fc5
add atacformer code
nleroy917 Jul 16, 2025
d3b915d
add CRAFT code
nleroy917 Jul 16, 2025
c418022
add geneformer implementation code
nleroy917 Jul 16, 2025
5c7c956
add modules
nleroy917 Jul 22, 2025
fbc38c4
warning_once
nleroy917 Jul 22, 2025
7428371
replace with logger
nleroy917 Jul 22, 2025
df463e0
add patching helper utility to enable atacformer to be used with appl…
nleroy917 Jul 24, 2025
db9cf9a
add patch function to atacformer __init__.py
nleroy917 Jul 24, 2025
43736e0
Merge pull request #214 from databio/atacformer
nleroy917 Aug 4, 2025
6de9a69
bump the version
nleroy917 Aug 12, 2025
0166d76
use optimized anndata_tokenize function. start with new imports
nleroy917 Sep 5, 2025
b3dd061
switch r2v dataset to us parquet files
nleroy917 Sep 5, 2025
3432aba
line spacing
nleroy917 Sep 5, 2025
10 changes: 6 additions & 4 deletions .github/workflows/black.yml
@@ -1,11 +1,13 @@
 name: Lint
 
-on: [pull_request]
+on:
+  pull_request:
+    branches: master
 
 jobs:
   lint:
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@v2
-      - uses: actions/setup-python@v2
-      - uses: psf/black@20.8b1
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+      - uses: psf/black@stable
31 changes: 31 additions & 0 deletions .github/workflows/python-publish.yml
@@ -0,0 +1,31 @@
+# This workflows will upload a Python Package using Twine when a release is created
+# For more information see: https://help.github.com/en/actions/language-and-framework-guides/using-python-with-github-actions#publishing-to-package-registries
+
+name: Upload Python Package
+
+on:
+  release:
+    types: [created]
+
+jobs:
+  deploy:
+    runs-on: ubuntu-latest
+    name: upload release to PyPI
+    permissions:
+      id-token: write # IMPORTANT: this permission is mandatory for trusted publishing
+
+    steps:
+      - uses: actions/checkout@v4
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.x'
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install setuptools wheel twine
+      - name: Build and publish
+        run: |
+          python setup.py sdist bdist_wheel
+      - name: Publish package distributions to PyPI
+        uses: pypa/gh-action-pypi-publish@release/v1
41 changes: 41 additions & 0 deletions .github/workflows/run-pytest-dev.yml
@@ -0,0 +1,41 @@
+name: Run pytests for dev
+
+on:
+  pull_request:
+    branches: [dev]
+
+jobs:
+
+  pytest:
+    runs-on: ${{ matrix.os }}
+    strategy:
+      matrix:
+        python-version: ["3.12"]
+        os: [ubuntu-latest]
+
+    steps:
+      - uses: actions/checkout@v3
+
+      - name: Hack setup-python cache
+
+        if: hashFiles('**/requirements.txt', '**/pyproject.toml') == ''
+        run: |
+          touch ./requirements.txt
+
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python-version }}
+          cache: 'pip' # caching can speed up the workflow by reusing the installed dependencies
+
+      - name: Install uv
+        run: pip install uv
+
+      - name: Install test dependencies
+        run: if [ -f requirements/requirements-test.txt ]; then uv pip install -r requirements/requirements-test.txt --system; fi
+
+      - name: Install package
+        run: uv pip install .[ml] --system
+
+      - name: Run pytest tests
+        run: pytest tests -x -vv --remote-data
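
The "Hack setup-python cache" step in the workflow above creates an empty `requirements.txt` when neither a `requirements.txt` nor a `pyproject.toml` is found, presumably so that `cache: 'pip'` has a file to key on. A rough Python sketch of that logic, for local reasoning only: the function name `ensure_cache_key_file` is hypothetical, and `Path.rglob` only approximates the `**/` patterns passed to `hashFiles`.

```python
from pathlib import Path
import tempfile

# File names the workflow's hashFiles() call looks for.
CACHE_KEY_FILES = ("requirements.txt", "pyproject.toml")


def ensure_cache_key_file(root: Path) -> bool:
    """Create an empty requirements.txt if no cache-key file exists under root.

    Returns True when a placeholder was created, False otherwise.
    (Hypothetical helper; it mirrors the workflow step, it is not part of the PR.)
    """
    for name in CACHE_KEY_FILES:
        # rglob walks the tree recursively, similar in spirit to '**/<name>'.
        if any(root.rglob(name)):
            return False
    (root / "requirements.txt").touch()
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        workspace = Path(tmp)
        print(ensure_cache_key_file(workspace))  # first call creates the placeholder
        print(ensure_cache_key_file(workspace))  # second call finds it and does nothing
```

Since `MANIFEST.in` in this PR includes `pyproject.toml`, the condition is likely false for this repository and the step would normally be skipped.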
35 changes: 35 additions & 0 deletions .github/workflows/run-pytest-release.yml
@@ -0,0 +1,35 @@
+name: Run pytests for release
+
+on:
+  pull_request:
+    branches: [master]
+
+jobs:
+
+  pytest:
+    runs-on: ${{ matrix.os }}
+    strategy:
+      matrix:
+        python-version: ["3.9", "3.12"]
+        os: [ubuntu-latest]
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python-version }}
+
+      - name: Install uv
+        run: pip install uv
+
+      - name: Install test dependencies
+        run: if [ -f requirements/requirements-test.txt ]; then uv pip install -r requirements/requirements-test.txt --system; fi
+
+      - name: Install package
+        run: uv pip install .[ml] --system
+
+      - name: Run pytest tests
+        run: pytest tests -x -vv --remote-data
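
The release workflow above declares a two-axis matrix (`python-version` and `os`). As a quick illustration of how such a matrix expands into individual jobs, here is a small Python sketch; `expand_matrix` is a hypothetical helper, and it ignores `include`/`exclude` rules, which this workflow does not use.

```python
from itertools import product

# Matrix values copied from run-pytest-release.yml.
matrix = {
    "python-version": ["3.9", "3.12"],
    "os": ["ubuntu-latest"],
}


def expand_matrix(matrix):
    """Return one dict per job: the Cartesian product of all matrix axes."""
    keys = list(matrix)
    return [dict(zip(keys, combo)) for combo in product(*(matrix[k] for k in keys))]


if __name__ == "__main__":
    for job in expand_matrix(matrix):
        print(job)
```

With the values above this yields two jobs, one per Python version, both on `ubuntu-latest`; the dev workflow's matrix has a single Python version and therefore a single job.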

33 changes: 0 additions & 33 deletions .github/workflows/run-pytest.yml

This file was deleted.

38 changes: 31 additions & 7 deletions .gitignore
@@ -85,7 +85,7 @@ ipython_config.py
 # pyenv
 # For a library or package, you might want to ignore these files since the code is
 # intended to run in multiple environments; otherwise, check them in:
-# .python-version
+.python-version
 
 # pipenv
 # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
@@ -156,14 +156,14 @@ cython_debug/
 # JetBrains specific template is maintained in a separate JetBrains.gitignore that can
 # be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
 # and can be added to the global gitignore or merged into this file. For a more nuclear
-# option (not recommended) you can uncomment the following to ignore the entire idea name.
+# option (not recommended) you can uncomment the following to ignore the entire idea folder.
 #.idea/
 
-gitk/__pycache__
-gitk/eval/__pycache__
-gitk/region2vec/__pycache__
-gitk/tokenization/__pycache__
-gitk/tokenization/bedtools
+geniml/__pycache__
+geniml/eval/__pycache__
+geniml/region2vec/__pycache__
+geniml/tokenization/__pycache__
+geniml/tokenization/bedtools
 # for testing
 test.*
 
@@ -172,6 +172,30 @@ test.*
# data
tests/data/buenrostro2018.h5ad
tests/data/buenrostro_metadata.tsv
tests/data/model-tests/*
bedshifted*
examples/sh_output*
examples/py_output*

# integration test stuff
tests/integration/buenrostro2018.model

# examples
examples/scembed/pbmc/
examples/scembed/buenrostro
examples/scembed/atlas
examples/scembed/luecken2021

# vector db stuff
qdrant_storage/

.ruff_cache/


# MacOS
.DS_Store

local_cache

lightning_logs
data/
15 changes: 15 additions & 0 deletions .readthedocs.yaml
@@ -0,0 +1,15 @@
+version: 2
+
+build:
+  os: ubuntu-22.04
+  tools:
+    python: "3.11"
+
+mkdocs:
+  configuration: mkdocs.yml
+  fail_on_warning: false
+
+# Optionally declare the Python requirements required to build your docs
+python:
+  install:
+    - requirements: requirements/requirements-doc.txt
9 changes: 4 additions & 5 deletions .vscode/settings.json
@@ -1,11 +1,10 @@
 {
+    "[python]": {
+        "editor.defaultFormatter": "ms-python.black-formatter"
+    },
     "python.testing.pytestArgs": [
         "tests"
     ],
     "python.testing.unittestEnabled": false,
-    "python.testing.pytestEnabled": true,
-    "[python]": {
-        "editor.defaultFormatter": "ms-python.black-formatter"
-    },
-    "python.formatting.provider": "none"
+    "python.testing.pytestEnabled": true
 }
9 changes: 9 additions & 0 deletions LICENSE.txt
@@ -0,0 +1,9 @@
+Copyright 2023 Nathan Sheffield
+
+Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
+
+1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
+
+2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
4 changes: 4 additions & 0 deletions MANIFEST.in
@@ -0,0 +1,4 @@
+include README.md
+include LICENSE.txt
+include requirements/*
+include pyproject.toml
36 changes: 34 additions & 2 deletions README.md
@@ -1,3 +1,35 @@
-# Genomic interval toolkit
+# Genomic interval machine learning (geniml)
+
+Geniml is a Python package for building machine learning models of genomic interval data (BED files). It also includes ancillary functions to support other types of analyses of genomic interval data.
+
+Documentation is hosted at <https://docs.bedbase.org/geniml/>.
+
+
+## Installation
+### To install `geniml`, use these commands.
+
+Without specifying dependencies, the default dependencies will be installed,
+which DO NOT include machine learning (ML) or heavy processing libraries.
+
+
+From PyPI:
+```
+pip install geniml
+```
+or install the latest version from the GitHub repository:
+```
+pip install git+https://github.com/databio/geniml.git
+```
+
+### To install machine learning dependencies, use this command:
+
+From PyPI:
+```
+pip install geniml[ml]
+```
+
+
+## Development
+
+Run tests (from `/tests`) with `pytest`. Please read the [contributor guide](https://docs.bedbase.org/geniml/contributing/) to contribute.
 
-You can find documentation in the `docs` subfolder.
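
The README hunk above states that a plain `pip install geniml` does not pull in the ML libraries, which only come with `geniml[ml]`. A minimal sketch of how a script might check for those optional modules before using ML features follows. The helper `missing_modules` is hypothetical, and `torch` is only an assumed stand-in for whatever the `[ml]` extra installs; the diff does not list the extra's contents.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Assumption: "torch" stands in for a module provided by the [ml] extra.
    missing = missing_modules(["torch"])
    if missing:
        print("Optional ML modules not found:", ", ".join(missing))
        print("Consider: pip install geniml[ml]")
    else:
        print("ML modules appear to be installed.")
```

`find_spec` only locates a module without importing it, so the check stays cheap even when the heavy libraries are present.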
43 changes: 5 additions & 38 deletions docs/README.md
@@ -1,41 +1,8 @@
-# Genomic interval toolkit
+# Geniml documentation
 
-## Introduction
+The documentation for `geniml` is now part of BEDbase. You can find
 
-`gitk` is a suite of tools for applying machine learning approaches to genomic interval data. It is organized as a set of modules that provide related functions, such as building HMMs, assessing genomic interval universes, calculating likelihoods of consensus genomic interval sets, and computing single-cell clusters.
-
-## Install
-
-```
-pip install --user --upgrade .
-```
-
-## gitk modules
-
-- [gitk/hmm](gitk/hmm) - Building HMMs
-- [gitk/assess](gitk/assess) - Assess universe fit
-- [gitk/likelihood](gitk/likelihood) - Calculate likelihood of universe
-- [gitk/scembed](gitk/scembed) - Compute single-cell clusters from a cell-feature matrix using Word2Vec
-
-## Using modules from Python
-
-This repo is divided into modules. Each module should be written in a way that it provides utility as a Python library. For example, you can call functions in the `hmm` module like this:
-
-```
-import gitk
-
-gitk.hmm.function()
-```
-
-## Command-line interfaces
-
-In addition to being importable from Python, *some* modules also provide a CLI. For these, developers provide a subcommand for CLI use. The root `gitk` package provides a generalized command-line interface with the command `gitk`. The modules that provide CLIs then correspond to CLI commands, *e.g.* `gitk hmm` or `gitk likelihood`, with the corresponding code contained within a sub-folder named after the model:
-
-```
-gitk <module> ...
-```
-
-This is implemented within each module folder with:
-
-- `gitk/<module>/cli.py` - defines the command-line interface and provides a subparser for this module's CLI command.
+- the rendered [documentation for geniml](https://docs.bedbase.org/geniml/).
+- the [repository with the documentation source](https://github.com/databio/bedbase).
 
+If you have any questions, please open an issue on [this repository](https://github.com/databio/geniml/issues) or on the [bedbase](https://github.com/databio/bedbase/issues) repository.
32 changes: 0 additions & 32 deletions docs/autodoc_build/gitk.md

This file was deleted.

7 changes: 0 additions & 7 deletions docs/changelog.md

This file was deleted.
