This repository was archived by the owner on Nov 16, 2025. It is now read-only.
Open
192 commits
13d5b17
change folder name
ixaxaar Sep 13, 2023
e0e646f
add requirements and classifiers
ixaxaar Sep 13, 2023
93a43b7
fix package name
ixaxaar Sep 13, 2023
ce5d960
update packages
ixaxaar Sep 13, 2023
2eedb41
bump version
ixaxaar Sep 13, 2023
ead8547
remove state and deploy from the docstrings
ixaxaar Sep 17, 2023
8b9cf55
add data formatting lambda
ixaxaar Sep 17, 2023
fa486b3
add lambda data mapper
ixaxaar Sep 22, 2023
958cff1
add proper docstrings
ixaxaar Sep 22, 2023
88b3709
upgrade geniusrise
ixaxaar Sep 22, 2023
579ba84
add data extractor lambda
ixaxaar Sep 22, 2023
3e35e12
readme
ixaxaar Sep 22, 2023
fa9672d
Merge pull request #1 from geniusrise/feat/autoconfig
ixaxaar Sep 22, 2023
603da6c
ci job
ixaxaar Sep 22, 2023
de328c4
add apache2 license
ixaxaar Sep 23, 2023
75e3228
bump version
ixaxaar Sep 23, 2023
a6bd170
add embeddings api
ixaxaar Oct 20, 2023
71b047a
add embeddings bulk
ixaxaar Oct 20, 2023
e8e14b5
add embeddings bulk
ixaxaar Oct 20, 2023
7c6db51
add embeddings bulk
ixaxaar Oct 20, 2023
c94e5f7
add base and optimizations
ixaxaar Oct 20, 2023
97653d1
format
ixaxaar Oct 20, 2023
50bdcc7
refactor
ixaxaar Oct 21, 2023
9b9eb2f
fine tune tests
ixaxaar Oct 21, 2023
0cb01ed
fine tune tests with lora
ixaxaar Oct 21, 2023
808afb2
fine tune tests with lora 4bit
ixaxaar Oct 22, 2023
bd05d58
fine tune tests with lora 4bit
ixaxaar Oct 22, 2023
ba32444
fine tune tests with lora 4bit gptq
ixaxaar Oct 22, 2023
295d42d
fine tune tests with lora 4bit gptq
ixaxaar Oct 22, 2023
25823bb
fine tune tests with lora 8bit
ixaxaar Oct 22, 2023
94dfa48
load models for apis
ixaxaar Oct 23, 2023
9912afa
all tests for apis
ixaxaar Oct 23, 2023
988c540
add embeddings tests
ixaxaar Oct 25, 2023
0235994
add embeddings bulk tests
ixaxaar Oct 25, 2023
ff0abe6
add classification fine tuning tests
ixaxaar Oct 26, 2023
ee5e027
refactor eval name
ixaxaar Oct 26, 2023
6cece8f
add tests for classification fine tuning
ixaxaar Oct 26, 2023
a7d9593
add bulk base class
ixaxaar Oct 26, 2023
820da76
add docs
ixaxaar Oct 26, 2023
ad2670c
add bulk classification
ixaxaar Oct 27, 2023
c510066
add api for classification
ixaxaar Oct 27, 2023
8510dc9
refactor variable names
ixaxaar Oct 27, 2023
c65625b
add models for clasification
ixaxaar Oct 28, 2023
6880fe1
fix broken tests
ixaxaar Oct 28, 2023
2ef6414
fix tests
ixaxaar Oct 29, 2023
5ba973f
tests for language model
ixaxaar Oct 29, 2023
c50edc5
tests for language model
ixaxaar Oct 29, 2023
5db4290
tests for language model
ixaxaar Oct 29, 2023
01198d5
fix type
ixaxaar Oct 29, 2023
6a7ca3d
add api and bulk
ixaxaar Oct 29, 2023
50c1248
rename huggingface to text
ixaxaar Oct 29, 2023
aa0a0ed
rename huggingface to text
ixaxaar Oct 29, 2023
977df30
rename huggingface to text
ixaxaar Oct 29, 2023
f06d0ec
fix small issues
ixaxaar Oct 30, 2023
28b07b4
add instruct tuning and tests
ixaxaar Oct 30, 2023
ec910f0
add docs
ixaxaar Oct 30, 2023
e5c4beb
add bulk instruction
ixaxaar Oct 30, 2023
7e74ff9
add instruction api
ixaxaar Oct 30, 2023
bd78855
change map lambda name
ixaxaar Oct 30, 2023
0ddff48
change map lambda name
ixaxaar Oct 30, 2023
f86ac6c
add fine tune for nli
ixaxaar Oct 31, 2023
82db362
add fine tune for nli
ixaxaar Oct 31, 2023
77aadb3
add fine tune for ner
ixaxaar Oct 31, 2023
fece3e5
add fine tune for ner
ixaxaar Oct 31, 2023
43262b8
add fine tune for translation
ixaxaar Oct 31, 2023
2315c15
add fine tune for translation
ixaxaar Oct 31, 2023
163a5c8
add summarization fine tuning
ixaxaar Nov 1, 2023
7658b6a
add qa fine tuning
ixaxaar Nov 1, 2023
276da24
geenrate and fix bulk tasks
ixaxaar Nov 2, 2023
3a0f438
add more models to test later
ixaxaar Nov 9, 2023
fefab12
fix imports
ixaxaar Nov 19, 2023
46bcae4
argument parsing to support python objects
ixaxaar Nov 19, 2023
2b8462f
fixes in api
ixaxaar Nov 20, 2023
70fd479
fixes in api
ixaxaar Nov 20, 2023
7018be2
fix all issues
ixaxaar Nov 20, 2023
f85cd88
Merge branch 'master' of github.com:geniusrise/geniusrise-text
ixaxaar Nov 20, 2023
a71a831
lm api tests
ixaxaar Nov 21, 2023
c313109
do qa for bulk chat
ixaxaar Nov 21, 2023
a9db381
qa class apis
ixaxaar Nov 22, 2023
b3ddd02
qa lm bulk
ixaxaar Nov 22, 2023
1d90699
qa for api for nli
ixaxaar Nov 23, 2023
bd024ba
qa for bulk for txtclass
ixaxaar Nov 24, 2023
cc101eb
api for qa and table qa
ixaxaar Nov 24, 2023
97d1ce4
bulk for qa and table qa
ixaxaar Nov 25, 2023
67eb273
api qa for summarization
ixaxaar Nov 25, 2023
d264aaf
api and qa for translation
ixaxaar Nov 25, 2023
dce84a6
api for ner
ixaxaar Nov 26, 2023
b08e88c
cleanup
ixaxaar Nov 26, 2023
c9dd3c9
bulk for summz
ixaxaar Nov 26, 2023
a410cc2
pass max length
ixaxaar Nov 26, 2023
a6e9192
bulk for translation
ixaxaar Nov 26, 2023
c746527
add imports
ixaxaar Nov 26, 2023
1b8c686
add imports
ixaxaar Nov 26, 2023
2d951e6
bulk for nli
ixaxaar Nov 26, 2023
ac26f76
api done
ixaxaar Nov 26, 2023
31f0c89
bulk done for ner
ixaxaar Nov 26, 2023
b0f2ece
bulk done for translation
ixaxaar Nov 26, 2023
78b2070
done all
ixaxaar Nov 26, 2023
98c3f02
add bulk testing dummy data
ixaxaar Nov 26, 2023
91b9f1c
refactor names
ixaxaar Nov 27, 2023
eacf205
override cherrypy's default error responses
ixaxaar Dec 5, 2023
66025fe
make basic auth creds work
ixaxaar Dec 5, 2023
eca2582
package
ixaxaar Dec 7, 2023
aeef19d
package
ixaxaar Dec 7, 2023
acbacd2
add support for AWQ and upgrade pytorch
ixaxaar Dec 26, 2023
de11ed4
add support for flash attention
ixaxaar Dec 26, 2023
147e5ed
generate docs for trans
ixaxaar Dec 26, 2023
5dc0147
generate docs for summz
ixaxaar Dec 26, 2023
87371c4
generate docs for qa
ixaxaar Dec 26, 2023
7400458
generate docs for nli
ixaxaar Dec 27, 2023
c06742a
generate docs for ner
ixaxaar Dec 27, 2023
4d15dfd
generate docs for lm
ixaxaar Dec 27, 2023
7b3e390
generate docs for chat
ixaxaar Dec 27, 2023
a8a1c25
generate docs for txtclass
ixaxaar Dec 27, 2023
d76c956
generate docs for base classes
ixaxaar Dec 27, 2023
1915fae
make concurrent requests serial
ixaxaar Dec 28, 2023
80c6ff2
add capability to send emails after bulk tasks
ixaxaar Dec 29, 2023
33fbc3f
switch to legacy resolver
ixaxaar Dec 29, 2023
fc13277
add pipeline apis
ixaxaar Jan 12, 2024
ca11b16
add hf hub data
ixaxaar Jan 12, 2024
c174c9e
add more args and email at end of fine tuning
ixaxaar Jan 12, 2024
8453fed
expose all underlying sampling methods
ixaxaar Jan 13, 2024
4ac5651
expose all underlying sampling methods
ixaxaar Jan 13, 2024
2ca2d00
update peft layers
ixaxaar Jan 13, 2024
b38a46b
add torch jit compile
ixaxaar Jan 13, 2024
dab6def
fix classes fine tuning
ixaxaar Jan 15, 2024
ca1e09a
fix cors domain bug
ixaxaar Jan 16, 2024
772263d
end to end fine tune
ixaxaar Jan 16, 2024
01ace51
Merge branch 'master' of github.com:geniusrise/geniusrise-text
ixaxaar Jan 16, 2024
c16e976
add lr
ixaxaar Jan 16, 2024
6b751b4
isort
ixaxaar Jan 16, 2024
feb3a0a
fix fine tune for nli
ixaxaar Jan 17, 2024
12c5d0a
Merge branch 'master' of github.com:geniusrise/geniusrise-text
ixaxaar Jan 17, 2024
51a165b
fix fine tune for nli
ixaxaar Jan 17, 2024
bf7abbf
minor fixes
ixaxaar Jan 17, 2024
1ec319f
add docs
ixaxaar Jan 18, 2024
4c85fd2
fix map_data and lora config
ixaxaar Jan 18, 2024
84cf9d4
add notebook bolt
ixaxaar Jan 18, 2024
d4571fd
fixes
ixaxaar Jan 18, 2024
ae03d6c
fixes
ixaxaar Jan 19, 2024
8796b5d
fixes
ixaxaar Jan 19, 2024
6ff1d86
Merge pull request #1 from geniusrise/feat/notebook
ixaxaar Jan 19, 2024
dc8b136
install nbformat
ixaxaar Jan 19, 2024
2d5e842
add notebooks
ixaxaar Jan 19, 2024
6aecddd
clean dependencies
ixaxaar Jan 21, 2024
1e11493
fix ner fine tuning
ixaxaar Jan 22, 2024
2bf1a61
add tutorial-style to notebooks
ixaxaar Jan 24, 2024
945a177
fix ner tags
ixaxaar Jan 27, 2024
1920ae5
bulk speech to text working
ixaxaar Jan 29, 2024
f47aec6
Merge pull request #2 from geniusrise/feat/clean_deps
ixaxaar Jan 29, 2024
a899446
fix requirements and add dev reqs
ixaxaar Jan 29, 2024
80db784
fix deps
ixaxaar Jan 29, 2024
dcf016e
fix notebooks plugin installation
ixaxaar Jan 30, 2024
f4e49c0
add audio
ixaxaar Jan 30, 2024
f2569ac
remove audio from reqs
ixaxaar Feb 5, 2024
a78ff00
Update README.md
ixaxaar Feb 6, 2024
33cf0fc
add openapi docs
ixaxaar Feb 8, 2024
a168b68
Merge branch 'master' of github.com:geniusrise/geniusrise-text
ixaxaar Feb 9, 2024
be53d25
update docs
ixaxaar Feb 19, 2024
60120d3
update loading of local models
ixaxaar Feb 19, 2024
d377a89
use apache 2.0 license
ixaxaar Feb 19, 2024
05feef8
fix dependencies
ixaxaar Feb 19, 2024
d9996f8
bump version
ixaxaar Feb 19, 2024
859e651
better transformers
ixaxaar Feb 19, 2024
82c4e4b
fix pypi license
ixaxaar Feb 19, 2024
576264d
fix pypi license fuuu and add prometheus
ixaxaar Feb 19, 2024
02a18bd
fix psycopg2 compilation requirement and default compilation
ixaxaar Feb 19, 2024
bafc772
bump version
ixaxaar Feb 19, 2024
47ec257
upgrade geniusrise and bump version
ixaxaar Feb 20, 2024
9fa4893
upgrade geniusrise and bump version
ixaxaar Feb 20, 2024
3db92a6
add vllm to chat
ixaxaar Feb 21, 2024
daf6ae4
fix single thread inference
ixaxaar Feb 21, 2024
e3f6a7e
vllm api working
ixaxaar Feb 22, 2024
24761b3
vllm bulk working
ixaxaar Feb 22, 2024
7f79718
vllm bulk working
ixaxaar Feb 22, 2024
2d9fd00
add docs
ixaxaar Feb 22, 2024
8cdcee3
vllm lm api
ixaxaar Feb 23, 2024
1540162
add vllm to lm
ixaxaar Feb 24, 2024
9ca6db8
Merge pull request #3 from geniusrise/feat/vllm
ixaxaar Feb 24, 2024
9570470
integrate llama.cpp
ixaxaar Feb 24, 2024
661740e
fix nli and translation issues with certain models
ixaxaar Feb 26, 2024
75426f0
test chat api for llama
ixaxaar Feb 26, 2024
f4599a4
test lm api for llama
ixaxaar Feb 26, 2024
571fb15
test chat bulk for llama
ixaxaar Feb 26, 2024
54eeb2b
test lm bulk for llama
ixaxaar Feb 26, 2024
505c56a
Merge pull request #4 from geniusrise/feat/llama.cpp
ixaxaar Feb 26, 2024
f92f627
update docs
ixaxaar Feb 26, 2024
12c7479
fix deps
ixaxaar Feb 27, 2024
b8257b3
fix numpy compiled version mismatch with flash attn
ixaxaar Feb 28, 2024
6cd6db2
adds support for mem issue and cleans Dockerfile
jalotra Feb 28, 2024
f67949c
moving back, due to flash-attn
jalotra Feb 28, 2024
73f6e5a
moving requirements back
jalotra Feb 28, 2024
4 changes: 4 additions & 0 deletions .dockerignore
@@ -0,0 +1,4 @@
__pycache__
venv
package
build
56 changes: 56 additions & 0 deletions .github/workflows/pytest.yml
@@ -0,0 +1,56 @@
name: Run Pytest

on:
  pull_request:
    branches:
      - master

jobs:
  test:
    runs-on: ubuntu-latest
    env:
      ENV: dev
      LOGLEVEL: DEBUG
      WANDB_DISABLED: true
      AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
      AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
      OPENAI_ORGANIZATION: ${{ secrets.OPENAI_ORGANIZATION }}
      OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
      OPENAI_API_TYPE: open_ai
      OPENAI_API_BASE_URL: https://api.openai.com/v1
      OPENAI_API_VERSION: ${{ secrets.OPENAI_API_VERSION }}
      HUGGINGFACE_ACCESS_TOKEN: ${{ secrets.HUGGINGFACE_ACCESS_TOKEN }}
      PALM_KEY: ${{ secrets.PALM_KEY }}

    steps:
      - name: Checkout code
        uses: actions/checkout@v2

      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: "3.x"

      - name: Install Docker Compose
        run: sudo apt-get install docker-compose

      - name: Start services
        run: docker-compose up -d

      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install coverage
          pip install pytest

      - name: Run tests with coverage
        run: |
          coverage run -m pytest -vv --log-cli-level=ERROR ./tests/

      - name: Generate coverage report
        run: coverage report

      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v2
        env:
          CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }}
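The job's `env:` block fills several variables from repository secrets. GitHub does not pass repository secrets to `pull_request` runs that originate from forks, so those variables can arrive empty. A minimal sketch of a pre-test check follows; the helper name `missing_env` and the idea of running it before the test suite are assumptions for illustration, not part of this PR.

```python
from __future__ import annotations

import os
from typing import Iterable, Mapping

# Names copied from the workflow's `env:` block that are filled from secrets.
SECRET_BACKED_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "OPENAI_ORGANIZATION",
    "OPENAI_API_KEY",
    "OPENAI_API_VERSION",
    "HUGGINGFACE_ACCESS_TOKEN",
    "PALM_KEY",
)


def missing_env(names: Iterable[str], environ: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names that are unset or empty in `environ`, in input order.

    Hypothetical helper: the repository's tests are not shown to do this.
    """
    return [name for name in names if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_env(SECRET_BACKED_VARS)
    if missing:
        print("unset or empty:", ", ".join(missing))
```

A test session could call `missing_env` once at start-up and skip the tests that need the absent credentials instead of failing them.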
13 changes: 13 additions & 0 deletions .pre-commit-config.yaml
@@ -5,6 +5,19 @@ repos:
      - id: black
        args: [--line-length=120]

  - repo: https://github.com/Lucas-C/pre-commit-hooks
    rev: v1.1.12
    hooks:
      - id: insert-license
        name: "Insert license header in Python, Makefile, and Dockerfile sources"
        args:
          [
            --license-filepath=assets/header.txt,
            "--comment-style=#",
            --detect-license-in-X-top-lines=2,
          ]
        types_or: [python, makefile, dockerfile]

  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.1.0
    hooks:
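The `insert-license` hook above prepends the text of `assets/header.txt`, commented with `#`, to matching files. The sketch below shows the general idea only. It is not the hook's implementation, and the detection rule (look for the first header line within the top N lines) is a simplifying assumption about what `--detect-license-in-X-top-lines` does.

```python
from __future__ import annotations


def with_license_header(source: str, header: str, comment: str = "#", top_lines: int = 2) -> str:
    """Return `source` with a commented `header` prepended, unless one is already near the top.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    # Prefix every header line with the comment marker; blank lines become a bare marker.
    commented = [f"{comment} {line}".rstrip() for line in header.splitlines()]
    if not commented:
        return source
    # Simplified detection: the first header line appears within the top `top_lines` lines.
    if commented[0] in source.splitlines()[:top_lines]:
        return source
    return "\n".join(commented) + "\n\n" + source
```

Running the function twice on the same text returns the same result, which is the property a pre-commit hook needs so that repeated runs do not stack headers.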
14 changes: 14 additions & 0 deletions .pypirc
@@ -0,0 +1,14 @@
[distutils]
index-servers =
    pypi
    local

[pypi]
username = __token__
password = pypi-AgEIcHlwaS5vcmcCJDBjMjQ0MWMyLWZlNjYtNGJkYS1iMTQyLTUwYTVhODM1NTkyZAACKlszLCIwOGU4ZGFjYS1jZTJlLTQzNGYtYTFkMi03ZGRlNDBmZmJmZTgiXQAABiDllKewzbF_OAnOrY1yuMdEG6yTLvrIVJrma5SNz0cgRA


[local]
repository: https://pypi.setu.co/
username: infra
password: c2RramN2Ymd3ZHljdmFzcXEK
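The file above mixes `key = value` and `key: value` lines. Python's `configparser` accepts both delimiters by default, so the layout can be read as shown in this sketch. Every value in `SAMPLE_PYPIRC` is a placeholder chosen for illustration, and the helper name `index_servers` is an assumption, not something from this repository.

```python
from __future__ import annotations

import configparser

# Placeholder values only; nothing here is a real credential or index URL.
SAMPLE_PYPIRC = """\
[distutils]
index-servers =
    pypi
    local

[pypi]
username = __token__
password = <token-from-secret-store>

[local]
repository: https://pypi.example.invalid/
username: <user>
password: <password-from-secret-store>
"""


def index_servers(text: str) -> dict[str, dict[str, str]]:
    """Map each server named under [distutils] index-servers to its settings."""
    # Interpolation is disabled so a literal '%' in a value is not treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    # The multi-line value is split on whitespace to recover the server names.
    names = parser.get("distutils", "index-servers").split()
    return {name: dict(parser[name]) for name in names}
```

Calling `index_servers(SAMPLE_PYPIRC)` returns one entry per listed server, in the order given under `index-servers`.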
38 changes: 38 additions & 0 deletions Dockerfile
@@ -0,0 +1,38 @@
FROM nvidia/cuda:12.2.0-devel-ubuntu20.04 AS base

WORKDIR /app

ENV DEBIAN_FRONTEND=noninteractive
RUN useradd --create-home genius

RUN apt-get update \
    && apt-get install -y software-properties-common build-essential curl wget vim git libpq-dev pkg-config \
    && add-apt-repository ppa:deadsnakes/ppa \
    && apt-get update \
    && apt-get install -y python3.10 python3.10-dev python3.10-distutils \
    && apt-get clean
RUN curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py \
    && python3.10 get-pip.py

RUN apt-get update && apt-get install -y git && apt-get clean

RUN pip install --no-cache-dir torch
RUN pip install --no-cache-dir jupyterlab
RUN pip install --no-cache-dir transformers
RUN pip install --no-cache-dir datasets
RUN pip install --no-cache-dir diffusers
RUN pip install --no-cache-dir --upgrade geniusrise

ENV AWS_DEFAULT_REGION=ap-south-1
ENV AWS_SECRET_ACCESS_KEY=
ENV AWS_ACCESS_KEY_ID=
ENV HUGGINGFACE_ACCESS_TOKEN=
ENV GENIUS=/home/genius/.local/bin/genius

COPY --chown=genius:genius . /app/

RUN pip3.10 install --no-cache-dir --use-deprecated=legacy-resolver -r requirements.txt
RUN pip install --no-cache-dir numpy==1.26.3
USER genius

CMD ["genius", "--help"]
51 changes: 51 additions & 0 deletions LICENSE
@@ -0,0 +1,51 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/

TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

1. Definitions.

"License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document.

"Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.

"Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.

"You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License.

"Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.

"Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.

"Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below).

"Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof.

"Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution."

"Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work.

2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.

3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.

4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:

You must give any other recipients of the Work or Derivative Works a copy of this License; and
You must cause any modified files to carry prominent notices stating that You changed the files; and
You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and
If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License.
You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License.

5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions.

6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file.

7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License.

8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.

9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.

END OF TERMS AND CONDITIONS
130 changes: 52 additions & 78 deletions README.md
@@ -1,78 +1,52 @@
![banner](./assets/banner.jpg)

<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->

# Huggingface Bolts

This is a collection of generic streaming and (micro) batch bolts interfacing
with the huggingface ecosystem.

**Table of Contents**

- [Huggingface Bolts](#huggingface-bolts)
- [Usage](#usage)
- [Usage](#usage-1)
- [Text Classification](#text-classification)

<!-- END doctoc generated TOC please keep comment here to allow auto update -->

Includes:

| No. | Name | Description | Input Type | Output Type |
| --- | ----------------------------------------------------- | ---------------------------------------------- | ---------- | ----------- |
| 1 | [Text Classification](#text-classification) | Fine-tuning for text classification tasks | Batch | Batch |
| 2 | [Instruction Tuning](#instruction-tuning) | Fine-tuning for instruction tuning tasks | Batch | Batch |
| 3 | [Commonsense Reasoning](#commonsense-reasoning) | Fine-tuning for commonsense reasoning tasks | Batch | Batch |
| 4 | [Language Modeling](#language-modeling) | Fine-tuning for language modeling tasks | Batch | Batch |
| 5 | [Named Entity Recognition](#named-entity-recognition) | Fine-tuning for named entity recognition tasks | Batch | Batch |
| 6 | [Question Answering](#question-answering) | Fine-tuning for question answering tasks | Batch | Batch |
| 7 | [Sentiment Analysis](#sentiment-analysis) | Fine-tuning for sentiment analysis tasks | Batch | Batch |
| 8 | [Summarization](#summarization) | Fine-tuning for summarization tasks | Batch | Batch |
| 9 | [Translation](#translation) | Fine-tuning for translation tasks | Batch | Batch |

## Usage

To test, first bring up all related services via the supplied docker-compose:

```bash
docker compose up -d
docker compose logs -f
```

These management consoles will be available:

| Console | Link |
| -------- | ---------------------- |
| Kafka UI | http://localhost:8088/ |

Postgres can be accessed with:

```bash
docker exec -it geniusrise-postgres-1 psql -U postgres
```

## Usage

### Text Classification

To fine-tune a model for text classification tasks, you can use the following
command:

```bash
genius HuggingFaceClassificationFineTuner rise \
batch \
--input_folder my_dataset \
streaming \
--output_kafka_topic my_topic \
--output_kafka_cluster_connection_string localhost:9094 \
postgres \
--postgres_host 127.0.0.1 \
--postgres_port 5432 \
--postgres_user postgres \
--postgres_password postgres \
--postgres_database geniusrise \
--postgres_table state \
load_dataset \
--args
```
![logo_with_text](https://github.com/geniusrise/.github/assets/144122/2f8e51ee-0fcd-4f74-90fd-97301ef7943d)

### AI Ecosystem

<h3 align="center">
<a style="color:#f34960" href="https://docs.geniusrise.ai">Documentation</a>
||
<a style="color:#f34960" href="https://github.com/geniusrise/examples">Examples</a>
||
<a style="color:#f34960" href="https://geniusrise.com">Cloud</a>
</h3>

### <span style="color:#e667aa">About</span>

<span style="color:#e4e48c">Geniusrise</span> is a modular, loosely-coupled
MLOps framework designed for the era of Large Language Models,
offering flexibility and standardization in designing networks of
AI agents.

It defines components and orchestrates them, providing observability, state management, and data handling,
all while supporting diverse infrastructures. With its modular and unopinionated architecture,
<span style="color:#e4e48c">Geniusrise</span> empowers teams to build, share,
and deploy AI across various platforms.

Geniusrise is powered by its components:

- [geniusrise-text](https://github.com/geniusrise/geniusrise-text): Text components offering:
  - Inference APIs
  - Bulk inference
  - Fine-tuning
- [geniusrise-audio](https://github.com/geniusrise/geniusrise-audio): Audio components offering:
  - Inference APIs
  - Bulk inference
  - Fine-tuning
- [geniusrise-vision](https://github.com/geniusrise/geniusrise-vision): Vision components offering:
  - Inference APIs
  - Bulk inference
  - Fine-tuning
- [geniusrise-listeners](https://github.com/geniusrise/geniusrise-listeners): Streaming data ingestion
- [geniusrise-databases](https://github.com/geniusrise/geniusrise-databases): Bulk data ingestion

### <span style="color:#e667aa">Links</span>

- **Website**: [geniusrise.ai](https://geniusrise.ai)
- **Docs**: [docs.geniusrise.ai](https://docs.geniusrise.ai)
- **Examples**: [geniusrise/examples](https://github.com/geniusrise/examples)
- **Cloud**: [geniusrise.com](https://geniusrise.com)

# Text Components

These components target models whose input and output are both text,
including large language models.
14 changes: 14 additions & 0 deletions assets/header.txt
@@ -0,0 +1,14 @@
🧠 Geniusrise
Copyright (C) 2023 geniusrise.ai

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Binary file added assets/logo_with_text.png
3 changes: 3 additions & 0 deletions data/chat/chat.csv
@@ -0,0 +1,3 @@
instruction
<|system|>\n</s>\n<|user|>\nHow do I sort a list in Haskell?</s>\n<|assistant|>
<|system|>\n</s>\n<|user|>\nHow do I sort a list in Python?</s>\n<|assistant|>
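Each row above is a single-line prompt whose separators are the two characters backslash and `n`, not real newlines. The template resembles a Zephyr-style chat format. A minimal sketch of generating such rows follows; the helper names `chat_prompt` and `to_csv` are assumptions, and whether the consuming bolt unescapes `\n` is not shown in this diff.

```python
from __future__ import annotations

import csv
import io


def chat_prompt(user_message: str, system_message: str = "") -> str:
    """Build one prompt in the shape of the rows in data/chat/chat.csv.

    Hypothetical helper; the separator is a literal backslash followed by 'n'.
    """
    sep = "\\n"
    return (
        f"<|system|>{sep}{system_message}</s>{sep}"
        f"<|user|>{sep}{user_message}</s>{sep}<|assistant|>"
    )


def to_csv(prompts: list[str]) -> str:
    """Serialize prompts as a one-column CSV with an `instruction` header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["instruction"])
    # The csv module quotes a field only if it contains a comma, quote, or newline.
    writer.writerows([prompt] for prompt in prompts)
    return buffer.getvalue()
```

Using the `csv` writer instead of string concatenation keeps the file valid if a later prompt contains a comma or a double quote.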
2 changes: 2 additions & 0 deletions data/lm/lm.csv
@@ -0,0 +1,2 @@
text
<|system|>\n</s>\n<|user|>\nHow do I sort a list in Python?</s>\n<|assistant|>
4 changes: 4 additions & 0 deletions data/ner/ner.csv
@@ -0,0 +1,4 @@
text
"My name is Clara and I live in Berkeley, California."
"My name is Clara and I live in Berkeley, California and have severe back pain and liver cirrhosis."
"My name is Clara and I live in Berkeley, California and I deal with sulfuric acid."
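The rows above are quoted because each sentence contains a comma. A short sketch of reading such a one-column file with the standard `csv` module follows; the helper name `read_texts` and the inline sample are assumptions for illustration, and how the NER bulk bolt actually loads its input is not shown in this diff.

```python
from __future__ import annotations

import csv
import io

# Two rows in the same shape as data/ner/ner.csv, inlined so the sketch is self-contained.
SAMPLE = '''text
"My name is Clara and I live in Berkeley, California."
"My name is Clara and I live in Berkeley, California and I deal with sulfuric acid."
'''


def read_texts(csv_text: str, column: str = "text") -> list[str]:
    """Return the values of `column`, with quoted commas kept inside each value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[column] for row in reader]
```

Splitting each line on commas would break every row in two; `csv.DictReader` honours the quotes and returns each sentence whole.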