Changes from all commits (47 commits)
787bba0
Attempt to use multirepo-mkdocs and mkdocstrings
aversey Nov 19, 2025
c57819b
Add dark theme
aversey Nov 19, 2025
bdb764e
Style fixes
aversey Nov 19, 2025
b921825
Import API nav
aversey Nov 20, 2025
4e233fa
Fix spacing of nested nav items
aversey Nov 20, 2025
bd5b70f
Hide full paths
aversey Nov 20, 2025
666b0c6
Prepare for multirepo builds
aversey Nov 20, 2025
efdbed0
Add markdownlint config and fixes
aversey Nov 20, 2025
675bb2f
Fix
aversey Nov 20, 2025
58386ba
Place a sentence per line
aversey Nov 20, 2025
86ccc4c
Fix CSS
aversey Nov 20, 2025
d4c67d5
Fix macro
aversey Nov 20, 2025
3eb9946
Fix dark theme CSS
aversey Nov 20, 2025
215852b
More fixes
aversey Nov 20, 2025
a8d33ba
Update markdownlint action
aversey Nov 20, 2025
21549cc
Fix tables
aversey Nov 20, 2025
de1052e
Fix release action
aversey Nov 20, 2025
2815f81
Add repo
aversey Nov 20, 2025
03798a5
Merge remote-tracking branch 'main/main' into branch-4.7
aversey Nov 20, 2025
b945952
Fix header z-index
aversey Nov 21, 2025
8c6fb57
Fix mobile
aversey Nov 21, 2025
0c121a4
Fix search overlay
aversey Nov 21, 2025
3f4cf62
Fix sidebar overlay
aversey Nov 21, 2025
dc9b8fb
Fix branch
aversey Nov 21, 2025
3e06c74
Update GitHub links
aversey Nov 28, 2025
269007b
Start fixing {{{hopsworks_version}}} links
aversey Nov 28, 2025
028e4c5
Fix local links
aversey Dec 2, 2025
36ff8c2
Use https
aversey Dec 2, 2025
9fd061e
Fix links
aversey Dec 2, 2025
4d1c6f3
Update deps
aversey Dec 2, 2025
61ff6da
Fix deps
aversey Dec 2, 2025
e9a7b0a
Fix mkdocs.yaml
aversey Dec 2, 2025
48509f3
Copy CONTRIBUTING.md
aversey Dec 5, 2025
2e94f0a
Fix style
aversey Dec 5, 2025
28c08ad
Remove redundant code
aversey Dec 5, 2025
b1ef12c
Fix links
aversey Dec 5, 2025
80c2955
Fix pydantic inv
aversey Dec 5, 2025
05c8310
Fix more links
aversey Dec 5, 2025
21b9375
Fix a link
aversey Dec 5, 2025
7ffb342
Fix missing docs
aversey Dec 5, 2025
4c6062c
Remove redundant, fix spine
aversey Dec 5, 2025
5813978
Fix annotations and polars imports
aversey Dec 5, 2025
1d5d94b
Final change
aversey Dec 5, 2025
67d9ea5
Final fixes
aversey Dec 5, 2025
1cdf78b
Fix z-index back
aversey Dec 5, 2025
e46de46
Pin versions
aversey Dec 8, 2025
81a4ee2
Update deps
aversey Dec 8, 2025
67 changes: 58 additions & 9 deletions .github/workflows/mkdocs-release.yml
@@ -3,6 +3,9 @@ name: mkdocs-release
on:
push:
branches: [branch-*\.*]
repository_dispatch:
types:
- trigger-rebuild

concurrency:
group: ${{ github.workflow }}
@@ -13,25 +16,71 @@ jobs:
runs-on: ubuntu-latest

steps:
- uses: actions/checkout@v4
- name: Extract branch name (push)
if: ${{ github.event_name == 'push' }}
run: echo "BRANCH=${GITHUB_REF#refs/heads/}" >> "$GITHUB_ENV"

- name: Extract branch name (repository_dispatch)
if: ${{ github.event_name == 'repository_dispatch' }}
run: echo "BRANCH=${{ github.event.client_payload.branch }}" >> "$GITHUB_ENV"

- name: Extract version from branch name
run: echo "HOPSWORKS_VERSION=${BRANCH#branch-}" >> "$GITHUB_ENV"

- name: Checkout main repo
uses: actions/checkout@v4
with:
fetch-depth: 0
ref: ${{ env.BRANCH }}

- name: Checkout the API repo
uses: actions/checkout@v4
with:
repository: logicalclocks/hopsworks-api
ref: ${{ env.BRANCH }}
path: hopsworks-api

- name: Cache local Maven repository
uses: actions/cache@v4
with:
path: ~/.m2/repository
key: ${{ runner.os }}-maven-${{ hashFiles('hopsworks-api/java/pom.xml') }}
restore-keys: |
${{ runner.os }}-maven-

- name: Set up JDK 8
uses: actions/setup-java@v5
with:
java-version: "8"
distribution: "adopt"

- name: Build javadoc documentation
working-directory: hopsworks-api/java
run: mvn clean install javadoc:javadoc javadoc:aggregate -DskipTests && cp -r target/site/apidocs ../../docs/javadoc

- uses: actions/setup-python@v5
with:
python-version: "3.10"

- name: Install ubuntu dependencies
run: sudo apt update && sudo apt-get install -y libxml2-dev libxslt-dev
- name: Install uv
uses: astral-sh/setup-uv@v7
with:
activate-environment: true
working-directory: hopsworks-api/python

- name: install deps
run: pip3 install -r requirements-docs.txt
- name: Install Python API dependencies
run: uv sync --extra dev --group docs --project hopsworks-api/python

- name: Install Python dependencies
run: uv pip install -r requirements-docs.txt

- name: Install Ubuntu dependencies
run: sudo apt update && sudo apt-get install -y libxml2-dev libxslt-dev

- name: setup git
- name: Setup git for mike
run: |
git config --global user.name Mike
git config --global user.email [email protected]

# Put this back and increment version when cutting a new release branch
# - name: mike deploy docs
# run: mike deploy 3.0 latest -u --push
- name: Deploy the docs with mike
run: mike deploy ${HOPSWORKS_VERSION} latest -u --push
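With the new `repository_dispatch` trigger, a rebuild for a given release branch can also be requested through the GitHub API. A minimal sketch, assuming `$GITHUB_TOKEN` holds a token permitted to create dispatch events:

```bash
# Sketch: fire the trigger-rebuild event for branch-4.4 via the GitHub API.
curl -X POST \
  -H "Authorization: Bearer $GITHUB_TOKEN" \
  -H "Accept: application/vnd.github+json" \
  https://api.github.com/repos/logicalclocks/logicalclocks.github.io/dispatches \
  -d '{"event_type": "trigger-rebuild", "client_payload": {"branch": "branch-4.4"}}'
```

The workflow reads `client_payload.branch` into `BRANCH`, so the payload must name an existing `branch-*` branch.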
61 changes: 50 additions & 11 deletions .github/workflows/mkdocs-test.yml
@@ -12,25 +12,59 @@ jobs:
with:
fetch-depth: 0

- name: Checkout the API repo
uses: actions/checkout@v4
with:
repository: logicalclocks/hopsworks-api
ref: ${{ github.base_ref }}
path: hopsworks-api

- name: Markdownlint
uses: DavidAnson/markdownlint-cli2-action@v21
with:
globs: '**/*.md'

- name: Cache local Maven repository
uses: actions/cache@v4
with:
path: ~/.m2/repository
key: ${{ runner.os }}-maven-${{ hashFiles('hopsworks-api/java/pom.xml') }}
restore-keys: |
${{ runner.os }}-maven-

- name: Set up JDK 8
uses: actions/setup-java@v5
with:
java-version: "8"
distribution: "adopt"

- name: Build javadoc documentation
working-directory: hopsworks-api/java
run: mvn clean install javadoc:javadoc javadoc:aggregate -DskipTests && cp -r target/site/apidocs ../../docs/javadoc

- uses: actions/setup-python@v5
with:
python-version: "3.10"

- name: Install ubuntu dependencies
run: sudo apt update && sudo apt-get install -y libxml2-dev libxslt-dev
- name: Install uv
uses: astral-sh/setup-uv@v7
with:
activate-environment: true
working-directory: hopsworks-api/python

- name: install deps
run: pip3 install -r requirements-docs.txt
- name: Install Python API dependencies
run: uv sync --extra dev --group docs --project hopsworks-api/python

- name: setup git
run: |
git config --global user.name Mike
git config --global user.email [email protected]
- name: Install Python dependencies
run: uv pip install -r requirements-docs.txt

- name: Install Ubuntu dependencies
run: sudo apt update && sudo apt-get install -y libxml2-dev libxslt-dev

- name: test broken links
- name: Check for broken links
run: |
# run the server
mkdocs serve > /dev/null 2>&1 &
mkdocs serve > /dev/null 2>&1 &
SERVER_PID=$!
echo "mk server in PID $SERVER_PID"
# Give enough time for deployment
@@ -41,5 +75,10 @@
# If ok just kill the server
kill -9 $SERVER_PID

- name: mike deploy docs
- name: Setup git for mike
run: |
git config --global user.name Mike
git config --global user.email [email protected]

- name: Generate the docs with mike
run: mike deploy 3.2-SNAPSHOT dev -u
1 change: 1 addition & 0 deletions .gitignore
@@ -128,3 +128,4 @@ target/
# Mac
.DS_Store

/temp_dir
8 changes: 8 additions & 0 deletions .markdownlint.yaml
@@ -0,0 +1,8 @@
MD041: false
MD013: false
MD033: false
MD045: false
MD046: false
MD052: false
MD004:
style: dash
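To run the same lint locally that the test workflow runs, a minimal sketch (assumes Node.js is available and that `markdownlint-cli2` picks up `.markdownlint.yaml` from the repository root):

```bash
# Lint all markdown files, matching the globs used in mkdocs-test.yml.
npx markdownlint-cli2 "**/*.md"
```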
60 changes: 42 additions & 18 deletions README.md
@@ -1,45 +1,69 @@
# Documentation landing page
# Hopsworks Documentation

This is the source of the landing page for https://docs.hopsworks.ai
This is the source of the Hopsworks Documentation published at <https://docs.hopsworks.ai>.

## Build instructions

### Step 1: Setup python environment
We use `mkdocs` together with [`mike`](https://github.com/jimporter/mike/) for versioning to build the documentation.
We also use these mkdocs plugins: [`mkdocstrings`](https://mkdocstrings.github.io/) with [its Python handler](https://mkdocstrings.github.io/python/), and [`mkdocs-material`](https://squidfunk.github.io/mkdocs-material/) as the theme.

Create a python 3.10 environment, using a python environment manager of your own choosing. For example `virtualenv` or `anaconda`.
**Background about `mike`:**
`mike` builds the documentation and commits it as a new directory to the `gh-pages` branch.
Each directory corresponds to one version of the documentation.
Additionally, `mike` maintains a JSON file in the root of `gh-pages` with the version/alias mappings for each of the available directories.
With aliases, you can define extra names like `dev` or `latest`, to indicate stable and unstable releases.
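A quick sketch of what this looks like in practice (version numbers are illustrative):

```bash
# Deploy two versions, then inspect the version/alias mappings mike records.
mike deploy 4.3 --push
mike deploy 4.4 latest -u --push
mike list
# Expected output, roughly:
#   4.4 [latest]
#   4.3
```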

### Step 2
### Versioning on docs.hopsworks.ai

On docs.hopsworks.ai we implement the following versioning scheme, with the deployed version derived from the release branch name (see the sketch after this list):

- the latest release: rendered with the full current version, e.g. **4.4 [latest]**, with the `latest` alias indicating that this is the latest stable release.
- previous stable releases: rendered without alias, e.g. **4.3**.
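The release workflow derives the deployed version from the release branch name, roughly as follows:

```bash
# Sketch of the derivation used in .github/workflows/mkdocs-release.yml.
BRANCH="branch-4.4"                    # from the push or repository_dispatch event
HOPSWORKS_VERSION="${BRANCH#branch-}"  # strips the "branch-" prefix -> "4.4"
mike deploy "$HOPSWORKS_VERSION" latest -u --push
```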

### Step 1

Clone this repository
Clone this repository:

```bash
git clone https://github.com/logicalclocks/logicalclocks.github.io.git
```

### Step 3

Install the required dependencies to build the documentation in the python environment created in the previous step.
### Step 2

**Note that {PY_ENV} is the path to your python environment.**
Create a python virtual environment to build the documentation:

```bash
cd logicalclocks.github.io
{PY_ENV}/bin/pip3 install -r requirements-docs.txt
uv venv
uv pip install -r requirements-docs.txt
# Install hopsworks-api for gathering docstrings for the API reference
uv pip install git+https://github.com/logicalclocks/hopsworks-api.git@main#subdirectory=python
```

### Step 4
Alternatively, you can just activate the virtual environment you use for development of `hopsworks-api` (obtained via `uv sync`); this is how it is done in the GitHub Actions workflows.
Namely, in `.github/workflows/mkdocs-release.yml` and `.github/workflows/mkdocs-test.yml`, the `hopsworks-api` repo is cloned, and its uv virtual environment is used with the `dev` extra and the `docs` dependency group.
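A minimal sketch of that alternative, mirroring the workflow steps (the clone location is illustrative):

```bash
# Clone the API repo and reuse its uv environment, as the workflows do.
git clone https://github.com/logicalclocks/hopsworks-api.git
uv sync --extra dev --group docs --project hopsworks-api/python
```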

Use mkdocs to build the documentation and serve it locally
A callback is set up in the `hopsworks-api` GitHub Actions, which triggers `.github/workflows/mkdocs-release.yml` on any push to a release branch (that is, `branch-x.x`).

### Step 3

Build and serve the docs using mike.

```bash
{PY_ENV}/bin/mkdocs serve
# Use the current version instead of 4.4:
mike deploy 4.4 latest --update-aliases
# Next, serve the docs to access them locally:
mike serve
```

The documentation should now be available locally on the following URL: http://127.0.0.1:8000/
**Important**: The first time you serve the docs, you have to choose a default version, as follows:

```bash
mike set-default latest
```

## Adding new pages

The `mkdocs.yml` file of this repository defines the pages to show in the navigation.
The `mkdocs.yml` file of this repository defines the pages to show in the navigation.
After adding your new page in the docs folder, you also need to add it to this file for it to show up in the navigation.
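For illustration, a hypothetical nav entry could look like the fragment below (the title and nesting are made up; in practice you edit `mkdocs.yml` directly):

```bash
# Hypothetical mkdocs.yml nav fragment, printed via a heredoc so the
# snippet stays self-contained; title and nesting are illustrative.
cat <<'EOF'
nav:
  - Concepts:
      - Inside Hopsworks: concepts/dev/inside.md
EOF
```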

## Checking links
@@ -56,4 +80,4 @@ linkchecker http://127.0.0.1:8000/

# If ok just kill the server
kill -9 $SERVER_PID
```
```
42 changes: 27 additions & 15 deletions docs/concepts/dev/inside.md
@@ -1,34 +1,46 @@
Hopsworks provides a complete self-service development environment for feature engineering and model training. You can develop programs as Jupyter notebooks or jobs, customize the bundled FTI (feature, training and inference pipeline) python environments, you can manage your source code with Git, and you can orchestrate jobs with Airflow.

<img src="../../../assets/images/concepts/dev/dev-inside.svg">
Hopsworks provides a complete self-service development environment for feature engineering and model training.
You can develop programs as Jupyter notebooks or jobs, customize the bundled FTI (feature, training, and inference pipeline) Python environments, manage your source code with Git, and orchestrate jobs with Airflow.

<img src="../../../assets/images/concepts/dev/dev-inside.svg" alt="Hopsworks Development Environment" />

### Jupyter Notebooks

Hopsworks provides a Jupyter notebook development environment for programs written in Python, Spark, Flink, and SparkSQL. You can also develop in your IDE (PyCharm, IntelliJ, etc), test locally, and then run your programs as Jobs in Hopsworks. Jupyter notebooks can also be run as Jobs.
Hopsworks provides a Jupyter notebook development environment for programs written in Python, Spark, Flink, and SparkSQL.
You can also develop in your IDE (PyCharm, IntelliJ, etc.), test locally, and then run your programs as Jobs in Hopsworks.
Jupyter notebooks can also be run as Jobs.

### Source Code Control

Hopsworks provides source code control support using Git (GitHub, GitLab or BitBucket). You can securely checkout code into your project and commit and push updates to your code to your source code repository.
Hopsworks provides source code control support using Git (GitHub, GitLab or BitBucket).
You can securely check out code into your project and commit and push updates to your code to your source code repository.

### FTI Pipeline Environments

Hopsworks postulates that building ML systems following the FTI pipeline architecture is best practice. This architecture consists of three independently developed and operated ML pipelines:
Hopsworks postulates that building ML systems following the FTI pipeline architecture is best practice.
This architecture consists of three independently developed and operated ML pipelines:

* Feature pipeline: takes as input raw data that it transforms into features (and labels)
* Training pipeline: takes as input features (and labels) and outputs a trained model
* Inference pipeline: takes new feature data and a trained model and makes predictions
- Feature pipeline: takes as input raw data that it transforms into features (and labels)
- Training pipeline: takes as input features (and labels) and outputs a trained model
- Inference pipeline: takes new feature data and a trained model and makes predictions

In order to facilitate the development of these pipelines Hopsworks bundles several python environments containing necessary dependencies. Each of these environments may then also be customized further by cloning it and installing additional dependencies from PyPi, Conda channels, Wheel files, GitHub repos or a custom Dockerfile. Internal compute such as Jobs and Jupyter is run in one of these environments and changes are applied transparently when you install new libraries using our APIs. That is, there is no need to write a Dockerfile, users install libraries directly in one or more of the environments. You can setup custom development and production environments by creating separate projects or creating multiple clones of an environment within the same project.
In order to facilitate the development of these pipelines, Hopsworks bundles several Python environments containing the necessary dependencies.
Each of these environments may then be customized further by cloning it and installing additional dependencies from PyPI, Conda channels, Wheel files, GitHub repos, or a custom Dockerfile.
Internal compute such as Jobs and Jupyter runs in one of these environments, and changes are applied transparently when you install new libraries using our APIs.
That is, there is no need to write a Dockerfile; users install libraries directly in one or more of the environments.
You can set up custom development and production environments by creating separate projects or by creating multiple clones of an environment within the same project.

### Jobs

In Hopsworks, a Job is a schedulable program that is allocated compute and memory resources. You can run a Job in Hopsworks:
In Hopsworks, a Job is a schedulable program that is allocated compute and memory resources.
You can run a Job in Hopsworks:

* From the UI
* Programmatically with the Hopsworks SDK (Python, Java) or REST API
* From Airflow programs (either inside our outside Hopsworks)
* From your IDE using a plugin ([PyCharm/IntelliJ plugin](https://plugins.jetbrains.com/plugin/15537-hopsworks))
- From the UI
- Programmatically with the Hopsworks SDK (Python, Java) or REST API
- From Airflow programs (either inside or outside Hopsworks)
- From your IDE using a plugin ([PyCharm/IntelliJ plugin](https://plugins.jetbrains.com/plugin/15537-hopsworks))

### Orchestration

Airflow comes out-of-the box with Hopsworks, but you can also use an external Airflow cluster (with the Hopsworks Job operator) if you have one. Airflow can be used to schedule the execution of Jobs, individually or as part of Airflow DAGs.
Airflow comes out of the box with Hopsworks, but you can also use an external Airflow cluster (with the Hopsworks Job operator) if you have one.
Airflow can be used to schedule the execution of Jobs, individually or as part of Airflow DAGs.
7 changes: 5 additions & 2 deletions docs/concepts/dev/outside.md
@@ -1,5 +1,8 @@
You can write programs that use Hopsworks in any [Python, Spark, PySpark, or Flink environment](../../user_guides/integrations/index.md). Hopsworks also running SQL queries to compute features in external data warehouses. The Feature Store can also be queried with SQL.
You can write programs that use Hopsworks in any [Python, Spark, PySpark, or Flink environment](../../user_guides/integrations/index.md).
Hopsworks also supports running SQL queries to compute features in external data warehouses.
The Feature Store can also be queried with SQL.

There is REST API for Hopsworks that can be used with a valid API key, generated in Hopsworks. However, it is often easier to develop your programs against SDKs available in Python and Java/Scala for HSFS, in Python for HSML, and in Python for the Hopsworks API.
There is a REST API for Hopsworks that can be used with a valid API key, generated in Hopsworks.
However, it is often easier to develop your programs against the SDKs available in Python and Java/Scala for HSFS, in Python for HSML, and in Python for the Hopsworks API.

<img src="../../../assets/images/concepts/dev/dev-outside.svg">
7 changes: 5 additions & 2 deletions docs/concepts/fs/feature_group/external_fg.md
@@ -1,6 +1,9 @@
External feature groups are offline feature groups where their data is stored in an external table. An external table requires a data source, defined with the Connector API (or more typically in the user interface), to enable HSFS to retrieve data from the external table. An external feature group doesn't allow for offline data ingestion or modification; instead, it includes a user-defined SQL string for retrieving data. You can also perform SQL operations, including projections, aggregations, and so on. The SQL query is executed on-demand when HSFS retrieves data from the external Feature Group, for example, when creating training data using features in the external table.
External feature groups are offline feature groups where their data is stored in an external table.
An external table requires a data source, defined with the Connector API (or more typically in the user interface), to enable HSFS to retrieve data from the external table.
An external feature group doesn't allow for offline data ingestion or modification; instead, it includes a user-defined SQL string for retrieving data.
You can also perform SQL operations, including projections, aggregations, and so on.
The SQL query is executed on-demand when HSFS retrieves data from the external Feature Group, for example, when creating training data using features in the external table.

In the image below, we can see that HSFS currently supports a large number of data sources, including any JDBC-enabled source, Snowflake, Data Lake, Redshift, BigQuery, S3, ADLS, GCS, RDS, and Kafka.

<img src="../../../../assets/images/concepts/fs/fg-connector-api.svg">

5 changes: 2 additions & 3 deletions docs/concepts/fs/feature_group/feature_monitoring.md
@@ -8,13 +8,12 @@ HSFS supports monitoring features on your Feature Group by:

## Scheduled Statistics

After creating a Feature Group in HSFS, you can setup statistics monitoring to compute statistics over one or more features on a scheduled basis. Statistics are computed on the whole or a subset of feature data (i.e., detection window) already inserted into the Feature Group.
After creating a Feature Group in HSFS, you can set up statistics monitoring to compute statistics over one or more features on a scheduled basis.
Statistics are computed on the whole or a subset of feature data (i.e., detection window) already inserted into the Feature Group.

## Statistics Comparison

In addition to scheduled statistics, you can enable the comparison of statistics against a reference subset of feature data (i.e., reference window) and define the criteria for this comparison including the statistics metric to compare and a threshold to identify anomalous values.

!!! info "Feature Monitoring Guide"
More information can be found in the [Feature monitoring guide](../../../user_guides/fs/feature_monitoring/index.md).

