Low-Level Design: AIOD REST API Database Layer Rearchitecture

# Low-Level Design: AIOD REST API Database Layer Rearchitecture

**Document Version:** 1.0
**Status:** Draft for Review
**Prepared By:** Kartik Sapra
**Date:** April 2026

---

## Table of Contents

1. Introduction
2. Architectural Context
3. Class and Object Design
4. Design Patterns
5. Module Interactions and Interfaces
6. Error Handling and Edge Cases
7. Concurrency and Connection Management
8. Database Schema
9. Security Considerations
10. Performance and Scalability
11. Maintainability and Documentation

---

## 1. Introduction

### 1.1 Purpose

This document provides a detailed low-level design for the rearchitecture of the database layer within the AIOD REST API platform. The design is informed by a systematic code review of the existing implementation and targets twelve confirmed deficiencies. It serves as the technical specification for all engineers working on the database layer during the rearchitecture phase.

### 1.2 Problem Statement

The current database layer has grown organically and carries several structural problems that now affect reliability, security, and operational confidence. Migrations cannot be auto-generated because Alembic's metadata reference is unset. Sessions do not roll back on failure. Passwords are embedded in plain-text connection strings. Identifiers are generated without collision protection. Hidden database sessions open inside the serialization layer during HTTP response rendering, causing untracked N+1 query patterns. Hard-deletion runs as a series of independent commits, meaning a mid-run failure leaves the database in a partially cleaned state. These are not theoretical risks; they are observable flaws in the current production path.

### 1.3 Scope

This document covers the following components:

| Component | In Scope |
|---|---|
| [src/database/session.py](file:///c:/Users/ASUS%20TUF/Downloads/AIOD-rest-api-develop/AIOD-rest-api-develop/src/database/session.py) | Yes |
| [src/database/identifiers.py](file:///c:/Users/ASUS%20TUF/Downloads/AIOD-rest-api-develop/AIOD-rest-api-develop/src/database/identifiers.py) | Yes |
| [src/database/model/serializers.py](file:///c:/Users/ASUS%20TUF/Downloads/AIOD-rest-api-develop/AIOD-rest-api-develop/src/database/model/serializers.py) | Yes |
| [src/database/model/helper_functions.py](file:///c:/Users/ASUS%20TUF/Downloads/AIOD-rest-api-develop/AIOD-rest-api-develop/src/database/model/helper_functions.py) | Yes |
| [src/database/model/named_relation.py](file:///c:/Users/ASUS%20TUF/Downloads/AIOD-rest-api-develop/AIOD-rest-api-develop/src/database/model/named_relation.py) | Yes |
| [src/database/deletion/hard_delete.py](file:///c:/Users/ASUS%20TUF/Downloads/AIOD-rest-api-develop/AIOD-rest-api-develop/src/database/deletion/hard_delete.py) | Yes |
| [src/database/review.py](file:///c:/Users/ASUS%20TUF/Downloads/AIOD-rest-api-develop/AIOD-rest-api-develop/src/database/review.py) | Yes |
| [src/database/authorization.py](file:///c:/Users/ASUS%20TUF/Downloads/AIOD-rest-api-develop/AIOD-rest-api-develop/src/database/authorization.py) | Yes |
| [alembic/alembic/env.py](file:///c:/Users/ASUS%20TUF/Downloads/AIOD-rest-api-develop/AIOD-rest-api-develop/alembic/alembic/env.py) and `versions/` | Yes |
| `src/routers/` | Partial (query call-sites only) |
| Connector layer | No |
| Authentication (Keycloak) | No |

### 1.4 Functional Requirements

- Sessions must roll back automatically on any unhandled exception.
- The migration toolchain must detect schema drift between ORM models and the live database.
- Every migration must have a working `downgrade()` path.
- Identifier assignment must not produce unhandled primary key violations.
- Soft-deleted resources must be excluded from relationship responses without opening additional database sessions during serialization.
- Hard-deletion of expired soft-deleted records must execute as a single atomic transaction.
- The `Submission.assets` property must not issue one query per asset.

### 1.5 Non-Functional Requirements

- The database connection URL must never appear in logs with a plaintext password.
- Dialect-specific SQL must be derived from the live engine at runtime, not from an environment variable read at module import time.
- The connection pool must be explicitly configurable through application configuration, with a default `pool_size` of 10, `max_overflow` of 20, and `pool_pre_ping` enabled.
- The approach for determining whether an `AIoDConcept` subclass is abstract must be explicit and not break when a leaf class is subclassed for any reason.

---

## 2. Architectural Context

### 2.1 High-Level Component Overview

The AIOD REST API database layer sits between the HTTP router layer and the MySQL database. It is composed of four logical subsystems: session management, ORM models, the migration chain, and operational utilities (deletion, review, authorization).

```mermaid
graph TD
    A[FastAPI Routers] --> B[Session Management]
    A --> C[Serialization Layer]
    B --> D[(MySQL Database)]
    C --> B
    E[ORM Model Layer] --> D
    F[Alembic Migrations] --> D
    G[Deletion Utilities] --> B
    H[Authorization Module] --> B
    I[Review Module] --> B
    C --> E
    A --> E
```

### 2.2 Current vs. Proposed Architecture

The fundamental model hierarchy and relationship declaration system are sound and will be retained. What changes are the infrastructure wrappers around them: how sessions are created and closed, how identifiers are assigned, how the serializer resolves soft deletes, how the migration toolchain detects schema drift, and how the connection pool is configured.

```mermaid
graph LR
    subgraph Current
        S1[DbSession - no rollback] --> DB1[(MySQL)]
        ID1[random.choices - no retry] --> DB1
        SER1[GetterDict opens new DbSession per item] --> DB1
        HD1[hard_delete commits per concept] --> DB1
        AL1[Alembic - target_metadata = None] -.->|no drift detection| DB1
    end

    subgraph Proposed
        S2[DbSession - explicit rollback] --> DB2[(MySQL)]
        S2a[CommitDbSession - auto commit] --> DB2
        ID2[assign_identifier - secrets + retry] --> DB2
        SER2[with_loader_criteria - query-level filter] --> DB2
        HD2[hard_delete - single transaction] --> DB2
        AL2[Alembic - target_metadata = SQLModel.metadata] -->|drift detection| DB2
    end
```

---

## 3. Class and Object Design

### 3.1 Session Management

The two session types establish a clear contract: `DbSession` is for reads and operations where the caller controls commit timing, and `CommitDbSession` is for write paths that should commit on success and roll back on any exception.

```mermaid
classDiagram
    class EngineSingleton {
        -dict __monostate
        +Engine engine
        +__init__()
        +patch(engine: Engine)
        -_build_engine() Engine
    }

    class DbSession {
        +bool autoflush
        +__enter__() Session
        +__exit__(exc_type, exc_val, exc_tb)
        +rollback() void
        +close() void
    }

    class CommitDbSession {
        +bool autoflush
        +__enter__() Session
        +__exit__(exc_type, exc_val, exc_tb)
        +commit() void
        +rollback() void
        +close() void
    }

    class ConnectionConfig {
        +str drivername
        +str username
        +str password
        +str host
        +int port
        +str database
        +int pool_size
        +int max_overflow
        +int pool_timeout
        +int pool_recycle
        +bool pool_pre_ping
        +build_url() URL
    }

    EngineSingleton --> ConnectionConfig : reads
    DbSession --> EngineSingleton : uses engine
    CommitDbSession --|> DbSession : extends
```

**Key Design Decision:** `CommitDbSession` inherits from `DbSession` and overrides `__exit__` to call `session.commit()` before `session.close()`. Both call `session.rollback()` in the exception path. This makes the difference in behaviour explicit at the call site rather than implicit in the session lifecycle.

### 3.2 Identifier Generation

```mermaid
classDiagram
    class IdentifierGenerator {
        -str prefix
        -str separator
        -int length
        +generate() str
    }

    class IdentifierAssigner {
        +assign_identifier(session, model_class, instance, max_retries: int) void
        -_is_taken(session, model_class, candidate: str) bool
    }

    IdentifierAssigner --> IdentifierGenerator : calls generate()
    IdentifierAssigner --> Session : queries
```

`IdentifierAssigner.assign_identifier()` calls `_is_taken()` before assigning the candidate to the instance. If the candidate is taken it generates a new one, up to `max_retries` attempts. After exhausting retries it raises a typed `IdentifierExhaustionError` rather than propagating a raw database constraint violation.

### 3.3 Serialization Layer

```mermaid
classDiagram
    class Serializer {
        <<abstract>>
        +serialize(model) Any
        +value(model, attribute_name) Any
    }

    class AttributeSerializer {
        -str attribute_name
        +serialize(model) Any
    }

    class GetPathSerializer {
        -str path
        -Serializer inner_serializer
        +serialize(model) Any
        +value(model, attribute_name) Any
    }

    class DeSerializer {
        <<abstract>>
        +deserialize(session, serialized, user) Any
    }

    class FindByNameDeserializer {
        -type clazz
        +deserialize(session, name, user) int
    }

    class FindByIdentifierDeserializerList {
        -type clazz
        +deserialize(session, ids, user) list
    }

    class QueryOptions {
        +exclude_deleted() LoaderCriteria
    }

    Serializer <|-- AttributeSerializer
    Serializer <|-- GetPathSerializer
    DeSerializer <|-- FindByNameDeserializer
    DeSerializer <|-- FindByIdentifierDeserializerList
    QueryOptions ..> AIoDConcept : filters on date_deleted
```

The `QueryOptions` class is a new addition. Its `exclude_deleted()` method returns a SQLAlchemy `with_loader_criteria` object that the router applies once at query time, removing the need for any Python-side soft-delete filtering inside the serializer and eliminating hidden session opens entirely.

### 3.4 Deletion Subsystem

```mermaid
classDiagram
    class HardDeleteService {
        +hard_delete_older_than(time_threshold: timedelta) void
        -_build_filter(concept, threshold: datetime) BinaryExpression
    }

    class DeletionTriggerFactory {
        +create_delete_triggers(parent_class) list~DDL~
        +create_identifier_sync_triggers(dialect: str) list~DDL~
        -create_deletion_trigger_one_to_one(...) DDL
        -create_deletion_trigger_many_to_one(...) DDL
        -create_deletion_trigger_many_to_many(...) DDL
    }

    HardDeleteService --> CommitDbSession : uses single transaction
    DeletionTriggerFactory --> EngineSingleton : reads dialect
```

### 3.5 Authorization

```mermaid
classDiagram
    class User {
        +str subject_identifier
    }

    class Group {
        +int identifier
        +str name
    }

    class Permission {
        +int aiod_entry_identifier
        +str user_identifier
        +int group_identifier
        +PermissionType type_
    }

    class PermissionType {
        <<enum>>
        READ
        WRITE
        ADMIN
        +__lt__(other) bool
    }

    class AIoDEntryORM {
        +int identifier
        +EntryStatus status
        +datetime date_modified
        +datetime date_created
        +list permissions
    }

    AIoDEntryORM "1" --> "*" Permission : has
    Permission "*" --> "0..1" User : granted to
    Permission "*" --> "0..1" Group : granted to
    Group "1" --> "*" User : contains
    PermissionType --> Permission : type of
```

A `CHECK` constraint enforces that exactly one of `user_identifier` or `group_identifier` is non-null on each `Permission` row. This replaces the existing TODO comment with a proper implementation.

---

## 4. Design Patterns

### 4.1 Singleton (Engine Management)

The `EngineSingleton` uses the Borg pattern (shared state dictionary) to ensure one SQLAlchemy engine exists per process. This pattern is retained because it is correct and well implemented. The only change is adding explicit pool configuration parameters sourced from `config.default.toml` rather than relying on SQLAlchemy defaults.

### 4.2 Context Manager (Session Lifecycle)

Python context managers (`__enter__` / `__exit__`) are used to bind session lifecycle to code blocks. This pattern guarantees that rollback and close are called regardless of how the block exits. The existing implementation already uses this pattern correctly for `DbSession`; the change is adding rollback on exception and introducing `CommitDbSession` as a dedicated write-path variant.

### 4.3 Repository (Identifier Assignment)

`IdentifierAssigner` acts as a repository-style service that encapsulates the rules for creating a valid, unique identifier. Callers do not need to know anything about collision handling or retry logic; they call `assign_identifier()` and receive a guarantee that the instance now has a unique primary key or an exception is raised with a clear message.

### 4.4 Strategy (Serializer / Deserializer)

The existing `Serializer` and `DeSerializer` abstract base classes already implement the Strategy pattern. Each relationship field on a model declares its own serializer and deserializer strategy independently. This design is retained as-is.

### 4.5 Decorator (Loader Criteria)

`QueryOptions.exclude_deleted()` wraps SQLAlchemy's `with_loader_criteria` in a domain-specific helper that is applied as a decorator to any query involving `AIoDConcept` relationships. This keeps the soft-delete filtering policy in one place. Any future change to what "deleted" means (for example, adding an `is_archived` state) requires a change in one class.

---

## 5. Module Interactions and Interfaces

### 5.1 Session Lifecycle Sequence

This diagram shows a typical write request, illustrating where commit and rollback now occur.

```mermaid
sequenceDiagram
    participant Router
    participant CommitDbSession
    participant Session
    participant Database

    Router->>CommitDbSession: enter context
    CommitDbSession->>Session: create Session(engine)
    CommitDbSession-->>Router: yields session

    Router->>Session: session.add(resource)
    Router->>Session: session.flush()
    Database-->>Session: flush result

    alt Success
        CommitDbSession->>Session: session.commit()
        Session->>Database: COMMIT
    else Exception raised
        CommitDbSession->>Session: session.rollback()
        Session->>Database: ROLLBACK
        CommitDbSession->>Router: re-raises exception
    end

    CommitDbSession->>Session: session.close()
```

### 5.2 Identifier Assignment Sequence

```mermaid
sequenceDiagram
    participant Router
    participant IdentifierAssigner
    participant IdentifierGenerator
    participant Session
    participant Database

    Router->>IdentifierAssigner: assign_identifier(session, Dataset, instance)
    loop Up to max_retries
        IdentifierAssigner->>IdentifierGenerator: generate()
        IdentifierGenerator-->>IdentifierAssigner: "dset_Xk9mP..."
        IdentifierAssigner->>Session: session.get(Dataset, candidate)
        Session->>Database: SELECT WHERE identifier = ?
        Database-->>Session: None (no collision)
        IdentifierAssigner->>IdentifierAssigner: instance.identifier = candidate
        IdentifierAssigner-->>Router: returns
    end
    alt All retries exhausted
        IdentifierAssigner->>Router: raises IdentifierExhaustionError
    end
```

### 5.3 Soft-Delete Filtering Sequence (Proposed)

This replaces the current approach of opening hidden sessions inside `GetterDict.get()`.

```mermaid
sequenceDiagram
    participant Router
    participant Session
    participant QueryOptions
    participant Database
    participant Serializer

    Router->>QueryOptions: exclude_deleted()
    QueryOptions-->>Router: LoaderCriteria(date_deleted IS NULL)

    Router->>Session: exec(select(Dataset).options(exclude_deleted()))
    Session->>Database: SELECT ... WHERE date_deleted IS NULL (JOIN applied to relationships)
    Database-->>Session: Dataset rows with relationships pre-filtered
    Session-->>Router: list of Dataset instances

    Router->>Serializer: serialize(dataset_instance)
    Serializer-->>Router: response dict (no hidden DB calls)
```

### 5.4 Hard-Delete Sequence (Proposed)

```mermaid
sequenceDiagram
    participant CronJob
    participant HardDeleteService
    participant CommitDbSession
    participant Session
    participant Database

    CronJob->>HardDeleteService: hard_delete_older_than(timedelta(minutes=60))
    HardDeleteService->>CommitDbSession: enter context
    CommitDbSession->>Session: create Session

    loop For each AIoDConcept subclass
        HardDeleteService->>Session: exec(DELETE WHERE date_deleted < threshold)
        Session->>Database: DELETE statement (within transaction)
    end

    CommitDbSession->>Session: session.commit()
    Session->>Database: COMMIT (all deletes committed atomically)

    alt Any exception
        CommitDbSession->>Session: session.rollback()
        Session->>Database: ROLLBACK (no partial state)
    end
```

---

## 6. Error Handling and Edge Cases

### 6.1 Exception Class Hierarchy

A typed exception hierarchy ensures that callers can catch precisely the errors they care about, rather than catching broad `Exception` or relying on database constraint violations bubbling up as raw SQLAlchemy errors.

```mermaid
classDiagram
    class AIoDDatabaseError {
        <<base>>
        +str message
    }

    class IdentifierExhaustionError {
        +str model_class_name
        +int max_retries
    }

    class DialectResolutionError {
        +str expected
        +str actual
    }

    class SessionDetachedError {
        +str operation
    }

    AIoDDatabaseError <|-- IdentifierExhaustionError
    AIoDDatabaseError <|-- DialectResolutionError
    AIoDDatabaseError <|-- SessionDetachedError
```

### 6.2 Edge Case Handling

| Scenario | Current Behaviour | Proposed Behaviour |
|---|---|---|
| Primary key collision on insert | Unhandled `IntegrityError`, 500 response | `IdentifierAssigner` retries up to 5 times, then raises `IdentifierExhaustionError`, which maps to HTTP 503 |
| Session not committed, exception raised | Session closes without rollback, state may be ambiguous | `DbSession.__exit__` calls `rollback()` before `close()` unconditionally on exception |
| `Submission.assets` called on detached instance | Returns `None` silently via `Session.object_session()` | `SessionDetachedError` raised with the operation name |
| Hard-delete crashes mid-loop | Some concept types deleted, others not | Full `ROLLBACK` via `CommitDbSession`, no partial state |
| Wrong dialect env var on startup | Wrong `CHECK` constraint SQL generated silently | `DialectResolutionError` raised at engine initialization, server fails fast |

---

## 7. Concurrency and Connection Management

### 7.1 Connection Pool Configuration

The current `EngineSingleton` sets only `pool_recycle`. The proposed configuration exposes all relevant pool parameters through `config.default.toml`, with the following recommended defaults:

| Parameter | Default | Rationale |
|---|---|---|
| `pool_size` | 10 | Enough for moderate concurrent API load without exhausting MySQL's connection limit |
| `max_overflow` | 20 | Allows burst capacity up to 30 total connections before `PoolTimeout` |
| `pool_timeout` | 30 | Callers receive a Python exception after 30 seconds rather than hanging indefinitely |
| `pool_recycle` | 3600 | Existing value; prevents stale connections on long-lived deployments |
| `pool_pre_ping` | True | Sends `SELECT 1` before reuse to detect connections dropped by the MySQL server |

### 7.2 Session Thread Safety

Each request in FastAPI receives its own session instance via `get_session()`. Sessions are never shared across threads. The `EngineSingleton` is thread-safe because `create_engine` is called only once (guarded by the Borg shared-state pattern) and the resulting engine's connection pool handles concurrent access internally.

### 7.3 Alembic Migration Concurrency

Migrations are run as a single-threaded process against the live database before the API server starts. The migration chain uses `revision` and `down_revision` to enforce a linear order. Running two migration processes simultaneously would produce undefined behaviour; this is prevented at the infrastructure level by the Docker Compose startup sequence.

---

## 8. Database Schema

### 8.1 Core Entity Relationship Diagram

```mermaid
erDiagram
    aiod_entry {
        int identifier PK
        enum status
        datetime date_modified
        datetime date_created
    }

    aiod_concept {
        varchar identifier PK
        str platform
        str platform_resource_identifier
        datetime date_deleted
        int aiod_entry_identifier FK
    }

    ai_resource {
        varchar identifier PK
        varchar aiod_concept_id FK
    }

    ai_asset {
        varchar identifier PK
        varchar ai_resource_id FK
    }

    dataset {
        varchar identifier PK
        varchar ai_asset_id FK
    }

    user {
        varchar subject_identifier PK
    }

    group {
        int identifier PK
        varchar name
    }

    permission {
        int identifier PK
        int aiod_entry_identifier FK
        varchar user_identifier FK
        int group_identifier FK
        enum type_
    }

    submission {
        int identifier PK
        datetime request_date
        varchar requestee_identifier FK
    }

    review {
        int identifier PK
        enum decision
        datetime decision_date
        int submission_identifier FK
        varchar reviewer_identifier FK
    }

    platform {
        varchar name PK
    }

    aiod_concept ||--|| aiod_entry : "has"
    aiod_concept }o--|| platform : "sourced_from"
    ai_resource ||--|| aiod_concept : "is_a"
    ai_asset ||--|| ai_resource : "is_a"
    dataset ||--|| ai_asset : "is_a"
    aiod_entry ||--o{ permission : "governed_by"
    permission }o--o| user : "granted_to"
    permission }o--o| group : "granted_to"
    group }o--o{ user : "contains"
    submission }o--|| user : "requested_by"
    review }o--|| submission : "responds_to"
    review }o--|| user : "made_by"
```

### 8.2 Identifier Design

Every `AIoDConcept` subclass uses a string primary key of the form `{prefix}_{24 random alphanumeric characters}`. The prefix is at most four characters and is declared as `__abbreviation__` on each model class (for example, `dset` for Dataset, `mlmd` for MLModel). This makes identifiers human-readable in logs and API responses without requiring a join to resolve the resource type.

The current `identifier` columns use integer primary keys in intermediate tables (`AIResourceORM`, `AIAssetTable`, etc.) and are synchronized with the `AIoDConcept` string identifier via SQL `AFTER INSERT` triggers. This synchronization mechanism is retained as-is.

### 8.3 Constraint Inventory

| Table | Constraint | Type | Purpose |
|---|---|---|---|
| `aiod_concept` | `platform XOR platform_resource_identifier` | CHECK | Platform and identifier must both be set or both be null |
| `aiod_concept` | `platform = LOWER(platform)` | CHECK | Platform names are always lowercase |
| `aiod_concept` | `(platform, platform_resource_identifier, coalesce(date_deleted,'2000-01-01'))` | UNIQUE INDEX | Allows the same external resource to be re-imported after soft deletion |
| `named_relation` | `name = LOWER(name)` | CHECK | Enum-style names are always lowercase |
| `permission` | `user_identifier XOR group_identifier NOT NULL` | CHECK (proposed) | A permission targets exactly one of a user or a group |

### 8.4 Index Design

Beyond the constraints above, the following indexes support the most common query patterns:

| Table | Index Columns | Query Pattern |
|---|---|---|
| `aiod_concept` | `platform`, `platform_resource_identifier` | Connector upsert lookups |
| `aiod_concept` | `date_deleted` | Hard-delete batch queries |
| `named_relation` | `name` | Taxonomy term resolution |
| `aiod_entry` | `status` | Draft / published listing filters |

---

## 9. Security Considerations

### 9.1 Database Credential Handling

The connection URL is constructed using `sqlalchemy.engine.URL.create()` instead of Python string interpolation. This object's `__repr__` and `__str__` methods mask the password, ensuring it does not appear in log output, tracebacks, or debug tooling even when `echo=True` is enabled during development.

Credentials themselves are sourced from `config.default.toml`, which is excluded from version control via `.gitignore`. In production, the configuration values are injected through environment variables or a secrets manager mounted at the container level.

### 9.2 Permission Enforcement

The `PermissionType` enum defines a total ordering (`READ < WRITE < ADMIN`) enforced by `__lt__`. Authorization checks always specify a minimum required level using `at_least=PermissionType.WRITE` and so on, rather than equality. This means that an `ADMIN` user always satisfies a `WRITE` check without special-casing.

The proposed `Group` expansion adds no new attack surface because the `CHECK` constraint at the database level prevents any permission row from simultaneously referencing a user and a group.

### 9.3 Taxonomy Input Validation

Non-connector users are prevented from submitting arbitrary taxonomy terms by `FindByNameDeserializer`, which checks the `official` flag on `Taxonomy` rows. This check happens in the deserialization layer before any data reaches the ORM, so it applies uniformly across all endpoints that accept taxonomy fields.

Platform resource identifiers are validated against per-platform regex patterns (Zenodo, HuggingFace, OpenML) before storage. This prevents malformed external identifiers from entering the platform and simplifies connector reconciliation.

---

## 10. Performance and Scalability

### 10.1 Eliminating Hidden N+1 Queries

The most significant performance improvement in this rearchitecture is the removal of hidden database sessions from the serialization layer. The current `is_soft_deleted()` function inside `create_getter_dict` opens a new session for every related item it checks. On a list endpoint returning 50 datasets, each with 10 relationships, this generates up to 500 hidden database roundtrips per response.

The proposed `exclude_deleted()` loader criteria moves this filter to the query level. The database engine applies it in a single join during the initial `SELECT`, and the serialization layer receives already-filtered data without touching the database again.

### 10.2 Batched Asset Loading in Review

`Submission.assets` currently issues one `session.get()` per asset identifier. The proposed implementation groups identifiers by asset type prefix and issues one `SELECT ... WHERE identifier IN (...)` per type. For a typical submission involving five datasets and two ML models, this reduces seven queries to two.

### 10.3 Caching of Computed Mappings

`get_asset_type_by_abbreviation()` is decorated with `@functools.cache` in the existing code and is correctly retained. The mapping from abbreviation prefix to model class is computed once at startup and reused. The proposed identifier assigner uses this same cache when resolving the target model class for collision detection.

### 10.4 Scalability Boundary

The database layer itself is stateless at the application level: no in-memory query results are shared across requests. Horizontal scaling of the API tier requires only that all instances share the same MySQL database. The connection pool on each instance (10 base + 20 overflow = 30 max connections) must be factored into MySQL's `max_connections` setting when planning capacity for multiple replicas.

---

## 11. Maintainability and Documentation

### 11.1 Alembic Autogenerate

Enabling `target_metadata = SQLModel.metadata` in `alembic/env.py` allows engineers to run `alembic revision --autogenerate -m "description"` after any model change. Alembic compares the ORM metadata against the live schema and generates a migration stub that includes the detected differences. Engineers review and adjust the stub before committing. This replaces the current manual process where a developer must remember to write a migration and risk silent schema drift if they forget.

All ORM table-backed modules must be imported before `target_metadata` is referenced in `env.py` so that SQLAlchemy can register their table definitions. A module-level import block at the top of `env.py` will enumerate these explicitly.

### 11.2 Downgrade Discipline

A CI check will fail any pull request that adds a migration with a `pass`-only `downgrade()`. For migrations where a true structural downgrade is impractical (large data migrations, type changes with data loss), the `downgrade()` body must contain a comment explaining why a backup restore is required, and the migration must be tagged with a `# no-downgrade-required` marker that the CI check accepts as a valid excuse.

### 11.3 Abstract Resource Flag

The `__abstract_resource__` class variable replaces the structural heuristic in `non_abstract_subclasses()`. Each model class explicitly declares whether it is concrete or abstract. This is documented in the developer README as a requirement for any new model added to the hierarchy.

```python
# Correct declaration for a concrete model
class Dataset(AIoDAIConcept):
    __abstract_resource__ = False
    __abbreviation__ = "dset"

# Correct declaration for an intermediate abstract model
class AIAsset(AIResource):
    __abstract_resource__ = True
```

### 11.4 Coding Standards for This Layer

| Rule | Rationale |
|---|---|
| All write paths use `CommitDbSession` | Makes transaction boundary visible at the call site |
| All read paths use `DbSession` | Prevents accidental commits in read operations |
| Identifier assignment always via `IdentifierAssigner` | Centralizes collision handling |
| Soft-delete filtering always via `exclude_deleted()` | Prevents filtering logic from spreading into serializers or routers |
| New migrations must include a working `downgrade()` | Operational rollback capability |
| `__abstract_resource__` must be set on every new model class | Prevents silent exclusion from deletion and trigger logic |


Table	Index Columns	Query Pattern
`aiod_concept`	`platform`, `platform_resource_identifier`	Connector upsert lookups
`aiod_concept`	`date_deleted`	Hard-delete batch queries
`named_relation`	`name`	Taxonomy term resolution
`aiod_entry`	`status`	Draft / published listing filters

Component	In Scope
src/database/session.py	Yes
src/database/identifiers.py	Yes
src/database/model/serializers.py	Yes
src/database/model/helper_functions.py	Yes
src/database/model/named_relation.py	Yes
src/database/deletion/hard_delete.py	Yes
src/database/review.py	Yes
src/database/authorization.py	Yes
alembic/alembic/env.py and `versions/`	Yes
`src/routers/`	Partial (query call-sites only)
Connector layer	No
Authentication (Keycloak)	No

Scenario	Current Behaviour	Proposed Behaviour
Primary key collision on insert	Unhandled `IntegrityError`, 500 response	`IdentifierAssigner` retries up to 5 times, then raises `IdentifierExhaustionError`, which maps to HTTP 503
Session not committed, exception raised	Session closes without rollback, state may be ambiguous	`DbSession.__exit__` calls `rollback()` before `close()` unconditionally on exception
`Submission.assets` called on detached instance	Returns `None` silently via `Session.object_session()`	`SessionDetachedError` raised with the operation name
Hard-delete crashes mid-loop	Some concept types deleted, others not	Full `ROLLBACK` via `CommitDbSession`, no partial state
Wrong dialect env var on startup	Wrong `CHECK` constraint SQL generated silently	`DialectResolutionError` raised at engine initialization, server fails fast

Parameter	Default	Rationale
`pool_size`	10	Enough for moderate concurrent API load without exhausting MySQL's connection limit
`max_overflow`	20	Allows burst capacity up to 30 total connections before `PoolTimeout`
`pool_timeout`	30	Callers receive a Python exception after 30 seconds rather than hanging indefinitely
`pool_recycle`	3600	Existing value; prevents stale connections on long-lived deployments
`pool_pre_ping`	True	Sends `SELECT 1` before reuse to detect connections dropped by the MySQL server

Table	Constraint	Type	Purpose
`aiod_concept`	`platform XOR platform_resource_identifier`	CHECK	Platform and identifier must both be set or both be null
`aiod_concept`	`platform = LOWER(platform)`	CHECK	Platform names are always lowercase
`aiod_concept`	`(platform, platform_resource_identifier, coalesce(date_deleted,'2000-01-01'))`	UNIQUE INDEX	Allows the same external resource to be re-imported after soft deletion
`named_relation`	`name = LOWER(name)`	CHECK	Enum-style names are always lowercase
`permission`	`user_identifier XOR group_identifier NOT NULL`	CHECK (proposed)	A permission targets exactly one of a user or a group

Rule	Rationale
All write paths use `CommitDbSession`	Makes transaction boundary visible at the call site
All read paths use `DbSession`	Prevents accidental commits in read operations
Identifier assignment always via `IdentifierAssigner`	Centralizes collision handling
Soft-delete filtering always via `exclude_deleted()`	Prevents filtering logic from spreading into serializers or routers
New migrations must include a working `downgrade()`	Operational rollback capability
`__abstract_resource__` must be set on every new model class	Prevents silent exclusion from deletion and trigger logic

Uh oh!

Low-Level Design: AIOD REST API Database Layer Rearchitecture #742

Description

Low-Level Design: AIOD REST API Database Layer Rearchitecture

Table of Contents

1. Introduction

1.1 Purpose

1.2 Problem Statement

1.3 Scope

1.4 Functional Requirements

1.5 Non-Functional Requirements

2. Architectural Context

2.1 High-Level Component Overview

2.2 Current vs. Proposed Architecture

3. Class and Object Design

3.1 Session Management

3.2 Identifier Generation

3.3 Serialization Layer

3.4 Deletion Subsystem

3.5 Authorization

4. Design Patterns

4.1 Singleton (Engine Management)

4.2 Context Manager (Session Lifecycle)

4.3 Repository (Identifier Assignment)

4.4 Strategy (Serializer / Deserializer)

4.5 Decorator (Loader Criteria)

5. Module Interactions and Interfaces

5.1 Session Lifecycle Sequence

5.2 Identifier Assignment Sequence

5.3 Soft-Delete Filtering Sequence (Proposed)

5.4 Hard-Delete Sequence (Proposed)

6. Error Handling and Edge Cases

6.1 Exception Class Hierarchy

6.2 Edge Case Handling

7. Concurrency and Connection Management

7.1 Connection Pool Configuration

7.2 Session Thread Safety

7.3 Alembic Migration Concurrency

8. Database Schema

8.1 Core Entity Relationship Diagram

8.2 Identifier Design

8.3 Constraint Inventory

8.4 Index Design

9. Security Considerations

9.1 Database Credential Handling

9.2 Permission Enforcement

9.3 Taxonomy Input Validation

10. Performance and Scalability

10.1 Eliminating Hidden N+1 Queries

10.2 Batched Asset Loading in Review

10.3 Caching of Computed Mappings

10.4 Scalability Boundary

11. Maintainability and Documentation

11.1 Alembic Autogenerate

11.2 Downgrade Discipline

11.3 Abstract Resource Flag

11.4 Coding Standards for This Layer

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions