@keys-i keys-i commented Nov 5, 2025

Hey COMP3710 Teaching Team,

My name is Radhesh Goel (s49088276), enrolled in COMP3710.

This submission is a working implementation of a generative time-series model based on TimeGAN that synthesises limit order book (LOB) event sequences from the LOBSTER AMZN Level-10 dataset. It addresses Task 11 [Hard Difficulty] by training a model to produce realistic LOB sequences and evaluating them on a held-out test split.

Evaluation targets

  • Distribution similarity: target KL ≤ 0.1 for both spread and mid-price return distributions.
  • Visual similarity: target SSIM > 0.6 between heatmaps of generated and real Level-10 depth snapshots.
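The histogram-based KL check implied by the first target can be sketched as follows. This is a minimal illustrative version, not the project's exact API; the function name, bin count, and smoothing constant are assumptions:

```python
import numpy as np

def kl_divergence_hist(real, fake, bins=50, eps=1e-8):
    """Estimate KL(real || fake) from smoothed histograms over a shared range."""
    lo, hi = min(real.min(), fake.min()), max(real.max(), fake.max())
    p, _ = np.histogram(real, bins=bins, range=(lo, hi))
    q, _ = np.histogram(fake, bins=bins, range=(lo, hi))
    p = (p + eps) / (p + eps).sum()  # additive smoothing avoids log(0)
    q = (q + eps) / (q + eps).sum()
    return float(np.sum(p * np.log(p / q)))

# Two samples from the same distribution should land well under the 0.1 target.
rng = np.random.default_rng(0)
kl = kl_divergence_hist(rng.normal(1.0, 0.1, 10_000), rng.normal(1.0, 0.1, 10_000))
```

Applied to spread and mid-price return series, values at or below 0.1 indicate distributional similarity under this estimator.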

The report includes the model architecture and parameter count, the training strategy (full TimeGAN and ablations for adversarial-only and supervised-only losses), GPU type, VRAM, epochs, and total training time. It also presents 3–5 representative heatmaps comparing real and synthetic order books, with a brief error analysis highlighting where the synthetic LOBs perform well and where they fall short.

For reproducibility, I have provided a pinned environment.yml, a concise references file, and the exact scripts used for preprocessing, training, evaluation, and visualisation. These materials allow the full pipeline to be rebuilt and the reported metrics and figures to be replicated.

Thank you for your time and consideration.

Kind regards,
Radhesh Goel (s49088276)

keys-i added 30 commits October 2, 2025 13:18
Create empty modules, configs, and test shells; no implementations yet.
Clarify module purpose, responsibilities, and public API; add usage example and references. No functional changes.
Add environment.yml pinned to python=3.13.* (conda-forge, strict priority) with numpy>=2,<3, pandas>=2.2, scipy>=1.13, scikit-learn>=1.5, matplotlib>=3.9, jupyterlab, ipykernel. Refactor code into src/ (add __init__.py), update script imports to use the package, and rename any lib-shadowing files (e.g., matplotlib.py).
Adds CLI smoke test, core/raw10 features, chronological split, train-only scaling, and windowing.
Add --headerless-message/--headerless-orderbook flags, robust header normalization, train-only scaling, NaN/inf filtering, dtype control, meta accessors, and optional NPZ export. Includes improved errors and windowing checks.
Introduce summarize() and --summary/--peek to inspect message/orderbook tables. Keep headerless support with robust normalization; chronological splits; train-only scaling; NaN/inf cleaning; dtype control; NPZ export; inverse_transform; and metadata accessors.
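The chronological split with train-only scaling referenced in these commits can be sketched as below. The function name and split fractions are illustrative assumptions, not the repository's exact interface:

```python
import numpy as np

def chronological_split_scale(X, train_frac=0.7, val_frac=0.15):
    """Split time-ordered rows without shuffling; fit a MinMax scaler on train only."""
    n = len(X)
    i1, i2 = int(n * train_frac), int(n * (train_frac + val_frac))
    train, val, test = X[:i1], X[i1:i2], X[i2:]
    lo, hi = train.min(axis=0), train.max(axis=0)  # statistics from train split only
    scale = np.where(hi > lo, hi - lo, 1.0)        # guard constant columns
    f = lambda a: (a - lo) / scale
    return f(train), f(val), f(test)

X = np.arange(100, dtype=float).reshape(50, 2)
tr, va, te = chronological_split_scale(X)
```

Fitting the scaler on the training split alone avoids leaking future statistics into validation and test, which is why test values may fall outside [0, 1].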
…loats, split summaries)

- Add --pretty flag to print tidy console tables for train/val/test splits
- Show split shapes (num_seq × seq_len × num_features) and a small head/tail sample
- Right-align numeric columns; thousands separators; configurable precision
- CLI knobs: --head N --tail N --width 120 --precision 4
- Add quick feature stats (min/p25/median/mean/p75/max, std) for the selected feature set
- Purely display-layer changes; no impact on saved arrays or training
Add --style chat|box and --no-color; render directory, CSV summaries, preprocessing report, and sample window as message-like bubbles with aligned key–value tables. Keep headerless support, time-sort, decimation, quantile clipping, chronological splits, and train-only scaling unchanged.
Add --verbose and --meta-json; report memory footprint, time coverage, scaler parameters, clip bounds preview, and windowing math. Keep chat/box styles, headerless support, time-sort, decimation, quantile clipping, chronological splits, and train-only scaling.
Integrate tabulate for head/tail/describe and 2-col KV sections. Preserve table lines inside bubbles/boxes (no wrapping) and auto-fit inner width to widest table row. Retains headerless support, time sort, decimation, quantile clipping, chronological splits, train-only scaling, and verbose diagnostics.
Add ANSI color themes, chat/box message panels, and --table-style (github|grid|simple). Preserve tabulate tables inside panels without wrapping and auto-fit widths. Keep headerless support, time sort, decimation, quantile clipping, chronological splits, train-only scaling, verbose diagnostics, and dataset summary report.
Train a generative time series model on LOBSTER AMZN Level 10 data to
produce realistic limit order book sequences. Targets: KL divergence ≤0.1
for spread and midprice returns, and SSIM >0.6 for depth heatmaps. The
report records architecture and parameter count, training variants
(full, adversarial only, supervised only), GPU and VRAM, epochs, and
total training time. Includes 3–5 paired heatmaps with a short error
analysis.
Break out I/O, feature engineering, scaling, and windowing into dataset_helpers/ (io.py, features.py, scaling.py, windows.py). Keep public Dataset/loader logic in dataset.py and re-export via __init__.py for backward compatibility (from dataset import LOBSTERDataset still works). Updated imports, added basic tests/placeholders, and kept defaults/paths unchanged.
… preprocessing

Add header auto-detect (no flags needed), enforce canonical column order, and coerce dtypes.
Render one big panel per CSV with subpanels (shape/dtypes/describe/head/tail) via textui.
Expand preprocessing for GANs: advanced scalers (robust/quantile/power), optional PCA/ZCA whitening,
train-only window augmentations (jitter/scaling/time-warp), engineered features (rel_spread, microprice,
L5 imbalance, rolling stats, diffs/pct), chronological split with train-only scaling, and NPZ+meta saving.
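A hedged sketch of three of the engineered features named above (relative spread, microprice, L5 imbalance). Input shapes and the exact column layout in the repository may differ; this only shows the standard formulas:

```python
import numpy as np

def lob_features(ask_p, ask_v, bid_p, bid_v):
    """ask_p/bid_p: best quotes [T]; ask_v/bid_v: top-5 level volumes [T, 5]."""
    mid = (ask_p + bid_p) / 2.0
    rel_spread = (ask_p - bid_p) / mid  # spread normalised by mid-price
    # Microprice: best quotes weighted by opposite-side level-1 volume.
    microprice = (ask_p * bid_v[:, 0] + bid_p * ask_v[:, 0]) / (ask_v[:, 0] + bid_v[:, 0])
    # L5 imbalance: signed volume imbalance over the top five levels.
    imb5 = (bid_v.sum(axis=1) - ask_v.sum(axis=1)) / (bid_v.sum(axis=1) + ask_v.sum(axis=1))
    return mid, rel_spread, microprice, imb5

mid, rs, mp, imb = lob_features(
    np.array([101.0]), np.ones((1, 5)), np.array([99.0]), np.ones((1, 5))
)
```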
…tor/Supervisor/Discriminator)

Implements minimal TimeGAN in PyTorch:
- GRU/LSTM-based Embedder/Recovery, Generator, Supervisor, Discriminator
- Canonical losses: recon, supervised, GAN (gen/disc), moment + latent feature matching
- Utilities: noise sampling, weight init, optim factory
- Pretrain steps (AE, SUP) and joint training helpers
Supports windows.npz or on-the-fly preprocessing via LOBSTERData.
Includes 3-phase schedule (AE -> SUP -> Joint), AMP toggle, grad clipping,
basic checkpoints, and moment-loss validation.
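One of the canonical losses listed above, moment matching, can be illustrated with a simplified per-feature mean/std version; the repository's exact formulation and weighting may differ:

```python
import numpy as np

def moment_loss(real, fake):
    """L1 distance between per-feature mean and std of real/fake batches [N, T, F]."""
    mu_r, mu_f = real.mean(axis=(0, 1)), fake.mean(axis=(0, 1))
    sd_r, sd_f = real.std(axis=(0, 1)), fake.std(axis=(0, 1))
    return float(np.abs(mu_r - mu_f).mean() + np.abs(sd_r - sd_f).mean())

real = np.zeros((2, 3, 4))
same = moment_loss(real, real)        # identical batches -> zero loss
shifted = moment_loss(real, real + 1) # mean shifted by 1, std unchanged
```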
…eatmaps + stats)

Loads windows from NPZ or CSV via LOBSTERData, restores trained checkpoint, samples synthetic sequences,
prints per-feature mean/std and quick KL, and saves feature-line plots + depth heatmaps to --outdir.
Streamlined dataset.py by folding helpers inline and removing unused CLI/docs. Normalization now uses a continuous MinMax scaler across windows for stable ranges; I/O paths and outputs simplified without extra flags.
Rewrote monolithic functions into a Dataset class with clear init/load/transform methods. Improves readability, reuse, and testability with no external behavior changes.
…ix batch_generator

Introduce DataOptions wrapper with flags (--seq_len, --data_dir, --orderbook_filename, --no_shuffle, --keep_zero_rows, --splits, --log_level). Support ORDERBOOK_DEFAULT/SPLITS_DEFAULT fallbacks; accept proportions or cumulative cutoffs; replace prints with logging; add CLI entrypoint. Fix batch_generator index sampling and time=None handling; return constant T_mb; return windowed splits from load_data.
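The `batch_generator` behaviour described above (index sampling plus a constant `T_mb`) might look roughly like this; the names mirror the commit message, but the body is an illustrative reconstruction, not the actual code:

```python
import numpy as np

def batch_generator(data, batch_size, rng=None):
    """Sample a batch of windows [B, T, F] and per-sequence lengths (constant T)."""
    rng = rng or np.random.default_rng()
    idx = rng.choice(len(data), size=batch_size, replace=False)  # no duplicate windows
    X_mb = data[idx]
    T_mb = np.full(batch_size, data.shape[1], dtype=int)  # fixed-length windows
    return X_mb, T_mb

data = np.zeros((100, 24, 40))
X_mb, T_mb = batch_generator(data, 8, np.random.default_rng(0))
```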
Introduce Options that forwards args after --dataset to DataOptions via argparse.REMAINDER. Attaches parsed DatasetOptions namespace at opts.dataset. Includes seed/run-name flags and supports programmatic argv. Minor polish: import REMAINDER and types, handle None -> [] for ds_argv.
… KL histogram

Introduce utilities for TimeGAN-LOB: extract_seq_lengths, sample_noise (supports RNG + optional mean/std via uniform with matched σ), minmax_scale/minmax_inverse over [N,T,F], and KL(real||fake) via histograms for 'spread' and 'mpr' with smoothing + optional plot. Adds strong shape/type guards, finite-range handling, and safe midprice log-returns.
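The `minmax_scale`/`minmax_inverse` pair over [N, T, F] arrays can be sketched as below; a minimal version under the assumption of per-feature scaling, which the generation path would use to map samples back to the original feature space:

```python
import numpy as np

def minmax_scale(x):
    """Scale [N, T, F] per feature to [0, 1]; return params for exact inversion."""
    lo = x.min(axis=(0, 1))
    hi = x.max(axis=(0, 1))
    rng = np.where(hi > lo, hi - lo, 1.0)  # guard constant features
    return (x - lo) / rng, (lo, rng)

def minmax_inverse(y, params):
    """Undo minmax_scale, restoring the original feature ranges."""
    lo, rng = params
    return y * rng + lo

x = np.random.default_rng(1).normal(size=(4, 10, 3))
y, params = minmax_scale(x)
x_back = minmax_inverse(y, params)
```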
…er/Recovery/Generator/Supervisor/Discriminator)

Implements GRU-based components with Xavier/orthogonal init, device/seed helpers, and typed handles. Sets BCEWithLogits-ready Discriminator and sigmoid-gated projections elsewhere. Preps for optional TemporalBackbone injection via config.
…nd generation API

Adds full wrapper (optimizers, ER pretrain, supervised, joint phases), checkpoint save/load, quick KL(spread) validation, and deterministic helpers. Integrates dataset batcher and utils (minmax, noise). Exposes encoder/recovery/generator/supervisor/discriminator and device/seed utilities.
…quick KL validation

Adds ER pretrain, supervised, and joint loops; Adam optimizers; save/load helpers; device/seed utils; and a generation API that inverse-scales to original feature space. Includes GRU-based Encoder/Recovery/Generator/Supervisor/Discriminator with Xavier/orthogonal init and BCEWithLogits-ready Discriminator.
Parses Options, loads datasets via load_data, constructs TimeGAN, and executes the full three-phase schedule with checkpoints. Keeps modules/dataset imports minimal to match current package layout.
Parses Options, loads data, restores TimeGAN from checkpoint, generates exactly len(test) rows, and saves to OUTPUT_DIR/gen_data.npy. Keeps API aligned with current dataset/modules helpers.
… model hyperparams

Adds DataOptions (seq-len, data-dir, orderbook-filename, splits, no-shuffle, keep-zero-rows) and ModulesOptions (batch-size, seq-len, z-dim, hidden-dim, num-layer, lr, beta1, w-gamma, w-g). Top-level Options forwards args via argparse.REMAINDER and returns opts.dataset / opts.modules namespaces for downstream loaders and trainers.
keys-i added 20 commits October 21, 2025 15:19
…ase training summary

Incorporates the five-component list (Encoder, Recovery, Generator, Supervisor, Discriminator) and a concise three-phase training in the project report. Based on prior HackMD draft refined before this commit.
…dependencies table

Introduces a linked ToC for quick navigation, expands project structure with brief per-file roles, and adds a version-pinned dependencies table with one-line use cases tailored to the TimeGAN LOB workflow.
…dependencies table

Introduces a linked ToC for quick navigation, expands project structure with brief per-file roles, and adds a version-pinned dependencies table with one-line use cases tailored to the TimeGAN LOB workflow.
…n placeholder

Introduces detailed LOBSTER AMZN L10 dataset description and chronological split strategy (train/val/test). Notes that references will be added in a forthcoming update.
Previously added a StyleGAN2/ADNI BibTeX by mistake. Replace with the TimeGAN for LOBSTER (AMZN L10) entry and update the project URL.
…ecture text

Embed modern HTML figure for the architecture PNG and rewrite component/flow sections for clarity and consistency. Remove training-specific notes from architecture and tighten wording.
Describe three-phase TimeGAN schedule (ER pretrain, Supervisor pretrain, Joint), loss design (MSE, BCE-with-logits, moment matching), metrics (KL on spread/returns, SSIM on heatmaps), and hardware/runtime setup (macOS M3 Pro, MLS/Metal).
Add kl_divergence_hist utility calls and a compact Rich table to display KL(spread) and KL(mpr) alongside SSIM when rendering real vs synthetic depth heatmaps.
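For context, the SSIM reported alongside KL can be approximated with a single-window (global) variant of the standard formula; production implementations typically use a sliding window, so this is only a sketch with the usual k1/k2 constants:

```python
import numpy as np

def ssim_global(a, b, L=1.0, k1=0.01, k2=0.03):
    """Global SSIM over two images in [0, L] (no sliding window)."""
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    va, vb = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a**2 + mu_b**2 + c1) * (va + vb + c2)
    )

img = np.linspace(0.0, 1.0, 64).reshape(8, 8)
identical = ssim_global(img, img)     # identical heatmaps -> 1.0
inverted = ssim_global(img, 1 - img)  # anti-correlated heatmaps -> well below 1
```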
…nor issues

Add clear module and function docstrings for CLI entrypoints; correct small typos, unsafe attribute access, and inconsistent phrasing. No functional changes.
Refine wording and formatting across the report; integrate quantitative tables (SSIM, KL(spread), KL(mpr), TempCorr, LatDist), heatmap figures, and latent-walk panels; add concise error analysis and style space discussion.
Condense narrative, remove redundancy, and clarify metrics and plots. Sync CLI docs with current flags (viz and latent-walk), standardize figure captions, and correct minor grammar.
Add clear module and function docstrings for CLI entrypoints; correct small typos, unsafe attribute access, and inconsistent phrasing. No functional changes.
Add clear module and function docstrings for CLI entrypoints; correct small typos, unsafe attribute access, and inconsistent phrasing. No functional changes.
Add clear module and function docstrings for CLI entrypoints; correct small typos, unsafe attribute access, and inconsistent phrasing. No functional changes.
Use robust Conda bootstrap in batch (no conda init; create/update env only if missing), set PROJECT_ROOT/PYTHONPATH, and add log files. Sync flags and entrypoints with current code: --num-iters, correct AMZN L10 filename, python -m src.viz.visualise. Add metrics CSV output, explicit viz out-dir, and latent-walk flags. Enable set -euo pipefail and improve status prints.
Switch reader to markdown+tex_math_dollars+raw_tex, set resource-path for images, and document working pandoc command (tectonic engine). Resolve gfm raw_tex incompatibility and ensure inline LaTeX and photos render correctly.
Converted loss table to an HTML <table> with MathJax-friendly \( … \) inline math and proper escaping (e.g., \gamma, \mathcal{L}). Ensures equations render correctly in GitHub/Docs. Updated notes column and headings accordingly.
Converted loss table to an HTML <table> with MathJax-friendly \( … \) inline math and proper escaping (e.g., \gamma, \mathcal{L}). Ensures equations render correctly in GitHub/Docs. Updated notes column and headings accordingly.
Reformatted the Model Architecture/Components block for GitHub: switched to MathJax-friendly inline math (\(…\)), cleaned up headings and lists, removed en/em dashes, and adjusted table/HTML so formulas and text render correctly.
Reformatted the Model Architecture/Components block for GitHub: switched to MathJax-friendly inline math (\(…\)), cleaned up headings and lists, removed en/em dashes, and adjusted table/HTML so formulas and text render correctly.
keys-i force-pushed the topic-recognition branch 2 times, most recently from 8cd2b76 to 1291868 on November 10, 2025, 00:46

keys-i commented Nov 11, 2025

Please disregard the README uploaded to Turnitin. That version was incomplete and has since been replaced by the current, complete submission.

@yexincheng (Collaborator)

This is an initial inspection; no action is required at this point.

Recognition Problem: total 20

  1. Solves problem: The solution is appropriate for the problem, reaching an SSIM of over 0.78 (Five trials were run; the presented results are from Trial 5). (5)
  2. Implementation functions: Good (3)
  3. Good design: Well-designed (1)
  4. Commenting: Clear and sufficient comments throughout the code. (1)
  5. Difficulty: Hard (10)

Note:

  • It is very clear to list arguments in the table, great job! I also like the file structure description part!
  • Some formulas weren't rendered properly; the syntax might need checking.
  • Nice Interpretation of the results!
  • Different losses/metrics might have different ranges; it might be better to plot them in different figures or a nested figure.


gayanku commented Nov 24, 2025

Marking

Good/OK/Fair Practice (Design/Commenting, TF/Torch Usage)

  • Good design and implementation.
  • Spacing and comments.
  • Header blocks.

Recognition Problem

  • Good solution to the problem.
  • Driver script present.
  • File structure present.
  • Good usage, demo, visualisation, and data usage.
  • Module present.
  • Commenting present.
  • No data leakage found.
  • Difficulty: Hard (TimeGAN).

Commit Log

  • Good, meaningful commit messages.
  • Good progressive commits.

Documentation

  • Readme: Good.
  • Model/technical explanation: Good.
  • Description and comments: Good.
  • Markdown used and PDF submitted.

Pull Request

  • Successful pull request (working algorithm delivered on time in the correct branch).
  • No feedback required.
  • Request description is good.
TOTAL: 0

Marked as per the due date; for fairness, changes made afterwards are not necessarily allowed to contribute to the grade.
Subject to approval from Shakes
