@keys-i keys-i commented Nov 5, 2025

Hey COMP3710 Teaching Team,

My name is Radhesh Goel (s49088276), enrolled in COMP3710.

This submission is a working implementation of a generative time-series model based on TimeGAN that synthesises limit order book (LOB) event sequences from the LOBSTER AMZN Level-10 dataset. It addresses Task 11 [Hard Difficulty] by training a model to produce realistic LOB sequences and evaluating them on a held-out test split.

Evaluation targets

  • Distribution similarity: target KL ≤ 0.1 for both spread and mid-price return distributions.
  • Visual similarity: target SSIM > 0.6 between heatmaps of generated and real Level-10 depth snapshots.
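The histogram-based KL check implied by the first target can be sketched as follows. This is a minimal illustrative version, not the project's exact API; the function name, bin count, and smoothing constant are assumptions:

```python
import numpy as np

def kl_divergence_hist(real, fake, bins=50, eps=1e-8):
    """Estimate KL(real || fake) from smoothed histograms over a shared range."""
    lo, hi = min(real.min(), fake.min()), max(real.max(), fake.max())
    p, _ = np.histogram(real, bins=bins, range=(lo, hi))
    q, _ = np.histogram(fake, bins=bins, range=(lo, hi))
    p = (p + eps) / (p + eps).sum()  # additive smoothing avoids log(0)
    q = (q + eps) / (q + eps).sum()
    return float(np.sum(p * np.log(p / q)))

# Two samples from the same distribution should land well under the 0.1 target.
rng = np.random.default_rng(0)
kl = kl_divergence_hist(rng.normal(1.0, 0.1, 10_000), rng.normal(1.0, 0.1, 10_000))
```

Applied to spread and mid-price return series, values at or below 0.1 indicate distributional similarity under this estimator.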

The report includes the model architecture and parameter count, the training strategy (full TimeGAN and ablations for adversarial-only and supervised-only losses), GPU type, VRAM, epochs, and total training time. It also presents 3–5 representative heatmaps comparing real and synthetic order books, with a brief error analysis highlighting where the synthetic LOBs perform well and where they fall short.

For reproducibility, I have provided a pinned environment.yml, a concise references file, and the exact scripts used for preprocessing, training, evaluation, and visualisation. These materials allow the full pipeline to be rebuilt and the reported metrics and figures to be replicated.

Thank you for your time and consideration.

Kind regards,
Radhesh Goel (s49088276)

keys-i added 30 commits October 2, 2025 13:18
Create empty modules, configs, and test shells; no implementations yet.
Clarify module purpose, responsibilities, and public API; add usage example and references. No functional changes.
Add environment.yml pinned to python=3.13.* (conda-forge, strict priority) with numpy>=2,<3, pandas>=2.2, scipy>=1.13, scikit-learn>=1.5, matplotlib>=3.9, jupyterlab, ipykernel. Refactor code into src/ (add __init__.py), update script imports to use the package, and rename any lib-shadowing files (e.g., matplotlib.py).
Adds CLI smoke test, core/raw10 features, chronological split, train-only scaling, and windowing.
Add --headerless-message/--headerless-orderbook flags, robust header normalization, train-only scaling, NaN/inf filtering, dtype control, meta accessors, and optional NPZ export. Includes improved errors and windowing checks.
Introduce summarize() and --summary/--peek to inspect message/orderbook tables. Keep headerless support with robust normalization; chronological splits; train-only scaling; NaN/inf cleaning; dtype control; NPZ export; inverse_transform; and metadata accessors.
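The chronological split with train-only scaling referenced in these commits can be sketched as below. The function name and split fractions are illustrative assumptions, not the repository's exact interface:

```python
import numpy as np

def chronological_split_scale(X, train_frac=0.7, val_frac=0.15):
    """Split time-ordered rows without shuffling; fit a MinMax scaler on train only."""
    n = len(X)
    i1, i2 = int(n * train_frac), int(n * (train_frac + val_frac))
    train, val, test = X[:i1], X[i1:i2], X[i2:]
    lo, hi = train.min(axis=0), train.max(axis=0)  # statistics from train split only
    scale = np.where(hi > lo, hi - lo, 1.0)        # guard constant columns
    f = lambda a: (a - lo) / scale
    return f(train), f(val), f(test)

X = np.arange(100, dtype=float).reshape(50, 2)
tr, va, te = chronological_split_scale(X)
```

Fitting the scaler on the training split alone avoids leaking future statistics into validation and test, which is why test values may fall outside [0, 1].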
…loats, split summaries)

- Add --pretty flag to print tidy console tables for train/val/test splits
- Show split shapes (num_seq × seq_len × num_features) and a small head/tail sample
- Right-align numeric columns; thousands separators; configurable precision
- CLI knobs: --head N --tail N --width 120 --precision 4
- Add quick feature stats (min/p25/median/mean/p75/max, std) for the selected feature set
- Purely display-layer changes; no impact on saved arrays or training
Add --style chat|box and --no-color; render directory, CSV summaries, preprocessing report, and sample window as message-like bubbles with aligned key–value tables. Keep headerless support, time-sort, decimation, quantile clipping, chronological splits, and train-only scaling unchanged.
Add --verbose and --meta-json; report memory footprint, time coverage, scaler parameters, clip bounds preview, and windowing math. Keep chat/box styles, headerless support, time-sort, decimation, quantile clipping, chronological splits, and train-only scaling.
Integrate tabulate for head/tail/describe and 2-col KV sections. Preserve table lines inside bubbles/boxes (no wrapping) and auto-fit inner width to widest table row. Retains headerless support, time sort, decimation, quantile clipping, chronological splits, train-only scaling, and verbose diagnostics.
Add ANSI color themes, chat/box message panels, and --table-style (github|grid|simple). Preserve tabulate tables inside panels without wrapping and auto-fit widths. Keep headerless support, time sort, decimation, quantile clipping, chronological splits, train-only scaling, verbose diagnostics, and dataset summary report.
Train a generative time series model on LOBSTER AMZN Level 10 data to
produce realistic limit order book sequences. Targets: KL divergence ≤0.1
for spread and midprice returns, and SSIM >0.6 for depth heatmaps. The
report records architecture and parameter count, training variants
(full, adversarial only, supervised only), GPU and VRAM, epochs, and
total training time. Includes 3–5 paired heatmaps with a short error
analysis.
Break out I/O, feature engineering, scaling, and windowing into dataset_helpers/ (io.py, features.py, scaling.py, windows.py). Keep public Dataset/loader logic in dataset.py and re-export via __init__.py for backward compatibility (from dataset import LOBSTERDataset still works). Updated imports, added basic tests/placeholders, and kept defaults/paths unchanged.
… preprocessing

Add header auto-detect (no flags needed), enforce canonical column order, and coerce dtypes.
Render one big panel per CSV with subpanels (shape/dtypes/describe/head/tail) via textui.
Expand preprocessing for GANs: advanced scalers (robust/quantile/power), optional PCA/ZCA whitening,
train-only window augmentations (jitter/scaling/time-warp), engineered features (rel_spread, microprice,
L5 imbalance, rolling stats, diffs/pct), chronological split with train-only scaling, and NPZ+meta saving.
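A hedged sketch of three of the engineered features named above (relative spread, microprice, L5 imbalance). Input shapes and the exact column layout in the repository may differ; this only shows the standard formulas:

```python
import numpy as np

def lob_features(ask_p, ask_v, bid_p, bid_v):
    """ask_p/bid_p: best quotes [T]; ask_v/bid_v: top-5 level volumes [T, 5]."""
    mid = (ask_p + bid_p) / 2.0
    rel_spread = (ask_p - bid_p) / mid  # spread normalised by mid-price
    # Microprice: best quotes weighted by opposite-side level-1 volume.
    microprice = (ask_p * bid_v[:, 0] + bid_p * ask_v[:, 0]) / (ask_v[:, 0] + bid_v[:, 0])
    # L5 imbalance: signed volume imbalance over the top five levels.
    imb5 = (bid_v.sum(axis=1) - ask_v.sum(axis=1)) / (bid_v.sum(axis=1) + ask_v.sum(axis=1))
    return mid, rel_spread, microprice, imb5

mid, rs, mp, imb = lob_features(
    np.array([101.0]), np.ones((1, 5)), np.array([99.0]), np.ones((1, 5))
)
```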
…tor/Supervisor/Discriminator)

Implements minimal TimeGAN in PyTorch:
- GRU/LSTM-based Embedder/Recovery, Generator, Supervisor, Discriminator
- Canonical losses: recon, supervised, GAN (gen/disc), moment + latent feature matching
- Utilities: noise sampling, weight init, optim factory
- Pretrain steps (AE, SUP) and joint training helpers
Supports windows.npz or on-the-fly preprocessing via LOBSTERData.
Includes 3-phase schedule (AE -> SUP -> Joint), AMP toggle, grad clipping,
basic checkpoints, and moment-loss validation.
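One of the canonical losses listed above, moment matching, can be illustrated with a simplified per-feature mean/std version; the repository's exact formulation and weighting may differ:

```python
import numpy as np

def moment_loss(real, fake):
    """L1 distance between per-feature mean and std of real/fake batches [N, T, F]."""
    mu_r, mu_f = real.mean(axis=(0, 1)), fake.mean(axis=(0, 1))
    sd_r, sd_f = real.std(axis=(0, 1)), fake.std(axis=(0, 1))
    return float(np.abs(mu_r - mu_f).mean() + np.abs(sd_r - sd_f).mean())

real = np.zeros((2, 3, 4))
same = moment_loss(real, real)        # identical batches -> zero loss
shifted = moment_loss(real, real + 1) # mean shifted by 1, std unchanged
```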
…eatmaps + stats)

Loads windows from NPZ or CSV via LOBSTERData, restores trained checkpoint, samples synthetic sequences,
prints per-feature mean/std and quick KL, and saves feature-line plots + depth heatmaps to --outdir.
Streamlined dataset.py by folding helpers inline and removing unused CLI/docs. Normalization now uses a continuous MinMax scaler across windows for stable ranges; I/O paths and outputs simplified without extra flags.
Rewrote monolithic functions into a Dataset class with clear init/load/transform methods. Improves readability, reuse, and testability with no external behavior changes.
…ix batch_generator

Introduce DataOptions wrapper with flags (--seq_len, --data_dir, --orderbook_filename, --no_shuffle, --keep_zero_rows, --splits, --log_level). Support ORDERBOOK_DEFAULT/SPLITS_DEFAULT fallbacks; accept proportions or cumulative cutoffs; replace prints with logging; add CLI entrypoint. Fix batch_generator index sampling and time=None handling; return constant T_mb; return windowed splits from load_data.
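The `batch_generator` behaviour described above (index sampling plus a constant `T_mb`) might look roughly like this; the names mirror the commit message, but the body is an illustrative reconstruction, not the actual code:

```python
import numpy as np

def batch_generator(data, batch_size, rng=None):
    """Sample a batch of windows [B, T, F] and per-sequence lengths (constant T)."""
    rng = rng or np.random.default_rng()
    idx = rng.choice(len(data), size=batch_size, replace=False)  # no duplicate windows
    X_mb = data[idx]
    T_mb = np.full(batch_size, data.shape[1], dtype=int)  # fixed-length windows
    return X_mb, T_mb

data = np.zeros((100, 24, 40))
X_mb, T_mb = batch_generator(data, 8, np.random.default_rng(0))
```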
Introduce Options that forwards args after --dataset to DataOptions via argparse.REMAINDER. Attaches parsed DatasetOptions namespace at opts.dataset. Includes seed/run-name flags and supports programmatic argv. Minor polish: import REMAINDER and types, handle None -> [] for ds_argv.
… KL histogram

Introduce utilities for TimeGAN-LOB: extract_seq_lengths, sample_noise (supports RNG + optional mean/std via uniform with matched σ), minmax_scale/minmax_inverse over [N,T,F], and KL(real||fake) via histograms for 'spread' and 'mpr' with smoothing + optional plot. Adds strong shape/type guards, finite-range handling, and safe midprice log-returns.
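The `minmax_scale`/`minmax_inverse` pair over [N, T, F] arrays can be sketched as below; a minimal version under the assumption of per-feature scaling, which the generation path would use to map samples back to the original feature space:

```python
import numpy as np

def minmax_scale(x):
    """Scale [N, T, F] per feature to [0, 1]; return params for exact inversion."""
    lo = x.min(axis=(0, 1))
    hi = x.max(axis=(0, 1))
    rng = np.where(hi > lo, hi - lo, 1.0)  # guard constant features
    return (x - lo) / rng, (lo, rng)

def minmax_inverse(y, params):
    """Undo minmax_scale, restoring the original feature ranges."""
    lo, rng = params
    return y * rng + lo

x = np.random.default_rng(1).normal(size=(4, 10, 3))
y, params = minmax_scale(x)
x_back = minmax_inverse(y, params)
```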
…er/Recovery/Generator/Supervisor/Discriminator)

Implements GRU-based components with Xavier/orthogonal init, device/seed helpers, and typed handles. Sets BCEWithLogits-ready Discriminator and sigmoid-gated projections elsewhere. Preps for optional TemporalBackbone injection via config.
…nd generation API

Adds full wrapper (optimizers, ER pretrain, supervised, joint phases), checkpoint save/load, quick KL(spread) validation, and deterministic helpers. Integrates dataset batcher and utils (minmax, noise). Exposes encoder/recovery/generator/supervisor/discriminator and device/seed utilities.
…quick KL validation

Adds ER pretrain, supervised, and joint loops; Adam optimizers; save/load helpers; device/seed utils; and a generation API that inverse-scales to original feature space. Includes GRU-based Encoder/Recovery/Generator/Supervisor/Discriminator with Xavier/orthogonal init and BCEWithLogits-ready Discriminator.
Parses Options, loads datasets via load_data, constructs TimeGAN, and executes the full three-phase schedule with checkpoints. Keeps modules/dataset imports minimal to match current package layout.
Parses Options, loads data, restores TimeGAN from checkpoint, generates exactly len(test) rows, and saves to OUTPUT_DIR/gen_data.npy. Keeps API aligned with current dataset/modules helpers.
… model hyperparams

Adds DataOptions (seq-len, data-dir, orderbook-filename, splits, no-shuffle, keep-zero-rows) and ModulesOptions (batch-size, seq-len, z-dim, hidden-dim, num-layer, lr, beta1, w-gamma, w-g). Top-level Options forwards args via argparse.REMAINDER and returns opts.dataset / opts.modules namespaces for downstream loaders and trainers.
keys-i added 20 commits October 21, 2025 15:19
…ase training summary

Incorporates the five-component list (Encoder, Recovery, Generator, Supervisor, Discriminator) and a concise three-phase training in the project report. Based on prior HackMD draft refined before this commit.
…dependencies table

Introduces a linked ToC for quick navigation, expands project structure with brief per-file roles, and adds a version-pinned dependencies table with one-line use cases tailored to the TimeGAN LOB workflow.
…dependencies table

Introduces a linked ToC for quick navigation, expands project structure with brief per-file roles, and adds a version-pinned dependencies table with one-line use cases tailored to the TimeGAN LOB workflow.
…n placeholder

Introduces detailed LOBSTER AMZN L10 dataset description and chronological split strategy (train/val/test). Notes that references will be added in a forthcoming update.
Previously added a StyleGAN2/ADNI BibTeX by mistake. Replace with the TimeGAN for LOBSTER (AMZN L10) entry and update the project URL.
…ecture text

Embed modern HTML figure for the architecture PNG and rewrite component/flow sections for clarity and consistency. Remove training-specific notes from architecture and tighten wording.
Describe three-phase TimeGAN schedule (ER pretrain, Supervisor pretrain, Joint), loss design (MSE, BCE-with-logits, moment matching), metrics (KL on spread/returns, SSIM on heatmaps), and hardware/runtime setup (macOS M3 Pro, MLS/Metal).
Add kl_divergence_hist utility calls and a compact Rich table to display KL(spread) and KL(mpr) alongside SSIM when rendering real vs synthetic depth heatmaps.
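For context, the SSIM reported alongside KL can be approximated with a single-window (global) variant of the standard formula; production implementations typically use a sliding window, so this is only a sketch with the usual k1/k2 constants:

```python
import numpy as np

def ssim_global(a, b, L=1.0, k1=0.01, k2=0.03):
    """Global SSIM over two images in [0, L] (no sliding window)."""
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    va, vb = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a**2 + mu_b**2 + c1) * (va + vb + c2)
    )

img = np.linspace(0.0, 1.0, 64).reshape(8, 8)
identical = ssim_global(img, img)     # identical heatmaps -> 1.0
inverted = ssim_global(img, 1 - img)  # anti-correlated heatmaps -> well below 1
```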
…nor issues

Add clear module and function docstrings for CLI entrypoints; correct small typos, unsafe attribute access, and inconsistent phrasing. No functional changes.
Refine wording and formatting across the report; integrate quantitative tables (SSIM, KL(spread), KL(mpr), TempCorr, LatDist), heatmap figures, and latent-walk panels; add concise error analysis and style space discussion.
Condense narrative, remove redundancy, and clarify metrics and plots. Sync CLI docs with current flags (viz and latent-walk), standardize figure captions, and correct minor grammar.
Add clear module and function docstrings for CLI entrypoints; correct small typos, unsafe attribute access, and inconsistent phrasing. No functional changes.
Add clear module and function docstrings for CLI entrypoints; correct small typos, unsafe attribute access, and inconsistent phrasing. No functional changes.
Add clear module and function docstrings for CLI entrypoints; correct small typos, unsafe attribute access, and inconsistent phrasing. No functional changes.
Use robust Conda bootstrap in batch (no conda init; create/update env only if missing), set PROJECT_ROOT/PYTHONPATH, and add log files. Sync flags and entrypoints with current code: --num-iters, correct AMZN L10 filename, python -m src.viz.visualise. Add metrics CSV output, explicit viz out-dir, and latent-walk flags. Enable set -euo pipefail and improve status prints.
Switch reader to markdown+tex_math_dollars+raw_tex, set resource-path for images, and document working pandoc command (tectonic engine). Resolve gfm raw_tex incompatibility and ensure inline LaTeX and photos render correctly.
Converted loss table to an HTML <table> with MathJax-friendly \( … \) inline math and proper escaping (e.g., \gamma, \mathcal{L}). Ensures equations render correctly in GitHub/Docs. Updated notes column and headings accordingly.
Converted loss table to an HTML <table> with MathJax-friendly \( … \) inline math and proper escaping (e.g., \gamma, \mathcal{L}). Ensures equations render correctly in GitHub/Docs. Updated notes column and headings accordingly.
Reformatted the Model Architecture/Components block for GitHub: switched to MathJax-friendly inline math (\(…\)), cleaned up headings and lists, removed en/em dashes, and adjusted table/HTML so formulas and text render correctly.
Reformatted the Model Architecture/Components block for GitHub: switched to MathJax-friendly inline math (\(…\)), cleaned up headings and lists, removed en/em dashes, and adjusted table/HTML so formulas and text render correctly.
keys-i force-pushed the topic-recognition branch 2 times, most recently from 8cd2b76 to 1291868 on November 10, 2025, 00:46

keys-i commented Nov 11, 2025

Please disregard the README uploaded to Turnitin. That version was incomplete and has since been replaced by the current, complete submission.

@yexincheng (Collaborator)

This is an initial inspection; no action is required at this point.

Recognition Problem: total 20

  1. Solves problem: The solution is appropriate for the problem, reaching an SSIM of over 0.78 (Five trials were run; the presented results are from Trial 5). (5)
  2. Implementation functions: Good (3)
  3. Good design: Well-designed (1)
  4. Commenting: Clear and sufficient comments throughout the code. (1)
  5. Difficulty: Hard (10)

Note:

  • It is very clear to list arguments in the table, great job! I also like the file structure description part!
  • Some formulas weren't rendered properly; the syntax might need checking.
  • Nice Interpretation of the results!
  • Different losses/metrics might have different ranges; it might be better to plot them in different figures or a nested figure.


gayanku commented Nov 24, 2025

Marking

Good/OK/Fair Practice (Design/Commenting, TF/Torch Usage)

  • Good design and implementation.
  • Spacing and comments.
  • Header blocks.

Recognition Problem

  • Good solution to the problem.
  • Driver script present.
  • File structure present.
  • Good usage, demo, visualisation, and data usage.
  • Module present.
  • Commenting present.
  • No data leakage found.
  • Difficulty: Hard (TimeGAN).

Commit Log

  • Good, meaningful commit messages.
  • Good progressive commits.

Documentation

  • Readme: Good.
  • Model/technical explanation: Good.
  • Description and comments: Good.
  • Markdown used and PDF submitted.

Pull Request

  • Successful pull request (working algorithm delivered on time in the correct branch).
  • No feedback required.
  • Request description is good.
TOTAL: 0

Marked as per the due date; for fairness, changes made afterwards are not necessarily allowed to contribute to the grade.
Subject to approval from Shakes
