Replication package for Cross-Provider Replication of Memento-Skills Reflective Learning in Production (nyxCore Systems, 2026-Q2).
This repository is the public deliverable of a pre-registered observational study. It is the artefact a reviewer or independent researcher needs to reproduce every figure and statistical claim made in the resulting paper. It is not the production source of nyxCore — that lives at nyxCore-Systems/nyxcore-systems and is referenced here at a tagged commit for reproducibility.
| Phase | Window | This repo contains |
|---|---|---|
| Pre-registration locked | 2026-04-28 → 2026-05-01 | Frozen pre-registration, frozen DSGVO balancing test, schema-pipeline freeze pointer, empty data dirs |
| Data collection | 2026-05-01 → 2026-05-29 | no commits (data is captured into the production DB and snapshotted daily; not pushed here mid-flight) |
| Analysis | 2026-05-29 → 2026-06-05 | Anonymised dataset (CSV + Parquet), analysis scripts (Python + R), figure-generation notebooks |
| Manuscript draft | 2026-06-05 → 2026-06-19 | Manuscript in paper/ (LaTeX or Markdown source), regenerated figures |
| Public release | 2026-06-29 | arXiv preprint linked, repo tagged release/v1.0, dataset DOI minted on Zenodo or OSF |
Current phase: Pre-registration locked (skeleton). The data/ and analysis/ directories are empty by design; they will be populated in the post-collection phases listed above.
The Memento-Skills paper (Zhou et al. 2026, arXiv:2603.18743) introduces a Read-Write Reflective Learning loop in which executable skills serve as evolving agent memory. The original paper validates the loop on a single LLM provider against benchmark suites (GAIA, HLE).
This study extends the empirical record along three dimensions the paper leaves open:
- Multi-provider skill transfer. Memento §3.1 limits experiments to Gemini-3.1-Flash. We run the loop across 8 providers (Anthropic, OpenAI, Google, Kimi, xAI, DeepSeek, OpenRouter meta, Ollama) and test cross-provider non-inferiority of evolved skill prompts.
- Production rather than benchmark workloads. Production traffic offers no ground truth, only implicit user feedback. We measure whether the loop converges under that signal.
- Constitutional self-critique in the discovery branch. The Heuristica triad (Ipcha + Metis + Cael) is tested as an ablation against a Metis-only baseline.
Four primary hypotheses (H1–H4), pre-specified analyses, stop criteria, and explicit non-claims are documented in pre-registration/2026-q2-memento-replication.md.
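For readers unfamiliar with non-inferiority testing, the general idea behind the cross-provider comparison can be sketched as follows. This is only an illustration under stated assumptions: the function name, the normal approximation, the `margin` parameter, and the synthetic numbers are all hypothetical. The binding analysis, including the actual margin and test, is the one specified in the pre-registration document.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def non_inferior(reference, candidate, margin, alpha=0.05):
    """One-sided non-inferiority check on a difference in means.

    Returns (lower_bound, verdict). The candidate is declared non-inferior
    when the lower (1 - alpha) confidence bound of
    mean(candidate) - mean(reference) lies above -margin.
    Uses a normal approximation, so it assumes reasonably large samples.
    """
    diff = mean(candidate) - mean(reference)
    # Standard error of the difference between two independent sample means.
    se = sqrt(variance(candidate) / len(candidate)
              + variance(reference) / len(reference))
    z = NormalDist().inv_cdf(1 - alpha)
    lower = diff - z * se
    return lower, lower > -margin


# Synthetic utility scores for illustration only; not study data.
reference_scores = [0.70, 0.72, 0.68, 0.71, 0.69] * 10
candidate_scores = [0.69, 0.71, 0.67, 0.70, 0.68] * 10
lower_bound, verdict = non_inferior(reference_scores, candidate_scores, margin=0.05)
```

The verdict depends entirely on the chosen margin, which is why that value has to be fixed before data collection rather than after.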
- Oliver Baer (nyxCore Systems · Founder & Chief Architect) — primary investigator, system architecture, study design
- Lisa Welsch (nyxCore Systems / CKB · Core Systems Engineer) — co-investigator, skill-memory architecture, morphone theory (Welsch 2024)
- Martyna Kwiecień (nyxCore Systems / claritas-ai-consulting · Data Sovereignty Counsel) — co-investigator, DSGVO compliance, anonymisation protocol
.
├── README.md ← you are here
├── LICENSE ← MIT
├── CITATION.cff ← citeable metadata
├── pre-registration/
│ ├── 2026-q2-memento-replication.md ← frozen at study lock (link to commit)
│ └── dsgvo-balancing-test-2026-q2.md ← frozen at study lock
├── artefacts/
│ ├── source-snapshot-pointer.md ← Git tag of nyxcore-systems source as it ran the study
│ ├── schema-freeze.md ← exact persona_skills + skill_executions schema during the window
│ └── canonical-prompts.md ← Cael-judge prompt, classifyImplicitFeedback regex set, etc.
├── data/ ← anonymised dataset, post-collection
│ ├── persona_skills.csv ← (placeholder — empty pre-collection)
│ ├── skill_executions.csv ← (placeholder)
│ ├── skill_versions.csv ← (placeholder)
│ ├── skill_discovery_candidates.csv ← (placeholder)
│ └── persona_evaluations.csv ← (placeholder)
├── analysis/ ← reproducible figure-generation
│ ├── h1-utility-trajectory.py ← (placeholder)
│ ├── h2-cross-provider-transfer.py ← (placeholder)
│ ├── h3-heuristica-vs-metis.py ← (placeholder)
│ ├── h4-constitutional-layer.py ← (placeholder)
│ └── morphone-clusters.py ← exploratory §5.2
├── scripts/
│ ├── anonymize-export.sql ← runs against the production snapshot, produces /data
│ ├── k-anonymity-check.sql ← verification harness (k ≥ 5, n ≥ 10)
│ └── deviation-log.md ← every protocol deviation, with rationale
└── paper/ ← manuscript source, post-analysis
    └── (empty)
A DOI will be minted (on Zenodo or OSF, per the phase table) at the public release (2026-06-29). Until then, the canonical citation is:
Baer, O., Welsch, L., & Kwiecień, M. (2026). Cross-Provider Replication of Memento-Skills Reflective Learning in Production. nyxCore Systems. https://github.com/nyxCore-Systems/memento-replication-2026
A CITATION.cff is provided.
git clone https://github.com/nyxCore-Systems/memento-replication-2026.git
cd memento-replication-2026
python -m venv .venv && source .venv/bin/activate
pip install -r analysis/requirements.txt
python analysis/h1-utility-trajectory.py
python analysis/h2-cross-provider-transfer.py
python analysis/h3-heuristica-vs-metis.py
python analysis/h4-constitutional-layer.py
Every figure in the published paper is regenerated from data/ by exactly one of these scripts. If you cannot reproduce a figure, that's a study failure; please open an issue.
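Because data/ stays empty until the collection window closes, it can help to check that the dataset is actually present before running the analysis scripts. The following is a minimal sketch, not part of the repository: the function name `missing_or_empty` is hypothetical, and the file list is copied from the repository layout shown in this README.

```python
from pathlib import Path

# File names taken from the repository layout in this README.
EXPECTED_DATA_FILES = [
    "persona_skills.csv",
    "skill_executions.csv",
    "skill_versions.csv",
    "skill_discovery_candidates.csv",
    "persona_evaluations.csv",
]


def missing_or_empty(data_dir):
    """Return the expected data files that are absent or zero bytes."""
    root = Path(data_dir)
    problems = []
    for name in EXPECTED_DATA_FILES:
        path = root / name
        # A placeholder committed before collection would be zero bytes.
        if not path.is_file() or path.stat().st_size == 0:
            problems.append(name)
    return problems
```

Calling `missing_or_empty("data")` from the repository root returns an empty list once every expected CSV is present and non-empty; any names it returns point to files that have not been published yet.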
The nyxCore-Systems platform code as it ran the study is tagged in the production repo:
- Repository: nyxCore-Systems/nyxcore-systems
- Tag: study/2026-q2-memento-locked-v1.0 (set at study lock, 2026-05-01)
- Tag-pointing commit: see artefacts/source-snapshot-pointer.md
The landing-page source (specifically the mnemo.nyxcore.cloud dashboard as displayed during the study) is tagged in nyxCore-Systems/nyxcore-landingpage with the same tag name.
All data in /data/ (post-collection) is anonymised before push. No tenant identity, no user identity, no free text. The full anonymisation protocol is in scripts/anonymize-export.sql and the verification harness in scripts/k-anonymity-check.sql. Both run on the production server before anything is exported to this repo.
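Independent researchers who want to spot-check the released CSVs can apply the same idea as the k-anonymity harness on their own machine. The sketch below is a Python illustration, not the SQL harness itself: the function name, the choice of quasi-identifier columns (`provider`, `week`), and the synthetic rows are all assumptions. It covers only the k threshold; the n ≥ 10 criterion mentioned alongside it in the repository layout is defined by the study protocol and is not modelled here.

```python
from collections import Counter


def k_anonymity_violations(rows, quasi_identifiers, k=5):
    """Return quasi-identifier combinations shared by fewer than k rows.

    rows: iterable of dicts (for example from csv.DictReader).
    quasi_identifiers: column names treated as quasi-identifiers
        (hypothetical here; the real set is defined by the study protocol).
    """
    counts = Counter(
        tuple(row[col] for col in quasi_identifiers) for row in rows
    )
    return {combo: n for combo, n in counts.items() if n < k}


# Synthetic rows for illustration only; not study data.
rows = (
    [{"provider": "A", "week": "1"}] * 6
    + [{"provider": "B", "week": "1"}] * 2
)
violations = k_anonymity_violations(rows, ["provider", "week"], k=5)
```

An empty result means every combination of the chosen columns appears at least k times; any entry in the returned mapping identifies a group small enough to warrant a replication issue.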
The DSGVO Art. 6(1)(f) balancing test is in pre-registration/dsgvo-balancing-test-2026-q2.md.
While the study runs (2026-05-01 → 2026-05-29) the public dashboard at mnemo.nyxcore.cloud shows live aggregates (with the same anonymisation). This repo will not have intermediate commits during the collection window — that's by design, to prevent post-hoc analysis adjustments.
MIT. See LICENSE.
research@nyxcore.cloud — methodology questions, replication issues, collaboration enquiries.