memento-replication-2026

Replication package for Cross-Provider Replication of Memento-Skills Reflective Learning in Production (nyxCore Systems, 2026-Q2).

This repository is the public deliverable of a pre-registered observational study. It is the artefact a reviewer or independent researcher needs to reproduce every figure and statistical claim made in the resulting paper. It is not the production source of nyxCore — that lives at nyxCore-Systems/nyxcore-systems and is referenced here at a tagged commit for reproducibility.

Status

| Phase | Window | This repo contains |
|---|---|---|
| Pre-registration locked | 2026-04-28 → 2026-05-01 | Frozen pre-registration, frozen DSGVO balancing test, schema-pipeline freeze pointer, empty data dirs |
| Data collection | 2026-05-01 → 2026-05-29 | No commits (data is captured into the production DB and snapshotted daily; not pushed here mid-flight) |
| Analysis | 2026-05-29 → 2026-06-05 | Anonymised dataset (CSV + Parquet), analysis scripts (Python + R), figure-generation notebooks |
| Manuscript draft | 2026-06-05 → 2026-06-19 | Manuscript in paper/ (LaTeX or Markdown source), regenerated figures |
| Public release | 2026-06-29 | arXiv preprint linked, repo tagged release/v1.0, dataset DOI minted on Zenodo or OSF |

Current phase: Pre-registration locked (skeleton). The data/ and analysis/ directories are empty by design — they will populate at the post-collection stages above.

What this repo answers

The Memento-Skills paper (Zhou et al. 2026, arXiv:2603.18743) introduces a Read-Write Reflective Learning loop in which executable skills serve as evolving agent memory. The original paper validates the loop on a single LLM provider against benchmark suites (GAIA, HLE).

This study extends the empirical record along three dimensions the paper leaves open:

  1. Multi-provider skill transfer. Memento §3.1 limits experiments to Gemini-3.1-Flash. We run the loop across 8 providers (Anthropic, OpenAI, Google, Kimi, xAI, DeepSeek, OpenRouter meta, Ollama) and test cross-provider non-inferiority of evolved skill prompts.
  2. Production rather than benchmark workloads. Production traffic offers no ground truth, only implicit user feedback. We measure whether the loop converges under that signal.
  3. Constitutional self-critique in the discovery branch. The Heuristica triad (Ipcha + Metis + Cael) is tested as an ablation against a Metis-only baseline.

Four primary hypotheses (H1–H4), pre-specified analyses, stop criteria, and explicit non-claims are documented in pre-registration/2026-q2-memento-replication.md.
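As a rough illustration of what a cross-provider non-inferiority claim involves, the sketch below checks whether a target provider's scores are no worse than a source provider's by more than a chosen margin. It is not the pre-registered analysis: the function name `non_inferior`, the sample scores, the margin, and the normal-approximation confidence bound are all assumptions made for this example. The actual test and margin are the ones specified in the pre-registration document.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def non_inferior(source, target, margin, alpha=0.05):
    """One-sided non-inferiority check on two independent samples.

    Hypothetical helper: uses a normal approximation for the lower
    confidence bound of mean(target) - mean(source). The target is
    declared non-inferior when that bound stays above -margin.
    """
    diff = mean(target) - mean(source)
    # Standard error of the difference of two independent sample means.
    se = sqrt(stdev(source) ** 2 / len(source) + stdev(target) ** 2 / len(target))
    z = NormalDist().inv_cdf(1 - alpha)
    lower_bound = diff - z * se
    return lower_bound > -margin, lower_bound


# Made-up utility scores for a skill on two providers (illustration only).
source_scores = [0.80, 0.82, 0.78, 0.81, 0.79]
target_scores = [0.79, 0.81, 0.77, 0.80, 0.78]

ok, bound = non_inferior(source_scores, target_scores, margin=0.05)
print(f"non-inferior: {ok}, lower bound of difference: {bound:.4f}")
```

The margin drives the conclusion: with the same made-up scores, a stricter margin of 0.02 would not support non-inferiority, which is why the margin has to be fixed before data collection.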

Investigators

  • Oliver Baer (nyxCore Systems · Founder & Chief Architect) — primary investigator, system architecture, study design
  • Lisa Welsch (nyxCore Systems / CKB · Core Systems Engineer) — co-investigator, skill-memory architecture, morphone theory (Welsch 2024)
  • Martyna Kwiecień (nyxCore Systems / claritas-ai-consulting · Data Sovereignty Counsel) — co-investigator, DSGVO compliance, anonymisation protocol

Contents

.
├── README.md                          ← you are here
├── LICENSE                            ← MIT
├── CITATION.cff                       ← citeable metadata
├── pre-registration/
│   ├── 2026-q2-memento-replication.md     ← frozen at study lock (link to commit)
│   └── dsgvo-balancing-test-2026-q2.md   ← frozen at study lock
├── artefacts/
│   ├── source-snapshot-pointer.md         ← Git tag of nyxcore-systems source as it ran the study
│   ├── schema-freeze.md                   ← exact persona_skills + skill_executions schema during the window
│   └── canonical-prompts.md               ← Cael-judge prompt, classifyImplicitFeedback regex set, etc.
├── data/                              ← anonymised dataset, post-collection
│   ├── persona_skills.csv             ← (placeholder — empty pre-collection)
│   ├── skill_executions.csv           ← (placeholder)
│   ├── skill_versions.csv             ← (placeholder)
│   ├── skill_discovery_candidates.csv ← (placeholder)
│   └── persona_evaluations.csv        ← (placeholder)
├── analysis/                          ← reproducible figure-generation
│   ├── h1-utility-trajectory.py       ← (placeholder)
│   ├── h2-cross-provider-transfer.py  ← (placeholder)
│   ├── h3-heuristica-vs-metis.py      ← (placeholder)
│   ├── h4-constitutional-layer.py     ← (placeholder)
│   └── morphone-clusters.py           ← exploratory §5.2
├── scripts/
│   ├── anonymize-export.sql           ← runs against the production snapshot, produces /data
│   ├── k-anonymity-check.sql          ← verification harness (k ≥ 5, n ≥ 10)
│   └── deviation-log.md               ← every protocol deviation, with rationale
└── paper/                             ← manuscript source, post-analysis
    └── (empty)

How to cite

A dataset DOI will be minted on Zenodo or OSF at the public release (2026-06-29). Until then, the canonical citation is:

Baer, O., Welsch, L., & Kwiecień, M. (2026). Cross-Provider Replication of Memento-Skills Reflective Learning in Production. nyxCore Systems. https://github.com/nyxCore-Systems/memento-replication-2026

A CITATION.cff is provided.

How to reproduce (post-public-release)

git clone https://github.com/nyxCore-Systems/memento-replication-2026.git
cd memento-replication-2026
python -m venv .venv && source .venv/bin/activate
pip install -r analysis/requirements.txt
python analysis/h1-utility-trajectory.py
python analysis/h2-cross-provider-transfer.py
python analysis/h3-heuristica-vs-metis.py
python analysis/h4-constitutional-layer.py

Every figure in the published paper is regenerated from data/ by exactly one of these scripts. If you cannot reproduce a figure, that's a study failure — please open an issue.
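The four script invocations above can also be driven from one small Python helper that reports which scripts failed. This is a convenience sketch, not a file shipped in the repository: `build_commands` and `run_all` are hypothetical names, and the script list is copied from the steps above.

```python
import subprocess
import sys
from pathlib import Path

# Script names as listed in the reproduction steps.
SCRIPTS = [
    "h1-utility-trajectory.py",
    "h2-cross-provider-transfer.py",
    "h3-heuristica-vs-metis.py",
    "h4-constitutional-layer.py",
]


def build_commands(repo_root, python=sys.executable):
    """Return one interpreter command per analysis script."""
    analysis_dir = Path(repo_root) / "analysis"
    return [[python, str(analysis_dir / name)] for name in SCRIPTS]


def run_all(repo_root):
    """Run every script in order and return the names of those that failed."""
    failed = []
    for cmd in build_commands(repo_root):
        result = subprocess.run(cmd, cwd=repo_root)
        if result.returncode != 0:
            failed.append(Path(cmd[1]).name)
    return failed
```

Calling `run_all(".")` from the repository root inside the activated virtual environment returns an empty list when every script exits cleanly. Any name in the returned list identifies a script, and therefore a figure, worth reporting in an issue.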

Production source freeze

The nyxCore-Systems platform code as it ran the study is tagged in the production repo; the tag is recorded in artefacts/source-snapshot-pointer.md.

The landing-page source (specifically the mnemo.nyxcore.cloud dashboard as displayed during the study) is tagged in nyxCore-Systems/nyxcore-landingpage at the same tag.

Privacy / DSGVO

All data in data/ (post-collection) is anonymised before push. No tenant identity, no user identity, no free text. The full anonymisation protocol is in scripts/anonymize-export.sql, and the verification harness is in scripts/k-anonymity-check.sql. Both run on the production server before anything is exported to this repo.
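For readers unfamiliar with k-anonymity, the idea behind the verification step can be sketched in a few lines of Python: every combination of quasi-identifier values must be shared by at least k rows. This is an illustration only, not a port of scripts/k-anonymity-check.sql. The function name `k_anonymity_violations`, the column names `provider` and `week`, and the sample rows are assumptions, and the harness's separate n ≥ 10 threshold is not modelled here.

```python
from collections import Counter


def k_anonymity_violations(rows, quasi_identifiers, k=5):
    """Return quasi-identifier combinations shared by fewer than k rows.

    An empty result means the rows satisfy k-anonymity for the given
    columns. Hypothetical helper, written for illustration.
    """
    counts = Counter(
        tuple(row[col] for col in quasi_identifiers) for row in rows
    )
    return {combo: n for combo, n in counts.items() if n < k}


# Made-up rows: five share one combination, two share another.
rows = (
    [{"provider": "anthropic", "week": 1}] * 5
    + [{"provider": "ollama", "week": 1}] * 2
)

violations = k_anonymity_violations(rows, ["provider", "week"], k=5)
print(violations)
```

In this made-up example the two `ollama` rows form a group smaller than k = 5, so that combination would have to be suppressed or generalised before export.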

The DSGVO Art. 6(1)(f) balancing test is in pre-registration/dsgvo-balancing-test-2026-q2.md.

Live during data collection

While the study runs (2026-05-01 → 2026-05-29) the public dashboard at mnemo.nyxcore.cloud shows live aggregates (with the same anonymisation). This repo will not have intermediate commits during the collection window — that's by design, to prevent post-hoc analysis adjustments.

Licence

MIT. See LICENSE.

Contact

research@nyxcore.cloud — methodology questions, replication issues, collaboration enquiries.

About

Pre-registered cross-provider replication of Memento-Skills Reflective Learning in production. Methodology-first, DSGVO-anonymised, 28-day window, 8 LLM providers.
