Auto-install LFS transfer agent so clone/submodule-add work without manual setup #63

Description

@thevinchi

Auto-install LFS standalone-transfer-agent on first invocation so cloning / submodule-adding LFS repos works out of the box

Summary

When a user runs git clone s3://... (or git submodule add s3://...) against a repo that uses LFS, the initial checkout fails with:

batch request: ssh: Could not resolve hostname s3: Name or service not known: exit status 255

The README (LFS → Clone the repo) acknowledges this and instructs the user to recover with two manual steps:

cd lfs-repo-clone
git-lfs-s3 install
git reset --hard main

This exposes the seam between git-remote-s3 and git-lfs on a user's very first contact with the tool. It's especially painful for git submodule add, where:

  • The "fail, fix, retry" recipe is harder to apply (recovery requires deinit / re-add or a careful manual sequence).
  • A user can't even git submodule add cleanly without first running it with GIT_LFS_SKIP_SMUDGE=1, then running git-lfs-s3 install inside the submodule, then git lfs pull. That's three workarounds for one operation.

I'd like to propose that git-remote-s3 install the LFS standalone-transfer-agent config automatically the first time it runs against a repo, so that git clone / git submodule add work end-to-end without manual setup.

Happy to follow up with a PR once you confirm the direction. Related: #62 fixes the gitdir-resolution side of the submodule UX; this issue addresses the configuration side.

Why .lfsconfig won't work

The natural-looking answer is "let repo owners commit a .lfsconfig". This is not an option: git-lfs intentionally excludes lfs.customtransfer.<name>.path and lfs.standalonetransferagent from .lfsconfig's allowed keys. From git-lfs-config(5):

The set of keys allowed in this file is restricted for security reasons.

Allowing those keys in a file that's checked in would let a malicious repo execute an arbitrary binary on git checkout. So configuration has to live in the user's local/global git config — which is exactly the git-lfs-s3 install step.

Why auto-install during the remote helper's lifecycle is feasible

When git clone (or the clone phase of git submodule add) invokes git-remote-s3, git sets GIT_DIR to the new repo's gitdir before invoking the helper. The helper runs to completion (capabilities → list → fetch → unbundle) before git proceeds to checkout, which is when the LFS smudge filter runs. So the helper has a clean window to write to the local config and have it take effect before LFS needs it.

A git config --add invoked as a subprocess from the helper inherits GIT_DIR and writes to the right config file — including for submodules, where GIT_DIR resolves through the gitlink to <parent>/.git/modules/<path>/.
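To make the GIT_DIR inheritance concrete, here is a minimal sketch (not part of git-remote-s3) that creates a throwaway repo and writes a config key from a child process whose only hint about the target repo is the GIT_DIR environment variable. The names write_config_via_git_dir and demo are made up for illustration, and the sketch only covers a plain repository, not the submodule gitlink case.

```python
import os
import subprocess
import tempfile
from pathlib import Path


def write_config_via_git_dir(git_dir: Path, key: str, value: str) -> str:
    """Set `key` using only an inherited GIT_DIR (no -C, no --git-dir),
    then read it back from the repository-local config."""
    # Drop any GIT_* variables from the caller so only our GIT_DIR applies.
    env = {k: v for k, v in os.environ.items() if not k.startswith("GIT_")}
    env["GIT_DIR"] = str(git_dir)
    subprocess.run(["git", "config", "--add", key, value], env=env, check=True)
    return subprocess.check_output(
        ["git", "config", "--local", "--get", key], env=env, text=True
    ).strip()


def demo() -> tuple:
    """Create a throwaway repo and return (value read back, raw config text)."""
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp) / "repo"
        subprocess.run(["git", "init", "--quiet", str(repo)], check=True)
        value = write_config_via_git_dir(
            repo / ".git", "lfs.standalonetransferagent", "git-lfs-s3"
        )
        return value, (repo / ".git" / "config").read_text()
```

Calling demo() from any working directory should show the key landing in the temporary repo's .git/config, which is the same mechanism the helper would rely on during clone.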

Proposed behavior

In S3Remote.__init__ (or cmd_capabilities), call a small helper:

import os
import subprocess


def _maybe_install_lfs_config():
    """Set the LFS standalone-transfer-agent in the local repo's git config
    if not already configured. No-op if already set (to anything) so we never
    stomp a user's existing setup. Disable with GIT_REMOTE_S3_AUTO_INSTALL_LFS=0."""
    if os.environ.get("GIT_REMOTE_S3_AUTO_INSTALL_LFS", "1").lower() in ("0", "false", "no"):
        return
    try:
        # `git config --get` exits non-zero when the key is unset, which raises
        # CalledProcessError; we treat that as "not configured" and fall through.
        existing = subprocess.check_output(
            ["git", "config", "--get", "lfs.standalonetransferagent"],
            text=True, stderr=subprocess.DEVNULL,
        ).strip()
        if existing:
            return  # already configured (by us or someone else); don't touch
    except subprocess.CalledProcessError:
        pass
    # No --global/--system flag: both writes go to the repo-local config
    # that the inherited GIT_DIR points at.
    subprocess.run(
        ["git", "config", "--add", "lfs.customtransfer.git-lfs-s3.path", "git-lfs-s3"],
        check=False,
    )
    subprocess.run(
        ["git", "config", "--add", "lfs.standalonetransferagent", "git-lfs-s3"],
        check=False,
    )

Properties:

  • Idempotent: subsequent fetches don't re-add or duplicate entries.
  • Non-stomping: if a user has configured a different agent, we leave it alone.
  • Per-repo only: never writes to global config.
  • Harmless on non-LFS repos: git-lfs only invokes the standalone transfer agent when it has to transfer LFS objects, i.e. for paths that .gitattributes marks with filter=lfs, so setting it in a repo without LFS has no runtime effect.
  • Opt-out: GIT_REMOTE_S3_AUTO_INSTALL_LFS=0 skips it entirely.

git-lfs-s3 install stays as-is for users who want explicit control or cross-tool scripting.

What this enables

# Today
git clone s3://bucket/lfs-repo  # ← fails on smudge
cd lfs-repo
git-lfs-s3 install
git reset --hard main           # ← required to retry smudge

# After this change
git clone s3://bucket/lfs-repo  # ← just works

# Today
GIT_LFS_SKIP_SMUDGE=1 git submodule add s3://bucket/lfs-repo path
cd path && git-lfs-s3 install && git lfs pull && cd ..
git submodule absorbgitdirs   # or whatever's needed to finish init

# After this change
git submodule add s3://bucket/lfs-repo path  # ← just works

The README's "Clone the repo" section under LFS can lose its workaround.

Open questions:

  1. Scope of opt-out: env var is the lightest knob; alternatively a git-remote-s3.auto-install-lfs git config. Preference?
  2. Detect LFS first? I lean toward "always try, idempotent, harmless if not LFS" rather than inspecting bundle contents. Inspecting adds complexity for marginal benefit. Agree/disagree?
  3. Documentation tone: should git-lfs-s3 install remain documented as the canonical setup, with auto-install treated as a transparent convenience, or should the README lead with auto-install?
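For question 1, the two knobs could also coexist. The following is a rough sketch, assuming the env var takes precedence over a hypothetical git-remote-s3.auto-install-lfs config key and that auto-install is on when neither is set. The names read_git_bool and auto_install_enabled and the precedence order are my assumptions, not existing git-remote-s3 behavior.

```python
import os
import subprocess
from typing import Callable, Mapping, Optional


def read_git_bool(key: str) -> Optional[bool]:
    """Return the boolean value of `key` from git config, or None when the
    key is unset or not a valid boolean (git exits non-zero in both cases)."""
    result = subprocess.run(
        ["git", "config", "--type=bool", "--get", key],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip() == "true"


def auto_install_enabled(
    environ: Mapping[str, str] = os.environ,
    git_bool: Callable[[str], Optional[bool]] = read_git_bool,
) -> bool:
    """Decide whether to auto-install. Assumed precedence: env var first,
    then the hypothetical git config key, then the default (enabled)."""
    env_value = environ.get("GIT_REMOTE_S3_AUTO_INSTALL_LFS")
    if env_value is not None:
        return env_value.strip().lower() not in ("0", "false", "no")
    configured = git_bool("git-remote-s3.auto-install-lfs")
    return True if configured is None else configured
```

Passing environ and git_bool as parameters keeps the decision logic testable without touching a real repository. A persistent per-repo or global opt-out would then be `git config git-remote-s3.auto-install-lfs false`.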

If the direction looks right, I'm happy to follow up with a PR including tests (one covering the "config not set → install" path, one covering "existing config preserved", and one integration-ish test that exercises S3Remote.__init__ against a temp git repo).
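As a starting point for the first two tests, here is a sketch using unittest against a temporary repo. It carries a condensed copy of the proposed helper so it runs on its own; in the real PR the tests would import the helper instead. The class name, the local_values helper, and the isolation approach (pointing HOME and XDG_CONFIG_HOME at the temp dir so the user's global config is not consulted) are my assumptions.

```python
import os
import subprocess
import tempfile
import unittest
from pathlib import Path
from unittest import mock


def _maybe_install_lfs_config():
    # Condensed copy of the proposed helper, for a self-contained sketch.
    if os.environ.get("GIT_REMOTE_S3_AUTO_INSTALL_LFS", "1").lower() in ("0", "false", "no"):
        return
    probe = subprocess.run(
        ["git", "config", "--get", "lfs.standalonetransferagent"],
        capture_output=True, text=True, check=False,
    )
    if probe.stdout.strip():
        return  # already configured; leave it alone
    for key in ("lfs.customtransfer.git-lfs-s3.path", "lfs.standalonetransferagent"):
        subprocess.run(["git", "config", "--add", key, "git-lfs-s3"], check=False)


def local_values(key):
    """All values of `key` in the repo-local config selected by GIT_DIR."""
    out = subprocess.run(
        ["git", "config", "--local", "--get-all", key],
        capture_output=True, text=True, check=False,
    )
    return out.stdout.split()


class AutoInstallTests(unittest.TestCase):
    def setUp(self):
        self._tmp = tempfile.TemporaryDirectory()
        self.addCleanup(self._tmp.cleanup)
        repo = Path(self._tmp.name) / "repo"
        subprocess.run(["git", "init", "--quiet", str(repo)], check=True)
        # Point git at the temp repo and hide the user's own global/system config.
        isolated = {
            "GIT_DIR": str(repo / ".git"),
            "HOME": self._tmp.name,
            "XDG_CONFIG_HOME": self._tmp.name,
            "GIT_CONFIG_NOSYSTEM": "1",
        }
        patcher = mock.patch.dict(os.environ, isolated)
        patcher.start()
        self.addCleanup(patcher.stop)
        os.environ.pop("GIT_REMOTE_S3_AUTO_INSTALL_LFS", None)

    def test_installs_when_unset(self):
        _maybe_install_lfs_config()
        self.assertEqual(local_values("lfs.standalonetransferagent"), ["git-lfs-s3"])
        self.assertEqual(local_values("lfs.customtransfer.git-lfs-s3.path"), ["git-lfs-s3"])

    def test_second_call_adds_nothing(self):
        _maybe_install_lfs_config()
        _maybe_install_lfs_config()
        self.assertEqual(local_values("lfs.standalonetransferagent"), ["git-lfs-s3"])
        self.assertEqual(local_values("lfs.customtransfer.git-lfs-s3.path"), ["git-lfs-s3"])

    def test_existing_agent_preserved(self):
        subprocess.run(["git", "config", "lfs.standalonetransferagent", "other-agent"], check=True)
        _maybe_install_lfs_config()
        self.assertEqual(local_values("lfs.standalonetransferagent"), ["other-agent"])
        self.assertEqual(local_values("lfs.customtransfer.git-lfs-s3.path"), [])
```

Saved as a module, this can be run with python -m unittest. The third, integration-style test against S3Remote.__init__ is left out because it depends on how the class is constructed in the codebase.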
