file_read find_files() traverses hidden directories (.venv, .git, etc.) causing token explosion #405

@ranyakhemiri

Description

Bug

When file_read is called with a directory path, find_files() walks it with os.walk() and skips hidden files (file.startswith(".")), but it never filters directories. As a result, os.walk descends into hidden directories such as .venv, .git, and .mypy_cache, as well as non-hidden but bulky ones such as __pycache__ and node_modules.

Impact

A .venv directory (~156MB) caused the tool to collect ~39M tokens worth of file content, crashing the agent with:

prompt is too long: 1069752 tokens > 200000 maximum

Reproduction

  1. Create a project with a .venv directory (any standard Python venv)
  2. Call file_read(path=".", mode="find") or file_read(path="data/", mode="view"), where the path resolves to a directory that contains .venv
  3. The tool walks into .venv and returns thousands of files
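
The behavior can be checked without a real virtualenv. The sketch below is a standalone copy of the loop quoted under Root Cause, run against a throwaway directory; walk_like_issue and make_demo_project are hypothetical helper names used only for illustration, and this is not the actual strands-tools function.

```python
import os
import tempfile


def walk_like_issue(pattern, recursive=True):
    """Standalone copy of the loop quoted under Root Cause (not the real tool)."""
    matching_files = []
    for root, _dirs, files in os.walk(pattern):
        if not recursive and root != pattern:
            continue
        for file in sorted(files):
            if not file.startswith("."):  # hidden files are skipped...
                matching_files.append(os.path.join(root, file))
    # ...but nothing stopped os.walk from entering hidden directories.
    return matching_files


def make_demo_project(base):
    """Create src/main.py, a hidden top-level file, and a fake .venv with one file."""
    for rel in ("src/main.py", ".venv/lib/site.py", ".hidden_file"):
        path = os.path.join(base, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as fh:
            fh.write("# demo\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        make_demo_project(base)
        # The listing includes the file under .venv even though
        # the top-level hidden file is filtered out.
        for path in walk_like_issue(base):
            print(os.path.relpath(path, base))
```

In a real virtualenv the same effect applies to every non-hidden file under .venv, which is where the thousands of extra files come from.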

Root Cause

In find_files(), the directory branch:

for root, _dirs, files in os.walk(pattern):
    if not recursive and root != pattern:
        continue
    for file in sorted(files):
        if not file.startswith("."):  # Skip hidden files
            matching_files.append(os.path.join(root, file))

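Hidden files are skipped, but _dirs is never modified in place, so os.walk descends into every subdirectory. Per the Python docs, modifying dirnames in place (with the default topdown=True) is the standard way to prune os.walk traversal. Note also that when recursive is false, the continue statement only discards results from subdirectories; os.walk still traverses the whole tree.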
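
For illustration, here is a minimal sketch of in-place pruning applied to a loop shaped like the one above. The function name find_files_pruned is hypothetical, and clearing dirs in the non-recursive case is an extra assumption of this sketch rather than something the current code does.

```python
import os


def find_files_pruned(directory, recursive=True):
    """Hypothetical variant of the quoted loop that prunes hidden directories."""
    matching_files = []
    # topdown=True (the default) is required for in-place pruning to take effect.
    for root, dirs, files in os.walk(directory, topdown=True):
        if not recursive:
            # Stop descending entirely instead of walking and discarding.
            dirs[:] = []
        else:
            # Slice assignment mutates the list os.walk holds;
            # rebinding the name (dirs = [...]) would have no effect.
            dirs[:] = sorted(d for d in dirs if not d.startswith("."))
        for file in sorted(files):
            if not file.startswith("."):  # skip hidden files, as before
                matching_files.append(os.path.join(root, file))
    return matching_files
```

Sorting the pruned list also makes the traversal order deterministic, since os.walk visits subdirectories in the order they appear in dirs.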

Possible Fixes

Several approaches could work; the choice is left to the maintainers:

  • Minimal: Apply the same hidden-item logic to directories by renaming _dirs to dirs and adding dirs[:] = [d for d in dirs if not d.startswith(".")]. This alone catches .venv, .git, .mypy_cache, etc., though not non-hidden directories such as __pycache__ or node_modules.
  • Configurable: Add a FILE_READ_SKIP_DIRS env var (consistent with the existing FILE_READ_RECURSIVE_DEFAULT pattern)
  • Gitignore-aware: Use .gitignore rules to determine which directories to skip
  • Guard: Add a max file count or total size limit to prevent runaway traversals
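
The Configurable and Guard options could be combined roughly as follows. This is only a sketch under stated assumptions: FILE_READ_SKIP_DIRS is the variable proposed above and does not exist today, its comma-separated format and default value are guesses, and find_files_guarded, load_skip_dirs, TooManyFilesError, and the max_files default are all hypothetical names and values.

```python
import os

# Assumed default for the proposed (not yet existing) FILE_READ_SKIP_DIRS variable.
DEFAULT_SKIP_DIRS = "__pycache__,node_modules"


class TooManyFilesError(RuntimeError):
    """Raised when a traversal exceeds the configured file limit."""


def load_skip_dirs(environ=os.environ):
    """Parse FILE_READ_SKIP_DIRS as a comma-separated list of directory names."""
    raw = environ.get("FILE_READ_SKIP_DIRS", DEFAULT_SKIP_DIRS)
    return {name.strip() for name in raw.split(",") if name.strip()}


def find_files_guarded(directory, skip_dirs=None, max_files=1000):
    """Walk directory, pruning hidden and skipped dirs, with a file-count cap."""
    if skip_dirs is None:
        skip_dirs = load_skip_dirs()
    matching_files = []
    for root, dirs, files in os.walk(directory):
        # Prune in place so os.walk never enters these directories.
        dirs[:] = sorted(
            d for d in dirs if not d.startswith(".") and d not in skip_dirs
        )
        for file in sorted(files):
            if file.startswith("."):
                continue
            if len(matching_files) >= max_files:
                # Fail early instead of handing an oversized result to the agent.
                raise TooManyFilesError(
                    f"more than {max_files} files under {directory!r}"
                )
            matching_files.append(os.path.join(root, file))
    return matching_files
```

Raising an error is one possible design; truncating the list and reporting how many files were omitted would be another. A gitignore-aware variant would need a pattern-matching dependency and is not sketched here.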

Environment

  • strands-tools: latest (main)
  • Python 3.12
  • macOS
