Concurrent uploads with the same filename overwrite and delete each other's work #1018

@YizukiAme

Bug Description

/marker/upload derives its temporary storage path solely from the original multipart filename, so every request uploading report.pdf uses the same ./uploads/report.pdf path. Two overlapping requests with the same name race on the same file:

  1. The later request truncates and rewrites the earlier request's input
  2. One request can end up converting the other user's file
  3. Whichever request finishes first deletes the shared path via os.remove(upload_path), so the other request loses its input mid-conversion or fails when it tries to remove a file that no longer exists

In a multi-user deployment this causes cross-request data corruption and sporadic conversion failures.

Root Cause

marker/scripts/server.py L25-26, L145-158:

UPLOAD_DIRECTORY = "./uploads"
...
upload_path = os.path.join(UPLOAD_DIRECTORY, file.filename)
with open(upload_path, "wb+") as upload_file:
    file_contents = await file.read()
    upload_file.write(file_contents)
...
results = await _convert_pdf(params)
os.remove(upload_path)

Steps to Reproduce

  1. Start the marker server
  2. Send two concurrent POST requests to /marker/upload with the same filename report.pdf but different contents
  3. Observe that one request converts the wrong file, and the first to finish deletes the input for the second
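The race can also be modelled without a running server. The sketch below is a simplified stand-in for the handler, not the marker code itself: handle_upload and run_demo are hypothetical names, asyncio.sleep stands in for awaiting the conversion, and a temporary directory replaces ./uploads.

```python
import asyncio
import os
import tempfile


async def handle_upload(upload_dir, filename, contents, convert_delay):
    """Hypothetical model of the handler: write, 'convert', then delete."""
    upload_path = os.path.join(upload_dir, filename)
    with open(upload_path, "wb+") as upload_file:
        upload_file.write(contents)
    # Stand-in for awaiting the conversion; other requests run meanwhile.
    await asyncio.sleep(convert_delay)
    try:
        with open(upload_path, "rb") as f:
            seen = f.read()  # what this request would actually convert
    except FileNotFoundError:
        return None  # the other request already deleted the shared path
    os.remove(upload_path)
    return seen


async def _race(upload_dir):
    # Both requests use the same client-supplied filename.
    return await asyncio.gather(
        handle_upload(upload_dir, "report.pdf", b"contents of A", 0.05),
        handle_upload(upload_dir, "report.pdf", b"contents of B", 0.2),
    )


def run_demo():
    """Run two overlapping uploads and return what each one observed."""
    with tempfile.TemporaryDirectory() as upload_dir:
        return asyncio.run(_race(upload_dir))
```

In this model the first request ends up reading the second request's bytes, and the second request finds its input already deleted.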

Expected Behavior

Each upload should be stored independently with a unique per-request temporary filename. Concurrent uploads with the same original filename should not interfere.

Suggested Fix

Use tempfile.NamedTemporaryFile or a UUID-based name under UPLOAD_DIRECTORY, and clean up in a finally block:

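import os
import uuid

# Unique per-request name; never reuse the client-supplied filename as the path.
safe_name = f"{uuid.uuid4().hex}.pdf"
upload_path = os.path.join(UPLOAD_DIRECTORY, safe_name)
try:
    with open(upload_path, "wb") as upload_file:
        upload_file.write(await file.read())
    # ... build params as before, pointing at upload_path ...
    results = await _convert_pdf(params)
finally:
    # Runs even if conversion raises, so no stale upload is left behind.
    if os.path.exists(upload_path):
        os.remove(upload_path)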
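The tempfile.NamedTemporaryFile variant could look roughly like the sketch below. temporary_upload is a hypothetical helper, not an existing marker function, and the ".pdf" default suffix is an assumption; delete=False is used so the file can be closed and reopened by the converter before it is removed.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def temporary_upload(contents: bytes, upload_dir: str, suffix: str = ".pdf"):
    """Hypothetical helper: store one upload under a unique name, then remove it."""
    os.makedirs(upload_dir, exist_ok=True)
    # tempfile picks a name that does not collide with other requests.
    tmp = tempfile.NamedTemporaryFile(dir=upload_dir, suffix=suffix, delete=False)
    try:
        with tmp:  # closes the handle; the file stays on disk (delete=False)
            tmp.write(contents)
        yield tmp.name
    finally:
        # Clean up whether or not the caller's conversion succeeded.
        with contextlib.suppress(FileNotFoundError):
            os.remove(tmp.name)
```

Inside the handler this would be used as "with temporary_upload(file_contents, UPLOAD_DIRECTORY) as upload_path:", with the conversion running inside the block so that cleanup always happens on exit.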
