Invalid Input Error after manual mbox import #402

Description

@estroger34

I got the following error message after importing from an mbox file. The suggested solution did not help, because I could not tell how or where to enable ignore_errors. Neither import-mbox nor build-cache has such an argument.

Import complete.
  Imported:           E:\eml_to_mbox\tropos.mbox
  Processed:      158997 messages
  Added:          158997 messages
  Updated:        0 messages
  Skipped:        0 messages
  Labels updated: 0 messages
  Errors:         0
Warning: cache rebuild failed: create view sqlite_db.attachments: Invalid Input Error: CSV Error on Line: 3447
Original Line: 8183,239360,Plakat - DFP.pdf
Invalid unicode (byte sequence mismatch) detected.

Possible Solution: Enable ignore errors (ignore_errors=true) to skip this row

  file = E:/msgvault/.cache-tmp-3540984679/attachments.csv
  delimiter = , (Auto-Detected)
  quote = " (Auto-Detected)
  escape = \0 (Auto-Detected)
  new_line = Single-Line File (Auto-Detected)
  header = true (Set By User)
  skip_rows = 0 (Auto-Detected)
  comment = \0 (Auto-Detected)
  date_format =  (Auto-Detected)
  timestamp_format =  (Auto-Detected)
  null_padding = 0
  sample_size = 20480
  ignore_errors = false
  all_varchar = 0

Run 'msgvault build-cache' to retry.
time=2026-06-18T06:01:25.827+02:00 level=INFO msg="msgvault exit" run_id=daa5abaecea4 outcome=ok

I did find a solution in the similar issue #95. It would be nice, though, if the error message suggested it, or explained where "ignore_errors=true" is supposed to be set.
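In case it helps anyone else debugging this: the message format suggests that ignore_errors is an option of the underlying CSV reader rather than a msgvault flag, which would explain why neither command accepts it. To see which rows actually contain bad bytes, something like the following could be used. This is only a rough sketch, not part of msgvault; the function name and the sample data are made up for illustration, and reported line numbers can be off if quoted fields contain newlines.

```python
def find_invalid_utf8_lines(data: bytes) -> list[int]:
    """Return 1-based numbers of lines that are not valid UTF-8.

    Hypothetical helper for illustration only. Splitting on b"\\n" is safe
    because the byte 0x0A never occurs inside a UTF-8 multi-byte sequence.
    """
    bad = []
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(lineno)
    return bad


# Made-up sample: the third line contains a Latin-1 encoded "ä" (0xE4),
# which is not a valid UTF-8 sequence on its own.
sample = b"id,name\n1,ok.pdf\n2,Pl\xe4n.pdf\n"
bad_lines = find_invalid_utf8_lines(sample)
print(bad_lines)
```

For a real file, the bytes would be read with open(path, "rb").read(), where path points at a copy of the CSV (the temporary cache directory from the error may no longer exist after the run).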

Running msgvault repair-encoding found and repaired 73 invalid attachments.filename values, and the cache rebuild worked afterwards.
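I do not know how repair-encoding works internally, but for readers wondering what "repairing" an invalid UTF-8 filename can mean, here is a minimal sketch of one common approach: keep the bytes if they already decode as UTF-8, otherwise assume a legacy Windows-1252 encoding, and as a last resort replace the undecodable bytes. The function name and the fallback order are my assumptions, not msgvault's actual behaviour.

```python
def repair_utf8(raw: bytes) -> str:
    """Decode raw bytes to text, repairing invalid UTF-8 where needed.

    Hypothetical sketch; the fallback order is an assumption.
    """
    try:
        # Already valid UTF-8: nothing to repair.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    try:
        # Many old mail clients wrote filenames as Windows-1252.
        return raw.decode("cp1252")
    except UnicodeDecodeError:
        # Last resort: keep what decodes, substitute U+FFFD for the rest.
        return raw.decode("utf-8", errors="replace")


print(repair_utf8(b"Plakat - DFP.pdf"))
print(repair_utf8(b"Pl\xe4n.pdf"))
```

The trade-off is that the Windows-1252 guess can be wrong for mail written in other legacy encodings, so a real tool would probably prefer the charset declared in the message headers when one is available.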

Thanks for a great piece of software!

Scanning messages for invalid UTF-8...
Scanned 100000 messages...
No messages needed repair
Scanning message_recipients display names for invalid UTF-8...
Scanned 100000 message_recipients display names...
Scanned 200000 message_recipients display names...
Scanned 300000 message_recipients display names...
Scanned 400000 message_recipients display names...
Scanned 500000 message_recipients display names...
Scanning participants display names for invalid UTF-8...
Scanning labels.name for invalid UTF-8...
Scanning attachments.filename for invalid UTF-8...
Repaired 73 attachments.filename values
Scanning conversations.title for invalid UTF-8...
Scanning conversations.source_conversation_id for invalid UTF-8...
Scanning participants.email_address for invalid UTF-8...
Scanning participants.domain for invalid UTF-8...

=== Repair Summary ===
  Filenames:     73
  Total fields:  73
Full rebuild: clearing existing cache...
Building cache...
  messages...               done (425ms)
  message_recipients...     done (209ms)
  message_labels...         done (3ms)
  attachments...            done (37ms)
  participants...           done (40ms)
  labels...                 done (3ms)
  sources...                done (3ms)
  conversations...          done (145ms)
  Total:                    866ms

Analytics cache rebuilt.
