Skip to content

Improve ZIP subtype detection and naming#69

Merged
PrzemyslawKlys merged 2 commits intomasterfrom
codex/zip-subtype-detection
Apr 6, 2026
Merged

Improve ZIP subtype detection and naming#69
PrzemyslawKlys merged 2 commits intomasterfrom
codex/zip-subtype-detection

Conversation

@PrzemyslawKlys
Copy link
Copy Markdown
Member

What changed

  • improved ZIP-derived subtype detection so apk, jar, epub, and related formats are recognized as their real type instead of collapsing to generic ZIP
  • expanded friendly-name coverage and declared-type comparison so operator-facing output and extension validation stay aligned
  • added regression tests covering subtype detection and naming

Why

ZIP-derived container formats were still easy to flatten back to zip, which made reporting less precise and could cause declared-type mismatch noise.

Impact

FileInspectorX now returns more accurate type labels for ZIP-derived formats, which gives downstream systems cleaner detection, comparison, and reporting data.

Validation

  • dotnet test FileInspectorX.Tests\\FileInspectorX.Tests.csproj

@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, add credits to your account and enable them for code reviews in your settings.

@claude
Copy link
Copy Markdown

claude bot commented Apr 5, 2026

PR Review: Improve ZIP subtype detection and naming

Overall this is a clean, well-motivated change. The problem being solved is real, the implementation is mostly sound, and the tests are a good start. A few things worth addressing before merge:


Bug / Logic concern

FileInspector.DeclaredComparison.cs — early-return guard relocated

The original code had an early-return guard immediately after populating cmp from baseCmp:

if (detected == null || string.IsNullOrEmpty(decl))
    return cmp;

The new code inserts the guessOnly block before that guard, so the guard is now further down. This is safe at runtime because the new block's own null-checks (!string.IsNullOrEmpty(detGuess)) prevent execution when detected == null. However, it makes the control flow harder to reason about at a glance — someone reading the method will see the guard after the block and wonder if it can be reached with a null detected. Consider keeping the early-return guard in its original position and putting the new guessOnly logic after it:

if (detected == null || string.IsNullOrEmpty(decl))
    return cmp;

// New guessOnly block here — detected is guaranteed non-null
if (!string.IsNullOrEmpty(detGuess) && ...)
{ ... }

Test gap

Detect_Apk_ByZipSubtype_UsesSubtypeMime_And_DeclaredComparison — only APK is exercised

The PR description calls out apk, jar, epub, and related formats, but only APK has an end-to-end detection test. Given that TryGuessZipSubtype presumably has separate entry-content checks per format, a regression test for at least one other subtype (JAR via META-INF/MANIFEST.MF, EPUB via mimetype) would make the coverage match the stated scope.


Test assertion oddity

In the same test:

Assert.Equal("zip", res!.Extension);       // base type stays "zip"
Assert.Equal("apk", res.GuessedExtension); // subtype in GuessedExtension
Assert.Equal("application/vnd.android.package-archive", res.MimeType); // subtype MIME

The MimeType is now the subtype MIME while Extension stays "zip". This is intentional per the new code path, but the combination (Extension = "zip", MimeType = APK MIME) could surprise callers who expect MimeType to correspond to Extension. A brief comment in the test or in FileInspector.cs explaining this intentional split would help future readers.


Minor: duplicate xap handling

"xap" appears in both the new subtype switch in FriendlyNames.cs and in the existing extension-first switch. Since the subtype switch runs first the extension branch is unreachable for xap. Not a bug, just dead code worth removing to keep the two paths non-overlapping.


Minor: missing space in label string

In FileInspector.cs, the pattern detGuess! + "(guess)" (no space before the parenthesis) was already present in the original code, and the new block copies it. This is a pre-existing style inconsistency; low priority, but worth a one-line cleanup while the code is being touched.


What's good

  • The fallback logic in FileInspector.cs (MimeMaps.TryGetByExtension when guessMime is empty) is a sensible defensive layer.
  • The FriendlyNames expansion is a straightforward, easy-to-audit lookup table.
  • The theory-driven GetTypeLabel_Returns_Friendly_Label_For_Additional_Specialized_Types test is a nice pattern for covering many labels concisely.
  • The PR is scoped tightly; no unrelated changes.

Summary: The early-return relocation is the only issue worth fixing before merging. The test coverage gap (JAR/EPUB) and the xap dead code are nice-to-have improvements. Everything else is minor.

@PrzemyslawKlys PrzemyslawKlys merged commit 8e5d14c into master Apr 6, 2026
4 checks passed
@PrzemyslawKlys PrzemyslawKlys deleted the codex/zip-subtype-detection branch April 6, 2026 13:38
@claude
Copy link
Copy Markdown

claude bot commented Apr 6, 2026

PR Review: Improve ZIP subtype detection and naming

Good improvement overall — the detection pipeline for ZIP-derived formats is meaningfully more precise, and the added tests cover the key cases. A few things worth addressing:


Bug / Potential Regression

xap removed from extension switch without a full safety net (FriendlyNames.cs)

The case "xap" entry was removed from the extension-based switch and is now handled only by the subtype switch. If a .xap file is detected without going through the ZIP subtype path (e.g., a corrupted or truncated archive that fails the inner ZIP parse), GetTypeLabel will fall through the subtype switch with no match and then miss the extension switch entry — returning null instead of "Silverlight application package (XAP)". Since xap is a legitimate extension, it should be retained in the extension switch as a fallback.


Dead Code

detLabel assignment in the new CompareDeclared block is never observed (FileInspector.cs, lines ~129–136)

if (!string.IsNullOrEmpty(detGuess) &&
    !Equivalent(decl, det) &&
    Equivalent(decl, detGuess!))
{
    det = detGuess!;
    detLabel = detGuess! + " (guess)";  // ← never used
}

After this block, det == detGuess and Equivalent(decl, det) is guaranteed true, so mismatch will be false and the reason will always be "match"detLabel only appears in the mismatch branch ($"decl:{decl} vs det:{detLabel}"). The detLabel assignment can be removed.


Redundant Null Check

In CompareDeclaredDetailed (FileInspector.DeclaredComparison.cs, ~line 38):

if (!string.IsNullOrEmpty(decl) &&       // ← redundant
    !string.IsNullOrEmpty(detGuess) &&
    ...)

There's already an early return if (detected == null || string.IsNullOrEmpty(decl)) return cmp; four lines above, so decl is guaranteed non-empty here. Minor, but removes noise.


Missing Test Coverage

IPA subtype MIME fallback is untested. TryGuessZipSubtype explicitly sets mime = null for the IPA case. The new fallback then calls MimeMaps.TryGetByExtension("ipa", ...). If MimeMaps doesn't have an IPA entry, the resulting MimeType will silently remain application/zip. A test similar to the APK/JAR ones would pin this behavior and catch any MimeMaps gap.

Also, epub and odt/ods/odp (detected via the mimetype embedded file) are not covered by new tests, though they are lower priority.


Minor Nit

Inconsistent brace style in test file. The new tests use Allman style while the surrounding existing tests use K&R/inline style. The reformatting of Detect_Docx_ByZipRefinement mixes this within the same method. Not blocking, but a one-time pass to normalize the file would reduce future diff noise.


Summary

Severity Item
Low-medium xap removed from extension fallback — potential silent regression
Low Dead detLabel assignment in CompareDeclared new block
Low Redundant !string.IsNullOrEmpty(decl) guard
Low IPA MIME fallback behavior untested
Nit Mixed brace style in test file

The core logic — promoting guessMime to the detection result and upgrading DetectedExtension in the detailed comparison — is correct and well-reasoned.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant