-
Notifications
You must be signed in to change notification settings - Fork 9
feat(backend): restrict allowed filenames #5778
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Code Review: feat(backend): restrict allowed filenamesSummaryThis PR adds robust filename validation to prevent issues with cross-platform compatibility. The implementation is well-structured with comprehensive test coverage. Overall, this is a solid implementation with just a few minor suggestions for improvement. ✅ Strengths
🔍 Issues & Suggestions1. Potential Character Encoding Issue (Minor)Location: The validation checks Suggestion: Consider whether you want to validate byte length instead: if (filename.toByteArray(Charsets.UTF_8).size > 255) {
throw UnprocessableEntityException(
"Invalid filename '$filename' in category '$category': Filenames may not exceed 255 bytes when encoded as UTF-8",
)
}However, this might be overly restrictive for the use case. If the files are primarily stored in S3 and downloaded via HTTP (which your PR description suggests), character length may be sufficient. This deserves consideration/discussion. 2. Special Windows Reserved Names Not Checked (Low Priority)Location: Windows has reserved filenames (CON, PRN, AUX, NUL, COM1-9, LPT1-9) that cannot be used regardless of extension. These would cause issues on Windows systems. Example problematic filenames: Reference: Your own documentation link mentions this: https://learn.microsoft.com/en-us/windows/win32/fileio/naming-a-file Suggestion: Add validation like: private val WINDOWS_RESERVED_NAMES = setOf(
"CON", "PRN", "AUX", "NUL",
"COM1", "COM2", "COM3", "COM4", "COM5", "COM6", "COM7", "COM8", "COM9",
"LPT1", "LPT2", "LPT3", "LPT4", "LPT5", "LPT6", "LPT7", "LPT8", "LPT9"
)
// In validateFilename():
val nameWithoutExtension = filename.substringBeforeLast('.', filename).uppercase()
if (nameWithoutExtension in WINDOWS_RESERVED_NAMES) {
throw UnprocessableEntityException(
"Invalid filename '$filename' in category '$category': This is a reserved name on Windows systems.",
)
}That said, this may be lower priority if files are typically accessed via web download rather than Windows filesystems. 3. Trailing Dots and Spaces (Low Priority)Location: Windows automatically strips trailing dots and spaces from filenames, which can cause confusion. For example, Suggestion: if (filename.endsWith('.') || filename.endsWith(' ')) {
throw UnprocessableEntityException(
"Invalid filename '$filename' in category '$category': Filenames may not end with dots or spaces.",
)
}4. Test Assertions Could Be More SpecificLocation: The tests use Suggestion: @Test
fun `filename with asterisk should fail validation`() {
val fileMapping = createFileMapping("sequences", listOf("file*.txt"))
val exception = assertThrows<UnprocessableEntityException> {
validator.validateFilenameCharacters(fileMapping)
}
assertTrue(exception.message!!.contains("forbidden characters"))
}5. Edge Case: Filename with Only Dots (Minor)Location: Filenames like Suggestion: Add validation: if (filename.all { it == '.' }) {
throw UnprocessableEntityException(
"Invalid filename '$filename' in category '$category': Filenames may not consist only of dots.",
)
}📊 Performance ConsiderationsNo concerns. The validation is O(n) where n is the filename length, which is acceptable given the 255-character limit. 🔒 Security ConsiderationsGood: The PR properly prevents path traversal characters ( Consider: If these filenames are ever used in shell commands, URL paths, or Content-Disposition headers, additional validation might be needed, but that should be handled at those specific points, not here. 📝 Documentation & Conventions✅ Follows conventional commit format: 🧪 Testing RecommendationsConsider adding integration tests that actually attempt to:
The unit tests are excellent, but integration tests would provide additional confidence. 🎯 VerdictRecommendation: Approve with optional follow-ups The core functionality is solid and ready to merge. The suggestions above are mostly edge cases that could be addressed in follow-up PRs if deemed necessary. The most important suggestion to consider is #1 (byte length vs character length) as it directly relates to the documented restriction. Great work on the comprehensive test coverage! 🎉 |
resolves #3999
This PR adds backend validation to prevent filenames to contain characters that are not allowed in some of the major operating systems, ASCII control characters and be longer than 255 characters. For the reason mentioned in #3999 (comment), this is not taking the restrictions of
Content-Disposition'sfilename=into account.Screenshot
PR Checklist
*in the name.🚀 Preview: https://filename-restriction.loculus.org