@mrk-its mrk-its commented Nov 12, 2019

Motivation:

The S3FS backend does not work well if the bucket contains a file structure without proper directory markers, even in non-strict mode. This patch skips a few directory checks when strict=False (in the default strict mode, the old behaviour should be preserved) and makes my integration tests pass.

It does the same as #51 but in a few more places.

It should fix #55, #52 and #57 in non-strict mode.
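To illustrate the difference described above, here is a minimal sketch using a toy model of a bucket as a flat set of keys. The function names and the trailing-slash marker convention are assumptions made for illustration only; this is not the actual S3FS code.

```python
# Toy model of an S3 bucket: a flat set of object keys, no real directories.
# The "dir/" marker convention and both checks are illustrative assumptions,
# not the actual S3FS implementation.

def isdir_strict(keys, path):
    """Strict check: the directory exists only if its marker object exists."""
    marker = path.strip("/") + "/"
    return marker in keys


def isdir_non_strict(keys, path):
    """Non-strict check: any key under the prefix implies the directory."""
    prefix = path.strip("/") + "/"
    return any(key.startswith(prefix) for key in keys)


# Objects uploaded by another tool, with no directory markers at all.
bucket = {"logs/2019/11/app.log", "logs/2019/12/app.log"}

print(isdir_strict(bucket, "logs/2019"))      # False
print(isdir_non_strict(bucket, "logs/2019"))  # True
```

In this model, a strict check fails for every intermediate "directory" of a marker-less bucket, which is roughly the situation the linked issues describe.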

@mrk-its mrk-its force-pushed the non_strict_mode_improvements branch from 4986e8f to 5581a9e Compare November 12, 2019 23:02
@mrk-its mrk-its changed the title be less strict in non-strict mode Be less strict in non-strict mode Nov 12, 2019
@mrk-its
Author

mrk-its commented Nov 12, 2019

@willmcgugan could you take a look, please?

@mrk-its
Author

mrk-its commented Nov 12, 2019

Also, because we don't care about directory markers in non-strict mode, I am thinking about replacing the makedir / makedirs implementations with a 'do nothing' implementation. This way we can write code that works across many filesystem protocols without creating unnecessary directory markers on S3. For example, we can precede the creation of a file with makedirs(path, recreate=True): it will create the required directories on sftp / file / ftp filesystems and do nothing on S3 in non-strict mode. What do you think about that?
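A rough sketch of the idea, using two hypothetical stand-in backends rather than real filesystem classes. Only the makedirs(path, recreate=True) call shape comes from the comment above; the class names, the save_report helper, and the simplified strict behaviour are assumptions.

```python
# Hypothetical stand-ins for two backends; neither class is part of
# PyFilesystem or S3FS. Only the makedirs(path, recreate=True) call shape
# is taken from the proposal.

class DirTrackingFS:
    """Behaves like a local/FTP/SFTP filesystem: parents must exist."""

    def __init__(self):
        self.dirs = {""}
        self.files = {}

    def makedirs(self, path, recreate=False):
        path = path.strip("/")
        if path in self.dirs and not recreate:
            raise FileExistsError(path)
        parts = path.split("/")
        for i in range(1, len(parts) + 1):
            self.dirs.add("/".join(parts[:i]))

    def writetext(self, path, text):
        path = path.strip("/")
        parent = path.rpartition("/")[0]
        if parent not in self.dirs:
            raise FileNotFoundError(parent)
        self.files[path] = text


class MarkerlessS3LikeFS:
    """Flat key store; in non-strict mode makedirs writes no marker objects."""

    def __init__(self, strict=False):
        self.strict = strict
        self.keys = {}

    def makedirs(self, path, recreate=False):
        if self.strict:
            # Simplified: write an empty marker for the leaf directory only.
            self.keys[path.strip("/") + "/"] = ""
        # Non-strict: the proposed 'do nothing' behaviour.

    def writetext(self, path, text):
        self.keys[path.strip("/")] = text


def save_report(filesystem, path, text):
    """Backend-agnostic caller: ensure the parent exists, then write."""
    parent = path.strip("/").rpartition("/")[0]
    filesystem.makedirs(parent, recreate=True)
    filesystem.writetext(path, text)


local_like = DirTrackingFS()
s3_like = MarkerlessS3LikeFS(strict=False)
for backend in (local_like, s3_like):
    save_report(backend, "reports/2019/summary.txt", "ok")

print(sorted(local_like.dirs))  # ['', 'reports', 'reports/2019']
print(sorted(s3_like.keys))     # ['reports/2019/summary.txt']
```

The same caller works on both backends, and the S3-like store ends up with only the file object and no placeholder keys.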

@willmcgugan
Member

@mrk-its I don't see the benefit in that, apart from avoiding the extra work of creating directories. Unless there is some major bottleneck there, I would prefer that directories created in non-strict mode were still there when the filesystem is opened in strict mode.

@mrk-its
Author

mrk-its commented Nov 17, 2019

@willmcgugan On my production S3 buckets I simply do not have these directory markers at all. Instead, I see a lot of empty files with the suffix _$folder$, which is another way of marking directories, used by Apache Hadoop. I don't see any benefit in having yet another kind of placeholder file, especially in non-strict mode. But I can probably live with them (simply ignore my latest comment).
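For illustration, a small sketch of how the two marker styles mentioned here could be told apart when scanning a key listing. The helper name is hypothetical, and treating a trailing slash as the other marker style is an assumption, not something confirmed in this thread.

```python
# Hypothetical helper: classify an object key as a directory placeholder.
# The trailing-slash convention is an assumption; "_$folder$" is the
# Hadoop-style suffix mentioned in the comment.

HADOOP_SUFFIX = "_$folder$"


def directory_marker_kind(key):
    """Return 'slash', 'hadoop', or None for an ordinary object key."""
    if key.endswith("/"):
        return "slash"
    if key.endswith(HADOOP_SUFFIX):
        return "hadoop"
    return None


listing = ["data_$folder$", "data/part-0000", "archive/", "archive/a.txt"]
for key in listing:
    print(key, directory_marker_kind(key))
```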

What about changes in this PR?

@davidparks21

I also have this issue. I am working with a shared bucket where creating extra metadata objects would be an unfortunate complication, and it is a convention that other people accessing the bucket via the CLI tools would certainly not follow.

@nivm

nivm commented Feb 17, 2020

@willmcgugan
I understand that you don't see the benefit in that, apart from avoiding the extra work of creating directories.

However, you don't always control the S3 bucket that you connect to; you may only have read access to it.

What is the downside of merging one of these PRs, #60 or #51?

@willmcgugan
Member

@nivm I have a backlog of PRs and issues to look through, but fundamentally the problem is satisfying everyone's use case. It may not even be possible, given how S3 isn't quite a real filesystem.

Development

Successfully merging this pull request may close these issues.

Error globbing on S3FS

4 participants