

@martindurant (Member)

Fixes #988

@ischurov, can you try this?

@ischurov commented Oct 7, 2025

Now I have this output (for the same example as here):

s3fs.__version__='2024.10.0+31.ge8f64a8'
['<test bucket>/a/', '<test bucket>/b', '<test bucket>/b/', '<test bucket>/c/']

Note that the output still differs from what I see if I do not invoke s3.find before s3.ls:

s3fs.__version__='2024.10.0+31.ge8f64a8'
['<test bucket>/a', '<test bucket>/b', '<test bucket>/c']

I don't know whether this is expected behavior.

Also, my initial problem with missing directories is still present.

@martindurant (Member, Author)

Hmm, I'll try again when I have the time. I agree the two outputs should be the same... Whether ls() should show the placeholders, I'm not sure, since they are not "within" the listed directory (i.e., "<listed_path>/..."). I should definitely have tests around this!
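The expectation that both call orders should produce the same listing can be expressed as a normalization step. Below is a minimal pure-Python sketch. It assumes a listing is simply a list of path strings in which a placeholder object shows up with a trailing "/", as in the outputs above. `normalize_listing` is a hypothetical helper for illustration, not part of s3fs, and the bucket name is a stand-in.

```python
def normalize_listing(paths):
    """Collapse placeholder entries ("bucket/b/") onto their directory name ("bucket/b").

    Hypothetical helper for illustration only; it is not part of s3fs.
    The order of first appearance is preserved.
    """
    seen = set()
    result = []
    for path in paths:
        name = path.rstrip("/")  # "bucket/b/" and "bucket/b" map to the same entry
        if name not in seen:
            seen.add(name)
            result.append(name)
    return result


# Shapes taken from the two outputs reported above (bucket name is a placeholder).
after_find = ["test-bucket/a/", "test-bucket/b", "test-bucket/b/", "test-bucket/c/"]
fresh = ["test-bucket/a", "test-bucket/b", "test-bucket/c"]
print(normalize_listing(after_find) == normalize_listing(fresh))  # True
```

Under this assumption, the listing produced after find() collapses to the same three entries as the fresh ls(), which is the invariant a regression test could check.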

@martindurant (Member, Author)

@ischurov, the latest commit is my reproducer, so now I should have something concrete I can fix. Part of the problem is that s3fs will not create any file ending in "/". That seems like a sane, if accidental, safeguard, but also inconvenient for some people.

@martindurant changed the title from "Prevent duplicated entries in find() in presence of directory markers" to "Prevent duplicated entries in find() in presence of directory placeholders" on Oct 17, 2025
@martindurant (Member, Author)

@ischurov: I think I have it. Can you please test it for your use case?

@ischurov commented Oct 22, 2025

Now I get this output if I call s3.find before s3.ls:

s3fs.__version__='2024.10.0+33.g3ade880'
['<test bucket>/a', '<test bucket>/b']

As before, if I do not call s3.find before s3.ls, the output is this:

s3fs.__version__='2024.10.0+33.g3ade880'
['<test bucket>/a', '<test bucket>/b', '<test bucket>/c']

So the duplication seems to be fixed, but my initial problem (a disappearing directory) now reproduces in this setting, where we didn't see it previously.

Here are the directory bucket's contents:

(0) $ aws s3 ls s3://<test bucket>/ --recursive
2025-10-06 14:50:36          0 a/
2025-10-08 19:56:42         29 a/hello.txt
2025-10-06 14:50:41          0 b/
2025-10-06 14:51:53     980388 b/Introductie pinda en ei bij baby's.pdf
2025-10-06 14:50:44          0 c/

One possible reason c/ disappears in this case is that it is an empty folder. However, we still see it if we don't call s3.find before s3.ls, and in my initial incident (in a production setting) non-empty folders disappeared as well.
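To make the expected result concrete, the top-level entries of the bucket can be derived from a flat recursive key listing like the one above. This is a pure-Python illustration, not the s3fs algorithm. It assumes keys are relative to the bucket and that a key ending in "/" is a directory placeholder. `children_of_root` is a hypothetical helper. The sketch shows that c/ should still be reported as a directory even though nothing lives under it.

```python
def children_of_root(keys):
    """Map each top-level name in a flat key listing to "directory" or "file".

    Hypothetical helper for illustration only; it is not part of s3fs.
    A placeholder key such as "c/" makes "c" a directory even if it is empty.
    """
    entries = {}
    for key in keys:
        head, sep, _rest = key.partition("/")
        if not head:
            continue  # skip keys with an empty first component (e.g. "/x")
        kind = "directory" if sep else "file"
        # Once a name has been seen as a directory, keep it as a directory.
        if entries.get(head) != "directory":
            entries[head] = kind
    return dict(sorted(entries.items()))


# Keys copied from the `aws s3 ls --recursive` output above.
keys = [
    "a/",
    "a/hello.txt",
    "b/",
    "b/Introductie pinda en ei bij baby's.pdf",
    "c/",
]
print(children_of_root(keys))  # {'a': 'directory', 'b': 'directory', 'c': 'directory'}
```

A test built on this idea would compare ls() of the bucket root, with and without a preceding find(), against these three directories.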

@martindurant (Member, Author)

OK, I've updated the test so that it fails in the way you describe; I'll have to look again.

@martindurant (Member, Author)

I tried again :)

@ischurov

Thanks! Now my second test case passes, but the initial one (where I first noticed the directory disappearing) still fails. So my assumption that both have the same cause was incorrect, and I need to make a clean reproducer of my initial test case so it can be debugged.

@ischurov

Added the test case; it reproduces at 9a911bb.

@martindurant (Member, Author)

Thanks, I'll incorporate that as an additional test here and then fix it. I probably won't get to it until Monday, though, if someone else wants to try a fix in the meantime.



Development

Successfully merging this pull request may close these issues.

glob with ** leads to incorrect listings_cache state (missing and duplicated directories listings)

2 participants