Books dataset #21
# Books Dataset

The books.json is a subset from the openlibrary [books datasets](https://openlibrary.org/developers/dumps)
I think we would need to add the CC0 1.0 Universal license here: https://openlibrary.org/help/faq/using#ownership
@Haroenv To the best of my knowledge, the following rules apply under the CC0 1.0 Universal license:
- You may use the dataset for commercial purposes.
- No need to cite or reference the license.
- Attribution is optional, not required.
@Haroenv If you insist, I will add a copy in the folder. Please advise.
Thanks for digging in on the licensing, Ankur. Based on your research I agree with you.
Hey @originalankur, thanks for the PR. I had a look at the content of the file, and I'm afraid some of the books might contain sensitive content (at least one suspicious case of doxxing, and mentions of child pornography) that we don't really want in our public list of data. I cleaned the list and shrank the number of books to ~24k rather than ~33k (which also puts the file size at 49MB, right below GitHub's suggested 50MB limit). Can you pull it in to replace your version, please?
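For anyone repeating this kind of cleanup on another subset, here is a minimal sketch of one possible approach. It is not the script used for this PR. It assumes the dataset loads as a JSON array of objects, and the names `BLOCKED_TERMS`, `is_clean`, `filter_books`, and `serialized_size` are hypothetical. A keyword pass like this only narrows things down; flagged and borderline records would still need a manual look.

```python
import json

# Hypothetical placeholder; a real cleanup would need a curated term list
# and manual review of the results.
BLOCKED_TERMS = ["example-blocked-term"]

# The 50MB figure mentioned in the comment above.
SIZE_LIMIT_BYTES = 50 * 1024 * 1024


def is_clean(record, blocked_terms):
    """Return True if no blocked term appears in the serialized record.

    The whole record (keys and values) is serialized and searched
    case-insensitively, so no particular field layout is assumed.
    """
    text = json.dumps(record, ensure_ascii=False).lower()
    return not any(term.lower() in text for term in blocked_terms)


def filter_books(records, blocked_terms):
    """Keep only the records that contain none of the blocked terms."""
    return [r for r in records if is_clean(r, blocked_terms)]


def serialized_size(records):
    """Size in bytes of the records when written out as UTF-8 JSON."""
    return len(json.dumps(records, ensure_ascii=False).encode("utf-8"))


if __name__ == "__main__":
    sample = [
        {"title": "A harmless book"},
        {"title": "Contains Example-Blocked-Term in the title"},
    ]
    kept = filter_books(sample, BLOCKED_TERMS)
    print(f"kept {len(kept)} of {len(sample)} records")
    print("under size limit:", serialized_size(kept) < SIZE_LIMIT_BYTES)
```

Checking the serialized size before committing helps confirm that the file stays under the limit after filtering.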
@pixelastic Thank you for cleaning the data; I should have thought of this. I will update the PR. Thanks, Tim.
Hey @originalankur, ping me once you've updated the PR and I'll merge it. Thanks.
Extracted from open library dataset.