Implement tofile on tensors to reduce data write time by 40% #210
Signed-off-by: Justin Chu <[email protected]>
Codecov Report
❌ Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             main     #210      +/-   ##
==========================================
+ Coverage   76.83%   76.92%   +0.08%
==========================================
  Files          40       40
  Lines        4922     4992      +70
  Branches      980      996      +16
==========================================
+ Hits         3782     3840      +58
- Misses        856      864       +8
- Partials      284      288       +4
cc @iksnagreb
@titaiwangms @gramalingam this is ready for review, thanks.
        """Return the bytes of the tensor."""
        return self._evaluate().tobytes()

    def tofile(self, file) -> None:
I am wondering whether tofile() makes sense for LazyTensor. Hmm.
Can you say more?
I just thought the tensor isn't even real until it's evaluated, so intuitively tofile(), which writes data to disk, doesn't seem like a natural fit. But I guess the general expectation is that all tensors have this method, so it's understandable.
It is actually useful: even when the tensor is lazily evaluated, we still want to avoid tobytes() making a copy of the tensor data before writing to file. The screenshots in the PR description show lazy tensors.
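To illustrate the point about avoiding the copy, here is a minimal sketch of a lazily evaluated tensor. It is not the PR's implementation: `LazyTensorSketch` is a hypothetical class, and the standard-library `array` type stands in for the real backing array. Only the `_evaluate()`, `tobytes()`, and `tofile(file)` names come from the snippet under review.

```python
import io
from array import array
from typing import BinaryIO, Callable, Optional


class LazyTensorSketch:
    """Hypothetical stand-in for a lazily evaluated tensor (not the PR's class)."""

    def __init__(self, func: Callable[[], array]) -> None:
        self._func = func
        self._cache: Optional[array] = None

    def _evaluate(self) -> array:
        # Produce the data on first use and reuse it afterwards.
        if self._cache is None:
            self._cache = self._func()
        return self._cache

    def tobytes(self) -> bytes:
        # Builds a second, full-size copy of the data as a bytes object.
        return self._evaluate().tobytes()

    def tofile(self, file: BinaryIO) -> None:
        # Hand the evaluated buffer to the file through the buffer protocol,
        # without first building an intermediate bytes object.
        file.write(memoryview(self._evaluate()))


tensor = LazyTensorSketch(lambda: array("f", [1.0, 2.0, 3.0]))
buffer = io.BytesIO()
tensor.tofile(buffer)
```

Both paths write the same bytes; the difference is that `tofile` does not hold an extra copy of the whole tensor in memory while writing.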
d527ac8 to 6036735
This PR introduces the `tofile` method on tensors (named after the equivalent method on NumPy arrays), which allows faster writes and lower memory usage for external data by bypassing `tobytes()`.

Compatibility with existing `TensorProtocol`s is maintained in the external data module by using `tofile` only when it is available on the class. The `TorchTensor` class in the PyTorch exporter should be updated accordingly to leverage the new logic when saving.

Note that I/O time to disk is reduced by 40% in the before/after comparison below.
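The compatibility behavior described above can be sketched roughly as follows. This is an illustration under assumptions, not the external data module's actual code: `write_tensor_data`, `LegacyTensor`, and `FastTensor` are hypothetical names, and the only behavior taken from the description is "use `tofile` when the class has it, otherwise fall back to `tobytes()`".

```python
import io
from typing import Any, BinaryIO


def write_tensor_data(tensor: Any, file: BinaryIO) -> None:
    """Hypothetical helper: write a tensor's raw data to an open binary file."""
    if hasattr(tensor, "tofile"):
        # Newer tensor classes write directly and skip the tobytes() copy.
        tensor.tofile(file)
    else:
        # Older implementations only provide tobytes(); keep supporting them.
        file.write(tensor.tobytes())


class LegacyTensor:
    """Hypothetical tensor that predates tofile."""

    def __init__(self, data: bytes) -> None:
        self._data = data

    def tobytes(self) -> bytes:
        return bytes(self._data)


class FastTensor(LegacyTensor):
    """Hypothetical tensor that also implements tofile."""

    def tofile(self, file: BinaryIO) -> None:
        file.write(memoryview(self._data))


legacy_out, fast_out = io.BytesIO(), io.BytesIO()
write_tensor_data(LegacyTensor(b"\x01\x02\x03"), legacy_out)
write_tensor_data(FastTensor(b"\x01\x02\x03"), fast_out)
```

Either way the file ends up with identical contents, so callers do not need to know which path was taken.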
Note
TensorProtocol is not updated because we do isinstance() checks on external implementations (PyTorch). Adding the method to the protocol would cause the isinstance() check to fail on implementations that have not yet added the tofile method.
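The isinstance() concern can be reproduced with a small standalone example. The protocol and class names below are hypothetical stand-ins, not the project's real `TensorProtocol`; the sketch only relies on how `typing.runtime_checkable` protocols behave, where isinstance() requires every protocol member to be present on the object.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class ProtocolWithoutTofile(Protocol):
    """Hypothetical protocol that only requires tobytes()."""

    def tobytes(self) -> bytes: ...


@runtime_checkable
class ProtocolWithTofile(Protocol):
    """The same hypothetical protocol with tofile() added."""

    def tobytes(self) -> bytes: ...

    def tofile(self, file) -> None: ...


class ExternalTensor:
    """Stand-in for an external implementation that has no tofile() yet."""

    def tobytes(self) -> bytes:
        return b""


tensor = ExternalTensor()
print(isinstance(tensor, ProtocolWithoutTofile))  # True
print(isinstance(tensor, ProtocolWithTofile))  # False
```

Keeping `tofile` out of the protocol and probing for it at the call site avoids breaking such implementations.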
Reference: https://github.com/microsoft/onnxscript/pull/2241/files/b2381658492510a9bcc8c0a8574db7368e33bceb
Before:
After:
Fix #207