
[Question] Comparison with the zarr format? #527

Open
julioasotodv opened this issue Sep 19, 2024 · 3 comments

@julioasotodv

Hi,

I know that safetensors is widely used in HF nowadays, and the comparisons in this repo's README make a lot of sense.

However, I was surprised to see no comparison with zarr, which is probably the most widely used format for storing tensors in a universal, compressed, and scalable way.

Is there any particular reason why safetensors was created instead of just using zarr, which has been around for longer (and has nice benefits such as good performance in object storage reads and writes)?

Thank you!

@User21T

User21T commented Nov 5, 2024

Hello.

I don't represent Hugging Face or its position on the issue.
However, I think the main reason creating safetensors was better than using zarr is that zarr is a general-purpose format for storing any kind of tensor, while safetensors was designed specifically for storing machine learning models and for working within the HF ecosystem.
It offers better performance, security, and integration of ML-specific types (bfloat16, FP8).
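For context on the performance and security points, here is a minimal sketch of the publicly documented safetensors layout: an 8-byte little-endian unsigned header length, a JSON header giving each tensor's `dtype`, `shape`, and `data_offsets`, then the raw tensor bytes. Because the header is plain JSON, reading a file only parses data and never executes code (unlike pickle-based checkpoints), and a single tensor can be sliced out by offset. The helpers `write_single_tensor` and `read_tensor_bytes` are hypothetical and are not the `safetensors` library API; real files may also carry an optional `__metadata__` entry, which this sketch omits.

```python
import json
import os
import struct
import tempfile

# Hypothetical helpers illustrating the safetensors layout (not the library API):
# [u64 little-endian header length][JSON header][raw tensor bytes]

def write_single_tensor(path, name, dtype, shape, raw):
    """Write one tensor in a safetensors-style layout (illustrative sketch)."""
    header = {name: {"dtype": dtype, "shape": shape, "data_offsets": [0, len(raw)]}}
    header_bytes = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))  # header size as u64 little-endian
        f.write(header_bytes)                          # JSON metadata only, no code
        f.write(raw)                                   # raw tensor bytes

def read_tensor_bytes(path, name):
    """Parse the JSON header, then read one tensor's bytes by its offsets."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
        start, end = header[name]["data_offsets"]  # relative to the end of the header
        f.seek(8 + header_len + start)
        return header[name], f.read(end - start)

# Usage: round-trip a 2x2 float32 tensor
raw = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)
path = os.path.join(tempfile.mkdtemp(), "example.safetensors")
write_single_tensor(path, "weight", "F32", [2, 2], raw)
info, data = read_tensor_bytes(path, "weight")
values = struct.unpack("<4f", data)
```

Zarr, by contrast, splits arrays into separately stored (and optionally compressed) chunks, which is what makes it attractive for object storage.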

If I'm wrong, please correct me.

@julioasotodv
Author

Thanks for the answer! I believe zarr offers everything safetensors does and more (chunking, multiple compression options, and so on), except perhaps some specific dtypes such as bf16.

Thank you!

@User21T

User21T commented Nov 16, 2024

You're welcome.
