Evaluating Minio alternatives
So after the recent kerfuffle with Minio effectively deprecating its OSS offering to focus on extracting money from the AI bubble, many hobbyists and small companies alike were left looking for alternatives.
Personally I used it in a few places, from home backup to build cache for CI/CD, plus a few smaller work tasks, so I will be evaluating replacements across these categories:
- ease of management from CI/CD - Minio is very friendly here: you can generate any user/password pair and just set it, so migrating from any existing system is pretty easy
- ease of setup - this is not from the perspective of big setups; for those, Ceph + RADOSGW is pretty much the only sensible option.
- ease of migration
- features related to backup safety like permissions or ILM features
- multi-site capability
- monitoring
- chance of getting rug-pulled again
What will not be evaluated:
- web hosting features
The setup I initially used under Minio was a per-backup user with read/write permissions, plus separately set ILM rules that kept deleted files for a month, as protection against an encryption ransomware attack or any other case where an attacker gets the backup credentials and starts to mess stuff up. The server was then replicated, via Minio's built-in replication, to a server at OVH, both being essentially independent servers.
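For reference, that safety net boils down to versioning plus a lifecycle rule on noncurrent versions. A minimal boto3 sketch of the idea (the endpoint, user and bucket names are made up):

```python
import boto3

# Hypothetical endpoint and credentials - substitute your own.
s3 = boto3.client(
    "s3",
    endpoint_url="https://minio.example.internal:9000",
    aws_access_key_id="backup-host1",
    aws_secret_access_key="REDACTED",
)

bucket = "backups-host1"

# Deleted/overwritten objects only survive as noncurrent versions
# if versioning is enabled on the bucket.
s3.put_bucket_versioning(
    Bucket=bucket,
    VersioningConfiguration={"Status": "Enabled"},
)

# Keep noncurrent (deleted or replaced) versions for 30 days, so a
# leaked backup key can't immediately destroy the history.
s3.put_bucket_lifecycle_configuration(
    Bucket=bucket,
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "keep-deleted-for-a-month",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
            }
        ]
    },
)
```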
Ceph + RADOSGW
Ceph is a distributed storage system that provides multiple ways of accessing it: you can run VMs on it, you can share it via the Swift or S3 protocols, and with a little effort you can even mount it as a FUSE-compatible file system. This is the big daddy of the comparison - complex, and it eats RAM like there is no tomorrow. The docs say 4-5GB per OSD (effectively per drive); that is a lie, expect 8-10GB per drive (ours were 2TB NVMe) in a cluster.
S3 implementation
The S3 implementation is the most complex and feature-rich out of all the competitors. It has lifecycle management, fine-grained bucket ACLs, and most of the stuff you'd want from S3 regardless of what you use it for.
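For example, RGW accepts AWS-style bucket policies, which is how you'd scope a per-backup user to its own bucket. A hedged boto3 sketch (the user and bucket names are invented; per Ceph's docs, RGW principals take the `arn:aws:iam:::user/<uid>` form):

```python
import json
import boto3

# Hypothetical RGW endpoint with admin-level credentials.
s3 = boto3.client(
    "s3",
    endpoint_url="https://rgw.example.internal",
    aws_access_key_id="ADMIN_KEY",
    aws_secret_access_key="ADMIN_SECRET",
)

# Allow one RGW user read/write on a single bucket and nothing else.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": ["arn:aws:iam:::user/backup-host1"]},
        "Action": ["s3:ListBucket", "s3:GetObject", "s3:PutObject"],
        "Resource": [
            "arn:aws:s3:::backups-host1",
            "arn:aws:s3:::backups-host1/*",
        ],
    }],
}

s3.put_bucket_policy(Bucket="backups-host1", Policy=json.dumps(policy))
```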
You have a few options for creating a multi-zone setup, but it's a fair amount of work, not something that can be done in two commands like it is with Minio.
Hosting and management
It's big, and it's complex. There are pieces of software that make it easier (for example, it is integrated as a storage option in Proxmox), but the complexity is not worth it unless you really need the other features (like using it to store virtual machine images) or have half a rack to fill with (relatively beefy) storage servers. It is also very sensitive to network speed and pretty much requires 10Gbit networking for anything sensible, so a single cross-DC cluster is neither recommended by the docs nor would it work with any decency.
It also has erasure coding, so bigger setups waste less space on redundancy, and data is automatically rebalanced if a drive goes down.
You cannot set the access key/secret - they always get auto-generated - so any current S3-targeting setup will need a complete key rotation on all devices. But the ACL system means you can just create a superuser that syncs the data off the old system onto the new one via rclone or similar tools.
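`rclone sync` is the saner choice for anything big, but the copy itself is nothing magic; a boto3 sketch of the same idea, with placeholder endpoints and keys:

```python
import boto3

# Old server with the keys clients already use; new RGW with its
# auto-generated superuser keys. All values here are placeholders.
src = boto3.client(
    "s3",
    endpoint_url="https://minio.example.internal:9000",
    aws_access_key_id="OLD_KEY",
    aws_secret_access_key="OLD_SECRET",
)
dst = boto3.client(
    "s3",
    endpoint_url="https://rgw.example.internal",
    aws_access_key_id="NEW_KEY",
    aws_secret_access_key="NEW_SECRET",
)

bucket = "backups-host1"

# Stream every object from the old server straight into the new one.
paginator = src.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket):
    for obj in page.get("Contents", []):
        body = src.get_object(Bucket=bucket, Key=obj["Key"])["Body"]
        dst.upload_fileobj(body, bucket, obj["Key"])
```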
There is no real web UI, so you have to use CLI tools.
It has a wide array of accessible metrics, but mostly requires using Ceph's own tools to get at them.
Very little chance of a rugpull, though it is now sadly owned by Red Hat, and one of their first moves was pulling a bunch of docs behind the enterprise subscription. Now that IBM has bought them, things appear to be better: they are more interested in selling enterprise support than in tormenting small companies that can't afford them in the first place.
GarageHQ
S3 implementation
It's... there. There are no complex ACLs of any kind: for a given bucket, you either have access or you don't. There are no path-based permissions or S3-style bucket ACLs; permissions are just read/write/owner.
There is no ILM of any kind, so if a key leaks and something deletes your data, it's gone.
It has its own clustering that is designed to be tolerant of high ping between nodes.
Migrating data is problematic because there is no `*` access policy, so you'd need to pre-create the buckets, then grant a migration user access to all of them, before you start shuffling data around.
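A boto3 sketch of that pre-creation step (endpoints and key names are placeholders; the migration key still needs bucket-creation rights and per-bucket access granted on the Garage side):

```python
import boto3

# Placeholders: the old server, and a Garage migration key that has
# been allowed to create buckets on the Garage side.
old = boto3.client(
    "s3",
    endpoint_url="https://minio.example.internal:9000",
    aws_access_key_id="OLD_KEY",
    aws_secret_access_key="OLD_SECRET",
)
new = boto3.client(
    "s3",
    endpoint_url="https://garage.example.internal:3900",
    aws_access_key_id="MIGRATION_KEY",
    aws_secret_access_key="MIGRATION_SECRET",
)

# Mirror the bucket list; a key owns the buckets it creates, so the
# copy step afterwards works without any wildcard policy.
for b in old.list_buckets()["Buckets"]:
    new.create_bucket(Bucket=b["Name"])
```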
Hosting and management
Initial setup is very easy: put some initial settings in the config file and off you go. The same config file is also used by the CLI tool, so aside from the lack of a built-in web UI it's overall very user friendly. Putting multiple machines in a cluster just requires you to enter their layout (how much storage each can take, whether they are in the same or different locations, etc.) and apply the plan, after which the shards get rebalanced. That part is similar to Ceph, but cross-DC setups are actually supported - it's designed for them.
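For a flavor of how small that initial config is, a sketch of a garage.toml along the lines of the quickstart docs (paths, secrets and the replication factor are yours to pick, and exact key names may differ between Garage versions):

```toml
metadata_dir = "/var/lib/garage/meta"
data_dir = "/var/lib/garage/data"
replication_factor = 3          # how many copies of each block to keep

rpc_bind_addr = "[::]:3901"
rpc_secret = "<shared hex secret, same on every node>"

[s3_api]
s3_region = "garage"
api_bind_addr = "[::]:3900"
root_domain = ".s3.garage.example.internal"

[admin]
api_bind_addr = "[::]:3903"     # Prometheus metrics + admin REST API
metrics_token = "<token for the metrics endpoint>"
```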
It also de-duplicates data blocks, unlike Minio.
There is an option to import key/secret pairs, but only in Garage's own key format, so you still need to re-key every node if you want to migrate to it.
It exposes a Prometheus monitoring endpoint, but that doesn't have per-bucket stats; those are however easily accessible via the server's REST/JSON admin API.
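A tiny requests sketch for scraping that endpoint (the port and metrics token here are assumptions matching the config sketch above; check your garage.toml):

```python
import requests

# Assumed admin endpoint and token - both come from garage.toml.
resp = requests.get(
    "http://garage.example.internal:3903/metrics",
    headers={"Authorization": "Bearer METRICS_TOKEN"},
    timeout=5,
)
resp.raise_for_status()

# Dump the metric samples, skipping the "# HELP" / "# TYPE" comments;
# grep for whatever metric family you want to graph from here.
for line in resp.text.splitlines():
    if line and not line.startswith("#"):
        print(line)
```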
There is a 3rd party web UI available.
Given that the authors are vehemently against Minio's (and similar companies') business tactics, and that there is no CLA required for contributions, it's about as rugpull-safe as it can be.
RustFS
S3 implementation
It appears to be relatively rich, supporting outright S3 ACLs. I've been able to just use the same ones that I used for Minio... except they don't work... and there are no docs about the ACL syntax RustFS expects.
There is some support for IAM, but it is hard to judge because the documentation is clearly not finished.
It appears to have both clustering and an option to mirror cross-region but, again, the docs say it's there and nothing about how to set it up.
The project appears to be in flux: a lot of the features present in the UI/code are undocumented or work poorly. There are also other weird issues - for example, I had to go to an external S3 client to actually clean the files out of a bucket, because the web UI said it removed the files but didn't remove the directories, and I couldn't force it to do so.
Trying to set up bucket replication also ended in nondescript errors.
Hosting and management
Similarly, a very easy setup; it also comes with its own web UI, but no built-in CLI.
It does allow freeform access key/secret setup, which makes migrating existing clients easy.
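In practice that means an existing client only has its endpoint changed; a boto3 sketch with placeholder values:

```python
import boto3

# RustFS lets you pick the access key/secret yourself, so the old
# credentials can be recreated verbatim - only the endpoint moves.
# The URL and keys below are placeholders.
s3 = boto3.client(
    "s3",
    endpoint_url="https://rustfs.example.internal:9000",  # was the Minio URL
    aws_access_key_id="backup-host1",       # unchanged
    aws_secret_access_key="SAME_SECRET",    # unchanged
)
print([b["Name"] for b in s3.list_buckets()["Buckets"]])
```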
There is also an option to export and import a ZIP with all of the policies and users set up, which is a nice and unexpected feature.
Monitoring comes as an option to send data to an external OpenTelemetry collector, which for small setups is a PITA to set up, and I have not tested it yet.
Documentation is definitely under-baked: it lists features (or supposed features, I didn't check them all) but often says nothing about how to actually use them.
The project uses a CLA and has the rugpull code already prepared, so I'd be wary. I still plan on using it on my little OVH machine as secondary storage for backups and seeing how the project develops.
SeaweedFS
It looked like Ceph-level complexity with some extra bottlenecks (a central server directing where shards go, instead of the algorithm-based placement Ceph uses), so I had little reason to check it out at small scale.
Summary
I went with GarageHQ - despite the missing features, it's a mature project with good documentation. RustFS really is in the early development phase and has some serious bugs showing up (which, to be fair to the authors, are getting fixed pretty quickly). I am eager to see where it goes, but the CLA and the code apparently being already prepared for a commercial version are immediate red flags for a business model similar to the one Minio pulled off.
I will run RustFS on a secondary node with sync going there via rclone, just to have some redundancy and to make use of its ILM features.