I really want to like SpiderOak, especially when you consider the following features:
- Whole cloud de-duplication – All of the data you back up to SpiderOak, regardless of the source, is de-duplicated
- The ability to share files in your cloud with others
- ‘Zero-knowledge’ encryption
- Cross-platform client
- Support of open source
However, I keep finding problems that prevent me from using it as my primary backup software. As with BackBlaze, I did some testing with Backup Bouncer v0.2.0 to see how the latest version of SpiderOak (v3.6.9680) fares with the meta-data that Mac OS X generates. Results follow.
sh-3.2# ./bbouncer verify -d /Volumes/Src ../Dst
Verifying: basic-permissions ... FAIL (Critical)
Verifying: timestamps ... FAIL (Critical)
Verifying: symlinks ... stat: ./symlink1: stat: No such file or directory
FAIL (Critical)
Verifying: symlink-ownership ... FAIL
Verifying: hardlinks ... FAIL (Important)
Verifying: resource-forks ...
   Sub-test: on files ... FAIL (Critical)
   Sub-test: on hardlinked files ... FAIL (Important)
Verifying: finder-flags ... FAIL (Critical)
Verifying: finder-locks ... FAIL
Verifying: creation-date ... FAIL
Verifying: bsd-flags ... FAIL
Verifying: extended-attrs ...
   Sub-test: on files ... FAIL (Important)
   Sub-test: on directories ... FAIL (Important)
   Sub-test: on symlinks ... FAIL
Verifying: access-control-lists ...
   Sub-test: on files ... FAIL (Important)
   Sub-test: on dirs ... FAIL (Important)
Verifying: fifo ... FAIL
Verifying: devices ... FAIL
Verifying: combo-tests ...
   Sub-test: xattrs + rsrc forks ... FAIL
   Sub-test: lots of metadata ... FAIL
As you can see, SpiderOak fails all of the Backup Bouncer tests. Combine this with the password issues I’ve mentioned previously and it looks like SpiderOak still has a ways to go before I can seriously consider using it to house my data.
How is de-duplication supposed to work in combination with local encryption?
Wuala for example offers local encryption but cannot offer de-duplication at the same time …
The only way encryption can be combined with deduplication is with a predictable encryption key, one that every client can derive from the data itself.
This doesn’t really mean it is insecure, just that the encryption key can be generated from the unencrypted data and stored along with the backup.
Let’s say you take a hash of an unencrypted chunk and use that hash as the key to encrypt the chunk. You then encrypt the hash using your personal encryption key and upload the result with the encrypted chunk to the server. When restoring, that encrypted key is returned with the chunk so the client can decrypt the key and then the data.
Because the key is derived from the chunk itself, the same unencrypted chunk produces the same key and the same encrypted chunk on any system, even though the server never learns the key. The server can then deduplicate encrypted chunks exactly as it would unencrypted ones, because as far as it knows or cares they are just identical blocks of data.
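The scheme described above (often called convergent encryption) can be sketched roughly as follows. This is only an illustration under assumptions, not how SpiderOak or any other service actually does it: the names `encrypt_chunk`, `decrypt_chunk` and `Server` are made up for this example, and the cipher is a toy XOR keystream built from SHA-256 so the sketch runs with the standard library alone. A real client would use a proper cipher such as AES.

```python
import hashlib
import os


def _keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with SHA-256(key || counter) blocks.

    Stand-in for a real cipher (e.g. AES); do not use for real data.
    Applying it twice with the same key returns the original bytes.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        block = hashlib.sha256(key + counter).digest()
        out.extend(a ^ b for a, b in zip(data[offset:offset + 32], block))
    return bytes(out)


def encrypt_chunk(chunk: bytes, user_key: bytes):
    """Return (ciphertext, wrapped_key) for one chunk."""
    # The chunk key is derived from the plaintext, so it is identical
    # for identical chunks no matter who encrypts them.
    chunk_key = hashlib.sha256(chunk).digest()
    ciphertext = _keystream_xor(chunk_key, chunk)
    # The chunk key is then hidden from the server with the user's own key.
    nonce = os.urandom(16)
    wrapped_key = nonce + _keystream_xor(user_key + nonce, chunk_key)
    return ciphertext, wrapped_key


def decrypt_chunk(ciphertext: bytes, wrapped_key: bytes, user_key: bytes) -> bytes:
    """Unwrap the chunk key with the user's key, then decrypt the chunk."""
    nonce, body = wrapped_key[:16], wrapped_key[16:]
    chunk_key = _keystream_xor(user_key + nonce, body)
    return _keystream_xor(chunk_key, ciphertext)


class Server:
    """Hypothetical server that stores each distinct ciphertext once."""

    def __init__(self):
        self.chunks = {}

    def store(self, ciphertext: bytes) -> str:
        chunk_id = hashlib.sha256(ciphertext).hexdigest()
        self.chunks.setdefault(chunk_id, ciphertext)  # no-op if already stored
        return chunk_id
```

Two users with different personal keys who upload the same chunk produce the same ciphertext, so the server keeps one copy, while each user keeps their own wrapped key for restoring. The trade-off is that anyone who already has a file can confirm whether the server holds it.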
As for not storing meta-data, it is kept outside of the file and isn’t really part of it. Backup software could check it and store it if it wanted to, but I don’t think this is a priority, as the creation date of a file is nowhere near as important as its actual contents.
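For what it’s worth, recording the basic meta-data next to the contents is not much code. Here is a minimal sketch, assuming a POSIX system; `capture_metadata` and `restore_metadata` are hypothetical names, and this only covers permissions, ownership, modification time and symlink targets. Resource forks, extended attributes and ACLs, which Backup Bouncer also checks, would need more work.

```python
import os
import stat


def capture_metadata(path: str) -> dict:
    """Collect meta-data that lives outside the file's contents."""
    st = os.lstat(path)  # lstat describes a symlink itself, not its target
    meta = {
        "mode": stat.S_IMODE(st.st_mode),   # permission bits
        "uid": st.st_uid,
        "gid": st.st_gid,
        "mtime_ns": st.st_mtime_ns,
        "is_symlink": stat.S_ISLNK(st.st_mode),
    }
    if meta["is_symlink"]:
        meta["link_target"] = os.readlink(path)
    # Creation date and BSD flags exist on Mac OS X / BSD but not everywhere.
    for name in ("st_birthtime", "st_flags"):
        if hasattr(st, name):
            meta[name] = getattr(st, name)
    return meta


def restore_metadata(path: str, meta: dict) -> None:
    """Re-apply permissions and modification time to a restored regular file."""
    os.chmod(path, meta["mode"])
    os.utime(path, ns=(meta["mtime_ns"], meta["mtime_ns"]))
```

A backup client could store the returned dictionary alongside each file’s chunks and call `restore_metadata` after writing the contents back.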