Say I mount some cloud storage (Amazon Cloud Drive in my case) with a FUSE client at /mnt/cloud. Reading and writing files directly to /mnt/cloud is slow because everything has to go over the internet, so I want to cache the files that I'm reading from and writing to cloud storage. Since I might be writing a lot of data at a time, the cache should sit on my disk and not in RAM. But I don't want to replicate the entire cloud storage on my disk, because my disk may be too small.
So I want to have a cached view into /mnt/cloud mounted at /mnt/cloud_cache, which uses another path, say /var/cache/cloud, as the caching location.
If I now read /mnt/cloud_cache/file, I want the following to happen:
Check whether file is cached at /var/cache/cloud/file.
1. If cached: Check whether file in the cache is up-to-date by fetching the modtime and/or checksum from /mnt/cloud. If it's up-to-date, serve the file from the cache, otherwise go to 2.
2. If not cached or the cache is out-of-date: Copy /mnt/cloud/file to /var/cache/cloud/file and serve it from the cache.
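The read path above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not an existing tool: the names is_fresh and read_through are hypothetical, freshness is judged by size and whole-second modtime rather than a checksum, and the sketch works on plain directories instead of being a mounted file system.

```python
import os
import shutil


def is_fresh(source_path, cached_path):
    """Return True if the cached copy matches the source's size and modtime."""
    if not os.path.exists(cached_path):
        return False
    src, dst = os.stat(source_path), os.stat(cached_path)
    # shutil.copy2 preserves the modtime, so an unchanged source compares equal.
    # Assumption: a same-size rewrite within the same second goes unnoticed.
    return src.st_size == dst.st_size and int(src.st_mtime) == int(dst.st_mtime)


def read_through(rel_path, source_root="/mnt/cloud", cache_root="/var/cache/cloud"):
    """Return the bytes of rel_path, refreshing the cached copy first if needed."""
    source_path = os.path.join(source_root, rel_path)
    cached_path = os.path.join(cache_root, rel_path)
    if not is_fresh(source_path, cached_path):
        os.makedirs(os.path.dirname(cached_path), exist_ok=True)
        shutil.copy2(source_path, cached_path)  # copies data and metadata
    with open(cached_path, "rb") as f:
        return f.read()
```

Note that even the freshness check costs one stat call on /mnt/cloud, so a cache hit still involves a (small) round trip to the cloud storage.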
When I write to /mnt/cloud_cache/file, I want this to happen:
- Write to /var/cache/cloud/file and record in a journal that file needs to be written back to /mnt/cloud
- Wait for writing to /var/cache/cloud/file to be done and/or previous write backs to /mnt/cloud to be completed
- Copy /var/cache/cloud/file to /mnt/cloud
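As a rough illustration of the write path above, here is a minimal single-threaded Python sketch of a write-back journal. The function names (write_cached, pending_paths, flush_journal) and the JSON-lines journal format are assumptions made for this example, not the behaviour of any existing tool.

```python
import json
import os
import shutil


def write_cached(rel_path, data, cache_root, journal_path):
    """Write data to the cache and journal that rel_path needs write-back."""
    cached_path = os.path.join(cache_root, rel_path)
    os.makedirs(os.path.dirname(cached_path), exist_ok=True)
    with open(cached_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # data is on disk before the journal entry exists
    with open(journal_path, "a", encoding="utf-8") as journal:
        journal.write(json.dumps({"path": rel_path}) + "\n")


def pending_paths(journal_path):
    """Return the journaled paths in order, without duplicates."""
    if not os.path.exists(journal_path):
        return []
    with open(journal_path, encoding="utf-8") as journal:
        paths = [json.loads(line)["path"] for line in journal if line.strip()]
    return list(dict.fromkeys(paths))


def flush_journal(cache_root, source_root, journal_path):
    """Copy every journaled file back to the source, then clear the journal."""
    flushed = []
    for rel_path in pending_paths(journal_path):
        target = os.path.join(source_root, rel_path)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.copy2(os.path.join(cache_root, rel_path), target)
        flushed.append(rel_path)
    # Assumption: no writes arrive during the flush; otherwise their journal
    # entries would be lost by this truncation.
    open(journal_path, "w").close()
    return flushed
```

Because the journal lives on disk, pending write-backs survive a restart, which matters if data accumulates over days before it is uploaded.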
I have the following requirements and constraints:
- Free and open source
- Ability to set an arbitrary cache location
- Ability to cache an arbitrary location (probably some FUSE mount point)
- Transparent caching, i.e. the caching mechanism is transparent to anything using /mnt/cloud_cache, which works like any other mounted file system
- Keeping a record of what needs to be written back (the cache might get a lot of data that needs to be written back to the original storage location over the course of days)
- Automatic deletion of cached files that have been written back or have not been accessed in a while
- Consistency (i.e. reflecting external changes to /mnt/cloud) isn't terribly important, as I will probably have only one client accessing /mnt/cloud at a time, but it would be nice to have.
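The automatic-deletion requirement in the list above could look roughly like the following Python sketch. It is a hedged illustration only: evict_idle is a hypothetical name, pending is assumed to be the set of paths (relative to the cache directory) still waiting for write-back, and idleness is judged by the file's access time, which is only meaningful if the cache disk is not mounted with options such as noatime.

```python
import os
import time


def evict_idle(cache_root, pending, max_idle_seconds, now=None):
    """Delete cached files that are not pending write-back and have been idle."""
    now = time.time() if now is None else now
    removed = []
    for dirpath, _dirnames, filenames in os.walk(cache_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            rel_path = os.path.relpath(path, cache_root)
            if rel_path in pending:
                continue  # never drop data that has not reached the source yet
            if now - os.stat(path).st_atime > max_idle_seconds:
                os.remove(path)
                removed.append(rel_path)
    return sorted(removed)
```

A variant could also drop files as soon as they have been written back, or evict by total cache size instead of idle time; which policy fits depends on how much disk space is available.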
I've spent quite some time looking for existing solutions, but haven't found anything satisfactory.
- FS-Cache and CacheFS (https://www.kernel.org/doc/Documentation/filesystems/caching/fscache.txt) seem to only work with nfs or afs file systems, and I don't know how to make them cache another FUSE file system or any general directory.
- bcache (https://bcache.evilpiepirate.org/) seems to only work with block devices, i.e. it couldn't cache another FUSE file system
- gcsfuse (https://github.com/GoogleCloudPlatform/gcsfuse): I think this does exactly what I want, but it's integrated with Google Cloud Storage. To make it work in general, I would have to hack it and replace every access to GCS with either local file accesses in the given mount point or accesses to Amazon Cloud Drive