
Say I mount some cloud storage (Amazon Cloud Drive in my case) with a FUSE client at /mnt/cloud. Reading and writing files directly through /mnt/cloud is slow, because everything has to go over the internet, so I want to cache the files I read from and write to cloud storage. Since I might be writing a lot of data at a time, the cache should sit on my disk, not in RAM. But I don't want to replicate the entire cloud storage on my disk, because my disk may be too small.

So I want to have a cached view into /mnt/cloud mounted at /mnt/cloud_cache, which uses another path, say /var/cache/cloud as the caching location.

If I now read /mnt/cloud_cache/file, I want the following to happen:

  1. Check whether file is cached at /var/cache/cloud/file.
  2. If cached: check that the cached copy is up-to-date by fetching the modtime and/or checksum from /mnt/cloud. If it is up-to-date, serve the file from the cache; otherwise go to 3.
  3. If not cached, or the cache is out-of-date: copy /mnt/cloud/file to /var/cache/cloud/file and serve it from the cache.
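The read path above can be sketched in shell, using plain directories to stand in for the mounts ($REMOTE playing /mnt/cloud, $CACHE playing /var/cache/cloud); the function name and the modtime-only freshness check are illustrative assumptions, not an existing tool:

```shell
#!/bin/sh
# cached_read: sketch of the read path described above.
# $REMOTE stands in for /mnt/cloud, $CACHE for /var/cache/cloud.
cached_read() {
    f=$1
    # Steps 1+2: if the file is cached and the remote copy is not
    # newer, serve straight from the cache.
    if [ -f "$CACHE/$f" ] && [ ! "$REMOTE/$f" -nt "$CACHE/$f" ]; then
        cat "$CACHE/$f"
        return
    fi
    # Step 3: (re)populate the cache, then serve from it.
    mkdir -p "$(dirname "$CACHE/$f")"
    cp "$REMOTE/$f" "$CACHE/$f"
    cat "$CACHE/$f"
}
```

A checksum comparison could replace the `-nt` modtime test for backends that report checksums.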

When I write to /mnt/cloud_cache/file, I want this to happen:

  1. Write to /var/cache/cloud/file and record in a journal that file needs to be written back to /mnt/cloud.
  2. Wait for the write to /var/cache/cloud/file to finish and for any previous write-backs to /mnt/cloud to complete.
  3. Copy /var/cache/cloud/file to /mnt/cloud.
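Continuing the same sketch (again with plain directories and hypothetical names), the write path amounts to: write into the cache, append the name to a journal, and flush the journal to the remote later:

```shell
#!/bin/sh
# cached_write/flush_journal: sketch of the write path described above.
# $REMOTE stands in for /mnt/cloud, $CACHE for /var/cache/cloud,
# $JOURNAL is the write-back journal file.

cached_write() {    # usage: cached_write <name>   (data on stdin)
    f=$1
    mkdir -p "$(dirname "$CACHE/$f")"
    cat > "$CACHE/$f"           # step 1: the write goes to the cache...
    echo "$f" >> "$JOURNAL"     # ...and is recorded in the journal
}

flush_journal() {
    # steps 2+3: once earlier writes are done, copy each journaled
    # file back to the remote, then clear the journal.
    sort -u "$JOURNAL" | while read -r f; do
        cp "$CACHE/$f" "$REMOTE/$f"
    done
    : > "$JOURNAL"
}
```

A real implementation would serialize flushes and fsync the journal; this only shows the bookkeeping.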

I have the following requirements and constraints:

  • Free and open source
  • Ability to set an arbitrary cache location
  • Ability to cache an arbitrary location (probably some FUSE mount point)
  • Transparent caching, i.e. using /mnt/cloud_cache is transparent to the caching mechanism and works like any other mounted file system
  • Keeping a record of what needs to be written back (the cache might get a lot of data that needs to be written back to the original storage location over the course of days)
  • Automatic deletion of cached files that have been written back or have not been accessed in a while
  • Consistency (i.e. reflecting external changes to /mnt/cloud) isn't terribly important, as I will probably have only one client accessing /mnt/cloud at a time, but it would be nice to have.

I've spent quite some time looking for existing solutions, but haven't found anything satisfactory.

  • Curious if you ever found a solution? Looking for a similar caching layer with similar requirements to your own. Commented Jan 20, 2017 at 16:28
  • bitbucket.org/nikratio/s3ql does pretty much what I want. However, unfortunately, it doesn't play too nicely with Amazon Cloud Drive in particular (mainly ACD's fault, for lack of a good Linux client). Commented Jan 24, 2017 at 20:16
  • I've used s3ql in the past myself, but having migrated over to ACD for my files seemed to limit its use with that provider. I did run into problems with data consistency with s3ql when data collections > 2TB. RClone seems promising but is missing that vital caching piece. Commented Jan 25, 2017 at 21:33
  • If you are seriously interested in that, we can write it in C++, using tmpfs and stat. Commented Mar 6, 2017 at 15:08

3 Answers


Try catfs, a generic FUSE caching filesystem I'm currently working on.
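In terms of the question's paths, the invocation would look something like the following. The argument order (source directory, cache directory, mount point) is my reading of catfs's usage, not taken from this answer; check `catfs --help` to confirm:

```shell
# Mount a cached view of /mnt/cloud at /mnt/cloud_cache,
# keeping the on-disk cache in /var/cache/cloud.
catfs /mnt/cloud /var/cache/cloud /mnt/cloud_cache
```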

  • From what I'm seeing up to now, it works like a charm. Thanks a lot! Commented May 22, 2019 at 14:06
  • How is this different from the NFS middle-layer solution below? Commented Jul 10, 2021 at 6:14
  • Catfs looks very nice. It is written in Rust, which might be a reason why it is not Debianized yet. An important thing: we are talking about 3 directories: 1) what you cache, 2) where your cache files are, 3) where you want the mount point. It was not very clear at first which one is which. Anyway, I think it is best used as a fast cache in front of slow FUSE filesystems. | Btw, does it cache file attributes? Commented May 28 at 12:03

It is possible to use FS-Cache/CacheFS to cache a FUSE-mounted filesystem by adding an NFS indirection in between: if your FUSE mount is at /fusefs, export it to yourself over NFS by putting this in /etc/exports:

/fusefs localhost(fsid=0)

Now you can do this:

mount -t nfs -o fsc localhost:/fusefs /nfs
systemctl start cachefilesd

and /nfs will offer cached access to /fusefs.

I'm using this approach with sshfs as the back FS, it works nicely.

(Unfortunately, this only speeds up access to file contents; file metadata is not cached, so stat and open are still slow.)
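One knob that may soften the metadata cost (my own suggestion, not part of this answer): the NFS client caches attributes for only a short window by default, and lengthening it with `actimeo` trades staleness for fewer stat/open round-trips:

```shell
# Cache NFS attributes for up to 10 minutes instead of the default
# few seconds; metadata may then lag changes on the backing FUSE mount.
mount -t nfs -o fsc,actimeo=600 localhost:/fusefs /nfs
```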

  • What if the sshfs connection fails? Wouldn't the NFS folder be treated as an empty folder, deleting all files in the cached folder? Commented Jul 10, 2021 at 5:57
  • Does FS-Cache/CacheFS provide a write cache? Commented Aug 18, 2021 at 14:30
  • There is a large problem, namely that NFS is... well... I just could not make it work. The enterprise guys went a little bit too far. Commented May 28 at 14:31

This is an ignorant sort of answer, since I don't have access to an Amazon cloud directory to test with. But in the "How to Do It" spirit: set up the Amazon cloud storage to serve NFS, then mount that NFS export locally with cachefilesd providing the cache.

"Easier said than done..."

  • Nice idea, but the question asks for a general solution that will work with other cloud storage providers (which I'm also looking for). Commented Mar 21, 2024 at 13:50
