I am working with a remote parallel file system (CephFS), mounted at /mnt/mycephfs/, which contains a large dataset of small files (200 GB+). My application trains on these files, but reading directly from /mnt/mycephfs/ is slow due to parallel file system contention and network latency.

I am looking for a FUSE-based solution that can:

1. Take a list of files required by the application.
2. Prefetch and cache these files into a local mount point (e.g., /mnt/prefetched/) without replicating the entire remote storage (as my local RAM and disk space are limited).

The desired behavior:

• If a file (e.g., /mnt/mycephfs/file) is already cached at /mnt/prefetched/file, it should be served from the cache.
• If not cached, the solution should fetch the file (along with other files from the prefetch list), cache it at /mnt/prefetched/, and then serve it from there.

Are there existing tools or frameworks that support this kind of selective caching and prefetching using FUSE?

  • "without replicating the entire remote storage" -- If there is enough free RAM for your particular set of files then vmtouch -l may be useful. This is not an answer because you explicitly asked for FUSE. Commented Dec 11, 2024 at 7:53
  • This question is similar to: How can I transparently cache any directory or mounted file system for reads and write back?. If you believe it’s different, please edit the question, make it clear how it’s different and/or how the answers on that question are not helpful for your problem. Commented Dec 11, 2024 at 13:22
  • Although there are similarities, the mentioned question has no prefetching component @larsks Commented Dec 11, 2024 at 16:16

1 Answer

I don't think you need FUSE for that (and in any case, FUSE is not very conducive to high-performance operations).

Instead, just mount your Ceph storage on /a, copy the paths you know you'll need to a local directory /b, and use OverlayFS, with /a as the lower (backing) layer and /b as the upper layer.

Reads will then go to /a only if the file isn't present in /b.
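A rough sketch of that setup in Python, in case it helps. The function names, the work directory and the merged mount point are my own placeholders, not anything from the question. OverlayFS does require a workdir (empty, on the same filesystem as the upper directory) whenever an upper layer is used. The second function only builds the mount command; actually running it needs root.

```python
import shutil
from pathlib import Path


def prefetch(file_list, remote_root, cache_root):
    """Copy each listed file from remote_root into cache_root,
    keeping its relative path. Files already present are skipped."""
    remote_root, cache_root = Path(remote_root), Path(cache_root)
    copied = []
    for rel in file_list:
        # Tolerate trailing newlines and leading slashes in the list.
        rel = rel.strip().lstrip("/")
        if not rel:
            continue
        src = remote_root / rel
        dst = cache_root / rel
        if dst.exists():
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        copied.append(dst)
    return copied


def overlay_mount_command(lower, upper, work, merged):
    """Build (but do not run) the argv for an OverlayFS mount.
    'work' and 'merged' are hypothetical paths chosen by the caller."""
    opts = f"lowerdir={lower},upperdir={upper},workdir={work}"
    return ["mount", "-t", "overlay", "overlay", "-o", opts, str(merged)]
```

With the names from this answer, that would be something like prefetch(open("needed.txt"), "/a", "/b") followed by running overlay_mount_command("/a", "/b", "/b-work", "/merged") as root, where needed.txt, /b-work and /merged are again made-up names. The application then reads from the merged mount point.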

But before I did that, I'd check whether just using your normal Ceph mount, and reading all relevant files (but not copying them anywhere, e.g., just doing cat /a/filename > /dev/null) is sufficient to make the kernel buffer their content in RAM, transparently.
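If you want to try that warm-up for a whole file list rather than one cat at a time, something like the following Python sketch should do the same thing: read every file once and throw the bytes away. The function names and the worker count are arbitrary choices of mine, and whether the data then stays in the page cache depends on how much free RAM you have.

```python
from concurrent.futures import ThreadPoolExecutor


def read_and_discard(path, chunk_size=1 << 20):
    """Read a file to the end in chunks, keeping nothing.
    Returns the number of bytes read."""
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            total += len(chunk)
    return total


def warm_cache(paths, workers=8):
    """Read all files concurrently so several network requests are in
    flight at once. Returns the total number of bytes read."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(read_and_discard, paths))
```

Timing a second pass over the same list gives a quick indication of whether the files are now being served from RAM.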
