I am working with a remote parallel file system (CephFS), mounted at /mnt/mycephfs/, which contains a large dataset of small files (200 GB+). My application trains on these files, but reading directly from /mnt/mycephfs/ is slow due to parallel file system contention and network latency.
I am looking for a FUSE-based solution that can:
1. Take a list of files required by the application.
2. Prefetch and cache these files into a local mount point (e.g., /mnt/prefetched/) without replicating the entire remote storage (as my local RAM and disk space are limited).
The desired behavior:
• If a file (e.g., /mnt/mycephfs/file) is already cached at /mnt/prefetched/file, it should be served from the cache.
• If not cached, the solution should fetch the file (along with other files from the prefetch list), cache it at /mnt/prefetched/, and then serve it from there.
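To make the desired behavior concrete, here is a minimal sketch of the hit/miss logic in plain Python. It is not a FUSE filesystem: it only illustrates the read-through policy that a FUSE layer's open handler would need to implement. The class name `PrefetchCache`, its methods, and the `workers` parameter are hypothetical names chosen for illustration, and the sketch assumes the prefetch list fits on local disk (it does no eviction).

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


class PrefetchCache:
    """Hypothetical read-through cache: serve from cache_root, copy on a miss."""

    def __init__(self, remote_root, cache_root, prefetch_list, workers=8):
        self.remote_root = Path(remote_root)
        self.cache_root = Path(cache_root)
        # Relative paths, de-duplicated while preserving order.
        self.prefetch_list = list(dict.fromkeys(prefetch_list))
        self.workers = workers

    def _fetch(self, rel):
        """Copy one file from the remote tree into the cache if it is missing."""
        src = self.remote_root / rel
        dst = self.cache_root / rel
        if dst.exists():
            return dst  # cache hit, nothing to do
        dst.parent.mkdir(parents=True, exist_ok=True)
        # Copy to a temporary name first so readers never see a partial file.
        tmp = dst.with_name(dst.name + ".part")
        shutil.copyfile(src, tmp)
        tmp.replace(dst)
        return dst

    def prefetch(self):
        """Fetch every file on the prefetch list, several at a time."""
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            list(pool.map(self._fetch, self.prefetch_list))

    def open(self, rel, mode="rb"):
        """Serve from the cache; on a miss, fetch the file and the prefetch list."""
        dst = self.cache_root / rel
        if not dst.exists():
            self._fetch(rel)   # the requested file first
            self.prefetch()    # then the rest of the list
        return open(dst, mode)
```

With the paths from the question, this would be used as `PrefetchCache("/mnt/mycephfs", "/mnt/prefetched", file_list).open("file")`, where `file_list` holds paths relative to the remote root. Fetching the prefetch list synchronously on the first miss keeps the sketch short; a real solution would probably do it in the background and evict old entries when the local disk fills up.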
Are there existing tools or frameworks that support this kind of selective caching and prefetching using FUSE?
vmtouch -l may be useful. This is not an answer because you explicitly asked for FUSE.