> ... 25000 small files
My first inclination would be that general system overhead and filesystem I/O (even though it is not disk I/O, since you are running from tmpfs) is the main culprit behind the variability, possibly along with the Kerberos security setup... adding security to improve performance, said no one ever.
But unless you run a real-time kernel, you will see some variability no matter what, and concluding it is down to NFS v3 vs v4 or any other NFS parameter is likely to be mostly wrong; you will be chasing your tail.
You did not mention several other important things:
- Are you the only one on the system (both the NFS server and the NFS client)?
- Is there any other network load?
- What is the network layout and hardware involved, and could that be having an effect? Did you set up the network yourself with full control of the switches/routers, or was it all set up and managed by people you've never met? (A couple of quick checks are sketched right after this list.)
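If you are not sure about the shared-load questions, these are the quick checks I would run on both client and server before blaming NFS; nothing NFS-specific here except nfsstat, and sar comes from the sysstat package, which may need installing:

```
# who else is logged in / what else is running
w
top -b -n 1 | head -20

# per-interface traffic counters, sampled once a second for 5 seconds
sar -n DEV 1 5

# NFS client-side operation counts; run before and after a test copy and diff them
nfsstat -c
```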
I am not familiar with Debian, but in RHEL there is tuned, and the default profile is throughput-performance, described as *broadly applicable tuning that provides excellent performance across a variety of common server workloads*. Other profiles that may be of benefit are latency-performance, network-latency, or network-throughput. That's the best I can tell you there; you would need to research it further: https://www.redhat.com/sysadmin/linux-tuned-tuning-profiles
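On RHEL that looks roughly like the following (Debian also packages tuned, but I can't vouch for its defaults there):

```
# show the active profile (throughput-performance is the RHEL default)
tuned-adm active

# list the profiles available on the box
tuned-adm list

# switch to a latency-oriented profile to see if it changes anything
tuned-adm profile network-latency
```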
You said 25,000 files of less than 10 KB each. That's roughly 250 MB of data in total, which in my opinion is too small to use as a reference, and unless both systems are identical in hardware and identical in operating system setup and configuration, with the only variable being NFS v3 versus v4.0/4.1/4.2, you will be chasing your tail with numbers.
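If you do want to compare versions, I would generate a throwaway small-file set like the one you described and time repeated copies against each mount, changing nothing else in between; the paths below are just placeholders:

```
# build ~250 MB of 10 KB files, roughly matching the workload described
mkdir -p /tmp/smallfiles
for i in $(seq 1 25000); do
    head -c 10240 /dev/urandom > /tmp/smallfiles/file_$i
done

# time several runs against each mount (re-mount with a different vers= in between),
# and drop the page cache between runs so earlier copies don't skew the numbers
echo 3 > /proc/sys/vm/drop_caches
time cp -r /tmp/smallfiles /mnt/nfs/test_run1
```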
For what it's worth, I tried to understand whether there is a performance difference between NFS v3 and v4.0/4.1/4.2, and between UDP and TCP as the mount protocol. Using RHEL 8.8 on identical servers (24-core Xeon CPUs, 768 GB of RAM), copying a single ~25 GB test.tar via rsync -P to show speed, and using tmpfs to help rule out disk I/O, I saw no difference in performance. This was over 100 Gbps InfiniBand through a Mellanox switch, with just me on the servers and network in a lab environment.

Every NFS parameter combination reached an observed peak of around 490 MB/s and an average of around 470 MB/s over the few minutes the copy took. What was interesting is that the first copy after a reboot could be as slow as 340 MB/s, but subsequent copies would eventually peak at 490 MB/s. That was as far as I was willing to go to decide whether I should use NFS v3 with UDP versus v4.2 with TCP in a cluster setup. I did find that proto=rdma instead of TCP netted somewhere between a 5% and 15% speed improvement on my single-file copy, so proto=rdma is best. On a 1 Gbps network, the only thing I've found to cause any significant speed difference is async versus sync on the NFS server side; on InfiniBand it had no effect.

Could some combination of your NFS parameters be causing NFS v4.2 to be slower than v3? Possibly, I don't know. But if the question is whether NFS v4.2 is slower than v3, then based on what I've tried I would say no, and per the documented improvements of NFS v4.2 over v3 it is supposed to be all-around better. There is also pNFS support in NFS v4.
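For reference, these are the kinds of mount variations I was comparing; the server name, export path, and mount point are placeholders, and the RDMA line assumes the rdma transport modules are loaded and the server is listening on the standard NFS/RDMA port 20049:

```
# NFS v3 over UDP
mount -t nfs -o vers=3,proto=udp server:/export /mnt/test

# NFS v4.2 over TCP (the usual default on recent distros)
mount -t nfs -o vers=4.2,proto=tcp server:/export /mnt/test

# NFS v4.2 over RDMA on InfiniBand
mount -t nfs -o vers=4.2,proto=rdma,port=20049 server:/export /mnt/test

# server side: async vs sync is set per export in /etc/exports, e.g.
#   /export   192.168.0.0/24(rw,async,no_root_squash)
# then re-export with: exportfs -ra
```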
I have also found significant speed differences between RHEL 7.9 and 8.8 with regard to InfiniBand and NFS: an scp in RHEL 8.8 will hit 1.0 GB/s versus < 600 MB/s in RHEL 7.9, and there are similar differences in NFS where RHEL 8.8 is better, which is nfs-utils-1.3 in RHEL 7.9 versus nfs-utils-2.x in RHEL 8.8. I don't know Debian, bookworm or bullseye, but if you are not running identical operating systems and the latest NFS version, all of that can matter, in my opinion. I would like to see more NFS v4.2 performance numbers published, so an admin knows what to expect and whether things are configured properly or there is a problem.
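If you want to rule out version skew between the two boxes, it only takes a minute to check what is installed and what actually got negotiated (the Debian client package is nfs-common; the rest are standard tools):

```
# what NFS version and options the client actually negotiated for each mount
nfsstat -m

# installed userspace tooling
rpm -q nfs-utils      # RHEL
dpkg -l nfs-common    # Debian

# kernel version, since most of the NFS client/server lives in the kernel
uname -r
```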