Troubleshooting performance of VDO and NFS

Oct 20, 2018

While setting up a local virtualization environment a little while back, I thought I’d try the recently GA’d VDO capabilities in the RHEL 7.5 kernel. VDO provides data compression and deduplication natively in the Linux kernel (through a kernel module), and is the result of Red Hat’s Permabit acquisition. Since a virtualization data store is a prime candidate for deduplication, I was eager to reclaim some of my storage budget 🙂. I was also curious to see what the extra overhead was like (if any) and to understand the general performance characteristics of VDO.

I found VDO quite easy to set up; I followed this guide basically verbatim. Given that my NFS virtualization back-end stores mostly the same OS images, I was happy to see excellent deduplication stats on my VDO device:

[root@nfs vms]# vdostats --si
Device                    Size      Used Available Use% Space saving%
/dev/mapper/vdo0          2.5T    239.3G      2.3T   9%           60%
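
As a rough sanity check on those numbers: if Space saving% is read as the share of logical data that never had to be written physically, the amount of data the VMs actually see can be estimated from the Used and Space saving% columns. The sketch below does that arithmetic. The helper name logical_estimate is made up for illustration, and the result is only approximate because Used also includes VDO’s own metadata overhead.

```shell
# Hypothetical helper: estimate logical data (same unit as "used") from the
# physical space used and the space-saving percentage reported by vdostats.
# Assumes saving% = (logical - physical) / logical, and ignores VDO's
# metadata overhead, which vdostats counts in the Used column.
logical_estimate() {
    awk -v used="$1" -v saving="$2" \
        'BEGIN { printf "%.0f\n", used / (1 - saving / 100) }'
}

logical_estimate 239.3 60   # prints 598 (roughly 598G of VM data in 239.3G)
```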

A few months and several spawned VMs later, I noticed some slowness whenever I was doing high-IO work such as database updates or copying several gigs of data to disk all at once (for example, when my Satellite server downloads 50+ GB for a new content repo and indexes it). My other VMs would slow down a bit during this time. Given I hadn’t done much tuning in this environment, it was probably time to look into it. I’d also been debating upgrading my home lab to 10G networking, and this seemed to line up with what I was seeing for storage performance. I thought the network had finally become my bottleneck, given I’ve got an SSD array in my NFS server behind a 4-port 1 Gb NIC in LACP. But before I went crazy buying 10G networking gear, I looked at tuning what I had.

The official documentation is fairly good at explaining the performance characteristics of VDO and what you might want to tune. I also went into NFS server/client tuning, as I hadn’t done much there either. Given there are a few things at play here (disk hardware performance, network performance, VDO optimization, NFS optimization), I quickly went down a few rabbit holes and realised I needed to do some basic benchmarking and baselining so I could understand which areas in this stack were actually performing well, and which ones were candidates for more tuning. In addition to the VDO tuning docs, here’s what I used for reference:

Firstly, it’s important to troubleshoot things in isolation, and to use a benchmarking method that’s complementary to isolation as well. I used the iperf3 utility for network benchmarking and the fio utility for disk benchmarking. With fio I ran a series of sequential read, sequential write, random read, and random read/write tests, both on local disk filesystems and over the network filesystem. For reference, here are the fio commands:

# Sequential read
# fio --name TEST --eta-newline=5s --filename=fio-tempfile.dat --rw=read --size=500m --io_size=10g --blocksize=1024k --ioengine=libaio --fsync=10000 --iodepth=32 --direct=1 --numjobs=1 --runtime=60 --group_reporting

# Sequential write
# fio --name TEST --eta-newline=5s --filename=fio-tempfile.dat --rw=write --size=500m --io_size=10g --blocksize=1024k --ioengine=libaio --fsync=10000 --iodepth=32 --direct=1 --numjobs=1 --runtime=60 --group_reporting

# Random read
# fio --name TEST --eta-newline=5s --filename=fio-tempfile.dat --rw=randread --size=500m --io_size=10g --blocksize=4k --ioengine=libaio --fsync=1 --iodepth=1 --direct=1 --numjobs=1 --runtime=60 --group_reporting

# Random read and write
# fio --name TEST --eta-newline=5s --filename=fio-tempfile.dat --rw=randrw --size=500m --io_size=10g --blocksize=4k --ioengine=libaio --fsync=1 --iodepth=1 --direct=1 --numjobs=1 --runtime=60 --group_reporting
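
The iperf3 side isn’t shown above; a minimal sketch of a network baseline might look like the following. The helper name net_baseline, the 30-second duration, and the default of four parallel streams are assumptions for illustration, not the settings used in these tests.

```shell
# Hypothetical helper: measure TCP throughput from this client to a host
# that is already running the iperf3 server (start it there with: iperf3 -s).
#   $1 = server hostname or IP
#   $2 = number of parallel streams (assumed default: 4)
net_baseline() {
    server="$1"
    streams="${2:-4}"
    iperf3 -c "$server" -t 30 -P "$streams"
}

# Example usage (hostname is a placeholder):
#   net_baseline nfs 1    # single stream
#   net_baseline nfs 4    # four parallel streams
```

Comparing one stream against several is worthwhile on a bonded link, since a single TCP stream will normally stay on one member link of an LACP bond and so tops out at roughly 1 Gb/s.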

After reading the above guides and doing some basic investigating and benchmarking, this is what I ended up tuning first:

Overall the networking looked alright. I saw some dropped packets, but I’d been doing a fair amount of cable pulling, stopping and starting hosts, and bringing VPNs up and down. The NICs on the NFS server and clients weren’t using their full RX ring buffers, so I changed this. I doubt the small ring buffers were the cause of the dropped packets, but this tuning couldn’t hurt.

[root@nfs]# ethtool -g eno1
Ring parameters for eno1:
Pre-set maximums:
RX: 2047
RX Mini: 0
RX Jumbo: 0
TX: 511
Current hardware settings:
RX: 200
RX Mini: 0
RX Jumbo: 0
TX: 511

# ethtool -G eno1 rx 2047
# ethtool -G eno2 rx 2047
# ethtool -G eno3 rx 2047
# ethtool -G eno4 rx 2047

# ethtool -g eno1
Ring parameters for eno1:
Pre-set maximums:
RX: 2047
RX Mini: 0
RX Jumbo: 0
TX: 511
Current hardware settings:
RX: 2047
RX Mini: 0
RX Jumbo: 0
TX: 511


[root@curie]# ethtool -g enp1s0
Ring parameters for enp1s0:
Pre-set maximums:
RX: 511
RX Mini: 0
RX Jumbo: 0
TX: 511
Current hardware settings:
RX: 200
RX Mini: 0
RX Jumbo: 0
TX: 511

# ethtool -G enp2s0 rx 511
# ethtool -G enp1s0 rx 511

# ethtool -g enp1s0
Ring parameters for enp1s0:
Pre-set maximums:
RX: 511
RX Mini: 0
RX Jumbo: 0
TX: 511
Current hardware settings:
RX: 511
RX Mini: 0
RX Jumbo: 0
TX: 511
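
The per-interface commands above can be generalised so the maximum is read from each NIC instead of typed by hand. This is a sketch under assumptions: the helper names max_rx and grow_rx_rings are invented here, and the parsing relies on the `ethtool -g` layout shown above (a “Pre-set maximums:” header followed by an “RX:” line).

```shell
# Hypothetical helper: read `ethtool -g` output on stdin and print the
# pre-set maximum RX ring size (the first "RX:" line after the
# "Pre-set maximums:" header).
max_rx() {
    awk '/^Pre-set maximums:/ { in_max = 1; next }
         in_max && /^RX:/     { print $2; exit }'
}

# Hypothetical helper: raise the RX ring of each named interface to its
# reported maximum. Needs root. Interfaces with no parsable maximum are skipped.
grow_rx_rings() {
    for nic in "$@"; do
        max=$(ethtool -g "$nic" | max_rx)
        if [ -n "$max" ]; then
            ethtool -G "$nic" rx "$max"
        fi
    done
}

# Example usage:
#   grow_rx_rings eno1 eno2 eno3 eno4
```

Settings made with `ethtool -G` do not survive a reboot, so they need to be reapplied at boot by whatever mechanism the host already uses for network configuration.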

I also increased the number of NFS threads on the NFS server. I’ve got 15+ VMs, and each VM looks to use 2-3 NFS threads depending on its number of disks, so raising the default of 8 to 20 should help with concurrent disk activity:

# egrep COUNT /etc/sysconfig/nfs
RPCNFSDCOUNT=20
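
One rough way to judge whether the thread count is high enough is to sample how many nfsd kernel threads are busy at a given moment. The sketch below is an assumption about how that could be done, not the method used here: the helper name busy_nfsd is invented, and it treats threads in the R (running) or D (uninterruptible wait) state as busy.

```shell
# Hypothetical helper: read `ps -eo stat,comm` output on stdin and count
# nfsd kernel threads that are running (R) or in uninterruptible wait (D).
# Idle threads sit in S (sleeping) and are not counted.
busy_nfsd() {
    awk '$2 == "nfsd" && $1 ~ /^[DR]/ { n++ } END { print n + 0 }'
}

# Example usage, run on the NFS server while the VMs are under load:
#   ps -eo stat,comm | busy_nfsd
#   cat /proc/fs/nfsd/threads      # number of nfsd threads currently configured
```

If the sampled count regularly sits at or near the configured total, that suggests the pool is saturated and worth growing further.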

Similarly, the VDO device I created only had 1 thread allocated in several places, so I upped these as well and doubled the cache size:

# vdo modify --all --vdoLogicalThreads=4 --vdoPhysicalThreads=4 --vdoBioThreads=6 --vdoCpuThreads=6 --vdoAckThreads=2 --blockMapCacheSize=256M

After these changes, I still wasn’t seeing any significant performance change. I was getting fairly abysmal speeds even on a local SSD filesystem on the NFS server, without going over the network at all. I started to isolate this and began to suspect a hardware or SSD tuning issue. After updating the firmware of my DL360p’s P420i storage controller, and making sure the RAID controller cache was disabled and SSD Smart Path was on, it started to dawn on me. Previously, these six SSDs had been used in a RAID5 configuration that saw a ton of heavy disk IO. I had dedicated this host to an OpenStack environment and had done several builds hammering these disks. RAID5 is not an optimal SSD configuration: parity calculation is expensive, and the parity writes add a ton of disk IO and wear that wouldn’t be present in a RAID0 or RAID10 configuration. Essentially, I’ve got worn SSDs. I needed to do a secure erase of these SSDs to return their cells to as close to original factory condition as possible. These disks are approximately 3 years old and hadn’t had this done yet.

After doing an enhanced secure erase, I saw my local disk storage speeds come back up about fivefold. This was more in line with the newer SSDs in other servers. Disk tests over the network saw the same speed increase. So it looks like my problem was entirely hardware related :). As I’m now turning on all the VMs, I’m seeing a much quicker response when doing the high-IO activities. There are more than 16 NFS threads consistently in use now, and I’m monitoring to see the change in VDO-related performance. I still need to find a utility that gives accurate VDO stats; I think this will likely be a PCP module. But at first glance, with not much concrete data to go on yet, I *think* the VDO tuning has helped as well.

While I learned a bit about VDO and NFS performance tuning, it looks like I might need to spend that 10G networking budget on new SSDs instead. There’s diminishing life left in these.