I was facing a serious issue with one of my ESXi hosts. During backups, I was losing network connectivity to VMs (for 5 to 10 minutes on average), and sometimes VMs ended up with disk consolidation issues. vMotion also failed intermittently. Initially I suspected a network card issue, but I eventually ruled that out. I then started checking storage connectivity and HBA performance, looking for error and failure messages in the vmkernel log file.
Log in to the ESXi host using SSH. Run the following command to list the active HBAs.
esxcli storage san fc list
Adapter: vmhba2
Port ID: 011100
Node Name: 20:00:00:05:1e:fa:f4:82
Port Name: 10:00:00:05:1e:fa:f4:82
Speed: 8 Gbps
Port Type: NPort
Port State: ONLINE
Adapter: vmhba4
Port ID: 011100
Node Name: 20:00:00:05:1e:fa:f8:2b
Port Name: 10:00:00:05:1e:fa:f8:2b
Speed: 8 Gbps
Port Type: NPort
Port State: ONLINE
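With more than a couple of HBAs, it can help to filter that output for ports that are not online. The following is a minimal sketch, not a tested production script: the function name find_offline_fc_ports is my own, and it assumes the output keeps the "Adapter:" and "Port State:" field labels shown above.

```shell
# Print the adapter name and state for every FC port that is not ONLINE.
# Reads the output of `esxcli storage san fc list` on stdin.
# find_offline_fc_ports is a hypothetical helper name, not an esxcli feature.
find_offline_fc_ports() {
    awk '
        # Remember the adapter that the following fields belong to.
        $1 == "Adapter:" { adapter = $2 }
        # Report the adapter when its port state is anything other than ONLINE.
        $1 == "Port" && $2 == "State:" && $3 != "ONLINE" { print adapter, $3 }
    '
}

# Usage on the host:
#   esxcli storage san fc list | find_offline_fc_ports
```

No output means every listed port reported ONLINE, as in my case, so the problem had to be looked for elsewhere.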
Now check the vmkernel log using the following command:
grep -i 'failed' /var/log/vmkernel.log | less
If you see errors like the following, consult the VMware KB article below to understand what they mean. Look up the status codes from the log line (the H:, D: and P: values, for example H:0x7 D:0x0 P:0x0 in the entry below) to find the details.
https://kb.vmware.com/s/article/1029039
2020-10-05T06:23:15.484Z cpu1:1587135)NMP: nmp_ThrottleLogForDevice:3302: Cmd 0x2a (0x43b5c1643380, 1782124) to dev "naa.60060e801054bf60056fd48600000009"
on path "vmhba2:C0:T3:L29" Failed: H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:EVAL
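To see which paths fail most often and with which status codes, the matching entries can be summarized instead of read one by one. This is a rough sketch under a few assumptions: the function names are made up for illustration, each entry is expected on a single line as in the raw log file, and the host status names are transcribed from my reading of the KB above, so verify them against the article before relying on them.

```shell
# Count failed commands per path and H/D/P status triple.
# Reads vmkernel log text on stdin; assumes one log entry per line.
summarize_scsi_failures() {
    sed -n 's/.*on path "\([^"]*\)" Failed: \(H:0x[0-9a-fA-F]* D:0x[0-9a-fA-F]* P:0x[0-9a-fA-F]*\).*/\1 \2/p' |
        sort | uniq -c | sort -rn
}

# Translate a host (H:) status code into its short name.
# Table copied by hand from the KB; treat it as an assumption and double-check it.
host_status_name() {
    case "$1" in
        0x0) echo "OK" ;;
        0x1) echo "NO_CONNECT" ;;
        0x2) echo "BUS_BUSY" ;;
        0x3) echo "TIMEOUT" ;;
        0x4) echo "BAD_TARGET" ;;
        0x5) echo "ABORT" ;;
        0x6) echo "PARITY" ;;
        0x7) echo "ERROR" ;;
        0x8) echo "RESET" ;;
        0x9) echo "BAD_INTR" ;;
        0xa) echo "PASSTHROUGH" ;;
        0xb) echo "SOFT_ERROR" ;;
        0xc) echo "RETRY" ;;
        0xd) echo "REQUEUE" ;;
        *)   echo "UNKNOWN" ;;
    esac
}

# Usage on the host:
#   summarize_scsi_failures < /var/log/vmkernel.log
#   host_status_name 0x7
```

If one path or one vmhba dominates the summary, that is a hint to start with the SFP, cable, or HBA behind that path.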
In my case, the cause turned out to be outdated HBA firmware. More often, though, this error points to a hardware failure rather than a firmware issue. I would suggest replacing the SFP before replacing the HBA card. If that does not resolve the issue, replace the HBA card.
If you replace the HBA card, you will need to reconfigure the Fibre Channel switch and the SAN so that the host can connect to its existing LUNs, since the new card presents different WWNs.