Overview
This guide provides comprehensive techniques for analyzing NFS performance issues using tshark and tcpdump. These tools help identify bottlenecks, latency issues, and problematic operations in NFS environments.
Basic Capture and Analysis
Initial Capture
# Capture all NFS traffic (default port 2049)
sudo tcpdump -i any "port 2049" -w capture.pcap
# Capture with size limit and rotation
sudo tcpdump -i any "port 2049" -w capture.pcap -C 100 -W 10
# -C 100: Max file size 100MB
# -W 10: Keep 10 files in rotation
Basic RPC Service Response Time (SRT) Analysis
# NFSv3 analysis
tshark -r capture.pcap -z rpc,srt,100003,3 -q
# NFSv4 analysis
tshark -r capture.pcap -z rpc,srt,100003,4 -qExample:
$ sudo tshark -r capture.pcap -z rpc,srt,100003,3 -q
Running as user "root" and group "root". This could be dangerous.
===================================================================
RPC SRT Statistics:
Filter: nfs.procedure_v3
Index Procedure Calls Min SRT Max SRT Avg SRT Sum SRT
1 GETATTR 6 0.000479 0.000857 0.000607 0.003641
2 SETATTR 1 0.000210 0.000210 0.000210 0.000210
3 LOOKUP 20 0.000235 0.000689 0.000496 0.009927
4 ACCESS 1 0.000326 0.000326 0.000326 0.000326
7 WRITE 43 0.001164 0.003525 0.002044 0.087903
8 CREATE 20 0.001284 0.003368 0.001761 0.035220
==================================================================
[vastdata@MetaSpaceWorker1 pandas]$Advanced Analysis Techniques
1. Per-Client Analysis
# Extract unique client IPs
tshark -r capture.pcap -T fields -e ip.src | sort | uniq > clients.txt
# Analyze performance per client
for client in $(cat clients.txt); do
echo "=== Client: $client ==="
tshark -r capture.pcap -Y "ip.src == $client" -z rpc,srt,100003,3 -q
done
2. Time-Based Analysis
# Split capture into time windows (e.g., 60-second intervals)
editcap -i 60 capture.pcap split.pcap
# Analyze each time window
for file in split_*.pcap; do
echo "=== Analyzing $file ==="
tshark -r $file -z rpc,srt,100003,3 -q
doneExample:
$ editcap -i 2 capture.pcap split.pcap
$ ls -lah
total 74M
drwxrwxrwx. 2 78910 vastdata 4.0K Jun 28 22:40 .
drwxr-xr-x. 2 78910 2000 4.0K Jun 28 08:17 ..
-rwx------. 1 root root 37M Jun 28 22:35 capture.pcap
-rwx------. 1 vastdata vastdata 28 Jun 28 22:39 clients.txt
-rwx------. 1 vastdata vastdata 3.8M Jun 28 22:40 split_00000_20250628223541.pcap
-rwx------. 1 vastdata vastdata 6.6M Jun 28 22:40 split_00001_20250628223544.pcap
-rwx------. 1 vastdata vastdata 13M Jun 28 22:40 split_00002_20250628223545.pcap
-rwx------. 1 vastdata vastdata 14M Jun 28 22:40 split_00003_20250628223547.pcap
$ for file in split_*.pcap; do echo "=== Analyzing $file ===" ; tshark -r $file -z rpc,srt,100003,3 -q ; done
=== Analyzing split_00000_20250628223541.pcap ===
===================================================================
RPC SRT Statistics:
Filter: nfs.procedure_v3
Index Procedure Calls Min SRT Max SRT Avg SRT Sum SRT
1 GETATTR 2 0.000506 0.000857 0.000682 0.001363
2 SETATTR 1 0.000210 0.000210 0.000210 0.000210
3 LOOKUP 4 0.000235 0.000689 0.000521 0.002083
4 ACCESS 1 0.000326 0.000326 0.000326 0.000326
7 WRITE 6 0.001164 0.002074 0.001588 0.009530
8 CREATE 4 0.001500 0.002014 0.001794 0.007175
==================================================================
=== Analyzing split_00001_20250628223544.pcap ===
===================================================================
RPC SRT Statistics:
Filter: nfs.procedure_v3
Index Procedure Calls Min SRT Max SRT Avg SRT Sum SRT
1 GETATTR 2 0.000479 0.000715 0.000597 0.001194
3 LOOKUP 3 0.000331 0.000545 0.000456 0.001368
7 WRITE 8 0.001349 0.003525 0.002489 0.019909
8 CREATE 3 0.001487 0.001676 0.001561 0.004683
==================================================================
=== Analyzing split_00002_20250628223545.pcap ===
===================================================================
RPC SRT Statistics:
Filter: nfs.procedure_v3
Index Procedure Calls Min SRT Max SRT Avg SRT Sum SRT
1 GETATTR 2 0.000524 0.000560 0.000542 0.001084
3 LOOKUP 7 0.000408 0.000589 0.000540 0.003779
7 WRITE 16 0.001447 0.002854 0.002111 0.033779
8 CREATE 7 0.001411 0.003368 0.001907 0.013350
==================================================================
=== Analyzing split_00003_20250628223547.pcap ===
===================================================================
RPC SRT Statistics:
Filter: nfs.procedure_v3
Index Procedure Calls Min SRT Max SRT Avg SRT Sum SRT
3 LOOKUP 6 0.000375 0.000524 0.000450 0.002697
7 WRITE 13 0.001433 0.002790 0.001899 0.024685
8 CREATE 6 0.001284 0.002185 0.001669 0.010012
==================================================================
$3. Operation-Specific Deep Dive
# Focus on slow operations (e.g., operations taking >5ms)
tshark -r capture.pcap -Y "rpc.time > 0.005" -T fields \
-e frame.time -e ip.src -e ip.dst -e rpc.procedure -e rpc.time
# Analyze specific operations (e.g., WRITE operations)
tshark -r capture.pcap -Y "nfs.procedure_v3 == 7" \
-T fields -e rpc.time \
| awk '{if ($1 != "") {sum+=$1; count++}} END {print "Avg WRITE latency:", sum/count*1000, "ms"}'
# If you need write sizes too, extract them separately
tshark -r capture.pcap -Y "nfs.procedure_v3 == 7 and nfs.count3" \
-T fields -e nfs.count3 \
| awk '{sum+=$1; count++} END {print "Avg WRITE size:", sum/count, "bytes"}'Example:
[vastdata@MetaSpaceWorker1 pandas]$ tshark -r capture.pcap -Y "rpc.time > 0.002" -T fields -e frame.time -e ip.src -e ip.dst -e rpc.procedure -e rpc.time
Jun 28, 2025 22:35:43.152428000 MST 172.200.204.4 172.200.13.56 8 0.002014000
Jun 28, 2025 22:35:43.156568000 MST 172.200.204.4 172.200.13.56 7 0.002074000
Jun 28, 2025 22:35:44.437504000 MST 172.200.204.4 172.200.13.56 7 0.002314000
Jun 28, 2025 22:35:44.438338000 MST 172.200.204.4 172.200.13.56 7 0.002865000
Jun 28, 2025 22:35:44.688833000 MST 172.200.204.4 172.200.13.56 7 0.003491000
Jun 28, 2025 22:35:44.689515000 MST 172.200.204.4 172.200.13.56 7 0.003525000
Jun 28, 2025 22:35:45.014094000 MST 172.200.204.4 172.200.13.56 7 0.002762000
Jun 28, 2025 22:35:45.261662000 MST 172.200.204.4 172.200.13.56 7 0.002367000
Jun 28, 2025 22:35:45.265180000 MST 172.200.204.4 172.200.13.56 7 0.002714000
Jun 28, 2025 22:35:45.265602000 MST 172.200.204.4 172.200.13.56 7 0.002242000
Jun 28, 2025 22:35:45.443772000 MST 172.200.204.4 172.200.13.56 7 0.002455000
Jun 28, 2025 22:35:45.448950000 MST 172.200.204.4 172.200.13.56 7 0.002020000
Jun 28, 2025 22:35:46.063459000 MST 172.200.204.4 172.200.13.56 7 0.002854000
Jun 28, 2025 22:35:46.205786000 MST 172.200.204.4 172.200.13.56 8 0.003368000
Jun 28, 2025 22:35:46.806685000 MST 172.200.204.4 172.200.13.56 8 0.002269000
Jun 28, 2025 22:35:46.811204000 MST 172.200.204.4 172.200.13.56 7 0.002390000
Jun 28, 2025 22:35:47.158509000 MST 172.200.204.4 172.200.13.56 7 0.002472000
Jun 28, 2025 22:35:47.159744000 MST 172.200.204.4 172.200.13.56 7 0.002548000
Jun 28, 2025 22:35:48.093065000 MST 172.200.204.4 172.200.13.56 7 0.002093000
Jun 28, 2025 22:35:48.482774000 MST 172.200.204.4 172.200.13.56 7 0.002099000
Jun 28, 2025 22:35:48.903600000 MST 172.200.204.4 172.200.13.56 8 0.002185000
Jun 28, 2025 22:35:48.905586000 MST 172.200.204.4 172.200.13.56 8 0.002174000
Jun 28, 2025 22:35:48.910427000 MST 172.200.204.4 172.200.13.56 7 0.002790000
Jun 28, 2025 22:35:48.911325000 MST 172.200.204.4 172.200.13.56 7 0.002135000
Jun 28, 2025 22:35:48.913962000 MST 172.200.204.4 172.200.13.56 7 0.002353000
$ tshark -r capture.pcap -Y "nfs.procedure_v3 == 7" -T fields -e frame.time -e rpc.time -e nfs.count3 | awk '{sum+=$2; count++} END {print "Avg WRITE latency:", sum/count*1000, "ms"}'
Avg WRITE latency: 28000 ms
$4. File Handle Analysis
# Extract file handles and their operation counts
tshark -r capture.pcap -T fields -e nfs.fh.hash -e nfs.procedure_v3 \
| sort | uniq -c | sort -rn | head -20
# Track specific file handle performance
tshark -r capture.pcap -Y "nfs.fh.hash == 0x12345678" \
-T fields -e frame.time -e rpc.procedure -e rpc.timeExample:
$ tshark -r capture.pcap -T fields -e nfs.fh.hash -e nfs.procedure_v3 \
| sort | uniq -c | sort -rn | head -20
1036
43 7
20 3
6 1
3 0xa801c973 7
3 0x88ca7993 7
3 0x5548932f 7
3 0x53468be7 7
3 0x4e8f3d35 7
3 0x2a64906b 7
2 0xfe88312d 7
2 0xe4e452e0 7
2 0xe05e698e 7
2 0xcbe67073 7
2 0x8defaa86 7
2 0x88120cc1 7
2 0x66137b8e 7
2 0x55b0388d 7
2 0x3b9a9f47 7
2 0x2edd1be7 7
$ tshark -r capture.pcap -Y "nfs.fh.hash == 0xa801c973" \
-T fields -e frame.time -e rpc.procedure -e rpc.time
Jun 28, 2025 22:35:45.256039000 MST 8 0.001610000
Jun 28, 2025 22:35:45.259295000 MST 7
Jun 28, 2025 22:35:45.262466000 MST 7
Jun 28, 2025 22:35:45.263360000 MST 7
$5. Error Analysis
# Find all NFS errors
tshark -r capture.pcap -Y "nfs.status3 != 0" \
-T fields -e frame.time -e ip.src -e nfs.procedure_v3 -e nfs.status3
# Common error codes:
# 2 = NOENT (No such file)
# 13 = ACCES (Permission denied)
# 28 = NOSPC (No space left)
# 70 = STALE (Stale file handle)
# See RFS1813 for full list
# https://www.ietf.org/rfc/rfc1813.txt6. Latency Distribution Analysis
# Create latency histogram
tshark -r capture.pcap -T fields -e rpc.time | \
awk '{
if ($1 < 0.001) bucket["<1ms"]++
else if ($1 < 0.005) bucket["1-5ms"]++
else if ($1 < 0.010) bucket["5-10ms"]++
else if ($1 < 0.050) bucket["10-50ms"]++
else bucket[">50ms"]++
} END {
for (b in bucket) print b, bucket[b]
}' | sortExample:
$ tshark -r capture.pcap -T fields -e rpc.time | \
awk '{
if ($1 < 0.001) bucket["<1ms"]++
else if ($1 < 0.005) bucket["1-5ms"]++
else if ($1 < 0.010) bucket["5-10ms"]++
else if ($1 < 0.050) bucket["10-50ms"]++
else bucket[">50ms"]++
} END {
for (b in bucket) print b, bucket[b]
}' | sort
1-5ms 63
<1ms 1176
$7. Compound Operations Analysis (NFSv4)
# Analyze NFSv4 compound operations
tshark -r capture.pcap -Y "nfs" -T fields \
-e nfs.procedure_v4 -e nfs.opcode -e rpc.time \
| awk '{ops[$2]++; time[$2]+=$3} END {
for (op in ops) print op, ops[op], time[op]/ops[op]*1000 "ms"
}'
Performance Metrics Script
Create a comprehensive analysis script:
#!/bin/bash
# nfs_perf_analyze.sh
PCAP_FILE=$1
OUTPUT_DIR="nfs_analysis_$(date +%Y%m%d_%H%M%S)"
mkdir -p $OUTPUT_DIR
echo "Analyzing NFS performance from $PCAP_FILE..."
# Overall statistics
echo "=== Overall RPC Statistics ===" > $OUTPUT_DIR/overall_stats.txt
tshark -r $PCAP_FILE -z rpc,srt,100003,3 -q >> $OUTPUT_DIR/overall_stats.txt
# Top talkers
echo "=== Top Clients by Request Count ===" > $OUTPUT_DIR/top_clients.txt
tshark -r $PCAP_FILE -T fields -e ip.src | sort | uniq -c | \
sort -rn | head -20 >> $OUTPUT_DIR/top_clients.txt
# Slow operations
echo "=== Operations Slower than 10ms ===" > $OUTPUT_DIR/slow_ops.txt
tshark -r $PCAP_FILE -Y "rpc.time > 0.010" -T fields \
-e frame.time -e ip.src -e ip.dst -e nfs.procedure_v3 -e rpc.time \
>> $OUTPUT_DIR/slow_ops.txt
# Error summary
echo "=== NFS Errors Summary ===" > $OUTPUT_DIR/errors.txt
tshark -r $PCAP_FILE -Y "nfs.status3 != 0" -T fields -e nfs.status3 | \
sort | uniq -c | sort -rn >> $OUTPUT_DIR/errors.txt
# Latency percentiles
echo "=== Latency Percentiles ===" > $OUTPUT_DIR/latency_percentiles.txt
tshark -r $PCAP_FILE -T fields -e rpc.time | sort -n | \
awk '{
arr[NR] = $1
} END {
print "P50:", arr[int(NR*0.5)]*1000, "ms"
print "P90:", arr[int(NR*0.9)]*1000, "ms"
print "P95:", arr[int(NR*0.95)]*1000, "ms"
print "P99:", arr[int(NR*0.99)]*1000, "ms"
print "P99.9:", arr[int(NR*0.999)]*1000, "ms"
}' >> $OUTPUT_DIR/latency_percentiles.txt
echo "Analysis complete. Results saved in $OUTPUT_DIR/"Example:
$ sudo bash ./nfs_perf_analyze.sh capture.pcap
Analyzing NFS performance from capture.pcap...
Analysis complete. Results saved in nfs_analysis_20250628_230245/
$ tree -ah nfs_analysis_20250628_230245/
nfs_analysis_20250628_230245/
├── [ 37] errors.txt
├── [ 92] latency_percentiles.txt
├── [ 778] overall_stats.txt
├── [ 36] slow_ops.txt
└── [ 81] top_clients.txt
0 directories, 5 files
$ cat nfs_analysis_20250628_230245/errors.txt
=== NFS Errors Summary ===
20 2
$ cat nfs_analysis_20250628_230245/latency_percentiles.txt
=== Latency Percentiles ===
P50: 0 ms
P90: 0 ms
P95: 1.164 ms
P99: 2.353 ms
P99.9: 3.368 ms
$ cat nfs_analysis_20250628_230245/overall_stats.txt
=== Overall RPC Statistics ===
===================================================================
RPC SRT Statistics:
Filter: nfs.procedure_v3
Index Procedure Calls Min SRT Max SRT Avg SRT Sum SRT
1 GETATTR 6 0.000479 0.000857 0.000607 0.003641
2 SETATTR 1 0.000210 0.000210 0.000210 0.000210
3 LOOKUP 20 0.000235 0.000689 0.000496 0.009927
4 ACCESS 1 0.000326 0.000326 0.000326 0.000326
7 WRITE 43 0.001164 0.003525 0.002044 0.087903
8 CREATE 20 0.001284 0.003368 0.001761 0.035220
==================================================================
$ cat nfs_analysis_20250628_230245/slow_ops.txt
=== Operations Slower than 10ms ===
$ cat nfs_analysis_20250628_230245/top_clients.txt
=== Top Clients by Request Count ===
896 172.200.13.56
343 172.200.204.4
$Real-time Monitoring
# Monitor NFS operations in real-time
$ sudo tshark -i any -f "port 2049" -Y "rpc.time" -T fields \
-e frame.time_epoch -e ip.src -e nfs.procedure_v3 -e rpc.time | \
awk '{
proc_name[1]="GETATTR"; proc_name[3]="LOOKUP"; proc_name[7]="WRITE";
proc_name[6]="READ"; proc_name[8]="CREATE"; proc_name[12]="REMOVE";
if ($3 && $4) { # Only process if we have both procedure and time
op = proc_name[$3] ? proc_name[$3] : "PROC_"$3;
latency = $4 * 1000; # Convert to ms
printf "[%s] %s -> %s: %.2fms\n",
strftime("%H:%M:%S", $1), $2, op, latency
if (latency > 10) print " *** HIGH LATENCY ***"
}
}'
Capturing on 'any'
83 [23:07:41] 172.200.204.4 -> LOOKUP: 0.61ms
[23:07:41] 172.200.204.4 -> CREATE: 2.54ms
[23:07:41] 172.200.204.4 -> WRITE: 1.13ms
[23:07:41] 172.200.204.4 -> WRITE: 2.52ms
[23:07:41] 172.200.204.4 -> WRITE: 2.49ms
[23:07:41] 172.200.204.4 -> GETATTR: 0.56ms
[23:07:41] 172.200.204.4 -> LOOKUP: 0.47ms
[23:07:41] 172.200.204.4 -> LOOKUP: 0.39ms
[23:07:41] 172.200.204.4 -> CREATE: 1.67ms
[23:07:41] 172.200.204.4 -> WRITE: 1.46ms
[23:07:41] 172.200.204.4 -> WRITE: 2.41ms
[23:07:41] 172.200.204.4 -> GETATTR: 0.47ms
[23:07:41] 172.200.204.4 -> CREATE: 1.84ms
[23:07:41] 172.200.204.4 -> WRITE: 2.21ms
[23:07:41] 172.200.204.4 -> WRITE: 1.79ms
[23:07:41] 172.200.204.4 -> LOOKUP: 0.67ms
[23:07:41] 172.200.204.4 -> CREATE: 2.46ms
[23:07:41] 172.200.204.4 -> WRITE: 2.43ms
[23:07:41] 172.200.204.4 -> WRITE: 2.00ms
[23:07:41] 172.200.204.4 -> WRITE: 2.34ms
[23:07:42] 172.200.204.4 -> GETATTR: 0.88ms
[23:07:42] 172.200.204.4 -> GETATTR: 0.70ms
[23:07:42] 172.200.204.4 -> LOOKUP: 0.32ms
...Visualization Preparation
# Export data for visualization (e.g., in Grafana or Excel)
tshark -r capture.pcap -T fields \
-e frame.time_epoch -e nfs.procedure_v3 -e rpc.time \
-E header=y -E separator=, > nfs_timeseries.csv
# Create per-minute aggregates
tshark -r capture.pcap -T fields -e frame.time_epoch -e rpc.time | \
awk '{
minute = int($1/60)*60
sum[minute] += $2
count[minute]++
} END {
for (m in sum) {
avg_ms = (sum[m]/count[m])*1000
print strftime("%Y-%m-%d %H:%M:00", m), avg_ms, count[m]
}
}' | sort > nfs_per_minute.csvAdvanced Filtering Examples
# Complex filter: Find CREATE operations from specific subnet taking >5ms
tshark -r capture.pcap -Y "ip.src == 192.168.1.0/24 && nfs.procedure_v3 == 8 && rpc.time > 0.005"
# Find operations on large files (>1MB READs/WRITEs)
tshark -r capture.pcap -Y "(nfs.procedure_v3 == 6 || nfs.procedure_v3 == 7) && nfs.count3 > 1048576"
# Track specific user operations (requires uid mapping)
tshark -r capture.pcap -Y "nfs.uid3 == 1000"
This comprehensive guide provides the tools and techniques needed to effectively troubleshoot NFS performance issues using packet analysis.