CNode Memory Consumption

Design Considerations

The VAST AI OS memory configuration strives to provide the best performance, scale, and resiliency possible. These are conflicting requirements:

Scale - Provide the largest possible memory for the core OS to support the largest possible number of clients, active protocol requests, and the many configuration objects in the system.
Performance - Serve as many configuration and metadata objects as possible from node memory to provide the lowest response times, while keeping the CNodes stateless.
Resiliency - The cluster has two services, each of which runs on a single CNode at a time:
- Cluster Leader - one of the CNodes is automatically elected to run this service.
  All CNodes must run with enough free memory to be elected as the leader at any given moment.
- VMS (VAST Management Server) - one of the CNodes is automatically elected to run the management service (this configuration can be customized with the dedicated VMS configuration).
  All CNodes must run with enough free memory to be elected to run VMS at any given moment.

To conclude, the memory configuration must accommodate a state in which a single CNode runs the core OS, the cluster leader, and VMS at the same time.

This explanation refers to CNodes but is relevant to EBox nodes as well in the case of an EBox cluster. For an EBox cluster, each CNode is also a DNode. The core OS memory consumption covers both CNode and DNode services.

CNode Memory Configuration

The CNode memory consists of the following:

Component	Description
VAST Core OS	The core OS, including the data ingest (protocols support), erasure coding, cluster services, DB, replication, global namespace, DataEngine, etc. This will consume ~200GB on a 256GB CNode and ~310GB on a 384GB CNode.
Cluster Leader	A single CNode will also run the leader service. The leader consumes ~10-15GB.
VMS	A single CNode will also run the VMS service, which includes the management service itself, surrounding web services, and the PostgreSQL service. The VMS consumes ~10GB.
Linux OS	The Linux OS - the kernel and the surrounding Linux services.
Vendor-specific, other	Some server-specific HW monitoring services.
Free	This reserved memory acts as a buffer for dynamic runtime allocations, protecting the CNode from out-of-memory (OOM) errors that would trigger a panic and system reboot.

The following diagram visualizes how a 256GB CNode memory will be allocated:

A few notes:

VAST Platform and Leader memory is pre-allocated in full at node startup, which takes a few minutes. This explains why node memory consumption typically goes up during cluster power-on or after node failovers.
VMS allocates and deallocates memory dynamically as ongoing operations (monitoring tasks, etc.) are executed. These operations consume memory themselves and also trigger PostgreSQL queries, which require additional memory of their own.
The design goal is to leave ~10-15% (~25-30GB on 256GB CNodes, ~35-55GB on 384GB CNodes) free for the Linux OS and for vendor-specific or other custom services running on the server.
Additional third-party services, if installed, may consume memory outside of the standard calculations, which is why it’s important to coordinate any such requirement with VAST Data.
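
The budget described above can be checked with simple arithmetic. The following is a minimal sketch, not an official sizing tool: the class name CNodeBudget and its fields are hypothetical, and it assumes the worst case in which one CNode runs the core OS, the leader (upper end of the quoted ~10-15GB), and VMS (~10GB) at once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CNodeBudget:
    """Approximate memory budget of one CNode, in GB (illustrative figures)."""
    total_gb: int
    core_os_gb: int
    leader_gb: int = 15  # assumption: upper end of the ~10-15GB range
    vms_gb: int = 10     # ~10GB per the component table

    def headroom_gb(self) -> int:
        # Memory left for the Linux OS, vendor services, and the free buffer.
        return self.total_gb - self.core_os_gb - self.leader_gb - self.vms_gb

    def headroom_pct(self) -> float:
        return 100.0 * self.headroom_gb() / self.total_gb


for budget in (CNodeBudget(256, 200), CNodeBudget(384, 310)):
    print(f"{budget.total_gb}GB CNode: {budget.headroom_gb()}GB left "
          f"({budget.headroom_pct():.1f}%)")
```

With these approximate inputs, both node sizes leave roughly 12-13% of memory outside the VAST services, which falls inside the ~10-15% design goal. The exact figures on a live system will differ because all inputs are rounded estimates.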

Troubleshooting Memory Consumption

There can be occasions in which memory consumption becomes higher than expected. In such cases, a VMS alarm may be triggered:

CNode cnode-128-1 (172.16.128.1) [Rack-CB1-U-bottom] memory usage reached to 98.0%
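
When alarms are processed in bulk (for example, from an exported alarm list), the node name and usage percentage can be extracted with a regular expression. This is a sketch only: the pattern is inferred from the single sample message above, so the real alarm text may vary between versions, and the names ALARM_RE, MemoryAlarm, and parse_memory_alarm are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumption: the alarm always follows the layout of the sample message:
# "CNode <name> (<ip>) [<location>] memory usage reached to <pct>%"
ALARM_RE = re.compile(
    r"CNode (?P<name>\S+) \((?P<ip>[\d.]+)\) \[(?P<location>[^\]]*)\] "
    r"memory usage reached to (?P<pct>\d+(?:\.\d+)?)%"
)


class MemoryAlarm(NamedTuple):
    name: str
    ip: str
    location: str
    usage_pct: float


def parse_memory_alarm(text: str) -> Optional[MemoryAlarm]:
    """Return the parsed alarm, or None if the text does not match."""
    match = ALARM_RE.search(text)
    if match is None:
        return None
    return MemoryAlarm(match["name"], match["ip"], match["location"],
                       float(match["pct"]))
```

Returning None for non-matching text lets a caller skip unrelated alarms without raising an exception.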

Information to collect and review

When such issues occur, capture the following information:

Collect two debug bundles within 24 hours
If it is not possible to collect a full bundle, collect at least the following outputs:
- atop (interactive) - press m, then p
- atop -m 1 1 > atop_memory_report.txt
- atop -pm 1 1 > atop_grouped_memory.txt
- System-wide memory information:
  - date > memory_report.txt
  - cat /proc/meminfo >> memory_report.txt
  - vmstat -s >> memory_report.txt
  - sudo slabtop -o | head -n 20 >> memory_report.txt
- Process-specific memory information:
  - Collect the following information for the relevant processes - mostly the top memory-consuming processes (the core VAST OS processes, VMS, etc.)
  - pmap -x [PID] >> memory_report.txt
  - Leader process:
    - ssh `find-leader`
    - ps -aux | grep aaaa-bbbbccccdddd | awk '{print $2}' | head -1
- Docker statistics
  - docker stats --no-stream >> memory_report.txt
- Check for previous process kills
  - dmesg -T | grep -E -i "killed|oom|out of memory" >> memory_report.txt
- Automated analysis
  - Run luna analyze and luna analyze vms_memory