Troubleshooting deviceCpuTempHigh Alarm on Versa Appliances : Versa Support

Overview

The deviceCpuTempHigh alarm (YANG identity: device-cpu-temp-high) is raised by VOS when one or more CPU cores exceed the platform's high-temperature threshold. The alarm has Critical severity and is sent to Syslog, SNMP, and Analytics. Left unresolved, sustained high CPU temperatures can cause hardware throttling, instability, or permanent damage to the appliance.

Affected Platforms

This alarm is supported on the following Versa and OEM appliance models:

Platform Family	Models
Versa CSG (Intel)	CSG350, CSG355, CSG365, CSG730, CSG750, CSG770, CSG780, CSG1300, CSG1500, CSG2500, CSG3300, CSG3500
Versa CSG (AMD)	CSG5000, CSG5200, CSG5250
Versa CSX	CSX4100, CSX4300, CSX4500, CSX8300, CSX8500
Dell VEP	VEP1425 (V210), VEP1445 (V220), VEP1485 (V240), VEP4600 (V910/V930)
Dell XR/R-Series	XR5610 (V920/V950), R7515 (V2800), R7615 (V2900)
Advantech FWA	FWA-1010, FWA-1012 (2C/4C/8C), FWA-1320, FWA-2320, FWA-AAL1010, FWA-5020, FWA-5070, FWA-L5070
Lanner	NCA-1515
Caswell	CAN0261, CAF0262, CAD0263
ADI	80500-0163, 80500-0214, 90500-0151
Nexcom	DTA1152BC4
Datacom	DM8630
Riverbed	CX-580, CX-780, CX-3080, EX-6080

Note: This alarm is supported only on the platforms listed above. It is not available on other platforms, such as virtual/cloud appliances or the CSG100/CSG200 series. On unsupported platforms, VOS logs CPU Temp Alarm Not Supported at startup and the alarm never fires.

Temperature Thresholds

VOS automatically selects the threshold at startup based on the detected chassis type:

CPU Architecture	Threshold	Platforms
Intel (default)	70 °C	All platforms not listed below
AMD	80 °C	CSG5000/5200/5250, CSG1300/1500/2500, Dell R7515, R7615, XR5610

The alarm is raised as Critical when any single core exceeds the threshold. It clears automatically once all cores return below the threshold (with a 10-second soak time). VOS checks temperatures every 60 seconds.
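The selection and comparison logic described above can be sketched in Python. This is a minimal illustration, not VOS source code: the function names and the lookup by model name are assumptions, while the threshold values and the model list are copied from the table in this section. The sketch does not model the 60-second polling interval or the 10-second soak time, and it does not check whether a platform supports the alarm at all.

```python
from typing import Dict, List

# Values copied from the threshold table above.
DEFAULT_THRESHOLD_C = 70
HIGHER_THRESHOLD_C = 80
HIGHER_THRESHOLD_MODELS = {
    "CSG5000", "CSG5200", "CSG5250",
    "CSG1300", "CSG1500", "CSG2500",
    "R7515", "R7615", "XR5610",
}


def select_threshold(model: str) -> int:
    """Return the high-temperature threshold in deg C for a model name (hypothetical helper)."""
    if model.upper() in HIGHER_THRESHOLD_MODELS:
        return HIGHER_THRESHOLD_C
    return DEFAULT_THRESHOLD_C


def cores_over_threshold(temps_c: Dict[int, float], threshold_c: int) -> List[int]:
    """Return the sorted IDs of cores whose reading exceeds the threshold."""
    return sorted(core for core, temp in temps_c.items() if temp > threshold_c)


def alarm_expected(temps_c: Dict[int, float], threshold_c: int) -> bool:
    """The alarm is expected when any single core exceeds the threshold."""
    return bool(cores_over_threshold(temps_c, threshold_c))
```

For example, `cores_over_threshold({0: 71.0, 1: 69.5, 2: 75.0}, select_threshold("CSG770"))` reports cores 0 and 2, and the alarm is no longer expected once every reading drops to the threshold or below.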

Symptoms

Alarm appears in Versa Director as deviceCpuTempHigh with Critical severity
Alarm message format: Device CPU Temperature crossed <threshold> on core(s) <core-ids>
Clear message: Device CPU Temperature normal
Director health dashboard shows a hardware/equipment fault
Possible secondary symptoms: application latency, packet drops, VOS process restarts due to CPU throttling
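When alarms are processed by a script (for example from a syslog feed), the raise and clear messages listed above can be picked apart with a small parser. The sketch below assumes that core IDs appear as space-separated integers, as in the example message "Device CPU Temperature crossed 70 on core(s) 0 2"; the function name and the return shape are illustrative choices.

```python
import re
from typing import List, Optional, Tuple

# Patterns built from the message formats quoted in this article.
RAISE_RE = re.compile(r"Device CPU Temperature crossed (\d+) on core\(s\) ([\d ]+)")
CLEAR_TEXT = "Device CPU Temperature normal"


def parse_alarm_text(text: str) -> Optional[Tuple[str, Optional[int], List[int]]]:
    """Return (kind, threshold, core_ids), or None if the text is not a CPU temperature message."""
    if CLEAR_TEXT in text:
        return ("clear", None, [])
    match = RAISE_RE.search(text)
    if match is None:
        return None
    threshold = int(match.group(1))
    core_ids = [int(token) for token in match.group(2).split()]
    return ("raise", threshold, core_ids)
```

The returned core IDs can then be compared with the per-core readings collected during diagnosis.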

Diagnostic Commands

1. VOS CLI — Check Sensor Data

admin@device> show device sensors

This is the primary command. It runs /usr/bin/sensors internally and displays all thermal sensor readings through the VOS layer.

For temperature-only output:

admin@device> show device sensors temperature

Expected output — Intel platforms (Core temp format):

coretemp-isa-0000
Adapter: ISA adapter
Core 0:       +52.0°C  (high = +80.0°C, crit = +100.0°C)
Core 1:       +49.0°C  (high = +80.0°C, crit = +100.0°C)
Core 2:       +51.0°C  (high = +80.0°C, crit = +100.0°C)
Core 3:       +50.0°C  (high = +80.0°C, crit = +100.0°C)

Expected output — AMD platforms (Tccd format, e.g. CSG5000/5200/5250):

k10temp-pci-00c3
Adapter: PCI adapter
Tctl:         +63.4°C
Tccd1:        +58.8°C
Tccd2:        +57.5°C

If any reading exceeds the Versa threshold (70 °C for Intel / 80 °C for AMD), the alarm is expected. Readings well above the threshold indicate an active cooling problem. The high and crit values that sensors prints next to each reading are the CPU's own limits, not the VOS alarm threshold.
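If the sensor output is saved to a file or captured by a script, the readings can be extracted and compared with the threshold automatically. The following is a minimal sketch that handles the two output formats shown above. The function names are illustrative, and the article does not state which AMD readings (Tctl or Tccd) VOS evaluates, so the sketch simply returns every reading it recognizes.

```python
import re
from typing import Dict

# Matches "Core 0:  +52.0°C (...)" as well as "Tctl:  +63.4°C" and "Tccd1:  +58.8°C".
# Only the first temperature on each line is captured, so the "high"/"crit" values are ignored.
READING_RE = re.compile(
    r"^(Core \d+|Tctl|Tccd\d+):\s+\+?(-?\d+(?:\.\d+)?)°C", re.MULTILINE
)


def parse_sensor_temps(output: str) -> Dict[str, float]:
    """Map each sensor label to its current reading in deg C."""
    return {label: float(value) for label, value in READING_RE.findall(output)}


def readings_over(output: str, threshold_c: float) -> Dict[str, float]:
    """Return only the readings that exceed the given threshold."""
    temps = parse_sensor_temps(output)
    return {label: temp for label, temp in temps.items() if temp > threshold_c}


# Sample text copied from the expected outputs above.
INTEL_SAMPLE = """coretemp-isa-0000
Adapter: ISA adapter
Core 0:       +52.0°C  (high = +80.0°C, crit = +100.0°C)
Core 1:       +49.0°C  (high = +80.0°C, crit = +100.0°C)
"""
AMD_SAMPLE = """k10temp-pci-00c3
Adapter: PCI adapter
Tctl:         +63.4°C
Tccd1:        +58.8°C
Tccd2:        +57.5°C
"""
```

With the samples above, `readings_over(INTEL_SAMPLE, 70)` returns an empty dictionary, which matches a healthy appliance.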

2. VOS CLI — Check Active Alarms

admin@device> show alarms

Verify the device-cpu-temp-high alarm is present and note the affected core IDs from the alarm message.

3. IPMI Sensor Check (for IPMI-enabled platforms)

From the appliance shell:

sudo ipmitool sdr | grep -i temp

Or for a broader view:

sudo ipmitool sensor | grep -i temp

This provides raw BMC readings including inlet air temperature, CPU package temperature, and other thermal zones.
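The `ipmitool sdr` listing uses three pipe-separated columns (sensor name, reading, status), which makes it straightforward to post-process. The sketch below parses captured output and flags rows whose status is not ok. The sensor names in the sample are illustrative only, since naming differs between BMC vendors, and the helper names are hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional


class SdrRow(NamedTuple):
    name: str
    reading: str
    status: str


def parse_sdr(output: str) -> List[SdrRow]:
    """Split each 'name | reading | status' line into an SdrRow."""
    rows = []
    for line in output.splitlines():
        parts = [part.strip() for part in line.split("|")]
        if len(parts) == 3 and parts[0]:
            rows.append(SdrRow(*parts))
    return rows


def temperature_c(row: SdrRow) -> Optional[float]:
    """Return the reading in deg C, or None when the row has no Celsius value."""
    fields = row.reading.split()
    if len(fields) == 3 and fields[1:] == ["degrees", "C"]:
        try:
            return float(fields[0])
        except ValueError:
            return None
    return None


def temperatures(rows: List[SdrRow]) -> Dict[str, float]:
    """Map sensor name to temperature for every row that reports one."""
    result = {}
    for row in rows:
        value = temperature_c(row)
        if value is not None:
            result[row.name] = value
    return result


def abnormal_rows(rows: List[SdrRow]) -> List[SdrRow]:
    """Return rows whose status column is anything other than 'ok'."""
    return [row for row in rows if row.status != "ok"]


# Illustrative sample; real sensor names depend on the platform's BMC.
SAMPLE_SDR = """CPU Temp         | 45 degrees C      | ok
Inlet Temp       | 24 degrees C      | ok
PCH Temp         | no reading        | ns
"""
```

Rows returned by `abnormal_rows` are the ones worth checking first, because a sensor that stops reporting can hide a rising temperature.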

Triage Steps

Confirm the affected cores: Note the core IDs from the alarm message, e.g. Device CPU Temperature crossed 70 on core(s) 0 2.
Check current sensor readings: Run show device sensors and record the current temperatures. Determine whether the alarm is still active or intermittent.
Check CPU utilization: Sustained traffic load or a software fault can keep CPU utilization high and drive temperatures up:
```
admin@device> show system resources
```
Inspect the physical environment: Verify that the ambient room temperature is within spec (typical operating range: 0–40 °C). Ensure all fans are operational and airflow is not obstructed; check for dust buildup on inlet and exhaust vents.
Check fan alarms: A deviceCpuTempHigh alarm frequently accompanies a fan failure. Run:
```
sudo ipmitool sdr | grep -i fan
```
Resolve any fan that reports a status of ns (no reading, which usually means the fan is absent or not detected) or cr (critical) before continuing.
Review the system event log: Check the BMC event log for prior thermal events:
```
sudo ipmitool sel list | grep -i temp
```
Verify the VOS startup log: At boot, VOS logs the detected chassis and threshold:
CHASSIS:[<id>] <name> CPU HighTemp Alarm Thresh <N>C
Confirm that the logged threshold is the correct one for the platform.
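The startup-log check in the last step can be scripted once the relevant line has been located. The sketch below parses a line in the CHASSIS:[<id>] <name> CPU HighTemp Alarm Thresh <N>C format and compares the logged threshold with the expected value. The chassis ID and name used in the usage note are made up for illustration, and the exact characters allowed in the ID and name fields are an assumption.

```python
import re
from typing import Optional, Tuple

# Pattern derived from the log format quoted in the step above.
CHASSIS_RE = re.compile(
    r"CHASSIS:\[(?P<id>[^\]]+)\]\s+(?P<name>.+?)\s+CPU HighTemp Alarm Thresh\s+(?P<thresh>\d+)C"
)


def parse_chassis_line(line: str) -> Optional[Tuple[str, str, int]]:
    """Return (chassis_id, chassis_name, threshold_c), or None if the line does not match."""
    match = CHASSIS_RE.search(line)
    if match is None:
        return None
    return (match.group("id"), match.group("name"), int(match.group("thresh")))


def threshold_matches(line: str, expected_c: int) -> bool:
    """True when the logged threshold equals the expected value for the platform."""
    parsed = parse_chassis_line(line)
    return parsed is not None and parsed[2] == expected_c
```

For a hypothetical line such as `CHASSIS:[12] CSG770 CPU HighTemp Alarm Thresh 70C`, `threshold_matches(line, 70)` is true. A false result points to the chassis misdetection case listed under Root Cause & Resolution.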

Root Cause & Resolution

Cause	Resolution
Fan failure / fan not detected	Resolve fan alarm first — see KB: Troubleshooting Fan Not Detected Alarm on Versa Appliances. RMA fan module if needed.
Blocked airflow / dust accumulation	Clean inlet and exhaust vents; ensure rack spacing meets deployment guidelines
High ambient temperature	Improve room cooling; verify datacenter/equipment room is within operating spec
Sustained high CPU utilization	Investigate traffic load, offload policies, or runaway processes driving CPU above normal
Thermal paste degradation (older appliances)	Re-apply CPU thermal compound; escalate to TAC for field service guidance
Incorrect chassis detection at boot	Check VOS startup log for chassis module ID; open TAC case if chassis is misidentified

Alarm Auto-Clear Behavior

The alarm clears automatically once all monitored CPU cores drop back below the threshold. The OAM module checks temperatures every 60 seconds. After conditions normalize, allow up to 2 minutes for the alarm to self-clear in Director.

To manually clear a stale alarm in Director: Director → Monitor → Alarms → Select the alarm → Clear Alarm.

SNMP / Syslog Reference

SNMP trap name: deviceCpuTempHigh
Syslog tag: device-cpu-temp-high
Event type: equipmentAlarm
Alarm type value: 99 (VOS YANG enum)
Destinations: Analytics, Syslog, SNMP (all enabled by default)
Soak time: 10 seconds

Escalation

If temperatures remain elevated after addressing airflow, fan status, and CPU load, contact Versa Technical Assistance Center (TAC) with:

Output of show device sensors
Output of show alarms
Output of sudo ipmitool sdr (if shell access is available)
Output of sudo ipmitool sel list
Appliance model, serial number, and VOS software version
Description of ambient environment and any recent changes to the deployment
