Overview
The deviceCpuTempHigh alarm (YANG identity: device-cpu-temp-high) is raised by VOS when one or more CPU cores exceed the platform's high-temperature threshold. The alarm is Critical severity and is sent to Syslog, SNMP, and Analytics. Left unresolved, sustained high CPU temperatures can cause hardware throttling, instability, or permanent damage to the appliance.
Affected Platforms
This alarm is supported on the following Versa and OEM appliance models:
| Platform Family | Models |
|---|---|
| Versa CSG (Intel) | CSG350, CSG355, CSG365, CSG730, CSG750, CSG770, CSG780, CSG1300, CSG1500, CSG2500, CSG3300, CSG3500 |
| Versa CSG (AMD) | CSG5000, CSG5200, CSG5250 |
| Versa CSX | CSX4100, CSX4300, CSX4500, CSX8300, CSX8500 |
| Dell VEP | VEP1425 (V210), VEP1445 (V220), VEP1485 (V240), VEP4600 (V910/V930) |
| Dell XR/R-Series | XR5610 (V920/V950), R7515 (V2800), R7615 (V2900) |
| Advantech FWA | FWA-1010, FWA-1012 (2C/4C/8C), FWA-1320, FWA-2320, FWA-AAL1010, FWA-5020, FWA-5070, FWA-L5070 |
| Lanner | NCA-1515 |
| Caswell | CAN0261, CAF0262, CAD0263 |
| ADI | 80500-0163, 80500-0214, 90500-0151 |
| Nexcom | DTA1152BC4 |
| Datacom | DM8630 |
| Riverbed | CX-580, CX-780, CX-3080, EX-6080 |
Note: This alarm is supported only on the platforms listed above. It is not available on other platforms such as virtual/cloud appliances or the CSG100/CSG200 series. On unsupported platforms, VOS logs CPU Temp Alarm Not Supported at startup and the alarm never fires.
Temperature Thresholds
VOS automatically selects the threshold at startup based on the detected chassis type:
| CPU Architecture | Threshold | Platforms |
|---|---|---|
| Intel (default) | 70 °C | All platforms not listed below |
| AMD | 80 °C | CSG5000/5200/5250, CSG1300/1500/2500, Dell R7515, R7615, XR5610 |
The alarm is raised as Critical when any single core exceeds the threshold. It clears automatically once all cores return below the threshold (with a 10-second soak time). VOS checks temperatures every 60 seconds.
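The raise and clear rule described above can be sketched in a few lines of Python. This is an illustrative model only, not VOS source code: the names `AlarmState`, `hot_cores`, and `evaluate` are hypothetical, each call stands in for one 60-second polling cycle, and the 10-second soak time is deliberately not modeled.

```python
from dataclasses import dataclass

# Thresholds taken from the table above (degrees Celsius).
INTEL_THRESHOLD_C = 70.0
AMD_THRESHOLD_C = 80.0


@dataclass
class AlarmState:
    """Hypothetical holder for whether the alarm is currently raised."""
    active: bool = False


def hot_cores(core_temps, threshold_c):
    """Return the sorted IDs of cores whose temperature exceeds the threshold."""
    return sorted(core for core, temp in core_temps.items() if temp > threshold_c)


def evaluate(state, core_temps, threshold_c):
    """Run one polling cycle; return an event string, or None if nothing changed.

    The soak time is not modeled: the state flips as soon as the condition changes.
    """
    hot = hot_cores(core_temps, threshold_c)
    if hot and not state.active:
        state.active = True
        return "raise on core(s) " + " ".join(str(core) for core in hot)
    if not hot and state.active:
        state.active = False
        return "clear"
    return None
```

A single core above the threshold is enough to raise the alarm, while the clear event needs every core back below it, which matches the asymmetry in the text.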
Symptoms
- Alarm appears in Versa Director as deviceCpuTempHigh with Critical severity
- Alarm message format:
Device CPU Temperature crossed <threshold> on core(s) <core-ids>
- Clear message:
Device CPU Temperature normal
- Director health dashboard shows a hardware/equipment fault
- Possible secondary symptoms: application latency, packet drops, VOS process restarts due to CPU throttling
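When alarm text is collected from Syslog or an export, the two message formats above can be picked apart with a small parser. The following is a minimal sketch that assumes the core IDs are space-separated integers, as in the example message `crossed 70 on core(s) 0 2`; the function name `parse_alarm_text` is hypothetical.

```python
import re

# Pattern for the raise message; assumes space-separated integer core IDs.
RAISE_RE = re.compile(r"Device CPU Temperature crossed (\d+) on core\(s\) ([\d ]+)")
CLEAR_TEXT = "Device CPU Temperature normal"


def parse_alarm_text(text):
    """Return a dict describing the alarm text, or None if it is unrelated."""
    match = RAISE_RE.search(text)
    if match:
        return {
            "state": "raised",
            "threshold_c": int(match.group(1)),
            "cores": [int(core) for core in match.group(2).split()],
        }
    if CLEAR_TEXT in text:
        return {"state": "cleared"}
    return None
```

The parser searches within the line rather than matching the whole line, so any timestamp or host prefix added by the log transport is ignored.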
Diagnostic Commands
1. VOS CLI — Check Sensor Data
admin@device> show device sensors
This is the primary command. It runs /usr/bin/sensors internally and displays all thermal sensor readings through the VOS layer.
For temperature-only output:
admin@device> show device sensors temperature
Expected output — Intel platforms (Core temp format):
coretemp-isa-0000
Adapter: ISA adapter
Core 0: +52.0°C (high = +80.0°C, crit = +100.0°C)
Core 1: +49.0°C (high = +80.0°C, crit = +100.0°C)
Core 2: +51.0°C (high = +80.0°C, crit = +100.0°C)
Core 3: +50.0°C (high = +80.0°C, crit = +100.0°C)
Expected output — AMD platforms (Tccd format, e.g. CSG5000/5200/5250):
k10temp-pci-00c3
Adapter: PCI adapter
Tctl: +63.4°C
Tccd1: +58.8°C
Tccd2: +57.5°C
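If any core temperature is above the Versa threshold (70 °C for Intel / 80 °C for AMD), the alarm is expected. Note that the high and crit values printed by the sensor driver are not the Versa alarm threshold. Temperatures significantly above the Versa threshold indicate an active cooling problem.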
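Saved sensor output can be compared against the Versa threshold with a short script. This is a minimal sketch, assuming the output follows the `Core N:` / `Tctl:` / `Tccd N:` layouts shown above; the helper names are hypothetical, and the source does not state which AMD readings VOS itself evaluates, so the script simply reports every reading it recognizes.

```python
import re

# Matches reading lines such as "Core 0: +52.0°C (...)" or "Tccd1: +58.8°C".
READING_RE = re.compile(
    r"^\s*(Core \d+|Tctl|Tccd\d+):\s*\+?(-?\d+(?:\.\d+)?)\s*°C",
    re.MULTILINE,
)

# Illustrative sample in the Intel layout; the values are made up.
SAMPLE = """coretemp-isa-0000
Adapter: ISA adapter
Core 0:        +52.0°C  (high = +80.0°C, crit = +100.0°C)
Core 1:        +73.5°C  (high = +80.0°C, crit = +100.0°C)
"""


def parse_readings(sensors_text):
    """Map each recognized sensor label to its temperature in degrees Celsius."""
    return {label: float(value) for label, value in READING_RE.findall(sensors_text)}


def over_threshold(readings, threshold_c):
    """Return only the readings that exceed the given threshold."""
    return {label: temp for label, temp in readings.items() if temp > threshold_c}
```

For example, `over_threshold(parse_readings(SAMPLE), 70.0)` flags only `Core 1`. The `high` and `crit` values in parentheses are ignored because the pattern is anchored to the start of each line.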
2. VOS CLI — Check Active Alarms
admin@device> show alarms
Verify the device-cpu-temp-high alarm is present and note the affected core IDs from the alarm message.
3. IPMI Sensor Check (for IPMI-enabled platforms)
From the appliance shell:
sudo ipmitool sdr | grep -i temp
Or for a broader view:
sudo ipmitool sensor | grep -i temp
This provides raw BMC readings including inlet air temperature, CPU package temperature, and other thermal zones.
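The `ipmitool sdr` listing is pipe-delimited (name, reading, status), which makes it straightforward to post-process. The sketch below assumes that three-column layout and readings of the form `45 degrees C`. The sensor names in the sample are illustrative only, because BMC sensor naming varies by platform, and `parse_sdr_temps` is a hypothetical helper name.

```python
import re

# Illustrative sample; real sensor names differ between platforms.
SAMPLE_SDR = """CPU Temp         | 45 degrees C      | ok
Inlet Temp       | 24 degrees C      | ok
FAN1             | 5400 RPM          | ok
Aux Temp         | no reading        | ns
"""

TEMP_RE = re.compile(r"(-?\d+(?:\.\d+)?) degrees C$")


def parse_sdr_temps(sdr_text):
    """Return (name, degrees_c, status) for every row holding a temperature reading."""
    readings = []
    for line in sdr_text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        name, reading, status = fields
        match = TEMP_RE.match(reading)
        if match:
            readings.append((name, float(match.group(1)), status))
    return readings
```

Rows without a numeric temperature, such as fan speeds or sensors with no reading, are skipped rather than reported as zero.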
Triage Steps
- Confirm the affected cores — Note the core IDs from the alarm message, e.g.
Device CPU Temperature crossed 70 on core(s) 0 2
- Check current sensor readings — Run show device sensors and record current temperatures. Determine whether the alarm is still active or intermittent.
- Check CPU utilization — High CPU load under sustained traffic or a software fault can drive temperatures up:
admin@device> show system resources
- Inspect physical environment — Verify ambient room temperature is within spec (typical operating range: 0–40 °C). Ensure all fans are operational and airflow is not obstructed; check for dust buildup on inlet/exhaust vents.
- Check fan alarms — A deviceCpuTempHigh alarm frequently accompanies a fan failure. Run:
sudo ipmitool sdr | grep -i fan
A status of ns (not present) or cr (critical) on any fan should be resolved first.
- Review system event log — Check the BMC event log for prior thermal events:
sudo ipmitool sel list | grep -i temp
- Verify VOS startup log — At boot, VOS logs the detected chassis and threshold:
CHASSIS:[<id>] <name> CPU HighTemp Alarm Thresh <N>C
Confirm the correct threshold is applied for the platform.
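The startup-log check in the last step can be automated with a regular expression built from the documented line format. This is a sketch under assumptions: the chassis ID and name in the usage example are invented for illustration, and `check_startup_line` is a hypothetical helper, not a VOS tool.

```python
import re

# Built from the documented format:
#   CHASSIS:[<id>] <name> CPU HighTemp Alarm Thresh <N>C
STARTUP_RE = re.compile(
    r"CHASSIS:\[(?P<id>[^\]]+)\]\s+(?P<name>.+?)\s+"
    r"CPU HighTemp Alarm Thresh (?P<thresh>\d+)C"
)


def check_startup_line(line, expected_c):
    """Parse the startup line and compare its threshold with the expected value.

    Returns None when the line does not match the documented format.
    """
    match = STARTUP_RE.search(line)
    if match is None:
        return None
    threshold_c = int(match.group("thresh"))
    return {
        "chassis_id": match.group("id"),
        "name": match.group("name"),
        "threshold_c": threshold_c,
        "matches_expected": threshold_c == expected_c,
    }
```

Calling `check_startup_line("CHASSIS:[12] CSG5000 CPU HighTemp Alarm Thresh 80C", 80)` on that made-up line returns a dict with `matches_expected` set to `True`. A `False` result points toward the incorrect chassis detection case.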
Root Cause & Resolution
| Cause | Resolution |
|---|---|
| Fan failure / fan not detected | Resolve fan alarm first — see KB: Troubleshooting Fan Not Detected Alarm on Versa Appliances. RMA fan module if needed. |
| Blocked airflow / dust accumulation | Clean inlet and exhaust vents; ensure rack spacing meets deployment guidelines |
| High ambient temperature | Improve room cooling; verify datacenter/equipment room is within operating spec |
| Sustained high CPU utilization | Investigate traffic load, offload policies, or runaway processes driving CPU utilization above normal |
| Thermal paste degradation (older appliances) | Re-apply CPU thermal compound; escalate to TAC for field service guidance |
| Incorrect chassis detection at boot | Check VOS startup log for chassis module ID; open TAC case if chassis is misidentified |
Alarm Auto-Clear Behavior
The alarm clears automatically once all monitored CPU cores drop back below the threshold. The OAM module checks temperatures every 60 seconds. After conditions normalize, allow up to 2 minutes for the alarm to self-clear in Director.
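The two-minute allowance comfortably covers one 60-second polling cycle plus the 10-second soak time. A script that waits for the clear could poll with that budget, as in the sketch below. The `alarm_is_active` callback is a placeholder for whatever check is available in the deployment (the source does not describe one), and the injectable `sleep` and `clock` parameters exist only so the loop can be exercised without real delays.

```python
import time


def wait_for_clear(alarm_is_active, timeout_s=120.0, interval_s=10.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll until the alarm clears; return True if it cleared, False on timeout.

    alarm_is_active is a caller-supplied function returning True while the
    alarm is still raised (hypothetical; not a VOS or Director API).
    """
    deadline = clock() + timeout_s
    while True:
        if not alarm_is_active():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A `False` result after the full timeout suggests the condition has not actually normalized, or that the alarm is stale.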
To manually clear a stale alarm in Director: Director → Monitor → Alarms → Select the alarm → Clear Alarm.
SNMP / Syslog Reference
- SNMP trap name: deviceCpuTempHigh
- Syslog tag: device-cpu-temp-high
- Event type: equipmentAlarm
- Alarm type value: 99 (VOS YANG enum)
- Destinations: Analytics, Syslog, SNMP (all enabled by default)
- Soak time: 10 seconds
Escalation
If temperatures remain elevated after addressing airflow, fan status, and CPU load, contact Versa Technical Assistance Center (TAC) with:
- Output of show device sensors
- Output of show alarms
- Output of sudo ipmitool sdr (if shell access is available)
- Output of sudo ipmitool sel list
- Appliance model, serial number, and VOS software version
- Description of ambient environment and any recent changes to the deployment
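Where shell access is available, the two `ipmitool` outputs in the list above can be gathered into a single text bundle for the TAC case. This is a minimal sketch: it covers only the shell commands, since the VOS CLI outputs must be captured from the CLI session, and `build_bundle` and `run_command` are hypothetical helper names.

```python
import subprocess

# Shell commands from the escalation checklist; VOS CLI commands are not included.
SHELL_COMMANDS = [
    ["sudo", "ipmitool", "sdr"],
    ["sudo", "ipmitool", "sel", "list"],
]


def run_command(argv):
    """Run one command and return its combined stdout and stderr as text."""
    result = subprocess.run(argv, capture_output=True, text=True, check=False)
    return result.stdout + result.stderr


def build_bundle(commands=SHELL_COMMANDS, runner=run_command):
    """Return one text blob with each command line followed by its output."""
    sections = []
    for argv in commands:
        sections.append("$ " + " ".join(argv))
        sections.append(runner(argv).rstrip())
        sections.append("")  # blank line between sections
    return "\n".join(sections)
```

The `runner` parameter lets the bundle format be checked on a workstation that has no BMC. On the appliance, the default runner executes the real commands.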