Overview

The deviceCpuTempHigh alarm (YANG identity: device-cpu-temp-high) is raised by VOS when one or more CPU cores exceed the platform's high-temperature threshold. The alarm is Critical severity and is sent to Syslog, SNMP, and Analytics. Left unresolved, sustained high CPU temperatures can cause hardware throttling, instability, or permanent damage to the appliance.


Affected Platforms

This alarm is supported on the following Versa and OEM appliance models:

Platform Family | Models
Versa CSG (Intel) | CSG350, CSG355, CSG365, CSG730, CSG750, CSG770, CSG780, CSG1300, CSG1500, CSG2500, CSG3300, CSG3500
Versa CSG (AMD) | CSG5000, CSG5200, CSG5250
Versa CSX | CSX4100, CSX4300, CSX4500, CSX8300, CSX8500
Dell VEP | VEP1425 (V210), VEP1445 (V220), VEP1485 (V240), VEP4600 (V910/V930)
Dell XR/R-Series | XR5610 (V920/V950), R7515 (V2800), R7615 (V2900)
Advantech FWA | FWA-1010, FWA-1012 (2C/4C/8C), FWA-1320, FWA-2320, FWA-AAL1010, FWA-5020, FWA-5070, FWA-L5070
Lanner | NCA-1515
Caswell | CAN0261, CAF0262, CAD0263
ADI | 80500-0163, 80500-0214, 90500-0151
Nexcom | DTA1152BC4
Datacom | DM8630
Riverbed | CX-580, CX-780, CX-3080, EX-6080

Note: Platforms not listed above (for example, virtual/cloud appliances and the CSG100/CSG200 series) do not support this alarm. On those platforms, VOS logs CPU Temp Alarm Not Supported at startup and never raises the alarm.


Temperature Thresholds

VOS automatically selects the threshold at startup based on the detected chassis type:

CPU Architecture | Threshold | Platforms
Intel (default) | 70 °C | All platforms not listed below
AMD | 80 °C | CSG5000/5200/5250, CSG1300/1500/2500, Dell R7515, R7615, XR5610

The alarm is raised as Critical when any single core exceeds the threshold. It clears automatically once all cores return below the threshold (with a 10-second soak time). VOS checks temperatures every 60 seconds.
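
The threshold selection and raise/clear rule described above can be sketched in shell. This is only an illustration of the documented behavior, not VOS code: the function names cpu_temp_threshold and cpu_temp_state are hypothetical, the model strings are taken from the table above, and treating a reading exactly equal to the threshold as normal is an assumption.

```shell
# Hypothetical helper (not a VOS command): map a model name to the CPU
# high-temperature threshold in degrees C, following the table above.
cpu_temp_threshold() {
    case "$1" in
        CSG5000|CSG5200|CSG5250|CSG1300|CSG1500|CSG2500|R7515|R7615|XR5610)
            echo 80 ;;
        *)
            echo 70 ;;   # default for every other supported platform
    esac
}

# Print "raise" if any reading is strictly above the threshold, else "normal".
# Usage: cpu_temp_state <threshold> <temp> [<temp> ...]
cpu_temp_state() {
    threshold=$1
    shift
    for t in "$@"; do
        # awk does the floating-point comparison; exit status 0 means "above".
        if awk -v t="$t" -v th="$threshold" 'BEGIN { exit !(t + 0 > th + 0) }'; then
            echo raise
            return 0
        fi
    done
    echo normal
}
```

For example, cpu_temp_state "$(cpu_temp_threshold CSG770)" 52.0 71.5 50.0 reports raise because one core is above 70 °C, while the same readings on a CSG5200 (80 °C threshold) report normal.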


Symptoms

  • Alarm appears in Versa Director as deviceCpuTempHigh with Critical severity
  • Alarm message format: Device CPU Temperature crossed <threshold> on core(s) <core-ids>
  • Clear message: Device CPU Temperature normal
  • Director health dashboard shows a hardware/equipment fault
  • Possible secondary symptoms: application latency, packet drops, VOS process restarts due to CPU throttling
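
When alarm text is collected from Syslog or exported from Director, the threshold and core IDs can be pulled out of the raise message with a small filter. The sketch below assumes only the message format shown above; the function name parse_cpu_temp_alarm is hypothetical, and any prefix that a given log source adds before the message is simply ignored.

```shell
# Read log lines on stdin and, for each raise message of the form
#   Device CPU Temperature crossed <threshold> on core(s) <core-ids>
# print "<threshold>|<core-ids>". Other lines produce no output.
parse_cpu_temp_alarm() {
    sed -n 's/.*Device CPU Temperature crossed \([0-9][0-9]*\) on core(s) \(.*\)$/\1|\2/p'
}
```

Feeding it the line Device CPU Temperature crossed 70 on core(s) 0 2 yields 70|0 2, which gives the core IDs to compare against the sensor output.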

Diagnostic Commands

1. VOS CLI — Check Sensor Data

admin@device> show device sensors

This is the primary command. It runs /usr/bin/sensors internally and displays all thermal sensor readings through the VOS layer.

For temperature-only output:

admin@device> show device sensors temperature

Expected output — Intel platforms (Core temp format):

coretemp-isa-0000
Adapter: ISA adapter
Core 0:       +52.0°C  (high = +80.0°C, crit = +100.0°C)
Core 1:       +49.0°C  (high = +80.0°C, crit = +100.0°C)
Core 2:       +51.0°C  (high = +80.0°C, crit = +100.0°C)
Core 3:       +50.0°C  (high = +80.0°C, crit = +100.0°C)

Expected output — AMD platforms (Tccd format, e.g. CSG5000/5200/5250):

k10temp-pci-00c3
Adapter: PCI adapter
Tctl:         +63.4°C
Tccd1:        +58.8°C
Tccd2:        +57.5°C

If any core temperature exceeds the Versa threshold (70 °C for Intel / 80 °C for AMD), the alarm is expected. Temperatures significantly above these values indicate an active cooling problem.
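
To compare readings against the Versa threshold without scanning the output by eye, the sensor text can be filtered from the appliance shell. This is a minimal sketch, assuming the output looks like the two samples above: the function name hot_cores is hypothetical, only Core N and TccdN lines are treated as per-core readings (Tctl is skipped, which is an assumption), and the threshold must be supplied by the caller.

```shell
# Read sensors-style text on stdin and print "<label> <temp>" for every
# Core N / TccdN reading strictly above the threshold passed as $1.
hot_cores() {
    awk -F: -v th="$1" '
        /^(Core [0-9]+|Tccd[0-9]+):/ {
            reading = $2
            sub(/^[ \t]*\+?/, "", reading)   # drop padding and the leading "+"
            temp = reading + 0               # keep the numeric prefix, e.g. 52.0
            if (temp > th + 0) print $1, temp
        }'
}
```

A typical invocation would be /usr/bin/sensors | hot_cores 70 on an Intel platform; no output means no core is currently above the threshold.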

2. VOS CLI — Check Active Alarms

admin@device> show alarms

Verify the device-cpu-temp-high alarm is present and note the affected core IDs from the alarm message.

3. IPMI Sensor Check (for IPMI-enabled platforms)

From the appliance shell:

sudo ipmitool sdr | grep -i temp

Or for a broader view:

sudo ipmitool sensor | grep -i temp

This provides raw BMC readings including inlet air temperature, CPU package temperature, and other thermal zones.
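
The BMC readings can be reduced to a name and a number for logging or comparison. The sketch below assumes the usual three-column ipmitool sdr layout (sensor name | reading | status); sensor names differ between platforms, and the function name sdr_temps is hypothetical.

```shell
# Read `ipmitool sdr` text on stdin and print "<sensor name>|<degrees C>"
# for each row whose reading column is a temperature.
sdr_temps() {
    awk -F'|' '
        $2 ~ /degrees C/ {
            name = $1
            gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim column padding
            print name "|" ($2 + 0)             # numeric prefix of the reading
        }'
}
```

It would be used as sudo ipmitool sdr | sdr_temps. Rows without a numeric temperature, such as sensors reporting no reading, are skipped.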


Triage Steps

  1. Confirm the affected cores: Note the core IDs from the alarm message, e.g. Device CPU Temperature crossed 70 on core(s) 0 2.
  2. Check current sensor readings: Run show device sensors and record the current temperatures. Determine whether the alarm is still active or intermittent.
  3. Check CPU utilization: High CPU load from sustained traffic or a software fault can drive temperatures up. Run:
    admin@device> show system resources
  4. Inspect the physical environment: Verify that the ambient room temperature is within spec (typical operating range: 0–40 °C). Ensure all fans are operational and airflow is not obstructed; check for dust buildup on inlet/exhaust vents.
  5. Check fan alarms: A deviceCpuTempHigh alarm frequently accompanies a fan failure. Run:
    sudo ipmitool sdr | grep -i fan
    A status of ns (no reading) or cr (critical) on any fan should be resolved first.
  6. Review the system event log: Check the BMC event log for prior thermal events:
    sudo ipmitool sel list | grep -i temp
  7. Verify the VOS startup log: At boot, VOS logs the detected chassis and threshold:
    CHASSIS:[<id>] <name> CPU HighTemp Alarm Thresh <N>C
    Confirm the correct threshold is applied for the platform.
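
The fan check in the triage steps can be narrowed to only the fans that need attention. This is a sketch under the assumption that the BMC names its fan sensors with the word "fan" and uses the standard three-column ipmitool sdr layout; the function name sdr_fan_faults is hypothetical.

```shell
# Read `ipmitool sdr` text on stdin and print "<fan name> <status>" for
# each fan row whose status column is ns or cr.
sdr_fan_faults() {
    awk -F'|' '
        tolower($1) ~ /fan/ {
            name = $1
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            status = $3
            gsub(/^[ \t]+|[ \t]+$/, "", status)
            if (status == "ns" || status == "cr") print name, status
        }'
}
```

Run as sudo ipmitool sdr | sdr_fan_faults. Empty output means no fan is in the ns or cr state, so the investigation can move on to airflow, ambient temperature, and CPU load.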

Root Cause & Resolution

Cause | Resolution
Fan failure / fan not detected | Resolve fan alarm first; see KB: Troubleshooting Fan Not Detected Alarm on Versa Appliances. RMA fan module if needed.
Blocked airflow / dust accumulation | Clean inlet and exhaust vents; ensure rack spacing meets deployment guidelines
High ambient temperature | Improve room cooling; verify datacenter/equipment room is within operating spec
Sustained high CPU utilization | Investigate traffic load, offload policies, or runaway processes driving CPU above normal
Thermal paste degradation (older appliances) | Re-apply CPU thermal compound; escalate to TAC for field service guidance
Incorrect chassis detection at boot | Check VOS startup log for chassis module ID; open TAC case if chassis is misidentified
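
For the chassis-detection case, the threshold that VOS applied can be read back from the startup log line and compared with the expected value for the model. The sketch below assumes only the line template CHASSIS:[<id>] <name> CPU HighTemp Alarm Thresh <N>C given in the triage steps; the location of the startup log is not specified here, and the function name chassis_threshold is hypothetical.

```shell
# Read log lines on stdin and print "<id>|<name>|<threshold>" for each line
# matching: CHASSIS:[<id>] <name> CPU HighTemp Alarm Thresh <N>C
chassis_threshold() {
    sed -n 's/.*CHASSIS:\[\([^]]*\)] \(.*\) CPU HighTemp Alarm Thresh \([0-9][0-9]*\)C.*/\1|\2|\3/p'
}
```

If the reported threshold does not match the value expected for the appliance model (70 or 80), that supports opening a TAC case for chassis misidentification.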

Alarm Auto-Clear Behavior

The alarm clears automatically once all monitored CPU cores drop back below the threshold. The OAM module checks temperatures every 60 seconds. After conditions normalize, allow up to 2 minutes for the alarm to self-clear in Director.
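
The wait-and-recheck guidance above can be scripted as a small polling loop. This is a sketch only: wait_until_normal is a hypothetical helper, the check command is supplied by the caller, and the defaults (three checks, 60 seconds apart, about 2 minutes in total) simply mirror the timing stated above.

```shell
# Run a check command repeatedly until it succeeds or the attempts run out.
# Usage: wait_until_normal <check-command> [interval-seconds] [max-checks]
# Returns 0 as soon as the check succeeds, 1 if it never does.
wait_until_normal() {
    check=$1
    interval=${2:-60}
    max_checks=${3:-3}
    i=0
    while [ "$i" -lt "$max_checks" ]; do
        if "$check"; then
            return 0
        fi
        i=$((i + 1))
        # Sleep only when another attempt is still to come.
        if [ "$i" -lt "$max_checks" ]; then
            sleep "$interval"
        fi
    done
    return 1
}
```

The check command would be any script or function that exits 0 when temperatures are back under the threshold, for example a wrapper around the sensor output. If the loop returns 1, the condition has persisted past the expected clear window and needs further triage.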

To manually clear a stale alarm in Director: Director → Monitor → Alarms → Select the alarm → Clear Alarm.


SNMP / Syslog Reference

  • SNMP trap name: deviceCpuTempHigh
  • Syslog tag: device-cpu-temp-high
  • Event type: equipmentAlarm
  • Alarm type value: 99 (VOS YANG enum)
  • Destinations: Analytics, Syslog, SNMP (all enabled by default)
  • Soak time: 10 seconds
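
When the Syslog destination is collected on a log server, counting raise and clear messages shows whether the alarm is steady or flapping. The sketch below matches only the two message texts documented in Symptoms; the rest of the syslog line format and the log file location are not assumed, and the function name count_cpu_temp_events is hypothetical.

```shell
# Read syslog lines on stdin and print how many raise and clear messages
# for the CPU temperature alarm were seen.
count_cpu_temp_events() {
    awk '
        /Device CPU Temperature crossed/ { raised++ }
        /Device CPU Temperature normal/  { cleared++ }
        END { printf "raised=%d cleared=%d\n", raised, cleared }'
}
```

Several raise/clear pairs in a short period suggest an intermittent condition such as load spikes or a marginal fan, whereas a single raise with no clear points to a sustained cooling problem.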

Escalation

If temperatures remain elevated after addressing airflow, fan status, and CPU load, contact Versa Technical Assistance Center (TAC) with:

  1. Output of show device sensors
  2. Output of show alarms
  3. Output of sudo ipmitool sdr (if shell access is available)
  4. Output of sudo ipmitool sel list
  5. Appliance model, serial number, and VOS software version
  6. Description of ambient environment and any recent changes to the deployment
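
The shell-side outputs in the list above can be gathered into one directory before opening the case. This is a sketch, assuming shell access with sufficient privileges to run ipmitool (for example, as root or via sudo); capture and collect_thermal_bundle are hypothetical helper names, and the output file names are arbitrary. The VOS CLI outputs (show device sensors, show alarms) still need to be captured from the CLI session.

```shell
# Run a command and save stdout and stderr to a file. If the command is not
# installed, record that fact in the file instead.
# Usage: capture <output-file> <command> [args ...]
capture() {
    outfile=$1
    shift
    if command -v "$1" >/dev/null 2>&1; then
        "$@" >"$outfile" 2>&1 || echo "command exited non-zero: $*" >>"$outfile"
    else
        echo "command not found: $1" >"$outfile"
    fi
}

# Collect thermal diagnostics into a directory (default: ./thermal-bundle).
collect_thermal_bundle() {
    outdir=${1:-thermal-bundle}
    mkdir -p "$outdir" || return 1
    capture "$outdir/sensors.txt"      sensors
    capture "$outdir/ipmitool-sdr.txt" ipmitool sdr
    capture "$outdir/ipmitool-sel.txt" ipmitool sel list
}
```

After running collect_thermal_bundle /tmp/thermal-bundle, the resulting directory can be archived and attached to the TAC case together with the model, serial number, and VOS version.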