https://www.ibm.com/developerworks/linux/library/l-openpower-firmware-ipmi/index.html




This article describes the capabilities and features of OpenPOWER hardware and its reliability, availability, and serviceability (RAS) components. It also describes the OpenPOWER firmware features that use the Intelligent Platform Management Interface (IPMI).

Firmware plays a critical role in the initialization and booting processes in servers. IBM Power Systems servers are developed based on the IBM® POWER® processor architecture and can be categorized into the following types:

  • IBM PowerVM® based systems, where the IBM POWER Hypervisor™ firmware is the hypervisor
  • IBM Power® non-virtualization based systems that are running under Open Power Abstraction Layer (OPAL) firmware

Based on the management controller, there are two types of Power non-virtualization systems:

  • Flexible service processor (FSP) based systems, where FSP is an IBM PowerPC® fourth-generation processor that manages the system.
  • Baseboard management controller (BMC) based systems, also called OpenPOWER systems, where the BMC is an ASPEED Technology (AST) 2400/2500 system on chip (SoC) that manages the system.

This article focuses on OpenPOWER BMC-based Power non-virtualization systems that are running in the OPAL mode. The firmware for the OpenPOWER systems is a combination of the BMC firmware component and the host firmware component.

  • The BMC firmware stack manages the server, mostly through out-of-band or remote system operations such as power control, firmware flashing, field-replaceable unit (FRU) inventory information, local area network (LAN) configuration, and sensor readings.
  • OpenPOWER host firmware is a system initialization and boot time firmware for Power Systems servers. It contains the following subcomponents:
    • Self-boot engine
    • Hostboot (system initialization firmware for POWER)
    • Skiboot (OPAL boot and runtime firmware for POWER)
    • Skiroot kernel
    • Petitboot (kexec based bootloader)
    • On Chip Controller (OCC) firmware
    • Op-build (buildroot overlay for OpenPOWER)

The host firmware is the main boot firmware. It is stored in the processor NOR (PNOR) flash chip, initializes the host processor, and boots the system to the runtime state, after which the Linux® operating system takes over the boot process. After you power on the system, the OpenPOWER firmware follows this boot sequence:

  1. The hostboot firmware initializes the processor and memory subsystem.
  2. The Skiboot firmware (OPAL) initializes Peripheral Component Interconnect (PCI) and the devices that are outside the chip subsystem.
  3. The hostboot and Skiboot firmware generate a flattened device tree.
    A device tree is a tree data structure in which the devices are nodes and the features of each device are property-value pairs. It is exposed to the Linux kernel, which uses it to configure device drivers.
  4. After the host firmware (OPAL) completes booting and generates the device tree, Skiroot is loaded. Skiroot is a combination of the Linux kernel and the boot loader (petitboot). Petitboot is a kexec-based boot loader that can detect boot devices and load an operating system from them. The Skiboot (OPAL) firmware also provides the OPAL application programming interface (API), a set of runtime hardware abstraction calls that the Linux operating system uses to interact with low-level hardware.

This article describes the following OpenPOWER firmware IPMI features, which are used to monitor, control, and debug an OpenPOWER system:

  • System power control
  • Sensors
  • Thermal and Power management
  • System event log (SEL)
  • Field-replaceable unit (FRU) data
  • Boot progress codes
  • Serial-over LAN (SOL) console
  • System time and boot management
  • Firmware update procedure
  • Virtual Universal Asynchronous Receiver/Transmitter (VUART) to video graphics array (VGA) rendering
  • IPMI lock and unlock feature
  • IPMI PNOR reprovision feature

Every IPMI operation described in this article is run out-of-band with ipmitool. The following base string is the common prefix of each command:

$IPMI_CMD = ipmitool -I lanplus -H $BMCIP -U $USER -P $PASSWORD
$BMCIP     BMC IP address
$USER      BMC IPMI Username
$PASSWORD  BMC IPMI Password
$IPMI_CMD  IPMI tool command invocation
$CMD = $IPMI_CMD $Operation
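
In a POSIX shell, the base string above could be captured as a small wrapper function. This is a minimal sketch, not part of the original article: the names `ipmi`, `BMC_USER`, `BMC_PASSWORD`, and `IPMITOOL` are assumptions. The article's `$USER` is renamed here because login shells normally set `USER` to the local account name.

```shell
# Hypothetical connection settings; set real values before use.
#   BMCIP         BMC IP address or host name
#   BMC_USER      BMC IPMI user name
#   BMC_PASSWORD  BMC IPMI password
#   IPMITOOL      ipmitool binary to run (defaults to "ipmitool")

ipmi() {
    # Forward every argument to ipmitool over the lanplus (IPMI v2.0) interface.
    "${IPMITOOL:-ipmitool}" -I lanplus -H "$BMCIP" -U "$BMC_USER" -P "$BMC_PASSWORD" "$@"
}
```

With this wrapper, `ipmi chassis power status` plays the same role as `$IPMI_CMD chassis power status` in the rest of the article.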

System power control

All system power control operations are controlled by the BMC. The different power control operations are power status, power on, power off, power cycle, and power reset.

  • Power status: To check the power status of an OpenPOWER system, run the following command:
    $IPMI_CMD chassis power status
  • Power on: The system can be powered on by using one of the following methods:
    • Run the following command to power on the system.
      $IPMI_CMD chassis power on
    • Push the power on button.
    • Wake on local area network (LAN).

    To track the power status during the boot process of an OpenPOWER system, the host status sensor provides a system Advanced Configuration and Power Interface (ACPI) power state. This sensor is set to the following states during booting.

    • Soft off: Before power is on (that is, when the system is in off state)
    • Legacy on: When the hostboot firmware boots (that is, when hostboot firmware is able to communicate with the BMC)
    • Working state: When OPAL starts the Linux kernel.

    More details of sensors are provided in the Sensors section.

  • Power off: The system can be powered off by using one of the following methods:
    • Graceful shutdown: Push the power button for less than 3 seconds or run the IPMI power soft command. The BMC sends a System Management Software (SMS) attention (a graceful operating system shutdown request) to the host operating system, and then waits for the host status sensor to be set to the soft off state before it triggers the hardware power off.
      $IPMI_CMD chassis power soft

      Before BMC issues hardware power off, it must set the host status sensor to the soft off state.

    • Forced power off: Push the power button for more than 3 seconds or run the following IPMI power off command:
      $IPMI_CMD chassis power off

      On a forced power off, the system immediately goes to the standby state irrespective of the operating system status, and the BMC sets the host status sensor to the soft off state.

  • Power reset: This operation is a hardware power off followed by the IPMI power on sequence.
    $IPMI_CMD chassis power reset
  • Power cycle: When the system is in the working state, the power cycle operation reboots the system. It is a graceful operating system shutdown followed by an IPMI power reset.
    $IPMI_CMD chassis power cycle
  • Power policy: The system power policy determines the power state that the BMC restores after the power supply is lost and then returns. Depending on this setting, a server can therefore power on automatically when power is restored. To list the supported power policies, run the following command:
    $IPMI_CMD chassis policy list

    To set a particular power policy, run the following command:

    $IPMI_CMD chassis policy <always-on/always-off/previous>
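
The graceful and forced power-off paths described above can be combined on the operator side. The following is a minimal sketch under stated assumptions, not a procedure from the article: the function name `graceful_off` and its default timeout and polling interval are hypothetical, and it relies on ipmitool reporting `Chassis Power is off` from `chassis power status`.

```shell
# Assumes IPMI_CMD holds the base ipmitool invocation defined earlier.
# graceful_off [TIMEOUT_S] [INTERVAL_S]
# Requests a graceful shutdown, polls the chassis power status, and falls
# back to a forced power off if the host is still on after TIMEOUT_S seconds.
graceful_off() {
    timeout_s="${1:-120}"    # hypothetical default: wait up to 2 minutes
    interval_s="${2:-5}"     # hypothetical default: poll every 5 seconds
    waited=0

    $IPMI_CMD chassis power soft >/dev/null || return 1

    while [ "$waited" -lt "$timeout_s" ]; do
        # ipmitool prints "Chassis Power is off" once the host is down.
        if $IPMI_CMD chassis power status | grep -q 'is off'; then
            echo "graceful"
            return 0
        fi
        sleep "$interval_s"
        waited=$((waited + interval_s))
    done

    # The operating system did not shut down in time: force the power off.
    $IPMI_CMD chassis power off >/dev/null && echo "forced"
}
```

The function prints `graceful` or `forced` so that a calling script can record which path was taken.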

Sensors

Sensors are software or hardware entities that are used to monitor the health of system hardware devices and software components. There are two types of sensors.

  • A virtual sensor does not exist physically. Instead, it represents the status of a software or hardware operation, such as processor or dual inline memory module (DIMM) functioning or host status. To list all sensors, run one of the following commands:
    $IPMI_CMD sensor list

    or

    $IPMI_CMD sdr elist

    The following example shows the virtual sensor Host Status usage.

    During system runtime, the sensor status shows as S0/G0: WORKING. When the system is in the standby or powered off state, the sensor status shows as S5/G2: SOFT-OFF.

    HOST STATUS      | 04H | OK  | 35.0 | S0/G0: WORKING
    HOST STATUS      | 04H | OK  | 35.0 | S5/G2: SOFT-OFF
  • A physical sensor is located on the motherboard and reads the status of hardware devices, such as processor and DIMM temperatures and frequencies.

    CPU Temp         | 64h | ok  |  3.0 | 42 degrees C
    Membuf Temp 0    | 65h | ok  |  7.0 | 42 degrees C
    DIMM Temp 0      | 69h | ok  | 32.0 | 28 degrees C
    DIMM Temp 1      | 6Ah | ns  | 32.1 | No Reading
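
For scripting, the Host Status row shown above can be reduced to its state string. This is a small sketch that assumes the pipe-separated layout of the sample `sdr elist` output; the function name `host_state` is hypothetical.

```shell
# host_state: read sensor rows on stdin and print the last column of the
# Host Status row (for example "S0/G0: WORKING"), with surrounding blanks removed.
host_state() {
    awk -F'|' 'tolower($1) ~ /host status/ {
        s = $NF
        gsub(/^[ \t]+|[ \t]+$/, "", s)   # trim leading and trailing blanks
        print s
        exit
    }'
}
```

A typical call would be `$IPMI_CMD sdr elist | host_state`. The match on the sensor name is case-insensitive because the name may be capitalized differently on other firmware levels.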

System thermal or power management

This feature is mainly used to monitor and control system power and thermal levels. With the help of the OCC, the BMC continuously monitors certain temperatures (processor, DIMM, and so on) and sets the fan speed based on the current readings. You can use the power reading and capping functions to read and limit the power level of the system. Run the following commands to work with platform power limits:

$IPMI_CMD dcmi power
power <command>
reading       Get power related readings from the system
get_limit     Get the configured power limits
set_limit     Set a power limit option
activate      Activate the set power limit
deactivate    Deactivate the set power limit
  • Run the following command to find the platform power readings:

    $IPMI_CMD dcmi power reading
    Instantaneous power reading:                   237 Watts
    Minimum during sampling period:                233 Watts
    Maximum during sampling period:                240 Watts
    Average power reading over sample period:      236 Watts
    IPMI timestamp:                                Mon Nov  7 13:20:22 2016
    Sampling period:                               00000010 Seconds.
    Power reading state is:                        activated
  • Run the following command to find the active power limit. The following output shows that the current power limit is 1000 watts.

    $IPMI_CMD dcmi power get_limit
    Current Limit State: Power Limit Active
    Exception actions:   Log Event to SEL
    Power Limit:         1000 Watts
    Correction time:     1000 milliseconds
    Sampling period:     10 seconds
  • Run the following command to set the active power limit:

    $IPMI_CMD dcmi power set_limit limit 1050
    Current Limit State: Power Limit Active
    Exception actions:   Log Event to SEL
    Power Limit:         1050 Watts
    Correction time:     1000 milliseconds
    Sampling period:     10 seconds
  • Run the following command to activate the power limit:

    $IPMI_CMD dcmi power activate
    Power limit successfully activated
  • Run the following command to deactivate the power limit:

    $IPMI_CMD dcmi power deactivate
    Power limit successfully deactivated
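
The reading and limit outputs above can be combined to show how much headroom remains under the configured cap. The following sketch assumes the output layout shown in the samples; the function name `power_headroom` is hypothetical.

```shell
# Assumes IPMI_CMD holds the base ipmitool invocation defined earlier.
# power_headroom: print (configured power limit - instantaneous reading) in watts.
power_headroom() {
    # "Instantaneous power reading:   237 Watts" -> 237
    reading=$($IPMI_CMD dcmi power reading |
        awk -F: '/Instantaneous power reading/ { print $2 + 0; exit }')
    # "Power Limit:   1000 Watts" -> 1000 (anchored so that the
    # "Current Limit State: Power Limit Active" line does not match)
    limit=$($IPMI_CMD dcmi power get_limit |
        awk -F: '/^[ \t]*Power Limit:/ { print $2 + 0; exit }')

    [ -n "$reading" ] && [ -n "$limit" ] || return 1
    echo $((limit - reading))
}
```

With the sample outputs above (237 watts measured, 1000 watt limit), the function prints 763.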

System event log

The BMC owns the system event log (SEL) repository and maintains all firmware system event logs. Low-level events that occur in the system firmware during boot and runtime, as well as all critical hardware and firmware failures, are logged in the BMC SEL repository. The repository can store 64 KB of system event log and extended system event log data.

$IPMI_CMD sel list
1 | 11/07/2016 | 01:44:49 | TEMPERATURE #0X30 | UPPER CRITICAL GOING HIGH | ASSERTED
2 | 11/07/2016 | 01:45:08 | VOLTAGE #0X60 | LOWER CRITICAL GOING LOW  | ASSERTED
3 | 11/07/2016 | 01:45:25 | MEMORY #0X53 | CORRECTABLE ECC | ASSERTED
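
When the log grows, a per-sensor-type count gives a quicker overview than the raw list. This is a sketch that assumes the six-column, pipe-separated layout of the sample above; the function name `sel_summary` is hypothetical.

```shell
# sel_summary: read "sel list" rows on stdin and print "<count> <sensor type>"
# for each sensor type, sorted by type name.
sel_summary() {
    awk -F'|' 'NF >= 6 {
        split($4, part, "#")              # "TEMPERATURE #0X30" -> "TEMPERATURE "
        type = part[1]
        gsub(/^[ \t]+|[ \t]+$/, "", type) # trim surrounding blanks
        count[type]++
    }
    END { for (t in count) print count[t], t }' | sort -k2
}
```

A typical call would be `$IPMI_CMD sel list | sel_summary`.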

Boot progress

System boot progress can be tracked in two ways: by watching the SOL console, where hostboot and OPAL emit their progress codes, or by reading the system firmware progress and operating system boot sensors, which are updated to the respective boot states (such as processor initialization, motherboard initialization, and PCI initialization).

Hostboot progress codes:

3.42145|Ignoring boot flags, incorrect version 0x0
3.68858|ISTEP  6. 3
4.13466|ISTEP  6. 4
4.13535|ISTEP  6. 5
15.27375|HWAS|PRESENT> DIMM[03]=AAAAAAAAAAAAAAAA
15.27375|HWAS|PRESENT> Membuf[04]=CCCC000000000000
15.27376|HWAS|PRESENT> Proc[05]=C000000000000000
28.57643|ISTEP  6. 6
28.66284|ISTEP  6. 7
28.66337|ISTEP  6. 8
28.69192|ISTEP  6. 9
31.32017|ISTEP  6.10
31.36741|ISTEP  6.11
33.17372|ISTEP  6.12
33.17531|ISTEP  6.13
33.17579|ISTEP  7. 1
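
When a boot hangs, the last ISTEP that hostboot printed is usually the first thing to look at. The following sketch extracts it from a captured console log; it assumes the `timestamp|ISTEP major.minor` layout shown above, and the function name `last_istep` is hypothetical.

```shell
# last_istep: read hostboot console lines on stdin and print the final
# ISTEP as "major.minor" (for example "7.1"); prints nothing if none was seen.
last_istep() {
    awk -F'|' '
        $2 ~ /^ISTEP/ { step = $2 }        # remember the most recent ISTEP line
        END {
            sub(/^ISTEP[ \t]+/, "", step)  # drop the "ISTEP" label
            gsub(/[ \t]/, "", step)        # "7. 1" -> "7.1"
            if (step != "") print step
        }'
}
```

For the listing above, the function prints 7.1.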

OPAL or Skiboot progress codes:

[   44.095518490,5] OPAL skiboot-5.6.0-158-ga1e0a047b2a0 starting...
[   44.095526941,7] initial console log level: memory 7, driver 5
[   44.095530349,6] CPU: P8 generation processor (max 8 threads/core)
[   44.095533485,7] CPU: Boot CPU PIR is 0x0068 PVR is 0x004d0200
[   44.095536954,7] CPU: Initial max PIR set to 0x1fff
[   44.095993908,7] OPAL table: 0x300dc240 .. 0x300dc730, branch table: 0x30002000
[   44.095999625,7] Assigning physical memory map table for unused
[   44.096003854,7] FDT: Parsing fdt @0xff00000
[   44.099952132,6] CHIP: Initialised chip 0 from xscom@3fc0000000000
[   44.100047333,5] CHIP: Chip ID 0000 type: P8 DD2.0
[   44.100050694,7] XSCOM: Base address: 0x3fc0000000000
[   44.100059469,7] XSTOP: XSCOM addr = 0x2010c82, FIR bit = 31
[   44.100063231,6] MFSI 0:0: Initialized
[   44.100065694,6] MFSI 0:2: Initialized
[   44.100068188,6] MFSI 0:1: Initialized
[   44.100451213,5] LPC: LPC[000]: Initialized, access via XSCOM @0xb0020
[   44.100459497,7] LPC: Default bus on chip 0x0
[   44.100585382,6] MEM: parsing reserved memory from node /ibm,hostboot/reserved-memory
[   44.100602925,7] HOMER: Init chip 0

System inventory or vital product data (VPD)

System VPD is stored on the BMC in the IPMI FRU inventory. The BMC collects the FRU data of the hardware that is directly connected to it, for example the backplane, power supply, and voltage regulator module (VRM). Hostboot updates the processor, Centaur, and DIMM VPD. OPAL updates the Peripheral Component Interconnect Express (PCIe) VPD. By running the following command, you can collect the complete system VPD, including manufacturing information, product names, serial numbers, and part numbers.
$IPMI_CMD fru print

The example FRU description for the processor is as follows:

FRU Device Description : CPU (ID 1)
Board Mfg Date         : Mon Jan  1 05:30:00 1996
Board Mfg              : IBM
Board Product          : PROCESSOR MODULE
Board Serial           : YA1932735603
Board Part Number      : 00UM003
Board Extra            : ECID:019A007301180718050A000000C031C2
Board Extra            : EC:20

SOL console

SOL is a mechanism that redirects the input and output of a serial port on the remote system over an IP network. The BMC provides the IPMI SOL console by using the Ethernet and serial ports that are attached to it. You can see the boot progress and failure messages on the console screen. The SOL console is the main interface between the system and the user. To connect to the SOL console, run the following command:
$IPMI_CMD sol activate

Also, the BMC assigns a 32 KB circular log buffer to SOL console data, which you can read from the BMC BusyBox shell.

# ls -dp /extlog/sollog/*
/extlog/sollog/SOLHostCapture.log    /extlog/sollog/SOLHostCapture.log.1  /extlog/sollog/archive/
 
# cat /extlog/sollog/SOLHostCapture.log.1
 17.88935|ISTEP 11.12
 17.89009|ISTEP 11.13
 17.89083|ISTEP 12. 1
 17.98714|ISTEP 12. 2
 18.08120|ISTEP 12. 3
 18.11037|ISTEP 12. 4
 18.47909|ISTEP 12. 5
 18.48009|ISTEP 13. 1
 18.56525|ISTEP 13. 2
 18.64082|ISTEP 13. 3
 18.64194|ISTEP 13. 4
 18.67988|ISTEP 13. 5
 18.68156|ISTEP 13. 6
 19.57066|ISTEP 13. 7
 19.74316|ISTEP 13. 8
 19.89404|ISTEP 13. 9

When the system is at a checkstop (for example, when the processor is not able to complete any instructions for some time or it is in an impossible state) the corresponding console log data is archived for further debugging and analysis.

System time management

System time is maintained in the real-time clock (RTC), which is controlled by the BMC. When the BMC boots, it sets its own time by reading the current time from the Inter-Integrated Circuit (I2C) RTC. The host (hostboot or OPAL) or the user can read or write the RTC time by running the following IPMI commands, and the BMC then obtains or sets the current RTC time.

$IPMI_CMD sel time get
03/12/2017 09:32:12
$IPMI_CMD sel time set "03/12/2017 09:34:12"
03/12/2017 09:34:12

Boot failure management

OpenPOWER systems have two sides of the PNOR firmware: the primary side and the golden side. The BMC always boots the system from the primary side first. If the primary side boot fails twice because of a system checkstop, a watchdog condition (where the watchdog timer detects a malfunction of the operating system or server and recovers from it), or any other reason, the BMC boots the system from the golden side. For this process, the BMC uses a boot count sensor, which is initially set to two. At the start of every boot operation, the BMC decrements this sensor value by one. When the system reaches a user-accessible level (that is, petitboot or a booted Linux), OPAL resets the sensor value to two. When the boot count reaches zero because of boot failures, the BMC starts booting the system from the golden side of the PNOR firmware.
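
The counter logic in this paragraph can be traced with a small simulation. This is only one reading of the description, not BMC code: the function name `simulate_boots` is hypothetical, and the behavior after a golden side boot (the counter is reset on success and is not decremented further) is an assumption that the article does not spell out.

```shell
# simulate_boots OUTCOME...
# Each OUTCOME is "ok" or "fail". Prints the PNOR side used for each boot.
simulate_boots() {
    boot_count=2                            # initial boot count sensor value
    for outcome in "$@"; do
        if [ "$boot_count" -eq 0 ]; then
            echo "golden"                   # counter exhausted: use the golden side
        else
            echo "primary"
            boot_count=$((boot_count - 1))  # BMC decrements at the start of a boot
        fi
        if [ "$outcome" = "ok" ]; then
            boot_count=2                    # OPAL resets the counter at petitboot or Linux
        fi
    done
}
```

For example, `simulate_boots fail fail fail` prints primary, primary, golden: two failed primary boots exhaust the counter, so the third boot uses the golden side.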

After a boot failure, run the following command to check the value of the boot count:

$IPMI_CMD sensor list | grep -i "boot count"
Boot Count | 0x0 | discrete | 0x0280| na | na | na | na | na | na

The BIOS golden side sensor has two discrete values (0x0080, 0x0180). The value of this sensor determines the side of the PNOR firmware boot.

The following example shows that the value of this sensor is set to 0x0180, which means that the system boots from the golden side of the PNOR firmware:

$IPMI_CMD sensor list | grep -i golden
BIOS Golden Side | 0x0 | discrete | 0x0180| na | na | na | na | na | na

The following example shows that the value of this sensor is set to 0x0080, which means that the system boots from the primary side of the PNOR firmware:

$IPMI_CMD sensor list | grep -i golden
BIOS Golden Side | 0x0 | discrete  | 0x0080 | na | na | na  | na  | na | na
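
The two discrete values can be translated into a readable side name. This sketch assumes the `sensor list` column layout shown above and uses only the two values that the article documents; the function name `boot_side` is hypothetical.

```shell
# boot_side: read "sensor list" rows on stdin and print "primary", "golden",
# or "unknown" according to the BIOS Golden Side sensor value.
boot_side() {
    awk -F'|' 'tolower($1) ~ /golden side/ {
        value = $4
        gsub(/[ \t]/, "", value)          # " 0x0180" -> "0x0180"
        if (value == "0x0080")      print "primary"
        else if (value == "0x0180") print "golden"
        else                        print "unknown"
        exit
    }'
}
```

A typical call would be `$IPMI_CMD sensor list | boot_side`.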

Out-of-band firmware update

Systems usually ship with general availability firmware installed. If the firmware becomes corrupted, or you want to upgrade it to pick up bug fixes, you can use the out-of-band firmware update method. System firmware is a combination of the host firmware component (PNOR) and the BMC firmware component, packaged as a Hardware Platform Management (.hpm) file. To update the firmware (xxxx.hpm) by using the out-of-band method, complete the following steps:

  1. Power off the system by running the following command: 
    $IPMI_CMD chassis power off
  2. Issue a cold reset to BMC by running the following command:
    $IPMI_CMD mc reset cold

    Then wait until the BMC starts.

  3. Before you update the firmware, preserve the BMC network settings by running the following command:
    $IPMI_CMD raw 0x32 0xba 0x18 0x00
  4. Update the BMC and PNOR levels by using the HPM upgrade option.
    $IPMI_CMD hpm upgrade <xxxx.hpm file> -z 30000 force

    Wait until the firmware upgrade is successful.

  5. Power on the system by running the following command: 
    $IPMI_CMD chassis power on
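
The five steps can be chained in a script that stops at the first failure. The following is a sketch under stated assumptions rather than a vendor procedure: the function name `update_firmware` and the variables `BMC_RESET_DELAY`, `BMC_WAIT_TRIES`, and `BMC_WAIT_INTERVAL` are hypothetical, and polling `mc info` is just one way to wait until the BMC starts again.

```shell
# Assumes IPMI_CMD holds the base ipmitool invocation defined earlier.
# update_firmware FILE.hpm
update_firmware() {
    hpm_file="$1"

    $IPMI_CMD chassis power off || return 1       # step 1: power off the host
    $IPMI_CMD mc reset cold || return 1           # step 2: cold reset the BMC

    # Give the BMC time to go down, then poll until it answers again.
    sleep "${BMC_RESET_DELAY:-30}"
    tries=0
    until $IPMI_CMD mc info >/dev/null 2>&1; do
        tries=$((tries + 1))
        [ "$tries" -ge "${BMC_WAIT_TRIES:-60}" ] && return 1
        sleep "${BMC_WAIT_INTERVAL:-5}"
    done

    $IPMI_CMD raw 0x32 0xba 0x18 0x00 || return 1                # step 3
    $IPMI_CMD hpm upgrade "$hpm_file" -z 30000 force || return 1 # step 4
    $IPMI_CMD chassis power on                                   # step 5
}
```

Because each step returns early on an error, the host is powered on again only after the upgrade command reports success.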

VUART-to-VGA rendering

Most users want to see boot messages and any boot failures during system boot. To make early boot messages visible, OpenPOWER systems implement the VUART-to-VGA rendering feature. With this feature, when you power on the system, you can see all the early boot messages on a local VGA console, because the BMC renders the output of the host UART to the VGA display.

Figure 1. Sample remote VGA console output

Firmware settings

The boot loader (petitboot) has the following settings in the user interface:

  • Network settings
  • Turbo mode setting (enable/disable)
  • Boot device selection
  • Boot device order

All these features can also be controlled by the BMC. By using IPMI commands, you can enable or disable the turbo mode, select a boot device, and configure the host network.

Settings that are changed by using IPMI commands take precedence, and they override settings that are changed at petitboot and stored in the nonvolatile random access memory (NVRAM) configuration. Also, the system information provides the complete firmware versions (BMC and PNOR versions, including the golden side version) and the BMC MAC address.

IPMI lock and unlock feature

When a system is deployed to a customer, unauthenticated in-band IPMI must not be able to access certain BMC configuration information and functions. The OpenPOWER original equipment manufacturer (OEM) IPMI Lock command can be run over an authenticated interface to lock certain IPMI commands. After the BMC is in the IPMI lockdown mode, it accepts an unauthenticated in-band IPMI message only if the command appears in a list of allowed commands, called the white list, whose entries are matched by network function (NetFn) and command number. To revert the BMC to a fully permitted configuration, run the OEM IPMI Unlock command over an authenticated interface.

  • Lock IPMI interface command: Run this command to lock the IPMI interface. In this mode, which is similar to a safe mode, only a predefined set of commands is allowed.

    $IPMI_CMD raw 0x32 0xf3 0x4c 0x4f 0x43 0x4b 0x00; echo $?
     
    0
    # ipmitool chassis status
    Error sending Chassis Status command: Insufficient privilege level
     
    Get SEL time
    # ipmitool raw 0x0a 0x48; echo $?
     00 ae c0 57
    0

    In these examples, the Chassis Status command is rejected with an insufficient privilege error because the command is not in the white list. The Get SEL Time command returns output because it is in the white list. Only commands that are in the white list work while the BMC is in the IPMI lockdown mode.

  • Unlock IPMI Interface command: By unlocking the IPMI interface, you can run all the commands again. 

    $IPMI_CMD raw 0x32 0xF4 0x55 0x4e 0x4c 0x4f 0x43 0x4b 0x00
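
The four bytes that the raw Get SEL Time command returns in the lockdown example (`00 ae c0 57`) are a timestamp in seconds since 1970, sent least significant byte first as defined by the IPMI specification. The following sketch converts them to a decimal value; the function name `sel_time_epoch` is hypothetical.

```shell
# sel_time_epoch B0 B1 B2 B3
# Takes the four hex bytes printed by the raw Get SEL Time command
# (least significant byte first) and prints the timestamp in seconds.
sel_time_epoch() {
    # Reverse the byte order to build one hexadecimal number: 0x<B3><B2><B1><B0>
    echo $((0x$4$3$2$1))
}
```

For the bytes shown in the example, `sel_time_epoch 00 ae c0 57` prints 1472245248.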

IPMI PNOR reprovision

You can revert modified system settings and configurations to their defaults. The OEM IPMI PNOR Reprovision command in OpenPOWER resets the system to the default settings, and the BMC clears any persistent data that was set by the user. Currently, the erasable partitions in PNOR are GARD (hardware guard entries), NVRAM (boot loader configuration), hostboot attribute overrides (for example, a non-default turbo mode value reverts to the default), and FIRDATA.

The following procedure shows the PNOR reprovision process for an NVRAM partition:

  1. Update the NVRAM partition with test data by running the following command: 

    # nvram --print-config
    "common" Partition
    ---------------------
     
    # nvram --update-config test-name=test-value
    # nvram --print-config
    "common" Partition
    ---------------------
    test-name=test-value
  2. Run the PNOR reset/reprovision command. 

    # $IPMI_CMD raw 0x3A 0x1C; echo $?
     
    [392812400894,5] IPMI: PNOR access requested
  3. Run the following command to get the reprovision status, and repeat it until the reprovision completes. A value of 00 indicates that the reprovision was successful. A value of 03 indicates that it is still in progress.

    # $IPMI_CMD raw 0x3A 0x1D; echo $?
    03
    0
    # $IPMI_CMD raw 0x3A 0x1D; echo $?
    03
    0
    [163479624724,5] IPMI: PNOR access released
                     
    # $IPMI_CMD raw 0x3A 0x1D; echo $?
    00
    0
  4. Reboot the system, and then read the NVRAM data again by running the following commands:

    # nvram --print-config
    "common" Partition
    ---------------------
    test-name=test-value
    # reboot
                                     
    # nvram  --print-config

NVRAM data is erased after a PNOR reset or reprovision operation.
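
Instead of repeating the status command by hand in step 3, the polling can be scripted. This is a minimal sketch: the function name `wait_reprovision`, the default retry count, and the default interval are hypothetical, and treating any value other than 00 or 03 as an error is an assumption, because the article documents only those two values.

```shell
# Assumes IPMI_CMD holds the base ipmitool invocation defined earlier.
# wait_reprovision [TRIES] [INTERVAL_S]
# Returns 0 when the status byte is 00, 1 on timeout, 2 on an unexpected value.
wait_reprovision() {
    tries="${1:-60}"       # hypothetical default: 60 polls
    interval_s="${2:-5}"   # hypothetical default: 5 seconds between polls

    while [ "$tries" -gt 0 ]; do
        # ipmitool prints the response byte with a leading blank, e.g. " 03".
        rp_status=$($IPMI_CMD raw 0x3A 0x1D | tr -d ' \n')
        case "$rp_status" in
            00) return 0 ;;   # reprovision completed successfully
            03) ;;            # still in progress: keep polling
            *)  return 2 ;;   # value not described in the article
        esac
        tries=$((tries - 1))
        sleep "$interval_s"
    done
    return 1
}
```

A typical sequence would be `$IPMI_CMD raw 0x3A 0x1C && wait_reprovision`, followed by the reboot in step 4.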

The IPMI raw command data might differ between OpenPOWER BMC vendors, but the functionality remains the same. The examples in this article refer mainly to OpenPOWER systems with a BMC from American Megatrends.
