This chapter describes the functionality of the SGI 2100 module System Controller (MSC). The MSC interacts with the power supply, fan-tray module, node board(s), midplane, and other boards that have on-board regulators in the server.
The MSC is located in the upper left section on the front of the SGI 2100 system. It is between the CD-ROM drive bay and the hard disk bays. Figure 6-1 shows the MSC location.
The MSC front panel is shown in Figure 6-2. The MSC provides environmental monitoring for safe operation of the system. The MSC connects to the system midplane via an extender board and provides easy user access to switches and displays at the front of the system.
Pinouts for the controller's 8-pin diagnostic serial connector are shown in Figure 6-3.
In the lower right section on the back of the system is a 9-pin alternate console diagnostic serial connector that is a direct mirror of the 8-pin diagnostic connector on the front panel. Figure 6-4 shows the location and pinouts of the 9-pin rear-mounted MSC diagnostic connector.
The MSC has one keyswitch, two push buttons, and four LED indicators. The following paragraphs provide information on the use or significance of each control or indicator.
The Front Panel Keyswitch selects Standby, On, or Diagnostic status for the system.
The System Reset push button initiates a system-wide reset. The keyswitch must be in the Diagnostic position to use this button.
The Non-Maskable Interrupt (NMI) push button issues a non-maskable interrupt to all node boards in the system. The keyswitch must be in the Diagnostic position to use this button.
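The keyswitch interlock described above can be pictured as a small gating function. This is only an illustrative sketch, not MSC firmware: the names `KeyPosition`, `Button`, and `front_panel_action` are hypothetical, and the returned strings are the RESET and NMI display messages listed in Table 6-1.

```python
from enum import Enum
from typing import Optional


class KeyPosition(Enum):
    """The three keyswitch positions named in this chapter."""
    STANDBY = "standby"
    ON = "on"
    DIAGNOSTIC = "diagnostic"


class Button(Enum):
    """The two front-panel push buttons."""
    SYSTEM_RESET = "reset"
    NMI = "nmi"


def front_panel_action(key: KeyPosition, button: Button) -> Optional[str]:
    """Return the display message for a button press, or None if it is ignored.

    Both buttons act only while the keyswitch is in the Diagnostic position.
    """
    if key is not KeyPosition.DIAGNOSTIC:
        return None  # button presses are ignored in Standby and On
    return "RESET" if button is Button.SYSTEM_RESET else "NMI"
```

In this model, pressing either button with the keyswitch in the On position returns `None`, which mirrors the requirement that the keyswitch be in the Diagnostic position.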
The AC Power OK green LED lights when the system is plugged into an outlet and the AC circuit breaker is turned on. When this LED is lit, the MSC is receiving DC voltage (V_5 Aux) through the midplane, as are other boards that require it.
The DC Power OK green LED lights three and one-half seconds after the keyswitch is turned to the On position. This indicates the system power supply is enabled and operating properly.
The Fan Speed High amber warning LED lights to indicate that the ambient temperature is higher than optimal or that a non-critical fan has failed. When a non-critical fan fails, the remaining fans are set to full speed to compensate. If a fan has failed, a service call should be placed immediately.
The Over Temperature Fault amber warning LED lights when the MSC's incoming air temperature or fan failure detection causes a shutdown of the system. If the environmental temperature exceeds the system's tolerance, or if a critical fan fails, the MSC shuts down the system. In some cases, a service call should be placed immediately. See the section “MSC Shutdown” in Chapter 7 for tips on how to troubleshoot this problem area.
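The fan-failure rules can be summarized in a short decision function. The sketch below combines the two LED descriptions above with the shutdown rule in the feature list (one critical fan, or two or more non-critical fans) and the FAN FAIL entry in Table 6-1, which identifies fans 1, 2, and 3 as the ones whose failure shuts the system down. The names `FanAction` and `fan_action` are hypothetical, and the fan numbers 4 and 5 used for non-critical fans are assumptions made only for illustration.

```python
from enum import Enum
from typing import Iterable

# Fans whose failure forces a shutdown, per the FAN FAIL entry in Table 6-1.
CRITICAL_FANS = frozenset({1, 2, 3})


class FanAction(Enum):
    NORMAL = "normal"          # no fan has failed
    HIGH_SPEED = "high speed"  # Fan Speed High LED; remaining fans at full speed
    SHUTDOWN = "shutdown"      # MSC shuts the system down


def fan_action(failed_fans: Iterable[int]) -> FanAction:
    """Map a set of failed fan numbers to the MSC response described in the text."""
    failed = set(failed_fans)
    non_critical_failed = failed - CRITICAL_FANS
    # One critical fan, or two or more non-critical fans, leads to shutdown.
    if failed & CRITICAL_FANS or len(non_critical_failed) >= 2:
        return FanAction.SHUTDOWN
    # A single non-critical failure is compensated by running at high speed.
    if non_critical_failed:
        return FanAction.HIGH_SPEED
    return FanAction.NORMAL
```

For example, `fan_action([4])` yields `FanAction.HIGH_SPEED`, while `fan_action([2])` and `fan_action([4, 5])` both yield `FanAction.SHUTDOWN`. The sketch covers fan failures only; high ambient temperature can also raise fan speed or trigger a shutdown.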
The MSC has the following basic features and functions:
Issues a reset signal at power-on.
The front-panel mounted keyswitch provides a soft power-off to standby condition.
A front-panel mounted push-button non-maskable interrupt (NMI) switch.
Monitors ambient incoming air temperature into the system and adjusts fan speed accordingly (two speeds). A soft power-off of the system results when ambient temperature becomes too high for safe operation.
LED display of ambient over-temperature conditions.
NVRAM for storing configuration information (1024 x 8 bits).
Monitors fan rotation and automatically increases to high speed fan operation when a fan fails. Signals an impending shutdown when a single critical fan fails, or two or more non-critical fans fail.
LED display of high fan speed and possible fan tray failure (fan high-speed LED).
LED display of power supply operation. AC OK LED indicates AC voltage applied to system. DC OK indicates all Power Supply DC voltages (+12 V, +5 V, +3.45 V), and remote DC voltages (3.3 V, 2.4 V, 1.6 V) are present with no error conditions in the system. The DC OK LED does not indicate regulation or accuracy of the DC voltages present.
Provides a 100-Kbps bidirectional communication path between the MSC, the midplane, and the Hub ASIC IO space on each node board in the system. This path allows the MSC to receive system status messages from all node boards in a system, and to provide status messages from the MSC and all node boards in a system. This communication path is referred to as the I2C interface.
Provides ability to request the system serial number and configuration information via the I2C Interface.
Eight-digit alphanumeric status display. This display is updated by the MSC or the node cards in the system via the I2C interface.
Provides a seven-wire, 9600-baud alternate console diagnostic port for off-line configuration and troubleshooting. The port also communicates with the node board(s) when the IO console port or a system console is not available or functional. This interface also meets the minimum requirements for modem support.
Software Reset, NMI, and soft power-off commands through the alternate console diagnostic port.
Supports alternate console diagnostic port command-line power supply voltage margining. Margining allows the 3.45-V or 5-V outputs of the power supply to be moved 5% higher or lower independently. This does not affect remote regulated termination voltages (1.6 V, 2.4 V, router 3.3 V).
Supports alternate console diagnostic port command-line regulated termination voltage margining for the termination voltages 1.6 V, 2.4 V, and 3.3 V. All termination voltages are margined 5% higher or lower together, not independently. This does not affect the power supply voltages.
Sends an early-warning, high-priority interrupt (Panic Interrupt) to all node boards, warning of an impending shutdown due to AC power failure, ambient over-temperature, or the keyswitch being placed in the standby position.
Provides an interlock (removable keyswitch) to prevent unauthorized personnel from turning the system to on or standby, and to limit operation of the System Reset and NMI functions. A software password controls access and permissions through the alternate console diagnostic port.
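The arithmetic behind the two margining features in the list above is simple, and the following sketch works it through. It is not the MSC command-line interface, which this chapter does not document; the function names `margin_supply` and `margin_termination` and the rail labels are hypothetical. The nominal voltages and the 5% step come from the feature descriptions.

```python
MARGIN = 0.05  # voltages move 5% higher or lower

# Power supply outputs that can be margined independently of each other.
SUPPLY_RAILS = {"3.45V": 3.45, "5V": 5.0}

# Remote regulated termination voltages, which are always margined together.
TERMINATION_RAILS = {"1.6V": 1.6, "2.4V": 2.4, "3.3V": 3.3}


def _check_direction(direction: int) -> None:
    if direction not in (1, -1):
        raise ValueError("direction must be +1 (high) or -1 (low)")


def margin_supply(rail: str, direction: int) -> float:
    """Return the margined voltage for one power supply output."""
    _check_direction(direction)
    return round(SUPPLY_RAILS[rail] * (1 + direction * MARGIN), 4)


def margin_termination(direction: int) -> dict:
    """Return all termination voltages margined in the same direction."""
    _check_direction(direction)
    return {
        name: round(volts * (1 + direction * MARGIN), 4)
        for name, volts in TERMINATION_RAILS.items()
    }
```

Under these assumptions, margining the 3.45-V output high gives about 3.62 V and margining the 5-V output low gives 4.75 V, while a high termination margin moves 1.6 V, 2.4 V, and 3.3 V to about 1.68 V, 2.52 V, and 3.47 V at the same time.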
The MSC front panel has an eight-character LED readout that supplies information about system status. For a problem related to the power supply, see the section “Power Supply Problems” in Chapter 7.
Table 6-1 lists the MSC messages and explains what each one means.
Message | Meaning |
---|---|
SYS OK | The system is operating normally. |
R PWR UP | The system is being powered on remotely via the MSC's serial connection. |
TEMP OK | The system temperature is within normal operating parameters. |
PSTMP OK | The power supply operating temperature is OK. |
POWER UP | The system is being powered on from the front panel switch. |
PFW FAIL | The power supplied to the system has failed or dropped below acceptable parameters. The system has shut down. |
PS OT FL | The system's power supply temperature has exceeded safety limits and the system has shut down. |
PS FAIL | The internal power supply has failed and the system has shut down. |
OVR TEMP | The system's temperature has exceeded acceptable limits and the system has shut down. |
KEY OFF | The MSC's keyswitch has been turned to standby. |
RESET | The MSC's keyswitch has been turned to the diagnostic position and the reset button pushed. |
NMI | The MSC's keyswitch has been turned to the diagnostic position and the non-maskable interrupt (NMI) button pushed. |
M FAN FL | More than one fan has failed and the system has shut down. |
R PWR DN | The system has been powered off from a remote location. |
PWR CYCL | The system has received the command to power cycle from the console or a remote user. |
HBT TO | The system has registered a heartbeat time-out. A non-maskable interrupt is generated, followed by a system reset. |
FAN FAIL | A system fan has failed. If it is fan 1, 2, or 3, the system shuts down. A service call should be placed as soon as possible. |
PS HITMP | The internal power supply unit is running at higher than normal temperatures. |
POK FAIL | A power OK failure occurred on an unidentified board. |
POK N 0 | A power OK failure occurred on the first node board. |
POK N 1 | A power OK failure occurred on the second node board. |
POK N 2 | A power OK failure occurred on the third node board. |
POK N 3 | A power OK failure occurred on the fourth node board. |
POK RT 0 | A power OK failure occurred on the first router board. |
POK RT 1 | A power OK failure occurred on the second router board. |
SP INT 1 | The MSC's firmware generated a spurious timer interrupt signal. |
SP INT 2 | The MSC's firmware generated a spurious clock signal. |
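When these messages are captured in a log, the entries in Table 6-1 can be interpreted with a small helper. The sketch below is illustrative only: the function names `parse_pok` and `causes_shutdown` are hypothetical, and it assumes the message text is spelled and spaced exactly as it appears in the table.

```python
import re
from typing import Optional, Tuple

# Messages whose table entry states unconditionally that the system has shut down.
# FAN FAIL is omitted because it shuts the system down only for fans 1, 2, or 3.
SHUTDOWN_CODES = frozenset(
    {"PFW FAIL", "PS OT FL", "PS FAIL", "OVR TEMP", "M FAN FL"}
)

# POK N <index> names a node board, POK RT <index> names a router board.
_POK_PATTERN = re.compile(r"^POK (N|RT) (\d)$")


def parse_pok(code: str) -> Optional[Tuple[str, Optional[int]]]:
    """Return (board type, zero-based index) for a power OK failure message.

    Returns None when the message is not a POK message.
    """
    if code == "POK FAIL":
        return ("unidentified", None)
    match = _POK_PATTERN.match(code)
    if match is None:
        return None
    board = "node" if match.group(1) == "N" else "router"
    return (board, int(match.group(2)))


def causes_shutdown(code: str) -> bool:
    """True if Table 6-1 says the system has shut down for this message."""
    return code in SHUTDOWN_CODES
```

For instance, `parse_pok("POK N 2")` identifies the third node board as `("node", 2)`, matching the table's zero-based numbering, and `causes_shutdown("PS HITMP")` is false because that message is only a warning.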