Looking at the OFW tree on the E250, there is a cpu-fan-speeds property. I assumed that this meant that the fan speed would be controllable from software, maybe similar to the SB1000/SB2000. However, the ecadc driver only handled reading the temperatures, so I wondered if there was another chip that controlled the fan. I have a spare set of boards, so I went over them with a magnifying glass and found:
System Board | |
---|---|
PCF8584T | I²C Bus Controller |
PCF8591 | Digital Analogue and Analogue Digital Converter |
ATMEL 24C02N | EEPROM (2 Kbit) |
DS1307 | Real-Time Clock |

Power Distribution Board | |
---|---|
PCF8591 | Digital Analogue and Analogue Digital Converter |
PCF8574AT | Remote 8-Bit I/O Expander |
PCF8574AT | Remote 8-Bit I/O Expander |

Backplane (SCSI) | |
---|---|
PCF8574AT | Remote 8-Bit I/O Expander |
PCF8574AT | Remote 8-Bit I/O Expander |
PCF8574AT | Remote 8-Bit I/O Expander |
Using i2cscan, I saw these show up on the I²C bus. The PCF8591T at address 0x4a didn't show up, but we know about that from OFW. Because each of these chip types has only a small range of possible addresses, it's easy to match address to chip type:
Address | Chip |
---|---|
0x38 | PCF8574AT |
0x39 | PCF8574AT |
0x3d | PCF8574AT |
0x3e | PCF8574AT |
0x3f | PCF8574AT |
0x4a | PCF8591T |
0x4e | PCF8591 |
0x52 | 24C02N |
0x68 | DS1307 |
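
The address matching above can be sketched in C. This is only an illustration: the function name `i2c_guess_chip` is made up, and the address ranges are the usual datasheet ranges for these chip families (fixed upper bits, with the low bits set by address pins). Other devices can share the same ranges, so the result is a guess rather than an identification.

```c
#include <stdint.h>

/*
 * Guess the chip family from a 7-bit I2C address.
 * Hypothetical helper for illustration; ranges assume the usual
 * datasheet address layouts noted in the comments.
 */
static const char *
i2c_guess_chip(uint8_t addr)
{
    if (addr >= 0x38 && addr <= 0x3f)
        return "PCF8574A";      /* 0111 A2 A1 A0 */
    if (addr >= 0x48 && addr <= 0x4f)
        return "PCF8591";       /* 1001 A2 A1 A0 */
    if (addr >= 0x50 && addr <= 0x57)
        return "24C02";         /* 1010 A2 A1 A0 */
    if (addr == 0x68)
        return "DS1307";        /* single fixed address */
    return "unknown";
}
```

Calling `i2c_guess_chip()` on each address found by the scan gives the same pairing as the table above.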
I assumed that the PCF8591 on the system board is at address 0x4e, and the PCF8591T on the power distribution board is at address 0x4a because of the information in OFW. Comparing OFW information from other machines, the Atmel 24C02N is the NVRAM and the DS1307 is the clock for the RSC. This left the 5 PCF8574AT chips to identify. As there is no other chip that can control the fans, it seemed very likely that the PCF8591T would do this. The information in OFW only describes 2 of the 4 channels, so I guessed that the other channels might control the fans. I committed the change to add the I²C devices before looking at controlling the fan speed.
Looking at the cpu-fan-speeds property and writing values from 0x64 to 0xff to both channels showed that channel 1 controlled the fan speed. However, the minimum value read from channel 1 was 0x96, so it seems that writing lower values has no effect. The value read also seemed to be approximately 1/16 lower than the value written, but the read value is only used for display output, so this seemed unimportant. Writing to channel 2 didn't appear to do anything. The tda driver already has code to read a sensor and adjust the fan speed, so using a similar design seemed sensible. The OFW properties suggest that the raw value read from the sensors could be used as an index into the cpu-fan-speeds property, in the same way that it is already used with the cpu-temp-factors property to calculate the real temperature. The problem was that, after that calculation, the raw value was lost, and the calculation can't be reversed because there isn't a one-to-one mapping.
After a conversation with Michael van Elst where we discussed the problem and the resolution of the readings versus the output in micro-kelvins, a solution was obvious. The lowest 8 bits of the converted value can be used to store the raw value without affecting the precision of the conversion, as those bits only cover 256 micro-kelvins. The raw value can then be extracted from the CPU with the highest temperature and used as the index into the cpu-fan-speeds property. By opening and closing the windows and running openssl speed, I was able to vary the CPU temperatures enough to show the fan speed being altered by the CPU temperature changes. In practice the CPU temperatures didn't increase enough for values above 0x73 to be used, so the fan speed didn't change, as the minimum value for the DAC is 0xa5. The new code also reports the fan speed via the envsys framework:
    Current CritMax WarnMax WarnMin CritMin Unit
    [ecadc0]
    PDB: 31.356 60.000 55.000 5.000 degC
    SCSI: 28.548 60.000 55.000 5.000 degC
    CPUFAN: 169 0 0 0 0 none
    [ecadc1]
    CPU0: 42.000 68.000 63.000 0.000 degC
    CPU1: 43.000 68.000 63.000 0.000 degC
    MB0: 35.568 60.000 55.000 5.000 degC
    MB1: 30.420 60.000 55.000 5.000 degC
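
The idea of carrying the raw reading in the low 8 bits of the micro-kelvin value can be sketched as follows. This is not the driver code: the function names, the use of `uint32_t`, and the clamping and fail-safe behaviour in `fan_speed_for` are assumptions made for the example, and the table passed in stands in for the cpu-fan-speeds property.

```c
#include <stddef.h>
#include <stdint.h>

#define RAW_MASK 0xffu

/* Replace the low 8 bits of a micro-kelvin value with the raw reading. */
static uint32_t
temp_pack(uint32_t temp_uk, uint8_t raw)
{
    return (temp_uk & ~(uint32_t)RAW_MASK) | raw;
}

/* Recover the raw reading from a packed value. */
static uint8_t
temp_raw(uint32_t packed)
{
    return (uint8_t)(packed & RAW_MASK);
}

/*
 * Return the raw reading of the hottest sensor.  The low 8 bits only
 * influence the comparison when two temperatures are within 256 uK
 * of each other.
 */
static uint8_t
hottest_raw(const uint32_t *packed, size_t n)
{
    uint32_t max = 0;

    for (size_t i = 0; i < n; i++)
        if (packed[i] > max)
            max = packed[i];
    return temp_raw(max);
}

/*
 * Use the raw reading as an index into a fan speed table.
 * Assumption: out-of-range indexes use the last entry, and an
 * empty table falls back to full speed.
 */
static uint8_t
fan_speed_for(const uint8_t *speeds, size_t nspeeds, uint8_t raw)
{
    if (nspeeds == 0)
        return 0xff;
    if (raw >= nspeeds)
        raw = (uint8_t)(nspeeds - 1);
    return speeds[raw];
}
```

Packing costs at most 255 micro-kelvins of error, which is well below the step size visible in the readings above.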
The reported values match the values reported by the RSC, although it appears that the RSC increases the fan speed reading (presumably to account for the difference in read and write values). The RSC also reports other information, so the next step was to check the values reported by the PCF8574AT chips. Note that only 1 PSU is connected, so the RSC shows some error statuses.
    ==================== Environmental Status ====================
    System Temperatures (Celsius):
    ------------------------------
    CPU0 42
    CPU1 43
    MB0 35
    MB1 30
    PDB 31
    SCSI 29
    =================================
    Front Status Panel:
    -------------------
    Keyswitch position is in On mode.
    System LED Status:  DISK ERROR          POWER
                        [OFF]               [ ON]
                        POWER SUPPLY ERROR  ACTIVITY
                        [ ON]               [OFF]
                        GENERAL ERROR       THERMAL ERROR
                        [OFF]               [OFF]
    =================================
    Disk LED Status: OK = GREEN ERROR = YELLOW
    DISK 5: [OK]  DISK 3: [OK]  DISK 1: [OK]
    DISK 4: [OK]  DISK 2: [OK]  DISK 0: [OK]
    =================================
    Fan Bank :
    ----------
    Bank  Speed    Status
          (0-255)
    ----  -----    ------
    SYS   179      OK
    =================================
    Power Supplies:
    ---------------
    Supply  Status
    ------  ------
    0       FAILED: DC Power Failure
    1       OK
    =================================
When I committed the original code, Tobias Nygren pointed out that, in the past, there had been a driver for the environmental controller in the E450, and that had an associated header file with definitions for the chips there. This was very useful when observing the values of the PCF8574ATs when altering various states (removing disks, disconnecting PSUs, etc). From the observations, I determined:
Address 0x38 | State Changes (interrupts?) |
---|---|
0xff | Normal state |
0x9f | PSU state change |
0xfb | Disk state change |

Address 0x39 | PSU State |
---|---|
0xfc | Both PSUs present and running |
0xec | PSU 0 failed |
0xdc | PSU 1 failed |
0xde | Only PSU 0 present |
0xed | Only PSU 1 present |

Address 0x3d | Disk State |
---|---|
0xc0 | Disks 0 1 2 3 4 5 |
0xd0 | Disks 0 1 2 3 . 5 |
0xf0 | Disks 0 1 2 3 . . |
0xf8 | Disks 0 1 2 . . . |
0xfc | Disks 0 1 . . . . |
0xfe | Disks 0 . . . . . |
0xff | Disks . . . . . . |

Address 0x3e | Front Panel |
---|---|
0xbf | Key position normal |
0x7f | Key position diag |
0xff | Key position secure |
0xbd | PSU fault |
0xbe | Disk fault |
0xbb | Temperature fault |
0xb7 | General fault |
0xaf | Activity |

Address 0x3f | Disk Fault LEDs |
---|---|
0xff | Normal state |
0xfe | Disk 0 fault |
0xfd | Disk 1 fault |
0xfb | Disk 2 fault |
0xf7 | Disk 3 fault |
0xef | Disk 4 fault |
0xdf | Disk 5 fault |
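
The values in these tables are consistent with active-low pins: a bit reads 0 when the corresponding condition is true. The sketch below decodes the 0x39 and 0x3d ports and builds a byte for the 0x3f port on that basis. The bit assignments are inferred from the tables above rather than taken from the driver, and all names are made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define E250_NDISKS 6
#define E250_NPSUS  2

/* 0x3d: disk n is present when bit n reads 0. */
static bool
disk_present(uint8_t port, unsigned n)
{
    return n < E250_NDISKS && (port & (1u << n)) == 0;
}

/* 0x3d: count the disks that are present. */
static unsigned
disks_present(uint8_t port)
{
    unsigned count = 0;

    for (unsigned n = 0; n < E250_NDISKS; n++)
        if (disk_present(port, n))
            count++;
    return count;
}

/* 0x39: PSU n is present when bit n reads 0 (inferred). */
static bool
psu_present(uint8_t port, unsigned n)
{
    return n < E250_NPSUS && (port & (1u << n)) == 0;
}

/* 0x39: PSU n reports a fault when bit 4 + n reads 0 (inferred). */
static bool
psu_fault(uint8_t port, unsigned n)
{
    return n < E250_NPSUS && (port & (1u << (4 + n))) == 0;
}

/* 0x3f: byte with only the bit for disk n cleared; 0xff if n is invalid. */
static uint8_t
disk_fault_led(unsigned n)
{
    if (n >= E250_NDISKS)
        return 0xff;
    return (uint8_t)(0xffu & ~(1u << n));
}
```

For example, `disk_present(0xd0, 4)` is false while `disk_present(0xd0, 5)` is true, matching the "Disks 0 1 2 3 . 5" row.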
The state changes on the chip at address 0x38 are only set for 30 seconds; after that, the chip reverts to reading 0xff. The chip at address 0x3f didn't show any changes when I was altering various things, but the Sun Enterprise 250 Server Owner's Guide noted in the About the Status and Control Panel section:
"This yellow LED lights steadily to indicate a fault in one of the hard disk drives. When this LED is lit, one or more disk LEDs may also be lit, indicating the source of the fault."
which meant that there must be a way of controlling the disk LEDs. Writing to the chip confirmed this and also showed which pin controlled which LED.
The final steps were to agree how to handle the different types of pins, so that I could pass that information from the machine-dependent part to the device driver, and to add a driver for the PCF8574 chips. After a mail thread on the tech-net mailing list was somewhat inconclusive, I went for an interim solution of passing the type in the name of the pin. The driver was straightforward, as I could use the existing pcagpio driver as a basis, removing parts not required in the simpler PCF8574 chip and adding in support for sysmon. While looking at that driver, I committed the debugging code that I'd used to identify GPIO pins on the v240 some time back. The new commits were changes to the OFW patching code, corresponding changes in pcagpio and the new pcf8574 driver. With these changes, monitoring information is available via envsys and sysctl:
    orthanc# envstat
    Current CritMax WarnMax WarnMin CritMin Unit
    [ecadc0]
    PDB: 28.548 60.000 55.000 5.000 degC
    SCSI: 26.676 60.000 55.000 5.000 degC
    CPUFAN: 164 0 0 0 0 none
    [ecadc1]
    CPU0: 40.000 68.000 63.000 0.000 degC
    CPU1: 41.000 68.000 63.000 0.000 degC
    MB0: 35.100 60.000 55.000 5.000 degC
    MB1: 29.484 60.000 55.000 5.000 degC
    [pcf8574io1]
    psu0_present: TRUE
    psu1_present: TRUE
    psu0_fault: FALSE
    psu1_fault: FALSE
    [pcf8574io2]
    disk0_present: TRUE
    disk1_present: TRUE
    disk2_present: TRUE
    disk3_present: TRUE
    disk4_present: TRUE
    disk5_present: TRUE
    [pcf8574io3]
    key_normal: TRUE
    key_diag: FALSE
    orthanc# sysctl -a -e | grep hw.led
    hw.led.disk_fault=0
    hw.led.psu_fault=0
    hw.led.overtemp=0
    hw.led.fault=0
    hw.led.activity=0
    hw.led.disk0_fault=0
    hw.led.disk1_fault=0
    hw.led.disk2_fault=0
    hw.led.disk3_fault=0
    hw.led.disk4_fault=0
    hw.led.disk5_fault=0