Looking at the OFW tree on the E250, there is a cpu-fan-speeds property. I assumed that this meant that the fan speed would be controllable from software, maybe similar to the SB1000/SB2000. However, the ecadc driver only handled reading the temperatures, so I wondered if there was another chip that controlled the fan. I have a spare set of boards, so I went over them with a magnifying glass and found:

System Board
    PCF8584T Controller
    PCF8591 Digital Analogue and Analogue Digital Converter
    DS1307 Real-Time Clock
Power Distribution Board
    PCF8591 Digital Analogue and Analogue Digital Converter
    PCF8574AT Remote 8-Bit I/O Expander
    PCF8574AT Remote 8-Bit I/O Expander
Backplane (SCSI)
    PCF8574AT Remote 8-Bit I/O Expander
    PCF8574AT Remote 8-Bit I/O Expander
    PCF8574AT Remote 8-Bit I/O Expander

Using i2cscan, I saw these chips show up on the I²C bus. The PCF8591T at address 0x4a didn't respond to the scan, but we know about it from OFW. Because each chip type has only a small range of possible addresses, it's easy to match address to chip type:

Address Chip
0x38 PCF8574AT
0x39 PCF8574AT
0x3d PCF8574AT
0x3e PCF8574AT
0x3f PCF8574AT
0x4a PCF8591T
0x4e PCF8591
0x52 24C02N
0x68 DS1307
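
The matching in the table above can be sketched in a few lines. The address ranges are the 7-bit ranges given in each chip's datasheet. The function name candidate_chips is only illustrative, and other chip families can share these ranges, so this works only because we already know which chips are on the boards.

```python
# 7-bit I2C address ranges for the chip families found on the E250 boards.
# The low bits within each range are selected by address pins on the board.
CHIP_RANGES = [
    (0x38, 0x3F, "PCF8574A"),
    (0x48, 0x4F, "PCF8591"),
    (0x50, 0x57, "24C02"),
    (0x68, 0x68, "DS1307"),
]

def candidate_chips(addr):
    """Return the chip families whose address range contains addr."""
    return [name for lo, hi, name in CHIP_RANGES if lo <= addr <= hi]

# Addresses from the table (0x4a comes from OFW rather than the scan).
for addr in (0x38, 0x39, 0x3d, 0x3e, 0x3f, 0x4a, 0x4e, 0x52, 0x68):
    print(hex(addr), candidate_chips(addr))
```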

Because of the information in OFW, I assumed that the PCF8591 on the system board is at address 0x4e and the PCF8591T on the power distribution board is at address 0x4a. Comparing OFW information from other machines, the Atmel 24C02N is the NVRAM and the DS1307 is the clock for the RSC. This left the 5 PCF8574AT chips to identify. As there is no other chip that can control the fans, it seemed very likely that the PCF8591T would do this. OFW only describes 2 of the 4 channels, so I guessed that the other channels might control the fans. I committed the change to add the I²C devices before looking at controlling the fan speed.

Looking at the cpu-fan-speeds property and then writing values from 0x64 to 0xff to both channels showed that channel 1 controlled the fan speed. The minimum value read back from channel 1 was 0x96, so it seems that writing lower values has no effect. The value read back also seemed to be approximately 1/16 lower than the value written, but the read value is only used for display output, so this seemed unimportant. Writing to channel 2 didn't appear to do anything.

The tda driver already has code to read a sensor and adjust the fan speed, so using a similar design seemed sensible. The OFW properties, however, suggest that the raw value read from the sensors should be used as an index into the cpu-fan-speeds property, in the same way that the raw value was already used with the cpu-temp-factors property to calculate the real temperature. The problem was that the raw value was lost after that calculation, and the calculation can't be reversed because the mapping isn't one-to-one.

After a conversation with Michael van Elst, where we discussed the problem and the resolution of the readings versus the output in micro-kelvins, a solution became obvious. The low-order 8 bits of the converted value can be used to store the raw value without affecting the precision of the conversion. The raw value can then be extracted from the reading of the CPU with the highest temperature and used as the index into the cpu-fan-speeds property. By opening and closing the windows and running openssl speed, I was able to vary the CPU temperatures enough to show the fan speed being altered by the CPU temperature changes. In practice the CPU temperatures didn't increase enough for values above 0x73 to be used, so the fan speed didn't change, as the minimum value for the DAC is 0xa5. The new code also reports the fan speed via the envsys framework:

            Current  CritMax  WarnMax  WarnMin  CritMin  Unit
     PDB:    31.356   60.000   55.000    5.000           degC
    SCSI:    28.548   60.000   55.000    5.000           degC
  CPUFAN:       169        0        0        0        0  none
    CPU0:    42.000   68.000   63.000    0.000           degC
    CPU1:    43.000   68.000   63.000    0.000           degC
     MB0:    35.568   60.000   55.000    5.000           degC
     MB1:    30.420   60.000   55.000    5.000           degC
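
The trick of keeping the raw reading in the low-order 8 bits can be illustrated with a minimal sketch. This is not the driver code: the function names, the raw readings and the 256-entry table standing in for cpu-fan-speeds are all made up for illustration. The point is that replacing the low 8 bits changes the temperature by less than 256 micro-kelvins, far below what the sensor can resolve, and that comparing the packed values still finds the hottest CPU.

```python
RAW_MASK = 0xFF  # low-order bits reserved for the raw sensor reading

def pack(temp_uk, raw):
    """Store the 8-bit raw reading in the low 8 bits of a micro-kelvin value."""
    return (temp_uk & ~RAW_MASK) | (raw & RAW_MASK)

def unpack_raw(value):
    """Recover the raw reading from a packed value."""
    return value & RAW_MASK

def fan_speed(packed_temps, fan_speeds):
    """Use the raw reading of the hottest CPU as an index into the table."""
    hottest = max(packed_temps)
    return fan_speeds[unpack_raw(hottest)]

# Hypothetical values: two CPUs at 42 and 43 degC with made-up raw readings,
# and a made-up table standing in for the cpu-fan-speeds property.
cpu0 = pack(315_150_000, 0x70)
cpu1 = pack(316_150_000, 0x72)
table = [min(0xFF, 0xA5 + i // 4) for i in range(256)]

print(unpack_raw(cpu1), hex(fan_speed([cpu0, cpu1], table)))
```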

The reported values match the values reported by the RSC, although it appears that the RSC increases the fan speed reading (presumably to account for the difference between the read and written values). The RSC also reports other information, so the next step was to check the values reported by the PCF8574AT chips. Note that only 1 PSU is connected, so the RSC shows some error statuses.

==================== Environmental Status ====================

System Temperatures (Celsius):
      CPU0    42
      CPU1    43
       MB0    35
       MB1    30
       PDB    31
      SCSI    29


Front Status Panel:

Keyswitch position is in On mode.

System LED Status:  DISK ERROR      POWER  
                      [OFF]         [ ON]      
                      [ ON]         [OFF]      
                    GENERAL ERROR   THERMAL ERROR  
                      [OFF]         [OFF]      


Disk LED Status:    OK = GREEN  ERROR = YELLOW
        DISK  5:    [OK]    DISK  3:    [OK]    DISK  1:    [OK]
        DISK  4:    [OK]    DISK  2:    [OK]    DISK  0:    [OK]


Fan Bank :

Bank      Speed     Status
----      -----     ------
 SYS       179        OK


Power Supplies:

Supply     Status
------     ------
  0        FAILED: DC Power Failure
  1          OK  


When I committed the original code, Tobias Nygren pointed out that, in the past, there had been a driver for the environmental controller in the E450, and that it had an associated header file with definitions for the chips there. This was very useful when observing the values of the PCF8574AT chips while altering various states (removing disks, disconnecting PSUs, etc). From the observations, I determined:

Address 0x38 State Changes (interrupts?)
    0xff Normal state
    0x9f PSU state change
    0xfb Disk state change

Address 0x39 PSU State
    0xfc Both PSUs present and running
    0xec PSU 0 failed
    0xdc PSU 1 failed
    0xde Only PSU 0 present
    0xed Only PSU 1 present

Address 0x3d Disk State
    0xc0 Disks 0 1 2 3 4 5
    0xd0 Disks 0 1 2 3 . 5
    0xf0 Disks 0 1 2 3 . .
    0xf8 Disks 0 1 2 . . .
    0xfc Disks 0 1 . . . .
    0xfe Disks 0 . . . . .
    0xff Disks . . . . . .

Address 0x3e Front Panel
    0xbf Key position normal
    0x7f Key position diag
    0xff Key position secure
    0xbd PSU fault
    0xbe Disk fault
    0xbb Temperature fault
    0xb7 General fault
    0xaf Activity

Address 0x3f Disk Fault LEDs
    0xff Normal state
    0xfe Disk 0 fault
    0xfd Disk 1 fault
    0xfb Disk 2 fault
    0xf7 Disk 3 fault
    0xef Disk 4 fault
    0xdf Disk 5 fault
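
The values above are consistent with the pins being active-low: a bit reads 0 when the corresponding condition is true. As a rough sketch of how such a byte could be decoded (the function names are illustrative, and the bit assignments are inferred from the observed values rather than taken from any documentation):

```python
def active_low_bits(value, nbits):
    """Return the positions of the bits that are 0 among the low nbits."""
    return [n for n in range(nbits) if not value & (1 << n)]

def disks_present(value):
    """Decode the byte read at address 0x3d: bit n low means disk n present."""
    return active_low_bits(value, 6)

def key_position(value):
    """Decode bits 7 and 6 of the byte read at address 0x3e."""
    if not value & 0x80:
        return "diag"
    if not value & 0x40:
        return "normal"
    return "secure"

print(disks_present(0xd0))  # observed with disk 4 removed
print(key_position(0xbf))   # observed with the key in the normal position
```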

The state changes on the chip at address 0x38 are only set for 30 seconds. After that, the chip reverts to reading 0xff. The chip at address 0x3f didn't show any changes when I was altering various things, but the Sun Enterprise 250 Server Owner's Guide notes in the About the Status and Control Panel section:

"This yellow LED lights steadily to indicate a fault in one of the hard disk drives. When this LED is lit, one or more disk LEDs may also be lit, indicating the source of the fault."

which meant that there must be a way of controlling the disk LEDs. Writing to the chip confirmed this and also which pin controlled which LED.
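
Going by the values listed for address 0x3f, clearing bit n lights the fault LED for disk n. A small sketch of building the byte to write follows. The helper name led_byte is made up, and the actual I²C write is left out because it depends on the driver.

```python
def led_byte(faulty_disks):
    """Byte to write to the chip at 0x3f: clearing bit n lights disk n's LED."""
    value = 0xFF  # all bits high: every fault LED off
    for n in faulty_disks:
        if not 0 <= n <= 5:
            raise ValueError("the E250 backplane has disks 0-5")
        value &= ~(1 << n)
    return value & 0xFF

print(hex(led_byte([2])))  # 0xfb, the value listed for a disk 2 fault
```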

The final steps were to agree how to handle the different types of pins, so that I could pass that information from the machine-dependent part to the device driver, and to add a driver for the PCF8574 chips. After a mail thread on the tech-net mailing list was somewhat inconclusive, I went for an interim solution of passing the type in the name of the pin. The driver was straightforward, as I could use the existing pcagpio driver as a basis, removing the parts not required for the simpler PCF8574 chip and adding support for sysmon. While looking at that driver, I committed the debugging code that I'd used to identify GPIO pins on the v240 some time back. The new commits were changes to the OFW patching code, corresponding changes in pcagpio and the new pcf8574 driver. With these changes, monitoring information is available via envsys and sysctl:

orthanc# envstat
                   Current  CritMax  WarnMax  WarnMin  CritMin  Unit
            PDB:    28.548   60.000   55.000    5.000           degC
           SCSI:    26.676   60.000   55.000    5.000           degC
         CPUFAN:       164        0        0        0        0  none
           CPU0:    40.000   68.000   63.000    0.000           degC
           CPU1:    41.000   68.000   63.000    0.000           degC
            MB0:    35.100   60.000   55.000    5.000           degC
            MB1:    29.484   60.000   55.000    5.000           degC
   psu0_present:      TRUE
   psu1_present:      TRUE
     psu0_fault:     FALSE
     psu1_fault:     FALSE
  disk0_present:      TRUE
  disk1_present:      TRUE
  disk2_present:      TRUE
  disk3_present:      TRUE
  disk4_present:      TRUE
  disk5_present:      TRUE
     key_normal:      TRUE
       key_diag:     FALSE
orthanc# sysctl -a -e | grep hw.led

