After using my V240 as a test bed for the interrupt problems that we were seeing, and eventually finding and fixing them, I thought that it was time to work on Sun V2x0 environmental monitoring. I ended up finding and fixing a few other code bugs along the way.
I already knew that the devices most likely to be sensors were at addresses 0x2e and 0x4e, but I did check all the devices in case the Sun i2c bridge had altered the addresses. Using the modified i2cscan again, I determined that there was an ADM1026 at address 0x2e and an LM75 at address 0x4e. The scan didn't cause any permanent changes, but it did make the ALOM think that the configuration card had been removed, which required a power cycle.
We already had code to add missing DIMMs for SPARCle, so adding the missing devices was straightforward. With this, our LM75 driver should have attached, but it failed because it couldn't write to the device. Modifying the driver not to write was simple, and we could have device properties set on the V240 (and V210, which Martin Husemann kindly tested) to mark them as "no-write".
The next step was a driver for the ADM1026 chip. Whilst writing this, I noticed that I would sometimes see bogus values read from the chip (0x00, 0xba, 0xbb, and 0xbc were common, but others were possible). I haven't tracked down what causes this - it doesn't appear to be timing related, as adding a delay between reads doesn't fix it. The workaround is to read the register twice and compare the values. If they differ, read another register, then read the original one twice again and compare. Testing the ADM1026 on the V440, I noticed a similar problem. I also noticed that the fan speeds on the V240 didn't match the speeds reported by the ALOM for one set of fans, because one of the fan divisor registers wasn't set correctly. Device properties to the rescue again.
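The control flow of the read-twice workaround can be sketched in shell. This is an illustration only: the driver does this in C inside the kernel, and `raw_read`, `OTHER_REG` and `MAX_TRIES` are hypothetical names introduced here (the retry limit in particular is an assumption, not something taken from the driver).

```shell
#!/bin/sh
# Sketch of the read-twice-and-compare workaround (illustration, not driver code).
# raw_read is a hypothetical hook supplied by the caller: it must print the
# value of the register given as $1.

OTHER_REG=${OTHER_REG:-0x00}   # placeholder: any register other than the target
MAX_TRIES=${MAX_TRIES:-5}      # assumed retry limit

stable_read() {
    reg=$1
    tries=0
    while [ "$tries" -lt "$MAX_TRIES" ]; do
        v1=$(raw_read "$reg") || return 1
        v2=$(raw_read "$reg") || return 1
        if [ "$v1" = "$v2" ]; then
            # Two consecutive reads agree, so accept the value.
            printf '%s\n' "$v1"
            return 0
        fi
        # Mismatch: read a different register, then retry the original pair.
        raw_read "$OTHER_REG" >/dev/null
        tries=$((tries + 1))
    done
    # Never got two matching reads within the retry limit.
    return 1
}
```

Note that two identical bogus reads in a row would still be accepted, so this reduces the problem rather than eliminating it.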
Whilst looking at the dbcool driver (to check whether ADM1026 support should be merged there), I noticed that direct configuration wasn't supported. It was simple to add, although the Red Sun Blade 2500 has its ADM1031 chips at addresses that the driver was probing anyway. I also discovered a problem with our PCF8584 driver when writing, which turned out to be the cause of the LM75 failing to attach. Fixing this meant that I could remove the device property that I thought I needed for the LM75 on the V240. It also meant that I could actually write to the intended i2c device instead of inadvertently writing to a different one. As a test, I turned the hardware locator on and off using the ALOM, checked the GPIO registers, and can now turn it on and off from userland:
Locator ON:
/tmp/i2cscan -w /dev/iic0 0x22 0x07 0x1f # set port to output
/tmp/i2cscan -w /dev/iic0 0x22 0x03 0x5c # set logic level = 0
Locator OFF:
/tmp/i2cscan -w /dev/iic0 0x22 0x03 0xdc # set logic level = 1
/tmp/i2cscan -w /dev/iic0 0x22 0x07 0x9f # set port to input
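The two command pairs above could be wrapped in a small shell function. This is a sketch: the `locator` function and the `I2CSCAN` and `IIC_DEV` variables are names made up for this example. The register values are copied verbatim from the V240 commands and overwrite the whole port byte, so they should not be assumed to be safe on other machines.

```shell
#!/bin/sh
# Hypothetical wrapper around the i2cscan commands shown above (V240 values).
I2CSCAN=${I2CSCAN:-/tmp/i2cscan}   # the modified i2cscan binary
IIC_DEV=${IIC_DEV:-/dev/iic0}      # i2c bus device
GPIO_ADDR=0x22                     # GPIO device address used above

locator() {
    case "$1" in
    on)
        "$I2CSCAN" -w "$IIC_DEV" "$GPIO_ADDR" 0x07 0x1f &&  # set port to output
        "$I2CSCAN" -w "$IIC_DEV" "$GPIO_ADDR" 0x03 0x5c     # set logic level = 0
        ;;
    off)
        "$I2CSCAN" -w "$IIC_DEV" "$GPIO_ADDR" 0x03 0xdc &&  # set logic level = 1
        "$I2CSCAN" -w "$IIC_DEV" "$GPIO_ADDR" 0x07 0x9f     # set port to input
        ;;
    *)
        echo "usage: locator on|off" >&2
        return 2
        ;;
    esac
}
```

After sourcing this, `locator on` and `locator off` issue the same writes in the same order as the manual commands. Comparing the two pairs, only the top bit of each value changes (0x1f/0x9f and 0x5c/0xdc), which suggests that the locator is on a single pin of that port.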
These addresses are PCA9555 GPIOs and a driver for them should be straightforward. However, another driver or userland program that has hardware-specific information would be needed to handle this, along with keyswitch position, PSU status, etc. Collecting information about which values to report and set should be possible (if tedious) by reading the GPIO registers after each physical change (and also comparing with the ALOM output on machines that have one).
As an example of hardware-specific information, a simple awk script can change the envstat output:
taco# sh /tmp/sunenvstat
Model: SUNW,Sun-Fire-V240
                   Current  CritMax  WarnMax  WarnMin  CritMin  Unit
[adm1026hm0]
          F0.RS:      5720                                      RPM
          F1.RS:      5769                                      RPM
          F2.RS:      5921                                      RPM
          fan 3:         0                                      RPM
    MB.P0.F1.RS:     16463                                      RPM
    MB.P1.F0.RS:     17307                                      RPM
    MB.P0.F0.RS:     15697                                      RPM
    MB.P1.F1.RS:     16463                                      RPM
       internal:    25.000                                      degC
   MB.P0.T_CORE:    50.000                                      degC
   MB.P1.T_CORE:    41.000                                      degC
   MB.BAT.V_BAT:     2.906                                      V
   V3.3 standby:     3.348                                      V
      V3.3 main:     3.348                                      V
           V5.0:     4.995                                      V
   MB.P0.V_CORE:     1.488                                      V
           V+12:    11.750                                      V
           V-12:    -3.375                                      V
      MB.V_+1V5:     1.512                                      V
      MB.V_+2V5:     2.496                                      V
     MB.V_VCCTM:     2.543                                      V
  MB.V_GBE_CORE:     1.207                                      V
  MB.V_GBE_+2V5:     2.508                                      V
         V3.0 5:     0.000                                      V
       MB.V_VTT:     1.250                                      V
   MB.P1.V_CORE:     1.484                                      V
[lmtemp0]
       MB.T_ENC:    13.000                                      degC
so that it matches the names that the ALOM has for the sensors.
As another test, Michael Lorenz kindly checked that it was now possible to alter the limit values with the dbcool driver on his SB2500. He also discovered that the machine will power off if the CPU temperature limit is exceeded. Presumably, the THERM output of the ADM1031 is either monitored by the firmware or connected to the PSU.
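The sensor renaming shown in the sunenvstat output could be done with a filter along these lines. This is not the actual script: the driver default names on the left of the mapping ("fan 0" and so on) are guesses made for illustration, and a real version would need one mapping per machine model.

```shell
#!/bin/sh
# Sketch of an envstat name-rewriting filter (hypothetical mapping).
rename_sensors() {
    awk '
    BEGIN {
        # assumed driver default name -> ALOM name, for one model only
        map["fan 0"] = "F0.RS"
        map["fan 1"] = "F1.RS"
        map["fan 2"] = "F2.RS"
    }
    {
        # Sensor lines have the form "  name:  value ... unit".
        i = index($0, ":")
        if (i > 0) {
            name = substr($0, 1, i - 1)
            sub(/^[ \t]+/, "", name)
            if (name in map) {
                # Keep everything from the colon onwards, swap the name.
                $0 = "  " map[name] substr($0, i)
            }
        }
        print
    }'
}
```

It would be used as `envstat | rename_sensors`. Lines without a known name, such as the device headers in square brackets, pass through unchanged. The sketch does not try to preserve envstat's column alignment for the renamed lines.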