Having had my V240 for almost 2 years, and with access to a Blade 2500 (previously used for Solaris testing at work), I thought that I should look at the problems we were having with some devices on certain sparc64 machines. We'd narrowed the probable cause down to something interrupt-related, but the exact reason was unclear.

The affected machines were all UltraSPARC-IIIi based, and they had different peripherals from the previous models, for example Broadcom BCM5704 ethernet and Symbios 53c1020 SCSI. We initially assumed that the problem was in the bge driver, because we were netbooting the machines in order to install them, but it soon became obvious that every device used at boot time was affected. This led to strange workarounds, such as booting from one ethernet port and using a different one once the system was running, or plugging the keyboard and mouse into different USB ports after the kernel had taken over.

Having determined that the problem must be in our code, I looked at the UltraSPARC-IIIi support, which is handled by the schizo driver. There wasn't anything obvious there, so the next step was to compare it with the FreeBSD code and look for differences. Again, I didn't spot anything obvious, but that was probably due to the other differences between the two codebases. The next step was to read through the UltraSPARC IIIi Processor User's Manual in order to understand how interrupt routing works on the US-IIIi.

With a basic understanding of the interrupt routing, the next step was to add lots of debugging code to see what was happening when we set up the processor and while it was running. To do this, I added code to dump the register states and made it callable from ddb. This finally allowed me to spot the configuration step that was missing from our code. To quote from a mail that I sent to Martin Husemann, Michael Lorenz and Matthew Green at the time:
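To give an idea of what such a dump involves, here is a minimal user-space sketch that splits a raw interrupt mapping register value into its fields and prints them on one line. It is not the NetBSD debugging code: the field layout (valid bit 31, a 5-bit target ID at bit 26, interrupt group number at bits 10:6, interrupt number at bits 5:0) and every name in it are assumptions for illustration, so check the manual and the real headers before relying on them. In the kernel the values would be read from the bridge's registers and the dump hooked into ddb, which this sketch does not attempt.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical field layout for an interrupt mapping register.
 * These are assumptions for illustration, not the real NetBSD macros.
 */
#define SKETCH_INTMAP_V          (1ULL << 31)   /* mapping is valid */
#define SKETCH_INTMAP_TID_SHIFT  26             /* target CPU id */
#define SKETCH_INTMAP_TID_MASK   0x1fULL        /* assumed 5-bit target id */
#define SKETCH_INTMAP_IGN_SHIFT  6              /* interrupt group number */
#define SKETCH_INTMAP_IGN_MASK   0x1fULL
#define SKETCH_INTMAP_INO_MASK   0x3fULL        /* interrupt number */

struct intmap_fields {
	int      valid;
	unsigned tid;
	unsigned ign;
	unsigned ino;
};

/* Split a raw mapping-register value into its (assumed) fields. */
static struct intmap_fields
intmap_decode(uint64_t imap)
{
	struct intmap_fields f;

	f.valid = (imap & SKETCH_INTMAP_V) != 0;
	f.tid = (unsigned)((imap >> SKETCH_INTMAP_TID_SHIFT) &
	    SKETCH_INTMAP_TID_MASK);
	f.ign = (unsigned)((imap >> SKETCH_INTMAP_IGN_SHIFT) &
	    SKETCH_INTMAP_IGN_MASK);
	f.ino = (unsigned)(imap & SKETCH_INTMAP_INO_MASK);
	return f;
}

/* Print one register per line, in the spirit of a ddb-callable dump. */
static void
intmap_dump(const char *name, uint64_t imap)
{
	struct intmap_fields f = intmap_decode(imap);

	printf("%-8s 0x%016llx valid=%d tid=%u ign=0x%02x ino=0x%02x\n",
	    name, (unsigned long long)imap, f.valid, f.tid, f.ign, f.ino);
}
```

Under the assumed layout, intmap_dump("dev0", 0x84000720) reports valid=1, tid=1, ign=0x1c and ino=0x20; a mapping whose target ID doesn't match the CPU you expect stands out immediately in such a listing.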

"I've attached a patch that should get interrupts working on Tomatillo (e.g. V240, V440, SB2500). If you want just the quick fix, you only really need:

  +             imap |= (CPU_UPAID << INTMAP_TID_SHIFT);
    

as the rest is mostly debugging (e.g. it seemed a shame to discard the interrupt dump) plus cosmetics."

They all confirmed that the patch worked, so I was able to commit it and finally fix the longstanding problem.
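As we read it, the quoted one-liner fills in the target ID field of the interrupt mapping value with the ID of the running CPU (CPU_UPAID), so that the interrupt is actually delivered to that processor. The following is a minimal sketch of that step, not the committed code: the function name, the cpu_id parameter (standing in for CPU_UPAID) and the field layout (valid bit 31, a 5-bit target ID at bit 26) are hypothetical assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout, assumed for illustration only. */
#define SKETCH_INTMAP_V          (1ULL << 31)   /* mapping is valid */
#define SKETCH_INTMAP_TID_SHIFT  26             /* target CPU id */
#define SKETCH_INTMAP_TID_MASK   0x1fULL        /* assumed 5-bit target id */

/*
 * Return imap with its target-ID field pointing at cpu_id and the
 * valid bit set.  cpu_id stands in for the kernel's CPU_UPAID.
 */
static uint64_t
intmap_route(uint64_t imap, unsigned cpu_id)
{
	assert(cpu_id <= SKETCH_INTMAP_TID_MASK);

	/* Drop any stale target left in the field. */
	imap &= ~(SKETCH_INTMAP_TID_MASK << SKETCH_INTMAP_TID_SHIFT);
	/* The step the one-line fix adds: target the running CPU. */
	imap |= (uint64_t)cpu_id << SKETCH_INTMAP_TID_SHIFT;
	/* Enable the mapping. */
	imap |= SKETCH_INTMAP_V;
	return imap;
}
```

ORing the ID in, as the one-line fix does, gives the same result only when the target field is still zero; clearing the field first is a defensive choice made for this sketch, not something taken from the patch.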

