Data General CLARiiON Storage Systems

This page describes some of the things I’ve learned while bringing up one of these (rather nice so far) storage systems.

I have the 20-slot floor-mount model C2200D. As far as I can tell, this is identical to the rack-mount model, and in fact, if you remove the skins from the box, its rack ears are revealed. Nice design.

Specs by observation

I know of two architectures of storage processors (SPs) used in these units. The older variety is built around an AMD 29000 processor, while the newer unit is reportedly based on a PowerPC 603. I own the former, so I will only be able to describe the AMD-based SP in detail. One external difference between the two is that the AMD-based SP has an enable/disable switch accessible from the back of the system, near the "Service" and "Ready" LEDs, while the PowerPC-based SP does not.

The 20-slot storage system has five internal SCSI busses, designated "A" through "E." For each bus, there is an NCR 53C710 chip on the SP. There is one external SCSI bus per SP, which is the data interface to the host computer. This is a differential (wide?) bus, using an NCR 53C720 chip on the SP for its interface. Some of these chips bear a "TolerANT" brand logo (NCR's name for its active-negation SCSI driver technology), which probably has no functional significance here.

There are four groups (0 through 3) of five drives each in the 20-slot cabinet. In any given group, each of the five drives sits on a different SCSI bus. This seems quite reasonable; one five-disk RAID 5 array, sitting, say, in group 0, would have the benefit of five SCSI busses, one per drive, through which to push SCSI commands and data. This has got to be about optimal for performance.
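To make the layout concrete, here’s a little Python sketch of the slot-to-bus relationship (the names and representation are mine, not anything DG documents):

```python
# Hypothetical model of the 20-slot cabinet's slot layout, based on the
# observations above: a slot name combines a SCSI bus letter (A-E) with a
# group number (0-3), and each drive in a group sits on its own bus.

BUSSES = "ABCDE"          # five internal SCSI busses
GROUPS = range(4)         # four groups of five drives each

def slot_name(bus, group):
    """Return the slot designation, e.g. bus 'A', group 0 -> 'A0'."""
    return f"{bus}{group}"

def bus_for_slot(slot):
    """The internal SCSI bus a slot sits on is just its letter."""
    return slot[0]

# A five-disk RAID 5 array built from one group gets one bus per drive:
group0 = [slot_name(b, 0) for b in BUSSES]   # ['A0', 'B0', 'C0', 'D0', 'E0']
assert len({bus_for_slot(s) for s in group0}) == 5  # no two drives share a bus
```

This is just a restatement of the layout, but it makes the performance point obvious: any one-group array spreads its five drives across all five busses.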

The SP units designed for the 20-slot cabinets have six total SCSI chips -- one each for the five internal busses, and one for the external bus. The SP units designed for the 10-slot cabinet have only three SCSI chips: two 53C710 chips and one 53C720. This leads me to believe that there are only two internal busses in the 10-slot cabinets, probably one for the left (or top) group of five drives and one for the right (or bottom). This is consistent with DG’s marketing of the 10-slot cabinet as a lower-end system. I am sure this has an impact on the optimal way to lay out "physical units" across disk drives.

There are four SIMM slots on each SP. Both SIMM-1 and SIMM-3 are labelled "Install first," so I assume the SIMMs must be installed in pairs. My SPs came with two SIMMs each, for a total of 8 MB per SP. The SIMMs are rated at 70 ns and carry 12 chips apiece. I don’t know whether the SPs can use anything other than 4 MB parity SIMMs.

The drive canisters contain a bit of circuitry on them. This may be able to control power to the drive, but there’s certainly more to it than that. Just what they can do, I don’t know.

A drive canister is designed to work with a particular model of narrow SCSI disk drive. The only difference I am aware of between the few canister models I’ve seen is in the cable that connects the drive’s SCSI ID select, spindle sync pins, activity LED pins, and possibly other goodies to the canister’s internal circuit board. I have mapped out several of the pins on the canister’s connector. I’ll draw that up and add it here some time.

The Battery Backup Unit (BBU) contains two packs of 6 lead-acid "D" cells each. I don’t know whether these are used in series or parallel, nor do I know what voltages they provide to the storage system. Presumably, they at least provide both 5V and 12V for the disk drives. There is quite a bit of circuitry and logic inside the BBU. More than just power conversion, certainly.

The batteries in my BBU are dead. I will be replacing them soon.

As I understand it, the BBU is used to power the system long enough to flush the write caches to disk in the event of a power failure. For this reason, caching is not available if the BBU is not present or has failed. (I’m not sure why, but read caching is disabled, too.) Since I have seen mention of an "internal cache vault," I imagine that this means that the system doesn’t necessarily attempt to flush all the write cache’s data to the correct places on disk, but instead merely dumps it as fast as it can. In that case, there would be some cleanup work to do on the next startup.

This is especially reasonable considering that any given bit of data might need to go to several places (mirrored units), and there may be additional information needed to synchronize the system (RAID 5 units will need to read data from each drive in the physical unit to compute parity and then write the parity for any affected stripe). Of course, if the system wrote the dirty cache data to a single location, problems would occur if that drive failed as a result of the power failure. I don’t really know much about this aspect of the implementation.

Every unit in the system plugs into the backplane circuit board. This large board is directly visible if you remove some drive canisters and look inside. The drive canisters plug into slots on one side of the board, and the SPs plug into the other side. Also connected to the other side are the cables to the console ports, SCSI ports, power supply units (VSCs), and the BBU. The SCSI ports are wired correctly. That is, for each port ("A" and "B"), a ribbon cable is wired from the "SCSI In" connector on the back of the system, through the connector on the backplane, and back out to the "SCSI Out" on the back of the system. 68-pin connectors and ribbon cable are used throughout.

There is no active circuitry whatsoever on the backplane. This means that it is extremely unlikely to fail, and all the failure-prone components are located in easily-removable units. There is one curious card tucked way inside the system, though. It’s plugged into the back side of the backplane board (the side away from the drives), in a card-edge slot identical to the drive canister slots. It contains circuitry similar to that found in a drive canister, but there are no connections for a drive. There is a designation printed on the card that was meaningless to me at the time, and which I failed to record. The only way to reach this card is to remove the skins (assuming a floor-standing model) and the right-side steel panel. Look for the card at the bottom of the backplane, near the edge facing you.

Getting the system running

You cannot bring the storage system up with no drives, or with drives that are not suitably prepared. Not realizing this, I tried, and for a while I was concerned when it didn’t work. But I learned some things in the process. I installed two fresh disks (disks which had never seen a CLARiiON storage system before) into a couple of empty canisters. (These canisters turn up occasionally in the surplus market.)

I connected a DEC vt320 terminal to the "Console A" DB25 connector. It took me several tries to discover that the port was set to 8E1 (even parity), rather than the 8N1 I had expected. This is configurable once the unit is running, though, and I have since set mine to 8N1.

I restarted the system (by power-cycling it) and watched the firmware version messages come up, then a line containing the capital letters "A" through "W," and finally the disappointing message "Storage System Failure ... Internal Code 0001007f". Ouch.

After a lot of experimentation, it turns out that the system really wants a disk unit in slot A0. (Actually, in any slot in group 0 -- more on that in a moment.) Group 0 (containing slots A0 through E0) is not, as I had first assumed, the top-left group when the storage system is in its floor-standing, vertical configuration. Group 0 is the top-right group.

Not just any old disk will work, though. A fresh drive, installed in a canister and inserted into slot A0, merely changes the "internal code" (after restarting the system) to 0001007e. Not much better. There is something special about the disk that must go in slot A0.

According to an OEM guide for these systems, drives A0, B0, C0, D0, E0 and A3 "may store the licensed internal code and/or serve as the storage-system internal cache vault." Because of this, those drives cannot be allocated as hot spares. Hmmm. I suspect that the storage system loads the "licensed internal code," which I take to mean real-time operating system, from disk A0. I don’t see any Flash on the SP board (although there is a Dallas DS1248Y-150, presumably for storing serial port configuration and similar information), so this seems entirely possible.

If you do not have a drive specially prepared for booting the storage system, I wonder how you would go about creating one. Even if you were able to purchase the "licensed internal code" from Data General (or one of the dozen or so vendors that re-badge these boxes), I don’t know how you’d go about getting that code onto a disk without removing the drive from its canister, connecting it directly to a computer, and installing the image.

There must be some normal way to prepare a disk for use as a startup disk. I don’t know that way. So I inserted an appropriate, prepared drive into slot A0, and powered up the cabinet. The system booted up just fine, getting all the way from "A" to "Z," and then went on to display the "Presentation Utility".

But just bringing the system up this way, with my two new drives inserted in various slots in the system, didn’t seem to do anything useful to the new drives such that one of them could eventually be used in slot A0. Even binding them into various physical units didn’t seem to prepare them for slot-A0 use.

What I finally discovered does work is the following:

  1. Insert the working disk in slot A0
  2. Insert another disk into some other slot (I used B0)
  3. Start up the storage system
  4. Bind the two disks into a mirrored (RAID 1) unit
  5. Wait for binding to complete
  6. Remove the drive from slot A0
  7. Insert a fresh drive in slot A0
  8. Wait for rebuilding to complete

At this point, the drive now in slot A0 will be suitable for starting the system up.
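Here’s a toy Python model of what seems to be going on in that procedure. The behavior it encodes -- that binding a mirror copies the licensed internal code to the partner drive, and that a rebuild does the same -- is my inference from these experiments, not documented behavior:

```python
# Toy model of the boot-disk cloning procedure. "has_os" stands in for
# "holds the licensed internal code"; everything here is inferred, not
# documented.

def make_drive(has_os=False):
    return {"has_os": has_os}

def bind_mirror(primary, secondary):
    """Binding a RAID 1 pair appears to replicate the OS image."""
    if primary["has_os"]:
        secondary["has_os"] = True

def rebuild(surviving, replacement):
    """Rebuilding onto a fresh drive copies everything, OS image included."""
    replacement["has_os"] = surviving["has_os"]

# Steps 1-8: one working (OS-bearing) drive in A0, one fresh drive in B0.
a0, b0 = make_drive(has_os=True), make_drive()
bind_mirror(a0, b0)     # steps 4-5: bind the mirror and wait
a0 = make_drive()       # steps 6-7: swap a fresh drive into slot A0
rebuild(b0, a0)         # step 8: wait for the rebuild to complete
assert a0["has_os"]     # the new A0 drive can now boot the system
```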

The binding step appears to be necessary. It also appears to be sufficient -- in a subsequent test, I tried the following:

  1. Insert a working disk in slot A0
  2. Insert a fresh disk into another bootable slot (I used B0)
  3. Start up the storage system
  4. Bind the two disks into a mirrored (RAID 1) unit
  5. Wait for binding to complete

At this point, I was able to turn the storage system off, remove the drive in slot A0, and successfully start the system up using only the drive in slot B0.

One method I tried which failed was to start the system up with a working drive in slot A0 and a fresh drive in slot B0, and then pull drive A0 from the running system. Restarting the system at this point did not work. Apparently, without binding the drives into a mirrored unit, the operating system isn’t copied.

Summary of the startup requirements

The storage system seems to need to load its operating system from disk when it starts up. It first looks in slot A0 for a drive containing an appropriate image. If it does not find a drive in slot A0, it tries B0. It continues the search in all of A0 through E0 and A3, and probably in that order. If it finds a drive in one of those slots that does not contain the appropriate image, it displays a failure information code "0001007e." If it finds no drive in one of those slots, it displays a failure information code "0001007f."
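In Python-ish terms, the search seems to behave roughly like this (the slot order and failure codes come from my observations; the exact firmware logic is a guess):

```python
# Sketch of the inferred boot-drive search. Slots are tried in order;
# whether a present-but-imageless drive really stops the search, rather
# than just being skipped, is an assumption on my part.

BOOT_SLOTS = ["A0", "B0", "C0", "D0", "E0", "A3"]

def find_boot_image(installed):
    """installed maps slot name -> True if that drive holds the OS image.

    Returns the slot to boot from, or a failure information code.
    """
    for slot in BOOT_SLOTS:
        if slot not in installed:
            continue                 # empty slot: keep searching
        if installed[slot]:
            return slot              # found the licensed internal code
        return "0001007e"            # drive present, but no image
    return "0001007f"                # no drive in any bootable slot

assert find_boot_image({}) == "0001007f"
assert find_boot_image({"A0": False}) == "0001007e"
assert find_boot_image({"B0": True}) == "B0"
```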

If you bind the boot drive together with another drive in one of the special OS-loading-capable slots into a RAID 1 mirrored physical unit, the operating system is mirrored to the other drive. This is convenient: if either of the two drives fails, the other will still be able to bring the system up should a restart be necessary. I wouldn’t imagine that the OS image would be copied to a drive in a slot outside the special positions. I imagine that other types of bound physical units might also result in duplication of the operating system code. I have tested neither of the latter two hypotheses, though.

More observations

When a fresh disk is first inserted into any slot in a running system, that disk will be "formatted." I am not sure whether this involves a low-level format, but since it is done exactly once, it certainly involves writing some data to the disk -- presumably setting up on-disk structures for maintaining log entries, the operating system storage area, configuration information, and user data.

The binding configuration is stored on the boot drive. This configuration is mirrored to the redundant disk, at least in the case I tested above, using A0 and B0 as a mirrored pair. Again, this is convenient: even after pulling the drive in slot A0, a power failure (or any other cause for a restart) leaves you able not only to start up from the drive in slot B0, but also to continue using the mirrored pair (in degraded mode, of course, since you’re missing a disk).

Links

The following resources were useful in preparing this document, as well as for learning how to use the system in the first place.