IBM Advanced Computing Systems -- Draft Legacy section
Mark Smotherman
last update February 24, 2010
under construction
(This reflects my current understanding.
I very much appreciate corrections and new material.)
Legacy
ACS never made it out of the laboratory; I suppose it was too big and
expensive, but for me it was probably the most exciting project I have
ever been involved in. In reflecting on this, I believe that what made
it particularly exciting was that we were a small team, mostly hand-picked
by Jack Bertram, and we pioneered every aspect of the project....
My only regret about ACS is that unlike compiler ideas, we did not take the
time to publish our ideas on hardware so others could build on them.
-- John Cocke, Turing Award Lecture (CACM, March 1988)
A partial list of the impact of the ACS efforts would include:
- compiler technology
- identification and separation of machine-independent versus
machine-dependent optimizations
- instruction scheduling
- Fran Allen states in a 1981 IBM JRD article: "Out of this project came
... the foundations of the theory of program analysis and optimization."
- instruction set
- no direct inst. set descendant for ACS-1
- ideas regarding a "highly compilable" instruction set
had influence on IBM 801 and RISC through John Cocke
- multiple condition codes in IBM POWER architecture
- some influence on S/370 inst. set by ACS-360, such as SIOF
- microarchitecture
- spread of superscalar concept
- branch target buffer
- direct descendant for ACS-360 appears to be ES/9000 Model 520
- circuit/packaging technology and circuit design
- immediate influence on ECL work at Amdahl 470 via engineers who
left IBM for MASCOR and then went to Amdahl when MASCOR failed
- See R. Beall, "Packaging for a super computer,"
Proc. IEEE Intercon, March 1974, pp. 1-9.
- See US Patents
- 3,808,475, Buelow and Zasio, LSI chip construction and method, 1974
- 3,981,070, Buelow and Zasio, LSI chip construction and method, 1976
- 4,016,463, Beall, Buelow, and Zasio, High density multilayer printed
circuit card assembly and method, 1977
- 4,115,837, Beall and Zasio, LSI chip package and method, 1978
- 4,396,971, Beall and Zasio, LSI chip package and method, 1983
- influence on VLSI design through Lynn Conway
text below needs to be revised
Further work was done at IBM in the 1970s on superscalar S/370s,
but few, if any, public documents exist.
The exception is the high-end ES/9000 Model 520 processor - a two-issue
superscalar, designed during the 1980s and described by John Liptay in
the July 1992 issue of the IBM Journal of Research and Development.
[See Robert Tomasulo,
"Out-of-order processing - History of the IBM System/360
Model 91" (video, 40:38 in length).
At 5:50 and again at 9:50 he describes a multiple-instruction-issue
IBM mainframe that was complete in its logical design circa 1972 but
not built because of expense.]
In 1989, 20 years after the cancellation of the ACS, IBM introduced the
four-way superscalar RS/6000. John Cocke was a key influence on the design.
Even today, three decades after the ACS, state-of-the-art superscalar
processors are just catching up to the ACS: the
DEC 21264 fetches up to four instructions per cycle and can issue
up to six per cycle; the
IBM POWER3
also fetches up to four instructions per cycle and can execute up to
eight per cycle.
Here are some contributions of ACS to IBM in particular and to the
field of computer design and engineering in general:
- The concept of a branch target buffer.
Sussenguth's Prefetch Sequence Control registers
(US 3,559,183 - 1971)
(A branch target cache, BTC, was later independently invented in 1987
for the AMD 29K at the suggestion of Phil Frieden.)
- An instruction window with multiple instruction decode and entry.
The dynamic instruction scheduling technique proposed in ACS, later
in the project called the "contender stack," was documented in the
1966 internal ACS report, which was later released as a 1969 Watson
Lab tech report available to the public; it was also mentioned in
Herb Schorr's 1968 internal publication, which was later released
as a 1971 external conference publication.
I believe patent attorneys did not adequately understand the
invention, and thus dismissed the DIS disclosures in 1966 as too
similar to the CDC 6600. [See Brian Randell's
"Reminiscences of Project Y and the ACS."
The CDC 6600 used a central window structure called the
"scoreboard", but it added at most one instruction per cycle to
this structure.]
The DIS scheme from ACS was merged with a register renaming
scheme and issued as US Patent 3,718,912, Instruction execution unit,
Leo Hasbrouck, Bill Madden, Robert Rew, Ed Sussenguth, and John
Wierzbicki, filed December 1970 and issued February 1973.
All listed inventors were ACS project members. (Lynn Conway had
left the company by that point.)
The DIS scheme shares some aspects with a central window proposal
in the 1980s by H.C. Torng of Cornell, known as the "dispatch stack"
[see IEEE Trans. on Computers, C-35, 9 (Sept. 1986) pp. 815-828].
It is ironic that Torng had IBM sponsorship for his research, but
none of his sponsors at IBM knew of the ACS work. Rather, Torng
remembers being asked repeatedly during presentations to IBM groups
at Yorktown, Poughkeepsie, and Kingston, as to how his work related
to the Tomasulo method of the Model 91 [personal communication].
Additionally, Harry Dwyer was an engineer from IBM Endicott who
went to Cornell for his Ph.D. under Professor Torng. Dwyer worked
on and later patented improvements to the dispatch stack. Dwyer
never encountered information about ACS during his work at IBM
Endicott or at Cornell, or even later at IBM in Austin [personal
communication].
(Torng's work differs from the ACS's use of scheduling matrices
since it focuses on maintaining dependency counts between
instructions in the instruction stack and issuing instructions
that have counts of zero. Torng's patent, "Instruction issuing
mechanism for processors with multiple functional units,"
US 4,807,115 (1989), includes a reference to the ACS patent of
Hasbrouck, et al., Instruction Execution Unit, US 3,718,912,
perhaps added during a prior art search.)
- Register renaming using physical registers.
The Model 91 was famous for introducing a system of substituting
tags for operand references. These tags represented not separate
physical registers, but results being produced from various functional
units. The ACS-1 introduced using physical registers in renaming in
the form of backup registers, one per arithmetic register. This
concept of backup registers found in ACS-1 was used in the 1980s on
the IBM Cheetah superscalar processor design. This led to the use of
a pool of extra physical registers not addressable by the instruction
set to rename FP registers on loads (this idea is found in the
Hasbrouck, et al., ACS-360 patent, US 3,718,912 - 1973). The pooled
approach was used in the America processor, which succeeded the Cheetah,
and in the RS/6000. While backup registers allow the hardware to unroll
tight loops by a factor of two, the pooled approach is able to unroll
tight loops to an even greater degree.
- Pioneering RAS features such as use of a service processor,
scan in and scan out, and diagnostics that attempted to
isolate failures to a FRU (Field Replaceable Unit).
Can the scan-in, scan-out contribution be pinned down?
LSSD is attributed to Ed Eichelberger; see US patent 3,761,695,
filed Oct 1972, and issued Sep 1973. What did ACS do?
- SIOF (Start I/O Fast Release). This was included in the S/370
architecture ... [description]
- Gordon Bell's assessment of the impact of the CDC 6600 on IBM: the ACS team led
to John Cocke's compiler work and then RISC; establishing and then
cancelling ACS led to more computer engineers in Silicon Valley and
the formation of new companies, esp. Amdahl
- ... more to be added ... trails of knowledge transfer ...
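The dependency-count idea mentioned above in connection with Torng's dispatch stack can be illustrated with a small sketch. This is only a simplified illustration, not a reconstruction of Torng's mechanism or of the ACS scheduling matrices. The `Instr` record, the `issue_order` function, the `width` limit, the one-cycle result latency, and the treatment of every register conflict (read-after-write, write-after-read, write-after-write) as one dependency are all hypothetical choices made for the example. On each cycle it counts, for every waiting instruction, how many earlier waiting instructions it conflicts with, and issues those whose count is zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: one destination, some sources."""
    name: str
    dest: str
    srcs: tuple


def conflicts(earlier, later):
    """True if `later` must wait for `earlier` (any register conflict)."""
    raw = earlier.dest in later.srcs   # later reads what earlier writes
    war = later.dest in earlier.srcs   # later overwrites what earlier reads
    waw = earlier.dest == later.dest   # both write the same register
    return raw or war or waw


def issue_order(window, width=2):
    """Return, cycle by cycle, the names of the instructions issued.

    Assumption: an issued instruction's result is usable the next cycle.
    """
    waiting = list(window)
    cycles = []
    while waiting:
        # Dependency count: conflicts with earlier, still-waiting instructions.
        counts = [
            sum(conflicts(waiting[j], waiting[i]) for j in range(i))
            for i in range(len(waiting))
        ]
        # Issue up to `width` instructions whose count is zero, oldest first.
        ready = [i for i, c in enumerate(counts) if c == 0][:width]
        cycles.append([waiting[i].name for i in ready])
        waiting = [ins for i, ins in enumerate(waiting) if i not in ready]
    return cycles


# Hypothetical five-instruction window in program order.
example_window = [
    Instr("load_a", "r1", ("r9",)),
    Instr("load_b", "r2", ("r9",)),
    Instr("add", "r3", ("r1", "r2")),
    Instr("load_c", "r4", ("r9",)),
    Instr("mul", "r5", ("r3", "r4")),
]
```

With `width=3`, `load_c` issues in the first cycle, ahead of the older `add` that is still waiting on the two earlier loads. The oldest waiting instruction always has a count of zero, so the loop makes progress on every cycle.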
US Patents associated with ACS techniques
- 3,405,323 - Rohinton Surty and Jak Taranto -
Apparatus for cooling electronic components.
Filed March 1967; issued October 1968.
- 3,460,010 - Bob Domenico and Bob Lloyd -
Thin film decoupling capacitor incorporated in an integrated
circuit chip and process for making same.
Filed May 1968; issued August 1969.
- 3,465,435 - James Steranko - Method of forming an interconnecting
multilayer circuitry. Filed May 1967; issued September 1969.
- 3,479,233 - Bob Lloyd - Method for simultaneously forming a
buried layer and surface connection in semiconductor devices.
Filed January 1967; issued November 1969.
- 3,500,192 - Charles Donaher and James Steranko - Oscillatory
probe system for contacting and testing a circuit point through
a high density of wires. Filed January 1967; issued March 1970.
- 3,512,582 - Richard Chu, Un-Pah Hwang, and John Seely -
Immersion cooling system for modularly packaged components.
Filed July 1968; issued May 1970.
- 3,516,156 - James Steranko - Circuit package assembly process.
Filed December 1967; issued June 1970.
- 3,524,497 - Richard Chu, Martin Cohen, and Omkarnath Gupta -
Heat transfer in a liquid cooling system.
Filed April 1968; issued August 1970.
- 3,541,317 - Charles Freiman -
Parallel addition and division of two numbers by a fixed divisor.
Filed August 1967; issued November 1970.
- 3,559,183 - Ed Sussenguth - Instruction sequence control.
Filed February 1968; issued January 1971.
- 3,560,277 - Bob Lloyd, Stanley Davis, and Charles Frank Myers -
Process for making semiconductor bodies
having power connections internal thereto.
Filed January 1968; issued February 1971.
(corrected to show co-assignees of IBM and Motorola)
See also patent application 697,732 - Fred Reid - Power connections
in integrated circuit chip - filed January 1968, and CA925222,
published March 1974.
- 3,576,969 - Rohinton Surty and Conrad Trollmann (IBM Germany) -
Solder reflow device.
Filed September 1969; issued May 1971.
- 3,577,189 -
John Cocke, Brian Randell, Herb Schorr, and Ed Sussenguth -
Apparatus and method in a digital computer for allowing
improved program branching with branch anticipation, reduction of the
number of branches, and reduction of branch delays.
Filed January 1969; issued May 1971.
- 3,577,190 - John Cocke, Phil Dauber, Herb Schorr, and Ed Sussenguth -
Apparatus in a digital computer for allowing the skipping
of predetermined instructions in a sequence of instructions, in response
to the occurrence of certain conditions.
Filed June 1968; issued May 1971.
- 3,586,101 - Richard Chu, Omkarnath Gupta, Un-Pah Hwang,
Kevin Moran, and Robert Simons - Cooling system for
data processing equipment.
Filed December 1969; issued June 1971.
- 3,591,787 - Charles Freiman and Chung Wang -
Division system and method.
Filed January 1968; issued July 1971.
Cites Don Senzig's article on "High-speed division algorithm," in
IBM Technical Disclosure Bulletin, October 1967.
- 3,593,299 - Graham Driscoll and Ed Sussenguth -
Input-output control system for data processing apparatus.
Filed July 1967; issued July 1971.
- 3,602,420 - Richard Wilkinson - Ultrasonic bonding device.
Filed February 1970; issued August 1971.
- 3,609,991 - Richard Chu and Un-Pah Hwang -
Cooling system having thermally induced circulation.
Filed October 1969; issued October 1971.
- 3,638,081 - Bob Lloyd -
Integrated circuit having lightly doped epitaxial collector
layer surrounding base and emitter elements and heavily doped
buried collector layer in contact with the base element.
Filed August 1968; issued January 1972.
- 3,639,218 - Leo Missel - Shelf life improvement of electroplated
solder. Filed October 1969; issued February 1972.
- 3,653,572 - Sherman Dushkes and Conrad Trollmann (IBM Germany) -
Hot gas solder removal. Filed September 1969; issued April 1972.
- 3,654,530 - Bob Lloyd - Integrated clamping circuit.
Filed June 1970 (continued from August 1968); issued April 1972.
- 3,670,944 - Sherman Dushkes and Rohinton Surty -
Miniature ultrasonic bonding device.
Filed February 1970; issued June 1972.
- 3,675,215 - Richard Arnold, Phil Dauber, and Ed Sussenguth -
Pseudo-random code implemented variable block-size storage
mapping device and method.
Filed June 1970; issued July 1972.
- 3,675,217 - Phil Dauber, Russ Robelen, and John Wierzbiski [sic] -
Sequence interlocking and priority apparatus.
Filed December 1969; issued July 1972.
- 3,670,307 - Richard Arnold, Phil Dauber, Charles Freiman,
Russ Robelen, and John Wierzbicki -
Interstorage transfer mechanism.
Filed December 1969; issued June 1972.
- 3,670,309 - Gene Amdahl, Richard Arnold, Phil Dauber,
Charles Freiman, Russ Robelen, Herb Schorr, and John Wierzbicki -
Storage control system.
Filed December 1969; issued June 1972.
- 3,718,912 - Leo Hasbrouck, Bill Madden, Robert Rew, Ed Sussenguth,
and John Wierzbicki - Instruction execution unit.
Filed December 1970; issued February 1973.
- 3,774,677 - Vincent Antonetti, Omkarnath Gupta, and Kevin Moran -
Cooling system providing spray type condensation.
Filed February 1971 (but the patent specification states that it is
copending with the application that resulted in the '101 patent);
issued November 1973.
IBM Technical Disclosures associated with ACS techniques
- C.V. Freiman, H. Schorr, and R. Swartout,
"Preferential Data Paths," vol. 8, no. 12, May 1966, pp. 1752-1753.
One of the features of many large, scientific computers is their use
of a bank of fast local registers for holding operands and intermediate
results. These machines also employ several arithmetic units. It is
common to use a full, minor machine cycle to transfer data from a
local register to an arithmetic unit. Another minor cycle is used to
return the answer to a local register. In case the arithmetic unit is
a floating point adder, this data transfer time is equal to the time
required for the addition itself.
The basic reason these transfers require such a large amount of time
is that the busses used service a large number of sources and destinations.
The system provides for direct paths between specified registers R0...R15
and specified arithmetic units AU0 ... AU15 in addition to the general busses
used. This saves the two minor cycles previously required to transfer data
back and forth between the local registers and the arithmetic units. In many
cases an arithmetic operation can be executed in one minor cycle instead of
three. During compilation, operands would be assigned registers according to
the functional unit to be employed in executing the operation specified
whenever possible. Similarly, machine control would choose the appropriate
arithmetic unit if a choice exists when issuing instructions.
- J. Cocke and H. Schorr, "Memory Queueing System," November 1966, 3pp.
- J.J. Steranko, "High density bonding tweezers," vol. 10, no. 5,
October 1967, pp. 630-631.
- A. Carroll, C.J. Donaher, F.A. Reid, and J.J. Steranko,
"Single chip carrier package," vol. 12, no. 4, September 1969,
p. 538.
- W.R. Arnold and G.B. Cherniack, "Integrated circuit component,"
vol. 12, no. 6, November 1969, pp. 793-794.
- "System for interlocking between asynchronously operating indexing
and arithmetic units," vol. 14, no. 2, July 1971
(two entries with the same title but with different authors;
the first entry seems to be a correction/extension of
the second entry to correctly handle two loads returning
values to a register in the A unit out of their normal
program order)
- M.E. Homan and E.H. Sussenguth, pp. 656-659.
- C.V. Freiman, M.E. Homan, G.T. Paul, and C.C. Wang, pp. 660-662.
- B.O. Beebe, J. Cocke, J.G. Earle, C.R. Holleran,
M.E. Homan, R.J. Robelen, P. Shivdasani, and E.H. Sussenguth,
"Instruction sequencing control", vol. 14, no. 12,
May 1972, pp. 3599-3611.
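The cycle accounting in the "Preferential Data Paths" disclosure listed above (one minor cycle to move operands to an arithmetic unit, one to execute, and one to return the result) can be made concrete with a small sketch. Everything named here is hypothetical: the `DIRECT_PATHS` wiring table, the two-unit configuration, the one-cycle execution time, and the rule that a transfer cycle is saved only when all operands (or the destination) are wired directly to the chosen unit are illustrative assumptions, not details taken from the disclosure.

```python
# Hypothetical wiring: registers that have a direct path to each arithmetic unit.
DIRECT_PATHS = {
    "AU0": {"R0", "R1"},
    "AU1": {"R2", "R3"},
}


def minor_cycles(unit, srcs, dest):
    """Minor cycles for one operation under the assumptions stated above."""
    wired = DIRECT_PATHS.get(unit, set())
    cycles = 1  # the arithmetic operation itself (assumed one minor cycle)
    if not set(srcs) <= wired:
        cycles += 1  # operands must travel over the general bus
    if dest not in wired:
        cycles += 1  # result must return over the general bus
    return cycles


def choose_unit(units, srcs, dest):
    """Pick the unit with the fewest minor cycles (first one wins ties)."""
    return min(units, key=lambda u: minor_cycles(u, srcs, dest))
```

For example, `minor_cycles("AU0", ["R0", "R1"], "R0")` gives one minor cycle, while the same operation on operands held in `R2` and `R3` gives three. The `choose_unit` helper stands in for the machine control choosing an appropriate unit at issue time; the disclosure's other half, register assignment at compile time, would be the same search run over registers rather than units.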