Bottlenecks in information processing

I have recently finished reading an interesting novel. It is a thriller based around the concepts of cost accounting. No, I am not joking. It really is a thriller - a novel - and the main topic really is cost accounting.

The book I'm referring to is called The Goal[1] by Eliyahu M. Goldratt. It uses a passable, attention-holding domestic story line as a hook on which to hang a very interesting exploration of manufacturing processes and how best to manage them for financial gain.

As a software engineer, I found the book fascinating from two very different angles. First, it is interesting to think about how information technology can best support the process of physical goods manufacturing through areas such as robotic automation and telemetry for decision support. Second, it is interesting to think of software systems as examples of manufacturing systems in which the raw material is data and the "product" is information.

Goldratt's book brilliantly illustrates how the behavior of bottlenecks in a manufacturing process impacts every other part of the process in a fundamental way. You need to be intimately aware of all aspects of your bottlenecks, as the health of your entire operation depends on them.

Reading about manufacturing bottlenecks in the book caused my mind to wander to the software-system-as-manufacturing-process analogy and to ask "what are the bottlenecks in software applications?"

In any software system with a lot of data to process, the obvious target for attention as a possible bottleneck is CPU/RAM. After all, this is the part of the assembly line through which all data must pass at some stage. From there, it is a short step to the follow-on conclusion that the speed of the CPU/RAM and, by extension, the efficiency of the processing algorithms executed on that CPU/RAM combination make up the core of the bottleneck.

Before we pat ourselves on the back and declare the bottleneck found, let us switch back to physical manufacturing for a quick reality check. We have machines - computers - that are pretty cheap in comparison to the cost of most manufacturing equipment. We have lots and lots of data to process with these computers - typically many orders of magnitude more than one machine can process at any one time.

We could either optimize every last scintilla of performance out of one of those machines or we could get lots of them working on the data in parallel. The former route costs us lots of time and money in terms of labor costs (developers) and capital costs for a small number of top-of-the-range computers. Also, the outcome of the investment in terms of improved throughput is uncertain. The latter route - lots and lots of cheap "throwaway" machines - will cost us a fixed amount of money (low labor costs as we are not optimizing any algorithms) and we can accurately measure the improvements in throughput we expect to see.

Looking at the problem this way, as a form of manufacturing, it is pretty much a no-brainer: splitting the inventory into chunks and getting cheap machines to process the stuff in parallel is compelling. Common sense, right?
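
To make the idea concrete, here is a minimal sketch in Python of what "splitting the inventory into chunks" might look like. It is an illustration of the shape of the approach, not a recipe from Goldratt's book: the record data, the batch size, and the process_record function are all invented for the example, and a pool of local worker processes stands in for a rack of cheap machines.

    # A sketch of the "many cheap machines" idea, scaled down to one machine:
    # the inventory (raw data) is split into batches and each batch is handed
    # to a separate worker process. On a cluster, the Pool would be replaced
    # by jobs dispatched to commodity boxes, but the shape is the same.
    # The data and process_record() are purely illustrative.

    from multiprocessing import Pool

    def process_record(record):
        # Stand-in for the real "machining" step: turn raw data into information.
        return record.upper()

    def process_batch(batch):
        # Each worker processes one batch independently - no shared state,
        # so adding workers (or machines) scales throughput almost linearly.
        return [process_record(r) for r in batch]

    def split_into_batches(data, batch_size):
        # Cut the inventory into fixed-size chunks.
        return [data[i:i + batch_size] for i in range(0, len(data), batch_size)]

    if __name__ == "__main__":
        raw_data = ["widget-%d" % n for n in range(1000)]   # the raw material
        batches = split_into_batches(raw_data, 100)

        with Pool(processes=4) as pool:                     # four "cheap machines"
            results = pool.map(process_batch, batches)

        finished_goods = [item for batch in results for item in batch]
        print("Processed %d records in %d batches" % (len(finished_goods), len(batches)))

Nothing clever happens inside process_record; any throughput gain comes entirely from how the work is divided up, which is exactly the point of the manufacturing analogy.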

If so, how come this approach is so uncommon in application software?

Another book I read recently provides clues. Here is I.L. Auerbach, quoted from a paper he wrote in 1970[2]:

"The problem [in information systems] has been compounded by laying too

much stress on what poses for efficiency as a design criterion; namely,

speed of computation..."

In the same paper, he says:

"Too frequently, too many designers approach too many problems as

exercises requiring original creativity. The predictable result is that

the system built is unique and is (a) not adaptable to different problem

domains, (b) not easily maintained, and (c) not economic."

There is food for thought there. In the thirty-four years since Auerbach wrote that paper, very little has changed. We still think the cure for the bottleneck of software lies in speeding up the computation part and using the creativity of expensive engineers to get it.

Maybe we should turn our gaze away from CPU/RAM and away from hyper-efficient algorithms in our search for the bottlenecks. Maybe the bottlenecks lie elsewhere? If we conceptualize the problem as a true manufacturing process, then splitting the processing into batches and doing the work in parallel on cheap machines jumps off the page as the obvious thing to do.

Alas, we don't do it. We software engineers just do not conceptualize what we do in manufacturing terms.

Perhaps the real bottleneck lies between our ears?

[1] "The Goal"

http://www.starvingmind.net/detail/0884270610/The_Goal_A_Process_of_Ongoing_Improvement.html

[2] "The Skyline of Information Processing"

http://www.bookfinder.com/dir/i/The_Skyline_of_Information_Processing/0444104542/

