Paper Review of "Hints On Computer System Design"
The first paper I read (although the second paper I'll review) in my personal quest towards becoming a software engineer was Butler Lampson's Hints for Computer System Design, 1983. It's an excellent paper, which has given me a great deal of hope that becoming a competent designer of complex systems is attainable, while reinforcing my growing belief that such individuals are rather rare.
The paper itself is extremely practical, and walks through a number of small ideas about improving system design. There is very little mathematics (in the strictest sense, L. Lamport seems to argue that the sense of mathematics which excludes computer systems engineering is artificially small), nor is there much of a logical structure to the hints: it's simply an approachable collection of good ideas for software engineering.
(The ideas behind the hints are Lampson's, the wording of the hints are modified versions of Lampson's originals, and the commentary following the hints is my own.)
Writing Interfaces
- Interfaces seperate clients from implementation.
- Interfaces's are hard to design (they constitute a mini-lanaguage).
- Inaccurate assumptions can lead to inept interfaces.
- Interfaces should be kept as stable as possible.
- Interfaces should not expose the secrets of the underlying implementation.
- Don't design an interface which contains functionality you don't know how to implement.
- Interfaces shouldn't hide power.
- Some functionality can be left to the client (i.e. don't solve a simple client problem with a complex server solution).
Poorly defined interfaces kill projects. Again--I repeat from the core of my being--poorly defined interfaces kill projects. A well written interface can shield consumers from a great deal of complication and frustration, while a bad interface can make a simple process complex and a quick process slow.
The art of writing interfaces needs further work at reducing it into a science, towards which ends these hints are a nice start waiting a thorough ending.
System Recovery
- Systems must support end-to-end recovery.
- If you support end-to-end recovery, all other recoveries are optimization.
- Actions should be either atomic or restartable.
These three succinct statements have caused me to truly reconsider system reliability and recover, and to develop a more systematic approach for the current systems I am working on, and to build in these fundamentals at the ground floor for future projects.
These changes go big, and they also go small. Consider the differences between these two snippets:
# snippet one
for file in files:
try:
with open(file, 'r') as in:
data = in.read()
# do something
os.remove(file)
except:
attempt_recovery(file)
# snippet two
try:
for file in files:
with open(file, 'r') as in:
data = in.read()
# do something
for file in files:
os.remove(file)
except:
attempt_recovery(files)
The first snippet is makes operating on each file an atomic operation, deletes important recovery context (the contents of all previously successful files) as it goes, making it impossible to restart the entire operation after reaching a late error.
It is likely more efficient to only attempt recovery of the files where the operation fails, but the option to rollback and reattempt the process from the beginning is no longer available. Instead of returning the system to a known code-path for recovery (reverting to beginning, trying again), a recovery specific code-path is written.
Generalization & Abstraction
- Do one thing, do it well.
- Don't generalize for generalization's sake.
Programs maintain a balance between abstract and concrete, and veering too far one direction or another has obvious consequences. The greater the number of individuals who need to understand a given code path, the more I would err towards simplicity over flexibility and power (not that these things are necessary on opposite sides of the spectrum, but often they are).
There is a frequently suggested idea that generalized code is generally less efficient than more focused code. If the concrete code is designed in such a way to take advantage of additional knowledge about each subset of problems (optimization for adding strings as opposed to adding integers) that is certainly true, but often I find that abstraction allows a single code path written with greater care. It certainly is possible to nail yourself to the wall with either approach.
Performance
- Handle the best case and worst case seperately. Best case must behave efficiently; worst case must make progress.
- Use static analysis.
- Cache answers to expensive questions.
- Use hints to speed up calculation.
- Use batch processing.
- Safety first, speed afterwards.
I think the performance advice is fairly straight-forward with the exception of two bits: handling best and worst case seperately, and using hints.
Using hints is the concept that you can often take advantage of some highly likely but not guaranteed aspect of your dataset to provide faster operations. For example, you might choose an insertion sort over a quicksort if you know the majority of your dataset will already be in-order.
Planning
- When making changes, keep a place to stand.
- Plan to throw one away.
- Divide & conquer.
These three pieces of advice speak strongly to the design decisions my team has been making over the last half-year or so. We've had to upgrade one interface quite substantially, while leaving the previous version intact. Certain parts of the system have been rewritten with the benefit of a clearer understanding of the problem they are solving.
In my experience thus far, the hardest part is always finding an optimal way to slice a project up such that each individual has a portion that is suited to their skill-level and preferences, while still making sure the project actually gets done. System design strikes me as intimately involved with understanding the engineers who are doing the implementation.
Higher Order Functions
- Use procedures as function arguments.
This is a wonderful hint, the value of which is often obscured by the recent reliance on languages which only support the object-oriented paradigm for programming.
A personal example for passing function arguments is when performing stream-processing on incoming data, where you need to perform different operations (or a different order of operations) depending on the input.
process_stream({json, Data}) ->
process(Data, [fun convert_to_xml/1, fun validate_xml/1, fun send_to_server/1]);
process_stream({xml, Data}) ->
process(Data, [fun validate_xml/1, fun send_to_server/1]).
process(Data, Funcs) ->
lists:foldr(fun(NextFun, InData) -> NextFun(InData) end, Data, Funcs).
This leads to a concise, simple and composable stream processing mechanism which can be easily modified in the future. Certainly this can be done without passing functions as parameters, but it is a fairly clean solution.
That said, what Lampson was really thinking of was probably more along the lines of writing a generic filter function which takes a predicate function as a parameter.
def my_filter(lst, filter_fun):
new_lst = []
for obj in lst:
if filter_fun(obj):
lst.append(obj)
return new_lst
filtered = my_filter([1,2,3,-1,5,-3,0], lambda x : x > 0)
This is a rather handy pattern for utility functions.
Promising Reading Fodder from References
Looking through the references, there are a number of papers which seem to carry some promise, which I'll be looking for in the future. That said, I haven't read these papers yet, so it may turn out my optimism is unfounded; please point out my folly if you've read one of the papers before and found it lacking.
- Gray, J. et al. The recovery manager of the System R database manager. Computing Surveys 13, 2, June 1981, pp 223-242.
- Hoare, C.A.R. Hints on programming language design. SIGACT/SIGPLAN Symposium on Principles of Programming Languages, Boston, Oct. 1973.
- Lampson, B.W. and Sturgis, H.E. Atomic transactions. In Distributed Systems — An Advanced Course, Lecture Notes in Computer Science 105, Springer, 1981, pp 246-265.
- Lampson, B.W. and Sturgis, H.E. Reflections on an operating system design. Comm. ACM 19, 5, May 1976, pp 251-265.
- Reed, D. Naming and Synchronization in a Decentralized Computer System, MIT LCS TR-205. Sep. 1978.
Final Thoughts
As I said in the introduction, this is a tremendous paper; it is extremely approachable for laymen, provides a great deal of thought-provoking ideas, and is coming from an extremely successful engineer. Looking through the hints, essentially all of them directly apply to decisions my team has made or is making at work. Areas where we made conflicting have often turned into swampy code pits.
This paper probably doesn't have too much depth to offer an experienced developer, but provides a great deal of breadth, and serves well as a checklist for software engineering decisions.