pgstrata
Revenge of the Nerds
2

May 2002

3

"We were after the C++ programmers.

4

We managed to drag a lot of them about halfway to Lisp."

5

- Guy Steele, co-author of the Java spec

6

In the software business there is an ongoing struggle between the pointy-headed academics, and another equally formidable force, the pointy-haired bosses.

7

Everyone knows who the pointy-haired boss is, right?

8

I think most people in the technology world not only recognize this cartoon character, but know the actual person in their company that he is modelled upon.

9

The pointy-haired boss miraculously combines two qualities that are common by themselves, but rarely seen together: (a) he knows nothing whatsoever about technology, and (b) he has very strong opinions about it.

10

Suppose, for example, you need to write a piece of software.

11

The pointy-haired boss has no idea how this software has to work, and can't tell one programming language from another, and yet he knows what language you should write it in.

12

Exactly.

13

He thinks you should write it in Java.

14

Why does he think this?

15

Let's take a look inside the brain of the pointy-haired boss.

16

What he's thinking is something like this.

17

Java is a standard.

18

I know it must be, because I read about it in the press all the time.

19

Since it is a standard, I won't get in trouble for using it.

20

And that also means there will always be lots of Java programmers, so if the programmers working for me now quit, as programmers working for me mysteriously always do, I can easily replace them.

21

Well, this doesn't sound that unreasonable.

22

But it's all based on one unspoken assumption, and that assumption turns out to be false.

23

The pointy-haired boss believes that all programming languages are pretty much equivalent.

24

If that were true, he would be right on target.

25

If languages are all equivalent, sure, use whatever language everyone else is using.

26

But not all languages are equivalent, and I think I can prove this to you without even getting into the differences between them.

27

If you asked the pointy-haired boss in 1992 what language software should be written in, he would have answered with as little hesitation as he does today.

28

Software should be written in C++.

29

But if languages are all equivalent, why should the pointy-haired boss's opinion ever change?

30

In fact, why should the developers of Java have even bothered to create a new language?

31

Presumably, if you create a new language, it's because you think it's better in some way than what people already had.

32

And in fact, Gosling makes it clear in the first Java white paper that Java was designed to fix some problems with C++.

33

So there you have it: languages are not all equivalent.

34

If you follow the trail through the pointy-haired boss's brain to Java and then back through Java's history to its origins, you end up holding an idea that contradicts the assumption you started with.

35

So, who's right?

36

James Gosling, or the pointy-haired boss?

37

Not surprisingly, Gosling is right.

38

Some languages are better, for certain problems, than others.

39

And you know, that raises some interesting questions.

40

Java was designed to be better, for certain problems, than C++.

41

What problems?

42

When is Java better and when is C++?

43

Are there situations where other languages are better than either of them?

44

Once you start considering this question, you have opened a real can of worms. If the pointy-haired boss had to think about the problem in its full complexity, it would make his brain explode.

45

As long as he considers all languages equivalent, all he has to do is choose the one that seems to have the most momentum, and since that is more a question of fashion than technology, even he can probably get the right answer.

46

But if languages vary, he suddenly has to solve two simultaneous equations, trying to find an optimal balance between two things he knows nothing about: the relative suitability of the twenty or so leading languages for the problem he needs to solve, and the odds of finding programmers, libraries, etc. for each.

47

If that's what's on the other side of the door, it is no surprise that the pointy-haired boss doesn't want to open it.

48

The disadvantage of believing that all programming languages are equivalent is that it's not true.

49

But the advantage is that it makes your life a lot simpler.

50

And I think that's the main reason the idea is so widespread.

3–5

"We were after the C++ programmers. We managed to drag a lot of them about halfway to Lisp." — Guy Steele, co-author of the Java spec.

6–9

In software there's an ongoing struggle between the pointy-headed academics and the pointy-haired bosses. Everyone knows the boss; you know the actual person he's modelled on. He combines two qualities rarely seen together: he knows nothing whatsoever about technology, and he has very strong opinions about it.

10–20

Suppose you need to write some software. The boss can't tell one language from another, yet he knows you should write it in Java. Inside his brain: Java is a standard, I won't get in trouble using it, and there'll always be lots of Java programmers, so if mine quit, as they mysteriously always do, I can replace them.

21–25

It doesn't sound unreasonable, but it rests on one unspoken, false assumption: that all programming languages are pretty much equivalent. If that were true he'd be right — use whatever everyone else uses.

26–34

But languages are not equivalent, and I can prove it without even getting into the differences. If they were all equivalent, why would anyone create a new language? Because they think it's better: Gosling's first Java white paper makes clear Java was designed to fix problems with C++. Follow the trail and you end up holding an idea that contradicts the assumption you started with.

35–43

So who's right, Gosling or the boss? Gosling. Some languages are better, for certain problems, than others. And that raises questions: which problems is Java better for, when is C++, and are other languages better than either?

44–47

Once you consider this you've opened a real can of worms. As long as he thinks all languages equivalent, the boss just picks the one with the most momentum. But if languages vary, he must balance two unknowns he knows nothing about: which language suits his problem, and the odds of finding programmers and libraries for it. No surprise he doesn't want to open that door.

48–50

The disadvantage of believing all languages are equivalent is that it's not true. But the advantage is that it makes your life simpler — the main reason the idea is so widespread.

2–50

The pointy-haired boss knows nothing about technology yet has strong opinions, and decrees Java because it's a standard. His comfort rests on one false assumption: that all languages are equivalent.

52

It is a comfortable idea.

53

We know that Java must be pretty good, because it is the cool, new programming language.

54

Or is it?

55

If you look at the world of programming languages from a distance, it looks like Java is the latest thing. (From far enough away, all you can see is the large, flashing billboard paid for by Sun.)

56

But if you look at this world up close, you find that there are degrees of coolness.

57

Within the hacker subculture, there is another language called Perl that is considered a lot cooler than Java.

58

Slashdot, for example, is generated by Perl.

59

I don't think you would find those guys using Java Server Pages.

60

But there is another, newer language, called Python, whose users tend to look down on Perl, and more waiting in the wings.

61

If you look at these languages in order, Java, Perl, Python, you notice an interesting pattern.

62

At least, you notice this pattern if you are a Lisp hacker.

63

Each one is progressively more like Lisp.

64

Python copies even features that many Lisp hackers consider to be mistakes.

65

You could translate simple Lisp programs into Python line for line.

66

It's 2002, and programming languages have almost caught up with 1958.

52–60

It is a comfortable idea. We assume Java must be good because it's the cool new language — but from a distance all you see is the flashing billboard Sun paid for. Up close there are degrees of coolness: Perl is cooler than Java (Slashdot is generated by Perl, not Java Server Pages), and newer still is Python, whose users look down on Perl.

61–65

Take these in order — Java, Perl, Python — and you notice a pattern, at least if you're a Lisp hacker. Each is progressively more like Lisp. Python copies even features many Lisp hackers consider mistakes; you could translate simple Lisp programs into Python line for line.

66

It's 2002, and programming languages have almost caught up with 1958.

52–66

Up close, the coolness of languages runs Java, Perl, Python — each progressively more like Lisp. It's 2002, and programming languages have almost caught up with 1958.

68

Catching Up with Math

69

What I mean is that Lisp was first discovered by John McCarthy in 1958, and popular programming languages are only now catching up with the ideas he developed then.

70

Now, how could that be true?

71

Isn't computer technology something that changes very rapidly?

72

I mean, in 1958, computers were refrigerator-sized behemoths with the processing power of a wristwatch.

73

How could any technology that old even be relevant, let alone superior to the latest developments?

74

I'll tell you how.

75

It's because Lisp was not really designed to be a programming language, at least not in the sense we mean today.

76

What we mean by a programming language is something we use to tell a computer what to do.

77

McCarthy did eventually intend to develop a programming language in this sense, but the Lisp that we actually ended up with was based on something separate that he did as a theoretical exercise-- an effort to define a more convenient alternative to the Turing Machine.

78

As McCarthy said later,

79

Another way to show that Lisp was neater than Turing machines was to write a universal Lisp function and show that it is briefer and more comprehensible than the description of a universal Turing machine. This was the Lisp function eval..., which computes the value of a Lisp expression.... Writing eval required inventing a notation representing Lisp functions as Lisp data, and such a notation was devised for the purposes of the paper with no thought that it would be used to express Lisp programs in practice.

80

What happened next was that, some time in late 1958, Steve Russell, one of McCarthy's grad students, looked at this definition of eval and realized that if he translated it into machine language, the result would be a Lisp interpreter.

81

This was a big surprise at the time.

82

Here is what McCarthy said about it later in an interview:

83

Steve Russell said, look, why don't I program this eval..., and I said to him, ho, ho, you're confusing theory with practice, this eval is intended for reading, not for computing. But he went ahead and did it. That is, he compiled the eval in my paper into [IBM] 704 machine code, fixing bugs, and then advertised this as a Lisp interpreter, which it certainly was. So at that point Lisp had essentially the form that it has today....

84

Suddenly, in a matter of weeks I think, McCarthy found his theoretical exercise transformed into an actual programming language-- and a more powerful one than he had intended.

85

So the short explanation of why this 1950s language is not obsolete is that it was not technology but math, and math doesn't get stale.

86

The right thing to compare Lisp to is not 1950s hardware, but, say, the Quicksort algorithm, which was discovered in 1960 and is still the fastest general-purpose sort.

87

There is one other language still surviving from the 1950s, Fortran, and it represents the opposite approach to language design.

88

Lisp was a piece of theory that unexpectedly got turned into a programming language.

89

Fortran was developed intentionally as a programming language, but what we would now consider a very low-level one.

90

Fortran I, the language that was developed in 1956, was a very different animal from present-day Fortran.

91

Fortran I was pretty much assembly language with math.

92

In some ways it was less powerful than more recent assembly languages; there were no subroutines, for example, only branches.

93

Present-day Fortran is now arguably closer to Lisp than to Fortran I.

94

Lisp and Fortran were the trunks of two separate evolutionary trees, one rooted in math and one rooted in machine architecture.

95

These two trees have been converging ever since.

96

Lisp started out powerful, and over the next twenty years got fast. So-called mainstream languages started out fast, and over the next forty years gradually got more powerful, until now the most advanced of them are fairly close to Lisp.

97

Close, but they are still missing a few things....

69–73

Lisp was first discovered by John McCarthy in 1958, and popular languages are only now catching up. How can that be, when computer technology changes so fast? In 1958 computers were refrigerator-sized behemoths with the processing power of a wristwatch. How could anything that old be superior?

75–79

Because Lisp wasn't really designed to be a programming language. The Lisp we got came from a theoretical exercise: a more convenient alternative to the Turing Machine. Its core, eval, required a notation representing Lisp functions as Lisp data, devised with no thought that it would express real programs.

80–84

Then in late 1958 Steve Russell, one of McCarthy's grad students, realized that translating eval into machine language would yield a Lisp interpreter. McCarthy's reaction: ho, ho, you're confusing theory with practice, eval is for reading, not computing. But Russell compiled it into IBM 704 machine code, and suddenly the theoretical exercise was an actual language — more powerful than McCarthy had intended.

85–86

So the short explanation for why this 1950s language isn't obsolete is that it was math, not technology, and math doesn't get stale. The right comparison isn't 1950s hardware but the Quicksort algorithm, discovered in 1960 and still the fastest general-purpose sort.

87–93

Fortran, the other 1950s survivor, took the opposite approach. Lisp was theory that unexpectedly became a language; Fortran I was built intentionally as one, but very low-level: assembly with math, no subroutines, only branches. Present-day Fortran is arguably closer to Lisp than to Fortran I.

94–97

The two were trunks of separate evolutionary trees, one rooted in math and one in machine architecture, converging ever since. Lisp started out powerful and got fast; mainstream languages started out fast and gradually got more powerful, until the most advanced are fairly close to Lisp. Close, but still missing a few things.

68–97

Lisp isn't obsolete because it was math, not technology: McCarthy's 1958 theoretical exercise, which Steve Russell unexpectedly turned into a working interpreter. Math doesn't get stale.

99

What Made Lisp Different

100

When it was first developed, Lisp embodied nine new ideas.

101

Some of these we now take for granted, others are only seen in more advanced languages, and two are still unique to Lisp.

102

The nine ideas are, in order of their adoption by the mainstream,

103
104

Conditionals.

105

A conditional is an if-then-else construct.

106

We take these for granted now, but Fortran I didn't have them.

107

It had only a conditional goto closely based on the underlying machine instruction.

108
109

A function type.

110

In Lisp, functions are a data type just like integers or strings.

111

They have a literal representation, can be stored in variables, can be passed as arguments, and so on.

112
113

Recursion.

114

Lisp was the first programming language to support it.

115
116

Dynamic typing.

117

In Lisp, all variables are effectively pointers.

118

Values are what have types, not variables, and assigning or binding variables means copying pointers, not what they point to.

119
120

Garbage-collection.

121
122

Programs composed of expressions.

123

Lisp programs are trees of expressions, each of which returns a value.

124

This is in contrast to Fortran and most succeeding languages, which distinguish between expressions and statements.

125

It was natural to have this distinction in Fortran I because you could not nest statements.

126

And so while you needed expressions for math to work, there was no point in making anything else return a value, because there could not be anything waiting for it.

127

This limitation went away with the arrival of block-structured languages, but by then it was too late.

128

The distinction between expressions and statements was entrenched.

129

It spread from Fortran into Algol and then to both their descendants.

130
131

A symbol type.

132

Symbols are effectively pointers to strings stored in a hash table.

133

So you can test equality by comparing a pointer, instead of comparing each character.

134
135

A notation for code using trees of symbols and constants.

136
137

The whole language there all the time.

138

There is no real distinction between read-time, compile-time, and runtime.

139

You can compile or run code while reading, read or run code while compiling, and read or compile code at runtime.

140

Running code at read-time lets users reprogram Lisp's syntax; running code at compile-time is the basis of macros; compiling at runtime is the basis of Lisp's use as an extension language in programs like Emacs; and reading at runtime enables programs to communicate using s-expressions, an idea recently reinvented as XML.
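Several of the ideas in the list above can be seen in a language like Python. What follows is only a rough sketch, not Lisp itself: the names `square`, `apply_twice`, and `sign` are made up for illustration, and interned strings are used as a loose stand-in for symbols.

```python
import sys

# Idea 2: a function is a value with a literal form (lambda),
# storable in a variable and passable as an argument.
square = lambda x: x * x

def apply_twice(f, x):
    return f(f(x))

# Idea 4: values carry types, names do not. Binding copies a reference,
# so both names below refer to the same list object.
a = [1, 2, 3]
b = a
b.append(4)

# Idea 6: a conditional written as an expression that returns a value.
def sign(n):
    return "negative" if n < 0 else "non-negative"

# Idea 7: interned strings behave roughly like symbols. Equal interned
# strings are the same object, so an identity check is enough.
s1 = sys.intern("".join(["fo", "o"]))
s2 = sys.intern("foo")

print(apply_twice(square, 3))   # 81
print(a)                        # [1, 2, 3, 4]
print(sign(-5))                 # negative
print(s1 is s2)                 # True
```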

141

When Lisp first appeared, these ideas were far removed from ordinary programming practice, which was dictated largely by the hardware available in the late 1950s.

142

Over time, the default language, embodied in a succession of popular languages, has gradually evolved toward Lisp.

143

Ideas 1-5 are now widespread.

144

Number 6 is starting to appear in the mainstream.

145

Python has a form of 7, though there doesn't seem to be any syntax for it.

146

As for number 8, this may be the most interesting of the lot.

147

Ideas 8 and 9 only became part of Lisp by accident, because Steve Russell implemented something McCarthy had never intended to be implemented.

148

And yet these ideas turn out to be responsible for both Lisp's strange appearance and its most distinctive features.

149

Lisp looks strange not so much because it has a strange syntax as because it has no syntax; you express programs directly in the parse trees that get built behind the scenes when other languages are parsed, and these trees are made of lists, which are Lisp data structures.
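To make "programs written directly as parse trees" concrete, here is a minimal sketch in Python, with nested Python lists standing in for Lisp lists. The `OPS` table and the `evaluate` function are assumptions of this example, not part of any Lisp.

```python
import operator
from functools import reduce

# Hypothetical operator table for this sketch.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(expr):
    """Evaluate a prefix expression written as nested Python lists."""
    if not isinstance(expr, list):
        return expr                      # a constant evaluates to itself
    op, *args = expr
    values = [evaluate(arg) for arg in args]
    return reduce(OPS[op], values)

# (+ 1 (* 2 3)) in Lisp notation, written as a Python list.
program = ["+", 1, ["*", 2, 3]]
print(evaluate(program))   # 7

# Because the program is ordinary data, other code can inspect or edit it.
program[0] = "-"
print(evaluate(program))   # -5
```

The tree is both the program and a plain data structure, so there is no separate parsing step between them.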

150

Expressing the language in its own data structures turns out to be a very powerful feature.

151

Ideas 8 and 9 together mean that you can write programs that write programs. That may sound like a bizarre idea, but it's an everyday thing in Lisp.

152

The most common way to do it is with something called a macro.

153

The term "macro" does not mean in Lisp what it means in other languages.

154

A Lisp macro can be anything from an abbreviation to a compiler for a new language.
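The "abbreviation" end of that range can be sketched as a function that rewrites one expression tree into another. This is only an illustration in Python, assuming code is held as nested lists; the `unless` form and the `expand_unless` function are hypothetical, and a real Lisp macro is defined inside the language and run by the compiler rather than called by hand.

```python
def expand_unless(expr):
    """Rewrite every ["unless", test, body] into ["if", ["not", test], body]."""
    if not isinstance(expr, list):
        return expr                          # symbols and constants pass through
    expr = [expand_unless(part) for part in expr]   # expand nested forms first
    if expr and expr[0] == "unless":
        _, test, body = expr
        return ["if", ["not", test], body]
    return expr

source = ["unless", ["ready?"],
          ["unless", ["quiet?"], ["print", "waiting"]]]
print(expand_unless(source))
```

The output is itself a program in the same notation, which is what makes this a program that writes a program.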

155

If you want to really understand Lisp, or just expand your programming horizons, I would learn more about macros.

156

Macros (in the Lisp sense) are still, as far as I know, unique to Lisp.

157

This is partly because in order to have macros you probably have to make your language look as strange as Lisp.

158

It may also be because if you do add that final increment of power, you can no longer claim to have invented a new language, but only a new dialect of Lisp.

159

I mention this mostly as a joke, but it is quite true.

160

If you define a language that has car, cdr, cons, quote, cond, atom, eq, and a notation for functions expressed as lists, then you can build all the rest of Lisp out of it.
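As a rough sketch of how little that is, here is an evaluator for those seven operators plus functions written as lists, in Python. It is not McCarthy's definition: Python lists stand in for Lisp lists, strings for symbols, `[]` for the empty list and falsehood, and named functions live in a plain dictionary instead of being introduced with `label`. All the names are assumptions of this example.

```python
def evaluate(x, env):
    """Evaluate expression x (nested lists of strings) in env (a dict)."""
    if isinstance(x, str):                  # a symbol: look up its value
        return env[x]
    op, *args = x
    if op == "quote":
        return args[0]
    if op == "cond":                        # args are [test, result] pairs
        for test, result in args:
            if evaluate(test, env) != []:
                return evaluate(result, env)
        return []
    if op == "atom":
        v = evaluate(args[0], env)
        return "t" if isinstance(v, str) or v == [] else []
    if op == "eq":
        a, b = evaluate(args[0], env), evaluate(args[1], env)
        return "t" if (isinstance(a, str) or a == []) and a == b else []
    if op == "car":
        return evaluate(args[0], env)[0]
    if op == "cdr":
        return evaluate(args[0], env)[1:]
    if op == "cons":
        return [evaluate(args[0], env)] + evaluate(args[1], env)
    # Otherwise op is a function: a name bound in env, or a lambda list.
    fn = evaluate(op, env) if isinstance(op, str) else op
    _, params, body = fn                    # ["lambda", [params...], body]
    values = [evaluate(a, env) for a in args]
    return evaluate(body, {**env, **dict(zip(params, values))})

# append, defined in the little language itself and stored as data.
append = ["lambda", ["x", "y"],
          ["cond",
           [["eq", "x", ["quote", []]], "y"],
           [["quote", "t"],
            ["cons", ["car", "x"], ["append", ["cdr", "x"], "y"]]]]]

env = {"append": append}
call = ["append", ["quote", ["a", "b"]], ["quote", ["c"]]]
print(evaluate(call, env))   # ['a', 'b', 'c']
```

Nothing here handles errors or malformed input; the point is only that a recursive function like `append` can be built from the handful of operators named above.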

161

That is in fact the defining quality of Lisp: it was in order to make this so that McCarthy gave Lisp the shape it has.

100–102

When first developed, Lisp embodied nine new ideas. Some we take for granted, some appear only in advanced languages, and two are still unique to Lisp. In order of their adoption by the mainstream:

103–114

1. Conditionals: if-then-else; Fortran I had only a conditional goto. 2. A function type: in Lisp functions are a data type like integers, with a literal representation, storable in variables, passable as arguments. 3. Recursion: Lisp was the first language to support it.

115–120

4. Dynamic typing. All variables are effectively pointers; values have types, not variables, and assigning means copying pointers, not what they point to. 5. Garbage collection.

121–129

6. Programs composed of expressions. Lisp programs are trees of expressions, each returning a value, unlike Fortran and its successors, which split expressions from statements, a distinction natural in Fortran I but which outlived its cause and spread into Algol and beyond.

130–135

7. A symbol type: pointers to strings in a hash table, so you test equality by comparing a pointer, not each character. 8. A notation for code using trees of symbols and constants.

136–140

9. The whole language there all the time: no real distinction between read-time, compile-time, and runtime. Running code at read-time lets you reprogram Lisp's syntax; running code at compile-time is the basis of macros; compiling at runtime makes Lisp an extension language for programs like Emacs; and reading at runtime lets programs communicate using s-expressions, recently reinvented as XML.

141–145

Over time the default language has evolved toward Lisp: ideas 1-5 are widespread, 6 is starting to appear, and Python has a form of 7.

146–150

Number 8 may be the most interesting. Ideas 8 and 9 entered Lisp only by accident — yet they're responsible for both its strange appearance and its most distinctive features. Lisp looks strange not because it has a strange syntax but because it has no syntax: you write directly in the parse trees other languages build behind the scenes, and those trees are lists, which are Lisp data structures.

151–155

Together they mean you can write programs that write programs: bizarre-sounding, but everyday in Lisp, usually done with a macro. The term doesn't mean in Lisp what it means elsewhere: a Lisp macro can be anything from an abbreviation to a compiler for a new language. To really understand Lisp, learn more about macros.

156–161

Macros are still, as far as I know, unique to Lisp — partly because to have them your language has to look as strange as Lisp, and partly because once you add that final increment of power you've invented not a new language but a new dialect of Lisp. Mostly a joke, but true: define a language with car, cdr, cons, quote, cond, atom, eq, and functions as lists, and you can build all the rest of Lisp from it.

99–161

Lisp embodied nine ideas, from conditionals to garbage collection to code-as-data; the mainstream has absorbed most. The last two, born by accident, let you write programs that write programs — macros, still unique to Lisp.

163

Where Languages Matter

164

So suppose Lisp does represent a kind of limit that mainstream languages are approaching asymptotically-- does that mean you should actually use it to write software?

165

How much do you lose by using a less powerful language?

166

Isn't it wiser, sometimes, not to be at the very edge of innovation?

167

And isn't popularity to some extent its own justification?

168

Isn't the pointy-haired boss right, for example, to want to use a language for which he can easily hire programmers?

169

There are, of course, projects where the choice of programming language doesn't matter much.

170

As a rule, the more demanding the application, the more leverage you get from using a powerful language.

171

But plenty of projects are not demanding at all.

172

Most programming probably consists of writing little glue programs, and for little glue programs you can use any language that you're already familiar with and that has good libraries for whatever you need to do.

173

If you just need to feed data from one Windows app to another, sure, use Visual Basic.

174

You can write little glue programs in Lisp too (I use it as a desktop calculator), but the biggest win for languages like Lisp is at the other end of the spectrum, where you need to write sophisticated programs to solve hard problems in the face of fierce competition.

175

A good example is the airline fare search program that ITA Software licenses to Orbitz.

176

These guys entered a market already dominated by two big, entrenched competitors, Travelocity and Expedia, and seem to have just humiliated them technologically.

177

The core of ITA's application is a 200,000 line Common Lisp program that searches many orders of magnitude more possibilities than their competitors, who apparently are still using mainframe-era programming techniques. (Though ITA is also in a sense using a mainframe-era programming language.)

178

I have never seen any of ITA's code, but according to one of their top hackers they use a lot of macros, and I am not surprised to hear it.

164–168

So if Lisp is a limit mainstream languages approach asymptotically, should you use it? How much do you lose with a less powerful language? Isn't it wiser not to be at the edge, isn't popularity its own justification, isn't the boss right to want a language he can easily hire for?

169–173

There are projects where language choice doesn't matter much. As a rule, the more demanding the application, the more leverage a powerful language gives — but plenty of projects aren't demanding. Most programming is little glue programs, and for those any familiar language with good libraries works. To feed data between two Windows apps, sure, use Visual Basic.

174–178

The biggest win is at the other end, where you solve hard problems against fierce competition. A good example is the airline fare search program that ITA Software licenses to Orbitz. They entered a market dominated by two entrenched competitors, Travelocity and Expedia, and seem to have humiliated them technologically: a 200,000-line Common Lisp program searching orders of magnitude more possibilities than rivals apparently still using mainframe-era techniques. A top hacker says they use a lot of macros, and I'm not surprised.

163–178

For little glue programs any familiar language will do; the big win for Lisp is at the hard end, against fierce competition. ITA's 200,000-line Common Lisp fare search humiliated entrenched rivals.

180

Centripetal Forces

181

I'm not saying there is no cost to using uncommon technologies.

182

The pointy-haired boss is not completely mistaken to worry about this.

183

But because he doesn't understand the risks, he tends to magnify them.

184

I can think of three problems that could arise from using less common languages.

185

Your programs might not work well with programs written in other languages.

186

You might have fewer libraries at your disposal.

187

And you might have trouble hiring programmers.

188

How much of a problem is each of these?

189

The importance of the first varies depending on whether you have control over the whole system.

190

If you're writing software that has to run on a remote user's machine on top of a buggy, closed operating system (I mention no names), there may be advantages to writing your application in the same language as the OS.

191

But if you control the whole system and have the source code of all the parts, as ITA presumably does, you can use whatever languages you want.

192

If any incompatibility arises, you can fix it yourself.

193

In server-based applications you can get away with using the most advanced technologies, and I think this is the main cause of what Jonathan Erickson calls the "programming language renaissance."

194

This is why we even hear about new languages like Perl and Python.

195

We're not hearing about these languages because people are using them to write Windows apps, but because people are using them on servers.

196

And as software shifts off the desktop and onto servers (a future even Microsoft seems resigned to), there will be less and less pressure to use middle-of-the-road technologies.

197

As for libraries, their importance also depends on the application.

198

For less demanding problems, the availability of libraries can outweigh the intrinsic power of the language.

199

Where is the breakeven point?

200

Hard to say exactly, but wherever it is, it is short of anything you'd be likely to call an application.

201

If a company considers itself to be in the software business, and they're writing an application that will be one of their products, then it will probably involve several hackers and take at least six months to write.

202

In a project of that size, powerful languages probably start to outweigh the convenience of pre-existing libraries.

203

The third worry of the pointy-haired boss, the difficulty of hiring programmers, I think is a red herring.

204

How many hackers do you need to hire, after all?

205

Surely by now we all know that software is best developed by teams of fewer than ten people.

206

And you shouldn't have trouble hiring hackers on that scale for any language anyone has ever heard of.

207

If you can't find ten Lisp hackers, then your company is probably based in the wrong city for developing software.

208

In fact, choosing a more powerful language probably decreases the size of the team you need, because (a) if you use a more powerful language you probably won't need as many hackers, and (b) hackers who work in more advanced languages are likely to be smarter.

209

I'm not saying that you won't get a lot of pressure to use what are perceived as "standard" technologies.

210

At Viaweb (now Yahoo Store), we raised some eyebrows among VCs and potential acquirers by using Lisp.

211

But we also raised eyebrows by using generic Intel boxes as servers instead of "industrial strength" servers like Suns, by using a then-obscure open-source Unix variant called FreeBSD instead of a real commercial OS like Windows NT, by ignoring a supposed e-commerce standard called SET that no one now even remembers, and so on.

212

You can't let the suits make technical decisions for you.

213

Did it alarm some potential acquirers that we used Lisp?

214

Some, slightly, but if we hadn't used Lisp, we wouldn't have been able to write the software that made them want to buy us.

215

What seemed like an anomaly to them was in fact cause and effect.

216

If you start a startup, don't design your product to please VCs or potential acquirers. Design your product to please the users. If you win the users, everything else will follow.

217

And if you don't, no one will care how comfortingly orthodox your technology choices were.

181–187

There's a cost to uncommon technologies; the boss isn't completely wrong to worry, but because he doesn't understand the risks he magnifies them. Three problems can arise: your programs might not interoperate well, you might have fewer libraries, and you might have trouble hiring.

188–192

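Interoperation depends on whether you control the whole system. If your software has to run on a remote user's machine on top of a buggy, closed operating system, there may be advantages to writing it in the same language as the OS. But if you control everything and have all the source, as ITA presumably does, you can use whatever you want and fix any incompatibility yourself.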

193–196

In server-based applications you can use the most advanced technologies, which I think is the main cause of what Jonathan Erickson calls the "programming language renaissance." That's why we hear about Perl and Python at all: not from Windows apps but from servers. As software shifts off the desktop onto servers, there'll be less pressure to use middle-of-the-road technologies.

197–202

How much libraries matter also depends on the application: for less demanding problems their availability can outweigh the language's power, but the breakeven point falls short of anything you'd call an application. A software company writing one of its own products will put several hackers on it for at least six months, and at that size powerful languages outweigh pre-existing libraries.

203–208

The third worry, hiring, is a red herring. Software is best developed by teams of fewer than ten, and you shouldn't struggle to hire at that scale for any known language. If you can't find ten Lisp hackers, your company is probably in the wrong city. In fact a more powerful language shrinks the team you need: you need fewer hackers, and the ones who use advanced languages tend to be smarter.

209–215

You will get pressure to use "standard" technologies. At Viaweb we raised eyebrows by using Lisp, and also by using generic Intel boxes instead of Suns, FreeBSD instead of Windows NT, and by ignoring a supposed e-commerce standard called SET that no one now remembers. You can't let the suits make technical decisions. Lisp alarmed some acquirers slightly, but if we hadn't used it, we couldn't have written the software that made them want to buy us. The anomaly was cause and effect.

216–217

If you start a startup, don't design your product to please VCs or acquirers. Design your product to please the users. If you win the users, everything else will follow — and if you don't, no one will care how comfortingly orthodox your technology choices were.

180–217

The boss's three worries — interoperation, libraries, hiring — are real but magnified. The fix isn't orthodoxy: design your product to please the users, and everything else follows.

219

The Cost of Being Average

220

How much do you lose by using a less powerful language?

221

There is actually some data out there about that.

222

The most convenient measure of power is probably code size.

223

The point of high-level languages is to give you bigger abstractions-- bigger bricks, as it were, so you don't need as many to build a wall of a given size.

224

So the more powerful the language, the shorter the program (not simply in characters, of course, but in distinct elements).

225

How does a more powerful language enable you to write shorter programs?

226

One technique you can use, if the language will let you, is something called bottom-up programming.

227

Instead of simply writing your application in the base language, you build on top of the base language a language for writing programs like yours, then write your program in it.

228

The combined code can be much shorter than if you had written your whole program in the base language-- indeed, this is how most compression algorithms work.

229

A bottom-up program should be easier to modify as well, because in many cases the language layer won't have to change at all.
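A minimal sketch of the bottom-up idea, in Python for concreteness. Everything here is hypothetical: `compose`, `keep`, and `each` stand in for a small language layer, and the order data is made up. The point is only the shape: a few general operators first, then the program written in terms of them.

```python
# Hypothetical illustration; none of these names come from the essay.

# Language layer: small, general operators for "programs like ours".
def compose(*steps):
    """Return a function that applies each step in order."""
    def run(value):
        for step in steps:
            value = step(value)
        return value
    return run

def keep(predicate):
    """Build a step that filters a list."""
    return lambda items: [x for x in items if predicate(x)]

def each(transform):
    """Build a step that maps a function over a list."""
    return lambda items: [transform(x) for x in items]

# Application layer: the actual program, written in the layer above.
big_order_total = compose(
    keep(lambda order: order["qty"] >= 10),
    each(lambda order: order["qty"] * order["price"]),
    sum,
)

orders = [
    {"qty": 12, "price": 2},
    {"qty": 3, "price": 100},
    {"qty": 10, "price": 5},
]
print(big_order_total(orders))  # 74
```

If the requirements change, say to a different threshold or a different total, only the application layer is edited; the three operators stay as they are.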

230

Code size is important, because the time it takes to write a program depends mostly on its length.

231

If your program would be three times as long in another language, it will take three times as long to write-- and you can't get around this by hiring more people, because beyond a certain size new hires are actually a net lose.

232

Fred Brooks described this phenomenon in his famous book The Mythical Man-Month, and everything I've seen has tended to confirm what he said.

233

So how much shorter are your programs if you write them in Lisp?

234

Most of the numbers I've heard for Lisp versus C, for example, have been around 7-10x.

235

But a recent article about ITA in New Architect magazine said that "one line of Lisp can replace 20 lines of C," and since this article was full of quotes from ITA's president, I assume they got this number from ITA.

236

If so then we can put some faith in it; ITA's software includes a lot of C and C++ as well as Lisp, so they are speaking from experience.

237

My guess is that these multiples aren't even constant.

238

I think they increase when you face harder problems and also when you have smarter programmers.

239

A really good hacker can squeeze more out of better tools.

240

As one data point on the curve, at any rate, if you were to compete with ITA and chose to write your software in C, they would be able to develop software twenty times faster than you.

241

If you spent a year on a new feature, they'd be able to duplicate it in less than three weeks.

242

Whereas if they spent just three months developing something new, it would be five years before you had it too.
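The arithmetic behind these two figures can be checked in a few lines. This sketch assumes the twentyfold code-size ratio translates directly into development time, which is the simplification the argument itself makes; the constant names are illustrative.

```python
SPEEDUP = 20          # the quoted "one line of Lisp can replace 20 lines of C"
WEEKS_PER_YEAR = 52
MONTHS_PER_YEAR = 12

# A feature that takes you one year takes them 1/20 of a year.
their_weeks = WEEKS_PER_YEAR / SPEEDUP

# A feature that takes them three months takes you 20 times as long.
your_years = 3 * SPEEDUP / MONTHS_PER_YEAR

print(their_weeks)  # 2.6 weeks, i.e. less than three
print(your_years)   # 5.0 years
```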

243

And you know what?

244

That's the best-case scenario.

245

When you talk about code-size ratios, you're implicitly assuming that you can actually write the program in the weaker language.

246

But in fact there are limits on what programmers can do.

247

If you're trying to solve a hard problem with a language that's too low-level, you reach a point where there is just too much to keep in your head at once.

248

So when I say it would take ITA's imaginary competitor five years to duplicate something ITA could write in Lisp in three months, I mean five years if nothing goes wrong.

249

In fact, the way things work in most companies, any development project that would take five years is likely never to get finished at all.

250

I admit this is an extreme case.

251

ITA's hackers seem to be unusually smart, and C is a pretty low-level language.

252

But in a competitive market, even a differential of two or three to one would be enough to guarantee that you'd always be behind.

220–224

How much do you lose with a less powerful language? There's data. The most convenient measure of power is probably code size: high-level languages give you bigger abstractions (bigger bricks, as it were), so you need fewer to build a wall of a given size. The more powerful the language, the shorter the program.

230–232

Code size matters because the time to write a program depends mostly on its length. A program three times as long takes three times as long to write — and you can't fix that by hiring, since beyond a certain size new hires are a net loss. Fred Brooks described this in The Mythical Man-Month, and everything I've seen confirms it.

233–236

So how much shorter is Lisp? Most numbers I've heard for Lisp versus C are around 7-10x. But a recent article about ITA in New Architect said "one line of Lisp can replace 20 lines of C," and since it quoted ITA's president, I assume the number came from ITA — whose software includes a lot of C too, so they speak from experience.

237–242

My guess is these multiples aren't even constant; they increase with harder problems and smarter programmers. As one data point: if you competed with ITA in C, they'd develop software twenty times faster. A year of your work, they'd duplicate in under three weeks; and if they spent three months on something new, it'd be five years before you had it too.

243–249

And that's the best case. Code-size ratios assume you can actually write the program in the weaker language, but with one too low-level you reach a point where there's just too much to keep in your head. So those five years are if nothing goes wrong — and in most companies, any five-year project is likely never to get finished at all.

250–252

ITA is an extreme case: its hackers seem unusually smart, and C is a pretty low-level language. But in a competitive market, even a two-or-three-to-one differential would keep you always behind.

219–252

Power is measured by code size, and the time to write a program tracks its length. ITA could outpace a C competitor twentyfold, and that's the best case, since a hard enough program may not be writable in a weak language at all.

254

A Recipe

255

This is the kind of possibility that the pointy-haired boss doesn't even want to think about.

256

And so most of them don't.

257

Because, you know, when it comes down to it, the pointy-haired boss doesn't mind if his company gets their ass kicked, so long as no one can prove it's his fault.

258

The safest plan for him personally is to stick close to the center of the herd.

259

Within large organizations, the phrase used to describe this approach is "industry best practice."

260

Its purpose is to shield the pointy-haired boss from responsibility: if he chooses something that is "industry best practice," and the company loses, he can't be blamed.

261

He didn't choose, the industry did.

262

I believe this term was originally used to describe accounting methods and so on.

263

What it means, roughly, is don't do anything weird. And in accounting that's probably a good idea.

264

The terms "cutting-edge" and "accounting" do not sound good together.

265

But when you import this criterion into decisions about technology, you start to get the wrong answers.

266

Technology often should be cutting-edge.

267

In programming languages, as Erann Gat has pointed out, what "industry best practice" actually gets you is not the best, but merely the average.

268

When a decision causes you to develop software at a fraction of the rate of more aggressive competitors, "best practice" is a misnomer.

269

So here we have two pieces of information that I think are very valuable.

270

In fact, I know it from my own experience.

271

Number 1, languages vary in power.

272

Number 2, most managers deliberately ignore this.

273

Between them, these two facts are literally a recipe for making money.

274

ITA is an example of this recipe in action.

275

If you want to win in a software business, just take on the hardest problem you can find, use the most powerful language you can get, and wait for your competitors' pointy-haired bosses to revert to the mean.

255–258

This is the possibility the boss doesn't want to think about, so most don't. Because when it comes down to it, he doesn't mind his company getting its ass kicked, so long as no one can prove it's his fault. The safest plan for him is to stick close to the center of the herd.

259–261

In large organizations this is called "industry best practice." Its purpose is to shield the boss from responsibility: if he picks "best practice" and the company loses, he can't be blamed — he didn't choose, the industry did.

262–268

The term comes from accounting and means, roughly, don't do anything weird — good advice there, where "cutting-edge" and "accounting" don't sound good together. But technology often should be cutting-edge. As Erann Gat has pointed out, "industry best practice" gets you not the best but merely the average; when it makes you develop at a fraction of competitors' rate, "best practice" is a misnomer.

269–275

So we have two facts I know from experience: languages vary in power, and most managers deliberately ignore this. Between them they're literally a recipe for making money, and ITA is an example in action. To win in a software business, take on the hardest problem you can find, use the most powerful language you can get, and wait for your competitors' pointy-haired bosses to revert to the mean.

254–275

The boss sticks to "industry best practice" to dodge blame, which in technology gets you the average, not the best. The recipe: take the hardest problem, use the most powerful language, and wait for your rivals' bosses to revert to the mean.

277

278

Appendix: Power

279

As an illustration of what I mean about the relative power of programming languages, consider the following problem.

280

We want to write a function that generates accumulators-- a function that takes a number n, and returns a function that takes another number i and returns n incremented by i.

281

(That's incremented by, not plus.

282

An accumulator has to accumulate.)

283

In Common Lisp this would be

    (defun foo (n)
      (lambda (i) (incf n i)))

and in Perl 5,

    sub foo {
      my ($n) = @_;
      sub {$n += shift}
    }

which has more elements than the Lisp version because you have to extract parameters manually in Perl.

284

In Smalltalk the code is slightly longer than in Lisp

    foo: n
      |s|
      s := n.
      ^[:i| s := s+i. ]

because although in general lexical variables work, you can't do an assignment to a parameter, so you have to create a new variable s.

285

In Javascript the example is, again, slightly longer, because Javascript retains the distinction between statements and expressions, so you need explicit return statements to return values:

    function foo(n) {
      return function (i) {
        return n += i } }

(To be fair, Perl also retains this distinction, but deals with it in typical Perl fashion by letting you omit returns.)

286

If you try to translate the Lisp/Perl/Smalltalk/Javascript code into Python you run into some limitations.

287

Because Python doesn't fully support lexical variables, you have to create a data structure to hold the value of n.

288

And although Python does have a function data type, there is no literal representation for one (unless the body is only a single expression) so you need to create a named function to return.

289

This is what you end up with:

    def foo(n):
        s = [n]
        def bar(i):
            s[0] += i
            return s[0]
        return bar

Python users might legitimately ask why they can't just write

    def foo(n):
        return lambda i: return n += i

or even

    def foo(n):
        lambda i: n += i

and my guess is that they probably will, one day. (But if they don't want to wait for Python to evolve the rest of the way into Lisp, they could always just...)
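For readers who want to run the Python version, here is a small check of the behavior the spec asks for: each generated function keeps its own running total, and it works for any numeric type. The first function is the list-cell workaround described above. The second is an aside rather than part of the original argument: it uses the `nonlocal` statement that Python 3 later added, which lets the inner function rebind `n` directly. The name `foo_nonlocal` is made up for this sketch.

```python
def foo(n):
    # Workaround from the text: a one-element list serves as a mutable cell.
    s = [n]
    def bar(i):
        s[0] += i
        return s[0]
    return bar

def foo_nonlocal(n):
    # Python 3 variant (an assumption about the reader's interpreter):
    # `nonlocal` allows assignment to the enclosing function's variable.
    def bar(i):
        nonlocal n
        n += i
        return n
    return bar

for make in (foo, foo_nonlocal):
    acc = make(10)
    print(acc(5), acc(3))   # 15 18: the total accumulates across calls
    other = make(1.5)
    print(other(2))         # 3.5: a separate accumulator, and not only for ints
```

Note that even with `nonlocal` the inner function still needs a name, since a Python `lambda` body is limited to a single expression.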

290

In OO languages, you can, to a limited extent, simulate a closure (a function that refers to variables defined in enclosing scopes) by defining a class with one method and a field to replace each variable from an enclosing scope.

291

This makes the programmer do the kind of code analysis that would be done by the compiler in a language with full support for lexical scope, and it won't work if more than one function refers to the same variable, but it is enough in simple cases like this.

292

Python experts seem to agree that this is the preferred way to solve the problem in Python, writing either

    def foo(n):
        class acc:
            def __init__(self, s):
                self.s = s
            def inc(self, i):
                self.s += i
                return self.s
        return acc(n).inc

or

    class foo:
        def __init__(self, n):
            self.n = n
        def __call__(self, i):
            self.n += i
            return self.n

I include these because I wouldn't want Python advocates to say I was misrepresenting the language, but both seem to me more complex than the first version.

293

You're doing the same thing, setting up a separate place to hold the accumulator; it's just a field in an object instead of the head of a list. And the use of these special, reserved field names, especially __call__, seems a bit of a hack.

294

In the rivalry between Perl and Python, the claim of the Python hackers seems to be that Python is a more elegant alternative to Perl, but what this case shows is that power is the ultimate elegance: the Perl program is simpler (has fewer elements), even if the syntax is a bit uglier.

295

How about other languages?

296

In the other languages mentioned in this talk-- Fortran, C, C++, Java, and Visual Basic-- it is not clear whether you can actually solve this problem.

297

Ken Anderson says that the following code is about as close as you can get in Java:

    public interface Inttoint {
      public int call(int i);
    }

    public static Inttoint foo(final int n) {
      return new Inttoint() {
        int s = n;
        public int call(int i) {
          s = s + i;
          return s;
        }};
    }

This falls short of the spec because it only works for integers.

298

After many email exchanges with Java hackers, I would say that writing a properly polymorphic version that behaves like the preceding examples is somewhere between damned awkward and impossible.

299

If anyone wants to write one I'd be very curious to see it, but I personally have timed out.

300

It's not literally true that you can't solve this problem in other languages, of course.

301

The fact that all these languages are Turing-equivalent means that, strictly speaking, you can write any program in any of them.

302

So how would you do it?

303

In the limit case, by writing a Lisp interpreter in the less powerful language.

304

That sounds like a joke, but it happens so often to varying degrees in large programming projects that there is a name for the phenomenon, Greenspun's Tenth Rule:

305

Any sufficiently complicated C or Fortran program contains an ad hoc informally-specified bug-ridden slow implementation of half of Common Lisp.

306

If you try to solve a hard problem, the question is not whether you will use a powerful enough language, but whether you will (a) use a powerful language, (b) write a de facto interpreter for one, or (c) yourself become a human compiler for one.

307

We see this already beginning to happen in the Python example, where we are in effect simulating the code that a compiler would generate to implement a lexical variable.

308

This practice is not only common, but institutionalized.

309

For example, in the OO world you hear a good deal about "patterns".

310

I wonder if these patterns are not sometimes evidence of case (c), the human compiler, at work.

311

When I see patterns in my programs, I consider it a sign of trouble.

312

The shape of a program should reflect only the problem it needs to solve.

313

Any other regularity in the code is a sign, to me at least, that I'm using abstractions that aren't powerful enough-- often that I'm generating by hand the expansions of some macro that I need to write.
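To make the idea of a hand-expanded regularity concrete, here is a hypothetical Python sketch. Python has no macros, so a higher-order function stands in for the missing abstraction; all names and the logging scenario are invented for illustration. The first two functions repeat the same shape by hand, and `logged` writes that shape once.

```python
# Hypothetical example; the names are illustrative, not from the essay.

# By hand: the same "log entry, compute, log exit" shape in every function.
def area_logged(w, h, log):
    log.append("enter area")
    result = w * h
    log.append("exit area")
    return result

def perimeter_logged(w, h, log):
    log.append("enter perimeter")
    result = 2 * (w + h)
    log.append("exit perimeter")
    return result

# Abstracted: the regularity is captured once, and each function
# is reduced to the part that reflects the problem.
def logged(name, fn):
    def wrapper(*args, log):
        log.append(f"enter {name}")
        result = fn(*args)
        log.append(f"exit {name}")
        return result
    return wrapper

area = logged("area", lambda w, h: w * h)
perimeter = logged("perimeter", lambda w, h: 2 * (w + h))

log = []
print(area(3, 4, log=log), perimeter(3, 4, log=log))  # 12 14
print(log)
```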

279–282

As an illustration of relative power, consider writing a function that generates accumulators: it takes a number n and returns a function that takes i and returns n incremented by i. (That's incremented by, not plus. An accumulator has to accumulate.)

283–289

In Common Lisp this is (defun foo (n) (lambda (i) (incf n i))). Perl 5 is longer because you extract parameters manually; Smalltalk because you can't assign to a parameter; Javascript because it keeps the statement/expression distinction. Python hits real limits: lacking full lexical variables, you create a structure to hold n; lacking a function literal, you make a named function. Users might ask why they can't just write the obvious one-liner — one day, I'd guess, they will.

290–294

Python experts prefer to simulate a closure with a class holding a field per variable — more complex, and reserved names like __call__ seem a hack. They claim Python is more elegant than Perl, but this case shows power is the ultimate elegance: the Perl program is simpler even if uglier.

295–299

In Fortran, C, C++, Java, and Visual Basic it's not clear you can solve this at all. Ken Anderson's closest Java only works for integers; a properly polymorphic version is somewhere between damned awkward and impossible.

300–305

It's not literally impossible — these languages are Turing-equivalent. In the limit case you'd write a Lisp interpreter in the less powerful language. That sounds like a joke, but it happens so often there's a name for it, Greenspun's Tenth Rule: Any sufficiently complicated C or Fortran program contains an ad hoc informally-specified bug-ridden slow implementation of half of Common Lisp.

306–313

So the question isn't whether you'll use a powerful enough language, but whether you'll (a) use one, (b) write a de facto interpreter for one, or (c) become a human compiler for one. In the OO world you hear about "patterns," and I wonder if they aren't sometimes case (c) at work. When I see patterns in my programs I consider it a sign of trouble: a program's shape should reflect only the problem it solves. Any other regularity means I'm hand-expanding some macro I need to write.

277–313

The accumulator-generator problem gets harder as languages weaken — easy in Lisp and Perl, awkward in Python, near-impossible in Java. Greenspun's Tenth Rule: any big enough C program reimplements half of Lisp.

315

Notes

316
  • The IBM 704 CPU was about the size of a refrigerator, but a lot heavier.
317

The CPU weighed 3150 pounds, and the 4K of RAM was in a separate box weighing another 4000 pounds.

318

The Sub-Zero 690, one of the largest household refrigerators, weighs 656 pounds.

  • Steve Russell also wrote the first (digital) computer game, Spacewar, in 1962.
  • If you want to trick a pointy-haired boss into letting you write software in Lisp, you could try telling him it's XML.
  • Here is the accumulator generator in other Lisp dialects:
      Scheme: (define (foo n) (lambda (i) (set! n (+ n i)) n))
      Goo: (df foo (n) (op incf n _))
      Arc: (def foo (n) [++ n _])
  • Erann Gat's sad tale about "industry best practice" at JPL inspired me to address this generally misapplied phrase.
  • Peter Norvig found that 16 of the 23 patterns in Design Patterns were "invisible or simpler" in Lisp.
  • Thanks to the many people who answered my questions about various languages and/or read drafts of this, including Ken Anderson, Trevor Blackwell, Erann Gat, Dan Giffin, Sarah Harlin, Jeremy Hylton, Robert Morris, Peter Norvig, Guy Steele, and Anton van Straaten.
319

They bear no blame for any opinions expressed.

320

Related:

321

Many people have responded to this talk, so I have set up an additional page to deal with the issues they have raised: Re: Revenge of the Nerds.

322

It also set off an extensive and often useful discussion on the LL1 mailing list. See particularly the mail by Anton van Straaten on semantic compression.

323

Some of the mail on LL1 led me to try to go deeper into the subject of language power in Succinctness is Power.

324

A larger set of canonical implementations of the accumulator generator benchmark is collected on its own page.

316–318

The IBM 704 CPU was about the size of a refrigerator but much heavier — 3150 pounds, plus 4000 for the box holding 4K of RAM. Steve Russell also wrote the first digital computer game, Spacewar, in 1962. To trick a boss into letting you use Lisp, try telling him it's XML. Peter Norvig found that 16 of the 23 patterns in Design Patterns were "invisible or simpler" in Lisp.

320–324

Many people responded to this talk, so I set up Re: Revenge of the Nerds for the issues they raised. It also set off a discussion on the LL1 mailing list (see particularly Anton van Straaten's mail on semantic compression), some of which led me to go deeper into language power in Succinctness is Power.

315–324

Marginalia: the refrigerator-sized IBM 704, Russell's Spacewar, the trick of calling Lisp "XML," Norvig on design patterns, and pointers to the follow-up debate.