README: 2006 Nov 09 - 2010 Nov 14

Sun Nov 14 23:09:00 2010

Correctness is a Constraint, Performance is a Goal

The recent brouhaha over the change in glibc memcpy behaviour (which broke some notable bits of non-open-source code, like Adobe Flash) suggests that while leaving overlapping copies undefined might have made some loose sense "back in the day" (note that "the day" was before memmove was introduced at all; my recollection is that the first memcpy that wasn't overlap-safe was an early SunOS-for-SPARC version - which caused lots of pain when porting 68k-SunOS code in the early 1990s), the thing that should have been done (and that we should do now) is to make that "undefined" behaviour simply be abort().

In the days of SPECINT92, the raw speed of very short memcpy calls might have mattered enough to strip out the tests - but we're talking about two compares (which should have very good branch-prediction properties, and which compilers that are already inlining memcpy should be able to resolve at compile time in many cases anyhow). I do appreciate having memcpy be efficient per byte moved - after first having it not scramble my data.
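To make the "two compares" concrete, here is a minimal sketch of the kind of check being argued for - checked_memcpy is a made-up name, not a glibc entry point, and this is not the glibc implementation:

    #include <cstdint>
    #include <cstdlib>
    #include <cstring>

    // Sketch: a memcpy front end that aborts on overlapping regions instead
    // of silently scrambling data.  "checked_memcpy" is a hypothetical name.
    void *checked_memcpy(void *dst, const void *src, std::size_t n)
    {
        std::uintptr_t d = reinterpret_cast<std::uintptr_t>(dst);
        std::uintptr_t s = reinterpret_cast<std::uintptr_t>(src);

        // The two compares: the regions overlap unless one ends before the
        // other begins.  (dst == src with n > 0 also trips this, deliberately.)
        if (d < s + n && s < d + n)
            std::abort();

        return std::memcpy(dst, src, n);
    }

Interposing a check like this in front of the real memcpy (LD_PRELOAD is one way) is one cheap way to run the exercise described below.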

For those who believe the speed matters - well, we could leave behind a career_limiting_memcpy that doesn't have the checks, and see who's willing to justify using it...

Exercise for the reader: take Linus' sample mymemcpy from https://bugzilla.redhat.com/show_bug.cgi?id=638477#c38 and add the abort-on-overlap test to it - then run chrome, firefox, or another large project of your choice under it. Note that every abort is probably a reportable bug - but it's a lot cheaper than running the whole thing under full-scale valgrind.


Fri Nov 10 01:26:00 2006

Premature Optimization and Object-Oriented Blame

The year is 1989. A small project team is working on a C++ project (at a time when "5 years of C++ experience" meant your name was Bjarne or Andrew): two adventurous types who'd used the language on toy projects, one experienced but not terribly detail-oriented developer, and an enthusiastic newbie. A rush-rush application, starting from a throwaway prototype in Smalltalk that showed great UI but 1% of the required performance; thus, management sufficiently neophilic to buy into the idea of doing rapid development with brand-new tools - or maybe they felt that if we were wrong, we'd at least fail quickly too :-)

About two months in, the UI is going well (one of the things the C++ object model was actually quite well suited for is graphics classes: a geometric hierarchy of drawing objects, another layer of input handling broken down along the same geometric hierarchy, and an abstract drawable that we re-implemented on SunView, Domain, and X11 in about a day - all of which made us very popular), but we're starting to load enough data to see slow-to-reproduce, difficult-to-narrow-down bugs in the underlying data structures.
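A minimal sketch of that kind of layering - with class names that are purely illustrative, not the original 1989 code - might look something like this:

    #include <memory>
    #include <vector>

    // Illustrative only: one abstract Drawable per window system (SunView,
    // Domain, X11...), and a geometric hierarchy coded purely against it.
    class Drawable {
    public:
        virtual ~Drawable() = default;
        virtual void line(int x0, int y0, int x1, int y1) = 0;
        virtual void text(int x, int y, const char *s) = 0;
    };

    class Graphic {                      // root of the geometric hierarchy
    public:
        virtual ~Graphic() = default;
        virtual void draw(Drawable &d) const = 0;
    };

    class Segment : public Graphic {     // a leaf: one line segment
    public:
        Segment(int x0, int y0, int x1, int y1)
            : x0_(x0), y0_(y0), x1_(x1), y1_(y1) {}
        void draw(Drawable &d) const override { d.line(x0_, y0_, x1_, y1_); }
    private:
        int x0_, y0_, x1_, y1_;
    };

    class Group : public Graphic {       // a composite: a picture of pictures
    public:
        void add(std::unique_ptr<Graphic> g) { parts_.push_back(std::move(g)); }
        void draw(Drawable &d) const override {
            for (const auto &part : parts_) part->draw(d);
        }
    private:
        std::vector<std::unique_ptr<Graphic>> parts_;
    };

Swapping window systems then only means writing another Drawable; the geometric classes never notice.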

After some in-fighting over whose classes were the problem (the UI side or the data-wrangling side, different developers) we come upon our first significant application of the true power of C++: object oriented blame. Since the UI classes and the data classes talked to each other over a very narrow interface, it was easy to stub out one or the other with a "stupid but obviously correct" implementation for testing... UI classes that just printed text instead of drawing anything, data classes with hard coded values, that sort of thing. Not only does this technique work for finding bugs - in 2006, we call it Mock Object Testing - but it has some psychological benefit for the programmers involved, in terms of keeping the mistake-finding more reality-based.
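In code, the trick is nothing more than a narrow interface plus a trivially correct stand-in on each side; a sketch (again with made-up names, not the project's actual classes):

    #include <cstdio>

    // Hypothetical narrow interface between the data-wrangling side and the UI.
    class DataSource {
    public:
        virtual ~DataSource() = default;
        virtual int count() const = 0;
        virtual double value(int i) const = 0;
    };

    // "Stupid but obviously correct" data stand-in: hard-coded values,
    // no clever structures, nothing left to argue about.
    class CannedData : public DataSource {
    public:
        int count() const override { return 3; }
        double value(int i) const override {
            static const double canned[] = { 1.0, 2.0, 3.0 };
            return canned[i];
        }
    };

    // Equivalent stand-in for the UI side: print instead of draw, so the real
    // data classes can be exercised with no window system in the loop.
    void dump(const DataSource &src)
    {
        for (int i = 0; i < src.count(); ++i)
            std::printf("item %d = %g\n", i, src.value(i));
    }

Run the real UI against CannedData to indict (or clear) the UI classes; run dump against the real data classes to indict the other side.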

In this story, the suspected data structure was a doubly linked ordered list, with a cached "recent" pointer to (attempt to) speed up insertion. (Remember, this was 5 years before the SGI STL release...) That's kind of complex, and the performance aspects were entirely speculative. Since a test harness would have been on the "wrong" side of the argument (that is, the arguments that it "wasn't the data structure" would have been wielded against the tests too; yes, we know better today), the alternative was a testing-only implementation of a simple vector based on realloc and memcpy. A completely naive implementation (on the principle of making it easy to test): no buffer gap or indirection, every insert was a realloc and a data move, every reference was a linear search.
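For concreteness, a sketch of roughly that shape of container (not the original code; the record type and names are invented here):

    #include <cstdlib>
    #include <cstring>

    // Testing-only container sketch: an ordered array of plain records, grown
    // with realloc on every insert and searched linearly.  Naive on purpose -
    // it is easy to convince yourself it is correct.
    struct Record {
        int    key;
        double payload;
    };

    struct NaiveVector {
        Record      *data  = nullptr;
        std::size_t  count = 0;
    };

    // Insert keeping the array ordered by key: grow by one, then shift the
    // tail up.  The shift overlaps itself, so it has to be memmove, not memcpy.
    void insert(NaiveVector &v, const Record &r)
    {
        Record *grown = static_cast<Record *>(
            std::realloc(v.data, (v.count + 1) * sizeof(Record)));
        if (!grown)
            std::abort();                     // out of memory: fail loudly
        v.data = grown;

        std::size_t pos = 0;                  // linear search for the slot
        while (pos < v.count && v.data[pos].key < r.key)
            ++pos;

        std::memmove(v.data + pos + 1, v.data + pos,
                     (v.count - pos) * sizeof(Record));
        v.data[pos] = r;
        ++v.count;
    }

    // Every lookup is a linear search as well.
    const Record *find(const NaiveVector &v, int key)
    {
        for (std::size_t i = 0; i < v.count; ++i)
            if (v.data[i].key == key)
                return &v.data[i];
        return nullptr;
    }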

As might be expected from the tone of the story, dropping in the naive implementation caused the bugs to vanish, and pretty effectively ended the argument... but it turned out to be a bigger hammer than expected: once the bugs in the original data structure were finally fixed, some "large scale" data load tests could at last be run. They took... rather a long time. Swapping in the naive implementation solved that problem too - graphing the two showed an apparent difference in order! Empirically, the memcpy-based implementation took time linear in the quantity of input data, while the list was at least quadratic. This was probably due to the underlying realloc doing bucketing for us, and having most of its overhead be per-object rather than per-byte, so the single-block technique won out over many smaller allocations.

At the time, the idea that "this application is all about manipulating this set of data in this way - therefore that needs something sophisticated enough to perform well" made sense. Perhaps we can be forgiven for this - after all, the popularization (by Knuth) of Hoare's dictum about premature optimization also dates to 1989. But clearly the smarter approach would have been to take our knowledge about the structure of the problem and conclude that "therefore the data performance should be measured more closely."

This is a story I've told a number of times in person, and it has held its relevance for over 15 years.


Thu Nov 9 01:22:00 2006

There Are No Mysteries

I believe (and argue) that there are no mysteries in computing. If "something strange" is happening - especially to a system you've built - it is always possible to identify the problem. To put it another way, claiming that a problem is mysterious is simply an admission of a personal failure to be curious enough. (I distinguish that from "not having enough resources", because when people say something is mysterious, they don't mean that they're choosing not to look; they mean that looking is pointless - and, of course, they're wrong :-)

I got my start in computing with a book on 8085 Hex Machine Code, in an era when Radio Shack still carried chips and people still built things with them. When the TRS-80 came out, it was made of chips like that, and you could get easily comprehensible schematics. I learned to solder and use a voltmeter and an oscilloscope before I learned to code.

My first "real" job began with porting a device driver for a 24M hard drive from 8085 to Z80, on an upgraded IMSAI. Serial terminals, Diablo daisy wheel printing terminal, the front panel later made popular in War Games. We wired a lot of our own cables. One particular memory is of having wired a serial cable through the wall to the Diablo in the other room, trying to print from WordStar, and having the whole system hang.

Today, the response would probably be to pound on the keyboard a bit and reboot something. Then, it was more arcane, but vastly more effective:

That was just a local patch - I suspect we fixed the cable after that. Note that these weren't experienced old hands - this was two high school kids in a summer job - but we knew what was going on at a very low level, and thought this was entirely reasonable. (Note that we weren't only bit-banging hackers; we also wrote dBase apps, sold software, and did telecom planning...)

One lesson you could take from this is "wow, glad that's someone else's problem this century." It is obviously useful to be able to sweep all of that under the rug of Abstraction - right up to the point where it fails to work... I think a more powerful lesson is that all of the interesting debugging is down a layer from where you're actually trying to work - abstractions are great when things work, and have to be the first thing out the window when they don't (this is also a reason that proprietary layers hurt, always.)

Another lesson is that we can find out what's going on, even in the face of obvious complexity. There are more abstractions (more layers and more breadth), but they can still be pried open. Thus, There Are No Mysteries.

(There are certainly cases where you've lost/discarded too much information to find out what happened. And this is limited in scope to computing, though I personally believe it's much more broadly applicable - as an approach, a philosophy, a mental model - "mystery means you haven't bothered to look hard enough". I think it's worthwhile to take that attitude.)