rants: Sat Nov 8 17:30:00 2014

Language-based Package Managers are Wrong

I have some obvious biases (having started as a Debian Developer back when that meant "you've done some interesting work so Bruce added you to the mailing list" [1]) but I'd like to make it clear up front that fundamentally, packaging is a good thing - there really is a very important abstraction of "This named piece of software is composed of these files with these properties" with patterns of interaction like "how to upgrade", "how to remove", "how to get automatically started at boot time" and beyond.

Unix got there early. Tarballs were never enough, but Solaris (among others) had "System V packages", and the early days of Linux included a distribution called BoGuS that didn't include the software at all - just a snippet of text that had a download command, a build command, and any renaming necessary to "fix" the output of make install. Baby steps, but you probably recognize the immediate descendant of BoGuS, called RPM...

When Java came along, it was clumsily cross-platform - "write once, run anywhere" in the marketing, "write once run screaming" to developers. The JVM served as a lowest-common-denominator interface to the operating system and didn't really fit Windows or Unix very well (what were you thinking, Sun? Couldn't even do unix domain sockets until a decade later?), which meant that a number of things that were traditionally part of the operating-system-as-platform got reinvented (usually at lower fidelity) on the inside of the JVM-as-platform. Of course, there was a lot of torque (and money) behind this, and "jar" files are just zip files with a little metadata, the namespace was already there, and entire infrastructures got built around deploying those jar files and connecting them up with things. (None of this improved the interaction with the operating system underneath; to this day manually tuning JVM memory usage is still a thing, and the "bootable jvm" model fell by the wayside.)

On an entirely different path, one of the secondary pillars of Perl's success was CPAN - because it provided search and taxonomy and hierarchy, so that you could find code without too much work, and you were guided somewhat naturally into publishing code in namespaces where people would discover it. (Search by itself is not enough; you still need to be able to guess that the solution you're looking for actually exists, and enough about what shape it is to come up with a useful query.) Having all of this in one place made it easier to propagate consistent practices for building and installing your code (the namespace alone helped a lot, and Makefile.PL helped the automation along). There wasn't a lot of pressure for deep integration here: most CPAN consumers were solving problems, not building systems, so being able to grab a particular package and drop it in place was enough. Debian picked up some automation around turning CPAN metadata into Debian metadata fairly early, in the form of tools that produced a "first draft" package layout that you could refine into something that was good enough. Inherently, CPAN wasn't going to have information on a number of distribution-specific concerns; that was information supplied by the installing sysadmin - or in the case of a decent packaging job, supplied by Debian itself, often in the form of distribution-wide tools used by packages of all forms (uid management, documentation management, cron and inetd management, etc.)

Note that at this point, packaging was a feature of technical-quality operating systems [2]... Solaris, Debian, Red Hat... while desktop operating systems like Windows [3] and Mac OS [4] left it out entirely, relying instead on third-party "installer" models with poorly managed central "registry" metadata and no real standards for installation-level cooperation among tools (see DLL Hell as one of the results of this approach). Eventually Visual Studio started including installer-builder tools, which improved the overall story but didn't help with combining separate software installs that worked together. Add to this the desire in the various language communities to include beginners on Windows and the path towards "doing it yourself" becomes quite justifiable - which led to Ruby "gems" and subsequently to Python "eggs". While "eggs" helped enable dropping a copy of a Python module into each of the various Python installations a Windows box would end up with, they interfered with packaging on Debian for quite a while.

Even if the existing language package managers improved enough to actually be good at the job they've set out to do (and I know good people putting serious work towards this goal) they would still be "wrong" - because the idea that an advanced software project would be entirely served by a single language has continued to fall over in the real world. Even the vaunted Lisp Machine had as a selling point [5] that it could do cross-language debugging... defense projects that are locked in to Ada95 still bring in Tcl or Python for tool-wrangling... even the 3D printing world has data formats that are complex enough that they themselves can reasonably be considered languages.

It is both entertaining and educational to recast your entire toolchain in one language (I've done big chunks of this in Perl and later Python, as have many others; I suspect joeyh is already halfway there in Haskell). But any programming language is expressive in a specific way, and if your problem is at all challenging there may be parts of it that are better expressed in other languages. The products I've worked on over the last decade went from Perl/shell/C++ to Python/C++ to Java/Python/C++ to Java/R/Python with a little Scala - in an environment with a lot of pressure to reduce complexity and especially to avoid engineer retraining or replacement. I suspect that most projects that survive in a single language simply aren't done yet, or don't have the kind of delivery or performance pressure that leads you to recognize the presence of domains of expressiveness in which an alternative language can be vastly more effective. Given that a single language really can't solve your entire interesting problem, how could a single-language packaging system ever be enough to deploy it?

  1. While this is expressed as a disclosure of bias, it's also a declaration that I really have been part of the problem for That Long, and a slight justification of the lack of cited references - this is a blog, not a paper, and this is far more about uncorking some accumulated experience to make a point than it is about proving one. (That said, I was around for all of this...) 

  2. Not mentioning the BSDs because they spent this period trailing about 5 years behind Debian in the packaging space; as such they didn't really influence the trajectory of history in this area. 

  3. Only in 2014 is Windows finally getting "modern" packaging systems, with OneGet and Chocolatey NuGet being integrated into PowerShell. 

  4. Apple basically followed a slightly more sophisticated "installer" path, keeping around manifests that were reminiscent of Solaris packages, using disk images (DMGs) instead of zip files or tarballs, and providing a central tool to manage permissions and locations. There was a flurry of third-party installers as well, for a number of years - if they hadn't died off naturally, the up-and-coming App Store model would have killed them for good.

  5. Literally a selling point - there were sites that bought Lisp Machines specifically to run C code on them, because the Lisp Machine C environment was so strict about otherwise undefined behaviour that getting it to compile at all resulted in fixing bugs you didn't (yet) know you had, far beyond what lint could do at the time, and easily as effective as valgrind is today.