Making an effort to keep rambling about the site isolated from the actually interesting bits...
The entire purpose of this rewrite (and fundamentally, of the website itself) is to give me a reasonably low-friction path for publishing that still lets me tweak the things I want to tweak. In the past two weeks, the snow has melted, the wildlife has started to wake up, and the only posts I've put out are hidden here in the meta-blog about the site itself. "That's not right..."
This Friday I was reminded that I ought to post to my gadget blog (I literally have a calendar reminder on my phone telling me to blog there, once a week...) especially since the most recent post was something I wrote in January as much as a tracer bullet for the software, as for actual information. Still, it did only appear on the net two weeks ago, so arguably I'm not that far behind; still, I have a significant backlog of gadgets to write about, and more arriving every day (the Verve USB sensor box showed up this morning) so if I don't start getting things written, I'm never going to catch up :-)
(Why do I even need to "catch up"? Isn't it many writers' dream to have a vast field of things to write about, and to never have to fear writer's block? Sure, but many of these gadget reports are a lot less interesting if they aren't timely - a Kickstarter that's now real and maybe you can buy it directly, a new bit of tech that hasn't seen many other reviews - and while I'm not unwilling to write about useful everyday tools, it doesn't make sense to let them get in the way of writing about the shiny new gadgets.)
It'll take a while for actually writing to become more of a habit than tweaking the Python code that generates the site, so let's at least see if I can manage to do this two weeks in a row. At least this post only took about 10 minutes to get typed and pushed...
There was one "final" delay to make the new infrastructure support the
little "single serving" web sites I've collected over the years
(rather than moving the main
www pointer and then moving the rest of
them back to a different pointer, and there weren't that many files
involved anyway, nor (with one exception) much history beyond "created
site in 2008", so there wasn't much history to worry about.) After I
got that working, I flipped the switch on the morning of March 11, 2014.
Not yet officially spring, but a local high of 60F anyway which is
just as good :-)
Of course, I immediately found half a dozen failures, mostly by watching the access and error logs; my old nagaina monitoring system pointed out that I'd mishandled some links to old photo galleries, and I'm still sorting through some of that (mostly with the intent of constructing some new photo galleries) so I'm not as concerned with precise preservation there (much of it was acled off previously anyhow.)
Overall, nothing severe enough to even consider rolling back. Other than finally releasing months of new bloggery (like this entire section), things shouldn't look that different: I haven't added any decorative photography or even much color, but the structure is in place to try some new things. At the very least, The Rants Will Flow!
Spring is now two weeks away (still gloomy, still below 30F, yard still covered in snow, no crocuses... but Spring is still around the corner.)
A few days back I actually hit "feature complete" in the sense that as far as I could tell everything I needed to do in order to Flip The Switch was implemented, and that while there is still a fair list of things that Would Be Interesting and Would Be Good Improvements, none of them truly block Being As Good As The Existing One combined with Actually Letting Five Months Of Backlog Out The Door... but I still had a lingering Fear Of Screwing It Up. Does this really all work? Will it look ok?
Well, it's never going to look good if I'm the only one working on the visual appearance, but at a glance it's at least cleaned up a bit. The real concern is having things that could be called Broken or otherwise not as I intended. What's the standard software engineering way of increasing confidence? Well, from the outside, you'd be forgiven for thinking "delay actually shipping anything" was the Best Practice :-) but what I'm getting at is Testing. Pick a few things that I'm worried about, and implement tests for them. (And because This Is Me, we're talking fully automated tests...)
The first step was dependency tracking. Not because I'm trying (yet) to do this as an incremental build - even on the crufty slow machines with buckets of spinning rust that I use as servers, a full rebuild only takes 15 seconds, a full build to an empty directory takes 20 - but because it let me figure out what things in the output directory were spurious leftovers from a previous build (and should trigger a clean build) and what things in the source tree were getting ignored (usually by not being properly attributed in the dependency graph, but it did expose some actual bugs.)
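The bookkeeping described above boils down to two set differences. A minimal sketch, assuming the dependency graph is available as a mapping from each output path to the source paths it was built from; the `audit_build` name and that mapping shape are illustrative guesses, not the site generator's real interface:

```python
import os


def tree_files(root):
    """Return every file under root as a '/'-separated path relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            found.add(rel.replace(os.sep, "/"))
    return found


def audit_build(depgraph, source_root, output_root):
    """Compare a dependency graph against what is actually on disk.

    ASSUMPTION: depgraph maps output path -> iterable of source paths.
    Returns (spurious_outputs, ignored_sources).
    """
    expected_outputs = set(depgraph)
    used_sources = {src for srcs in depgraph.values() for src in srcs}
    # Files in the output tree that no rule claims: leftovers from an old build.
    spurious = tree_files(output_root) - expected_outputs
    # Files in the source tree that no output depends on: silently ignored.
    ignored = tree_files(source_root) - used_sources
    return spurious, ignored
```

A non-empty first set is the cue for a clean build; a non-empty second set means either a missing edge in the graph or a real bug.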
The second step was building the link graph - I did a
codes-well-with-others pass on the easily available ones;
linkchecker is nicely packaged
in Debian, under active development (yet already quite feature-rich)
but the default (fixed in the 9.0 release that went out this week)
was to fetch and check external links too, which is a good thing to
have in general, except that
linkchecker does have some nice features like the ability to report
output as a directly usable sitemap, which I will probably revisit
when 9.0 comes out.
It only took an hour to do a trivial walk from the top level
index.html of the output tree using lxml.html and record what paths
it saw, filtering out HTML "anchors" and links that were offsite, and
normalizing the rest to in-tree pathnames. It took very little longer
to match that up against the output side of the dependency checker,
and then (by hand) to check some of the "missing" files in
Google... leading me to conclude that a bunch of stuff is accessible
due to being included in RSS feeds, even though it's not actually
linked anywhere. Enough things were reachable to convince me that
the test worked and that the site was basically OK, and that more
significant linking is actually a content project, not a deployment
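The walk described above can be sketched roughly as follows. The post used lxml.html; this version swaps in the standard library's `html.parser` so it runs without extra packages, and the names `LinkCollector`, `in_tree_path` and `walk_site` are made up for illustration rather than taken from the real code:

```python
import os
import posixpath
from html.parser import HTMLParser
from urllib.parse import unquote, urlsplit


class LinkCollector(HTMLParser):
    """Collect every href/src attribute value seen in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def in_tree_path(page, link):
    """Map a link found on `page` to an in-tree path, or None to skip it."""
    parts = urlsplit(link)
    if parts.scheme or parts.netloc:
        return None  # offsite (or mailto: and friends)
    if not parts.path:
        return None  # a bare "#anchor" on the same page
    path = unquote(parts.path)
    if path.startswith("/"):
        target = path.lstrip("/")
    else:
        target = posixpath.join(posixpath.dirname(page), path)
    if path.endswith("/"):
        target = posixpath.join(target, "index.html")  # directory link
    target = posixpath.normpath(target)
    if target == ".." or target.startswith("../"):
        return None  # escapes the output tree
    return target


def walk_site(root, start="index.html"):
    """Return (reachable, broken) sets of in-tree paths, walking from start."""
    reachable, broken, todo = set(), set(), [start]
    while todo:
        page = todo.pop()
        if page in reachable or page in broken:
            continue
        full = os.path.join(root, page)
        if not os.path.isfile(full):
            broken.add(page)
            continue
        reachable.add(page)
        if not page.endswith((".html", ".htm")):
            continue  # only HTML pages are parsed for further links
        parser = LinkCollector()
        with open(full, encoding="utf-8", errors="replace") as handle:
            parser.feed(handle.read())
        for link in parser.links:
            target = in_tree_path(page, link)
            if target is not None:
                todo.append(target)
    return reachable, broken
```

Subtracting the `reachable` set from the list of files the build produced gives the "missing" files that then need checking by hand.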
So these confidence-building steps have gone in, they've built confidence appropriately, and the only reason I haven't switched DNS over is that I hang out with enough operations people that I Accept As Truth that I shouldn't do this right before bedtime :-)
Flipping the switch tomorrow...
This week, I've gotten a plausible burndown list together, which is always as useful for the things not on it as for the things that are ("yes I know that part would be interesting to hack on and we know clever things about doing it, but it's not in the way of the deployment so back off of it for now.") One or two things may turn out to be dependencies (getting RSS generation right might require actually implementing the README.md parser so I have the classic material to start with, although it's also possible that I can treat those as legacy components for now just to get this out the door, since I have generic blogging working.)
Since this is a relatively small static site on a tiny network, the
constraints on web server choice aren't weighted the same, and in
particular "sane configuration" takes precedence over performance
(because none of them are that bad at just throwing files over the
wire) once the basic feature checklist is satisfied. This led me to
satisfy my decade of frustration with Apache configuration by tossing
it out the window and using nginx instead. The
configuration is still arcane, of course, it's just not syntactically
horrifying, and it does start with more modern assumptions about what
things should be easy to express. It's also nearly as pathologically
undebuggable as Apache configuration is, and could really use a higher
level configuration language that generates the one it includes - or
at least a macro language - I could easily take the hundred lines of
config I have now and drop it down to fewer than ten descriptive
lines just using
cpp but it's not really worth doing that here and
I also need to start using a simple problem-tracker to remind me of things like that, but a few text files are working out well enough, and I really shouldn't let that get in the way of deployment :-)
A codes-well-with-others shoutout to moreutils; a pattern of roughly
chronic flock $command 2>&1 | ifne mail -s "failed" $me
makes a decent post-update hook for triggering a deployment command on a push to a particular repo.
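For anyone reading without those tools to hand, the behaviour of that pipeline can be approximated in Python: take an exclusive lock, run the command with stderr folded into stdout, stay silent on success, and only produce a report when the command fails. This is a sketch of the idea rather than the hook actually in use; `run_locked_quietly` and the lock file path are hypothetical names.

```python
import fcntl
import subprocess


def run_locked_quietly(command, lock_path):
    """Run command under an exclusive lock.

    Returns None on success (silence, like chronic) or a failure report
    containing the exit status and the captured output.
    """
    with open(lock_path, "w") as lock:
        # Blocks until any other deployment holding the lock has finished.
        fcntl.flock(lock, fcntl.LOCK_EX)
        result = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # same effect as 2>&1
            text=True,
        )
    # Closing the file above released the lock.
    if result.returncode == 0:
        return None
    return "exit status %d\n%s" % (result.returncode, result.stdout)
```

A caller would mail the returned report only when it is not None, which plays the role that `ifne` plays in the shell version.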
While "ThokTober" sounded good when I started, we're well into February and Spring is only 40 days off. Not that that makes the project unsuccessful per se - I understand the problem much more deeply, which was certainly a goal, and there was never an external deadline, just a self-inflicted one.
The deepest bit of understanding was that attempting to shoehorn the
"weight of history" (that is, the legacy thok.org content) into an
existing system was a misguided effort - the legacy content was some
2300 files, the new content (that could easily be adapted to whatever
system I chose) was only 45 files -
"And we decided that one big pile is better than two little piles, and rather than bring that one up we decided to throw ours down." -
and that the optimal deployment tool for the legacy content was simply
rsync. This reduced the complexity of the rest of the problem a
great deal, as now it was simply a matter of recognizing files that
needed markdown processing, and recognizing files of more interesting
type ("blog post", "blog", "photoessay"...) and doing something with
Once I got past the problem of stuffing the large pile into a third party blogging system, what about the small pile? All of the static-blog tools I looked at were a little too opinionated (which is great when starting from scratch) and the static-site-generator tools were not opinionated enough. I finally concluded that in order to figure out what features I actually wanted, I'd need to sit down and implement them from scratch, and discover what aspects of "blog" features were artifacts of "what is easy to write" and what things actually mattered. (This also, admittedly, let me side-step the conclusion that ikiwiki, not nikola, was closest in behaviour to what I was looking for, even though it was a pile of perl with character set issues...)
Having concluded that the big opinion was that "text should be in markdown", pulling little things together was a lot simpler, and I could make visible progress a feature at a time. Other useful conclusions include:
rsync are entirely sufficient for 0.1
Tags metadata that gets pulled out to a single tag-index.html demonstrates the concept and is a good enough implementation of article keywords; adding other metadata like "twitter summary" can follow in time, but I don't need to start with it.
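To make the tag-index idea concrete, here is a minimal sketch. It assumes tags live in a "Tags: a, b" line in a block of "Key: value" lines at the top of each markdown file; the post doesn't say how the metadata is really stored, so that format and all the function names here are guesses for illustration only.

```python
import html
from collections import defaultdict


def parse_tags(text):
    """Return the tags from a leading 'Tags: a, b' header line, if any.

    ASSUMPTION: metadata is 'Key: value' lines before the first blank line.
    """
    for line in text.splitlines():
        if not line.strip():
            break  # a blank line ends the (assumed) metadata header
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "tags":
            return [tag.strip() for tag in value.split(",") if tag.strip()]
    return []


def build_tag_index(posts):
    """posts maps page URL -> markdown source; returns tag -> sorted URLs."""
    index = defaultdict(list)
    for url, text in posts.items():
        for tag in parse_tags(text):
            index[tag].append(url)
    return {tag: sorted(urls) for tag, urls in sorted(index.items())}


def render_tag_index(index):
    """Render the index as a minimal HTML list for tag-index.html."""
    parts = ["<ul>"]
    for tag, urls in index.items():
        links = ", ".join(
            '<a href="%s">%s</a>' % (html.escape(u, quote=True), html.escape(u))
            for u in urls
        )
        parts.append("<li>%s: %s</li>" % (html.escape(tag), links))
    parts.append("</ul>")
    return "\n".join(parts)
```

Other metadata keys (the "twitter summary" mentioned above, say) would be further branches in the same header scan.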
As a nod to codes-well-with-others, I've built most of the recent pieces atop two new third party libraries:
lxml.html is quite nice for mucking about with generated html; since HTML is horribly fragile but no tools report legitimate diagnostics for it, sticking to carefully constructed operations on the element tree seems like the only sane way to perform operations like "add stylesheet" or "promote the first head title element" in arbitrary contexts.
All that said, it really does look like I could do a first deployment this week, though perhaps the start of Spring is ultimately more realistic...
The "ThokTober" effort is showing glorious levels of scope creep and schedule slip. This post is still only going to the "pre-production" version of the site, so it doesn't count towards the live publication milestone, but in the mean time I've
foo~ and their datestamps, that counts as just about the least plausible amount of history to attempt to preserve) and the individually versioned files (ad-hoc RCS use on files that were served up directly, rather than through cgiweb) and fed them all through cvs2git producing an epic "blob" that serves as the starting point for the new
git rebase to glue that 18-year "blob" on to the short-term ikiwiki prototype such that they look like a continuous stream of "history"...
All in all I've made a lot of progress, it's just that the direction of that progress hasn't been towards the original goal of "publishing my writing again." Still, much was learned, and this post should contribute to confirming that the machinery still works after the above bits of git churn...
THOK.ORG finally ground to a halt - the last post here that wasn't
either part of the clawback of my blogspot blog or meta-bloggery about
blog tool making was in 2010 (a rant that coined the term
career_limiting_memcpy which sadly failed to catch on.) It's not
that I haven't had time to write, or things to write (I churned out two
other tumblr blogs in between, for a while, solely because I could
write in markdown and post from my phone both of which were far
less friction than the state I'd gotten stuck in with my own code.)
All of this finally came to a head in October 2013, which I christened
"ThokTober" with the intent of spending a month picking up some
existing blogging tool and running with it. As integration projects
are wont to do, this dragged on to early December, at which point I
settled on ikiwiki (and I do mean "settled" - I really thought I'd
end up with nikola what with python vs. perl, vastly more plugins
and features, and bigger dev community... and I may yet go back to it,
but ikiwiki ended up with far less friction in terms of actually
getting sites up and "good enough".)
There will be some loose ends for a while (part of the transition involves semi-automatically converting my ad-hoc markup to proper Markdown, and I still don't really know what I'm going to do with the existing RSS) but if I get one new non-meta post up before Christmas I'll finally declare victory :)