Amazon S3 counts as a gadget, right? :-) I've been using it professionally for a while, and of course many of the services we take for granted until
us-east-1 goes down use it too. Turns out that you can hook it into a homebrew website without very much work...
The other day I traced a period of terrible performance (8s network latency getting out of the house) to a visit from
Googlebot-Video/1.0 fetching an old
AVI file from a post-hoc image stabilization project (now made mostly redundant by YouTube's built-in stabilization feature). The file was about 50M, and anyone interested in the project really wants the less-compressed original, so shoving it to YouTube really doesn't help... but it turns out that that's tiny by Amazon S3 standards, and the free tier covers it just fine.
There was a surprisingly small set of steps; I'm posting them here with the actual domains involved, since they're visible and public anyway; you just need to convert them to your own needs...
avi.thok.org although something more generic like
s3.thok.org would have been a common choice. Do this first, because the bucket namespace is global and isn't checked against DNS registration at all, so there's a very faint chance someone already has a bucket of that name; at this stage, if you find a collision you can just pick a different name, like
git clone the github version and run it from the checkout - the one in ubuntu doesn't actually handle puts with redirects.)
s3cmd --configure and get the Access and Secret keys from the console under "security"; don't bother to configure encryption or https because these are files that are already available by http, you don't want to deal with certificates, and you'll check the md5sums later.
s3cmd put --no-encrypt kicx1440.avi s3://avi.thok.org/me/publish/europython/day2/kicx1440.avi works just fine, without having to do anything about
s3cmd setacl --acl-public s3://avi.thok.org/me/publish/europython/day2/kicx1440.avi makes that single file public. At this point, there's a long convoluted URL that will fetch this file, and you could stop here and just change the html that points to it, but let's handle this cleanly...
avi IN CNAME s3.amazonaws.com. Carlton Bale gets credit for having the first google hit that actually said this would work. Once you've pushed this through,
curl -L -v -I http://avi.thok.org/me/publish/europython/day2/kicx1440.avi works - note carefully, the
-I gets curl to do a HEAD request (
-H was already taken?) so you get back headers, not 100m of video. You should see the
Location header taking you over to S3, and then a convincing
ETag (md5sum of the file, in this particular case) and
RewriteRule ^/(me/.*\.avi)$ http://avi.thok.org/$1 [R,L] To pick this apart:
RewriteRule is the apache swiss-army-knife of URL mangling.
The pattern is anchored (^ for start, $ for end) and grabs everything after the leading slash (thus the slash is outside the grouping parentheses). Within this part of the path, it has to start with
me/ and end with
.avi but can have anything at all in between; if we wanted literally all AVI files, we'd drop the
me/ part, but I have some small ones elsewhere on the site that I didn't want to bother hunting down and uploading.
avi.thok.org to point to the
CNAME we set up above,
$1 is the first set of parentheses in the match (so,
R says to make it a redirect (and because our result starts with
http it automatically becomes an "external" redirect, in this case a 302, i.e. "don't try to fetch this url, just tell the client to go away and find it themselves"). You can't get theyah from heah, but you can get there from over there... the
L is for "last" and just says to stop trying and don't do any more rewriting on this particular result.
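To see which paths that pattern actually catches before reloading anything, here is a minimal sketch that imitates the rule with sed. The function name rewrite is made up for illustration, and sed's extended regexes are only standing in for Apache's matcher (an assumption that holds for a pattern this simple); it prints the redirect target for a matching path and nothing otherwise.

```shell
# Hypothetical helper, not part of Apache: imitate the RewriteRule with sed.
# $1 is a URL path as the server sees it, leading slash included.
rewrite() {
    # -n plus the trailing p prints only when the substitution matched;
    # \1 plays the role of $1 in the RewriteRule.
    printf '%s\n' "$1" | sed -nE 's#^/(me/.*\.avi)$#http://avi.thok.org/\1#p'
}

# Example calls (safe to run, no network involved):
rewrite /me/publish/europython/day2/kicx1440.avi   # matches, prints the avi.thok.org URL
rewrite /images/small.avi                          # not under me/, prints nothing
```

The second call shows why the me/ prefix matters: the small AVI files elsewhere on the site stay where they are.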
/etc/init.d/apache2 reload or however your system spells that. At this point, you can
curl -L -v -I http://www.thok.org/me/publish/europython/day2/kicx1440.avi (note that we're actually starting with the primary domain here, where the original problem started) and follow our
HTTP/1.1 302 Found and then amazon's
HTTP/1.1 307 Temporary Redirect and the bandwidth problem (remember the bandwidth problem? This song's about a bandwidth problem) is now gone.
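For the md5sum check promised back at the s3cmd --configure step, a rough sketch: feed the headers from that curl into a small function that compares the final ETag against the local file. The name etag_matches is invented here, md5sum is assumed to be the GNU coreutils one, and the comparison only means something when the ETag really is the file's md5sum (true in this particular case, not something to count on for every S3 object).

```shell
# Hypothetical helper: succeed if the last ETag in the headers on stdin
# equals the md5sum of the local file named in $1.
etag_matches() {
    local_md5=$(md5sum "$1" | cut -d' ' -f1)
    # Strip CRs and the quotes around the ETag value; with curl -L there is
    # one header block per hop, so keep only the last ETag seen.
    remote_etag=$(tr -d '\r"' | awk 'tolower($1) == "etag:" { print $2 }' | tail -n 1)
    [ -n "$remote_etag" ] && [ "$local_md5" = "$remote_etag" ]
}

# Usage (needs network, so left as a comment):
#   curl -s -L -I http://www.thok.org/me/publish/europython/day2/kicx1440.avi \
#       | etag_matches kicx1440.avi && echo "upload matches local copy"
```

Dropping -v and adding -s keeps curl's chatter out of the pipe, so only the response headers reach the function.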
[R=307] and make the first hop a Temporary Redirect as well. Not sure if that's correct, yet, but given that this all started with a search engine bot that wasn't aware of the human-readable "slow (home)" and "fast (MIT)" alternate links, it's worth looking into.
thok.org were more of a CMS, automatically noticing avi files and pushing them to amazon would be a good transparent trick. For a total of five files on a home website? Not actually worth the trouble, even if the logs say I have at least a month before the bot comes around again :-)