README: 2008 Apr 06 - 2008 May 17

Description

I got lucky with some pictures of a hawk in my front yard http://www.flickr.com/photos/eichin/2390767918/ and captured 30 seconds of shaky handheld 640x480x15fps video as well. The video is actually pretty good in the stable bits, and the event itself was interesting, so I thought I might try some "manual" image stabilization...

Sat May 17 20:51:00 2008

After watching a few youtube videos, and realizing just how much video is uploaded off of handheld cameras that could have been shot with a tripod (the subject isn't actually moving), I'm starting to wonder if there's interest in a version of the stabilization code packaged up for end users. It would have to be no harder to use than uploading to youtube is now; perhaps an alternate upload site, or a desktop app (both of which have challenges.)

The first step is to find out if there are any users for it. I'll ask friends, but I probably won't go as far as opening up comments on here; that'll just add another excuse to delay :-)

The second step is to bundle it all up - find a way to grab mjpeg frames directly instead of splitting into files, and write output frames the same way. (As with the current code, the panorama feature comes along for free...) It looks like linking against mplayer might be difficult - it links against 86 shared libraries! That suggests continuing to run it as a separate process (which then makes a cross platform version harder...)
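A minimal sketch of what "grabbing mjpeg frames directly" could look like, assuming the frames arrive as a plain concatenation of JPEG images on some file-like object (a pipe from a separate process, say). The function name split_mjpeg is hypothetical, and nothing here assumes any particular mplayer option. It splits on the JPEG start-of-image and end-of-image markers, which is naive: a JPEG with an embedded thumbnail would confuse it.

```python
import io

SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker

def split_mjpeg(stream, chunk_size=65536):
    """Yield one bytes object per JPEG found in a binary stream (hypothetical helper)."""
    buf = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf += chunk
        while True:
            start = buf.find(SOI)
            if start < 0:
                # No frame start yet; keep the last byte in case a marker
                # is split across two chunks.
                buf = buf[-1:]
                break
            end = buf.find(EOI, start + 2)
            if end < 0:
                # Frame is incomplete; wait for more data.
                buf = buf[start:]
                break
            yield buf[start:end + 2]
            buf = buf[end + 2:]

# Synthetic bytes standing in for a real stream of two frames.
demo = io.BytesIO(b"\xff\xd8first\xff\xd9" + b"\xff\xd8second\xff\xd9")
demo_frames = list(split_mjpeg(demo))
```

Each yielded frame could then be decoded in memory instead of being written to a tmp directory first.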

Tue Apr 15 02:42:00 2008

Some further tweaking and a report_deltas.py tool showed that at least some of the bad automatic matches were simply due to the search space not being large enough; even with the auto bumping up when a movement neared the edge of the space, there were larger jumps in the video. Actually using a +/-40 pixel window takes over two minutes per frame, which would mean a 24 hour run for this particular video - yet many of the deltas are small (there are long runs where the y-axis drift is less than 2 pixels in either direction; apparently I can pan sideways quite smoothly with help from the camera's built-in image stabilization :-)
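To make the cost concrete: an exhaustive +/-40 pixel search tries (2*40+1)^2 = 6561 candidate offsets per frame pair. The sketch below shows one way a small window that grows only when the best match lands on its edge might look. It assumes "auto bumping up" means enlarging the window, it uses mean absolute difference as a stand-in score (not the how_much_white metric, whose definition isn't given here), and the names sad_score and best_offset are hypothetical. Images are plain lists of rows of grayscale values.

```python
def sad_score(a, b, dx, dy):
    """Mean absolute difference over the overlap of a and b, with b shifted
    by (dx, dy). Lower is better. A stand-in metric, not the original one."""
    h, w = len(a), len(a[0])
    total = count = 0
    for y in range(max(0, dy), min(h, h + dy)):
        for x in range(max(0, dx), min(w, w + dx)):
            total += abs(a[y][x] - b[y - dy][x - dx])
            count += 1
    return total / count if count else float("inf")

def best_offset(a, b, radius=2, max_radius=40, score=sad_score):
    """Search a small window first; double it whenever the best candidate
    sits on the window edge, up to max_radius."""
    while True:
        candidates = [(score(a, b, dx, dy), dx, dy)
                      for dy in range(-radius, radius + 1)
                      for dx in range(-radius, radius + 1)]
        best, dx, dy = min(candidates)
        on_edge = max(abs(dx), abs(dy)) == radius
        if not on_edge or radius >= max_radius:
            return (dx, dy), best, radius
        radius = min(radius * 2, max_radius)
```

With most deltas under a few pixels, the common case stays at (2*2+1)^2 = 25 candidates, and only the big jumps pay for the wide search.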

I could probably come up with a better range of first guesses for the search, but I'd need some indication of when an image is "really" bright enough. For the simple how_much_white metric, successful matches score anywhere from 2500 to 8500 - and the first bad one is... below 600. Huh. The next bad one is around 2k, then 1300... actually, it looks like I do have a good metric, at least on the down side; I'll need to do another run to see how the chosen points compare to the "next best" ones to see if it works as a real discriminator, though.
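A rough sketch of how those two checks (an absolute floor, and the gap to the "next best" candidate) might be combined to flag frames for a manual look. The score is treated as an opaque "higher is better" number. The floor of 2500 is taken from the range of good scores quoted above; the margin of 1.2 and the name classify_match are pure assumptions for illustration.

```python
def classify_match(scores, floor=2500, margin=1.2):
    """scores maps (dx, dy) -> score, higher is better.
    floor comes from the observed good range; margin is an untested guess."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    offset, best = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    too_dim = best < floor                                  # below every known good match
    ambiguous = runner_up > 0 and best / runner_up < margin  # winner barely wins
    return {"offset": offset, "best": best, "runner_up": runner_up,
            "needs_review": too_dim or ambiguous}
```

Whether the ratio to the runner-up actually separates good matches from bad ones is exactly what the extra run would have to show.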

Mon Apr 14 22:51:00 2008

An overnight run of the automatic processor got 500+ frames, but with a couple of notable glitches. In order to nail them down, I added labels to the viewer (it really should be a widget somewhere, but wx.lib.ogl.OpDraw(wx.lib.ogl.DRAWOP_DRAW_TEXT).Do() was easier.) I also added manual left and right panning, now that the image is more than 1600 pixels wide; I should make that automatic panning and not waste the keystrokes :-)

A dozen manual tweaks on the 500+ frames got me a quite nice 5794x839 image, which is about 2/3 of the full height (I filmed in a reversed S-curve, so this is the pan right, down, and left; the remaining images pan right again.) However, before I process the rest, I need to evaluate the automatic decision versus the human one for those pictures and find a workaround...

Sun Apr 13 16:35:00 2008

It recently clicked that the "base" image that cook_movie.py produces is simply a non-blended panorama. This led me to take a slow-pan video of a chunk of Great Meadows yesterday, about 750 frames in three slices; 10 minutes with full_movie.py let me turn the first 120 frames into a simple 1667 x 531 wide-angle picture (which isn't actually better than what I could take with the camera directly, though it covers a wider angle than I could otherwise get from that location.)

750 frames is about the limit of the "load all images into memory" approach; it ends up needing almost 2G... and I still have accuracy issues. The next steps are to try edge detection on the images (PIL, it turns out, includes ImageFilter.Kernel and a built-in ImageFilter.FIND_EDGES filter, which looks useful); while edge detection should help me do it by eye, I also want to try coming up with an "overlap score" to do simple machine adjustment.
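One possible shape for that "overlap score", sketched in plain Python on lists of grayscale rows so it runs without any imaging library. The 3x3 kernel is a standard Laplacian-style edge kernel, similar in spirit to what a built-in edge filter applies; on real images, img.filter(ImageFilter.FIND_EDGES) would do this step. The names find_edges and overlap_score and the threshold of 128 are assumptions.

```python
KERNEL = ((-1, -1, -1),
          (-1,  8, -1),
          (-1, -1, -1))

def find_edges(img):
    """Convolve with a 3x3 edge kernel, clamping to 0..255; border pixels stay 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = sum(KERNEL[j][i] * img[y + j - 1][x + i - 1]
                      for j in range(3) for i in range(3))
            out[y][x] = max(0, min(255, acc))
    return out

def overlap_score(edges_a, edges_b, dx, dy, threshold=128):
    """Count pixels where both edge maps are strong, with b shifted by (dx, dy).
    Higher means more edges line up."""
    h, w = len(edges_a), len(edges_a[0])
    hits = 0
    for y in range(max(0, dy), min(h, h + dy)):
        for x in range(max(0, dx), min(w, w + dx)):
            if edges_a[y][x] > threshold and edges_b[y - dy][x - dx] > threshold:
                hits += 1
    return hits
```

A raw count like this favors large overlaps, so comparing candidates with very different overlap areas would probably need some normalization.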

(Tried feeding all 750 frames to hugin; it crashed hard, so there's more than just NIH involved :-)

Bug fixes:

Footnotes:

Mon Apr 7 00:32:00 2008

A little more crunching produced "anchor" mode: if you hit *, the "before" picture is always the first frame (a more clever thing would be to make it the current frame - if I ever need that.) Another twenty minute tweaking pass produced a bunch of new offsets; 45 seconds of cooking, 10 seconds in mencoder, and I'm done for the night :-)

time python cook_movie.py anchortmp newanchortmp

mencoder "mf://newanchortmp/0*.jpg" -mf fps=15 -o anchoroutput.avi -ovc lavc -lavcopts vcodec=mjpeg

Sun Apr 6 23:47:00 2008

Now that I'm reasonably happy with the frame-to-frame alignment (though looking at the full movie I notice a fair bit of vertical drift, and I should find a way to compensate for that... perhaps a mode where I keep a key frame around and sync to that, instead of pairwise...) it's time to cook them back into a movie. This simply means coming up with a bounding box (looks like 789x896 for the 640x480 input images) and pasting in each image.
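The bounding box step can be sketched as below, assuming each frame has an (x, y) offset relative to the first frame and that all frames are the same size. The function name bounding_box is hypothetical, and the sample offsets are simply the ones printed by overlay.py elsewhere in this log, not the full set.

```python
def bounding_box(offsets, frame_size):
    """Return (canvas_size, paste_positions) for frames placed at the given offsets.
    Positions are shifted so the smallest offset lands at (0, 0)."""
    w, h = frame_size
    xs = [dx for dx, _ in offsets]
    ys = [dy for _, dy in offsets]
    min_x, min_y = min(xs), min(ys)
    canvas = (max(xs) - min_x + w, max(ys) - min_y + h)
    positions = [(dx - min_x, dy - min_y) for dx, dy in offsets]
    return canvas, positions

canvas, positions = bounding_box([(0, 0), (3, 22), (66, -79)], (640, 480))
# canvas is (706, 581); the frame with the most negative y offset pastes at y = 0
```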

I had a theory that since the frames were all properly registered, I could just paste them all together and use the overlap as a surround frame. Great in theory, but it turned out to show that I actually had a fair amount of vertical drift in my corrections, which meant that the background wasn't very good.

A weaker version - starting with a black frame, but not clearing previous ones - actually looks pretty good in spite of the drift, so I'm going to go with that one for now (so I have a result while I work on more ways to tweak the inputs :-) cook_movie.py does the weaker version, and has the full-background version in comments.
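The difference between clearing and not clearing the canvas is small in code. This is a sketch under assumptions, not the contents of cook_movie.py: frames are lists of pixel rows, positions are already shifted into canvas coordinates, and the name cook_frames is hypothetical.

```python
def cook_frames(frames, positions, canvas_size, clear_each_frame=False):
    """Yield one output canvas per input frame. With clear_each_frame=False the
    canvas starts black and earlier frames linger behind the current one."""
    cw, ch = canvas_size
    canvas = [[0] * cw for _ in range(ch)]
    for frame, (px, py) in zip(frames, positions):
        if clear_each_frame:
            canvas = [[0] * cw for _ in range(ch)]
        for y, row in enumerate(frame):
            canvas[py + y][px:px + len(row)] = row  # paste one row of the frame
        yield [r[:] for r in canvas]                # copy, so later pastes don't alter it

tiny = [[[1, 1]], [[2, 2]]]  # two 2x1 "frames"
trail = list(cook_frames(tiny, [(0, 0), (2, 0)], (4, 1)))
# trail[1] still shows the first frame next to the second: [[1, 1, 2, 2]]
```

The full-background variant would instead paste every frame once up front and start each output frame from a copy of that result.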

time python cook_movie.py tmp newtmp

mencoder "mf://newtmp/0*.jpg" -mf fps=15 -o output.avi -ovc lavc -lavcopts vcodec=mjpeg

If you want to look at them:

Sun Apr 6 23:30:00 2008

After completing all 491 frames, I noticed a missed jump around frame 45. Since it makes sense for corrections to "crack the whip" as it were, I added a "danger mode" - hit "^" and the screen goes red, and all changes apply to the current pair and every one forward of it. full_movie.py is the interactive tool.

Of course, since that's the effect that's implied by the carry-forward of corrections when they're still uninitialized, maybe it's not really a separate mode but should work that way all the time. That can wait until I try this on the next bit of shaky video.
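As a sketch of the idea, assuming the offsets are kept as one absolute (x, y) per frame (how full_movie.py actually stores them isn't shown here), the two behaviours differ only in how far the correction reaches. The name nudge is hypothetical.

```python
def nudge(offsets, index, dx, dy, carry_forward=False):
    """Return a new offset list with (dx, dy) added at index and, when
    carry_forward is set, at every later frame as well ("danger mode")."""
    end = len(offsets) if carry_forward else index + 1
    return [(x + dx, y + dy) if index <= i < end else (x, y)
            for i, (x, y) in enumerate(offsets)]

offsets = [(0, 0), (3, 22), (5, 20), (6, 19)]
single = nudge(offsets, 1, 1, -2)
whip = nudge(offsets, 1, 1, -2, carry_forward=True)
```

If the offsets were stored as frame-to-frame deltas instead, every correction would carry forward automatically, which is one way to read the "maybe it should work that way all the time" thought.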

While making the corrections, I had an odd experience - they were initially tedious (gross changes are obvious, getting the last few pixels is hard because the detail points are already overlapping) but after about 50 of them, suddenly felt much smoother - just lean on the arrows until the image "clicks." Surprisingly satisfying, too; clearly the visual part of my brain "got it" and took over the hard work :-) Regardless, there'd probably be value in adding a FatBits mode, to magnify the image around the chosen alignment target; alternatively, doing some consistent sort of edge detection on the whole image might help, since lining up fine lines should be easier too.

Footnotes:

Sun Apr 6 19:17:00 2008

Today's milestones:

I've completed 130/491 frames. Next steps will be

Sun Apr 6 03:55:00 2008

Step 2, figure out how to "visually" line them up.

After 3 hours of "quality time" with demo.py from wx2.8-examples, and a mix of help and lies from the web, I now have a crude overlay.py that displays a pair of images on top of each other, with alpha blending, and lets me move them with the arrow keys (mice are useless for fine positioning anyhow, so since I want single pixel tuning I'll just start there.) So, the arrow keys move the second image, space bar flips which one gets drawn on top, q quits and prints out the resulting pixel offset:

python overlay.py tmp/00000001.jpg tmp/00000005.jpg

[(0, 0), (3, 22)]

python overlay.py tmp/00000001.jpg tmp/00000305.jpg

[(0, 0), (66, -79)]
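The blending itself, stripped of the wx display code, might look like the sketch below. It assumes grayscale images as lists of rows and a fixed 50% alpha; the name blend_overlay is hypothetical, and overlay.py presumably leaves the actual drawing to wx rather than doing this per pixel.

```python
def blend_overlay(base, moving, offset, alpha=0.5):
    """Draw `moving` over `base` at `offset` (dx, dy), mixing the two where they
    overlap: out = alpha * moving + (1 - alpha) * base."""
    dx, dy = offset
    h, w = len(base), len(base[0])
    out = [row[:] for row in base]
    for y, row in enumerate(moving):
        for x, value in enumerate(row):
            cy, cx = y + dy, x + dx
            if 0 <= cy < h and 0 <= cx < w:
                out[cy][cx] = round(alpha * value + (1 - alpha) * base[cy][cx])
    return out

blended = blend_overlay([[100, 100, 100]], [[200, 200, 200]], (1, 0))
# blended is [[100, 150, 150]]: the leftmost pixel shows only the base image
```

When the two images are well aligned, the blend looks like a single sharp picture; a misalignment shows up as doubled edges.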

It even moves them smoothly. Next step: sleep, and then record deltas with each image, so I can walk through all 492 frames pairwise and tweak them as I go; after that, building a replacement set of non-shaky jpgs with wide borders, and making a movie out of it, will complete the project...
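Turning per-pair deltas into a position for every frame is just a running sum. A minimal sketch, assuming deltas[i] is the shift of frame i+1 relative to frame i; the name absolute_offsets and the sample deltas are made up for illustration, and the `initial` argument needs Python 3.8 or later.

```python
from itertools import accumulate

def absolute_offsets(deltas):
    """Running sum of pairwise (dx, dy) deltas; frame 0 sits at (0, 0)."""
    return list(accumulate(deltas,
                           lambda acc, d: (acc[0] + d[0], acc[1] + d[1]),
                           initial=(0, 0)))

positions = absolute_offsets([(3, 22), (-1, -2), (0, 1)])
# positions == [(0, 0), (3, 22), (2, 20), (2, 21)]
```

The flip side of a running sum is that a one pixel error in any pair shifts every later frame, so small pairwise mistakes can add up to visible drift over hundreds of frames.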

Sun Apr 6 00:52:00 2008

Step one, convert the jittery AVI into a pile of jpegs.

mplayer -vo jpeg:outdir=tmp p4051180.avi

Note that xzgv and keyboard autorepeat are actually an OK video player, at this tech level :-)