Wednesday, June 27, 2007

Star Ship Dave

Occasionally movies, TV shows, and ads are filmed around where I work in NYC. I've never seen a "star", but I often see the food trailers and office trailers and once or twice I've seen locations with cameras and lights—no action yet, though.

For a few weeks last fall, an extras tent was set up on the street where I work. The costumes on the extras were really cool: modern, colorful versions of '30s suits and dresses.

Today I saw a catering setup. Nothing odd about that. What was odd was the sign hand-written on a whiteboard attached to the food truck: "Welcome, Star Ship Dave". I can't help chuckling every time I think of that title. Whether or not it's a working title, it's got me wondering. Who is Dave? Is it the ship?

Perhaps it's a Dave we know already: David Hasselhoff, Dave Chappelle, David Letterman, Dave Barry, or David Bowie. (Nah, not Bowie—some Davids are always "David", never "Dave".)

Perhaps I'll brave the 90-degree heat today and venture outside in a while. I want to see what the costumes look like.

Saturday, June 23, 2007

Erlang Boids Simulation Design

As one of my Erlang programming exercises, I decided to write the non-graphical part of a Boids simulation. I've written the same thing in a few different languages (Java, Ruby, Lisp, C++) before. Using Erlang would be interesting because of its support for message-based parallelism and concurrency.

I figured at first that each bird should run in its own process. In each of my other implementations there is also a flock object. Not necessarily a capital-O Object, but at least some central place where the birds are collected that can answer questions about them, such as the average velocity of the flock and its center of mass. Even with a multi-process Erlang implementation, it makes sense to have a flock process. Flocking behavior rules depend upon each bird's knowledge of the position and velocity of all the other birds in the flock (more realistically, all the other birds it can see). There are two choices: make each bird know about all the other birds, or have a flock object that can answer questions about all of them. Making each boid (bird) aware of all the others seems to be a bad design decision: it requires O(n²) communications overhead for every calculation, because each boid has to ask all the others for their positions and velocities all the time.

With a flock object, the same communications are O(n) because the flock can ask each boid for its position and velocity, and then respond with the aggregate information (average position, average velocity) when asked. The number of calculations is still O(n²) because each boid wants to know the average position and velocity excluding itself. It's the number of messages and the number of connections between processes that is O(n). To reduce the number of calculations to O(n), you could ignore the "excluding itself" qualification.

My first Erlang implementation resulted in a nasty deadlock. The flock periodically messaged each boid, telling it to move. When the boid received the move message, it updated its position and velocity. In order to do that, it had to message the flock, asking for information about the other boids. When that happened, the program hung because the flock not only wasn't listening for messages, it was still waiting for a response from that boid. Deadlock!

The next version had the flock asking each boid for its location information first, and then telling each boid to move, passing in the information it needed. This worked, but it was synchronous and definitely not elegant.

Finally, I hit upon a better design: the flock would request position information from each boid periodically, and separately each boid would periodically tell itself to move, asking the flock for the other boids' information when it needed it. There is a chance that a boid could have moved while other boids were relying upon its old position and velocity cached by the flock, but that's not really important. In real life, birds use old, slightly out-of-date information all the time: they blink, or are distracted by a bug, or are subject to the speed of light's limitations on information gathering.
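The shape of that design is roughly this (a compressed sketch with invented names and arbitrary timer intervals, tracking position only with a toy steering rule—not my actual boids code). The key point is that the flock answers queries from its cache and accumulates poll replies asynchronously, so it never blocks waiting on a boid:

```erlang
-module(boids_sketch).
-export([start/1, average/1]).

%% Spawn a flock process and N boids at positions 1.0 .. N.
start(N) ->
    Flock = spawn(fun() ->
                          Boids = [receive {boid, B} -> B end
                                   || _ <- lists:seq(1, N)],
                          self() ! poll,
                          flock_loop(Boids, 0.0, 0, [])
                  end),
    [spawn(fun() -> boid_init(Flock, float(I)) end) || I <- lists:seq(1, N)],
    Flock.

%% Ask the flock for its cached average position.
average(Flock) ->
    Flock ! {get_average, self()},
    receive {average, Avg} -> Avg end.

%% --- boid: ticks itself on its own timer ---
boid_init(Flock, Pos) ->
    Flock ! {boid, self()},
    erlang:send_after(50, self(), tick),
    boid_loop(Flock, Pos).

boid_loop(Flock, Pos) ->
    receive
        tick ->
            %% Read the flock's cache (possibly slightly stale -- that's OK).
            Avg = average(Flock),
            %% Toy steering rule: drift 10% of the way toward the average.
            NewPos = Pos + (Avg - Pos) * 0.1,
            erlang:send_after(50, self(), tick),
            boid_loop(Flock, NewPos);
        {get_state, From} ->
            From ! {state, self(), Pos},
            boid_loop(Flock, Pos)
    end.

%% --- flock: polls boids on its own timer, never blocks on a reply ---
flock_loop(Boids, CachedAvg, Pending, Acc) ->
    receive
        poll ->
            [B ! {get_state, self()} || B <- Boids],
            erlang:send_after(100, self(), poll),
            flock_loop(Boids, CachedAvg, length(Boids), []);
        {state, _B, Pos} when Pending =:= 1 ->
            %% Last reply of this round: recompute the cached average.
            Positions = [Pos | Acc],
            flock_loop(Boids, lists:sum(Positions) / length(Positions), 0, []);
        {state, _B, Pos} when Pending > 1 ->
            flock_loop(Boids, CachedAvg, Pending - 1, [Pos | Acc]);
        {state, _B, _Pos} ->
            flock_loop(Boids, CachedAvg, Pending, Acc);  % stale reply; drop
        {get_average, From} ->
            %% Answered from the cache: the flock never waits on a boid
            %% here, which is what breaks the original deadlock.
            From ! {average, CachedAvg},
            flock_loop(Boids, CachedAvg, Pending, Acc)
    end.
```

Because the boids start at 1.0 through N and only drift toward the mean, the flock's average position stays put while everything ticks away concurrently.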

Firefox: Of Thee I Sing, Thunderbird: Of Thee I Hum

or: Don't Fence Me In
or: Freedom, by George Michael

I've tried the new Safari Beta 3. It's really fast and refreshingly standards-compliant. I don't care for the brushed aluminum look, though supposedly that's going away.

There's no way I'll switch from Firefox, though. There are two main reasons: I rely on many useful Firefox extensions, and Firefox is still more cross-platform than Safari. Firefox also supports more search engines in its search text field. To be fair and balanced, I must point out that Firefox 2.X crashes or hangs on Mac OS X all the time. I put up with that because of all the other benefits.

That first reason is the biggest one, though. Firefox is more hackable, which makes it more useful. Here is the list of Firefox extensions I use. The Firefox Add-ons page is also a great resource.

"It's All Text!" lets you edit any textarea's text using your favorite editor. Mine's set to use Emacs, of course. You can set the editor in the preferences dialog box, but it might be easier to edit Firefox's user prefs JavaScript file and add

user_pref("extensions.itsalltext.editor",
          "/Applications/Emacs.app/Contents/MacOS/Emacs");

I should probably use emacsclient instead so the file is open in the currently running Emacs instead of in a new instance.
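If I do, the change would just be pointing the same pref at emacsclient. (The path below is a placeholder; it depends on where your Emacs build puts emacsclient, and Emacs needs to be running server-start for it to work.)

```javascript
user_pref("extensions.itsalltext.editor",
          "/usr/local/bin/emacsclient");
```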

Switching to another browser like Safari would force me to give up too many extensions that I have come to rely on. Google Browser Sync is probably the most useful to me, with Firebug and LiveHttpHeaders coming in a close second and third.

Safari does have some features that Firefox doesn't, of course. Sharing your bookmarks with those around you via Bonjour and integration with Apple's Address Book are the two I can think of. I don't want either. How often have I wanted to share all my bookmarks with others or see all of theirs? Never. Hell, I password-protect my Firefox so that if other people use it they won't see the form field values that Firefox saves for me.

There are some little things that bug me about Safari, too. For example, the keyboard shortcut to open/close the bookmarks sidebar is Option-Command-B and the command to switch to the Google search text field is Option-Command-F. I hate the Option-Command combo; it forces me to switch my whole left hand position away from the home row. In Firefox it's Command-B and Command-K for Mac OS X, Control-B and Control-K for Windows and Linux.

For similar reasons, I use Thunderbird instead of Apple's Mail. Not because of the extensions (I only use one: Enigmail), but because it is cross-platform, uses a standard format for email message storage, and uses an open format for its address book. I'm using Mac OS X now, but it's possible that I might want to switch to Linux (Ubuntu is quite nice) or be forced to switch to Windows at some point. I can use Thunderbird on those platforms, too.

When I must open Word documents, Excel spreadsheets, and PowerPoint presentations I use NeoOffice. On Windows, I use OpenOffice.org.

It's most important to me that my data is portable not only between operating systems but also between computers and applications. I refuse to get locked down into any proprietary format or have my data locked down on only one computer. I'm letting Google store my (encrypted) browser information, but I have my own copies of the data on more than one computer, inside Firefox itself.

Saturday, June 9, 2007

Feeling Left Behind

A few months ago, a study surfaced on Reddit that stated that the ability to digest milk in human adults is a relatively recent adaptation. It further posited that this ability was convincing proof of the theory of evolution: humans have evolved to be able to digest lactose later in life.

Years and years ago, I remember reading one of those fun kids' science articles that talked about genetic traits like the ability to roll your tongue or wiggle your ears. The one that stuck with me was that some people have a small, solitary hair between the last two knuckles of the ring fingers on either hand. I always imagined that having that hair proved that you were a genetic throwback.

I have that hair. I'm also slightly lactose intolerant. I feel like a caveman. Maybe I could get a job acting in a GEICO commercial.

At least I have some compensations: I can roll my tongue and wiggle my ears. So there, all you evolved monkeys!

Tuesday, June 5, 2007

Speedier Erlang

In the comments to Erlang Fractal Benchmark, Ulf Wiger noted that adding is_float guard clauses speeds up the code.

When I asked why on the Erlang email list, Ulf and others explained: when the compiler sees is_float(X) it knows that X must be a float. Instead of worrying about type checks, casting, and other inefficiencies, it can optimize the code that uses X.
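Here's a tiny illustration of the idea (my own example, not code from the thread): the two functions below compute the same thing, but in the guarded version the compiler knows the arguments are floats, so it can skip runtime type checks and keep the values unboxed inside the function. Whether it actually helps depends on your compiler version.

```erlang
%% Same arithmetic, with and without is_float/1 guards. The guard tells
%% the compiler that X and Y must be floats, so the body's multiplications
%% and addition can be compiled without per-operation type dispatch.
-module(float_guard).
-export([norm2/2, norm2_guarded/2]).

norm2(X, Y) ->
    X * X + Y * Y.

norm2_guarded(X, Y) when is_float(X), is_float(Y) ->
    X * X + Y * Y.
```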

Monday, June 4, 2007

Erlang Fractal Benchmark

While looking at a simple fractal benchmark that showed up on the programming Reddit, I noticed that there wasn't an Erlang version. Below is one I wrote last night. Erlang fares rather well. One thing mildly surprised me: it runs slightly faster in an Erlang shell within Emacs than in both Apple's Terminal and iTerm on Mac OS X. Within Emacs it runs in 1.09000 (runtime) 1.14100 (wall clock) seconds. In both Terminal and iTerm it runs in around 1.11000 (runtime) 1.16600 (wall clock) seconds. Perhaps screen I/O isn't as fast in the terminal programs.

Two caveats: first, these numbers were generated on my 2.33 GHz Intel MacBook Pro; I don't know what the original benchmarks used. Also, I only ran the code a handful of times and picked a "typical" time to report. A better test would have been to run the code hundreds or thousands of times and average the values.
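For what it's worth, that averaging is easy to write (a quick sketch; the module and function names are mine): run the code N times using the same statistics/1 calls the benchmark uses, and average the deltas.

```erlang
%% Run Fun() N times and return the average {runtime, wall_clock} in
%% seconds, measured the same way the benchmark measures a single run.
-module(bench).
-export([avg_time/2]).

avg_time(Fun, N) ->
    Times = [time_one(Fun) || _ <- lists:seq(1, N)],
    {Run, Wall} = lists:foldl(fun({R, W}, {RAcc, WAcc}) ->
                                      {RAcc + R, WAcc + W}
                              end, {0, 0}, Times),
    {Run / N / 1000.0, Wall / N / 1000.0}.

time_one(Fun) ->
    statistics(runtime),     % reset the counters; discard the old values
    statistics(wall_clock),
    Fun(),
    {_, R} = statistics(runtime),
    {_, W} = statistics(wall_clock),
    {R, W}.                  % both in milliseconds
```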

This post also says a bit about intuition vs. measuring. I discuss some code modifications and their expected and actual effects below.

Another thing to note: the author of the fractal benchmark page says that he hasn't bothered to optimize the code for each language he tested. I don't know if using lists:map/2 or extracting iter_value/5 and using guard clauses would disqualify this version in his opinion.

-module(fractal_benchmark).
-author("Jim Menard, jimm@io.com").
-export([run/0]).

-define(BAILOUT, 16).
-define(MAX_ITERATIONS, 1000).

%% Idea from http://www.timestretch.com/FractalBenchmark.html

run() ->
    io:format("Rendering~n"),
    statistics(runtime),
    statistics(wall_clock),
    lists:map(fun(Y) ->
                      io:format("~n"),
                      lists:map(fun(X) -> plot(X, Y) end, lists:seq(-39, 39))
              end,
              lists:seq(-39, 39)),
    io:format("~n"),
    {_, Time1} = statistics(runtime),
    {_, Time2} = statistics(wall_clock),
    Sec1 = Time1 / 1000.0,
    Sec2 = Time2 / 1000.0,
    io:format("Erlang Elapsed ~p (runtime) ~p (wall clock) seconds~n",
              [Sec1, Sec2]).

plot(X, Y) ->
    case iterate(X/40.0, Y/40.0) of
        0 ->
            io:format("*");
        _ ->
            io:format(" ")
    end.

iterate(X, Y) ->
    CR = Y - 0.5,
    CI = X,
    iter_value(CR, CI, 0.0, 0.0, 0).

iter_value(_, _, _, _, I) when I > ?MAX_ITERATIONS ->
    0;
iter_value(_, _, ZI, ZR, I) when ZI * ZI + ZR * ZR > ?BAILOUT ->
    I;
iter_value(CR, CI, ZI, ZR, I) ->
    Temp = ZR * ZI,
    ZR2 = ZR * ZR,
    ZI2 = ZI * ZI,
    ZRnew = ZR2 - ZI2 + CR,
    ZInew = Temp + Temp + CI,
    iter_value(CR, CI, ZInew, ZRnew, I + 1).

You might have noticed that ZI * ZI and ZR * ZR are calculated twice: once in the body of the last clause and once in the second guard clause. The guard clause has to be executed every time, which means that running the last, most frequently executed clause executes the multiplications twice. I tried pre-calculating those values and adding them as parameters, so the arg list is

iter_value(_CR, _CI, _ZI, _ZI2, _ZR, _ZR2, I)

Did it help? It did indeed. Execution time in Emacs went down to 0.890000 (runtime) 0.919000 (wall clock) seconds. Here are the modified versions of iterate and iter_value:

iterate(X, Y) ->
    CR = Y - 0.5,
    CI = X,
    iter_value(CR, CI, 0.0, 0.0, 0.0, 0.0, 0).

iter_value(_, _, _, _, _, _, I) when I > ?MAX_ITERATIONS ->
    0;
iter_value(_, _, _, ZI2, _, ZR2, I) when ZI2 + ZR2 > ?BAILOUT ->
    I;
iter_value(CR, CI, ZI, ZI2, ZR, ZR2, I) ->
    Temp = ZR * ZI,
    ZRnew = ZR2 - ZI2 + CR,
    ZInew = Temp + Temp + CI,
    iter_value(CR, CI, ZInew, ZInew * ZInew, ZRnew, ZRnew * ZRnew, I + 1).

One final thing I tried was commenting out the calls to io:format. In my experience, screen I/O usually slows things down quite a bit. (In the case statement in plot/2, I had to replace them with no-op expressions instead of simply commenting them out, since each branch still needs a body.) The result: execution time went down to 0.830000 (runtime) 0.839000 (wall clock) seconds within Emacs. In iTerm, execution time was only slightly slower than that. So the time decreased, but not nearly as much as it did when I removed the extra multiplications. I'm surprised. Is multiplication that expensive in Erlang, or is I/O well optimized, or was my instinct wrong? Come to think of it, io:format is only called once per coordinate; the multiplications happen thousands of times for each.

A distributed version of this algorithm is certainly possible (say, one process for every X,Y coordinate). My gut tells me that it would run slower because the calculation for an individual coordinate is relatively small and the message passing overhead and gathering and coordination of the results would outweigh the benefits. My intuition was wrong about the effects of I/O, though. I'd have to try it to make sure.
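If I were to try it, the per-coordinate spawning boils down to a parallel map (a sketch with invented names, not something I've benchmarked): spawn one process per element, tag each reply with its index, and reassemble the results in order. For the fractal, the list would be the {X, Y} coordinates and F would return the plot character.

```erlang
%% Minimal parallel map: one short-lived process per list element.
%% Replies arrive in arbitrary order, so each is tagged with its index
%% and the results are sorted back into place at the end.
-module(pmap_sketch).
-export([pmap/2]).

pmap(F, List) ->
    Parent = self(),
    Indexed = lists:zip(lists:seq(1, length(List)), List),
    [spawn(fun() -> Parent ! {result, I, F(E)} end) || {I, E} <- Indexed],
    Tagged = [receive {result, _, _} = Msg -> Msg end || _ <- List],
    [R || {result, _I, R} <- lists:sort(Tagged)].
```

For work this fine-grained, the spawn and message-passing overhead per element may well dominate, which is exactly the concern above.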

Additions: In the comments, Ulf Wiger suggested that I add is_float guards for all the function parameters that are floats. Doing this to iter_value/7 reduced execution time by almost half to 0.450000 (runtime) 0.455000 (wall clock) seconds, a huge savings. (Does anybody know why adding these guard clauses speeds up execution? I would imagine that extra checking would slow it down.) Here's the new code for iter_value/7:

iter_value(_CR, _, _ZI, _ZI2, _ZR, _ZR2, I)
  when I > ?MAX_ITERATIONS,
  is_float(_CR), is_float(_ZI), is_float(_ZI2), is_float(_ZR), is_float(_ZR2) ->
    0;
iter_value(_CR, _, _ZI, _ZI2, _ZR, _ZR2, I)
  when _ZI2 + _ZR2 > ?BAILOUT,
  is_float(_CR), is_float(_ZI), is_float(_ZI2), is_float(_ZR), is_float(_ZR2) ->
    I;
iter_value(CR, CI, ZI, ZI2, ZR, ZR2, I)
  when
  is_float(CR), is_float(ZI), is_float(ZI2), is_float(ZR), is_float(ZR2) ->
    Temp = ZR * ZI,
    ZRnew = ZR2 - ZI2 + CR,
    ZInew = Temp + Temp + CI,
    iter_value(CR, CI, ZInew, ZInew * ZInew, ZRnew, ZRnew * ZRnew, I + 1).

Ulf and others also suggested that I avoid printing each character separately. Doing so did not seem to change execution time at all. Here's what I did:

% changed the inner map call to plot
run() ->
    % ...
    Seq = lists:seq(-39, 39),
    lists:map(fun(Y) ->
                      CharList = lists:map(fun(X) -> plot(X, Y) end, Seq),
                      io:format("~s~n", [CharList])
              end,
              Seq),
    % ...

% changed plot to return single-character strings
plot(X, Y) ->
    case iterate(X/40.0, Y/40.0) of
        0 ->
            "*";
        _ ->
            " "
    end.

I also tried commenting out the io:format/2 call above. Execution time went up about 1/100 of a second.