Jan 7, 2012

Across The Years 72 hour: 150 miles

Day 1.. still fresh. Photo by Ray K
Going to keep this report short and sweet.

By the numbers:

Approximate miles per 24-hour split:
* Thursday (9am-9am): 80 miles
* Friday: 30 miles
* Saturday: 40 miles

Approximate time on the course per 24-hour split:
* Thursday: 23 hours
* Friday: 18 hours
* Saturday: 20 hours

Things were pretty good for the first 100K, which I reached in about 16 hours. After that, I slowed down dramatically. In particular, miles 65-90 were among the slowest I've ever run. Miles 90-110 went OK, and the last 40 miles, spread over the final 24 hours, amounted to a sleep-deprived death march where I was unable to move faster than about 16 minutes per mile at best, and averaged about 21. I finally reached 150 miles with 90 minutes to spare. I went to sleep and didn't wake up until after the race had ended.

It became obvious on day 2 that I wasn't going to reach my original goal of 200 miles, mostly because of the incredible amount of time it took me to get from 65 to 90 miles. I revised my goal to 150 miles, which seemed easily doable at the time, but even that took far more effort than I anticipated.

Thoughts:

By the end of the race I was in total countdown mode, repeating to myself in the wee hours of the morning, "6 laps to go, 6 laps to go, 6 laps to go." The mantra didn't keep me going per se, but it occupied my thoughts through a very difficult stretch.

Through the last 48 hours I would occasionally find myself saying aloud, "this is so hard.." At some point it occurred to me that this was so hard that the difficulty alone made it worth doing. Yes, it's fun to see my runner friends, it's fun to achieve goals, it's fun to hear the hurrahs of all my non-runner facebook friends, it's even fun to be a little competitive.. but as I staggered along muttering at 2am, I suddenly realized that I simultaneously hated and loved how hard it was. I don't want to be melodramatic, but at that point things honestly sucked ass. Still, there was something intrinsic about attempting something so difficult that kept me going through suffering that would have overwhelmed those more trivial rewards.

People talk about scenery and friends and nice weather and whatever else gets them to do these things, but honestly - you can approach ATY as something utterly simple - running and walking - and make it difficult enough that the difficulty itself becomes the primary draw rather than the activity itself. In other words, I go to this thing not because I like to run, but because I like to attempt incredibly difficult tasks.

That was the insight that I had from this race.

On another note, I know this was a good race for me because even though I feel like it was a positive experience, I am also feeling completely depleted. My aches and pains are going away and my blisters are healing, but my mental state is still such that I am not interested in racing another ultra anytime soon. This will change, of course - I'm sure I'll be ready to rip off a 100-mile finish at Umstead on March 31 - but I'm unwilling to do anything before then, or anything too soon after.

Jan 5, 2012

Java: Character Encoding in char[] and byte[]

We were dealing with an issue at work where some emails going to Japanese customers were coming through as gibberish. So I went into the code and noticed that, during processing, the email body at one point gets converted to a byte array and then back to a String. Suspecting that was the problem, I found myself wondering about Java's byte[] and the more familiar char[]. java.lang.String has methods that convert to both (getBytes() and toCharArray()), so what's the difference between these two primitive types?

The more I thought about it, the more it made sense that the byte array was going to be the issue. At first glance (especially for those of us in the West), a char would seem to be essentially the same as a byte. But that was before I gave more consideration to the incredibly complex and amazing world of character encoding. A byte is by definition 8 bits, so only character sets that fit in 8 bits can be represented one character per byte - limiting you to 256 characters. The primitive data type char, on the other hand (and I had to look this up), is a single 16-bit Unicode character - two bytes long - and 16 bits is enough to represent 65,536 unique characters - plenty for, say, the Japanese character set.
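To make the size difference concrete, here's a quick sketch (my own toy example, not from our codebase) that measures the same short string of Japanese text both ways:

public void charVersusByteSizes() throws Exception {
    // "konnichiwa" in hiragana - five characters.
    String japanese = "\u3053\u3093\u306b\u3061\u306f";

    // Five 16-bit chars...
    System.out.println(japanese.toCharArray().length);       // 5

    // ...but fifteen bytes in UTF-8 (three per character),
    System.out.println(japanese.getBytes("UTF-8").length);   // 15

    // and twelve bytes in UTF-16 (two per character, plus a 2-byte byte-order mark).
    System.out.println(japanese.getBytes("UTF-16").length);  // 12
}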

To test this theory, I wrote a little program that read a string of Japanese text from a file (saved as UTF-16), converted it to both a byte[] and a char[], then converted those back to two Strings and compared them against the original.


import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;

public void testJapaneseCharactersInIsolation() throws Exception {
    // jap.txt contains a sentence or two of Japanese text.
    FileInputStream file = new FileInputStream("c:\\jap.txt");
    // Left alone, the reader would fall back on my locale's default
    // charset, so we specifically tell it to read the file as UTF-16.
    InputStreamReader isr = new InputStreamReader(file, "UTF-16");
    BufferedReader reader = new BufferedReader(isr);

    String japanese = reader.readLine();

    // char[] round trip: chars are 16-bit, so nothing is lost.
    char[] chars = japanese.toCharArray();
    String japaneseChars = new String(chars);

    // byte[] round trip: both calls silently use the platform's
    // default charset, which may not represent Japanese at all.
    byte[] bytes = japanese.getBytes();
    String japaneseBytes = new String(bytes);

    if (japanese.equals(japaneseChars)) {
        System.err.println("chars equals original");
    } else {
        System.err.println("chars fails");
    }
    if (japanese.equals(japaneseBytes)) {
        System.err.println("bytes equals original");
    } else {
        System.err.println("bytes fails");
    }
}

And the output:

chars equals original
bytes fails

Internally, we're actually using a ByteArrayOutputStream without specifying any encoding on the toString(), so the effect is as if we did String foo = new String(bar.getBytes()) and expected foo to equal the String bar - which works only as long as the platform's default charset can represent every character in the string, and Japanese text generally won't survive a default Western encoding. Of course, Sun (now Oracle) was smart enough to anticipate that people like me would want to use ByteArrayOutputStream for Japanese characters, and toString() has an overload that takes a charset name. The next step for me is to write some business logic that specifies the proper encoding for the language in use (or, preferably, to find a single encoding that will work for everyone).
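Here's a minimal sketch of what the fix looks like (the method and variable names are mine, not our production code) - specify the same encoding on both sides of the round trip instead of trusting the platform default:

import java.io.ByteArrayOutputStream;

public void encodeRoundTripExplicitly() throws Exception {
    String japanese = "\u3053\u3093\u306b\u3061\u306f"; // sample hiragana

    ByteArrayOutputStream out = new ByteArrayOutputStream();
    out.write(japanese.getBytes("UTF-8"));       // encode explicitly
    String roundTripped = out.toString("UTF-8"); // decode with the same charset

    System.out.println(japanese.equals(roundTripped)); // true
}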

By the way - Joel Spolsky is much smarter than I am and has written a great article for developers who introduce bugs like this by assuming 8-bit character encodings. It's called "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)": http://www.joelonsoftware.com/articles/Unicode.html