Dial ‘M’ for Millisecond

Grace Kelly starred in "Dial M For Murder," a 1954 Hitchcock film.  (source: imdb.com)

Grace Kelly starred in “Dial M For Murder,” a 1954 Hitchcock film. (source: imdb.com)

There are times when, if you look close enough, milliseconds matter.  Functions in your mobile app that don’t run as fast as they could will incrementally steal processor cycles … harass the garbage collector … like some shadowy process stalker on the other end of the phone.

Given a specific algorithm you often have a choice.  One choice will keep runtime overhead in check as much as possible.  Another may add incremental amounts of runtime and memory usage with each use.  Next thing you know  …

You’ve dialed ‘M’ for murderous milliseconds.  Just don’t be surprised when the monster comes calling.

The problem is – how do you know that’s what you’ve done in the first place? How do you get to the processor cycle killing beast before it shows up during the QA cycle … laughing at you as it adds three, four, five … no six … seconds to the time it takes your app do to something like de-serialize binary data from a network connection.

One place that can help is to look at the functions in your app that get run repeatedly throughout it’s life.  Functions that perform data marshaling of Java primitives to and from byte arrays are pretty obvious.  I recently had some time to take a deep dive into a couple ways of doing this on Android, and I thought I would share some of the hard data.

So, here’s the problem I was looking at: reading and writing  Java primitives to/from byte arrays.  These kinds of functions are relied upon a LOT in many mobile applications making them good candidates for optimization.  Shaving even a few hundred milliseconds off a function that, say saves Double values to a byte array, could mean that the user is waiting several seconds less for the app to complete a task.  In other words, the faster you can get your primitives into and out of a byte array the more responsive your application is going to be.

So, I wrote a bit of code to compare two different ways of saving double values into a byte array.  In one version I used DataOutputStream:

    public static byte[] toByteArray(Double[] array) {
        ByteArrayOutputStream os = new ByteArrayOutputStream(array.length * 8);
        DataOutputStream dos = new DataOutputStream(os);

        try {
            for (int i = 0; i < array.length; i++) {
                if (array[i] == null)
        } catch (IOException e) {
            throw new PersistException(e);

        return os.toByteArray();

In another version I relied on straight byte manipulation using bitwise operators:

    public static final byte[] toBytes(Double[] array) {

        byte[] result = new byte[array.length * 8];
        for (int i = 0; i < array.length; i++) {
            if (array[i] != null) {
                toBytes(array[i], result, i * 8);

        return result;

    public static final void toBytes(double value, byte[] dest, int start) {

        toBytes(Double.doubleToLongBits(value), dest, start);


    public static final void toBytes(long value, byte[] dest, int start) {

        dest[start] = (byte) (value >>> 56);
        dest[start + 1] = (byte) (value >>> 48);
        dest[start + 2] = (byte) (value >>> 40);
        dest[start + 3] = (byte) (value >>> 32);
        dest[start + 4] = (byte) (value >>> 24);
        dest[start + 5] = (byte) (value >>> 16);
        dest[start + 6] = (byte) (value >>> 8);
        dest[start + 7] = (byte) value;


The interesting thing about these two examples is this: If you look at the code for DataOutputStream you will notice that at its core, DataOutputStream essentially does the same thing as the function I wrote by hand.  You will also notice as well, that using DataOutputStream will potentially require more memory overall because the classes for DataOutputStream and ByteArrayOutputStream both need to be loaded into memory and they obviously have internal state that needs to be stored.

So, I calculated the time it took for each function to run:

        Double[] doubles = new Double[1000];
        for (int i = 0; i < 1000; i++) {
            doubles[i] = (double) i;

        long start = System.currentTimeMillis();
        byte[] resultBytes1 = toByteArray(doubles);
        long elapsed = System.currentTimeMillis() - start;
        Log.i("TRACE", "elapsed using buffers : " + elapsed + " ms to generate "
                + resultBytes1.length
                + " bytes");

        start = System.currentTimeMillis();
        byte[] resultBytes2 = toBytes(doubles);
        elapsed = System.currentTimeMillis() - start;
        Log.i("TRACE", "elapsed using my code : " + elapsed + " ms to generate "
                + resultBytes2.length + " bytes");


I ran this several times to make sure the Dalvik optimizer had a chance to do its job:

09-19 11:02:07.308: I/TRACE(11146): elapsed using buffers : 328 ms to generate 8000 bytes
09-19 11:02:07.498: I/TRACE(11146): elapsed using my code : 189 ms to generate 8000 bytes
09-19 11:02:34.038: I/TRACE(11196): elapsed using buffers : 310 ms to generate 8000 bytes
09-19 11:02:34.238: I/TRACE(11196): elapsed using my code : 196 ms to generate 8000 bytes
09-19 11:02:59.778: I/TRACE(11226): elapsed using buffers : 289 ms to generate 8000 bytes
09-19 11:02:59.948: I/TRACE(11226): elapsed using my code : 166 ms to generate 8000 bytes


It could be said that if you take the lazy way out and stick with DataOutputStream, you’re making a choice to build a user experience delay right into your application.  A delay that is visible to the user.

Using DataOutputStream to serialize 1,000 doubles ran consistently about 120 milliseconds slower than using shift operators directly. 120 milliseconds isn’t a lot of time until you consider the compounding effect those 120 milliseconds will have.  1,000 doubles isn’t a lot of data – only 8,000 bytes to be exact.  You will only have to serialize the array of Doubles using DataOutputStream about three times (e.g.  24,000 bytes of data) before that 120 milliseconds becomes noticeable.

Introducing delays into your application that serve no purpose is generally a bad idea.

Take into account your application will be serializing/de-serializing megabytes – perhaps even gigabytes of data while it runs, and the additional runtime overhead DataOutputStream requires becomes that lurking monster in your app that will, pardon the phrase, byte you in the end.

I haven’t taken into account the increased in-memory requirement that DataOutputStream will require as well, nor the additional work you’re setting up the garbage collector to do.  In short, the number of objects instantiated and destroyed by DataOutputStream undoubtedly adds more stress on the garbage collector.  That, in turn, could actually begin to slow down your application as well compared to using the other method.


And now it's your turn ... comment here or on Twitter (@Androider)

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s