Lessons Learned in the World of Looping Variables

November 17, 2009

Hello everyone,

As you may have gathered from my last post, I have had a bear of a time trying to get the Gaussian smoothing script to run in a reasonable amount of time. To find out what was going wrong, I had to break the code up into small pieces and evaluate each one separately.

This sounds like a long and boring procedure, but it was necessary to find what was slowing my code down so badly. I thought it was running smoothly yesterday afternoon, so I set it to run and left for the night, expecting it to finish in a few dozen minutes so I'd see the results when I came in this morning. Instead, I found that the script had only muscled its way through about 15% of the grid points. That's right: my 8-processor Mac tower with 8 GB of RAM, in all of its computational glory, couldn't make it through more than 15% of the domain in about 14 hours. Pathetic.

Enter tic and toc. Tic and toc, you ask? They are small built-in Matlab functions that together act as a stopwatch: tic starts the timer, and toc reports the time elapsed since tic was last called. This turns out to be very handy for finding the slow spots in a piece of code.

Example:

>> tic; weights = zeros(last_range,360); toc

Elapsed time is 0.00095 seconds.

This shows how long that single statement takes to execute. In my case, this piece of code is executed once for each grid point in a 300*360 polar coordinate system, so multiplying its execution time by 300*360 = 108,000 gives ~103 seconds for this one line alone.
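For readers working in Python, here is a rough analogue of the tic/toc measurement above, together with the extrapolation to the full grid. It is a minimal sketch, not the post's Matlab code: `time_per_call`, `projected_total` and `allocate_weights` are hypothetical names, and a nested list stands in for `zeros(last_range,360)`. Averaging over many calls with `time.perf_counter()` gives a steadier number than timing a single sub-millisecond call.

```python
import time

def time_per_call(func, repeats=100):
    """Average wall-clock seconds per call of func(); a rough tic/toc analogue."""
    start = time.perf_counter()              # like tic
    for _ in range(repeats):
        func()
    elapsed = time.perf_counter() - start    # like toc
    return elapsed / repeats

def projected_total(per_call_seconds, n_range=300, n_azimuth=360):
    """Scale a per-grid-point cost up to every point of an n_range x n_azimuth grid."""
    return per_call_seconds * n_range * n_azimuth

# Hypothetical stand-in for the Matlab statement zeros(last_range, 360).
last_range = 300

def allocate_weights():
    return [[0.0] * 360 for _ in range(last_range)]

per_call = time_per_call(allocate_weights)
print(f"{per_call:.6f} s per call, ~{projected_total(per_call):.0f} s for the whole grid")

# The post's arithmetic: 0.00095 s per call x (300 * 360) grid points
print(projected_total(0.00095))  # about 102.6
```

The last line reproduces the figure from the text: 0.00095 s × 108,000 grid points ≈ 103 s.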

If you do this for each part of your script, you can see exactly where the computational time is going. It turned out that I had some ill-placed “for” loops that added far too much computational work. For instance, the simple task of rounding off an array of (300,360) grid points was going to end up taking 972 hours. That's right: more than 40 days. Obviously that won't fly, and it is a massive waste of resources. Now that I have rearranged the looping structures, that same rounding procedure will take a grand total of 259 seconds. Much better, right?
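The post does not show the original loops, so the following is only a hypothetical Python illustration of the kind of fix described: work that does not depend on the loop indices (here, rounding the whole grid) is hoisted out of the per-point loops so it runs once instead of once per grid point. `round_grid`, `neighbour_sum`, `process_slow` and `process_fast` are made-up names, and the neighbour sum is just a placeholder for real per-point work. Note that Python's `round()` rounds halves to even, whereas Matlab's `round()` rounds them away from zero.

```python
def round_grid(grid):
    """Round every value in a 2-D grid stored as a list of rows."""
    return [[round(v) for v in row] for row in grid]

def neighbour_sum(rounded, i, j):
    """Hypothetical per-point work: a value plus its azimuthal neighbour (wraps around)."""
    n_cols = len(rounded[0])
    return rounded[i][j] + rounded[i][(j + 1) % n_cols]

def process_slow(grid):
    # The rounding does not depend on (i, j), yet it is redone for every point:
    # rows * cols full-grid roundings, i.e. (rows * cols) ** 2 element operations.
    out = []
    for i in range(len(grid)):
        row_out = []
        for j in range(len(grid[0])):
            rounded = round_grid(grid)
            row_out.append(neighbour_sum(rounded, i, j))
        out.append(row_out)
    return out

def process_fast(grid):
    # Loop-invariant work hoisted out of the loops: the grid is rounded once.
    rounded = round_grid(grid)
    return [[neighbour_sum(rounded, i, j) for j in range(len(grid[0]))]
            for i in range(len(grid))]

# Both versions give identical results on a small test grid.
small = [[0.4 * (i + j) for j in range(6)] for i in range(5)]
assert process_slow(small) == process_fast(small)
```

On a 300 × 360 grid, `process_slow` would round all 108,000 values 108,000 times, while `process_fast` rounds them once and returns the same answer.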

I would recommend that everyone take a look at their code's efficiency, especially if it takes a long time to execute. It might seem like a waste of time, but wouldn't waiting over a month for my code to run have been a far bigger one? It's much more efficient to spend the time finding the slow spots up front than to run the code over and over. You also become aware of which combinations of functions speed the code up and which slow it down. For example, in my script, combining simple multiplication and addition inside a rounding function actually slowed the code down compared to splitting them up into two lines. This kind of knowledge will help you structure your code better in the future.

P.S.: If you don't use Matlab, most other languages have equivalent tools. In Python, for example, you can do:

import time

def do_snippet():
    # Placeholder: put the code you want to time here.
    sum(range(1_000_000))

start = time.time()
do_snippet()
print("It took", time.time() - start, "seconds to execute")

Python's standard library also has a “timeit” module, which is designed for timing small snippets of code. Other languages, as in the example above, can read the system or CPU clock to do the same job.
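As a sketch of how `timeit` can settle questions like the fused-versus-split rounding comparison mentioned above, the snippet below times two equivalent ways of scaling, offsetting and rounding a list. The data, `scale`, `offset` and the function names are all made up for illustration, and which version wins (if either) depends on the language and interpreter, so no result is claimed here.

```python
import timeit

values = [0.37 * k for k in range(10_000)]
scale, offset = 2.5, 0.1

def fused():
    # Multiply, add and round in a single expression.
    return [round(v * scale + offset) for v in values]

def split():
    # Do the arithmetic first, then round in a separate step.
    scaled = [v * scale + offset for v in values]
    return [round(s) for s in scaled]

assert fused() == split()  # same result, so only the speed differs

for name, func in (("fused", fused), ("split", split)):
    # repeat() runs the whole timing 3 times; the minimum is the least noisy estimate.
    best = min(timeit.repeat(func, number=20, repeat=3))
    print(f"{name}: {best / 20 * 1e6:.1f} microseconds per call")
```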


2 Responses to “Lessons Learned in the World of Looping Variables”

  1. Nick Gasperoni Says:

    That's pretty much exactly why Ming harps about efficiency all the time…it really does matter for large computations! In all the years I've been programming, it was never a big deal because the problems were small enough…but now, with enormous matrices, one must seriously consider efficiency while coding.

    I’m curious, what did you change with your looping structures, specifically?

  2. Dan Michaud Says:

    Well, I essentially had two loops that went through azimuth angle and range (1:360 and 1:300, respectively), and THEN, inside those, I had loops that calculated the weights and the pixel-to-pixel distances. It was this second level of loops that took forever. Most of the computational power was being wasted just going through all of the extra iterations.

    I was able to cut the loop iterations from 360*300*360*300 (= 11,664,000,000!) down to 360*300 (= 108,000). I rely heavily on the fact that Matlab functions like round(), zeros() and sum() work on (or create) entire arrays at once, without explicitly looping through each element. That explicit looping is what took so long.

    Also, Matlab can multiply N-dimensional arrays element-by-element with the .* operator (as long as the dimensions match, e.g. size(array) = [360 300] and size(array2) = [360 300]), which let me do away with the inner loops altogether. In smaller scripts this kind of wasteful programming was fine because it meant 10 vs. 5 seconds of run time. But when execution times reach hours to days… efficiency certainly becomes key!
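Dan's restructuring can be sketched in Python with NumPy (assumed to be installed), whose whole-array operations play the same role as Matlab's. This is a hypothetical illustration, not his script: the 1 km gate spacing, `sigma_km` and the function names are assumptions. The two outer loops over output points remain, but the distance, weight, element-wise product and sum over every pixel are each a single array operation, so the interpreted loop body runs 300*360 times instead of (300*360)² times. The total arithmetic is still proportional to (300*360)²; it has simply moved into compiled array routines.

```python
import numpy as np

def polar_to_xy(n_range, n_azimuth, gate_km=1.0):
    """Cartesian x, y (km) of every cell in an n_range x n_azimuth polar grid."""
    r = (np.arange(n_range) + 0.5) * gate_km                 # hypothetical gate spacing
    theta = np.deg2rad(np.arange(n_azimuth) * 360.0 / n_azimuth)
    rr, tt = np.meshgrid(r, theta, indexing="ij")            # both (n_range, n_azimuth)
    return rr * np.cos(tt), rr * np.sin(tt)

def gaussian_smooth(field, x, y, sigma_km):
    """Two explicit loops over output points; the per-pixel work is whole-array ops."""
    out = np.empty_like(field, dtype=float)
    n_range, n_azimuth = field.shape
    for i in range(n_range):
        for j in range(n_azimuth):
            # Squared distance from point (i, j) to every pixel, in one operation.
            dist_sq = (x - x[i, j]) ** 2 + (y - y[i, j]) ** 2
            weights = np.exp(-dist_sq / (2.0 * sigma_km ** 2))
            # Element-wise product of two same-shaped arrays, then a single sum
            # (the NumPy counterpart of Matlab's sum(sum(weights .* field))).
            out[i, j] = np.sum(weights * field) / np.sum(weights)
    return out
```

For example, `x, y = polar_to_xy(300, 360)` followed by `gaussian_smooth(field, x, y, sigma_km=2.0)` would smooth a 300 × 360 field; it is still slow at full size, since every output point touches every pixel.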

