
Code Refactoring pt. II: Not Without Tests

April 11, 2012

This post is a continuation of my previous post with thoughts on refactoring.

Refactoring without tests is like taking a trip through an uninhabited city without a map – you can sense you’re going the right way, but you can’t be sure. And you can’t ask anyone – there is no one around to help you.

Changing software without tests is quite the same – you might think you are doing the right thing, but you might be wrong.

Yes, just like with little steps, there are changes that can be made safely without automated tests, but most of the time they will either be trivial, or you won’t feel comfortable making them. Manually checking whether a minor change broke something is out of the question – how many times have you changed some minor thing and didn’t even bother checking it, because checking would take longer than the change itself? “Knowing” it would work just fine. I know I have.

Usually it really doesn’t do any damage, but there is that 1% of cases when it does, and frankly, that is the only one that really counts. Do you really think your client cares that 99 times out of 100 you safely extracted a method, when you broke it just this one time with a really minor, easy-to-fix bug? No one cares about the 99.

The Highest-Level Automatic Testing

The first thing I tried was to put the piece of code I wanted to change into a test harness. It failed. Even when I was able to compile it, it would just crash after executing the first lines of code, and without tests in place I did not want to modify the code to change that. It was already messed up, and I would surely have messed it up even more, not to mention introduce some fresh bugs along the way.

The idea was to keep moving to a higher level of implementation and see at what point I could execute that code in my test harness. The easiest way to do this was to find where it is actually called from the dialog, then run the code in the test harness and see what other objects I needed to add.

For example, Graph would compile without edges, but it wouldn’t do anything. It would also compile without a reference to MapObject, but it would just crash. Creating a valid MapObject was bizarre, to say the least. But in the application it is created from the archive, so why wouldn’t I do the same? Because MapObject is linked to edges, I had to load the edges from the archive too.

Eventually, after some hard time, it compiled, ran, executed and gave me the expected result. It worked exactly like the real application. But that required dragging in half of the solution, and my test data depended on external files and a database. It took long to compile and even longer to load all the archives it depended on. Even worse – it was really hard to write a test case for this “test harness”.

Because it was so hard to get fake data, at first it wasn’t really normal testing. The only thing I could really do was simulate user behavior. First, I automatically generated a huge amount of random input from historical data. Then, from that data, I created the input, generated an output file and saved it. That became the baseline file for future changes. Whenever I made a change, I’d generate another output file and simply confirm that the two were still the same.
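That generate-and-compare loop is sometimes called golden-master (or characterization) testing. A minimal sketch of it in Python, where `run_system` is a hypothetical stand-in for the real application entry point:

```python
def run_system(input_lines):
    # Hypothetical stand-in for the real code under test;
    # here it just sorts its input.
    return sorted(input_lines)

def save_output(lines, path):
    with open(path, 'w') as f:
        f.write('\n'.join(lines))

historical_input = ['c', 'a', 'b']  # in reality: generated from historical data

# One-time step: record the current behaviour as the baseline ("golden") file.
save_output(run_system(historical_input), 'golden.txt')

# After every change: regenerate the output and compare against the baseline.
save_output(run_system(historical_input), 'current.txt')
with open('golden.txt') as g, open('current.txt') as c:
    assert g.read() == c.read(), 'output changed - the change broke something'
```

The baseline pins down whatever the code does today, bugs included; the point is not correctness but detecting unintended change.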

It might not seem like a very elegant approach, and surely it isn’t, but at least it got me to some kind of automated testing. From there it was possible to move forward, make changes to the code and be sure it still worked as it did before, without checking manually. It was cumbersome, as it required a lot of work to prepare those weird tests, and they took forever to run. But in the end it still pays off a whole lot. And even if it doesn’t seem so, it encourages you to move towards a better, more testable design.

Moving Forward from the Highest Testing Level

With a very high-level automated test in place, I was able to create tests that were closer to the problem domain. The key here was to change the code a little, confirm that the higher-level tests still pass, and then create lower-level tests.

For example, I was able to remove the dependency on external files from the main algorithm library (and, of course, to write some much more elegant tests!) by removing references to these objects and placing them at a higher level. Now the Graph algorithm library didn’t have to rely on external input. While doing this, I would always run my heavyweight high-level tests to make sure everything was still okay.
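The shape of that change can be sketched in Python (a toy stand-in, not the actual Graph library): the algorithm stops reading files itself and takes plain data instead, so its unit tests need no external input.

```python
from collections import deque

# After the refactoring: the algorithm takes plain data structures.
def find_path(edges, start, end):
    """Hop count between start and end over undirected edges (BFS)."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    queue, seen = deque([(start, 0)]), {start}
    while queue:
        node, dist = queue.popleft()
        if node == end:
            return dist
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return -1

# The file handling lives one level up, outside the algorithm library.
def find_path_from_file(path, start, end):
    with open(path) as f:
        edges = [tuple(line.split()) for line in f]
    return find_path(edges, start, end)

# A unit test now needs no external files:
assert find_path([('a', 'b'), ('b', 'c')], 'a', 'c') == 2
```

The algorithm itself never touches the file system anymore; only the thin wrapper does, and the wrapper is trivial enough not to need heavyweight tests.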

This might seem like a lot of work, and it might seem that it would have been faster to just do the job. But now I am really glad I decided to do it better, not faster. In the end I was able to write code unbelievably fast compared to how it used to be.

The Graph library is not fully covered by tests yet, but its key algorithms are, and now it is a pleasure to work with them. Whenever I need to make a change, I just run the tests and know that everything is as fine as it was.


Calculate Road Direction in MapInfo Files

March 12, 2012

Mapnik has a neat feature for labeling lines – the text nicely follows them. However, in Lithuania we have a lot of small streets with long names, and the final result isn’t as nice. Placing labels horizontally wasn’t an option either, because it just looks terrible.

So I wanted to place the labels at an angle; here is my result:

I think OGR should have a feature for this, but since we’re working with MapInfo files (and they are just text), it is easy to do it in Python if you don’t want to investigate the API. Here is the quick and dirty hardcoded Python code to do it:

import math

def radians(x1, y1, x2, y2):
    # Angle of the segment (x1, y1) -> (x2, y2), in radians.
    dx = x2 - x1
    dy = y2 - y1
    if dx == 0 and dy == 0:
        return 0
    return math.atan2(dy, dx)

def radiansToDegrees(r):
    return 180 * r / math.pi

def normalize(degrees):
    # Fold the angle into [0, 90) or [270, 360) so labels never render upside down.
    while degrees < 0:
        degrees += 360
    if 180 <= degrees < 270:
        return degrees - 180
    if 90 <= degrees < 180:
        return degrees + 180
    return degrees

def orientation(x1, y1, x2, y2):
    # The new column is declared as Integer, so round to a whole degree.
    return int(round(normalize(radiansToDegrees(radians(x1, y1, x2, y2)))))

f = open(r'C:\Roads1Name.mif', 'r')
mifLines = f.readlines()
f.close()

f = open(r'C:\Roads1Name.mid', 'r')
midLines = f.readlines()
f.close()

mifOut = open(r'C:\Roads1NameO.mif', 'w')
midOut = open(r'C:\Roads1NameO.mid', 'w')

i = 0
x1 = y1 = x2 = y2 = 0.0
currentLine = 0
lineCount = -1
for line in mifLines:
    newMifLine = line
    currentLine += 1

    if line.startswith('Columns'):
        # One extra column for the new attribute.
        number = int(line.split()[1])
        newMifLine = 'Columns ' + str(number + 1) + '\n'

    if line.startswith('Data'):
        newMifLine = 'ORIENTATION Integer\n' + line

    if line.startswith('Line'):
        # 'Pline' also contains 'Line', hence startswith instead of 'in'.
        coords = line.split()[1:5]
        x1, y1, x2, y2 = (float(c) for c in coords)
        midOut.write(midLines[i][0:-1] + ',' + str(orientation(x1, y1, x2, y2)) + '\n')
        i += 1

    if line.startswith('Pline'):
        lineCount = int(line.split()[1])
        currentLine = 0

    if currentLine == 1 and lineCount != -1:
        # First vertex of the polyline.
        coords = line.split()
        x1, y1 = float(coords[0]), float(coords[1])

    if currentLine == lineCount and lineCount != -1:
        # Last vertex: label the polyline by its overall first-to-last direction.
        coords = line.split()
        x2, y2 = float(coords[0]), float(coords[1])
        midOut.write(midLines[i][0:-1] + ',' + str(orientation(x1, y1, x2, y2)) + '\n')
        i += 1
        lineCount = -1

    mifOut.write(newMifLine)

mifOut.close()
midOut.close()
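As a quick sanity check of the angle logic, the normalization can be exercised on a few simple segments (the helpers from the script are condensed here so the snippet runs on its own):

```python
import math

def normalize(degrees):
    # Fold the angle into [0, 90) or [270, 360) so labels never render upside down.
    while degrees < 0:
        degrees += 360
    if 180 <= degrees < 270:
        return degrees - 180
    if 90 <= degrees < 180:
        return degrees + 180
    return degrees

def orientation(x1, y1, x2, y2):
    return normalize(math.degrees(math.atan2(y2 - y1, x2 - x1)))

print(round(orientation(0, 0, 1, 0), 6))    # 0.0  - west-to-east segment
print(round(orientation(0, 0, -1, 0), 6))   # 0.0  - same street digitized the other way
print(round(orientation(0, 0, 1, 1), 6))    # 45.0
print(round(orientation(0, 0, -1, -1), 6))  # 45.0 - reversed segment, same label angle
```

Reversing a segment’s direction yields the same label angle, which is exactly what you want: the street name should read left to right no matter which way the geometry was digitized.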


The Pragmatic Programmer – Review

March 4, 2012

So I have finally finished The Pragmatic Programmer from cover to cover, including the Foreword, solving the given problems and rethinking what every tip could do for me and how I could improve the way I work.

When I mentioned to my friend and great coder Martynas that I had finally ticked this book off as “done”, he (quite skeptically) said:

All of it should be quite obvious, shouldn’t it?

As an example he mentioned Tip No. 4, “Don’t Live with Broken Windows”. Well yes, isn’t it quite obvious? I even wrote about it myself in Conditions Suck:

The thing is, if it is done wrong, most probably, while implementing some small change, it will force you to keep the existing bad style. Eventually, if code smells, it will start to suffocate you as the stench grows.

Or “Always Use Source Control”. I mean, who does not use source control these days for everything they write?

So depending on how long you have been in the industry, you might find that you already use all the techniques described in this book. They aren’t magical; they are just some clever things that you’d eventually figure out yourself as you work. Just like Design Patterns or absolutely anything else.

Furthermore, I think that at least half of them are put in our heads almost as soon as we start writing code. Take the tips about testing: I remember coding in my first university courses, implementing things like Dijkstra’s algorithm, a Stack, a List and so on. I hadn’t heard about Unit Testing at that time, yet instead of using a debugger, I wrapped my functions in little tests to see if they really did what I wanted them to, which I’m sure most of us did. And now testing is becoming a basic routine for the majority of coders.

The most obvious tip in the book is “Don’t Repeat Yourself”. I’m sure we all heard it the day we learned about functions. The book takes it much more seriously, though: it is the main tip that almost the whole book is built upon (for example in “Code Generators”). But it is still the same principle, and when you find yourself repeating, you will think about how to avoid it and finally get rid of it, right? Well, not always, at least not for me. As written in the chapter “… Boiled Frogs”, you can easily overlook something; then suddenly it gets out of hand, and the next thing you know, you are boiled. Maybe you’re repeating some simple action every week that takes 15 minutes or so and don’t even think of it as a big deal, but after reading the chapter on automation, in 30 minutes you will write a script that does it for you. It might save you from the trouble that would follow if you forgot something. The same goes for any other tip; it might just hit you in the head.

And that is the main strength of this book: it MAKES you think. Yes, you might know all the principles in the book and work by them, but isn’t there room for improvement? The great analogies in The Pragmatic Programmer might just give you a hint about what could be done better. Isn’t a shopping list just a reminder not to forget the milk? But you still read it, don’t you? Well, The Pragmatic Programmer might be just that: a great reminder, so you won’t forget.

For anyone with less experience, the recommendation is a no-brainer: this absolutely is a must-read.

Since I work with Legacy Code, the biggest issue for me was that some of the tips are really useful for a new project but might not work well on an existing code base. And none of them aim straight at legacy code. That’s why I’ve now picked up “Working Effectively with Legacy Code” by Michael C. Feathers.

Oh, and my favorite tip? That would be No. 44:

Don’t Program by Coincidence

Execute, don’t Factor

February 20, 2012

Design Patterns are great; however, what I really dislike about them is “Design Pattern Driven Development”. I’ve heard such things as “programming is all about patterns” and “every solution should be based on design patterns”. While using patterns is great, studying them should not lead to the conclusion that solving every situation by the book is a great thing. It will without doubt lead to some really bizarre solutions. Also, even if the SOLID approach is truly a solid base for writing good software, it should not be seen as the only truth in software development. Here I want to show how coding to the DIP (Dependency Inversion Principle) can be harmful, and that coding to concrete classes is not always a bad thing.

Consider the following problem: we want to implement a solution that builds a distance matrix from one point to another. If the direct line distance is greater than 50 kilometers, we would like to use the A* algorithm, and the Dijkstra algorithm otherwise.

For that, let’s create two classes, AStar and Dijkstra, that implement a common interface, ShortestPath.

class ShortestPath {
public:
  virtual ~ShortestPath() {}
  virtual int findPath(int startNodeID, int endNodeID) const = 0;
};

class Dijkstra : public ShortestPath {
public:
  int findPath(int startNodeID, int endNodeID) const;
};

class AStar : public ShortestPath {
public:
  int findPath(int startNodeID, int endNodeID) const;
};

After we have this, to avoid using concrete implementations in clients, we might want to create some kind of Factory for ShortestPath. Let’s do it in the simplest possible way:

class ShortestPathFactory {
public:
  // Returns Dijkstra if directDistance < 50 and AStar otherwise.
  std::unique_ptr<ShortestPath> create(int directDistance) const;
};

Now we can use it as such:

ShortestPathFactory algorithmFactory;
for (std::vector<int>::const_iterator it = myNodes.begin(); it != myNodes.end(); ++it) {
  for (std::vector<int>::const_iterator jt = myNodes.begin(); jt != myNodes.end(); ++jt) {
    int directLineDistance = getDirectLineDistance(*it, *jt);
    std::unique_ptr<ShortestPath> algorithm = algorithmFactory.create(directLineDistance);
    int realDistance = algorithm->findPath(*it, *jt);
  }
}

Looks great! But one thing bothers me here: creating and releasing resources on every iteration adds a performance overhead. Instead of using the Factory, we could implement it like this:

class ShortestPathExecutor {
  AStar aStarAlgorithm;
  Dijkstra dijkstraAlgorithm;
public:
  int execute(int directDistance, int startNodeID, int endNodeID) const {
    if (directDistance < 50)
      return dijkstraAlgorithm.findPath(startNodeID, endNodeID);
    return aStarAlgorithm.findPath(startNodeID, endNodeID);
  }
};

and use it:

ShortestPathExecutor algorithmExecutor;
for (std::vector<int>::const_iterator it = myNodes.begin(); it != myNodes.end(); ++it) {
  for (std::vector<int>::const_iterator jt = myNodes.begin(); jt != myNodes.end(); ++jt) {
    int directLineDistance = getDirectLineDistance(*it, *jt);
    int realDistance = algorithmExecutor.execute(directLineDistance, *it, *jt);
  }
}

Now we don’t have that performance overhead: whenever we need to calculate a distance, we use the Executor, and extending it is as easy as extending the Factory.
Though I am for sure not saying that Creational Patterns are useless, I do think they are abused. I remember the first time I used this approach to solve a problem, I almost felt guilty for doing it “wrong”, not the way I had read somewhere in the books.

C++11 Training by Scott Meyers

January 24, 2012

The legendary author of the classics and probably most influential books on C++, “Effective C++” and “More Effective C++”, is announcing C++11 training material that finally fully corresponds to the new standard.

Find out how you can get hold of it here, or check out the sample.

What I can say from the sample: once again, Scott Meyers does a great job. While the covered subjects seem trivial and will probably be available on numerous blogs as free C++11 tutorials, I’d still rather learn from a trusted source instead of a random blog like this one… : )

Magic Buffer

January 11, 2012

Today I was digging into some of the deepest realms of my code base; the code there has not been touched for ages. It is a 3000-line incomprehensible monster back from “11.8.1999”, with methods it should not have, full of #defines, pointers and other crap.

While I was reading it to understand how some things work, I particularly struggled with one code block. I skimmed through it a few times and still couldn’t get it. So I started reading it more carefully, when suddenly I realized what the main variable is named:

char magicbuf[10];

A magic buffer, sweet. I just love magic in software development!

C++ and Beyond 2011 Sessions

January 9, 2012

You can find them on Scott Meyers’ blog here, or just go to these Microsoft Channel 9 links:

Concurrency and Parallelism

Ask Andrei Alexandrescu, Scott Meyers and Herb Sutter anything