Code Refactoring pt. II: Not Without Tests
This post continues the thoughts on refactoring from my previous post.
Refactoring without tests is like taking a trip through an uninhabited city without a map: you can sense you’re going the right way, but you aren’t really sure. And you can’t ask anyone, because there is no one to help you.
Changing software without tests is much the same: you might think you are doing the right thing, but you might be wrong.
Yes, just like little steps, there are changes that can be made safely without automated tests, but most of the time they will either be trivial, or you won’t feel comfortable making them. Checking manually whether a minor change broke something is out of the question. How many times have you changed some minor thing and not even bothered to check it, because checking would take longer than making the change itself, “knowing” it would work just fine? I know I have.
Usually it really doesn’t do any damage, but there is that 1% of cases when it does, and frankly, that is the only one that counts. Do you really think your client cares that 99 times out of 100 you safely extracted a method, when you broke something just this once with a really minor, easy-to-fix bug? No one really cares.
The Highest Level Automatic Testing
The first thing to do was to put the piece of code I wanted to change into a test harness. I tried, and it failed. Even when I was able to compile it, it would crash after executing the first few lines, and without tests in place I did not want to modify the code to change that. It was already messed up, and I would surely have messed it up even more, not to mention introducing some fresh bugs along the way.
The idea was to keep moving up to higher levels of the implementation and see at which level I could execute that code in my test harness. The easiest way to do this was to find where it is actually called from the dialog, then run the code in the test harness and see which other objects I needed to add.
For example, Graph would compile without edges, but it wouldn’t do anything. It would also compile without a reference to MapObject, but would just crash. Creating a valid MapObject by hand was bizarre, to say the least. But in the application it is created from the archive, so why wouldn’t I do the same? Because MapObject is linked to edges, I had to load the edges from the archive too.
Eventually, after a hard time, it compiled, ran, and gave me the expected result. It worked exactly like the real application. But that required dragging in half of the solution, and my test data depended on external files and a database. It took a long time to compile and even longer to load all the archives it depended on. Even worse, it was really hard to write a test case for this “test harness”.
Because it was so hard to get fake data, at first it wasn’t really normal testing. The only thing I could really do was simulate user behavior. First, I automatically generated a huge amount of random input from the historical data. Then, from that data, I created the input, generated an output file and saved it. That file served as a baseline for future changes. Whenever I made a change, I’d generate another output file and confirm that the two were still the same.
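The baseline comparison described above could be sketched roughly like this in Python. This is only an illustration, not the code I actually used: `run_algorithm`, `generate_inputs` and `check_against_baseline` are hypothetical names, the stand-in algorithm is a toy, and a seeded random generator replaces the historical data.

```python
import random
import tempfile
from pathlib import Path


def run_algorithm(inputs):
    """Hypothetical stand-in for the legacy code under test."""
    return [x * 2 + 1 for x in inputs]


def generate_inputs(seed, count):
    """Build a reproducible batch of pseudo-random inputs (assumption:
    the real inputs would be derived from historical data instead)."""
    rng = random.Random(seed)
    return [rng.randint(0, 1000) for _ in range(count)]


def render_output(inputs):
    """Serialize results to text so they can be stored and compared."""
    return "\n".join(str(value) for value in run_algorithm(inputs)) + "\n"


def check_against_baseline(baseline_path, inputs):
    """Record the baseline on the first run; compare against it afterwards."""
    current = render_output(inputs)
    if not baseline_path.exists():
        baseline_path.write_text(current)
        return True
    return baseline_path.read_text() == current


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        baseline = Path(tmp) / "baseline.txt"
        inputs = generate_inputs(seed=42, count=1000)
        # First call records the baseline file.
        print(check_against_baseline(baseline, inputs))
        # Later calls (after a code change) compare against it.
        print(check_against_baseline(baseline, inputs))
```

The important detail is that the inputs are reproducible. If they changed between runs, a difference in the output file would say nothing about whether the code change broke anything.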
It might not seem like a very elegant way, and surely it isn’t, but at least it got me to some kind of automated testing. From there it was possible to move forward, make changes to the code and be sure it still worked as before without checking manually. It was cumbersome: preparing those weird tests required a lot of work and they took forever to run. But in the end, it still pays off a great deal. And even if it doesn’t seem so, it encourages moving towards a better, more testable design.
Moving Forward from Highest Testing Level
With a very high-level automated test in place, I was able to create tests closer to the problem domain. The key here was to change the code a little, confirm that the higher-level tests still pass, then create lower-level tests.
For example, I was able to remove the dependency on external files from the main algorithm library (and, of course, write some much more elegant tests!) by removing references to these objects and placing them at a higher level. Now the Graph algorithm library didn’t have to rely on external input. While doing this, I would always run my heavyweight high-level tests to make sure it was still doing okay.
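As a rough illustration of moving the file dependency up a level, here is a minimal Python sketch. It is not the real library: this `Graph` class, its `hop_count` method and the `load_edges` helper are all hypothetical, made up only to show the shape of the change. The algorithm receives plain edges, while parsing external input lives outside it.

```python
from collections import deque


class Graph:
    """Hypothetical graph that receives its edges directly, not from a file."""

    def __init__(self, edges):
        self.adjacency = {}
        for a, b in edges:
            self.adjacency.setdefault(a, set()).add(b)
            self.adjacency.setdefault(b, set()).add(a)

    def hop_count(self, start, goal):
        """Breadth-first search; returns the number of edges, or None."""
        if start not in self.adjacency or goal not in self.adjacency:
            return None
        seen = {start}
        queue = deque([(start, 0)])
        while queue:
            node, distance = queue.popleft()
            if node == goal:
                return distance
            for neighbour in self.adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append((neighbour, distance + 1))
        return None


def load_edges(lines):
    """Higher-level helper: parse 'a b' pairs from any iterable of lines."""
    return [tuple(line.split()) for line in lines if line.strip()]


def test_hop_count_needs_no_files():
    # A low-level test can now build its data in memory.
    graph = Graph([("a", "b"), ("b", "c"), ("x", "y")])
    assert graph.hop_count("a", "c") == 2
    assert graph.hop_count("a", "y") is None
```

With this split, the slow file-based tests only need to cover the loading code, and the algorithm tests run in milliseconds.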
This might seem like a lot of work, and it might seem faster to just do the job. But now I am really glad I decided to do it better, not faster. In the end I was able to write code unbelievably fast compared to how it used to be.
The Graph library is not fully covered by tests yet, but its key algorithms are, and now it is a pleasure to work with them. Whenever I need to make a change, I just run the tests and know that everything is as fine as it was.