Tuesday, August 5, 2014

Why do You Always Seem to be Refactoring?

Wikipedia defines refactoring as:

Code refactoring is the process of restructuring existing computer code – changing the factoring – without changing its external behavior. Refactoring improves nonfunctional attributes of the software. Advantages include improved code readability and reduced complexity to improve source code maintainability, and create a more expressive internal architecture or object model to improve extensibility.
.. and this definition is good but it doesn't capture how refactoring fits into the overall picture of software development on a long-lived project. On a software project that has been going for a while most of what developers do is refactoring.

When you first start a brand new project there's tons of new code to write. There's the game engine and the code that talks to the website and then there's the website code itself. However, as the project continues you will find yourself transitioning from writing brand new things to reusing existing things in a new way.

What has happened is that over the years your project has built up a toolkit for dealing with problems in your domain. You don't need a "users" database because you already have one. Similarly you don't need a messaging system because one already exists. If you project has gone on long enough it even has an email client in there somewhere. The project becomes more about refactoring existing code to do new things and less about adding new code.

Why build when you can reuse? As Joel pointed out in his now classic article, re-writing something is surprisingly hard. I'll let him explain it:

Yes, I know, it's just a simple function to display a window, but it has grown little hairs and stuff on it and nobody knows why. Well, I'll tell you why: those are bug fixes. One of them fixes that bug that Nancy had when she tried to install the thing on a computer that didn't have Internet Explorer. Another one fixes that bug that occurs in low memory conditions. Another one fixes that bug that occurred when the file is on a floppy disk and the user yanks out the disk in the middle. That LoadLibrary call is ugly but it makes the code work on old versions of Windows 95.
Each of these bugs took weeks of real-world usage before they were found. The programmer might have spent a couple of days reproducing the bug in the lab and fixing it. If it's like a lot of bugs, the fix might be one line of code, or it might even be a couple of characters, but a lot of work and time went into those two characters.
When you throw away code and start from scratch, you are throwing away all that knowledge. All those collected bug fixes. Years of programming work.

Re-writing something is very time consuming with the added problems of violating "don't repeat yourself". These reasons are the reasons why you refactor a lot on a long-lived project.

Before I go I want to point out how this fits into the previous article on Technical Debt.

On a long lived project, a new feature will typically be implemented by changing the existing code base. All aspects of the existing codebase that aren't compatible with the new direction become technical debt. You refactor the code to get rid of the technical debt and magically you have the new feature with barely any new code. This last part can confuse people because refactoring isn't supposed to change external behaviour and it doesn't here either. What's changing is the software's organization. You're taking a design that was never intended to have the new feature and turning it into a design that expects the new feature. Once you do that, the feature can sometimes be implemented in just a handful of lines of code.

If you're curious about whether what you're doing is the sort of refactoring I'm talking about then read this article by Steve Rowe about when to refactor.

Until next time here is a picture of a  bunny:


No comments: