Wednesday, July 08, 2009

Refactor your code from the command-line

While the refactoring support in Xcode 3 has been something of a headline feature for the development environment, in fact there's been a tool for doing Objective-C code refactoring in Mac OS X for a long time. Longer than it's been called Mac OS X.

tops of the form



My knowledge of the early days is very sketchy, but I believe that tops was first introduced around the time of OPENSTEP (so 1994). Certainly its first headline use was in converting code which used the old NeXTSTEP APIs into the new, shiny OpenStep APIs. Not that this was as straightforward as replacing NX with NS in the class names. The original APIs hadn't had much in the way of foundation classes (the Foundation Kit was part of OpenStep, but had been available on NeXTSTEP for use with EOF), so they took char * strings rather than NSStrings, id[]s rather than NSArrays and so on. Also, much rationalisation and learning-from-mistakes was done in the Application Kit, parts of which were also pushed down into the Foundation Kit.

All of this meant that a simple search-and-replace tool was not going to cut the mustard. Instead, tops needed to be syntax aware, so that individual tokens in the source could be replaced without any (well, alright, without too much) worry that any of the surrounding expressions would be broken, without too much inappropriate substitution, and without needing to pre-empt every developer's layout conventions.
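To make the problem with plain text substitution concrete, here is a small sketch using sed on a made-up snippet. The variable name cStringLength, the comment and the helper name naive_replace are all invented for illustration. sed matches characters rather than tokens, so it rewrites the unrelated identifier and the comment along with the real method call.

```shell
#!/bin/sh
# naive_replace: a plain character-level substitution, with no idea
# where one token ends and the next begins.
naive_replace() {
    sed 's/cString/UTF8String/g'
}

# Hypothetical input: one genuine -cString message send, plus a variable
# and a comment that merely contain the same characters.
sample='int cStringLength = 0; // length of the cString buffer
NSLog(@"%s", [firstArg cString]);'

printf '%s\n' "$sample" | naive_replace
```

Only the message send on the second line was the intended target; the renamed variable and the edited comment are exactly the kind of inappropriate substitution a token-aware tool is meant to avoid.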

before we continue - a warning



tops performs in-place substitution on your source code. So if you don't like what it did and want to go back to the original… erm, tough. If you're using SCM, there's no problem - you can always revert its changes. If you're not using SCM, then the first thing you absolutely need to do before attempting to try out tops on your real code is to adopt SCM. Xcode project snapshots also work.
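If you just want to see what a rule would do before letting it loose, one option is to run it against a scratch copy and diff the result. The following is a minimal sketch, not part of tops itself: preview_inplace is a hypothetical helper name, and it assumes the command you pass takes the file to rewrite as its last argument (as the tops invocations in this post do).

```shell
#!/bin/sh
# preview_inplace FILE COMMAND [ARGS...]
# Copies FILE into a scratch directory, runs COMMAND ARGS... <copy>,
# then prints a unified diff of the original against the rewritten copy.
# The original FILE is never modified.
preview_inplace() {
    file=$1
    shift
    scratch=$(mktemp -d) || return 1
    copy="$scratch/$(basename "$file")"
    cp "$file" "$copy" || { rm -rf "$scratch"; return 1; }
    # Run the in-place tool on the copy only.
    "$@" "$copy" || { rm -rf "$scratch"; return 1; }
    diff -u "$file" "$copy"
    rc=$?
    rm -rf "$scratch"
    # diff exits with 1 when the files differ; that is the expected case.
    [ "$rc" -le 1 ]
}
```

With that defined, something like preview_inplace printarg.m tops replacemethod cString with UTF8String would show the proposed change while leaving printarg.m alone. It is no substitute for SCM, just a quick way to look before you leap.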

replacing deprecated methods



Let's imagine that, for some perverted reason, I've written the following tool. No, scrub that. Let's say that I find myself having to maintain the following tool :-).

#import <Foundation/Foundation.h>

int main(int argc, char **argv, char **envp)
{
    NSAutoreleasePool *arp = [[NSAutoreleasePool alloc] init];
    NSString *firstArg = [NSString stringWithCString: argv[1]];
    NSLog(@"Argument was %s", [firstArg cString]);
    [arp release];
    return 0;
}


Pleasant, non? Actually non. What happens when I compile it?

heimdall:Documents leeg$ cc -o printarg printarg.m -framework Foundation
printarg.m: In function ‘main’:
printarg.m:6: warning: ‘stringWithCString:’ is deprecated (declared at /System/Library/Frameworks/Foundation.framework/Headers/NSString.h:386)
printarg.m:7: warning: ‘cString’ is deprecated (declared at /System/Library/Frameworks/Foundation.framework/Headers/NSString.h:367)


OK so we obviously need to do something about this use of ancient NSString API. For no particular reason, let's start with -cString:

heimdall:Documents leeg$ tops replacemethod cString with UTF8String printarg.m


So what do we have now?

#import <Foundation/Foundation.h>

int main(int argc, char **argv, char **envp)
{
    NSAutoreleasePool *arp = [[NSAutoreleasePool alloc] init];
    NSString *firstArg = [NSString stringWithCString: argv[1]];
    NSLog(@"Argument was %s", [firstArg UTF8String]);
    [arp release];
    return 0;
}


Looking good. But we still need to fix the -stringWithCString:. That could be just as easy: replacemethod stringWithCString: with stringWithUTF8String: would do the trick. However, let's be a little different here. Why don't we use -stringWithCString:encoding:? If we do that, then we're going to need to take a guess at the second argument, because we've got no idea what the encoding should be (that's why -stringWithCString: is deprecated, after all). However, if we're happy to assume UTF8 is fine for the output, let's do that for the input. We'd better let everyone know that's what happened, though.

So this rule is starting to look quite complex. It says "replace -stringWithCString: with -stringWithCString:encoding:, keeping the C string argument but adding another argument, which should be NSUTF8StringEncoding. While you're at it, warn the developer that you've had to make that assumption". We also (presumably) want to combine it with the previous rule, so that if we see the original file we'll catch both of the problems. Luckily tops lets us write scripts, which consist of one or more rule descriptions. Here's a script which encapsulates both our cString rules:

replacemethod "cString" with "UTF8String"
replacemethod "stringWithCString:<cString>" with "stringWithCString:<cString>encoding:<encoding>" {
replace "<encoding_arg>" with "NSUTF8StringEncoding"
} warning "Assumed input encoding is UTF8"


So why does the <encoding> token become <encoding_arg> in the sub-rule? Well, that means "the thing which is passed as the encoding argument". This avoids confusion with <encoding_param>, the parameter as declared in the class interface (yes, you can run tops on headers as well as implementations).

Now if we save this script as cStringNoMore.tops, we can run it against our source file:

heimdall:Documents leeg$ tops -scriptfile cStringNoMore.tops printarg.m


Which results in the following source:

#import <Foundation/Foundation.h>

int main(int argc, char **argv, char **envp)
{
    NSAutoreleasePool *arp = [[NSAutoreleasePool alloc] init];
#warning Assumed input encoding is UTF8
    NSString *firstArg = [NSString stringWithCString:argv[1] encoding:NSUTF8StringEncoding];
    NSLog(@"Argument was %s", [firstArg UTF8String]);
    [arp release];
    return 0;
}


Now, when we compile it, we no longer get told about deprecated API. Cool! But it looks like I need to verify that the use of UTF8 is acceptable:

heimdall:Documents leeg$ cc -o printarg printarg.m -framework Foundation
printarg.m:6:2: warning: #warning Assumed input encoding is UTF8


exercises for the reader, and caveats



There's plenty more to tops than I've managed to cover here. You could (and indeed Apple do) use it to 64-bit-cleanify your sources. Performing security audits is another great use - particularly using constructs such as:

replace strcpy with same error "WTF do you think you're doing?!?"


However, notice that tops is a blunter instrument than the Xcode refactoring capability. Its smallest unit of operation is the source file; refactoring only within particular methods is not easily achieved. Also, as I said before, remember to check your source into SCM before running a script! There is also a -dont option, which makes tops output its proposed changes without applying them.
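As a rough sketch of how -dont might fit into a workflow, here is a small wrapper that runs a script file in dry-run mode. The function name tops_dry_run is hypothetical, and putting -dont ahead of -scriptfile is my assumption about argument order, so check the tops man page on your system before relying on it.

```shell
#!/bin/sh
# tops_dry_run SCRIPT FILE...
# Asks tops to report what SCRIPT would change in the given files
# without modifying them, using the -dont option.
tops_dry_run() {
    script=$1
    shift
    # tops ships with Apple's developer tools; bail out politely elsewhere.
    if ! command -v tops >/dev/null 2>&1; then
        echo "tops not found on PATH" >&2
        return 127
    fi
    # Assumption: options such as -dont go before -scriptfile.
    tops -dont -scriptfile "$script" "$@"
}
```

For the script from earlier in this post, that would be tops_dry_run cStringNoMore.tops printarg.m. Once the report looks sensible, run the real thing without -dont.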

Finally, tops shouldn't be used fully automated. Always assume that you need to inspect the output carefully; don't just Build and Go.

3 comments:

Phil Nash said...

"don't just Build and Go."

Of course not. You should Build and Go, then run your unit tests :-)

Good article, at least from a historical perspective.

Is there any advantage to using tops over the XCode tools?

leeg said...

The first that springs to mind is that tops was able to refactor use of a compiler macro which Xcode couldn't - a Previous Developer™ had defined his own LocalisedString() which didn't play with genstrings as it didn't take the correct arguments. By using tops across the whole project I was able to replace it with NSLocalizedString(), even in places where the parameter was a nontrivial expression.

The second thing to use tops for is that it scales better than the Xcode UI. Take a look at the refactorings in /Developer/Extras/64BitConversion, now imagine doing them through the Edit->Refactor menu :-)

Or Ron said...

Thank you for this information! It is super helpful.