Refactoring Ruby programmatically.
When you join a new team, one of the first things you’ll figure out is their preferred coding style. They probably have a linter like rubocop or flake8 to delegate style arguments to computers, which are the supreme pedants. Sometimes though, you’ll find reasons to change a repository’s coding style or to merge in another codebase with different style choices.
At a certain scale, you’ll probably just fix things by hand, but for projects that span thousands of files, no amount of caffeine can mask the pain.
Linting is just one example of a broader category of problems: how do you refactor large codebases? Although a common answer is simply not to refactor at scale, that tends to cause codebases to degrade rapidly over time. It can be better!
Most languages come with libraries to represent source code as s-expressions, which you can then modify in new ways to generate modified source code. For Ruby, two libraries that do just that are:
ruby_parser which parses a Ruby file and outputs its s-expressions,
ruby2ruby which translates s-expressions into Ruby code.
Let’s try them out. (Full source code is available on GitHub.)
Imagine we have a method incr which used to require two parameters, but most invocations incremented by 1, so we added 1 as the default value for the second parameter. Now we want to rewrite all calls to incr to only pass a second value if it is different from the default.
So we could imagine some code looking like this:
    def incr(x, i = 1)
      x + i
    end

    incr(5, 100)
    incr(3, 1)
    incr(10, 1)
    incr(17, 17)
That we want to rewrite to look like this:
    def incr(x, i = 1)
      x + i
    end

    incr(5, 100)
    incr(3)
    incr(10)
    incr(17, 17)
The interesting parts of our code will be in a rewrite function, so first let’s write the scaffolding that function will live within, and then rush to the fun parts.
The scaffolding is thankfully pretty short: requiring a few libraries, parsing stdin to s-expressions, rewriting those s-expressions into Ruby code, and then outputting to stdout.
    require 'ruby_parser'
    require 'ruby2ruby'

    def rewrite(expr)
      expr
    end

    parsed = RubyParser.new.parse(ARGF.read)
    puts Ruby2Ruby.new.process rewrite(parsed)
Assuming you have the example input above in a file named refactor.input and you name this file refactor.rb, then you can run it using:
ruby refactor.rb < refactor.input
This is actually pretty cool, because we’re taking in some code, parsing it, and then recombining it, but the really fun part is what comes next: modifying it!
Astute readers will notice that the output has some extraneous parentheses. I’m skimming over that because it’s equivalent Ruby code, but it is a bit annoying, and perhaps an astute reader will propose a non-regex-based solution.
The rewrite function is called on the top-level s-expression returned by ruby_parser, from which you can recursively explore all of the program’s s-expressions. To explore the structure of individual s-expressions a bit, consider the input:
incr(3, 1)
Which is represented by a Ruby object whose structure is:
    s(:call, nil, :incr, s(:lit, 3), s(:lit, 1))
In order, these values are:
- :call is the kind of s-expression, here one for invoking a function (some other common kinds are :block, :lasgn and :defn),
- the second value, nil, is the receiver of the call; it is nil here because incr is invoked without an explicit receiver, and other kinds of s-expression use this position differently,
- :incr is the name of the function invoked,
- the remaining values are the parameters passed to the invoked function.
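To make those positions concrete, here is a minimal sketch that takes a :call s-expression apart. It uses a plain Ruby array as a stand-in for the Sexp object that ruby_parser returns (an assumption, made so the snippet runs without any gems), and describe_call is a hypothetical helper name, not part of either library.

```ruby
# A plain array standing in for the Sexp that represents `incr(3, 1)`.
# (Assumption: Sexp objects can be indexed and destructured like arrays.)
call = [:call, nil, :incr, [:lit, 3], [:lit, 1]]

# Hypothetical helper: split a :call node into its named parts.
def describe_call(node)
  kind, receiver, name, *args = node
  { kind: kind, receiver: receiver, name: name, args: args }
end

parts = describe_call(call)
puts parts[:name]         # prints incr
puts parts[:args].inspect # prints [[:lit, 3], [:lit, 1]]
```

Naming the positions once like this tends to be easier to read than repeating expr[2] and expr[4] throughout a larger rewrite.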
Reminding ourselves of our original problem statement: can we remove the second parameter of calls to the :incr function if they specify the same value as the default parameter? Yup, we now know enough to write that function:
    def rewrite(expr)
      if expr.is_a? Sexp
        if expr[0] == :call && expr[2] == :incr && expr.size == 5 &&
           expr[4][0] == :lit && expr[4][1] == 1
          # remove the second parameter
          expr.pop
        end
        # descend into children
        expr.each { |x| rewrite(x) }
      end
      expr
    end
There are three interesting parts here:
- We should only rewrite objects that are s-expressions!
- If we’re calling incr and the second parameter is the new default parameter, a lit of value 1, then we should remove it.
- Recursively descend into the contents of each s-expression. Otherwise you’ll only see the top-level :block s-expression, which is pretty boring.
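The same three steps can be tried without installing anything by using nested arrays in place of Sexp objects. This is only a sketch under that assumption: drop_default_increment and DEFAULT_INCREMENT are hypothetical names, and unlike the rewrite function above, this variant builds a new tree instead of mutating its input.

```ruby
# Nested arrays stand in for ruby_parser's Sexp objects (an assumption
# made so the sketch runs with no gems installed).
DEFAULT_INCREMENT = [:lit, 1]

# Hypothetical helper: returns a copy of the tree in which calls such as
# incr(x, 1) have lost their redundant second argument.
def drop_default_increment(node)
  # 1. Only arrays (our stand-in s-expressions) are rewritten.
  return node unless node.is_a?(Array)

  # 3. Recurse into the children, so nested calls are handled too.
  node = node.map { |child| drop_default_increment(child) }

  # 2. Drop the second argument when it equals the default.
  if node[0] == :call && node[2] == :incr &&
     node.size == 5 && node[4] == DEFAULT_INCREMENT
    node = node[0...-1]
  end
  node
end

tree = [:block,
        [:call, nil, :incr, [:lit, 5], [:lit, 100]],
        [:call, nil, :incr, [:lit, 3], [:lit, 1]]]

p drop_default_increment(tree)
```

Returning a fresh tree keeps the original around for comparison, which can be handy when checking what a rewrite actually changed.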
Stepping back, I think this is pretty awesome! We’re now programmatically rewriting code. We can use this to maintain even large codebases without doing huge amounts of manual toil.
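For a large codebase, the script needs to run over many files rather than a single stdin stream. One possible driver is sketched below; rewrite_files is a hypothetical helper, not something either library provides, and it takes the actual transformation as a block so that it stays independent of how the rewriting is done.

```ruby
# Hypothetical driver: run a source-to-source transformation over every
# Ruby file under `root`. The block receives a file's source text and
# returns the rewritten source. Returns the paths that were changed.
def rewrite_files(root)
  changed = []
  Dir.glob(File.join(root, '**', '*.rb')).sort.each do |path|
    source = File.read(path)
    updated = yield(source)
    next if updated == source # leave untouched files alone

    File.write(path, updated)
    changed << path
  end
  changed
end
```

With the scaffolding from earlier, the block would presumably parse the source with RubyParser, pass the result through rewrite, and turn it back into text with Ruby2Ruby. Running it on a clean checkout makes it easy to review the resulting diff before committing.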
Let’s try it again, doing something a bit more ambitious. Imagine we’ve hired a bunch of Python programmers onto our team who keep writing Python-style for loops instead of learning Ruby’s each idiom, and that we want to rewrite those loops to use each.
Your input might be something like:
    def count(lst)
      i = 0
      for ele in lst
        i += 1
      end
    end
And you’d want this output:
    def count(lst)
      i = 0
      lst.each { |ele| i = i + 1 }
    end
Taking another stab at our rewrite function, this is a bit messier:
    def rewrite(expr)
      if expr.is_a? Sexp
        if expr[0] == :for
          lst = expr[1]
          param = expr[2]
          func = expr[3]
          expr.clear
          expr[0] = :iter
          expr[1] = Sexp.new(:call, lst, :each)
          expr[2] = Sexp.new(:args, param[1])
          expr[3] = func
        end
        # descend into children
        expr.each { |x| rewrite(x) }
      end
      expr
    end
A bit messier, but also a pretty neat demonstration of what you can do once you start playing around with this technique. For example, you could imagine only doing this if the complexity of the refactored for loop is low enough.
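One way to express such a complexity check is to count the s-expression nodes in the loop body and skip the rewrite above some threshold. The sketch below again uses nested arrays as stand-ins for Sexp objects; node_count, simple_enough?, and the threshold of 10 are all hypothetical choices for illustration, and the example tree is only roughly what a parser would produce for the loop shown earlier.

```ruby
# Hypothetical complexity measure: the number of s-expression nodes in a
# tree. Nested arrays stand in for Sexp objects (assumption).
def node_count(node)
  return 0 unless node.is_a?(Array)

  1 + node.sum { |child| node_count(child) }
end

# Arbitrary threshold chosen for this sketch.
MAX_BODY_NODES = 10

# Hypothetical guard: only rewrite `for` loops whose body is small.
def simple_enough?(for_node)
  body = for_node[3]
  node_count(body) <= MAX_BODY_NODES
end

# Stand-in for the tree of: for ele in lst; i += 1; end
loop_node = [:for, [:lvar, :lst], [:lasgn, :ele],
             [:lasgn, :i, [:call, [:lvar, :i], :+, [:lit, 1]]]]

puts node_count(loop_node[3])   # prints 4
puts simple_enough?(loop_node)  # prints true
```

The guard would slot into the existing condition, for example expr[0] == :for && simple_enough?(expr), leaving larger loops for a human to review.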
What next?
These are very contrived examples, but I think they are enough to let you start dreaming about ways this technique could be applied usefully to your work, particularly if your work involves migrating large codebases to new implementations. Google’s Large-Scale Automated Refactoring Using ClangMR is an interesting case study of doing that at immense scale, and Source Code Rejuvenation is Not Refactoring is another exploration of this topic.
Most importantly, I think this is a good reminder to avoid falling into the “I’ll just work through it” mindset for large migrations, which I believe can become the limit on your company’s overall throughput.