roman numerals and python

| No Comments | No TrackBacks

This Thursday past, AC and I decided to try a simple kata that I’d seen posted on another website. The problem is pretty simple: translate strings of roman numerals into integers. I suggested doing the kata in Python. We’re just starting a Python project at work, so I thought this would be an easy way to get a quick introduction to the language. As is typical in the dojo, we decided to work test-first.

The code shown for these behaviours (tests) is the result of a later refactoring of our test code. I’ve left it this way rather than reverting to the original tests because I think that it’s easier to read and I won’t need to repeat parts of the code as we progress.

AC started the ball rolling by writing the first behaviour that we needed to satisfy: all of the primitive roman numerals (I, V, X, L, C, D, M):

import unittest

def roman_to_int(numeral)
  return 0

class RomanNumeralTest(unittest.TestCase):
  def verify(self, expectations):
    for k in expectations:
        actual = roman_to_int(k)
        self.failUnlessEqual(expectations[k], actual, "for %s: expected %s, got %s" % (k, expectations[k], actual) )

  def testSingularNumeral(self):
    expectations = { 'I': 1, 'V': 5, 'X': 10, 'L': 50, 'C': 100, 'D': 500, 'M': 1000 }
    self.verify(expectations)

Once we got this test running under a test harness, I took the keyboard to produce code to satisfy this behaviour. Since we started with set of values, I decided that the simplest way to solve the problem would be to use a dictionary which mapped a numeral string to an integer.

def roman_to_int(numeral):
  numerals = { 'I': 1, 'V': 5, 'X': 10, 'L': 50, 'C': 100, 'D': 500, 'M': 1000 }
  return numerals[numeral]

I wrote the next expected behaviour: combinations of primitive elements.

def testRepetition(self):
    expectations = { "II": 2, "III": 3, "XX": 20, "XXX": 30, "CC": 200, "CCC": 300, "MM": 2000, "MMM" : 3000 }
    self.verify(expectations)

Since Python allows you to use a string as if it were an array, AC was able to very quickly implement a solution using Python’s ‘for’ construct.

def roman_to_int(numeral):
  numerals = { 'I': 1, 'V': 5, 'X': 10, 'L': 50, 'C': 100, 'D': 500, 'M': 1000 }
  rv = 0
  for n in numeral:
      rv += numerals[n]

  return rv

This change got us very close to completing the kata. Or, so we thought until AC implemented the next behaviour test: the set of numerals where the preceding numeral specifies the number less than the following numeral.

def testPreceeding(self):
    expectations = { 'IV' : 4,  'IX' : 9, 'XL': 40, 'XC': 90, 'CD': 400, 'CM' : 900}
    self.verify(expectations)

The failure output from our test (“AssertionError: for IX: expected 9, got 11”) clearly indicated that our current solution had been a bit naive. We’d assumed that we would always be able to traverse the numeral from left to right using addition to build up the result. As you can tell from the test failure, this meant we were unable to distinguish ‘IX’ from ‘XI’.

We spent a few minutes discussing the different ways that we should approach this problem. AC suggested that we might be able to detect when we’re in a situation like ‘IX’ and use subtraction over addition. This seemed like a plausible solution, but I was enjoying the simplicity of the current solution (a single loop). My concern was that introducing some conditional logic to determine which arithmetic operation to use in the loop would complicate the procedure, making the code more difficult to read.

Since I was having trouble explaining a different solution, I decided to just show it in code. It starts with a simple refactoring: extracting a new method to break a numeral into it’s parts.

def atomize(numeral):
    parts = numeral
    return parts

def roman_to_int(numeral):
    numerals = { 'I': 1, 'V': 5, 'X': 10, 'L': 50, 'C': 100, 'D': 500, 'M': 1000 }
    rv = 0
    for n in atomize(numeral):
        rv += numerals[n]

    return rv

We verified that out previous working tests still functioned after this refactoring, then continued. The next step was to introduce a predicate into atomize() which determined whether a particular numeral could precede another.

def can_precede(l, r):
    return 0

def atomize(numeral):
    parts = []
    i = 0
    inc = 1
   while i < len(numeral):
    if (i + 1) < len(numeral) and can_precede(numeral[i], numeral[i + 1]):
        parts.append(numeral[i] + numeral[i + 1])
        inc = 2
    else:
        parts.append(numeral[i])

    i += inc

    return parts

With this code in place, it was easy to complete the failing test by introducing a precedence table into can_precede() and by adding the require atoms to the original numeral lookup table.

def can_precede(l, r):
  precedence = {
    'I': ['V', 'X'],
    'X': ['L', 'C'],
    'C': ['D', 'M']
  }
  valid = []
  if (l in precedence):
    valid = precedence[l]
  return (r in valid)

def atomize(numeral):
  parts = []
  i = 0
  inc = 1
  while i < len(numeral):
    if (i + 1) < len(numeral) and can_precede(numeral[i], numeral[i + 1]):
        parts.append(numeral[i] + numeral[i + 1])
        inc = 2
    else:
        parts.append(numeral[i])

    i += inc

  return parts

def roman_to_int(numeral):
  numerals = {
    "I": 1,   'IV': 4,
    "V": 5,   "IX": 9,
    "X": 10,  "XL": 40,
    "L": 50,  'XC': 90,
    "C": 100, 'CD': 400,
    "D": 500, 'CM': 900,
    "M": 1000
  }

  rv = 0
  for n in atomize(numeral):
    rv += numerals[n]

  return rv

At this point, we were pretty confident that we’d completed the kata. AC wrote a number of other behaviour tests to verify that this was correct.

def testFollowing(self):
    expectations = {
        "VI" : 6,  "XI" : 11, "LX": 60, "CX": 110, "DC": 600, "MC" : 1100,
        "VII" : 7,  "XII" : 12, "LXX": 70, "CXX": 120, "DCC": 700, "MCC" : 1200,
        "VIII" : 8,  "XIII" : 13, "LXXX": 80, "CXXX": 130, "DCCC": 800, "MCCC" : 1300,
    }
    self.verify(expectations)

def testSimpleCombinations(self):
    expectations = {
        "XIV" : 14,  "XVI" : 16, "XVII": 17, "CXL": 140, "CLX": 160, "MCD" : 1400, "MDC" : 1600
    }
    self.verify(expectations)

def testYears(self):
    expectations = {
        "MDCCCXCIX" : 1899,  "MDCCCLXVII" : 1867, "MCMLXVII": 1967, "MMVIII": 2008, "MDCCCXII": 1812, "MDCCLXXVI" : 1776, "MCDXCII" : 1492
    }
    self.verify(expectations)

The final test demonstrated a subtle bug in our implementation of atomize(). We incorrectly initialize inc outside of the loop rather than at each iteration through the loop. This was quickly corrected and, after running our tests, we determined that we’d completed the behavioural requirements of the kata: our code implemented a Roman numeral parser.

As far as I was concerned, we weren’t yet finished the kata. The current implementation of atomize() didn’t really please me aesthetically. So, we continued into the next part of any programming session: refactoring for clarity, readability and aesthetics. Throughout this process, we used our existing tests to verify that we continued to meet all of the behavioural requirements.

Our first refactoring eliminated a branch from the conditional expression during the loop. This simplification makes the loop easier to understand.

def atomize(numeral):
  parts = []
  i = 0
  while i < len(numeral):
    part = numeral[i]
    inc = 1
    if (i + 1) < len(numeral) and can_precede(numeral[i], numeral[i + 1]):
        part += numeral[i + 1]
        inc = 2

    parts.append(part)
    i += inc

  return parts

Our second refactoring eliminated inc, an unnecessary variable and the i + 1 expressions were stored in a variable. Once again, reducing the amount of code in a procedure makes it easier to read.

def atomize(numeral):
  parts = []
  i = 0
  while i < len(numeral):
    part = numeral[i]
    next_i = (i + 1)
    if next_i < len(numeral) and can_precede(numeral[i], numeral[next_i]):
        part += numeral[next_i]

    parts.append(part)
    i += len(part)

 return parts

We spent a little more time looking at the code to see if there was any other refactoring remaining, but determined that this was the best we could do.

After a bit of discussion, we decided that this had been a pretty fun kata to perform. The problem was simple enough to finish in 60-70 minutes, but there was enough complexity that we’d taken a moment to step back from the keyboard an discuss how we might approach the problem. I’ll keep this kata in the repertoire of the dojo for future sessions; I think it would be an excellent one to pull out on evenings when new people attend the gathering.

No TrackBacks

TrackBack URL: http://blog.strangeware.ca/cgi-bin/movabletype/mt-tb.cgi/5

Leave a comment

About this Entry

This page contains a single entry by Don Kelly published on October 11, 2008 5:43 PM.

a container traversal kata was the previous entry in this blog.

a comment to "roman numerals and python" is the next entry in this blog.

Find recent content on the main index or look in the archives to find all content.