Saturday, December 09, 2006

The magic of Python

One of my customers sells novelty toys. Cool gizmos to play with. They advertise on infomercials which are done by as third party. Buyers call the third party to place orders, and the orders are later sent to my customer in a comma separated values (csv) file. These orders then need to be loaded into their accounting system and sent to the credit card processing gateway; however, the csv file from the call center is not in a format that matches the csv file needed by the accounting system or credit card gateway.

Since Microsoft Excel will read and write csv files, I wrote a fairly large macro to read the file from the call center and rearrange the values into the formats the other systems wanted. I was pretty happy, until I noticed two items where Excel "helped" me.

  1. Credit card expiration dates are recorded as 03/09 meaning end of March, 2009. Excel sees that and says, "That looks like a date, but not quite all the digits are there." So it guesses that you are really talking about 03/09/2006 (this year), so now when I say output the date as MM/YY or 03/06, the credit card is already expired. I can correct this by telling Excel to out put the date as MM/DD. A hack to be sure, but it works until 2032.

  2. When Excel encounters numbers over 9,999,999,999 it decides to represent them in scientific notation. These numbers can be reformatted to display correctly but after 15 digits, Excel starts dropping digits off. So lets pretend your credit card number is 16 digits long (most are) like 1234 5678 9012 3456. Excel reads it as 1.234567 E+15, when I reformat it, the credit card number becomes 1234567890123450, losing the last digit -- a check digit by the way.
I'd like it if Microsoft gave me a don't-help-me-mode. Which they do, it is called Visual Studio. But someplace in between do-it-all and mess-the-data-up-right-away would be nice. What I'd really like is to have Excel read these values as sequences of characters instead of numbers. In fact, if I surround any value with =" value ", no matter what is in there, Excel treats it as a string when it reads it with no second guessing, and no "help."

Once the file was open by Excel it was too late. So I needed to find a way to put those =" " characters into the csv beforehand. I considered writing a small .net program, but then I thought about Python and did a quick google search.

From that search I ended with this.
# -*- coding: cp1252 -*-
import sys
import csv

args = sys.argv
inputCsv = args[1] + '.csv'
outputCsv = args[1] + '-post.csv'

reader = csv.reader(open(inputCsv, "rb"))
writer = csv.writer(open(outputCsv, "wb"))
for row in reader:
row[20] = '="' + row[20] + '"'
writer.writerow(row)

Which comes down to:
Open the input csv, and create an output csv, for each row in the file, wrap the 20 item with =" value ". That was easy. I should have written the whole thing in Python.