Friday, April 23, 2010

OFX python

Part 1. Introduction:

According to Wikipedia:

The OFX standard was announced on the 16 January 1997 by Microsoft, Intuit and CheckFree and was designed as a unified technical specification to converge their respective mechanisms. The specification allows for bank and application specific extensions, although only a subset is necessary to describe a financial transaction. Since the current version is a flavour of XML, it is simple to create well-formed OFX documents.

Basically, OFX is the underlying protocol for Microsoft Money, Quicken, GnuCash, etc. that allows direct connection to financial institutions for the purpose of downloading transactions, statements, tax information, and other useful things. Having recently done my taxes in TurboTax for 2009, I noticed that it is not possible to log into your brokerage accounts and download tax documents automatically. This too is done through OFX. The best part about OFX is that it is open source and free! Well unless you're Chase Bank. But usually free! So we computer-savy people can write our own tools to take advantage of an open protocol. (standard available for free here )

My primary interest in the OFX protocol is for the purpose of budgeting. I have recently begun working, and I am saving money impulsively to invest primarily for retirement. My mom always had a budget when I was growing up, but it took her hours sometimes at the end of the month to enter each transaction. These days, you can download them instantly, and categorize most of them.

In any case, I use GnuCash for my budgeting. Although AqBanking is supposed to allow for OFX download, I have found it to be very buggy and essentially unusable. Hence my search for a better solution...

Part 2. The goals

For this project, my goal was to download all transactions from my various brokerage accounts and import them into Gnucash.

Phase 2 was optimizing the process so that I didn't have to put in the passwords over and over.

Phase 3: Profit.

Part 2a. Brief XML Description

OFX is an XML based protocol. Everything is defined in terms of tags. All information is contained inside the tags defined in the specification. Everything is split up into message requests and responses. The client initiates the conversation with an unencrypted OFX XML file sent over an SSL connection. There is an interesting system for encrypting the password inside the OFX file, but it's not necessary for me.

The first request is always a login request, as is the first response from the server. Subsequent requests are added on to the OFX file. The server responds to them in order and references what should be a unique transaction (request) number. The number is sent by the client for each request. That way responses match up to requests without ambiguity. Tags contain things like investment transactions (buy, sell, dividend), checking account debits and credits, and other things. This is the information we're interested in...

Part 3. Meet Python

Since this is the first real post, I have not yet revealed that I am an avid linux user at home. So everything is done in linux this time.

Python is my new friend. I happened across ofx.py and ofx-ba.py at http://www.jongsma.org/gc/ . I'll post the original file here to discuss.

#!/usr/bin/python import time, os, httplib, urllib2 import sys join = str.join sites = { "ucard": { "caps": [ "SIGNON", "CCSTMT" ], "fid": "24909", "fiorg": "Citigroup", "url": "https://secureofx2.bankhost.com/citi/cgi-forte/ofx_rt?servicename=ofx_rt&pagename=ofx", }, "discover": { "caps": [ "SIGNON", "CCSTMT" ], "fiorg": "Discover Financial Services", "fid": "7101", "url": "https://ofx.discovercard.com/", }, "ameritrade": { "caps": [ "SIGNON", "INVSTMT" ], "fiorg": "ameritrade.com", "url": "https://ofx.ameritrade.com/ofxproxy/ofx_proxy.dll", } } def _field(tag,value): return "<"+tag+">"+value def _tag(tag,*contents): return join("\r\n",["<"+tag+">"]+list(contents)+["</"+tag+">"]) def _date(): return time.strftime("%Y%m%d%H%M%S",time.localtime()) def _genuuid(): return os.popen("uuidgen").read().rstrip().upper() class OFXClient: """Encapsulate an ofx client, config is a dict containg configuration""" def __init__(self, config, user, password): self.password = password self.user = user self.config = config self.cookie = 3 config["user"] = user config["password"] = password if not config.has_key("appid"): config["appid"] = "PyOFX" config["appver"] = "0100" def _cookie(self): self.cookie += 1 return str(self.cookie) """Generate signon message""" def _signOn(self): config = self.config fidata = [ _field("ORG",config["fiorg"]) ] if config.has_key("fid"): fidata += [ _field("FID",config["fid"]) ] return _tag("SIGNONMSGSRQV1", _tag("SONRQ", _field("DTCLIENT",_date()), _field("USERID",config["user"]), _field("USERPASS",config["password"]), _field("LANGUAGE","ENG"), _tag("FI", *fidata), _field("APPID",config["appid"]), _field("APPVER",config["appver"]), )) def _acctreq(self, dtstart): req = _tag("ACCTINFORQ",_field("DTACCTUP",dtstart)) return self._message("SIGNUP","ACCTINFO",req) def _ccreq(self, acctid, dtstart): config=self.config req = _tag("CCSTMTRQ", _tag("CCACCTFROM",_field("ACCTID",acctid)), _tag("INCTRAN", _field("DTSTART",dtstart), _field("INCLUDE","Y"))) return self._message("CREDITCARD","CCSTMT",req) def _invstreq(self, brokerid, acctid, dtstart): dtnow = time.strftime("%Y%m%d%H%M%S",time.localtime()) req = _tag("INVSTMTRQ", _tag("INVACCTFROM", _field("BROKERID", brokerid), _field("ACCTID",acctid)), _tag("INCTRAN", _field("DTSTART",dtstart), _field("INCLUDE","Y")), _field("INCOO","Y"), _tag("INCPOS", _field("DTASOF", dtnow), _field("INCLUDE","Y")), _field("INCBAL","Y")) return self._message("INVSTMT","INVSTMT",req) def _message(self,msgType,trnType,request): config = self.config return _tag(msgType+"MSGSRQV1", _tag(trnType+"TRNRQ", _field("TRNUID",_genuuid()), _field("CLTCOOKIE",self._cookie()), request)) def _header(self): return join("\r\n",[ "OFXHEADER:100", "DATA:OFXSGML", "VERSION:102", "SECURITY:NONE", "ENCODING:USASCII", "CHARSET:1252", "COMPRESSION:NONE", "OLDFILEUID:NONE", "NEWFILEUID:"+_genuuid(), ""]) def ccQuery(self, acctid, dtstart): """CC Statement request""" return join("\r\n",[self._header(), _tag("OFX", self._signOn(), self._ccreq(acctid, dtstart))]) def acctQuery(self,dtstart): return join("\r\n",[self._header(), _tag("OFX", self._signOn(), self._acctreq(dtstart))]) def invstQuery(self, brokerid, acctid, dtstart): return join("\r\n",[self._header(), _tag("OFX", self._signOn(), self._invstreq(brokerid, acctid,dtstart))]) def doQuery(self,query,name): # N.B. urllib doesn't honor user Content-type, use urllib2 request = urllib2.Request(self.config["url"], query, { "Content-type": "application/x-ofx", "Accept": "*/*, application/x-ofx" }) if 1: f = urllib2.urlopen(request) response = f.read() f.close() f = file(name,"w") f.write(response) f.close() else: print request print self.config["url"], query # ... import getpass argv = sys.argv if __name__=="__main__": dtstart = time.strftime("%Y%m%d",time.localtime(time.time()-31*86400)) dtnow = time.strftime("%Y%m%d",time.localtime()) if len(argv) < 3: print "Usage:",sys.argv[0], "site user [account]" print "available sites:",join(", ",sites.keys()) sys.exit() passwd = getpass.getpass() client = OFXClient(sites[argv[1]], argv[2], passwd) if len(argv) < 4: query = client.acctQuery("19700101000000") client.doQuery(query, argv[1]+"_acct.ofx") else: if "CCSTMT" in sites[argv[1]]["caps"]: query = client.ccQuery(sys.argv[3], dtstart) elif "INVSTMT" in sites[argv[1]]["caps"]: query = client.invstQuery(sites[argv[1]]["fiorg"], sys.argv[3], dtstart) client.doQuery(query, argv[1]+dtnow+".ofx")
I'd like to draw your attention to a couple of parts. Lines 7-25 create a dictionary that contains all information necessary to connect to a financial institution. Of course, you have to supply the username, password, and account number. Those are not stored here. Lines 27-31 represent a mini xml implementation. This is some cool code. You can basically make whatever XML construct you want with it. Lines 56-71 are an example of generating an XML OFX request. All of the requests are basically done in the same way, so you can skip the rest unless you want to examine the standard implementation.

So I see several places to improve the functionality of the script.

First off, the date is hard coded to 1 month ago (i think). Would be nice to pass that on the command line

It prompts for a password every time, which gets old after a while. So I will need a way to store passwords.

Multiple accounts from the same FI will overwrite each other when you download.

Part 4. Improvement of the script

By the way, an already improved version (ofx-ba.py) is on that same site. That version adds bank statement functionality to the script.

The changes I've mentioned are really not that interesting to discuss except for storing the passwords. Basically, I need to store the passwords in a format that the program will be able to decrypt and interpret. I didn't want to do a whole lot of parsing of the file either. That gets tedious to code.

Luckily, Python provides pretty much everything we need here. The easiest way to store passwords and associate them with FI's is in a dictionary. We already use a dictionary to store the data for each FI in the beginning of the file.

Storing a Dictionary in a file

You can retrieve the declaration of a dictionary using the repr() function in Python. This returns a string that, if passed to eval(), should return exactly the same construct that we had before. So it seems like a pretty simple matter to construct the dictionary by hand in a Python prompt and write out its repr to a file, from whence it can be read and eval'ed later to yield the original dictionary.

Encrypting the file

So once we've created this dictionary and established how to store it reliably, we need to be able to encrypt it. I'm not wild about the idea of my passwords to all my accounts just sitting somewhere in a plaintext file. I don't know about you.

Encryption is in itself a really interesting subject. I took a pretty good course on it my last semester at Tech. It's really not the complicated to understand, but man it is difficult to develop in the first place. One of the best encryptions out there is AES. It's a replacement for the older DES, and it is much stronger.

PyCrypto is an excellent implementation of a handful of encryption protocols that is not included in the base Python library. It's very easy to use, and it has good documentation. It will have to be compiled from source, but if you're used to linux, you've probably compiled more programs than you'd care to think about anyway.

So download PyCrypto and install it. The correct import line is "from Crypto.Cipher import AES" . From there, you just create the encryption/decryption object with a password that is a multiple of 16 characters long. I personally used string.ljust(16) to pad my password to 16 chars.

Finally we encrypt the repr() of the dict and write it out to a file. Voila! Encrypted passwords are stored for later recovery.

Part 5. Trading one password for another!

Ok not quite voila yet. We've succeeded in storing all the passwords in an encrypted file, but now we have to decrypt the file every time we run the program! So far we've really accomplished nothing as far as time savings goes. I created a bash script that simply called the program multiple times, once for each account. So every time it prompts me to decrypt the password file. This is unacceptable!

My solution was to use an environment variable to pass the password on to each run of the ofx-ba.py script. So we need to modify the script to check for a specific environment variable, $PASS, before prompting the user for a password. If $PASS is present, then we use that password and hope for the best. If the decryption fails, then the program exits anyway. So no harm done.

Environment Variables

Python to the rescue once again! The os module allows us to examine the environment variables, which are pulled into a convenient dictionary form. So we can just check the dictionary for the $PASS variable

if "PASS" in os.environ: print "Using environmental variable $PASS to decrypt passwords file" passwd = os.environ["PASS"]
Getting the Password in bash

So the last thing that needs to be done is to get the password in bash to put in the $PASS variable. A little googling turned up the following

if [ -e passwords ] then read -s -p "Enter password for decrypting stored passwords: " PASS fi

"passwords" is the name of the encrypted file. Then we can call the Python script with the desired env variable set.

if [ $PASS ] then echo "Schwab IRA..." PASS=$PASS ./ofx-ba.py schwab_brokerage username 12345 $Start

$Start is the starting date that is calculated elsewhere. When the script is run, i store the date to a file called "last_run". That way, I can always get only the transactions since my last update.

Part 6. Conclusion

Armed with the Python script and my bash script, I am now capable of running a single command to download all the transactions from all my various financial institutions. And now you are capable of the same feat. I will try to figure out how to attach a file to this post. I don't see a way right now.

So a nice short little project, though not a short write-up. I tried to give an overall view of what I tried to accomplish, some background about OFX, and some resources. I spent most of the time on my own additions to the script. I have emailed the webmaster that hosts that ofx.py site, but I haven't received a response yet.

I use the generic OFX importer in GnuCash to import each file individually. Although it takes a little bit of time to import the ofx files individually, it's not too big a deal. It's definitely better than entering all the transactions by hand. Dividends, stock splits, automatic investments. I shudder to think how complicated it would be to do by hand. Thank you Microsoft and others for developing OFX to ease our burdens. Thanks to all the other developers too, PyCrypto, GnuCash, ofx.py and ofx-ba.py. I love open source software, and I'm trying to do my own little part on this blog to further the cause.

Here's my final script...

#!/usr/bin/python import time, os, httplib, urllib2 import sys join = str.join sites = { "MYCreditUnion": { "caps": [ "SIGNON", "BASTMT" ], "fid": "31337", # ^- this is what i added, for checking/savings/debit accounts- think "bank statement" "fiorg": "MyCreditUnion", "url": "https://ofx.mycreditunion.org", "bankid": "21325412453", # bank routing # }, "vanguard": { "caps": [ "SIGNON", "INVSTMT"], "fiorg": "vanguard.com", "url" : "https://vesnc.vanguard.com/us/OfxDirectConnectServlet", }, "schwab_brokerage": { "caps": [ "SIGNON", "INVSTMT"], "fiorg": "SCHWAB>COM", "url": "https://ofx.schwab.com/cgi_dev/ofx_server", }, "schwab_bank": { "caps": [ "SIGNON", "BASTMT"], "fid" : "101", "fiorg": "ISC", "url": "https://ofx.schwab.com/bankcgi_dev/ofx_server", "bankid" : "121202211", }, "edward_jones" : { "caps" : [ "SIGNON", "INVSTMT"], "fiorg" : "www.edwardjones.com", "url" : "https://ofx.edwardjones.com", } } def _field(tag,value): return "<"+tag+">"+value def _tag(tag,*contents): return join("\r\n",["<"+tag+">"]+list(contents)+["</"+tag+">"]) def _date(): return time.strftime("%Y%m%d%H%M%S",time.localtime()) def _genuuid(): return os.popen("uuidgen").read().rstrip().upper() class OFXClient: """Encapsulate an ofx client, config is a dict containg configuration""" def __init__(self, config, user, password): self.password = password self.user = user self.config = config self.cookie = 3 config["user"] = user config["password"] = password if not config.has_key("appid"): config["appid"] = "QWIN" # i've had to fake Quicken to actually get my unwilling test server to talk to me config["appver"] = "1200" def _cookie(self): self.cookie += 1 return str(self.cookie) """Generate signon message""" def _signOn(self): config = self.config fidata = [ _field("ORG",config["fiorg"]) ] if config.has_key("fid"): fidata += [ _field("FID",config["fid"]) ] return _tag("SIGNONMSGSRQV1", _tag("SONRQ", _field("DTCLIENT",_date()), _field("USERID",config["user"]), _field("USERPASS",config["password"]), _field("LANGUAGE","ENG"), _tag("FI", *fidata), _field("APPID",config["appid"]), _field("APPVER",config["appver"]), )) def _acctreq(self, dtstart): req = _tag("ACCTINFORQ",_field("DTACCTUP",dtstart)) return self._message("SIGNUP","ACCTINFO",req) # this is from _ccreq below and reading page 176 of the latest OFX doc. def _bareq(self, acctid, dtstart, accttype): config=self.config req = _tag("STMTRQ", _tag("BANKACCTFROM", _field("BANKID",sites [argv[1]] ["bankid"]), _field("ACCTID",acctid), _field("ACCTTYPE",accttype)), _tag("INCTRAN", _field("DTSTART",dtstart), _field("INCLUDE","Y"))) return self._message("BANK","STMT",req) def _ccreq(self, acctid, dtstart): config=self.config req = _tag("CCSTMTRQ", _tag("CCACCTFROM",_field("ACCTID",acctid)), _tag("INCTRAN", _field("DTSTART",dtstart), _field("INCLUDE","Y"))) return self._message("CREDITCARD","CCSTMT",req) def _invstreq(self, brokerid, acctid, dtstart): dtnow = time.strftime("%Y%m%d%H%M%S",time.localtime()) req = _tag("INVSTMTRQ", _tag("INVACCTFROM", _field("BROKERID", brokerid), _field("ACCTID",acctid)), _tag("INCTRAN", _field("DTSTART",dtstart), _field("INCLUDE","Y")), _field("INCOO","Y"), _tag("INCPOS", _field("DTASOF", dtnow), _field("INCLUDE","Y")), _field("INCBAL","Y")) return self._message("INVSTMT","INVSTMT",req) def _message(self,msgType,trnType,request): config = self.config return _tag(msgType+"MSGSRQV1", _tag(trnType+"TRNRQ", _field("TRNUID",_genuuid()), _field("CLTCOOKIE",self._cookie()), request)) def _header(self): return join("\r\n",[ "OFXHEADER:100", "DATA:OFXSGML", "VERSION:102", "SECURITY:NONE", "ENCODING:USASCII", "CHARSET:1252", "COMPRESSION:NONE", "OLDFILEUID:NONE", "NEWFILEUID:"+_genuuid(), ""]) def baQuery(self, acctid, dtstart, accttype): """Bank account statement request""" return join("\r\n",[self._header(), _tag("OFX", self._signOn(), self._bareq(acctid, dtstart, accttype))]) def ccQuery(self, acctid, dtstart): """CC Statement request""" return join("\r\n",[self._header(), _tag("OFX", self._signOn(), self._ccreq(acctid, dtstart))]) def acctQuery(self,dtstart): return join("\r\n",[self._header(), _tag("OFX", self._signOn(), self._acctreq(dtstart))]) def invstQuery(self, brokerid, acctid, dtstart): return join("\r\n",[self._header(), _tag("OFX", self._signOn(), self._invstreq(brokerid, acctid,dtstart))]) def doQuery(self,query,name): # N.B. urllib doesn't honor user Content-type, use urllib2 request = urllib2.Request(self.config["url"], query, { "Content-type": "application/x-ofx", "Accept": "*/*, application/x-ofx" }) if 1: f = urllib2.urlopen(request) response = f.read() f.close() f = file(name,"w") f.write(response) f.close() else: print request print self.config["url"], query # ... import getpass from Crypto.Cipher import AES argv = sys.argv if __name__=="__main__": # check for correct incantation if len(argv) < 3: print "Usage:",sys.argv[0], "site user [account] [dtstart] [CHECKING/SAVINGS/.. if using BASTMT]" print "available sites:",join(", ",sites.keys()) sys.exit() passwd = "" # get encrypted passwords from file passwords try: f = open("passwords", "r") ciphertext = f.read() f.close() # handle any errors opening password file except IOError: print "Stored passwords file not found! Please enter password for ", argv[1] passwd = getpass.getpass() #for param in os.environ.keys(): # print "%20s %s" % (param,os.environ[param]) if passwd == "": # get a password to decrypt the file if "PASS" in os.environ: print "Using environmental variable $PASS to decrypt passwords file" passwd = os.environ["PASS"] else: print "Stored password file found! Enter password to decrypt stored password file" passwd = getpass.getpass() # set up decryption obj = AES.new(passwd.ljust(16),AES.MODE_ECB) dictstr = obj.decrypt(ciphertext) # convert the stored dict definition string to an actual dict try: passdict = eval(dictstr) except SyntaxError: print "Invalid Password! Password file not decrypted successfully!" sys.exit() # do another sanity check if passdict["decryption"] != "OK" : print "Invalid Password! Password file not decrypted successfully!" sys.exit() # look for stored password for desired account if not argv[1] in passdict : print "Password for ", argv[1], " not found in stored password directory!" passwd = getpass.getpass() else: passwd = passdict[argv[1]] # construct time range #dtstart = time.strftime("%Y%m%d",time.localtime(time.time()-31*86400)) dtstart = "19700101000000" dtnow = time.strftime("%Y%m%d",time.localtime()) # set up OFXClient class client = OFXClient(sites[argv[1]], argv[2], passwd) if len(argv) < 4: # no account number passed. get accounts from FI query = client.acctQuery(dtstart) client.doQuery(query, argv[1]+"_acct.ofx") else: # start date passed as 5th argument if len(argv) > 4: dtstart = sys.argv[4] if "CCSTMT" in sites[argv[1]]["caps"]: query = client.ccQuery(sys.argv[3], dtstart) elif "INVSTMT" in sites[argv[1]]["caps"]: query = client.invstQuery(sites[argv[1]]["fiorg"], sys.argv[3], dtstart) elif "BASTMT" in sites[argv[1]]["caps"]: if len(argv) < 6 : print "Bank Account Type not Specified for statement download!" sys.exit() query = client.baQuery(sys.argv[3], dtstart, argv[5]) client.doQuery(query, argv[1]+argv[3]+"-"+dtnow+".ofx") # Code to generate passwords file #encpass = getpass.getpass() #obj = AES.new(encpass.ljust(16), AES.MODE_ECB) ##construct password dict #passdict = { "institution" : "password", "institution" : "password", "decryption" : "OK" } #dictdef = repr(passdict) #ciphertext = obj.encrypt( dictdef.ljust( (len(dictdef)/16+1)*16 ) ) #f = open("passwords", "w") #f.write(ciphertext) #f.close()

Thanks for reading,

Bill