2013-01-06

DL/ID Parser Library For Go

I’ve published an initial version of my US DL/ID barcode data parser library here:

This is a library for Go that can parse the data extracted from the PDF417 barcode on the back of a US driving licence into a Go struct. It doesn’t handle many of the encoded fields yet but it’s a start. I’ve grumbled about the multiple problems with the spec and its various implementations before.

2012-12-04

More Barcode Disasters

Here’s a follow-up to my last rant about driving licence barcode standards that aren’t standard.

Header

The barcode header is supposed to be “@\n\x1e\rANSI ”. If the first nine characters of the decoded data equal this string, we’ve got a valid driving licence barcode. South Carolina’s driving licences, however, have the file separator character (ASCII 0x1c) as the third byte instead of the record separator character (ASCII 0x1e) defined by the standard.
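A tolerant header check might look something like the sketch below. The function name hasDLIDHeader is my own invention for illustration, and the only deviation it allows for is the South Carolina one described above; other states may well have their own surprises.

```go
package main

import (
	"fmt"
	"strings"
)

// hasDLIDHeader reports whether data starts with the 9-byte header
// "@\n\x1e\rANSI ". It also accepts 0x1c as the third byte, which is
// the South Carolina variant. (Hypothetical helper, not a library API.)
func hasDLIDHeader(data string) bool {
	if len(data) < 9 {
		return false
	}
	if data[0] != '@' || data[1] != '\n' {
		return false
	}
	if data[2] != 0x1e && data[2] != 0x1c {
		return false
	}
	return strings.HasPrefix(data[3:], "\rANSI ")
}

func main() {
	fmt.Println(hasDLIDHeader("@\n\x1e\rANSI rest of data")) // standard header
	fmt.Println(hasDLIDHeader("@\n\x1c\rANSI rest of data")) // South Carolina variant
	fmt.Println(hasDLIDHeader("not a licence"))
}
```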

ZIP Codes

US zip codes come in two formats. The first is the standard 5-digit code, such as “90210”. This was found to be insufficiently accurate, so a “+4” extension is often tagged on the end, giving the format “90210-1234”.

In the first version of the DLID spec, the zip code field was 11 characters long. If the zip code didn’t fill the entire 11 characters, the extra places were padded with spaces.

Consider the file format. Each record in the file is split into two parts: an identifier, which is a 3-character header (“DAQ”, “DBC”, etc) that indicates what the data represents; and the data itself (“JOE”, “BLOGGS”, etc). Records are separated by line breaks. If the data has separators, why does the zip code field have a fixed width? Most (all?) of the other fields are variable width.
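To make the record layout concrete, here is a minimal sketch of splitting the data into identifier/value pairs. It assumes the header has already been stripped and that records are separated by "\n"; the function name parseRecords and the sample pairing of identifiers with values are purely illustrative.

```go
package main

import (
	"fmt"
	"strings"
)

// parseRecords splits the body of a decoded barcode into a map keyed by
// the 3-character identifier. Assumptions: the header has been removed
// and each record sits on its own line.
func parseRecords(body string) map[string]string {
	records := make(map[string]string)
	for _, line := range strings.Split(body, "\n") {
		line = strings.TrimRight(line, "\r")
		if len(line) < 3 {
			continue // too short to hold an identifier
		}
		records[line[:3]] = line[3:]
	}
	return records
}

func main() {
	// Illustrative data only; the identifier/value pairing is made up.
	recs := parseRecords("DAQJOE\nDBCBLOGGS")
	fmt.Println(recs["DAQ"], recs["DBC"])
}
```

Since the line break already marks where each value ends, nothing in this loop needs to know a field's width in advance.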

Assuming there’s a reason for the field to be fixed-width, you should be able to see immediately that the spec is still broken. If all zip codes are at most 10 characters long, why does the field allow for 11 characters? Even 10 characters is too long. If the field is 5 characters long, a parser can infer that it has no +4 extension. If it is 9 characters long a parser can infer that it has the extension and split it up accordingly.

Version 3 of the spec tried to rectify the situation. The field was shortened to 9 characters, but this time zeros were used as padding instead of spaces. The upshot is that every parser must extract the zip and +4 sections by dividing up based on expected data lengths (5 and 4 respectively) and then ditch the extension if it is equal to “0000”. Why not just make the field a variable width? Why not pad it with spaces that can be trimmed without potentially losing a trailing zero in the zip?
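The version 3 rule boils down to something like the following sketch. It assumes a US licence (Canadian post codes are a separate problem) and the name splitZipV3 is hypothetical.

```go
package main

import "fmt"

// splitZipV3 splits a 9-character version 3 zip field into the 5-digit
// zip and the +4 extension. An extension of "0000" is treated as padding
// and dropped. Assumes a US zip code.
func splitZipV3(field string) (zip, plus4 string, err error) {
	if len(field) != 9 {
		return "", "", fmt.Errorf("zip field is %d characters long, want 9", len(field))
	}
	zip, plus4 = field[:5], field[5:]
	if plus4 == "0000" {
		plus4 = ""
	}
	return zip, plus4, nil
}

func main() {
	for _, f := range []string{"902101234", "902100000"} {
		zip, plus4, err := splitZipV3(f)
		fmt.Printf("zip=%q plus4=%q err=%v\n", zip, plus4, err)
	}
}
```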

There is no documentation as to how to format the field in versions 1 and 2 of the DLID spec. Thus, Colorado just uses the 5-character zip. South Carolina uses both the zip and the extension and smooshes them together into one 9-character string padded with two spaces. Massachusetts includes both sections of the zip separated by a hyphen and pads with one space. Who knows what the other states do.
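A parser for the older versions has to cope with all three of those layouts. One possible approach, sketched below, is to trim the padding, drop any hyphen and then go by length. The function name parseZipV1 and the sample values are my own; states I haven’t seen may need further cases.

```go
package main

import (
	"fmt"
	"strings"
)

// parseZipV1 normalises the 11-character zip field found in versions 1
// and 2. It handles a bare 5-digit zip, nine digits run together and a
// hyphenated zip+4, each padded with spaces.
func parseZipV1(field string) (zip, plus4 string, err error) {
	s := strings.ReplaceAll(strings.TrimSpace(field), "-", "")
	switch len(s) {
	case 5:
		return s, "", nil
	case 9:
		return s[:5], s[5:], nil
	}
	return "", "", fmt.Errorf("unrecognised zip field %q", field)
}

func main() {
	// Made-up values in the three layouts described above.
	for _, f := range []string{"90210      ", "902101234  ", "90210-1234 "} {
		zip, plus4, err := parseZipV1(f)
		fmt.Printf("zip=%q plus4=%q err=%v\n", zip, plus4, err)
	}
}
```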

At version 3 of the spec, the standard embraced the Canadians. Canadians have a 6-character post code that looks just like the UK standard. There is no documentation anywhere as to how these post codes should be formatted: which padding character is used, if indeed one is used, or whether the two 3-character sections should be divided by a space.

Names

The 7 versions of the DLID spec include 3 ways of storing names. They started out with a single record that stored a comma-delimited list of names in the format “LAST,FIRST,MIDDLE,…”. Colorado, being unique and special, uses the format “FIRST,MIDDLE,…,LAST”.
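Handling both orderings of the version 1 name field could be sketched like this. The Name type, the parseNameV1 function and the lastNameLast flag are all hypothetical; a real parser would presumably set the flag based on the issuing state.

```go
package main

import (
	"fmt"
	"strings"
)

// Name holds the parts of a parsed name (illustrative type).
type Name struct {
	First  string
	Middle []string
	Last   string
}

// parseNameV1 parses the comma-delimited version 1 name field. Set
// lastNameLast for the Colorado ordering "FIRST,MIDDLE,...,LAST".
func parseNameV1(field string, lastNameLast bool) Name {
	parts := strings.Split(field, ",")
	for i := range parts {
		parts[i] = strings.TrimSpace(parts[i])
	}
	if len(parts) == 1 {
		return Name{Last: parts[0]} // a lone name is treated as the last name
	}
	if lastNameLast {
		// Rotate the last name to the front so both orderings match.
		n := len(parts) - 1
		parts = append([]string{parts[n]}, parts[:n]...)
	}
	return Name{Last: parts[0], First: parts[1], Middle: parts[2:]}
}

func main() {
	fmt.Printf("%+v\n", parseNameV1("BLOGGS,JOE,ALAN", false))
	fmt.Printf("%+v\n", parseNameV1("JOE,ALAN,BLOGGS", true))
}
```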

Presumably to prevent this foolishness, the standards body changed this in the second version of the spec. This version included a standalone “last name” field and a field for other names in the format “FIRST,MIDDLE,…”. Actually, that’s not strictly true; the documented format is “FIRSTxMIDDLEx…”, where “x” is an undocumented separator. Wisconsin used a space whilst Virginia used a comma.
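With the separator undocumented, about the best a parser can do is accept both of the ones seen in the wild. A rough sketch, using a hypothetical splitGivenNamesV2 function:

```go
package main

import (
	"fmt"
	"strings"
)

// splitGivenNamesV2 splits the version 2 "other names" field on either
// a space or a comma, the two separators observed so far.
func splitGivenNamesV2(field string) []string {
	return strings.FieldsFunc(field, func(r rune) bool {
		return r == ',' || r == ' '
	})
}

func main() {
	fmt.Println(splitGivenNamesV2("JOE ALAN")) // space-separated
	fmt.Println(splitGivenNamesV2("JOE,ALAN")) // comma-separated
}
```

The obvious weakness is that a given name containing a space will be split in two when the state uses commas, which is exactly the ambiguity a documented separator would have avoided.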

The fourth version finally seems to have fixed it. Names are divided into three fields: “first”, “last” and “middle”, where “middle” can contain multiple comma-separated names. Documentation at last!

Social Security Numbers

Version 1 of the spec optionally allowed states to include their drivers’ social security numbers on their licences. Careful with that licence, now…

Gender

Version 1 of the spec allowed gender to be expressed using 6 possible values: M, F, 0, 1, 2 and 9. “M” and “F” are self-explanatory. The others are pulled from the ANSI-D20 gender codes, in which the values mean “Unknown”, “Male”, “Female” and “Not specified” respectively. Obviously two ways of representing the same piece of data is better than one. Version 2 dumped all but values “1” and “2”. I imagine that the standards body figured that, if they were going to allow someone to be in control of a 26,000lb vehicle, they should take enough of an interest in the driver to know his or her gender.
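Folding the six version 1 values down to something usable might look like the sketch below. The Gender type and its constants are names I’ve made up for illustration.

```go
package main

import "fmt"

// Gender is an illustrative enumeration of the ANSI D-20 meanings.
type Gender int

const (
	GenderUnknown Gender = iota
	GenderMale
	GenderFemale
	GenderNotSpecified
)

// parseGender maps the six values allowed by version 1 of the spec onto
// a single Gender value. Later versions only use "1" and "2".
func parseGender(code string) (Gender, error) {
	switch code {
	case "M", "1":
		return GenderMale, nil
	case "F", "2":
		return GenderFemale, nil
	case "0":
		return GenderUnknown, nil
	case "9":
		return GenderNotSpecified, nil
	}
	return GenderUnknown, fmt.Errorf("unrecognised gender code %q", code)
}

func main() {
	for _, code := range []string{"M", "1", "F", "2", "0", "9", "X"} {
		g, err := parseGender(code)
		fmt.Println(code, g, err)
	}
}
```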

2012-11-30

Parsing US Driving Licence Barcodes

Most states in the US and some Canadian provinces include a PDF417 barcode on the back of their driving licences. The barcode contains a host of information about its owner, such as names, address, height, weight, eye colour, date of birth, etc. There are currently 7 different versions of the standard, which you can download here (click on the “Documentation” tab):

Unfortunately, the standards are full of breathtakingly stupid mistakes. Dates are currently my favourite.

This is the date format used in version 1:

yyyymmdd

That’s the ISO 8601 basic format for representing a date.

In version 2 they switched to this:

mmddyyyy

That’s the standard US way of representing dates (I like to think of them as “lumpy” dates, because the format goes “large-small-large”, whereas ISO dates are big-endian). I have no idea why they did this. I presume they got a lot of complaints from Americans who were stumped by the unusual date format whilst decoding the PDF417 barcodes with nothing more sophisticated than their eyes. Any automated parser would naturally re-format the date into the local standard, so they must have been doing it manually. An impressive skill.

In version 3 the Canadians decided to get in on the barcode action. Canadians use the big-endian date format, so the spec now states that date fields can store the dates in one of two ways:

yyyymmdd
mmddyyyy

Any parsers need to check the licence’s country code before they can parse dates. Not only does this version of the spec introduce a new standard but it contains multiple standards within a single field.
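In Go, the resulting decision tree could be sketched as follows. The parseDate function and its canadian flag are hypothetical (the flag stands in for whatever country check the parser does), and I’m assuming that versions after 3 keep the version 3 rule, which isn’t covered above.

```go
package main

import (
	"fmt"
	"time"
)

const (
	layoutISO = "20060102" // yyyymmdd
	layoutUS  = "01022006" // mmddyyyy
)

// parseDate picks the date layout from the spec version and country.
// Version 1 is yyyymmdd, version 2 is mmddyyyy, and from version 3 the
// layout depends on the country. Assumption: later versions behave
// like version 3.
func parseDate(field string, version int, canadian bool) (time.Time, error) {
	layout := layoutUS
	switch {
	case version == 1:
		layout = layoutISO
	case version >= 3 && canadian:
		layout = layoutISO
	}
	return time.Parse(layout, field)
}

func main() {
	cases := []struct {
		field    string
		version  int
		canadian bool
	}{
		{"19800615", 1, false},
		{"06151980", 2, false},
		{"19800615", 3, true},
		{"06151980", 3, false},
	}
	for _, c := range cases {
		d, err := parseDate(c.field, c.version, c.canadian)
		fmt.Println(d.Format("2006-01-02"), err)
	}
}
```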

Wow.

2012-11-08

Barcodes in iOS

Recently I’ve been investigating methods for getting iOS apps to read one- and two-dimensional barcodes. There’s a huge variety of formats, but the three I’m mainly interested in are UPC-A and EAN-13 (linear barcodes you’ll find on boxes in shops) and PDF417 (matrix barcodes you’ll find on some US driving licences).

There appear to be three main competitors for your attention if you’re trying to read two-dimensional barcodes in an iOS app:

ZXing is an open-source offering that supports a smorgasbord of formats for a variety of platforms and languages, including iOS, Android, Java, C#, ActionScript and others. Unfortunately, its PDF417 support is listed as “alpha quality” and the iOS port only supports QR codes (another matrix format). The iOS port hasn’t been updated in about 3 years, so ZXing is out.

Manatee Works is a closed-source library that claims to be smaller and more efficient than ZXing. It supports Windows, Android and iOS, and supports all of the barcodes I’m interested in. Looks good! How much does it cost? And can I see some code so I get an idea of how well the API was put together? No. No I can’t.

Manatee Works appears to be one of those companies that believes ardently that keeping its prices secret will encourage people to contact its sales team so they can presumably engage in the hard sell. Either that, or it’s like one of those restaurants that don’t put prices on the menus because, if you have to ask how much it costs, you probably can’t afford it. In any case, their product does indeed cost more than I’d pay.

Lastly, there’s SD-Toolkit. Again, it supports multiple platforms (Windows, Android, iOS, Mac OS X and Linux), and again it supports all of the barcodes I need. Its relatively reasonable prices are published on its website and it even provides a trial SDK. My only issue with SD-Toolkit was that the trial SDK doesn’t yet support the armv7s architecture used in the iPhone 5.

If you’re trying to include a barcode reader in your iOS app, these are your options:

  • An extremely limited open-source effort that’s been abandoned;
  • An extremely expensive closed-source library that you can’t see until you talk to the sales team;
  • A more reasonably-priced closed-source library, with a trial SDK, that’s a little behind the times.

At this point I came up with an alternative solution: don’t try to read barcodes on an iOS device at all. Instead:

  • Grab a photo of the barcode using the phone;
  • Rotate, scale, crop and compress the image down to ~40K;
  • Post the image to a web server running a barcode reader SDK behind a RESTful web service;
  • Perform all of the barcode parsing on the web server;
  • Return the parsed data as JSON objects to the iOS device.
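The server side of those steps can be sketched in Go along the following lines. This is only an outline under several assumptions: the Decoder type stands in for whichever barcode SDK is used, the image arrives as the raw POST body, and the JSON shape is invented. To keep it short, it returns the decoded text and leaves out parsing it into fields.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// Decoder turns image bytes into raw barcode text. It is a hypothetical
// stand-in for the barcode reader SDK running on the server.
type Decoder func(image []byte) (string, error)

// scanResponse is the JSON sent back to the device (made-up shape).
type scanResponse struct {
	Data  string `json:"data,omitempty"`
	Error string `json:"error,omitempty"`
}

// scan runs the decoder and returns an HTTP status plus a JSON body.
func scan(image []byte, decode Decoder) (int, []byte) {
	if len(image) == 0 {
		return marshal(http.StatusBadRequest, scanResponse{Error: "empty image"})
	}
	text, err := decode(image)
	if err != nil {
		return marshal(http.StatusUnprocessableEntity, scanResponse{Error: err.Error()})
	}
	return marshal(http.StatusOK, scanResponse{Data: text})
}

func marshal(status int, r scanResponse) (int, []byte) {
	body, _ := json.Marshal(r) // a struct of plain strings always marshals
	return status, body
}

// scanHandler reads the posted image (capped at 1 MiB) and replies in JSON.
func scanHandler(decode Decoder) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "POST only", http.StatusMethodNotAllowed)
			return
		}
		image, err := io.ReadAll(http.MaxBytesReader(w, r.Body, 1<<20))
		if err != nil {
			http.Error(w, "image too large or unreadable", http.StatusBadRequest)
			return
		}
		status, body := scan(image, decode)
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(status)
		w.Write(body)
	}
}

func main() {
	// Stub decoder for demonstration; a real one would call the SDK.
	stub := func(image []byte) (string, error) {
		if !bytes.HasPrefix(image, []byte("\xff\xd8")) {
			return "", errors.New("not a JPEG")
		}
		return "decoded text", nil
	}
	// Exercise the handler in-process rather than opening a port.
	req := httptest.NewRequest(http.MethodPost, "/scan", bytes.NewReader([]byte("\xff\xd8fake")))
	rec := httptest.NewRecorder()
	scanHandler(stub)(rec, req)
	fmt.Println(rec.Code, rec.Body.String())
}
```

In a real deployment the handler would be registered with http.Handle and served with http.ListenAndServe; the in-process call in main is only there so the sketch runs and exits.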

The downsides to this are obvious. Barcode parsing will only work on an iOS device that has an internet connection, and parsing times will include data transmission time. However, the advantages are compelling. The average web server is so much faster than an iPhone that the time taken to transmit in both directions and decode via a web service appears to be no longer than the time to parse directly on an iPhone. ZXing becomes an option again if you’re a Java shop. If you can’t use ZXing, you should be able to find an SDK for your preferred language for, at most, the cost of Manatee Works’ library, but it will work on all devices. Yep, a web service will work with Android, iOS, Windows, Linux, BSD or anything else with an internet connection.

That’s the option I’ve plumped for. It’s working nicely so far. The most troublesome part was rotating, scaling and cropping the images from the iOS camera correctly on all iOS devices.

One final note: Don’t bother trying to read barcodes on an iOS device that doesn’t have an autofocusing camera. It just doesn’t work reliably. Stick to the iPhone 3GS+, iPod Touch 5g+ or the iPad 3+.