DL/ID Parser Library For Go

I’ve published an initial version of my US DL/ID barcode data parser library here:

This is a library for Go that can parse the data extracted from the PDF417 barcode on the back of a US driving licence into a Go struct. It doesn’t handle many of the encoded fields yet but it’s a start. I’ve grumbled about the multiple problems with the spec and its various implementations before.


More Barcode Disasters

Here’s a follow-up to my last rant about driving licence barcode standards that aren’t standard.


Headers

The barcode header is supposed to be “@\n\x1e\rANSI “. If the first nine characters of the decoded data equal this string, we’ve got a valid driving licence barcode. South Carolina’s driving licences have the file separator character (ASCII 0x1c) as the third byte instead of the record separator character (ASCII 0x1e) as defined by the standard.

ZIP Codes

US zip codes come in two formats. The first is the standard 5-digit code, such as “90210”. This was found to be insufficiently accurate, so a “+4” extension is often tacked on the end, giving the format “90210-1234”.

In the first version of the DLID spec, the zip code field was 11 characters long. If the zip code didn’t fill the entire 11 characters, the extra places were padded with spaces.

Consider the file format. Each record in the file is split into two parts: an identifier, which is a 3-character header (“DAQ”, “DBC”, etc) that indicates what the data represents; and the data itself (“JOE”, “BLOGGS”, etc). Records are separated by line breaks. If the data has separators, why does the zip code field have a fixed width? Most (all?) of the other fields are variable width.

Assuming there’s a reason for the field to be fixed-width, you should be able to see immediately that the spec is still broken. If all zip codes are at most 10 characters long, why does the field allow for 11 characters? Even 10 characters is too long. If the field is 5 characters long, a parser can infer that it has no +4 extension. If it is 9 characters long a parser can infer that it has the extension and split it up accordingly.

Version 3 of the spec tried to rectify the situation. The field was shortened to 9 characters, but this time zeros were used as padding instead of spaces. The upshot is that every parser must extract the zip and +4 sections by dividing up based on expected data lengths (5 and 4 respectively) and then ditch the extension if it is equal to “0000”. Why not just make the field a variable width? Why not pad it with spaces that can be trimmed without potentially losing a trailing zero in the zip?

There is no documentation as to how to format the field in versions 1 and 2 of the DLID spec. Thus, Colorado just uses the 5-character zip. South Carolina uses both the zip and the extension and smooshes them together into one 9-character string padded with two spaces. Massachusetts includes both sections of the zip separated by a hyphen and pads with one space. Who knows what the other states do.

At version 3 of the spec, the standard embraced the Canadians. Canadians have a 6-character post code that looks just like the UK standard. There is no documentation anywhere as to which padding character, if any, is used when representing these post codes, nor as to whether the two 3-character sections should be divided by a space.


Names

The 7 versions of the DLID spec include 3 ways of storing names. They started out with a single record that stored a comma-delimited list of names in the format “LAST,FIRST,MIDDLE,…”. Colorado, being unique and special, uses the format “FIRST,MIDDLE,…,LAST”.

Presumably to prevent this foolishness, the standards body changed this in the second version of the spec. This version included a standalone “last name” field and a field for other names in the format “FIRST,MIDDLE,…”. Actually, that’s not strictly true; the documented format is “FIRSTxMIDDLEx…”, where “x” is an undocumented separator. Wisconsin used a space whilst Virginia used a comma.

The fourth version finally seems to have fixed it. Names are divided into three fields: “first”, “last” and “middle”, where “middle” can contain multiple comma-separated names. Documentation at last!

Social Security Numbers

Version 1 of the spec optionally allowed states to include their drivers’ social security numbers on their licences. Careful with that licence, now…


Gender

Version 1 of the spec allowed gender to be expressed using 6 possible values: M, F, 0, 1, 2 and 9. “M” and “F” are self-explanatory. The others are pulled from the ANSI-D20 gender codes, in which the values mean “Unknown”, “Male”, “Female” and “Not specified” respectively. Obviously two ways of representing the same piece of data is better than one. Version 2 dumped all but values “1” and “2”. I imagine that the standards body figured that, if they were going to allow someone to be in control of a 26,000lb vehicle, they should take enough of an interest in the driver to know his or her gender.


Parsing US Driving Licence Barcodes

Most states in the US and some Canadian provinces include a PDF417 barcode on the back of their driving licences. The barcode contains a host of information about its owner, such as names, address, height, weight, eye colour, date of birth, etc. There are currently 7 different versions of the standard, which you can download here (click on the “Documentation” tab):

Unfortunately, the standards are full of breathtakingly stupid mistakes. Dates are currently my favourite.

This is the date format used in version 1:

    CCYYMMDD
That’s one of the ISO-8601 standards for representing a date.

In version 2 they switched to this:

    MMDDCCYY
That’s the standard US way of representing dates (I like to think of them as “lumpy” dates, because the format goes “large-small-large”, whereas ISO dates are big-endian). I have no idea why they did this. I presume they got a lot of complaints from Americans who were stumped by the unusual date format whilst decoding the PDF417 barcodes with nothing more sophisticated than their eyes. Any automated parser would naturally re-format the date into the local standard, so they must have been doing it manually. An impressive skill.

In version 3 the Canadians decided to get in on the barcode action. Canadians use the big-endian date format, so the spec now states that date fields can store the dates in one of two ways:

    MMDDCCYY (US licences)
    CCYYMMDD (Canadian licences)
Any parser needs to check the licence’s country code before it can parse dates. Not only does this version of the spec introduce a new standard, it allows multiple standards within a single field.