2011-02-17

The BitBucket API, Python and urllib

If you’ve ever tried googling for anything related to Python 3, you’ll have encountered difficulties. This is because “Python 3” is also known as both “Python 3000” and “Py3k”. Search for just one name and you’ll miss one third of the potential results. This is somewhat contrary to the Python philosophy:

There should be one-- and preferably only one --obvious way to do it.

We’re off to a good start already.

As discussed recently, BitBucket has an incomplete but promising REST API that allows for the automation of things like repository creation and issue tracking. I’ve written a messy C# program that takes advantage of some of the repository features, but wanted to try writing a complete API library in Python 3.

Python is currently having problems. Zed Shaw over at sheddingbikes.com has complained about the unwillingness of operating system providers to upgrade from ancient versions of Python. I personally can’t quite see why so much of Python 3 has been backported to Python 2, when encouraging people to upgrade where they need to by offering new features would be far more beneficial for the new version of the language.

I’ve been using Python’s urllib to access the BitBucket API. The BitBucket API requires the use of basic authentication in order to get access to the full feature set. Unfortunately, the “obvious way to do it” doesn’t work.

This StackOverflow question gives the “canonical” way of achieving basic authentication using urllib. I’ve ported it from Python 2 to Python 3:


import urllib.request

username = "username"
password = "password"
url = "https://api.bitbucket.org/1.0/users/ant512/"

passman = urllib.request.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, url, username, password)

authhandler = urllib.request.HTTPBasicAuthHandler(passman)
opener = urllib.request.build_opener(authhandler)

urllib.request.install_opener(opener)

pagehandle = urllib.request.urlopen(url)
print(pagehandle.read())

Looks great! Doesn’t work. The problem is that urllib expects this sequence of events to happen:

  • Client issues a request
  • Server issues a 401 Unauthorized error
  • Client re-issues the request with authentication
  • Server sends data with a 200 response.

BitBucket doesn’t work like that, though. BitBucket does this if no credentials are supplied:

  • Client issues a request
  • Server sends data with a 200 response.

The data returned in this case is a list of public repositories.

When the initial request is sent with credentials, BitBucket does this:

  • Client issues a request with authentication
  • Server sends data with a 200 response.

In this case, the data returned is a list of public and private repositories to which the authenticated user has access. For this situation, urllib doesn’t work.

What we actually need to do is this:


import urllib.request
import base64

username = "username"
password = "password"
url = "https://api.bitbucket.org/1.0/users/ant512/"

credentials = base64.b64encode("{0}:{1}".format(username, password).encode()).decode("ascii")
headers = {'Authorization': "Basic " + credentials}
request = urllib.request.Request(url=url, headers=headers)

connection = urllib.request.urlopen(request)
content = connection.read()

Of course! Why didn’t I see it in the first place? All I have to do is ignore the library’s API and instead encode a username/password tuple into base 64, decode it back into ASCII, manually inject it into the request header, and then use urllib. Obvious. Yes.
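For readers on a newer interpreter, there may be a way to stay inside the library. Python 3.5 added HTTPPasswordMgrWithPriorAuth, whose add_password() takes an is_authenticated flag that asks the handler to attach credentials to the first request rather than waiting for a 401. The following is a minimal sketch under that assumption; it has not been tried against BitBucket, and it only runs the handler's pre-processing step so that nothing touches the network:

```python
import urllib.request

username = "username"
password = "password"
url = "https://api.bitbucket.org/1.0/users/ant512/"

# Requires Python 3.5 or later. is_authenticated=True marks the URL as
# one that should receive credentials up front.
passman = urllib.request.HTTPPasswordMgrWithPriorAuth()
passman.add_password(None, url, username, password, is_authenticated=True)

authhandler = urllib.request.HTTPBasicAuthHandler(passman)
opener = urllib.request.build_opener(authhandler)

# Run only the handler's request hook to show the header it would add.
request = authhandler.http_request(urllib.request.Request(url))
print(request.get_header("Authorization"))

# To fetch the page for real: content = opener.open(url).read()
```

The manual header shown earlier remains the approach that works on every Python 3 version.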

At least we can now retrieve a list of a user’s repositories. Let’s make things a little more interesting and update an existing issue in the BitBucket issue tracker.


import urllib.request
import base64

username = "username"
password = "password"
url = "https://api.bitbucket.org/1.0/issues/ant512/woopsi/issues/10/"

credentials = base64.b64encode("{0}:{1}".format(username, password).encode()).decode("ascii")
headers = {'Authorization': "Basic " + credentials}
data = {"title": "Issue title", "content": "Issue content"}
request = urllib.request.Request(url=url, headers=headers, data=data)

connection = urllib.request.urlopen(request)
content = connection.read()

This won’t work. A REST API typically relies on four HTTP verbs:

  • GET
  • POST
  • PUT
  • DELETE

Out of the box, urllib only gives us GET and POST. It automatically determines whether to GET or POST depending on whether or not a request has a data payload. urllib thus reveals itself to be an abstraction so leaky it should be on the bottom of the sea by now. It also explains why so many other people have written their own replacements for it.
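The verb selection can be observed directly, without sending anything over the network. This small sketch builds two requests and asks each which method it would use; the payload is an arbitrary placeholder:

```python
import urllib.request

url = "https://api.bitbucket.org/1.0/users/ant512/"

# No payload: urllib picks GET.
without_data = urllib.request.Request(url).get_method()

# With a payload: urllib picks POST. There is no argument here for PUT.
with_data = urllib.request.Request(url, data=b"title=Issue+title").get_method()

print(without_data, with_data)
```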

What we need to do instead is subclass the Request object so that it is possible to specify which HTTP verb to use:


class RESTRequest(urllib.request.Request):
    __method = None

    def __init__(self, url, data=None, headers={},
                 origin_req_host=None, unverifiable=False, method=None):
        super().__init__(url, data, headers, origin_req_host, unverifiable)
        self.__method = method

    def get_method(self):

        # Uses the standard method choosing logic if no other method has been
        # explicitly set.

        if self.__method is None:
            return super().get_method()
        else:
            return self.__method
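On Python 3.3 and later the subclass may not be needed, because Request itself accepts a method argument. A short sketch, assuming such an interpreter; nothing is sent, the request is only constructed:

```python
import urllib.request

url = "https://api.bitbucket.org/1.0/issues/ant512/woopsi/issues/10/"

# The method keyword was added to Request in Python 3.3.
request = urllib.request.Request(url, data=b"title=Issue+title", method="PUT")
print(request.get_method())
```

The subclass above is still the way to go on Python 3.0 to 3.2.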

We can now alter our code so that we update issues in the issue tracker:


import urllib.request
import urllib.parse
import base64

username = "username"
password = "password"
url = "https://api.bitbucket.org/1.0/issues/ant512/woopsi/issues/10/"

credentials = base64.b64encode("{0}:{1}".format(username, password).encode()).decode("ascii")
headers = {'Authorization': "Basic " + credentials}
data = urllib.parse.urlencode({"title": "Issue title", "content": "Issue content"}).encode()
request = RESTRequest(url=url, headers=headers, data=data, method="PUT")

connection = urllib.request.urlopen(request)
content = connection.read()

Note that the data we send has to be a set of URL-encoded name/value pairs, converted to bytes.
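To make the encoding step concrete, here is what urlencode() produces for the issue fields used above. The round trip through parse_qs() is only there to show that nothing is lost:

```python
import urllib.parse

fields = {"title": "Issue title", "content": "Issue content"}

# urlencode() joins the pairs with "&" and escapes spaces as "+".
encoded = urllib.parse.urlencode(fields)
print(encoded)

# urlopen() wants bytes, hence the extra encode() call.
body = encoded.encode()

# Decoding gives back the original values (each wrapped in a list).
decoded = urllib.parse.parse_qs(encoded)
print(decoded)
```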

Here’s one I haven’t been able to solve yet. Suppose we want to give bob@example.com write access to one of our repositories:


import urllib.request
import urllib.parse
import base64

username = "username"
password = "password"
url = "https://api.bitbucket.org/1.0/privileges/ant512/woopsi/bob@example.com"

credentials = base64.b64encode("{0}:{1}".format(username, password).encode()).decode("ascii")
headers = {'Authorization': "Basic " + credentials}
data = urllib.parse.urlencode({"data": "write"}).encode()
request = RESTRequest(url=url, headers=headers, data=data, method="PUT")

connection = urllib.request.urlopen(request)
content = connection.read()

The data we’re sending over consists of the following name/value pair:


{"data": "write"}

The BitBucket API says that the only data sent as part of the request should be one of the words “read”, “write” or “admin”, depending on the level of access that should be given. We shouldn’t be sending a name/value pair at all. As expected, running this gives us a 400 “bad request” error. The problem we have is that urllib will only allow us to send name/value pairs. We can’t send a single word on its own; if we try it, Python throws a ValueError. The leaky abstraction comes back to bite us again.
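One avenue that might be worth trying: the data argument of Request is just bytes, so a bare word can be handed over without going through urlencode() at all. The sketch below is untested against BitBucket. It assumes Python 3.3 or later for the method argument, and the text/plain content type is a guess rather than anything the API documentation states. It only builds the request; the Authorization header from the earlier examples would still need to be added before sending.

```python
import urllib.request

url = "https://api.bitbucket.org/1.0/privileges/ant512/woopsi/bob@example.com"

# Pass the privilege level as a raw byte string instead of a
# URL-encoded name/value pair.
request = urllib.request.Request(url, data=b"write", method="PUT")

# Assumption: without this, urllib labels the body as form data.
request.add_header("Content-Type", "text/plain")

print(request.get_method(), request.data)
```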

Despite the obstacles put in the way, I’ve managed to get almost all of BitBucket’s API wrapped up in a Python 3 library:

The library probably doesn’t adhere amazingly well to Python’s coding standards (though there really is one— and only one —obvious way to perform any action), but it does provide access to almost all of the functionality offered by BitBucket’s current API. The only exceptions are:

  • The grantPrivileges() method does not work (broken due to the limitations of urllib, but there might be a way to work around it)
  • The updateRepository() method currently doesn’t offer much of the possible functionality
  • The createWiki() method does not work (this could either be a bug in BitBucket’s API, a limitation of urllib or a bug in the library itself)

All data is returned as JSON objects. It is parsed from strings using the json library. The library can use basic authentication if a username/password combination is supplied. If not, it will interact with the server without authenticating where possible.
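The parsing step is roughly the following. The response body here is a made-up placeholder, not real BitBucket output, and the field names are hypothetical:

```python
import json

# Hypothetical bytes as they might come back from connection.read().
content = b'{"username": "ant512", "repositories": []}'

# Decode to a string first, then parse into Python dicts and lists.
user = json.loads(content.decode("utf-8"))
print(user["username"])
```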

Comments

Jesper Noehr on 2011-05-16 at 08:57 said:

Well, we do this, as we also allow anonymous access to most resources. Thus, we can’t meet the client with a 401.

A better choice here would be to use OAuth.

Ant on 2011-05-16 at 09:20 said:

I don’t think there are any problems with the design of BitBucket’s API (though it would be nice if I could send a name/value pair to grant permissions). It’s Python 3’s urllib that’s the problem – it’s a leaky abstraction intended to solve a very narrow set of problems. Trying to make it interact with a REST API pushes it almost to breaking point.
