Archive for the 'Programming' Category

Is SQLAlchemy Ready For Production?

SQLAlchemyI have built my most recent project using SQLAlchemy (SQLA) 0.4. It is a reasonably young library, so I thought others considering using SQLAlchemy might benefit from my experiences.

The most recent version is 0.4, so it obviously does not claim to be ‘finished’ (can software ever be?) However, there are plenty of benefits in using it already:

  • Ridding your beautiful Python code of any SQL. It’s untidy and pain to work with (which means bugs).
  • ORM (Object Relational Model) to fit in with a nice OO architecture.
  • Free sharding to split your database horizontally.
  • DB independant code, so you can switch if you feel the need.
  • Hassle free transactions and DB connection management.

Sounds great doesn’t it? In general this is exactly what you get. However, there are a few points which are very important to consider before committing yourself to using SQLA.

Forcing Indices

Anyone who has had to do something reasonably involved in MySQL has spent time optimising their indices. Sadly, MySQL is not very clever about how it chooses the index to use for a particular query. In some cases the only option is to force it with FORCE.

This is not available in SQLA, so anytime you need to do this, you’ll have to manually specify the SQL. Not only does this partly defeat the point of SQLA, but it also happens that the place you most need the SQL abstraction is often the same place you need to force an index. Consider a complicated search page. You are searching over a large data set using a number of filters. MySQL is probably going to get the index wrong and you’re going to have to generate a complicated SQL query without SQLAlchemy’s help.

Commit ORM Objects

SQLAlchemy can be a little silly about detecting a change in an ORM object. If you assign to a member variable which is part of the schema, it will be marked as dirty even if the value doesn’t actually change.

The solution is to check for a change before assigning, which does not make for neat code!

Sharding

It’s a good idea to plan for scaling as early as possible. It can be very difficult to build the necessary bits in later when you need it (and will also under pressure to fix things fast). One common way to deal with scale is to split large tables over several machines, this is known as sharding (each independant chunk of table is known as a shard).

SQLAlchemy has some code to help you there. All you need to do is to define 3 functions which tell SQLA which shard a particular row is in. Really simple. Sadly, this part of the code is not very mature at all:

  • query.count() doesn’t work (nor any scalar query). I had to write a function to query each shard in turn and sum the result. The real problem here is that it was not clear at all which bits of functionality will or won’t work with sharding (expect long debugging sessions, digging into the SQLA code).
  • The ORM caches objects and identifies them by their primary key. However, a common MySQL trick when sharding is to have an auto_incremented INT as the primary key of each shard but use something like a UUID as the ‘real’ primary key recognised by the code (this speeds things up quite a bit). Of course, the auto_incremented key will not be unique across shards and this will confuse SQLAlchemy. I think the best solution here (suggested by someone on the SQLAlchemy group) is to have a 2 column primary key with an INT and another integer shard identifier, making it unique.

No Server Side Cursors

A minor point, but might be important for some. Server side cursors come in handy sometimes when you are dealing with large amounts of data.

Tips For Optimising

  • Periodically check the SQL queries being made with the echo option. There might be some surprises in there (though usually easily fixed). This kind of thing usually pops up because you will use the ORM instance and forget / not realise that it will result in a query (from a software design point of view, great. From an optimisation point of view, awful).
  • Use set_shard on a query whenever you are able. If you know which shard the column you want is, no need to go checking the other ones. A common example is when the shard identitifier is in one of the query parameters.
  • Design for scale right from the beginning and develop / test on a distributed architecture (i.e. have at least 2 shards). This doesn’t need to be difficult, for example: just create two databases on your developement box to simulate two machines.

Conclusion

I still think that SQLAlchemy is worth using. Overall it will save time and effort as long as you are careful and not afraid to get your hands dirty when the going gets tough. I expect most of these problems will be addressed in (near) future releases.

C# BHO Tutorial

ie7I get a fair bit of traffic from people looking for help with C# and BHOs (my event handling post). There isn’t much information about and the only beginners tutorial went missing a few months back. A new one has appeared, anyone wanting to get going should check it out.

Vim: The Word Processor

Vim LogoI love Vim. It took me a while to get there, but I've been using it exclusively for coding for a couple of years now and it has become second nature. I first learnt to use it during a practical exercise for an operating systems course at uni. We had to write a Minix driver so all the work had to be done on the command line. A powerful text editor was a must. I think I used Vi (rather than Emacs) because it was available and I had been told it was great. It was a steep learning curve, but I got a hang of the basics after a few days.

It was a couple of years before I started to use it again. In the meantime I'd mostly been working with .NET and Java so I'd been using some pretty decent development environments and it didn't seem necessary to use anything else (especially considering the power of their debuggers). But I moved into the world of the web and started writing PHP and then Python, these did not really have especially good IDEs and so it was back to a text editor and my choice was Vim. It also coincided with worsening RSI, for which Vim is great.

I tend to learn a couple of features in a burst every few months when something really bugs me. This is probably not ideal, but Vim is so powerful I will never learn it all. I've been keeping a to do list in a text file recently (rather than on paper - go planet!) but Vim's defaults are not great for editing prose:

  • Vim's word wrap is by character, not word.
  • k and j (up and down) work on a line basis. If you have a wrapped line, you cannot move inside it with j and k (like you would with a normal text editor).

Of course, Vim is hugely powerful and can be tweaked to be much more useable when writing prose. These .vimrc commands:

  • Wrap lines by cutting lines off at word boundaries.
  • The word wrap is virtual, no extra line break is inserted (so that it's easy to edit afterwards).
  • j and k are replaced with gj and gk which allow you to move up and down inside a wrapped line.
  • I've also added the spell checker (I've not played around with it properly yet, but it looks a little weak).
  • smartindent for bullet points.


CODE:
  1. autocmd BufRead *\.txt setlocal formatoptions=l
  2. autocmd BufRead *\.txt setlocal lbr
  3. autocmd BufRead *\.txt map <buffer> j gj
  4. autocmd BufRead *\.txt  map <buffer> k gk
  5. autocmd BufRead *\.txt setlocal smartindent
  6. autocmd BufRead *\.txt setlocal spell spelllang=en_us



On a slightly different note: if you're using OS X, get this port of Vim. It's aim (and it does) is to integrate better into Macs. It's worth getting just because it has pretty Carbon tabs ;-)

Facebook Chat Firefox Plugin

FacebookFacebook finally released their chat functionality to all their users today. It's fantastic! I quickly came to the conclusion that it did have one major flaw: you have to be on the Facebook page to see any incoming messages. So to solve this, I quickly knocked up a Firefox add-on to alert you if you have any new chat messages. You can download it here.

Firefox Facebook Chat Add-on Screenshot

It uses the built-in Firefox alert system, so you'll need Firefox 3 on OS X (for Growl) and 2 for other OSs. At the moment you also have to leave the Facebook page open, if I have time later, I'll deal with this so that you can close the Facebook page.

Zen of Programming

ZenProgramming is a difficult discipline. It requires a phenomenal knowledge of difficult technologies: (usually several) programming languages, operating systems, knowledge of hardware, protocols, databases, etc. It takes years of using them to build up the know-how to use them all effectively to solve something (and isn't that the point of programming). However, knowledge and understanding are not enough. Having a certain type of brain helps a lot too. Most good programmers are excellent logical thinkers and have a solid background in maths.

So you've got a few years experience, you know a couple of languages and can bash out code with your eyes closed. This is the point where you start worrying about the quality of your code. How buggy is it? How robust is it? How maintainable is it? Dealing these problems is not like before - there is no solution, but there are ways you can cope. The Zen of Programming:

  1. Re-read what you just coded - before running it / compiling it.
  2. Think as far ahead as you can. But don't spend so long that you don't get anything done.
  3. Refactor a lot.
  4. Try and automate your testing.
  5. Re-read the documentation of a language / library after you become competent in it.

Most of these points boil down to having patience. If you are naturally so, then you probably do most of these anyway. Otherwise, it is very much worth forcing yourself to change your style - you will save time and your code will be much, much better.

Handling International Dialling Codes (in Python)

The website I'm working on at the moment collects a user's phone number. This must work with phones from any country and each number must be converted into a standard format so that it can be used with an SMS API. For example, the phone number +44 (0)7912345678 would become 447912345678.

To make it as easy as possible for the user to not make a mistake, the international dialling code is in a <select>. I compiled a list* of dialling codes for all countries. In Genshi, the template code might look like this:

HTML:
  1. <select name="dialling_code">
  2.     <option py:for="country, dialling_code, ndd in internationalDiallingCodes" value="${dialling_code}">${country} (+${dialling_code})</option>
  3. </select>

When the form is submitted, we need another function to turn the dialling code plus the rest of the number into the right format (note: this is also where internationalDiallingCodes comes from):

PYTHON:
  1. def makeStandardPhoneNumber(internationalCode, rest):
  2.     """
  3.     Make a standard phone number by appending rest to the
  4.     internationalCode. Check the first digits rest to see if they
  5.     match the NDD which must be removed from the number.
  6.     i.e. 0 in the UK
  7.     Example: makeStandardPhoneNumber('44', '073749135381')
  8.         -> '4473749135381'
  9.     """
  10.     # Get the NDD code for this internationalCode
  11.     ndd = getNDD(internationalCode)
  12.  
  13.     # if the country has an NDD, check for it and remove if it
  14.     # exists
  15.     if ndd is not None:
  16.         index = len(ndd)
  17.         if rest[:index] == ndd:
  18.             rest = rest[index:]
  19.     result = internationalCode + rest
  20.     return result
  21.    
  22. def getNDD(internationalCode):
  23.     for country, iCode, ndd in internationalDiallingCodes:
  24.         if iCode == internationalCode:
  25.             return ndd
  26.     return None
  27.  
  28. internationalDiallingCodes = [
  29.     ("Afghanistan ", "93", "0"),
  30.     ("Albania", "355", "0"),
  31.     ("Algeria", "213", "7"),
  32.     ("Andorra", "376", None),
  33.     ("Angola", "244", "0"),
  34. ...
  35. ]

You can download the python source file here. It is trivial to modify the functions to put the phone number into the format you want. If you needed a performance boost, this function could be optimised by inversing the index on internationalDiallingCodes.

* Also: You can download a CSV of phone codes here. That should make it easy to include them in any program. These do not include satellite phone companies.

Facebook Notification / Request Buckets

FacebookOne of the most common ways for a Facebook application to spread is through users inviting their friends once they have installed it. Unfortunately, this has lead to some pretty dubious practices (i.e. spam) to force people to invite others. The Facebook API team have responded by removing requests from the API, and more recently, allocating a maximum number of requests you may have per user based on how "spammy" your application is.

It uses metrics like how often people choose to "ignore, hide and report notifications as spam". This is an interesting way to suppress spam applications, but there have been reports that things can be difficult early on because a single 'ignore' will be more statistically significant. Based on the metrics, your application gets placed in one of 9 buckets for notifications and 13 buckets for requests, each bucket having a "limit threshold" which is the maximum per user per day. I've put together a table showing these based on a number of forum posts:

Notifications

Notifications Bucket Limit Threshold
1 Blocked for 1 month
2 4
3 5
4 8
5 10
6 15
7 20
8 28 (estimated)
9 35

Requests

Request Buckets Limit Threshold
1 Blocked for 1 month (estimate)
2 2 (estimate)
3 4
4 5
5 8 (estimate)
6 10
7 12
8 15
9 18
10 21
11 25
12 30
13 35

C# Browser Helper Object Event Handling

I have recently needed to write an internet explorer plugin. I do have some experience writing Firefox plugins, and if you've even done that you'll know that its quite easy. You can just hack together some Javascript and you're away. If you need some sort of interface, you get to use XUL, which isn't bad at all.

Internet Explorer is a completely different animal. Whereas Mozilla, the makers of Firefox, are very keen for people to extend it, Microsoft do not seem to be. It is however possible, you must create a COM object which Internet Explorer will instantiate, this what they call a Browser Helper Object (BHO). OK, that's a bit of a pain, but it will have to do. The real difficulty is in the documentation or lack thereof. There are a few tutorials which use C++ and ATL. Being used to modern languages such as C#, Javascript, Python etc, going back to C++ is horrible! You end up spending huge amounts of time getting minor issues sorted out. I had almost resigned myself to my fate when I came across an explanation of how to build a BHO using C#, that was one of the happiest days of my life!

Steven Cohn's post walks you through the setting up of a C# BHO. It's a great post, however it does not cover one important issue: how to attach event handlers to DOM element events. This turned out to be quite tricky, needing some fiddly code. It took a lot of experimentation and digging through newsgroups to find a solution which I will now explain.

The DOM is a COM object and the event handlers must also be COM objects implementing the IDispatch interface. So you must build a COM visible class to proxy your event and use a delegate to attach the real event handler to it. First our event handler class:

C#:
  1. [ComVisible(true)]
  2. public class DHTMLFormEventHandler
  3. {
  4.     public DHTMLFormEvent Handler;
  5.     private HTMLFormElement Form;
  6.     public DHTMLFormEventHandler(HTMLFormElement f)
  7.     {
  8.         this.Form = f;
  9.     }
  10.  
  11.     [DispId(0)]
  12.     public void Call()
  13.     {
  14.         Handler(this.Form);
  15.     }
  16. }

This class is, of course, exposed to COM. Its constructor keeps a reference to the element we will be interested in later when the event is raised. When the DOM raises the event it will call Call(). The DispId(0) is a sneaky way of implementing IDispatch (I think, I'm not too strong on COM yet). Call() will call the event handler which we attach using a delegate as normal:

C#:
  1. public delegate void DHTMLFormEvent(HTMLFormElement form);
  2.  
  3. private void AttachHandler(HTMLFormElementClass form)
  4. {
  5.     DHTMLFormEventHandler FormHandler = new DHTMLFormEventHandler(form);
  6.     FormHandler.Handler += new DHTMLFormEvent(SubmitHandler);
  7.     form.onclick = FormHandler;
  8. }

The final step is to set the event to our event handler object. You might want to experiment with the attachEvent method instead of doing this assignment to make sure you don't override other event handlers.