This is cross-posted from a forum post I made on mahara.org. I know the PostgreSQL vs. MySQL debate has been had a million times before, but don't see the harm in registering my opinion on the whole issue. I know it's already convinced some people to switch to PostgreSQL, maybe it will convert you too...
Rather than simply saying MySQL is terrible and Postgres is better, I'll try and outline a few good reasons based on the years I've spent (initially using MySQL, but now exclusively on Postgres if at all possible), and based on my "developer's viewpoint" - that is, looking whether the databases behave and perform the way they should, and not so much at whether they're "easy to use/newbie friendly", whatever that means.
And a disclaimer from the start.. firstly this is a long post, and secondly that we are committed to MySQL support in Mahara.
--
The most important thing you can ever have in a relational database system is data integrity - which means that under no circumstances does the database allow your data to get into an incorrect state. Many databases have lots of safeguards built in to ensure your data remains sane - foreign keys, check constraints, transactions etc... and so when we write Mahara we make want to make sure we can use these features to prevent people having issues like poor William had when he upgraded his site.
The main reason for this is because we're not perfect! Sometimes we make a mistake in the upgrade process, or a bug in the software, and then other people run into our bugs. The last thing we want is for someone to hit a bug that corrupts their data, because without that data, their site is worthless.
So hopefully that gives you a background into what we care most about. We also want to use a solution that performs well (speed when returning a query or doing any database operation), scales well (is still speedy when many people are accessing it), is well documented (so we can find out how it works easily) and has a good security policy (so our users don't get stuck using something that hackers can take advantage of), among many other things. But data integrity is the most important thing.
Given our need for data integrity, the first thing we see is that we can't use MySQL's "MyISAM" storage engine for any tables. It doesn't support transactions, which clearly will allow data to become corrupt. It doesn't matter that the engine is fast - it doesn't do the most important job a database should do. So MyISAM is out. From Mahara 1.1, if your MySQL setup only allows MyISAM tables, you won't be allowed to install Mahara.
Also, MySQL's "NDB Advanced Clustering Engine" is also out. If you want to try and scale your installation this way, it's a bad idea - it requires you turn off foreign keys.
The only MySQL engine that has most of the major data integrity features is InnoDB. It supports some transactions, and it supports foreign keys. But it is still missing some important features too: CHECK constraints and transactions on database modification operations (CREATE/ALTER TABLE etc.). So we see that we can use InnoDB and it will work "most" of the time, but it's not perfect.
On the other hand, PostgreSQL doesn't have the concept of database engines. It's just got one way to store tables, that uses its MVCC model for data integrity. This model fully supports transactions (on database modification as well as INSERT/UPDATE etc.), supports CHECK constraints, foreign keys, and also supports some other useful features such as indexes on functions of columns, that Mahara sometimes uses. Here are some examples of where Mahara uses these data integrity features:
- Transactions - these are used everywhere! I won't bother to explain what they are, I'm assuming you know what they are. But I will re-iterate the point that PostgreSQL supports transactions for altering the database structure itself, something that MySQL can't do. Not even Oracle can do this! This is explained on the installation instructions page, but I'll point it out again that this means we can do upgrades without fear that if they fail, the database will be in a corrupted state. The benefits of this are obvious!
- Foreign keys - used everywhere again! Everyone knows what these are. But it amazes me that so few open source software projects don't use them. Drupal doesn't, Moodle doesn't, and so many more - yet it's a near criminal offence. Corrupted data is a nightmare to work with, it causes software bugs and "strange behaviour" that can be a mess to deal with.
- Check constraints - again, common everywhere except open source, for some reason. Mahara 1.1 ships with one in particular that will be useful. In the view and artefact tables, only one of the owner, group or institution columns can have a value. In Postgres, we have a check constraint for this, in MySQL we can't have one. So users on MySQL are more likely to suffer if there is a bug that results in more than one of these columns being set.
- Indexes on functions of columns - best demonstrated with an example. In Mahara, usernames are case insensitive, so we want to make sure that nobody can insert the user 'BOB' if there is already a user 'bob'. In postgres, we can simply create a unique index on LOWER(username), and our problem is solved. We can't create this index in MySQL, so again MySQL users miss out.
The other things that people seem to care about most are performance and scalability. As explained before, we can't use MyISAM, so we compare InnoDB against Postgres. The simple answer is that, speed wise, they're comparable - although Postgres provides all the extra integrity features. The truth is though, the database isn't the bottleneck on small sites - the main bottleneck is CPU when processing the PHP pages.
On bigger sites, the scalability of the database becomes important. And sadly, MySQL simply falls over flat. Under high concurrent user access, MySQL's performance drops through the floor. What's worse is that it has bad locking characteristics. For example, when moodle.org does its backups, the site is basically unusable. Whereas Postgres is much smarter about locking, and doesn't need full table locks for almost any operation it performs.
The last thing I'll mention is this: have you ever had to use REPAIR TABLE in MySQL? What a joke. It's sad the command even exists, let alone the fact that it's needed so often. There are clearly some deep-seated bugs in MySQL to do with integrity that they need to sort out.
--
Anyway, this has been an extremely long post, but I hope it can serve as a reference for everyone when choosing what database to use for their Mahara installation. I'll summarise with my personal opinion, and a bunch of links you can read to get some more background.
Personally, I wish we didn't have to support MySQL in Mahara. It requires us to do more testing, work around more problems and make compromises. If anyone tells me they're a "MySQL expert", I just laugh to myself, because I think that if they were a "database expert" they'd be using Postgres. But most people use MySQL, because it's the "database" with all the mindshare right now, so we support it in Mahara.
MySQL support in Mahara isn't ever going to disappear, don't get me wrong. But I think you're selling yourself short if you use it when you could use Postgres instead.
Here are some links:
Like this post? Subscribe to my RSS feed and follow me on twitter to hear about new posts early.
Want to share this post?