Sep 8 2010

I couldn't resist parodying this:

I believe the original is a parody too.

The first MySQL fanboy to defend it in the comments gets laughed at hysterically. Then I stab them in the face.

Like this post? Subscribe to my RSS feed and follow me on twitter to hear about new posts early.

Want to share this post?

Apr 22 2010

You have a SQL database and you want a column to be restrained to one of a few values. MySQL fanbois will tell you to use an enum. What should you use if you're running PostgreSQL?

There's a few possible solutions. David Wheeler wrote a great article on enforcing a set of values in PostgreSQL. You should definitely check it out to understand all the available options in detail, but here's the summary as taken from Josh Berkus' comment:

  • If the list will never change, and ordering is preset or doesn't matter: use an ENUM
  • If the list might change but changes require programming, OR the list will not change but needs to be alpha sorted: use a DOMAIN
  • If the list will change, and changes do not require other programming: use a lookup table.

You're probably familiar with ENUMs, and a lookup table is not hard to understand, but what is a domain?

Domains are a datatype that you can attach constraints to. They're different from ENUMs because you pick a datatype (e.g. TEXT), and then apply any arbitrary constraint to it. For example, you can simply re-implement an ENUM:

CREATE DOMAIN color TEXT
    CONSTRAINT valid_value CHECK (VALUE = 'red' OR VALUE = 'blue');

Or you can go one step further:

CREATE DOMAIN url TEXT
    CONSTRAINT valid_value CHECK (VALUE ~ '^https?://');

Let's see you do that with an ENUM!

Note that I said "attach constraints to". You can have more than one constraint apply to a domain at once:

CREATE DOMAIN url TEXT
    CONSTRAINT starts_right CHECK (VALUE ~* '^https?://')
    CONSTRAINT min_length CHECK (LENGTH(VALUE) > 10);

How does this differ from simply declaring a column constraint? Well for one thing, you can use the domain on columns across as many tables as you please. And if you have to change it, you can do so in one place:

ALTER DOMAIN url DROP CONSTRAINT min_length;
ALTER DOMAIN url ADD CONSTRAINT min_length CHECK (LENGTH(VALUE) > 15);

To use a domain, use its name as the type for a column:

CREATE TABLE rss_feed (
    url URL NOT NULL,
    content TEXT
);

I've put the name of the domain in capitals in the above example, which is allowed (and also is aesthetically pleasing given I generally specify all my datatypes using capitals).

To view all domains in your database, use \dD:

                                                                           List of domains
 Schema |       Name        |  Type   | Modifier |                         Check
--------+-------------------+---------+----------+--------------------------------------------------------------
 public | url               | text    |          | CHECK (VALUE ~ '^https?://'::text) CHECK(LENGTH(VALUE) > 15)

Dropping a domain is as simple as using DROP DOMAIN. Easy!

I have found a few gotchas with the PostgreSQL implementation of domains:

  • You can create domain constraints without a name, they will be named domainname_check, domainname_check1, and so on. Given that you need the name to alter the constraints later (one of the big advantages of domains after all!), it's better to just name them: CREATE DOMAIN url AS TEXT CONSTRAINT starts_right CHECK (VALUE ~* 'https?://');
  • ALTER DOMAIN DROP CONSTRAINT always returns success, even if the constraint doesn't exist.

These points aside, domains are a solid tool in a PostgreSQL admin's arsenal. As you can see, they're easy to use and more powerful than ENUMs. They're also SQL standard conforming - something that ENUMs are not!


Resources

Like this post? Subscribe to my RSS feed and follow me on twitter to hear about new posts early.

Want to share this post?

Mar 20 2009

This is cross-posted from a forum post I made on mahara.org. I know the PostgreSQL vs. MySQL debate has been had a million times before, but don't see the harm in registering my opinion on the whole issue. I know it's already convinced some people to switch to PostgreSQL, maybe it will convert you too...


Rather than simply saying MySQL is terrible and Postgres is better, I'll try and outline a few good reasons based on the years I've spent (initially using MySQL, but now exclusively on Postgres if at all possible), and based on my "developer's viewpoint" - that is, looking whether the databases behave and perform the way they should, and not so much at whether they're "easy to use/newbie friendly", whatever that means.

And a disclaimer from the start.. firstly this is a long post, and secondly that we are committed to MySQL support in Mahara.

--

The most important thing you can ever have in a relational database system is data integrity - which means that under no circumstances does the database allow your data to get into an incorrect state. Many databases have lots of safeguards built in to ensure your data remains sane - foreign keys, check constraints, transactions etc... and so when we write Mahara we make want to make sure we can use these features to prevent people having issues like poor William had when he upgraded his site. [1]

The main reason for this is because we're not perfect! Sometimes we make a mistake in the upgrade process, or a bug in the software, and then other people run into our bugs. The last thing we want is for someone to hit a bug that corrupts their data, because without that data, their site is worthless.

So hopefully that gives you a background into what we care most about. We also want to use a solution that performs well (speed when returning a query or doing any database operation), scales well (is still speedy when many people are accessing it), is well documented (so we can find out how it works easily) and has a good security policy (so our users don't get stuck using something that hackers can take advantage of), among many other things. But data integrity is the most important thing.

Given our need for data integrity, the first thing we see is that we can't use MySQL's "MyISAM" storage engine for any tables. It doesn't support transactions, which clearly will allow data to become corrupt. It doesn't matter that the engine is fast - it doesn't do the most important job a database should do. So MyISAM is out. From Mahara 1.1, if your MySQL setup only allows MyISAM tables, you won't be allowed to install Mahara.

Also, MySQL's "NDB Advanced Clustering Engine" is also out. If you want to try and scale your installation this way, it's a bad idea - it requires you turn off foreign keys.

The only MySQL engine that has most of the major data integrity features is InnoDB. It supports some transactions, and it supports foreign keys. But it is still missing some important features too: CHECK constraints and transactions on database modification operations (CREATE/ALTER TABLE etc.). So we see that we can use InnoDB and it will work "most" of the time, but it's not perfect.

On the other hand, PostgreSQL doesn't have the concept of database engines. It's just got one way to store tables, that uses its MVCC model for data integrity. This model fully supports transactions (on database modification as well as INSERT/UPDATE etc.), supports CHECK constraints, foreign keys, and also supports some other useful features such as indexes on functions of columns, that Mahara sometimes uses. Here are some examples of where Mahara uses these data integrity features:

  • Transactions - these are used everywhere! I won't bother to explain what they are, I'm assuming you know what they are. But I will re-iterate the point that PostgreSQL supports transactions for altering the database structure itself, something that MySQL can't do. Not even Oracle can do this! This is explained on the installation instructions page, but I'll point it out again that this means we can do upgrades without fear that if they fail, the database will be in a corrupted state. The benefits of this are obvious!
  • Foreign keys - used everywhere again! Everyone knows what these are. But it amazes me that so few open source software projects don't use them. Drupal doesn't, Moodle doesn't, and so many more - yet it's a near criminal offence. Corrupted data is a nightmare to work with, it causes software bugs and "strange behaviour" that can be a mess to deal with.
  • Check constraints - again, common everywhere except open source, for some reason. Mahara 1.1 ships with one in particular that will be useful. In the view and artefact tables, only one of the owner, group or institution columns can have a value. In Postgres, we have a check constraint for this, in MySQL we can't have one. So users on MySQL are more likely to suffer if there is a bug that results in more than one of these columns being set.
  • Indexes on functions of columns - best demonstrated with an example. In Mahara, usernames are case insensitive, so we want to make sure that nobody can insert the user 'BOB' if there is already a user 'bob'. In postgres, we can simply create a unique index on LOWER(username), and our problem is solved. We can't create this index in MySQL, so again MySQL users miss out.

The other things that people seem to care about most are performance and scalability. As explained before, we can't use MyISAM, so we compare InnoDB against Postgres. The simple answer is that, speed wise, they're comparable - although Postgres provides all the extra integrity features. The truth is though, the database isn't the bottleneck on small sites - the main bottleneck is CPU when processing the PHP pages.

On bigger sites, the scalability of the database becomes important. And sadly, MySQL simply falls over flat. Under high concurrent user access, MySQL's performance drops through the floor. What's worse is that it has bad locking characteristics. For example, when moodle.org does its backups, the site is basically unusable. Whereas Postgres is much smarter about locking, and doesn't need full table locks for almost any operation it performs.

The last thing I'll mention is this: have you ever had to use REPAIR TABLE in MySQL? What a joke. It's sad the command even exists, let alone the fact that it's needed so often. There are clearly some deep-seated bugs in MySQL to do with integrity that they need to sort out.

--

Anyway, this has been an extremely long post, but I hope it can serve as a reference for everyone when choosing what database to use for their Mahara installation. I'll summarise with my personal opinion, and a bunch of links you can read to get some more background.

Personally, I wish we didn't have to support MySQL in Mahara. It requires us to do more testing, work around more problems and make compromises. If anyone tells me they're a "MySQL expert", I just laugh to myself, because I think that if they were a "database expert" they'd be using Postgres. But most people use MySQL, because it's the "database" with all the mindshare right now, so we support it in Mahara.

MySQL support in Mahara isn't ever going to disappear, don't get me wrong. But I think you're selling yourself short if you use it when you could use Postgres instead.

Here are some links:

[1]William is a Mahara user who was affected by a bug in the MySQL upgrade path from Mahara 1.0 to 1.1. Unfortuntely, he did not have a backup of his database to restore from.

Like this post? Subscribe to my RSS feed and follow me on twitter to hear about new posts early.

Want to share this post?