Robert Haas: SURGE Recap

Bruce Momjian and I spent Thursday and Friday of last week in Baltimore, attending Surge. It was a great conference. I think the best speakers were Bryan Cantrill of Joyent (@bcantrill), John Allspaw of Etsy (@allspaw), and Artur Bergman of Wikia (@crucially), but there were many other good talks as well. The theme of the conference was scalability, and a number of speakers discussed how they'd tackled scalability challenges. Most seem to have started out with an infrastructure based on MySQL or PostgreSQL and added other technologies around the core database to improve scalability, especially Lucene and memcached. But there were some interesting exceptions, such as a talk by Mike Malone wherein he described building a system to manage spatial data (along the lines of PostGIS) on top of Apache Cassandra.

Some general themes I took away from the conference:

1. Make use of the academic literature. Inventing your own way to do something is fine, but at least consider the possibility that someone smarter than you has thought about this problem before. Systems engineering, operations engineering, human factors design, and crisis management are all fields with a much longer history than web engineering.

2. Failures are inevitable, so plan for them. Try to minimize the possibility of cascading failures, and plan in advance how you can operate in degraded mode if disaster (or the Slashdot effect) strikes.

3. Disk technology matters. Drive firmware bugs are common and nightmarish, and you can expect very limited help from the manufacturer, especially if the drive is billed as consumer-grade rather than enterprise-grade. SSDs can save you a lot of money, both because a given number of dollars buys more I/O operations per second and because they draw less power, and electricity isn't free.

4. Large data sets require horizontal scalability. In the era of 1TB drives, "large" doesn't mean quite what it used to; one of the first computers we had in our house when I was growing up had a 10MB hard drive. But even though the amount of data you can manage with one machine is growing all the time, the amount of data people want to manage is growing even faster.
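Horizontal scalability usually means partitioning the data across several machines so that each one holds only a slice of it. The sketch below illustrates the simplest version of that idea, hash partitioning, under stated assumptions: it was not presented at Surge, the node names are hypothetical, and `shard_for` is an illustrative helper rather than any real system's API.

```python
import hashlib


def shard_for(key: str, nodes: list[str]) -> str:
    """Pick the node responsible for `key` (simple modulo hash partitioning).

    Uses SHA-256 rather than Python's built-in hash(), because hash() on
    strings is randomized per process and would not give a stable mapping.
    """
    if not nodes:
        raise ValueError("need at least one node")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Treat the first 8 bytes of the digest as an unsigned integer,
    # then reduce it modulo the number of nodes.
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


# Hypothetical node names, for illustration only.
NODES = ["db0", "db1", "db2"]

if __name__ == "__main__":
    for user in ["alice", "bob", "carol"]:
        print(user, "->", shard_for(user, NODES))
```

One caveat with plain modulo partitioning: adding or removing a node remaps most keys. Systems such as Apache Cassandra use consistent hashing instead, so that a membership change moves only a fraction of the data.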

Since I spend most of my time thinking about PostgreSQL, it was great to take a few steps back and look at the larger picture. What kinds of things are people doing with their databases? How are they using them to solve business problems? The lessons above also apply to the future direction of PostgreSQL. Our project already has excellent practices with regard to defensive programming (point #2); we must take care to keep up the good work. A number of our algorithms are taken from the academic literature (point #1), but I think there's room for improvement, especially in the areas of performance and user-interface design. And, of course, we need better scalability (point #4). Despite the present popularity of distributed key-value stores (i.e., NoSQL), the traditional RDBMS seems unlikely to go away any time soon. But, over time, we'll probably see some convergence between the feature-rich-but-less-scalable RDBMS and the feature-poor-but-massively-scalable distributed key-value store. It would be nice to get out ahead of the curve.

4 comments:

Anonymous, October 16, 2010 3:31 PM
Is VoltDB the convergence solution to look forward to?
Robert Haas, October 16, 2010 10:33 PM
VoltDB is an in-memory-only database without a query language, so I would say no.
Anonymous, October 17, 2010 3:03 AM
Considering that it supports scale-out (as NoSQL solutions do) as well as ACID and SQL as the DBMS interface, I see it as a possible convergence solution between the two worlds. So what if it is in-memory?
Robert Haas, October 17, 2010 11:44 PM
Because VoltDB is an in-memory database, a cluster-wide power outage can result in data loss. While that may be suitable for some applications, it's not the same type of durability guarantee that a traditional RDBMS provides.

I'm not familiar enough with VoltDB to comment on what it provides in terms of SQL, but I suspect it is not as full-featured as traditional systems. That doesn't mean it isn't great for certain things, of course.

Monday, October 04, 2010
