Go and SQLite: when database/sql chafes

2018-04-02, David Crawshaw

The Go standard library includes database/sql, a generic SQL interface. It does a good job of doing exactly what it says it does, providing a generic interface to various SQL database servers. Sometimes that is what you want. Sometimes it is not.

Generic and simple usually means lowest-common-denominator. Fancy database features, or even relatively common but not widely used features, like nested transactions, are not well supported. And if your SQL database is conceptually different from the norm, it can get awkward.

Using SQLite extensively, I am finding it awkward. SQLite is an unusual SQL database because it lives in process. There is no network protocol so some kinds of errors are not possible, and when errors are possible, they fall into better-understood categories (which means they sometimes should be Go panics, not errors). SQLite is often used as a file format, so streaming access to blobs is very useful.

So I wrote my own SQLite interface: https://crawshaw.io/sqlite.

Connection Oriented

The most awkward part of database/sql when using SQLite is the implicit connection pool. An *sql.DB is many connections, and when you call Prepare the *sql.Stmt you get back returns a statement that will execute on some connection. That's usually fine except for transactions, which database/sql handles specially. Unfortunately the interesting features of SQLite are far more connection-oriented than most client-server databases, so working with queries without holding the connection does not work well with SQLite. The database/sql package does now expose an *sql.Conn object to help with this, but this makes Stmt tracking difficult.

Statement caching

On the long-lived, frequently-executing paths through a program you want any SQL that is executed to have already been parsed and planned by the database engine. Part of this is for the direct CPU savings from avoiding parsing. Part is also to offer the database engine a chance to do more analysis of the query or to change allocation strategies. Almost all compilers have to trade off compilation time and execution time.

With database/sql, this is typically done by calling Prepare on a connection pool and producing an *sql.Stmt. You can also call Prepare on an *sql.Conn, and get a statement object specific to that connection. It is then up to you to keep track of this object.

This has always irked me because it means defining a variable in some long-lived object somewhere. The name of that variable is never long enough to be useful, and never short enough to stay out of the way. To avoid this I tend to interpose an object that stores a mapping of query strings to *sql.Stmt objects, so I can use the query string itself inline in hot-path as the name of the statement. Experience with this suggests it works, so I have made it the foundation of the sqlite package:

func doWork(dbpool *sqlite.Pool, id int64) {
	conn := dbpool.Get(ctx)
	defer dbpool.Put(conn)

	stmt := conn.Prep("SELECT Name FROM People WHERE ID = $id;")
	stmt.SetInt64("$id", id)
	if hasRow, err := stmt.Step(); err != nil {
		// ... handle err
	} else !hasRow {
		// ... handle missing user
	}
	name := stmt.GetText("Name")
	// ...
}

If the connection has never seen this query before, the Prep method builds a prepared statement, parsing the SQL. In the process it adds the statement to map keyed by the query string and returns it on subsequent calls to Prep. Thus after a handful of calls to doWork cycling through all the connections in the pool, calls to Prep are simple map lookups.

Parameter names

For relatively simple queries with only a handful of parameters, lining up a few question marks with the positional arguments in the Query method is straightforward and quite readable. The same is possible here using the sqlitex.Exec function.

For complex queries with a dozen parameters, the sea of parameters can be quite confusing. Here instead we take advantage of SQLite's parameter names:

stmt := conn.Prepare(`SELECT Name FROM People
	WHERE Country = $country
	AND CustomerType = $customerType`)

stmt.SetText("$country", country)
stmt.SetInt("$customerType", customerType)

Any errors that occur setting a field are reported when Step is called.

Similarly column names can be used to read values from the result:

stmt.GetText("Name")

Errors or bugs

Everything can produce an error in database/sql. This is the correct design given its requirements: databases are separate processes with communication going over the network. The connection to the database can disappear at any moment, and a process needs to handle that.

SQLite is different. The database engine is in-process and not going anywhere. This means we should treat its errors differently.

In Go, errors that are part of the standard operation of a program are returned as values. Programs are expected to handle errors.

Program bugs that cannot be handled should not be returned as errors. Doing so leads to unnecessarily passing around of useless error objects and makes it easy to introduce more bugs (in particular, losing track of where in the program the bug happened).

Here is a program bug that no-one can usefully handle:

conn.Prep("SELET * FRO t;") // panics

Almost all programs making SQL queries define the text of those queries statically. (The only obvious exception is if you are writing an SQL REPL.) Doing otherwise is a security risk. It does not make sense to try and handle the error from an SQL typo at run time. So the standard way to prepare a statement, the Prep method, does not return an error. Instead it panics if the SQL fails to compile.

The behavior of the Prep method is spiritually similar to regexp.MustCompile, which is designed to be used with regular expression string literals. As a side effect this means slightly fewer lines of code are required to execute a query, but most importantly, it means the bug is treated correctly.

Savepoints

One of the concepts I find hardest to use well in database/sql is the Tx object. It represents a transaction, that is, statements executed via it are wrapped in BEGIN; and COMMIT;/ROLLBACK;. This sqlite package has no equivalent object. Instead, it encourages you to exclusively use savepoints.

For those not familiar with the concept, the SQL SAVEPOINT foo; ... RELEASE foo; is semantically the same as BEGIN DEFERRED; ... COMMIT;. What distinguishes savepoints is they can be nested. (Hence the user-defined names, so you can specify from whence to commit or rollback.)

If you can spare a few microseconds, savepoints provide an easy form of transaction support in Go that can integrate well with error and panic handling.

The fundamental principle is: for functions doing serial database work, pass it a single connection, create a savepoint on function entry, and defer the savepoint release.

Functions that follow this principle compose.

For example, using the helper function sqlitex.Savepoint:

func doWork(conn *sqlite.Conn) (err error) {
	defer sqlitex.Save(conn)(&err)
	// ...
	if err := doOtherWork(conn); err != nil {
		return err
	}
	// ...
}

func doOtherWork(conn *sqlite.Conn) (err error) {
	defer sqlitex.Save(conn)(&err)
	// ...
}

In this example, if doOtherWork returns an error, the doWork savepoint will unwind. Elsewhere in the program doOtherWork can be safely called as an independent, fully functional piece of code that is already wrapped in a database transaction.

This ties committing or rolling back a database transaction to whether or not a function returns an error. The bookkeeping is a little easier and that makes it much easier to move code around.

Contexts and SetInterrupt

The context package was retrofitted onto database/sql and it shows. The exported API surface is much larger than it should be because the retrofit happened after Go 1.0 was released and the API could not be broken.

In database/sql just about everything has to take a context object, and in some cases more than one context may be in play, as every single function call is potentially a network event communicating with a database server. As SQLite does not have that, the context story can be simpler. No need for PrepareContext or PingContext.

Instead, a context can be associated with an SQLite connection:

conn.SetInterrupt(ctx.Done())

Now ctx is in charge of interrupting calls to that connection until SetInterrupt is called again to swap the context out. There is also a shortcut for associating a context when using a connection pool:

conn := dbpool.Get(ctx.Done())
defer dbpool.Put(conn)

Concurrency

A database/sql driver for SQLite should be fully capable of taking advantage of threads for multiple readers (and in some cases, effectively multiple writers). However when I tried out the Go SQLite drivers, I found a few limits. What I saw was SQLite used in thread serialization mode (slow!), not using the in-process shared cache by default, not using the write-ahead log by default, and no built-in handling for unlock_notify.

This package does these things by default, to maximize the concurrency a program can get out of a connection pool.

What's next

Does the world really need another Go sqlite package? Maybe!

It is a lot of fun rethinking a general interface for a specific case. You get to simultaneously throw a lot of things away and add new things.

There are a few things I would like to look at for improving error reporting. For example, if you call conn.OpenBlob inside a savepoint, then try to open a nested savepoint, SQLite will produce an error, SQLITE_BUSY. It won't tell you what it is that's open. If a connection is tracking its own blobs, that would give us a good chance to report what is open or in-progress.

This package will keep evolving, that is, I will keep breaking its API, for the next few months. If anyone wants to consider a database/sql overhaul as part of Go 2, maybe there are some useful ideas in here. Perhaps this package can serve as an experience report.