2018-04-02, David Crawshaw
The Go standard library includes database/sql, a generic SQL interface. It does exactly what it says it does, providing a generic interface to various SQL database servers. Sometimes that is what you want. Sometimes it is not.
Generic and simple usually means lowest-common-denominator. Fancy database features, or even relatively common but not widely used features, like nested transactions, are not well supported. And if your SQL database is conceptually different from the norm, it can get awkward.
Using SQLite extensively, I am finding it awkward. SQLite is an unusual SQL database because it lives in process. There is no network protocol so some kinds of errors are not possible, and when errors are possible, they fall into better-understood categories (which means they sometimes should be Go panics, not errors). SQLite is often used as a file format, so streaming access to blobs is very useful.
So I wrote my own SQLite interface: https://crawshaw.io/sqlite.
The most awkward part of database/sql when using SQLite is the implicit connection pool. An *sql.DB is many connections, and when you call Prepare, the *sql.Stmt you get back will execute on some arbitrary connection. That's usually fine, except for transactions, which database/sql handles specially.
Unfortunately, the interesting features of SQLite are far more connection-oriented than those of most client-server databases, so running queries without holding a particular connection does not work well with SQLite. The database/sql package does now expose an *sql.Conn object to help with this, but that makes Stmt tracking difficult.
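To make the contrast concrete, here is roughly what pinning a connection and preparing on it looks like with database/sql (a sketch; db is an *sql.DB and the query is invented):
conn, err := db.Conn(ctx) // *sql.Conn: a single connection from the pool
if err != nil {
	// ... handle err
}
defer conn.Close()
stmt, err := conn.PrepareContext(ctx, "SELECT Name FROM People WHERE ID = ?;")
if err != nil {
	// ... handle err
}
// stmt is valid only on this one connection; the caller must now track
// both objects for as long as the statement is to be reused.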
On the long-lived, frequently-executing paths through a program you want any SQL that is executed to have already been parsed and planned by the database engine. Part of this is for the direct CPU savings from avoiding parsing. Part is also to offer the database engine a chance to do more analysis of the query or to change allocation strategies. Almost all compilers have to trade off compilation time and execution time.
With database/sql, this is typically done by calling Prepare on the connection pool, producing an *sql.Stmt. You can also call Prepare on an *sql.Conn and get a statement object specific to that connection. It is then up to you to keep track of this object. This has always irked me, because it means defining a variable in some long-lived object somewhere. The name of that variable is never long enough to be useful, and never short enough to stay out of the way.
To avoid this I tend to interpose an object that stores a mapping of query strings to *sql.Stmt objects, so I can use the query string itself, inline in the hot path, as the name of the statement.
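A minimal sketch of that interposed object, using database/sql (the stmtCache type and its Prep method are my own names for illustration, not from any package):
// stmtCache maps query strings to statements prepared on db, so the
// query text itself serves as the statement's name.
type stmtCache struct {
	db    *sql.DB
	mu    sync.Mutex
	stmts map[string]*sql.Stmt
}

func (c *stmtCache) Prep(query string) (*sql.Stmt, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if stmt, ok := c.stmts[query]; ok {
		return stmt, nil // already prepared, reuse it
	}
	stmt, err := c.db.Prepare(query)
	if err != nil {
		return nil, err
	}
	if c.stmts == nil {
		c.stmts = make(map[string]*sql.Stmt)
	}
	c.stmts[query] = stmt
	return stmt, nil
}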
Experience with this suggests it works, so I have made it the foundation of the sqlite package:
func doWork(ctx context.Context, dbpool *sqlite.Pool, id int64) {
	conn := dbpool.Get(ctx.Done())
	defer dbpool.Put(conn)
	stmt := conn.Prep("SELECT Name FROM People WHERE ID = $id;")
	stmt.SetInt64("$id", id)
	if hasRow, err := stmt.Step(); err != nil {
		// ... handle err
	} else if !hasRow {
		// ... handle missing user
	}
	name := stmt.GetText("Name")
	// ... use name
}
If the connection has never seen this query before, the Prep method builds a prepared statement, parsing the SQL. In the process it adds the statement to a map keyed by the query string, and returns the cached statement on subsequent calls to Prep. Thus, after a handful of calls to doWork have cycled through all the connections in the pool, calls to Prep become simple map lookups.
For relatively simple queries with only a handful of parameters, lining up a few question marks with the positional arguments in the Query method is straightforward and quite readable. The same is possible here using the sqlitex.Exec function.
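For example, a positional-parameter query with sqlitex.Exec might look like this (a sketch; Exec binds the trailing arguments in order and invokes the callback once per result row):
err := sqlitex.Exec(conn, "SELECT Name FROM People WHERE Country = ?;",
	func(stmt *sqlite.Stmt) error {
		fmt.Println(stmt.GetText("Name"))
		return nil
	}, country)
if err != nil {
	// ... handle err
}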
For complex queries with a dozen parameters, the sea of parameters can be quite confusing. Here instead we take advantage of SQLite's parameter names:
stmt := conn.Prep(`SELECT Name FROM People
	WHERE Country = $country
	AND CustomerType = $customerType;`)
stmt.SetText("$country", country)
stmt.SetInt64("$customerType", customerType)
Any errors that occur while setting a parameter are reported when Step is called. Similarly, column names can be used to read values from the result:
stmt.GetText("Name")
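Stepping through a multi-row result follows the same pattern; a sketch (the Age column is invented for illustration):
stmt := conn.Prep("SELECT Name, Age FROM People WHERE Country = $country;")
stmt.SetText("$country", country)
for {
	if hasRow, err := stmt.Step(); err != nil {
		// ... handle err
		break
	} else if !hasRow {
		break // no more rows
	}
	name := stmt.GetText("Name")
	age := stmt.GetInt64("Age")
	// ... use name and age
}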
Everything can produce an error in database/sql. This is the correct design given its requirements: databases are separate processes with communication going over the network. The connection to the database can disappear at any moment, and a process needs to handle that.
SQLite is different. The database engine is in-process and not going anywhere. This means we should treat its errors differently.
In Go, errors that are part of the standard operation of a program are returned as values. Programs are expected to handle errors.
Program bugs that cannot be handled should not be returned as errors. Doing so leads to useless error objects being passed around unnecessarily, and makes it easy to introduce more bugs (in particular, losing track of where in the program the bug happened).
Here is a program bug that no one can usefully handle:
conn.Prep("SELET * FRO t;") // panics
Almost all programs making SQL queries define the text of those queries statically; doing otherwise is a security risk. (The only obvious exception is if you are writing an SQL REPL.) It does not make sense to try to handle the error from an SQL typo at run time. So the standard way to prepare a statement, the Prep method, does not return an error. Instead it panics if the SQL fails to compile.
The behavior of the Prep method is spiritually similar to regexp.MustCompile, which is designed to be used with regular expression string literals. As a side effect this means slightly fewer lines of code are required to execute a query, but most importantly, it means the bug is treated correctly.
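For the rare case where the query text really is dynamic, as in that REPL, an error-returning variant still makes sense. A sketch, assuming the package also keeps a Prepare method that returns an error instead of panicking:
stmt, err := conn.Prepare(userInput)
if err != nil {
	// A typo from the user is an expected runtime condition here,
	// not a program bug, so an error value is the right tool.
	fmt.Println("bad query:", err)
	return
}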
One of the concepts I find hardest to use well in database/sql is the Tx object. It represents a transaction: statements executed via it are wrapped in BEGIN; and COMMIT;/ROLLBACK;.
This sqlite package has no equivalent object. Instead, it encourages you to use savepoints exclusively. For those not familiar with the concept, the SQL SAVEPOINT foo; ... RELEASE foo; is semantically the same as BEGIN DEFERRED; ... COMMIT;. What distinguishes savepoints is that they can be nested. (Hence the user-defined names, so you can specify which savepoint to release or roll back to.)
If you can spare a few microseconds, savepoints provide an easy form of transaction support in Go that can integrate well with error and panic handling.
The fundamental principle: a function doing serial database work takes a single connection, creates a savepoint on entry, and defers the savepoint release.
Functions that follow this principle compose.
For example, using the helper function sqlitex.Save:
func doWork(conn *sqlite.Conn) (err error) {
	defer sqlitex.Save(conn)(&err)

	// ...
	if err := doOtherWork(conn); err != nil {
		return err
	}
	// ...
	return nil
}

func doOtherWork(conn *sqlite.Conn) (err error) {
	defer sqlitex.Save(conn)(&err)
	// ...
	return nil
}
In this example, if doOtherWork returns an error, the doWork savepoint will unwind. Elsewhere in the program doOtherWork can be safely called as an independent, fully functional piece of code that is already wrapped in a database transaction.
This ties committing or rolling back a database transaction to whether or not a function returns an error. That keeps the bookkeeping simple, which in turn makes it much easier to move code around.
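A simplified sketch of the pattern behind such a helper (the real sqlitex.Save also generates a unique savepoint name per call and handles panics; error checking on Step is elided here):
// savepoint opens a savepoint now and returns a function that, given
// the caller's named error result, releases the savepoint on success
// or rolls it back first on failure.
func savepoint(conn *sqlite.Conn, name string) func(*error) {
	conn.Prep("SAVEPOINT " + name + ";").Step()
	return func(err *error) {
		if *err != nil {
			// Undo everything since the savepoint, then discard it.
			conn.Prep("ROLLBACK TO " + name + ";").Step()
		}
		conn.Prep("RELEASE " + name + ";").Step()
	}
}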
The context package was retrofitted onto database/sql, and it shows. The exported API surface is much larger than it should be, because the retrofit happened after Go 1.0 was released and the API could not be broken.
In database/sql just about everything has to take a context object, and in some cases more than one context may be in play, because every single function call is potentially a network event communicating with a database server. As SQLite has no network in the way, the context story can be simpler: there is no need for PrepareContext or PingContext.
Instead, a context can be associated with an SQLite connection:
conn.SetInterrupt(ctx.Done())
Now ctx is in charge of interrupting calls to that connection until SetInterrupt is called again to swap the context out.
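For example, to bound all work on a connection with a deadline (a sketch):
ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
defer cancel()
conn.SetInterrupt(ctx.Done())
// Every call on conn is now interrupted when ctx expires, until
// SetInterrupt is called again with another channel.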
There is also a shortcut for associating a context when using a connection pool:
conn := dbpool.Get(ctx.Done())
defer dbpool.Put(conn)
A database/sql driver for SQLite should be fully capable of taking advantage of threads for multiple readers (and in some cases, effectively multiple writers). However, when I tried out the Go SQLite drivers, I found a few limits.
What I saw was SQLite running in thread-serialization mode (slow!), the in-process shared cache not used by default, the write-ahead log not used by default, and no built-in handling for unlock_notify. This package does these things by default, to maximize the concurrency a program can get out of a connection pool.
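For completeness, a sketch of opening such a pool. I am assuming here that the package's Open function takes a path, open flags (0 for the concurrency-friendly defaults described above), and a pool size:
dbpool, err := sqlite.Open("file:mydata.db", 0, 10)
if err != nil {
	// ... handle err
}
defer dbpool.Close()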
Does the world really need another Go sqlite package? Maybe!
It is a lot of fun rethinking a general interface for a specific case. You get to simultaneously throw a lot of things away and add new things.
There are a few things I would like to look at to improve error reporting. For example, if you call conn.OpenBlob inside a savepoint and then try to open a nested savepoint, SQLite produces the error SQLITE_BUSY. It won't tell you what it is that's open. If a connection tracked its own blobs, that would give us a good chance to report exactly what is open or in progress.
This package will keep evolving (that is, I will keep breaking its API) for the next few months. If anyone wants to consider a database/sql overhaul as part of Go 2, maybe there are some useful ideas in here. Perhaps this package can serve as an experience report.