Back to basics: Writing an application using Go and PostgreSQL
Monday, 22 November 2021.
By reading this tutorial, you’ll learn how to use PostgreSQL with the Go programming language using the pgx driver and toolkit in a very productive manner. Furthermore, with the provided source code, you’ll be able to learn how to write efficient and sound unit and integration tests, ready to be run locally or on a Continuous Integration environment, such as GitHub Actions. Use the Table of Contents to skip to a specific part of this long post. Don’t forget to check out the accompanying repository github.com/henvic/pgxtutorial.
Check out the api.proto file to see the gRPC API created for this tutorial.
See also my presentation Writing a Go application with PostgreSQL using pgx during a Go meetup @ HATCH in Amsterdam (Tuesday, 12 April 2022).
PostgreSQL, also known as Postgres, is an extendible feature-rich Object-Relational Database Management System that is almost 100% SQL standards-compliant and released as open source software under a permissive license.
Much of the content in this tutorial is based on experience I acquired working for HATCH Studio, even though I’m just shy of ten months there. My first assignments involved improving some data structures we used internally and led to a team discussion about moving from a document-based database to a more traditional relational database in the face of some challenges. Next, we held a brainstorming session where we assembled our backend team to analyze our situation and discuss options. I already had some limited experience using PostgreSQL with pgx for a pet project and was pleased to discover that the rest of the team also saw PostgreSQL as an excellent choice for meeting our needs: great developer experience, performance, reliability, and scalability.
For our search infrastructure, we started using Amazon OpenSearch Service. We listen to PostgreSQL database changes via its Logical Streaming Replication Protocol and ingest data into our OpenSearch/Elasticsearch search engine through a lightweight connector built in-house.
It works similar to hooking Apache Kafka, but is easier to use and allows us to move faster without breaking things: running integration tests on a developer machine takes only seconds: much of which is an overkill
time.Sleep() so we never waste time with flaky tests caused by the eventual consistency characteristics of the search engine.
This solution will not be presented now but in a future opportunity.
Then, use Evans to explore its gRPC endpoints by running the following command:
SQL is alive and well
What if there is a powerful domain-specific structured query language designed to manage data out there? It turns out there is one created almost half a century ago, in 1974: SQL, created by Donald D. Chamberlin and Raymond F. Boyce.
Considering trade-offs, for a typical project, I prefer being able to define sound data structures up-front and having a low cost of maintenance over time, rather than a reduced cost of prototyping promised by “no-SQL” databases.
Some technical benefits from using PostgreSQL include:
- You can use inheritance between tables.
- You can use JSON types when your requirements are fluid.
- Custom data types such as enum (to enumerate or restrict values for a given column) and composite (list of attribute names and data types, essentially the same as the row type of a table).
Why not SQLite? SQLite is the most used database engine in the world: from smartphones to jet engines. It also excels in performance, and it has aviation-grade quality and testing. However, it is not a client/server SQL database, and it is trying to solve a different problem.
There are several ways to run it (including Docker). For your development machine, just do whatever is more convenient for you. For deployments, you might want to take a look at the following:
- Server Administration (PostgreSQL documentation)
- Quickstart for Google Cloud SQL for PostgreSQL
- Amazon RDS for PostgreSQL
- Azure Database for PostgreSQL
- Managed PostgreSQL from Heroku
To connect to PostgreSQL, you can use the official terminal-based front-end psql or something else.
- pgcli is a CLI or REPL similar to psql, but has auto-completion, syntax highlighting and displays the docstring of functions as you type.
- Postico is a macOS GUI for PostgreSQL
- pgAdmin is a feature-rich open source web-based client for PostgreSQL
- Navicat for PostgreSQL
The following terminal screen recording shows me using pgcli on a half-baked personal project.
Environment variables for configuring PostgreSQL
The common and straightforward way to configure your PostgreSQL clients, tools, and applications is to use environment variables. Alternatively, you can use connection strings on your application for most everything but a few options.
Example of environment variables-based configuration accepted by most tools that use a PostgreSQL database:
PGCONNECT_TIMEOUT (or connect_timeout parameter) is the maximum time (in seconds) to wait for establishing a connection per host or IP address.
Generally speaking, please be advised that it’s important to set it to free resources and avoid resource starvation when things are not working correctly. For example, if you expect a request that requires a database connection to be fulfilled in 100ms on the worst case, but it already took, say, 3s because the application server cannot establish a connection to the database, it’s probably already time to return an error, and let the consumer of the API decide what to do next, instead of holding the connection and taking forever to respond.
If you’ve read my article Environment variables, config, secrets, and globals you might be wondering why I’m recommending environment variables here, given that I’m not a big fan of them. In short, I don’t think it’s worth fighting it in this case, especially with the convenience of using the very same configuration on your application and tooling, meaning you don’t need to configure your PostgreSQL connection settings in multiple places.
To make things more manageable, you might want to use direnv to load and unload environment variables depending on what current directory you’re working on at the moment. For doing that:
cdinto a directory where you want the context to be a specific database connection.
- Create a file called .envrc with your list of environment variables (and add it to your global .gitginore?)
- Run the command
direnv allowto load environment variables on your .envrc file.
- Run the command
direnv reloadto apply changes whenever you update your .envrc file.
Choosing a database driver for PostgreSQL
To connect to a Postgres database in Go, you’ll need to use a third-party library as the standard library package doesn’t provide official drivers for any databases.
However, the standard library provides interfaces for building or using database drivers (this has a good set of goals).
database/sqldefines a generic interface around SQL (or SQL-like) databases.
database/sql/driverdefines interfaces to be implemented by database drivers to be accessed via package sql, respectively.
pgx driver and toolkit
pgx provides two communication interfaces for connecting to a PostgreSQL database:
- pgx native interface to PostgreSQL
There are some advantages of using this native interface instead of
database/sql, as you can verify in its README.
The most interesting ones probably are:
- Faster binary format instead of a textual representation at the protocol level for datatypes different than
int64, float64, bool, byte, string, time.Time, or nil.
- JSON and BJSON support.
- Conversion of PostgreSQL arrays to Go slice mappings for integers, floats, and strings.
Database migrations and SQL schema
tern is a standalone migration tool for PostgreSQL that is part of the pgx toolkit. You can use it to write sequential migration (even for your tests, as we’ll see later).
To start using migrations, you can do something like this:
tern new product will generate a file named
001_initial_schema.sql with the following:
You can then replace this with a simple schema like the following for your first migration:
The exact line
---- create above / drop below ---- should be present to indicate what part of the SQL code should be executed on a migration up and on a migration down.
The following commands might be useful:
tern newto create a migration file prefixed by the next available sequence number (i.e.,
tern migrateto apply your migrations on your database.
tern migrate -d -1to go back one migration.
tern migrate -hfor help.
tern keeps track of your migration version by creating and maintaining a
schema_version table on your database.
Tip: Be extra careful doing data migrations! It can be destructive!
Let’s see how we can connect to PostgreSQL using pgx from a program written in Go.
First, to use pgx in a concurrency-safe and reliable manner, we must use a connection pool. We can do this by using pgx/pgxpool to manage a number of low-level pgconn database driver connections in a pool.
We’re passing an empty connString to pgxpool.ParseConfig as we’re using environment variables for configuration. The pgx package will read the environment variables directly. An equivalent connection string would be something along the lines of:
host=localhost port=5432 dbname=mydb connect_timeout=5
If you use a logger that isn’t supported out of the box, you must implement the pgx.Logger interface.
For example, for the sake of keeping this exercise simple, I’m going to create a log for Go package “log”.
The following pgx log levels are supported:
trace, debug, info, warn, error, and none.
For development, the default
info level provides a significant level of introspection into what is going on, but its verbosity is overwhelming for a deployment, and you certainly want to tune it down to
pgx.LogLevelWarn on your production deployment.
About pgxpool and concurrency
In Go, you’ve to manage concurrent access to resources yourself when you create goroutines.
pgxpool manages a pool of connections to your Postgres database, which you can safely use concurrently from multiple Goroutines, without having to worry about the thread-safety aspects with regards to individual database connections. pgxpool works by managing a dynamic number of individual TCP connections to the database and controlling access, ensuring each goroutine acquiring the database connection accesses them safely (without data races).
Remember that each request handled by net/http is handled by its own goroutine.
By reusing open connections, you also avoid the expensive round-trip costs of establishing a new TCP connection and (perhaps) performing a TLS handshake for each connection to the database. pgxpool’s default max number of connections is the number of CPUs on your machine. This setting is very conservative, and you might be able to gain performance for highly concurrent applications by increasing *pgxpool.Config.MaxConns.
Running benchmark tests or stress tests might help you find the sweet spot for your application. That said, avoid the urge to do premature optimization: the default setting is perfect for most applications and use cases.
Limited “safe” interface
I’ve found it mildly reassuring to create a limited Postgres interface with only a subset of whitelisted methods safe to be used on the business logic of my application instead of passing
By doing so, we reduce the risk of introducing a low-level pgx API method call (such as to
pool.Close()) without some explicit effort.
- This might live in an internal package, and you can probably use it most everywhere.
LISTEN / NOTIFYdoesn’t work with it (see chat example).
If you decide to use this interface, you can bypass it (helpful when debugging or testing) with the following code:
Now, you might want to create an interface responsible for bridging the communication of your business logic with the database. People call it multiple names (DAO, DAL, repository, database, data source, etc.), and those who love to recite design patterns as poetry might say they’re all different stuff. I don’t care about the naming or exact definition of such patterns. For me, the end goal is to create a sane layer for the separation of concerns of your application and database. In Go, the idiomatic thing to do is to define interfaces where they’re used, so let’s start by that.
Now is an excellent time to remind you that creating thousands of packages is awful! Keep things simple.
See the finalized implementation here: internal/postgres/postgres.go.
While you can just call the database directly from your business logic, having this in a separate layer:
- Is going to be easier to maintain.
- Simplifies testing, whether you’re going to be using mocks or a real implementation.
The preceding go:generate comment uses mockgen to generate mocks for you can use to test code importing this interface.
For now, let’s create a package named
postgres containing an implementation of this interface.
- Alternatively, a postgres.go file on your business logic package is good enough for a small application.
- Having a strict separation in terms of interfaces and data structures has some value here.
- However, breaking down your application into too many packages might hurt readability.
Anyhow, be advised that there are disadvantages to doing this separation too:
One way to reduce the burden of leaky abstractions is to ensure our DAO imports our business logic types, and not the other way around* – keeping the footprint of where our postgres package is imported to a minimum: hopefully, only the main package. One good thing is that Go doesn’t allow import cycles.
Two leaky abstractions to watch out for are:
- pgx returns the
pgx.ErrNoRowserror when rows are expected but none are returned. I’ve found it practical to consume this error on the database layer and return a
nilreference pointer (or empty slice) to represent this situation. Alternatively, you can replace the error with one defined on your service layer.
- Reading Rows one-by-one calling
rows.Next()is a memory-efficient way to do things if you’re doing an operation that’ll require you to read a large number of rows. You’ve two options here: accept the leaky abstraction and deal directly with
pgxfrom your service layer or do “the right thing”, and have abstraction inversion and code duplication to solve this in a more limited manner. Either way, you’ll have to remember to
defer rows.Close()to free resources.
Talking about Go interfaces
Section added thanks to a discussion about this blog post in HackerNews.
Jonathan Rockway wrote the following article criticizing my use of interfaces: We need to have a chat about interfaces in Go. I definitely agree with him that such mega interfaces are awful, and I will play with some of his suggestions in a week once I’m back from vacation (maybe I push code killing this deficiency).
I typically start by satisfying the interface I want to implement by just creating a DB struct with the methods I defined in the interface. Next, I call panic from them. This way I can start the implementation already starting fast integration tests right away.
Packages and tools
- pgx is the SQL driver.
- tern is a migration tool that is part of the pgx toolkit.
- scany is a package for scanning from a database into Go structs and more.
- pgtools is a library containing code for testing infrastructure and more.
- go-cmp is a package for comparing Go values in tests.
- GoMock is a mocking framework for the Go programming language.
I’ve had a bad experience with ORM in the recent past, and I generally recommend against introducing this sort of abstraction to code.
I prefer much more something much closer to using pgx with scany and
pgtools.Wildcard() than to a full-featured ORM.
I don’t have a strong opinion regarding query builders, but I see them more as a liability on the supply chain.
Inspiration-wise, the pkg.go.dev website is powered by pkgsite, which uses PostgreSQL as a database. Nowadays, I regularly check out their repository to learn more and see what they use.
Opt-in for tests that require a database connection by means of:
- test flags (
-integration, which works on local directory mode, and not with list mode)
- test build flags (
- environment variables
At work, we decided to use test build flags. Example:
go test -tags=integration.
I prefer environment variables instead, as I’m not too fond of the hassle introduced by build tags on tooling, such as text editor or gopls, which might stop processing them as intended.
Running tests without database
You want your tests to pass even when running without a database for the sake of keeping your developer experience smooth.
For example, in the pgxtutorial repository, I resorted to skipping all tests of an entire package at once for some packages if the
INTEGRATION_TESTDB environment variable is not set.
I did this thinking about maintainability: it was way easier and safer than to keep tests that didn’t rely on a database passing.
When this general approach is not welcome, we can skip an individual test directly:
Running test with pgtools and tern
The following code shows how we can use pgtools/sqltest to run integration tests against a real database:
Check the pgtools/sqltest documentation to learn about all existing options.
By default, a database is temporarily created for the test function that called
Using pgtools/sqltest, you can feel safe knowing that the created database is prefixed with “test” to ensure it doesn’t clash with existing databases.
If you’re running tests on multiple packages, set a unique value for the
I often use table-driven tests when writing integration tests, and I usually find it easier to call migration.Setup() in the parent test and pass it ahead.
However, you must be aware that this might introduce a problem of reliability: a test that depended on another might fail when running isolated with
go test -run=Foo/bar.
Finding the right balance is essential here, as we want to identify problematic behaviors that might not be perceived if we were to have fully isolated tests.
Real implementation vs. using test doubles
Taking a look at the
internal/inventory/inventory package tests you’ll notice it mostly uses a real implementation for its tests.
It contains only a thin layer of test doubles using GoMock to verify if parameters are being passed correctly or to simulate database errors.
We generated the mock in the repository by calling the following command:
You can add the following directive on your code to do the same when running
The code for your test cases relying on a mock is a little bit awkward to get used to but are powerful and easy to use once you understand how they work. Example:
For a full example, see TestServiceCreateProduct/database_error.
You can use the package go-cmp to compare different structs easily.
In some circumstances (test example), we’ve set the fields CreatedAt and ModifiedAt for some values returned from the database. In others, we decided to ignore it.
Alternatively, you can evaluate the returned values and then copy CreatedAt and ModifiedAt before comparing structs. For example:
Failed idea: replay testing
I’ve experimented with creating a replay testing framework built on top of pgmock to run tests without requiring a database, but the idea wasn’t practical. Besides, the main reason for having it wasn’t justifiable: the integration tests are fast.
You can also add to your .zshrc or .envrc and run
go test directly:
For running the integration tests on GitHub Actions, we need first to create a workflow such as the following:
I hope you enjoyed this tutorial. Now go ahead and checkout the repository github.com/henvic/pgxtutorial.
If you click and buy any of these from Amazon after visiting the links above, I might get a commission from their Affiliate program.Tweet