PLRelational: Storage Formats

September 7, 2017, by Mike Ash

This is our latest entry in a series of articles on PLRelational. For more background, check out the other articles in the series.

We've been talking a lot lately about PLRelational and all the fancy stuff it does. However, we've been glossing over a fundamental part of it: how it actually stores data. After all, PLRelational is a data persistence framework at its core.

PLRelational's relations break down into two categories. There are relations which store rows, and there are relations which derive their contents by performing some operation on other relations. Many of the relations which store rows do so by persisting them to disk.

Memory Relations

Let's start with one that doesn't persist its contents, just for simplicity. The basic interface is the same, so it provides a good foundation.

The MemoryTableRelation class is a relation which, as the name indicates, stores its data in memory. You create one with a scheme:

let people = MemoryTableRelation(scheme: ["id", "name", "quest", "favorite-color"])

Once you have that, you can add, update, and delete rows:

people.asyncAdd(["id": 1, "name": "Lancelot",
                 "quest": "to find the holy grail", "favorite-color": "blue"])
people.asyncAdd(["id": 2, "name": "Galahad",
                 "quest": "to find the holy grail", "favorite-color": "blue"])
people.asyncUpdate(Attribute("id") *== 2, newValues: ["favorite-color": "yellow"])
people.asyncDelete(Attribute("id") *== 2)

Of course, this is purely in memory, and the data is lost once the people relation is destroyed. It can still be useful for representing transient data, but it doesn't persist anything.

Property List Files

We might imagine taking this and building our own persistence on top of it by saving data to a property list file. We'd define a simple way to turn rows into property list objects. To save, we'd fetch the rows in the relation, convert them to property list objects, and save the whole thing into a file. To load, we'd load the property list file, convert it to rows, and add them to the relation.
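Here's a minimal sketch of that idea using only Foundation. The SketchRow type and the encodeRows and decodeRows functions are made up for illustration and are not part of PLRelational, whose real row and value types are richer than a plain dictionary.

```swift
import Foundation

// Hypothetical row representation: attribute name -> value.
typealias SketchRow = [String: Any]

/// Converts rows to a property list (an array of dictionaries) and serializes it.
func encodeRows(_ rows: [SketchRow]) throws -> Data {
    return try PropertyListSerialization.data(fromPropertyList: rows,
                                              format: .xml,
                                              options: 0)
}

/// Parses property list data and converts it back into rows.
func decodeRows(_ data: Data) throws -> [SketchRow] {
    let plist = try PropertyListSerialization.propertyList(from: data,
                                                           options: [],
                                                           format: nil)
    return plist as? [SketchRow] ?? []
}

// "Save" two rows, then "load" them again.
let rows: [SketchRow] = [
    ["id": 1, "name": "Lancelot", "favorite-color": "blue"],
    ["id": 2, "name": "Galahad", "favorite-color": "yellow"],
]
let data = try encodeRows(rows)
let loaded = try decodeRows(data)
```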

This is essentially what PlistFileRelation is. It's an in-memory relation that's backed by a property list file. We can change people to use PlistFileRelation by changing the initialization:

let people = try PlistFileRelation.withFile(plisturl,
    scheme: ["id", "name", "quest", "favorite-color"],
    primaryKeys: ["id"], create: false
).orThrow()

It can then be used just like the in-memory version of people. It will automatically load data from the property list file, and when you want to save changes back to that file, call save:

let result = people.save()
if let error = result.err {
    // handle error
}

You may have noticed the primaryKeys parameter in the code that creates the relation. Specifying primary keys allows PLRelational to optimize queries involving those attributes, at the expense of some overhead when adding, removing, or changing rows. MemoryTableRelation is assumed to always contain relatively small amounts of data and doesn't have this option.
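To illustrate the trade-off in general terms, here's a conceptual sketch of a keyed index. IndexedTable is a hypothetical type, not how PLRelational implements primary keys, and it only handles adding rows. The point is that every insert pays for some bookkeeping so that a lookup by key can skip the scan.

```swift
// Hypothetical stand-in for a stored relation with one primary key.
struct IndexedTable {
    private(set) var rows: [[String: String]] = []
    // Maps a primary key value to the position of its row in `rows`.
    private var index: [String: Int] = [:]
    let primaryKey: String

    init(primaryKey: String) {
        self.primaryKey = primaryKey
    }

    // Adding a row also updates the index: the extra overhead.
    // Rows with a missing or duplicate key are ignored in this sketch.
    mutating func add(_ row: [String: String]) {
        guard let key = row[primaryKey], index[key] == nil else { return }
        index[key] = rows.count
        rows.append(row)
    }

    // Without the index: examine rows one by one.
    func scan(key: String) -> [String: String]? {
        return rows.first { $0[primaryKey] == key }
    }

    // With the index: a single dictionary lookup.
    func lookup(key: String) -> [String: String]? {
        return index[key].map { rows[$0] }
    }
}
```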

Property List Directories

A single property list file works well for moderate amounts of data, but it requires reading the entire file at once, and rewriting the entire file on save. This overhead can become significant once you start working with large amounts of data.

PlistDirectoryRelation solves this problem by storing the rows in individual files within a directory. This allows reading and writing individual rows separately.

We can use this by once again changing the initializer for people:

let people = try PlistDirectoryRelation.withDirectory(dirurl,
    scheme: ["id", "name", "quest", "favorite-color"],
    primaryKey: "id", create: true
).orThrow()

As with PlistFileRelation, changes are not saved automatically. Instead, a delta is kept in memory, and then changes can be flushed to disk by calling save:

let result = people.save()
if let error = result.err {
    // handle error
}

Creating a PlistDirectoryRelation takes a primaryKey parameter, singular. Unlike PlistFileRelation, PlistDirectoryRelation can only have a single primary key. This is due to how it efficiently locates the files for each row.

The filename for a row's plist is determined from the value for the primary key in that row. The filename is derived from that value by converting it to a canonical representation in raw bytes and then taking the SHA-256 hash of that representation. The result is something like:

4cd9b7672d7fbee8fb51fb1e049f690342035f543a8efe734b7b5ffb0c154a45.rowplist

Because the mapping is deterministic and based only on the value of the primary key, equality queries for the primary key can be done quickly. In this example, the deletion at the end will compute the filename for the ID of 2 and then mark that file for deletion. Joins or selects involving the primary key will efficiently look up only the data they need.
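A rough sketch of this kind of mapping follows. It assumes Apple's CryptoKit is available for SHA-256. The KeyValue type and its byte layout (a one-byte type tag followed by the value's bytes) are invented for illustration, since the actual canonical representation isn't described here, so the filenames it produces won't match PLRelational's.

```swift
import Foundation
import CryptoKit

// Hypothetical primary key value with a made-up canonical encoding.
enum KeyValue {
    case integer(Int64)
    case text(String)

    var canonicalBytes: Data {
        switch self {
        case .integer(let n):
            // Type tag 0x01, then the integer as 8 big-endian bytes.
            var data = Data([0x01])
            withUnsafeBytes(of: n.bigEndian) { data.append(contentsOf: $0) }
            return data
        case .text(let s):
            // Type tag 0x02, then the UTF-8 bytes of the string.
            return Data([0x02]) + Data(s.utf8)
        }
    }
}

/// Derives a filename from the primary key value alone, so the same key
/// always maps to the same file.
func rowFilename(forPrimaryKey key: KeyValue) -> String {
    let digest = SHA256.hash(data: key.canonicalBytes)
    let hex = digest.map { String(format: "%02x", $0) }.joined()
    return hex + ".rowplist"
}
```

Looking up the row with ID 2 then amounts to computing rowFilename(forPrimaryKey: .integer(2)) and checking whether that file exists, without listing the directory.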

Property List Databases

Your app will probably use multiple stored relations for different kinds of data. This may be a mix of property list files and directories. For example, you might place metadata that's needed everywhere in property list files, while large data that should only be loaded on demand is in a directory.

The PlistDatabase class handles this. You create it with a list of relation specs, which describe each relation's scheme, primary keys, path, and name. It then handles creating, loading, and saving all of those relations for you, and bundles them all into a single directory structure. This example creates a people relation stored in a file and a people-images relation stored in a directory, since images are probably large:

let db = try PlistDatabase.create(rootURL, [
    .file(
        name: "people", path: "people.plist",
        scheme: ["id", "name", "quest", "favorite-color"],
        primaryKeys: ["id"]
    ),
    .directory(
        name: "people-images", path: "people-images",
        scheme: ["id", "image-data"],
        primaryKey: "id"
    )
]).orThrow()

let people = db["people"]!
let images = db["people-images"]!

people and images will contain whatever data currently exists there, and can be manipulated with the standard Relation methods:

people.asyncAdd(["id": 1, "name": "Lancelot",
                 "quest": "to find the holy grail", "favorite-color": "blue"])
people.asyncAdd(["id": 2, "name": "Galahad",
                 "quest": "to find the holy grail", "favorite-color": "blue"])
people.asyncUpdate(Attribute("id") *== 2, newValues: ["favorite-color": "yellow"])
people.asyncDelete(Attribute("id") *== 2)

images.asyncAdd(["id": 1, "image-data": imageData])

To save changes back to disk, call saveRelations:

let result = db.saveRelations()
if let error = result.err {
    // handle error
}

This will save all of the individual relations.

Dropbox/Cloud Sync

Property list storage works well for syncing to Dropbox and other cloud storage, which tend to work with file granularity. Placing a file relation in cloud storage allows the whole relation to be synced as a single unit, and using a directory relation allows individual rows to be added, updated, and deleted independently.

PLRelational doesn't directly talk to cloud storage, but it does have some special facilities to accommodate it. Ultimately there are two kinds of actions that need to be handled: local changes to files which need to be synced to the server, and remote changes to files which need to be synced locally.

Local changes are handled using PlistDatabase's addSaveObserver method. Pass it a function which will be invoked any time any of the relations in the database change a file on disk. It receives the URL of the file that was changed. It can then take whatever action is needed, like making an API call, to sync that file:

db.addSaveObserver({ url in
    cloudAPI.syncLocalFile(url)
})
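If a file is saved several times in quick succession, you may not want one upload per save. Here's a small sketch of one way to coalesce them. PendingUploads and its upload hook are hypothetical helpers, not part of PLRelational, and the class does no locking, so it assumes all calls happen on one thread.

```swift
import Foundation

// Collects changed file URLs so that repeated saves of the same file
// result in a single upload.
final class PendingUploads {
    private var changed = Set<URL>()
    private let upload: (URL) -> Void

    init(upload: @escaping (URL) -> Void) {
        self.upload = upload
    }

    // Intended to be called from a save observer with the changed file's URL.
    func noteChange(_ url: URL) {
        changed.insert(url)
    }

    // Uploads each changed file once, in path order, then clears the set.
    func flush() {
        for url in changed.sorted(by: { $0.path < $1.path }) {
            upload(url)
        }
        changed.removeAll()
    }
}
```

The save observer would call noteChange with each URL it receives, and a timer or an app lifecycle event would call flush.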

Remote changes are handled by asking the PlistDatabase to make a change to a local URL. To replace a local file with a new version, call replaceLocalFile. This can also be used to add a new file, by giving it the URL to where the new file should go:

func gotNewFile(tmpURL: URL, localFileURL: URL) {
    db.replaceLocalFile(url: localFileURL, movingURL: tmpURL)
}

func gotNewFileVersion(tmpURL: URL, localFileURL: URL) {
    db.replaceLocalFile(url: localFileURL, movingURL: tmpURL)
}

Note that the new file must first be saved to a temporary location, and the PlistDatabase takes care of moving it to its final location. This allows PlistDatabase to read any old data that was in the file previously, which is necessary to generate proper change notifications.
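To show why the temporary location matters, here's a simplified sketch of the general pattern: read the old contents, then move the new file into place, and hand both versions back so the caller can work out what changed. The replaceFile function is a hypothetical illustration and not PlistDatabase's implementation. Among other things, its remove-then-move step is not atomic.

```swift
import Foundation

/// Replaces the file at `destination` with the file at `tmp`.
/// Returns the old contents (nil if there was no file) and the new contents.
func replaceFile(at destination: URL,
                 movingFrom tmp: URL) throws -> (old: Data?, new: Data) {
    let fm = FileManager.default

    // Read the old version before it is overwritten.
    let old = try? Data(contentsOf: destination)
    let new = try Data(contentsOf: tmp)

    // Swap the new file into place.
    if old != nil {
        try fm.removeItem(at: destination)
    }
    try fm.moveItem(at: tmp, to: destination)

    return (old, new)
}
```

If the download were written straight to the final location, the old contents would already be gone by the time anyone could compare them.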

To delete a local file, call deleteLocalFile:

func deleteFile(url: URL) {
    db.deleteLocalFile(url: url)
}

These methods understand the layout of the database's files on disk and will look up the appropriate Relation for any given local URL. When applying the changes, the Relation in question will generate the appropriate change notifications, meaning that your UI remains automatically in sync with any changes generated by cloud sync activity. They're also smart enough to know which files are not part of the database, and will refuse to perform the operation if asked to operate on a URL that doesn't belong to the database. The return value of these methods will tell you whether the action was performed or not, making it easy to sync files outside the PlistDatabase too.
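As a rough picture of what such a lookup might involve, here's a sketch that maps a URL back to a relation name. DatabaseLayout is a hypothetical type, and it assumes that file relations sit directly under the root and that row files sit directly inside their relation's directory. The real on-disk layout may differ.

```swift
import Foundation

// Hypothetical description of a database's files on disk.
struct DatabaseLayout {
    let root: URL
    let filePaths: [String: String]       // relation name -> file path
    let directoryPaths: [String: String]  // relation name -> directory path

    /// Returns the name of the relation that owns `url`, or nil if the URL
    /// is not part of the database.
    func relationName(for url: URL) -> String? {
        let rootParts = root.standardizedFileURL.pathComponents
        let parts = url.standardizedFileURL.pathComponents

        // The URL must be strictly inside the root directory.
        guard parts.count > rootParts.count,
              Array(parts.prefix(rootParts.count)) == rootParts else {
            return nil
        }
        let relative = Array(parts.dropFirst(rootParts.count))

        // A file relation matches its own file exactly.
        for (name, path) in filePaths where relative == [path] {
            return name
        }
        // A directory relation matches any file directly inside it.
        for (name, path) in directoryPaths
        where relative.count == 2 && relative[0] == path {
            return name
        }
        return nil
    }
}
```

A nil result corresponds to the case where the operation is refused, which is the signal to handle that file some other way.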

SQLite Databases

SQLite and PLRelational fit together well. SQLite was our first target for persistent storage, and PLRelational's data types match what SQLite provides.

PLRelational's SQLite support starts with the SQLiteDatabase class. As the name indicates, this represents an SQLite database, and it provides the individual tables in the database as Relation objects.

To create an SQLiteDatabase instance, initialize it with a path to the database file:

let db = try SQLiteDatabase(path)

If a database already exists at that location, it will open the existing database. Otherwise it will create a new one.

To create a new table, use the createRelation method:

let people = try db.createRelation(
    "people",
    scheme: ["id", "name", "quest", "favorite-color"]).orThrow()

To fetch an existing table, use subscripting with the table name:

let images = db["people-images"]

For the common case where you want to create the table if it doesn't exist and fetch it if it does, use the getOrCreateRelation method:

let people = try db.getOrCreateRelation(
    "people",
    scheme: ["id", "name", "quest", "favorite-color"]).orThrow()
let images = try db.getOrCreateRelation(
    "people-images",
    scheme: ["id", "image-data"]).orThrow()

As before, once you have the Relations, you can manipulate them with the usual calls:

people.asyncAdd(["id": 1, "name": "Lancelot",
                 "quest": "to find the holy grail", "favorite-color": "blue"])
people.asyncAdd(["id": 2, "name": "Galahad",
                 "quest": "to find the holy grail", "favorite-color": "blue"])
people.asyncUpdate(Attribute("id") *== 2, newValues: ["favorite-color": "yellow"])
people.asyncDelete(Attribute("id") *== 2)

images.asyncAdd(["id": 1, "image-data": Data(imageData)])

Unlike property list storage, the SQLite storage saves changes immediately. The asyncAdd call translates directly to an SQLite INSERT statement, for example. If you want to buffer changes in memory and only flush them to disk with an explicit save, you can accomplish this by wrapping the SQLiteDatabase in a ChangeLoggingDatabase:

let sqliteDB = try SQLiteDatabase(path)
_ = try sqliteDB.getOrCreateRelation(
    "people",
    scheme: ["id", "name", "quest", "favorite-color"]).orThrow()
_ = try sqliteDB.getOrCreateRelation(
    "people-images",
    scheme: ["id", "image-data"]).orThrow()

let db = ChangeLoggingDatabase(sqliteDB)
let people = db["people"]
let images = db["people-images"]

people.asyncAdd(["id": 1, "name": "Lancelot",
                 "quest": "to find the holy grail", "favorite-color": "blue"])
people.asyncAdd(["id": 2, "name": "Galahad",
                 "quest": "to find the holy grail", "favorite-color": "blue"])
people.asyncUpdate(Attribute("id") *== 2, newValues: ["favorite-color": "yellow"])
people.asyncDelete(Attribute("id") *== 2)

images.asyncAdd(["id": 1, "image-data": Data(imageData)])

ChangeLoggingDatabase will record all changes made to its relations rather than passing them directly to the SQLite layer. When you want to persist all changes, call save:

let result = db.save()
if let error = result.err {
    // handle error
}

This passes all of the changes to the SQLite layer, which writes them out all at once.
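The buffering idea can be sketched in a few lines. Everything below is a simplified, hypothetical model (Change, ImmediateStore, and ChangeLog are not PLRelational types), and it skips the part where reads see pending changes layered on top of the stored data.

```swift
// A recorded change to a table of id -> name.
enum Change {
    case add(id: Int, name: String)
    case delete(id: Int)
}

// Stands in for storage that writes whatever it is given right away.
final class ImmediateStore {
    private(set) var rows: [Int: String] = [:]
    private(set) var writeBatches = 0

    func apply(_ changes: [Change]) {
        for change in changes {
            switch change {
            case .add(let id, let name):
                rows[id] = name
            case .delete(let id):
                rows[id] = nil
            }
        }
        writeBatches += 1
    }
}

// Records changes in memory and forwards them only when save() is called.
final class ChangeLog {
    private let store: ImmediateStore
    private var pending: [Change] = []

    init(_ store: ImmediateStore) {
        self.store = store
    }

    func record(_ change: Change) {
        pending.append(change)
    }

    func save() {
        guard !pending.isEmpty else { return }
        store.apply(pending)   // one batch for all recorded changes
        pending.removeAll()
    }
}
```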

Advanced: Raw SQL in SQLite

For advanced uses, it's also possible to execute SQL queries directly against the database using the executeQuery method. This takes an SQL string and an optional array of RelationValue parameters and returns Rows for the results. PLRelational uses this to implement RelationTextIndex, which is a full text search API backed by SQLite's fts4 module.

If you use this, it's important to note that there is not a perfect mapping between RelationValue types and SQLite data types. The standard SQL NULL has a lot of weird behaviors. For example, comparing NULL for equality or inequality always produces NULL, which is evaluated as false. That means that, for example, a SELECT statement checking for == NULL or != NULL will always produce zero results regardless of the contents of the table.

SQLite matches this weird behavior in order to be compatible with other SQL databases. PLRelational doesn't have that constraint and so we decided to make NULL behave more consistently, and act like any other value. PLRelational's RelationValue.null is equal to itself, not equal to other values, gets sorted consistently, etc. However, this means that we can't use SQLite's NULL while still translating PLRelational select operations into SQLite SELECT statements. To work around this, we decided not to use SQLite's NULL at all, and instead translate RelationValue.null differently.

Instead, RelationValue.null is translated into an SQLite blob containing the ASCII bytes "NULL". In order to ensure that actual blobs are never mistakenly interpreted as null, all blobs are prefixed with the ASCII bytes "BLOB". This works fine for normal PLRelational use and this translation step is invisible unless you go digging through the database by hand. However, if you're submitting raw SQL with executeQuery, it might get in the way. The optional bindBlobsRaw parameter allows you to control whether blobs are translated like this while going into and out of SQLite. It's false by default, meaning that translation is performed. When it's set to true, blobs are passed in and out as-is. Note that this parameter does not influence how NULL is translated.
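The two ideas in this section, NULL comparison semantics and the blob translation, can be sketched as follows. The Value type and the function names are hypothetical stand-ins rather than PLRelational's RelationValue or its SQLite binding code, and integers are left out of the encoding for brevity.

```swift
import Foundation

// Hypothetical value type where null is an ordinary value equal to itself.
enum Value: Equatable {
    case null
    case integer(Int64)
    case blob(Data)
}

// SQL-style equality: comparing anything with NULL yields NULL (nil here),
// which a WHERE clause treats the same as false.
func sqlEquals(_ a: Value, _ b: Value) -> Bool? {
    if a == .null || b == .null { return nil }
    return a == b
}

let nullMarker = Data("NULL".utf8)
let blobPrefix = Data("BLOB".utf8)

// Translates null and blob values into the bytes that would be stored.
func encodeAsBlob(_ value: Value) -> Data? {
    switch value {
    case .null:
        return nullMarker
    case .blob(let bytes):
        return blobPrefix + bytes
    case .integer:
        return nil   // not covered by this sketch
    }
}

// Translates stored bytes back; returns nil for bytes in neither form.
func decodeBlob(_ raw: Data) -> Value? {
    if raw == nullMarker { return .null }
    guard raw.starts(with: blobPrefix) else { return nil }
    return .blob(Data(raw.dropFirst(blobPrefix.count)))
}
```

Because of the prefix, a real blob whose contents happen to be the bytes "NULL" is stored as "BLOBNULL" and can't be confused with the null marker.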

Try It Out

If you'd like to try out these code snippets or otherwise explore the available options, grab a copy of the official PLRelational repository (or GitHub mirror). The project includes example apps which you can examine and modify, and adding PLRelational to your own project is as easy as dropping the framework in and writing import PLRelational in your code.


Need help? Plausible Labs offers consulting services for software engineering. If you'd like some professional assistance, whether with PLRelational or for something entirely different, consider us. More information can be found on our consulting page.