💾 Archived View for silbernagel.dev › gemlog › distributed-sqlite-with-elixir captured on 2023-09-28 at 15:38:48. Gemini links have been rewritten to link to archived content

View Raw

More Information

⬅️ Previous capture (2023-09-08)

➡️ Next capture (2023-11-04)

-=-=-=-=-=-=-

       __                          __         __  __
 ___ _/ /  ___ ____  ___  __ _____/ /__ ___ _/ /_/ /
/ _ `/ _ \/ _ `/ _ \/ _ \/ // / _  / -_) _ `/ __/ _ \
\_,_/_//_/\_,_/ .__/ .__/\_, /\_,_/\__/\_,_/\__/_//_/
             /_/  /_/   /___/

Distributed SQLite with Elixir

Posted on Sun, January 08 2023

I recently read about a newer SQLite database tool called Litestream which creates backups of the database in S3 compatible storage after every transaction. It will also restore from that backup, so when scaling horizontally to a new server, the latest version of the DB can be restored. Fly.io seems like the ideal platform and Elixir the perfect language for this type of a solution.

Check out Litestream

Check out Fly.io

Why

Getting Started

See the companion Repo for reference

Create a new phoenix app that uses SQLite as the database:

$ mix phx.new distributed_sqlite --database sqlite3

Launch it as a new fly application:

$ fly launch

An environment variable, DATABASE_PATH, is needed to indicate which file to use for the SQLite database. Open the fly.toml file and add DATABASE_PATH = /app/distributed_sqlite.db (use any database name you want here) under the [env] section and try to deploy.

$ flyctl deploy

Success! A Phoenix app running on Fly using SQLite! Now, to see it in action.

Counter Data Model

Lets build a simple, naive counter that just counts the views of each page.

$ mix phx.gen.schema Counter.PageCount page_counts page:string count:integer
$ mix ecto.migrate

Add a Counter module where we can add page view counts

# lib/distributed_sqlite/counter.ex
defmodule DistributedSqlite.Counter do
  alias DistributedSqlite.Counter.PageCount
  alias DistributedSqlite.Repo

  def count_page_view(page_name) do
    page_count = Repo.get_by(PageCount, page: page_name)
    case page_count do
      nil -> 
       %PageCount{}
       |> PageCount.changeset(%{count: 1, page: page_name})
       |> Repo.insert()
      %PageCount{} = page_count ->
        page_count
        |> PageCount.changeset(%{count: page_count.count + 1})
        |> Repo.update()
    end
  end
end

and update the page_controller to count views

  # lib/distributed_sqlite_web/controllers/page_controller.ex
  alias DistributedSqlite.Counter

  def index(conn, _params) do
    page_view = Counter.count_page_view("home")
    render(conn, "index.html", page_count: page_view.count)
  end

lastly, we can display the counter on our page

<!-- lib/distributed_sqlite_web/templates/page/index.html.heex -->
<h1> Page Views <%= @view_count %> </h1>

Now deploy again using flyctl deploy and then browse to your site to validate that the count is showing and updating when refreshing.

Restoring the Database on Deploy

The next problem to deal with is that the database will be wiped on our next deploy since it's using ephemeral storage.

One way to resolve this is to use a persistent volume (which should be done for production apps). But since this post is all about Litestream, so lets set that up and see how can help with this.

The first step is to create a bucket in some S3 compatible storage. I like to use Digital Ocean Spaces for this, but you can also use AWS if you want.

See Litestream docs for more options

Setup a Digital Ocean space

When you have your storage setup, you'll need 3 pieces of information

A configuration file, litestream.yml, is needed for Litestream to function. Create one in the root of the project with the following text (remember to replace the path with YOUR path):

access-key-id: ${LITESTREAM_ACCESS_KEY_ID}
secret-access-key: ${LITESTREAM_SECRET_ACCESS_KEY}

dbs:
  - path: /app/distributed_sql.db
    replicas:
      - url: ${REPLICA_URL}

Now set the three variables in Fly to the values you recorded from when setting up the bucket.

$ flyctl secrets set REPLICA_URL=... LITESTREAM_ACCESS_KEY_ID=... LITESTREAM_SECRET_ACCESS_KEY=...

Next, add Litestream to our Docker image. Add the following lines to Dockerfile as part of the builder phase:

ADD https://github.com/benbjohnson/litestream/releases/download/v0.3.9/litestream-v0.3.9-linux-amd64-static.tar.gz /tmp/litestream.tar.gz
RUN tar -C /usr/local/bin -xzf /tmp/litestream.tar.gz

And in the runner phase add:

COPY --from=builder /usr/local/bin/litestream /usr/local/bin/litestream
COPY litestream.yml /etc/litestream.yml

Finally, update the starting script so that the elixir release is a sub-process of litestream. The easiest way I've found to do this is to create a run script called run.sh with the following content:

#!/bin/bash
set -e

# Restore the database if it does not already exist.
if [ -f /app/distributed_sql.db ]; then
  echo "Database already exists, skipping restore"
else
  echo "No database found, restoring from replica if exists"
  litestream restore -v -if-replica-exists -o /app/distributed_sql.db "${REPLICA_URL}"
fi

# Run migrations
/app/bin/migrate

# Run litestream with your app as the subprocess.
exec litestream replicate -exec "/app/bin/server"
Be sure to remove the migration script from fly.toml since it runs in the run.sh script now.

Now update the Dockerfile to use this new script to start the app:

COPY run.sh /scripts/run.sh
RUN chmod 755 /scripts/run.sh

CMD ["/scripts/run.sh"]

Deploying should now start using Litestream to restore the database on deploys and push backups when data changes. You can verify in the monitoring interface of fly. Look for something similar to the image below:

Fly logs showing that Litestream is running

Distributing

With all of this in place, things would work great when running one instance of you app. But as soon as you add another node things get out of whack. Lets see this in action. Scale the app to 2 and see what happens to the data.

$ flyctl scale count 2

Enough refreshing the browser or opening in different windows/tabs and you'll start to see discrepancies in the view count. This is because we are not replicating the data between the instances. This is a problem Elixir is built for...

Setup the Cluster

Follow the Fly guide to get clustering working correctly

Once clustered, we can begin to replicate our database calls. Add a new GenServer with the following content:

# /lib/distributed_sqlite/repo_replication.ex
defmodule DistributedSqlite.RepoReplication do
  @moduledoc """
  Run on each node to handle replicating Repo writes
  """
  use GenServer

  alias DistributedSqlite.Repo

  def start_link(args) do
    GenServer.start_link(__MODULE__, args, name: __MODULE__)
  end

  @impl true
  def init(_args) do
    {:ok, []}
  end

  def handle_cast({:replicate, query, :insert}, state) do
    Repo.insert!(query)
    {:noreply, state}
  end

  def handle_cast({:replicate, changeset, :update}, state) do
    Repo.update!(changeset)
    {:noreply, state}
  end
end

and make sure to start it in the application.ex

# lib/distributed_sqlite/application.ex
children = [
  ...,
  {DistributedSqlite.RepoReplication, []}
]

Open the DistributedSqlite.Repo file and add a replicate/2 function

@doc """
Replicate the query on the the other nodes in the cluster
"""
def replicate({:ok, data_to_replicate} = ret, operation) when operation in [:insert, :update] do
  _ =
    for node <- Node.list() do
      GenServer.cast(
        {DistributedSqlite.RepoReplication, node},
        {:replicate, data_to_replicate, operation}
      )
    end

  ret
end

def replicate({:error, _changeset} = ret, _), do: ret

def replicate(%Ecto.Changeset{} = changeset, operation) when operation in [:insert, :update] do
  _ =
    for node <- Node.list() do
      GenServer.cast(
        {DistributedSqlite.RepoReplication, node},
        {:replicate, changeset, operation}
      )
    end

  {:ok, changeset}
end

def replicate(schema, :insert) do
  _ =
    for node <- Node.list() do
      GenServer.cast(
        {DistributedSqlite.RepoReplication, node},
        {:replicate, schema, :insert}
      )
    end

  {:ok, schema}
end

This gives us a function we can pipe into from an Repo.insert or the result of a Repo.update. Try this in the DistributedSqlite.Counter module:

case page_count do
  nil ->
    %PageCount{}
    |> PageCount.changeset(%{count: 1, page: page_name})
    |> Repo.insert()
    |> Repo.replicate(:insert)

  %PageCount{} = page_count ->
    page_count
    |> PageCount.changeset(%{count: page_count.count + 1})
    |> Repo.update()
    |> case do
      {:ok, cnt} ->
        cnt
        |> PageCount.replicate_changeset()
        |> Repo.replicate(:update)

        {:ok, cnt}
  end
end

With all of this in place, deploy again. Your data should now be consistent no matter which node your traffic is served from!

Wrap Up

I'm not sure how far this can be pushed and there are downsides to this approach, but I plan on continuing this journey that puts my data as close to the application as possible.

There is another worth exploring that removes the need to replicate the data on the application side. It is still in beta, but I plan on trying it out soon as well.

Another approach - Litefs

My Capsule Links

🏠 Home

🔑 My Public Key

💎 Gemlog