Quick Start

This guide walks you through a complete backup, scrub, restore and cleanup cycle.

Some Vocabulary

Backup source

A block device or image file to be backed up. Benji can’t back up folders or multiple files. The source should not be modified during the backup, so it is best to either stop all writes or create a snapshot.

Storages

One or more data storages (currently supported: filesystem, S3 and B2) to which the backed-up data will be saved.

Database backend

An SQL database containing information on how to reassemble the stored blocks to get the original data back. The database backend is not strictly required for restores; see Restoring without a database.

Caution

Benji has been tested successfully with PostgreSQL and SQLite3. While support for MySQL and MariaDB has been kept in mind during development, it is untested. Reports are welcome.

Version

A version is a backup of a specific backup source at a specific point in time. A version is identified by a unique id.

Backup

  1. Minimal configuration:

This represents a minimal configuration with SQLite3 database backend and file-based block storage:

configurationVersion: '1'
databaseEngine: sqlite:////tmp/benji.sqlite
defaultStorage: storage-1
storages:
  - name: storage-1
    storageId: 1
    module: file
    configuration:
      path: /tmp/benji-data
ios:
  - name: file
    module: file

You might want to change the above paths. Benji will run as a normal user without problems, but it will probably need root privileges to access most backup sources.

Please see Configuration for a full list of configuration options.
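
The minimal configuration ties a few names together: defaultStorage refers to an entry under storages, and each storage has its own name and storageId. The following is a small sketch of a sanity check over those relationships. The dict is a hand-written mirror of the YAML above, and `find_config_problems` and its checks are hypothetical illustrations, not Benji's own validation.

```python
# Hand-written mirror of the minimal configuration shown above,
# so that the sketch needs no YAML parser.
MINIMAL_CONFIG = {
    "configurationVersion": "1",
    "databaseEngine": "sqlite:////tmp/benji.sqlite",
    "defaultStorage": "storage-1",
    "storages": [
        {
            "name": "storage-1",
            "storageId": 1,
            "module": "file",
            "configuration": {"path": "/tmp/benji-data"},
        }
    ],
    "ios": [{"name": "file", "module": "file"}],
}


def find_config_problems(config):
    """Return a list of problems found (hypothetical checks, not Benji's own)."""
    problems = []
    storages = config.get("storages", [])
    names = [storage.get("name") for storage in storages]
    ids = [storage.get("storageId") for storage in storages]
    if len(set(names)) != len(names):
        problems.append("storage names are not unique")
    if len(set(ids)) != len(ids):
        problems.append("storage ids are not unique")
    if config.get("defaultStorage") not in names:
        problems.append("defaultStorage does not name a configured storage")
    return problems


print(find_config_problems(MINIMAL_CONFIG))  # []
```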

  1. Initialize the database:

    $ benji database-init
        INFO: $ benji database-init
    

    Note

    Initializing the database multiple times does not destroy any data. Instead, the command fails because the tables already exist.

  2. Create some demo data:

    For demonstration purposes, create a 40 MiB test file:

    $ dd if=/dev/urandom of=TESTFILE bs=1M count=40
    40+0 records in
    40+0 records out
    41943040 bytes (42 MB, 40 MiB) copied, 0.231886 s, 181 MB/s
    
  3. Back up the image (this works similarly with a device):

    $ benji backup file:TESTFILE myfirsttestbackup
        INFO: $ benji backup file://TESTFILE myfirsttestbackup
        INFO: Backed up 1/10 blocks (10.0%)
        INFO: Backed up 2/10 blocks (20.0%)
        INFO: Backed up 3/10 blocks (30.0%)
        INFO: Backed up 4/10 blocks (40.0%)
        INFO: Backed up 5/10 blocks (50.0%)
        INFO: Backed up 6/10 blocks (60.0%)
        INFO: Backed up 7/10 blocks (70.0%)
        INFO: Backed up 8/10 blocks (80.0%)
        INFO: Backed up 9/10 blocks (90.0%)
        INFO: Backed up 10/10 blocks (100.0%)
        INFO: Exported version V0000000001 metadata to backend storage.
        INFO: New version: V0000000001 (Tags: [])
    
  4. List backups:

    $ benji ls
        INFO: $ benji ls
    +---------------------+-------------+-------------------+----------+----------+------------+--------+-----------+
    |         date        |     uid     | name              | snapshot |     size | block_size | status | protected |
    +---------------------+-------------+-------------------+----------+----------+------------+--------+-----------+
    | 2018-06-06T21:41:41 | V0000000001 | myfirsttestbackup |          | 41943040 |    4194304 | valid  |   False   |
    +---------------------+-------------+-------------------+----------+----------+------------+--------+-----------+
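
The "10 blocks" in the backup progress output follows from the size and block_size columns of the listing: 41943040 / 4194304 = 10. A small sketch of that arithmetic is given here. Rounding up for a partial final block is an assumption, since the example only shows a size that is an exact multiple of the block size, and `block_count` is a hypothetical helper name.

```python
def block_count(size_bytes, block_size_bytes):
    """Number of blocks needed to cover size_bytes (last block may be partial)."""
    # Integer ceiling division; avoids floating point for very large devices.
    return -(-size_bytes // block_size_bytes)


print(block_count(41943040, 4194304))  # 10
```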
    

benji ls supports various options to filter the output:

$ benji ls --help
usage: benji ls [-h] [-l] [-s] [filter_expression]

positional arguments:
  filter_expression     Version filter expression

optional arguments:
  -h, --help            show this help message and exit
  -l, --include-labels  Include labels in output
  -s, --include-stats   Include statistics in output

Some commands can also produce machine-readable JSON output for use in scripts:

$ benji -m ls
{
  "versions": [
    {
      "uid": 1,
      "date": "2018-06-06T19:41:41.936087Z",
      "name": "myfirsttestbackup",
      "snapshot": "",
      "size": 41943040,
      "block_size": 4194304,
      "storage_id": 1,
      "status": "valid",
      "protected": false,
      "bytes_read": 41943040,
      "bytes_written": 41943040,
      "bytes_dedup": 0,
      "bytes_sparse": 0,
      "duration": 0,
      "labels": {}
    }
  ],
  "metadata_version": "2.0.0"
}

Specifying -m also automatically lowers the verbosity level so that only errors are output. Please see the section Machine output for details.
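
As an illustration of using the machine output in a script, the following sketch parses a JSON document of the shape shown above and filters versions by status. The sample is trimmed from the output above, and `versions_with_status` is a hypothetical helper. In a real script the text would come from the standard output of `benji -m ls`.

```python
import json

# Trimmed copy of the machine output shown above.
SAMPLE = """
{
  "versions": [
    {
      "uid": 1,
      "name": "myfirsttestbackup",
      "size": 41943040,
      "status": "valid",
      "protected": false
    }
  ],
  "metadata_version": "2.0.0"
}
"""


def versions_with_status(machine_output, status):
    """Return the names of all versions whose status equals the given one."""
    document = json.loads(machine_output)
    return [v["name"] for v in document["versions"] if v["status"] == status]


print(versions_with_status(SAMPLE, "valid"))    # ['myfirsttestbackup']
print(versions_with_status(SAMPLE, "invalid"))  # []
```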

Deep Scrub and Scrub

Deep scrubbing reads all the blocks of a particular version from the storage (or some of them if you use the -p option) and compares the checksums of these blocks to the checksums recorded in the database backend. If you pass the source option (-s), the blocks are also compared to the original source data.

$ benji deep-scrub v1
    INFO: $ benji deep-scrub v1
    INFO: Deep scrubbed 1/10 blocks (10.0%)
    INFO: Deep scrubbed 2/10 blocks (20.0%)
    INFO: Deep scrubbed 3/10 blocks (30.0%)
    INFO: Deep scrubbed 4/10 blocks (40.0%)
    INFO: Deep scrubbed 5/10 blocks (50.0%)
    INFO: Deep scrubbed 6/10 blocks (60.0%)
    INFO: Deep scrubbed 7/10 blocks (70.0%)
    INFO: Deep scrubbed 8/10 blocks (80.0%)
    INFO: Deep scrubbed 9/10 blocks (90.0%)
    INFO: Deep scrubbed 10/10 blocks (100.0%)
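
Conceptually, a deep scrub recomputes each block's checksum and compares it with the recorded one. The sketch below illustrates only that idea; it is not Benji's implementation. It uses `hashlib.blake2b` from the Python standard library with a 32-byte digest, and the function names and in-memory lists are assumptions made for the example.

```python
import hashlib


def checksum(data):
    """blake2b with a 32-byte digest, hex-encoded."""
    return hashlib.blake2b(data, digest_size=32).hexdigest()


def deep_scrub(stored_blocks, recorded_checksums):
    """Return the indices of blocks that no longer match their recorded checksum.

    stored_blocks: list of bytes read back from storage (None = unreadable).
    recorded_checksums: list of hex digests kept alongside the metadata.
    """
    bad = []
    for index, (data, expected) in enumerate(zip(stored_blocks, recorded_checksums)):
        if data is None or checksum(data) != expected:
            bad.append(index)
    return bad


blocks = [b"block-0", b"block-1", b"block-2"]
recorded = [checksum(block) for block in blocks]
blocks[1] = b"bit rot"  # simulate silent corruption in storage
print(deep_scrub(blocks, recorded))  # [1]
```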

If an error occurs (for example, if some blocks couldn’t be read or a checksum mismatch was detected), the output from deep-scrub looks like this:

$ benji deep-scrub v1
    INFO: $ benji deep-scrub v1
    INFO: Deep scrubbed 1/10 blocks (10.0%)
    INFO: Deep scrubbed 2/10 blocks (20.0%)
    INFO: Deep scrubbed 3/10 blocks (30.0%)
    INFO: Deep scrubbed 4/10 blocks (40.0%)
    INFO: Deep scrubbed 5/10 blocks (50.0%)
    INFO: Deep scrubbed 6/10 blocks (60.0%)
   ERROR: Checksum mismatch during deep scrub of block 6 (UID 1-7) (is: 729a77dc964e5f54... should-be: b70aeb070b95df31...).
    INFO: Marked block invalid (UID 1-7, Checksum b70aeb070b95df31. Affected versions: V0000000001
    INFO: Marked version invalid (UID V0000000001)
    INFO: Deep scrubbed 8/10 blocks (80.0%)
    INFO: Deep scrubbed 9/10 blocks (90.0%)
    INFO: Deep scrubbed 10/10 blocks (100.0%)
   ERROR: Marked version V0000000001 invalid because it has errors.
   ERROR: Deep scrub of version V0000000001 failed.

If scrubbing fails, the exit code is non-zero. A failed scrub is signaled by EX_IOERR, which is 74 on Linux.
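
A wrapper script can act on this exit code. The following is a minimal sketch: `describe_exit` and `deep_scrub_version` are hypothetical helpers, and treating every other non-zero code as a generic error is an assumption. `deep_scrub_version` requires benji to be on the PATH and is not called here.

```python
import os
import subprocess

# EX_IOERR is 74 on Linux; fall back to the literal where os lacks the constant.
EX_IOERR = getattr(os, "EX_IOERR", 74)


def describe_exit(returncode):
    """Map a process exit code to a short description."""
    if returncode == 0:
        return "ok"
    if returncode == EX_IOERR:
        return "scrub failed"
    return "other error"


def deep_scrub_version(version_uid):
    """Run `benji deep-scrub <uid>` and classify the result."""
    completed = subprocess.run(["benji", "deep-scrub", version_uid])
    return describe_exit(completed.returncode)
```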

Also, the version is marked invalid as you can see here:

$ benji ls
    INFO: $ benji ls
+---------------------+-------------+-------------------+----------+----------+------------+----------+-----------+
|         date        |     uid     | name              | snapshot |     size | block_size | status   | protected |
+---------------------+-------------+-------------------+----------+----------+------------+----------+-----------+
| 2018-06-06T21:41:41 | V0000000001 | myfirsttestbackup |          | 41943040 |    4194304 | invalid  |   False   |
+---------------------+-------------+-------------------+----------+----------+------------+----------+-----------+

If you are able to fix the error, run deep-scrub again and Benji will mark the version as valid again.

There is also a lighter-weight sibling of deep-scrub, which only checks metadata consistency and block existence:

$ benji scrub v1
    INFO: $ benji scrub v1
    INFO: Scrubbed 1/10 blocks (10.0%)
    INFO: Scrubbed 2/10 blocks (20.0%)
    INFO: Scrubbed 3/10 blocks (30.0%)
    INFO: Scrubbed 4/10 blocks (40.0%)
    INFO: Scrubbed 5/10 blocks (50.0%)
    INFO: Scrubbed 6/10 blocks (60.0%)
    INFO: Scrubbed 7/10 blocks (70.0%)
    INFO: Scrubbed 8/10 blocks (80.0%)
    INFO: Scrubbed 9/10 blocks (90.0%)
    INFO: Scrubbed 10/10 blocks (100.0%)

scrub will only mark versions as invalid, never as valid. This is because there isn’t enough information to determine whether the version is really okay when only checking metadata consistency and block existence. A scrub of an invalid version will fail immediately.
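
The rules for how the two commands affect a version's status can be summarized as a small function. This is a model of the behaviour described in this section only; the function name, its arguments and the use of ValueError for the "fails immediately" case are assumptions made for illustration.

```python
def status_after(command, current_status, checks_passed):
    """Return the version status after running `scrub` or `deep-scrub`."""
    if command not in ("scrub", "deep-scrub"):
        raise ValueError("unknown command: " + command)
    if command == "scrub" and current_status == "invalid":
        # A scrub of an invalid version fails immediately.
        raise ValueError("scrub refuses to run on an invalid version")
    if not checks_passed:
        return "invalid"
    if command == "deep-scrub":
        return "valid"  # only a successful deep-scrub can restore validity
    return current_status  # scrub never promotes a version to valid


print(status_after("deep-scrub", "invalid", True))  # valid
print(status_after("scrub", "valid", False))        # invalid
```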

Restore

Restore into a file or device:

$ benji restore v1 file://RESTOREFILE
    INFO: $ benji restore v1 file://RESTOREFILE
    INFO: Restored 1/10 blocks (10.0%)
    INFO: Restored 2/10 blocks (20.0%)
    INFO: Restored 3/10 blocks (30.0%)
    INFO: Restored 4/10 blocks (40.0%)
    INFO: Restored 5/10 blocks (50.0%)
    INFO: Restored 6/10 blocks (60.0%)
    INFO: Restored 7/10 blocks (70.0%)
    INFO: Restored 8/10 blocks (80.0%)
    INFO: Restored 9/10 blocks (90.0%)
    INFO: Restored 10/10 blocks (100.0%)

Benji prevents you from restoring into an existing file (or device). So if you try again, it will fail:

$ benji restore v1 file://RESTOREFILE
    INFO: $ benji restore v1 file://RESTOREFILE
   ERROR: Restore target RESTOREFILE already exists. Force the restore if you want to overwrite it.

If you want to overwrite data, you must pass --force.
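
The same refuse-unless-forced guard can be expressed in a few lines of Python for regular files. This is a sketch of the idea rather than Benji's code: `open_restore_target` is a hypothetical helper, and it relies on the exclusive-creation mode "x" of the built-in `open`.

```python
import os
import tempfile


def open_restore_target(path, force=False):
    """Open a restore target, refusing to overwrite unless force is set."""
    # Mode "xb" raises FileExistsError if the path already exists.
    mode = "wb" if force else "xb"
    return open(path, mode)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "RESTOREFILE")
    open_restore_target(target).close()              # first restore succeeds
    try:
        open_restore_target(target)                  # second one is refused
    except FileExistsError:
        print("target already exists")
    open_restore_target(target, force=True).close()  # explicit overwrite
```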

Note

For more (and possibly faster) restore methods, please refer to the section Restore.

Version Removal and Cleanup

You can remove any given backup version with:

$ benji rm V0000000001
    INFO: $ benji rm V0000000001
   ERROR: Version V0000000001 is too young. Will not delete it.

What prevents this version from being deleted is the benji.yaml option:

disallowRemoveWhenYounger: 6

However, instead of changing this option, you can simply use --force:

$ benji rm -f v1
    INFO: $ benji rm -f v1
    INFO: Removed version V0000000001 metadata from backend storage.
    INFO: Removed backup version V0000000001 with 10 blocks.
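
The age check behind the "too young" error can be sketched as follows. Two things here are assumptions not stated in this guide: that the value of disallowRemoveWhenYounger is a number of days, and that a version exactly at the limit may be removed. `may_remove` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def may_remove(version_date, now, min_age_days=6, force=False):
    """Decide whether a version may be removed (unit of days is an assumption)."""
    if force:
        return True  # --force bypasses the age check
    return now - version_date >= timedelta(days=min_age_days)


created = datetime(2018, 6, 6, 21, 41, 41)
print(may_remove(created, created + timedelta(hours=1)))              # False
print(may_remove(created, created + timedelta(hours=1), force=True))  # True
print(may_remove(created, created + timedelta(days=7)))               # True
```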

Benji stores each distinct block (identified by its checksum and size) only once. If it encounters another block on the backup source with the same checksum [1], it will only write metadata which refers to the same backup target block. So when a version is deleted, Benji needs to check whether any other references to the blocks of this version remain. This is not only resource intensive but also introduces race conditions with other backup sessions running in parallel. This is why there is a separate command to clean up unreferenced blocks:

$ benji cleanup
    INFO: $ benji cleanup
    INFO: Cleanup: Cleanup finished. 0 false positives, 0 data deletions.

As you can see, nothing has been deleted. The reason for this is that only blocks which have been on the candidate list for a certain time (1h) are considered for deletion, to prevent race conditions. If we had waited an hour after removing the version, we would get slightly different output indicating that ten blocks have been permanently deleted:

$ benji cleanup
    INFO: $ benji cleanup
    INFO: Cleanup: Cleanup finished. 0 false positives, 10 data deletions.
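
The interplay of deduplication, version removal and delayed cleanup can be illustrated with a toy in-memory store. This is a sketch under assumptions, not Benji's implementation: the class, its attribute names and the explicit `now` timestamps (in seconds) are invented for the example, and counting a still-referenced candidate as a "false positive" is an interpretation of the cleanup output above.

```python
import hashlib


class BlockStore:
    """Toy content-addressed block store (illustrative only)."""

    GRACE_SECONDS = 3600  # candidates must wait one hour before deletion

    def __init__(self):
        self.blocks = {}      # (checksum, size) -> data
        self.versions = {}    # version name -> list of block keys
        self.candidates = {}  # block key -> time it became a candidate

    def backup(self, name, chunks):
        refs = []
        for data in chunks:
            key = (hashlib.blake2b(data, digest_size=32).hexdigest(), len(data))
            self.blocks.setdefault(key, data)  # identical blocks are stored once
            refs.append(key)
        self.versions[name] = refs

    def remove(self, name, now):
        # Removing a version only records candidates; no data is deleted yet.
        for key in self.versions.pop(name):
            self.candidates.setdefault(key, now)

    def cleanup(self, now):
        in_use = {key for refs in self.versions.values() for key in refs}
        false_positives = deletions = 0
        for key, since in list(self.candidates.items()):
            if now - since < self.GRACE_SECONDS:
                continue  # too recent, stays on the candidate list
            del self.candidates[key]
            if key in in_use:
                false_positives += 1  # still referenced by another version
            else:
                self.blocks.pop(key, None)
                deletions += 1
        return false_positives, deletions


store = BlockStore()
store.backup("v1", [b"a" * 4, b"b" * 4, b"a" * 4])
print(len(store.blocks))        # 2
store.remove("v1", now=0)
print(store.cleanup(now=60))    # (0, 0)
print(store.cleanup(now=3600))  # (0, 2)
```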
[1] Benji uses blake2b with a 32-byte digest size, but this can be configured in benji.yaml. blake2b is the recommended hash function as it is very fast on modern computers. However, it is possible to use any other algorithm which is included in PyCryptodome. The maximum supported digest length is 64 bytes. Smaller digest lengths have a higher chance of hash collisions, which must be avoided. Digest lengths below 32 bytes are not recommended.
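
To get a feel for the digest sizes mentioned here, the following sketch computes a blake2b checksum with a configurable digest length. It uses `hashlib` from the Python standard library instead of PyCryptodome, and `block_checksum` is a hypothetical helper name.

```python
import hashlib


def block_checksum(data, digest_size=32):
    """Hex-encoded blake2b checksum with the given digest size in bytes."""
    if not 1 <= digest_size <= hashlib.blake2b.MAX_DIGEST_SIZE:
        raise ValueError("digest_size out of range for blake2b")
    return hashlib.blake2b(data, digest_size=digest_size).hexdigest()


digest = block_checksum(b"\x00" * 4096)
print(len(digest) // 2)                 # 32 (bytes; two hex characters each)
print(hashlib.blake2b.MAX_DIGEST_SIZE)  # 64
```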