Freezefiles and the ``datafreeze`` command
==========================================

``datafreeze`` creates static extracts of SQL databases for use in interactive
web applications. SQL databases are a great way to manage relational data, but
exposing them on the web to drive data apps can be cumbersome. Often, the
capabilities of a full database are not actually required: a few static JSON
files and a bit of JavaScript can have the same effect. Still, exporting JSON
by hand (or with a custom script) can become a messy process.

With ``datafreeze``, exports are scripted in a Makefile-like description, making them simple to repeat and replicate.


Basic Usage
-----------

``datafreeze`` is invoked from the command line with a freeze file as its
argument:

.. code-block:: bash

    datafreeze Freezefile.yaml

Freeze files can be written in either JSON or YAML.


Example Freezefile.yaml
-----------------------

A freeze file is composed of a set of scripted queries and
specifications on how their output is to be handled. An example could look
like this:

.. code-block:: yaml

    common:

      database: "postgresql://user:password@localhost/operational_database"
      prefix: my_project/dumps/
      format: json

    exports:

      - query: "SELECT id, title, date FROM events"
        filename: "index.json"
      
      - query: "SELECT id, title, date, country FROM events"
        filename: "countries/{{country}}.csv"
        format: csv

      - query: "SELECT * FROM events"
        filename: "events/{{id}}.json"
        mode: item

      - query: "SELECT * FROM events"
        filename: "all.json"
        format: tabson

An equivalent JSON configuration can be found in this repository.
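
The JSON file from the repository is not reproduced here. As a rough
illustration of how the YAML above maps onto JSON, the following sketch
rebuilds the same structure as a Python dictionary and serializes it with the
standard ``json`` module. The variable names are made up for this example, and
the layout of the real file may differ.

.. code-block:: python

    import json

    # The same structure as the YAML example above, written as a Python dict.
    freezefile = {
        "common": {
            "database": "postgresql://user:password@localhost/operational_database",
            "prefix": "my_project/dumps/",
            "format": "json",
        },
        "exports": [
            {"query": "SELECT id, title, date FROM events",
             "filename": "index.json"},
            {"query": "SELECT id, title, date, country FROM events",
             "filename": "countries/{{country}}.csv",
             "format": "csv"},
            {"query": "SELECT * FROM events",
             "filename": "events/{{id}}.json",
             "mode": "item"},
            {"query": "SELECT * FROM events",
             "filename": "all.json",
             "format": "tabson"},
        ],
    }

    # Serialize to JSON text; this could be saved as e.g. Freezefile.json.
    freezefile_json = json.dumps(freezefile, indent=2)
    print(freezefile_json)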


Options in detail
-----------------

The freeze file has two main sections, ``common`` and ``exports``. Both
accept many of the same arguments, with ``exports`` specifying a list of 
exports while ``common`` defines some shared properties, such as the 
database connection string.
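
One way to picture the relationship between the two sections is to treat
``common`` as a set of defaults that each entry in ``exports`` may override, as
the ``format: csv`` export does in the example above. The sketch below
illustrates that reading with plain dictionaries. The function name
``resolve_exports`` is hypothetical, the SQLite URI is a placeholder, and the
actual merging logic inside ``datafreeze`` may differ.

.. code-block:: python

    # Shared settings, as they would appear under ``common``.
    common = {
        "database": "sqlite:///example.db",  # placeholder URI for illustration
        "prefix": "my_project/dumps/",
        "format": "json",
    }

    # Individual exports; the second one overrides the shared format.
    exports = [
        {"query": "SELECT id, title, date FROM events",
         "filename": "index.json"},
        {"query": "SELECT id, title, date, country FROM events",
         "filename": "countries/{{country}}.csv",
         "format": "csv"},
    ]

    def resolve_exports(common, exports):
        """Return one settings dict per export (hypothetical helper).

        Keys set on an export take precedence over keys from ``common``.
        """
        return [{**common, **export} for export in exports]

    resolved = resolve_exports(common, exports)
    for settings in resolved:
        print(settings["filename"], settings["format"])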

The following options are recognized: 

* ``database`` is a database URI, including the database type, username 
  and password, hostname and database name. Valid database types include 
  ``sqlite``, ``mysql`` and ``postgresql`` (requires psycopg2).
* ``prefix`` specifies a common root directory for all extracted files.
* ``format`` identifies the format to be generated. ``csv``, ``json`` and
  ``tabson`` are supported. ``tabson`` is a condensed JSON
  representation in which rows are represented not by objects but by
  lists of values.
* ``query`` needs to be a valid SQL statement. All selected fields will
  become keys or columns in the output, so it may make sense to define 
  proper aliases if any overlap is to be expected.
* ``mode`` specifies whether the query output is to be combined into a 
  single file (``list``) or whether a file should be generated for each 
  result row (``item``).
* ``filename`` is the output file name, appended to ``prefix``. All
  occurrences of ``{{field}}`` are expanded to that field's value, which
  allows file names to be generated e.g. by primary key. In list mode,
  templating can be used to group records into several buckets, e.g. by
  country or category.
* ``wrap`` can be used to specify whether the output should be wrapped 
  in a ``results`` hash in JSON output. This defaults to ``true`` for 
  ``list``-mode output and ``false`` for ``item``-mode.
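
The ``filename``, ``wrap`` and ``tabson`` options described above can be
illustrated with a small, self-contained sketch. It is not the ``datafreeze``
implementation: all function names are hypothetical, the sample rows are
invented, and the ``fields``/``data`` keys used for the list-of-values layout
are assumptions chosen only to show the idea.

.. code-block:: python

    import json
    import re
    from collections import defaultdict

    # Matches placeholders such as {{country}} or {{id}}.
    FIELD = re.compile(r"\{\{\s*(\w+)\s*\}\}")

    def expand_filename(template, row):
        """Replace each {{field}} placeholder with the row's value."""
        return FIELD.sub(lambda m: str(row[m.group(1)]), template)

    def group_rows(template, rows):
        """Bucket rows by the file name their values expand to."""
        buckets = defaultdict(list)
        for row in rows:
            buckets[expand_filename(template, row)].append(row)
        return dict(buckets)

    def render_json(rows, wrap=True):
        """Serialize rows, optionally wrapped in a ``results`` hash."""
        payload = {"results": rows} if wrap else rows
        return json.dumps(payload)

    def to_row_lists(rows):
        """Condensed layout: field names once, each row as a list of values.

        The key names ``fields`` and ``data`` are assumptions.
        """
        fields = list(rows[0].keys()) if rows else []
        return {"fields": fields,
                "data": [[row[f] for f in fields] for row in rows]}

    # Invented sample data standing in for a query result.
    rows = [
        {"id": 1, "title": "Launch", "country": "de"},
        {"id": 2, "title": "Meetup", "country": "fr"},
        {"id": 3, "title": "Workshop", "country": "de"},
    ]

    # Templating on a non-unique field groups rows into one file per country.
    by_country = group_rows("countries/{{country}}.json", rows)
    # Templating on the primary key yields one file name per row.
    by_id = group_rows("events/{{id}}.json", rows)

    for name, bucket in by_country.items():
        print(name, render_json(bucket))
    print(json.dumps(to_row_lists(rows)))

Grouping on the expanded file name means the same template mechanism covers
both cases: a unique field gives one row per file, while a shared field
collects several rows in the same file.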