merge.

2013-04-03 22:27:06 +02:00 · 2013-04-03 22:27:06 +02:00 · f3533de1a7
commit f3533de1a7
parent 82153522cb 6d83953de3
19 changed files with 456 additions and 622 deletions
--- a/README.md
+++ b/README.md
@ -1,96 +1,8 @@
-SQLAlchemy Loading Tools
+dataset: databases for lazy people
-========================
+==================================
-A collection of wrappers and functions to make SQLAlchemy core easier 
+In short, **dataset** makes reading and writing data in databases as simple as reading and writing JSON files.
 to use in ETL applications. SQLAlchemy is used only for database
 abstraction and not as an ORM, allowing users to write extraction
 scripts that can work with multiple database backends. Functions
 include:
-* **Automatic schema**. If a column is written that does not
+[Read the docs](https://dataset.readthedocs.org/)
  exist on the table, it will be created automatically.
 * **Upserts**. Records are either created or updated, depending on
  whether an existing version can be found.
 * **Query helpers** for simple queries such as all rows in a table or
  all distinct values across a set of columns.
 Examples
 --------
 A typical use of ``sqlaload`` would look like this:
 	from sqlaload import connect, get_table, distinct, update
 	engine = connect('sqlite:///customers.db')
 	table = get_table(engine, 'customers')
 	for entry in distinct(engine, table, 'post_code', 'city'):
 	    lon, lat = geocode(entry['post_code'], entry['city'])
 	    update(entry, {'lon': lon, 'lat': lat})
 In this example, we select all distinct post codes and city names from an imaginary customers database, send them through our geocoding routine and finally update all matching rows with the returned geo information.
 Another example, updating data in a datastore, might look like this:
 	from sqlaload import connect, get_table, upsert
 	engine = connect('sqlite:///things.db')
 	table = get_table(engine, 'data')
 	for item in magic_data_source_that_produces_entries():
 	    assert 'key1' in item
 	    assert 'key2' in item
 	    # this will either insert or update, depending on 
 	    # whether an entry with the matching values for 
 	    # 'key1' and 'key2' already exists:
 	    upsert(engine, table, item, ['key1', 'key2'])
 Here's the same example, but using the object-oriented API:
    import sqlaload
    db = sqlaload.create('sqlite:///things.db')
    table = db.get_table('data')
    for item in magic_data_source_that_produces_entries():
        assert 'key1' in item
        assert 'key2' in item
        table.upsert(item, ['key1', 'key2'])
 Functions
 ---------
 The library currently exposes the following functions:
 **Schema management**
 * ``connect(url)``, connect to a database and return an ``engine``. See the [SQLAlchemy documentation](http://docs.sqlalchemy.org/en/rel_0_8/core/engines.html#database-urls) for information about URL schemes and formats.
 * ``get_table(engine, table_name)`` will load a table configuration from the database, either reflecting the existing schema or creating a new table (with an ``id`` column).
 * ``create_table(engine, table_name)`` and ``load_table(engine, table_name)`` are more explicit than ``get_table`` but allow the same functions.
 * ``drop_table(engine, table_name)`` will remove an existing table, deleting all of its contents.
 * ``create_column(engine, table, column_name, type)`` adds a new column to a table; ``type`` must be a SQLAlchemy type class.
 * ``create_index(engine, table, columns)`` creates an index on the given table, based on a list of strings to specify the included ``columns``.
 **Queries**
 * ``find(engine, table, _limit=N, _offset=N, order_by='id', **kw)`` will retrieve database records. The query will return an iterator that only loads 5000 records at any one time, even if ``_limit`` and ``_offset`` are specified - meaning that ``find`` can be run on tables of arbitrary size. ``order_by`` is a string column name, always returned in ascending order. Finally ``**kw`` can be used to filter columns for equality, e.g. ``find(…, category=5)``. 
 * ``find_one(engine, table, **kw)``, like ``find`` but will only return the first matching row or ``None`` if no matches were found. 
 * ``distinct(engine, table, *columns, **kw)`` will return the combined distinct values for ``columns``. ``**kw`` allows filtering the same way it does in ``find``.
 * ``all``, alias for ``find`` without filter options.
 **Adding and updating data**
 * ``add_row(engine, table, row, ensure=True, types={})`` add the values in the dictionary ``row`` to the given ``table``. ``ensure`` will check the schema and create the columns if necessary, their types can be specified using the ``types`` dictionary. If no ``types`` are given, the type will be guessed from the first submitted value of the column, defaulting to a text column. 
 * ``update_row(engine, table, row, unique, ensure=True, types={})`` will update a row or set of rows based on the data in the ``row`` dictionary and the column names specified in ``unique``. The remaining arguments are handled like those in ``add_row``. 
 * ``upsert(engine, table, row, unique, ensure=True, types={})`` will combine the semantics of ``update_row`` and ``add_row`` by first attempting to update existing data and otherwise (only if no record matching on the ``unique`` keys can be found) creating a new record.
 * ``delete(engine, table, **kw)`` will remove records from a table. ``**kw`` is the same as in ``find`` and can be used to limit the set of records to be removed.
 Feedback
 --------
 Please feel free to create issues on the GitHub tracker at [okfn/sqlaload](https://github.com/okfn/sqlaload/issues). For other discussions, join the [okfn-labs](http://lists.okfn.org/mailman/listinfo/okfn-labs) mailing list.
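
The upsert semantics described in the README above (try an update on the key columns first, insert only when nothing matched) can be sketched without the library. This is a minimal illustration on the standard-library `sqlite3` module, not the code from this commit; the `upsert` helper, its return values and the `data` table are assumptions made for the example.

```python
import sqlite3


def upsert(conn, table, row, keys):
    """Update rows matching `keys`; insert `row` if nothing matched."""
    # Identifiers are interpolated directly, so they must come from trusted code.
    set_clause = ", ".join('"%s" = ?' % col for col in row)
    where_clause = " AND ".join('"%s" = ?' % key for key in keys)
    params = list(row.values()) + [row[key] for key in keys]
    cur = conn.execute(
        'UPDATE "%s" SET %s WHERE %s' % (table, set_clause, where_clause), params
    )
    if cur.rowcount > 0:
        return "updated"
    # No existing row matched the key columns, so create a new record.
    columns = ", ".join('"%s"' % col for col in row)
    marks = ", ".join("?" for _ in row)
    conn.execute(
        'INSERT INTO "%s" (%s) VALUES (%s)' % (table, columns, marks),
        list(row.values()),
    )
    return "inserted"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (key1 TEXT, key2 TEXT, value INTEGER)")
first = upsert(conn, "data", {"key1": "a", "key2": "b", "value": 1}, ["key1", "key2"])
second = upsert(conn, "data", {"key1": "a", "key2": "b", "value": 2}, ["key1", "key2"])
rows = conn.execute("SELECT key1, key2, value FROM data").fetchall()
```

The second call finds the row written by the first one and overwrites its `value`, so the table still holds a single record.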
--- a/dataset/init.py
+++ b/dataset/init.py
@ -8,10 +8,11 @@ from dataset.persistence.table import Table
 def connect(url):
-    """ Opens a new connection to a database. *url* can be any valid `SQLAlchemy engine URL`_. Returns
+    """
-    an instance of :py:class:`dataset.Database.
+    Opens a new connection to a database. *url* can be any valid `SQLAlchemy engine URL`_. Returns
-
+    an instance of :py:class:`Database <dataset.Database>`.
    ::
        db = dataset.connect('sqlite:///factbook.db')
    .. _SQLAlchemy Engine URL: http://docs.sqlalchemy.org/en/latest/core/engines.html#sqlalchemy.create_engine
--- a/dataset/persistence/database.py
+++ b/dataset/persistence/database.py
@ -32,15 +32,23 @@ class Database(object):
    @property
    def tables(self):
-        """ Get a listing of all tables that exist in the database. """
+        """ Get a listing of all tables that exist in the database.
        >>> print db.tables
        set([u'user', u'action'])
        """
        return set(self.metadata.tables.keys() + self._tables.keys())
    def create_table(self, table_name):
-        """ Creates a new table. The new table will automatically have
+        """
-        an `id` column, which is set to be an auto-incrementing integer
+        Creates a new table. The new table will automatically have an `id` column, which is
-        as the primary key of the table.
+        set to be an auto-incrementing integer as the primary key of the table.
-        Returns a :py:class:`dataset.Table` instance."""
+        Returns a :py:class:`Table <dataset.Table>` instance.
        ::
            table = db.create_table('population')
        """
        with self.lock:
            log.debug("Creating table: %s on %r" % (table_name, self.engine))
            table = SQLATable(table_name, self.metadata)
@ -51,12 +59,17 @@ class Database(object):
            return Table(self, table)
    def load_table(self, table_name):
-        """ Loads a table. This will fail if the tables does not already
+        """
        Loads a table. This will fail if the table does not already
        exist in the database. If the table exists, its columns will be
-        reflected and are available on the :py:class:`dataset.Table`
+        reflected and are available on the :py:class:`Table <dataset.Table>`
        object.
-        Returns a :py:class:`dataset.Table` instance."""
+        Returns a :py:class:`Table <dataset.Table>` instance.
        ::
            table = db.load_table('population')
        """
        with self.lock:
            log.debug("Loading table: %s on %r" % (table_name, self))
            table = SQLATable(table_name, self.metadata, autoload=True)
@ -64,9 +77,17 @@ class Database(object):
            return Table(self, table)
    def get_table(self, table_name):
-        """ Loads a table or creates it if it doesn't exist yet.
+        """
-        Returns a :py:class:`dataset.Table` instance. Alternatively to *get_table*
+        Smart wrapper around *load_table* and *create_table*. Either loads a table
-        you can also get tables using the dict syntax."""
+        or creates it if it doesn't exist yet.
        Returns a :py:class:`Table <dataset.Table>` instance.
        ::
            table = db.get_table('population')
            # you can also use the short-hand syntax:
            table = db['population']
        """
        with self.lock:
            if table_name in self._tables:
                return Table(self, self._tables[table_name])
@ -79,16 +100,16 @@ class Database(object):
        return self.get_table(table_name)
    def query(self, query):
-        """ Run a statement on the database directly, allowing for the
+        """
        Run a statement on the database directly, allowing for the
        execution of arbitrary read/write queries. A query can either be
-        a plain text string, or a SQLAlchemy expression. The returned
+        a plain text string, or a `SQLAlchemy expression <http://docs.sqlalchemy.org/ru/latest/core/tutorial.html#selecting>`_. The returned
        iterator will yield each result sequentially.
        ::
-        .. code-block:: python
+            res = db.query('SELECT user, COUNT(*) c FROM photos GROUP BY user')
-
+            for row in res:
-            result = db.query('SELECT * FROM population WHERE population > 10000000')
+                print row['user'], row['c']
            for row in result:
                print row
        """
        return resultiter(self.engine.execute(query))
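
The "load or create" behaviour that the `get_table` docstring describes can be illustrated independently of SQLAlchemy. The sketch below uses the standard-library `sqlite3` module; `get_table_columns` is a hypothetical helper for this example and is not part of `dataset`.

```python
import sqlite3


def get_table_columns(conn, table_name):
    """Return the column names of `table_name`, creating the table if missing."""
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table_name,),
    ).fetchone()
    if exists is None:
        # "create" branch: a fresh table gets only an auto-incrementing id.
        conn.execute(
            'CREATE TABLE "%s" (id INTEGER PRIMARY KEY AUTOINCREMENT)' % table_name
        )
    # "load" branch (also used right after creating): reflect the columns.
    info = conn.execute('PRAGMA table_info("%s")' % table_name).fetchall()
    return [column[1] for column in info]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE population (id INTEGER PRIMARY KEY, country TEXT)")
existing = get_table_columns(conn, "population")
created = get_table_columns(conn, "photos")
```

An existing table comes back with its reflected columns, while an unknown name yields a new table that has only the `id` column.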
--- a/dataset/persistence/table.py
+++ b/dataset/persistence/table.py
@ -17,53 +17,69 @@ class Table(object):
        self.database = database
        self.table = table
    @property
    def columns(self):
        """
        Get a listing of all columns that exist in the table.
        >>> print 'age' in table.columns
        True
        """
        return set(self.table.columns.keys())
    def drop(self):
-        """ Drop the table from the database, deleting both the schema 
+        """
        Drop the table from the database, deleting both the schema
        and all the contents within it.
        Note: the object will be in an unusable state after using this
        command and should not be used again. If you want to re-create
        the table, make sure to get a fresh instance from the
-        :py:class:`dataset.Database`. """
+        :py:class:`Database <dataset.Database>`.
        """
        with self.database.lock:
            self.database._tables.pop(self.table.name, None)
            self.table.drop(self.database.engine)
    def insert(self, row, ensure=True, types={}):
-        """ Add a row (type: dict) by inserting it into the database.
+        """
        Add a row (type: dict) by inserting it into the table.
        If ``ensure`` is set and any of the keys of the row are not
        table columns, they will be created automatically.
        During column creation, ``types`` will be checked for a key
        matching the name of a column to be created, and the given
        SQLAlchemy column type will be used. Otherwise, the type is
-        guessed from the row's value, defaulting to a simple unicode
+        guessed from the row value, defaulting to a simple unicode
-        field. """
+        field.
        ::
            data = dict(id=10, title='I am a banana!')
            table.insert(data)
        """
        if ensure:
            self._ensure_columns(row, types=types)
        self.database.engine.execute(self.table.insert(row))
-    def update(self, row, unique, ensure=True, types={}):
+    def update(self, row, keys, ensure=True, types={}):
-        """ Update a row in the database. The update is managed via
+        """
-        the set of column names stated in ``unique``: they will be 
+        Update a row in the table. The update is managed via
        the set of column names stated in ``keys``: they will be
        used as filters for the data to be updated, using the values
-        in ``row``. Example:
+        in ``row``.
-
+        ::
        .. code-block:: python
            # update all entries with id matching 10, setting their title columns
            data = dict(id=10, title='I am a banana!')
            table.update(data, ['id'])
        This will update all entries matching the given ``id``, setting
        their ``title`` column.
        If keys in ``row`` update columns not present in the table,
        they will be created based on the settings of ``ensure`` and
-        ``types``, matching the behaviour of ``insert``.
+        ``types``, matching the behaviour of :py:meth:`insert() <dataset.Table.insert>`.
        """
-        if not len(unique):
+        if not len(keys):
            return False
-        clause = [(u, row.get(u)) for u in unique]
+        clause = [(u, row.get(u)) for u in keys]
        if ensure:
            self._ensure_columns(row, types=types)
        try:
@ -71,17 +87,25 @@ class Table(object):
            stmt = self.table.update(filters, row)
            rp = self.database.engine.execute(stmt)
            return rp.rowcount > 0
-        except KeyError, ke:
+        except KeyError:
            return False
-    def upsert(self, row, unique, ensure=True, types={}):
+    def upsert(self, row, keys, ensure=True, types={}):
-        if ensure:
+        """
-            self.create_index(unique)
+        An UPSERT is a smart combination of insert and update. If rows with matching ``keys`` exist
        they will be updated, otherwise a new row is inserted in the table.
        ::
-        if not self.update(row, unique, ensure=ensure, types=types):
+            data = dict(id=10, title='I am a banana!')
            table.upsert(data, ['id'])
        """
        if ensure:
            self.create_index(keys)
        if not self.update(row, keys, ensure=ensure, types=types):
            self.insert(row, ensure=ensure, types=types)
-    def delete(self, **kw):
+    def delete(self, **filter):
        """ Delete rows from the table. Keyword arguments can be used
        to add column-based filters. The filter criterion will always
        be equality:
@ -92,7 +116,7 @@ class Table(object):
        If no arguments are given, all records are deleted. 
        """
-        q = self._args_to_clause(kw)
+        q = self._args_to_clause(filter)
        stmt = self.table.delete(q)
        self.database.engine.execute(stmt)
@ -114,6 +138,13 @@ class Table(object):
        return and_(*clauses)
    def create_column(self, name, type):
        """
        Explicitly create a new column ``name`` of a specified type.
        ``type`` must be a `SQLAlchemy column type <http://docs.sqlalchemy.org/en/rel_0_8/core/types.html>`_.
        ::
            table.create_column('person', sqlalchemy.String)
        """
        with self.database.lock:
            if name not in self.table.columns.keys():
                col = Column(name, type)
@ -121,6 +152,12 @@ class Table(object):
                           connection=self.database.engine)
    def create_index(self, columns, name=None):
        """
        Create an index to speed up queries on a table. If no ``name`` is given, a name is generated from the column names.
        ::
            table.create_index(['name', 'country'])
        """
        with self.database.lock:
            if not name:
                sig = abs(hash('||'.join(columns)))
@ -136,16 +173,53 @@ class Table(object):
            self.indexes[name] = idx
            return idx
-    def find_one(self, **kw):
+    def find_one(self, **filter):
-        res = list(self.find(_limit=1, **kw))
+        """
        Works just like :py:meth:`find() <dataset.Table.find>` but returns only one result.
        ::
            row = table.find_one(country='United States')
        """
        res = list(self.find(_limit=1, **filter))
        if not len(res):
            return None
        return res[0]
-    def find(self, _limit=None, _step=5000, _offset=0,
+    def _args_to_order_by(self, order_by):
-             order_by='id', **kw):
+        if order_by[0] == '-':
-        order_by = [self.table.c[order_by].asc()]
+            return self.table.c[order_by[1:]].desc()
-        args = self._args_to_clause(kw)
+        else:
            return self.table.c[order_by].asc()
    def find(self, _limit=None, _offset=0, _step=5000,
             order_by='id', **filter):
        """
        Performs a simple search on the table. Simply pass keyword arguments as ``filter``.
        ::
            results = table.find(country='France')
            results = table.find(country='France', year=1980)
        Using ``_limit``::
            # just return the first 10 rows
            results = table.find(country='France', _limit=10)
        You can sort the results by single or multiple columns. Prepend a minus sign
        to the column name for descending order::
            # sort results by a column 'year'
            results = table.find(country='France', order_by='year')
            # return all rows sorted by multiple columns (by year in descending order)
            results = table.find(order_by=['country', '-year'])
        For more complex queries, please use :py:meth:`db.query() <dataset.Database.query>`
        instead."""
        if isinstance(order_by, (str, unicode)):
            order_by = [order_by]
        order_by = [self._args_to_order_by(o) for o in order_by]
        args = self._args_to_clause(filter)
        for i in count():
            qoffset = _offset + (_step * i)
@ -163,14 +237,29 @@ class Table(object):
                yield row
    def __len__(self):
        """
        Returns the number of rows in the table.
        """
        d = self.database.query(self.table.count()).next()
        return d.values().pop()
-    def distinct(self, *columns, **kw):
+    def distinct(self, *columns, **filter):
        """
        Returns all rows of a table, but removes rows with duplicate values in ``columns``.
        Internally this creates a `DISTINCT statement <http://www.w3schools.com/sql/sql_distinct.asp>`_.
        ::
            # returns only one row per year, ignoring the rest
            table.distinct('year')
            # works with multiple columns, too
            table.distinct('year', 'country')
            # you can also combine this with a filter
            table.distinct('year', country='China')
        """
        qargs = []
        try:
            columns = [self.table.c[c] for c in columns]
-            for col, val in kw.items():
+            for col, val in filter.items():
                qargs.append(self.table.c[col] == val)
        except KeyError:
            return []
@ -181,8 +270,22 @@ class Table(object):
        return self.database.query(q)
    def all(self):
-        """ Return all records in the table, ordered by their 
+        """
-        ``id``. This is an alias for calling ``find`` without
+        Returns all rows of the table as simple dictionaries. This is simply a shortcut
-        any arguments. """
+        to *find()* called with no arguments.
        ::
            rows = table.all()"""
        return self.find()
    def __iter__(self):
        """
        Allows for iterating over all rows in the table without explicitly
        calling :py:meth:`all() <dataset.Table.all>`.
        ::
            for row in table:
                print row
        """
        for row in self.all():
            yield row
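
The two ideas behind the new `find()` (a leading `-` on a column name selects descending order, and results are fetched in pages of `_step` rows) can be sketched as follows. This is a simplified illustration on the standard-library `sqlite3` module in Python 3, whereas the commit itself targets SQLAlchemy and Python 2. The `order_clause` helper and the `stats` table are assumptions made for the example.

```python
import sqlite3
from itertools import count


def order_clause(order_by):
    """Translate 'year' / '-year' (or a list of them) into an ORDER BY clause."""
    if isinstance(order_by, str):
        order_by = [order_by]
    parts = []
    for name in order_by:
        if name.startswith("-"):
            parts.append('"%s" DESC' % name[1:])
        else:
            parts.append('"%s" ASC' % name)
    return ", ".join(parts)


def find(conn, table, _limit=None, _offset=0, _step=5000, order_by="id", **filters):
    """Yield matching rows as dicts, fetching at most `_step` rows per query."""
    where = " AND ".join('"%s" = ?' % col for col in filters) or "1 = 1"
    sql = 'SELECT * FROM "%s" WHERE %s ORDER BY %s LIMIT ? OFFSET ?' % (
        table, where, order_clause(order_by))
    yielded = 0
    for i in count():
        # Shrink the last page so that no more than `_limit` rows are returned.
        page = _step if _limit is None else min(_step, _limit - yielded)
        if page <= 0:
            return
        cur = conn.execute(sql, list(filters.values()) + [page, _offset + _step * i])
        names = [d[0] for d in cur.description]
        rows = cur.fetchall()
        for row in rows:
            yield dict(zip(names, row))
        yielded += len(rows)
        if len(rows) < page:
            return  # a short page means the result set is exhausted


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stats (id INTEGER PRIMARY KEY, country TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO stats (country, year) VALUES (?, ?)",
    [("France", 1980), ("France", 1990), ("China", 1980)],
)
years = [r["year"] for r in find(conn, "stats", country="France", order_by="-year", _step=1)]
limited = list(find(conn, "stats", order_by=["country", "-year"], _limit=2))
```

With `_step=1` the first call issues one query per row, which shows that paging does not change the result, only how much is held in memory at once.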
--- a/docs/_static/dataset-logo.png
+++ b/docs/_static/dataset-logo.png
--- a/docs/_static/knight_mozilla_on.jpg
+++ b/docs/_static/knight_mozilla_on.jpg
--- a/docs/_themes/README.rst
+++ b/docs/_themes/README.rst
@ -9,11 +9,11 @@ this guide:
 1. put this folder as _themes into your docs folder. Alternatively
   you can also use git submodules to check out the contents there.
-2. add this to your conf.py: ::
+2. add this to your conf.py:
    sys.path.append(os.path.abspath('_themes'))
    html_theme_path = ['_themes']
-    html_theme = 'flask'
+    html_theme = 'kr'
 The following themes exist:
--- a/docs/_themes/kr/autotoc.html
+++ b/docs/_themes/kr/autotoc.html
@ -0,0 +1,25 @@
 <h3><a href="{{ pathto(master_doc) }}">{{ _('Table Of Contents') }}</a></h3>
 <ul class="custom-index container"></ul>
 <script type="text/javascript">
 $(function() {
    var ul = $('ul.custom-index');
    $('dl.class').each(function(i, el) {
        var name = $('tt.descname', el).html(), id = $('dt', el).attr('id'),
            li = $('<li><a href="#' + id + '">' + name + '</a></li>');
        ul.append(li);
        var ul_ = $('<ul />');
        li.append(ul_);
        $('dl.method', el).each(function(i, el) {
            var name = $('tt.descname', el).html(), id = $('dt', el).attr('id');
            ul_.append('<li><a href="#' + id + '">' + name + '</a></li>');
        });
    });
 });
 </script>
--- a/docs/_themes/kr/layout.html
+++ b/docs/_themes/kr/layout.html
@ -1,71 +1,45 @@
 {%- extends "basic/layout.html" %}
 {%- block extrahead %}
  {{ super() }}
  <link href="//fonts.googleapis.com/css?family=Open+Sans:400|Antic+Slab" rel="stylesheet" type="text/css">
  {% if theme_touch_icon %}
    <link rel="apple-touch-icon" href="{{ pathto('_static/' ~ theme_touch_icon, 1) }}" />
  {% endif %}
  <meta name="viewport" content="width=device-width, initial-scale=0.9, maximum-scale=0.9">
 {% endblock %}
-{%- block relbar2 %}{% endblock %}
+
 {% block sidebar2 %}
  {{ sidebar() }}
 {% endblock %}
 {%- block footer %}
-    <div class="footer">
+    <div class="footer" style="position:relative">
        <a href="http://www.mozillaopennews.org/" style="position:absolute;left:10px;bottom:-10px"><img src="_static/knight_mozilla_on.jpg" /></a>
      &copy; Copyright {{ copyright }}.
    </div>
-    <a href="https://github.com/kennethreitz/requests" class="github">
+    <div>
-        <img style="position: absolute; top: 0; right: 0; border: 0;" src="http://s3.amazonaws.com/github/ribbons/forkme_right_darkblue_121621.png" alt="Fork me on GitHub"  class="github"/>
+
    </div>
    <a style="position: absolute; top: 0; right: 0; border: 0;" href="https://github.com/pudo/dataset" class="github">
        <img src="http://s3.amazonaws.com/github/ribbons/forkme_right_darkblue_121621.png" alt="Fork me on GitHub"  class="github"/>
    </a>
    <script type="text/javascript">
    /* <![CDATA[ */
        (function() {
            var s = document.createElement('script'), t = document.getElementsByTagName('script')[0];
            s.type = 'text/javascript';
            s.async = true;
            s.src = 'http://api.flattr.com/js/0.6/load.js?mode=auto';
            t.parentNode.insertBefore(s, t);
        })();
    /* ]]> */
    </script>
        <script type="text/javascript">
    setTimeout(function(){var a=document.createElement("script");
    var b=document.getElementsByTagName("script")[0];
    a.src=document.location.protocol+"//dnn506yrbagrg.cloudfront.net/pages/scripts/0013/7219.js?"+Math.floor(new Date().getTime()/3600000);
    a.async=true;a.type="text/javascript";b.parentNode.insertBefore(a,b)}, 1);
    </script>
    <script type="text/javascript">
        new HelloBar(36402,48802);
    </script>
    <!-- Piwik -->
 <script type="text/javascript">
-
+  var _paq = _paq || [];
-      var _gaq = _gaq || [];
+  _paq.push(["trackPageView"]);
-      _gaq.push(['_setAccount', 'UA-8742933-11']);
+  _paq.push(["enableLinkTracking"]);
      _gaq.push(['_setDomainName', 'none']);
      _gaq.push(['_setAllowLinker', true]);
      _gaq.push(['_trackPageview']);
  (function() {
-        var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
+    var u=(("https:" == document.location.protocol) ? "https" : "http") + "://piwik.vis4.net/";
-        ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
+    _paq.push(["setTrackerUrl", u+"piwik.php"]);
-        var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
+    _paq.push(["setSiteId", "17"]);
-      })();
+    var d=document, g=d.createElement("script"), s=d.getElementsByTagName("script")[0]; g.type="text/javascript";
-
+    g.defer=true; g.async=true; g.src=u+"piwik.js"; s.parentNode.insertBefore(g,s);
    </script>
    <script type="text/javascript">
      (function() {
        var t   = document.createElement('script');
        t.type  = 'text/javascript';
        t.async = true;
        t.id    = 'gauges-tracker';
        t.setAttribute('data-site-id',
                       '4ddc27f6613f5d186d000007');
        t.src = '//secure.gaug.es/track.js';
        var s = document.getElementsByTagName('script')[0];
        s.parentNode.insertBefore(t, s);
  })();
 </script>
-
+<!-- End Piwik Code -->
 {%- endblock %}
--- a/docs/_themes/kr/sidebarlogo.html
+++ b/docs/_themes/kr/sidebarlogo.html
@ -0,0 +1,15 @@
 <a style="border:0" href="index.html"><img src="_static/dataset-logo.png" alt="dataset" /></a>
 <p style="font-style:italic;font-size:0.9em; text-align:center; margin-bottom:1.5em">Because managing databases in Python should be as simple as reading and writing JSON files.</p>
 <iframe width="200px" scrolling="0" height="20px" frameborder="0" allowtransparency="true" src="http://ghbtns.com/github-btn.html?user=pudo&amp;repo=dataset&amp;type=watch&amp;count=true&amp;size=small"></iframe>
 <h3>Overview</h3>
 <ul>
    <li><a href="quickstart.html">Quickstart Guide</a></li>
    <li><a href="api.html">API-Documentation</a></li>
    <li><a href="install.html">Installation</a></li>
 </ul>
--- a/docs/_themes/kr/static/flasky.css_t
+++ b/docs/_themes/kr/static/flasky.css_t
@ -14,9 +14,10 @@
 /* -- page layout ----------------------------------------------------------- */
 body {
-    font-family: 'goudy old style', 'minion pro', 'bell mt', Georgia, 'Hiragino Mincho Pro';
+    font-family: "Georgia", "Open Sans", OpenSansRegular, sans-serif;
-    font-size: 17px;
+    font-size: 16px;
-    background-color: white;
+    background: #fff;
    font-weight: 400;
    color: #000;
    margin: 0;
    padding: 0;
@ -45,7 +46,7 @@ hr {
 }
 div.body {
-    background-color: #ffffff;
+    background-color: white;
    color: #3E4349;
    padding: 0 30px 0 30px;
 }
@ -98,11 +99,11 @@ div.sphinxsidebarwrapper p.logo {
 div.sphinxsidebar h3,
 div.sphinxsidebar h4 {
-    font-family: 'Garamond', 'Georgia', serif;
+    font-family: 'Antic Slab' ,'Garamond', 'Georgia', serif;
-    color: #444;
+    color: #000;
    font-size: 24px;
    font-weight: normal;
-    margin: 0 0 5px 0;
+    margin: 30px 0 5px 0;
    padding: 0;
 }
@ -111,7 +112,7 @@ div.sphinxsidebar h4 {
 }
 div.sphinxsidebar h3 a {
-    color: #444;
+    color: #000;
 }
 div.sphinxsidebar p.logo a,
@ -127,7 +128,7 @@ div.sphinxsidebar p {
 }
 div.sphinxsidebar ul {
-    margin: 10px 0;
+    margin: 10px 0px;
    padding: 0;
    color: #000;
 }
@ -156,18 +157,20 @@ div.body h3,
 div.body h4,
 div.body h5,
 div.body h6 {
-    font-family: 'Garamond', 'Georgia', serif;
+    font-family: 'Antic Slab', serif;
    font-weight: normal;
    margin: 30px 0px 10px 0px;
    padding: 0;
    text-shadow: 1px 1px 3px #ddd;
    color: #000;
 }
-div.body h1 { margin-top: 0; padding-top: 0; font-size: 240%; }
+div.body h1 { margin-top: 0; padding-top: 0; font-size: 250%; }
-div.body h2 { font-size: 180%; }
+div.body h2 { font-size: 190%; }
-div.body h3 { font-size: 150%; }
+div.body h3 { font-size: 160%; }
-div.body h4 { font-size: 130%; }
+div.body h4 { font-size: 140%; }
-div.body h5 { font-size: 100%; }
+div.body h5 { font-size: 110%; }
-div.body h6 { font-size: 100%; }
+div.body h6 { font-size: 110%; }
 a.headerlink {
    color: #ddd;
@ -244,9 +247,14 @@ p.admonition-title:after {
    content: ":";
 }
-pre, tt {
+pre {
    font-family: 'Consolas', 'Menlo', 'Deja Vu Sans Mono', 'Bitstream Vera Sans Mono', monospace;
-    font-size: 0.9em;
+    font-size: 0.88em;
 }
 tt {
    font-family: 'Consolas', 'Menlo', 'Deja Vu Sans Mono', 'Bitstream Vera Sans Mono', monospace;
    font-size: 0.95em;
 }
 img.screenshot {
@ -342,13 +350,13 @@ pre {
 }
 dl pre, blockquote pre, li pre {
-    margin-left: -60px;
+    margin-left: 0px;
-    padding-left: 60px;
+    padding-left: 15px;
 }
 dl dl pre {
-    margin-left: -90px;
+    margin-left: 0px;
-    padding-left: 90px;
+    padding-left: 15px;
 }
 tt {
@ -359,6 +367,7 @@ tt {
 tt.xref, a tt {
    background-color: #FBFBFB;
    color: #2277bb;
    border-bottom: 1px solid white;
 }
@ -534,3 +543,12 @@ a:hover tt {
 .revsys-inline {
    display: none!important;
 }
 div.sphinxsidebar #searchbox input[type="text"] {
    width: 140px;
    padding: 4px 3px;
 }
 .highlight .nv {
    color: #C65D09!important;
 }
--- a/docs/_themes/kr_small/layout.html
+++ b/docs/_themes/kr_small/layout.html
@ -1,22 +0,0 @@
 {% extends "basic/layout.html" %}
 {% block header %}
  {{ super() }}
  {% if pagename == 'index' %}
  <div class=indexwrapper>
  {% endif %}
 {% endblock %}
 {% block footer %}
  {% if pagename == 'index' %}
  </div>
  {% endif %}
 {% endblock %}
 {# do not display relbars #}
 {% block relbar1 %}{% endblock %}
 {% block relbar2 %}
  {% if theme_github_fork %}
    <a href="http://github.com/{{ theme_github_fork }}"><img style="position: fixed; top: 0; right: 0; border: 0;"
    src="http://s3.amazonaws.com/github/ribbons/forkme_right_darkblue_121621.png" alt="Fork me on GitHub" /></a>
  {% endif %}
 {% endblock %}
 {% block sidebar1 %}{% endblock %}
 {% block sidebar2 %}{% endblock %}
--- a/docs/_themes/kr_small/static/flasky.css_t
+++ b/docs/_themes/kr_small/static/flasky.css_t
@ -1,287 +0,0 @@
 /*
 * flasky.css_t
 * ~~~~~~~~~~~~
 *
 * Sphinx stylesheet -- flasky theme based on nature theme.
 *
 * :copyright: Copyright 2007-2010 by the Sphinx team, see AUTHORS.
 * :license: BSD, see LICENSE for details.
 *
 */
@import url("basic.css");
 /* -- page layout ----------------------------------------------------------- */
 body {
    font-family: 'Georgia', serif;
    font-size: 17px;
    color: #000;
    background: white;
    margin: 0;
    padding: 0;
 }
 div.documentwrapper {
    float: left;
    width: 100%;
 }
 div.bodywrapper {
    margin: 40px auto 0 auto;
    width: 700px;
 }
 hr {
    border: 1px solid #B1B4B6;
 }
 div.body {
    background-color: #ffffff;
    color: #3E4349;
    padding: 0 30px 30px 30px;
 }
 img.floatingflask {
    padding: 0 0 10px 10px;
    float: right;
 }
 div.footer {
    text-align: right;
    color: #888;
    padding: 10px;
    font-size: 14px;
    width: 650px;
    margin: 0 auto 40px auto;
 }
 div.footer a {
    color: #888;
    text-decoration: underline;
 }
 div.related {
    line-height: 32px;
    color: #888;
 }
 div.related ul {
    padding: 0 0 0 10px;
 }
 div.related a {
    color: #444;
 }
 /* -- body styles ----------------------------------------------------------- */
 a {
    color: #004B6B;
    text-decoration: underline;
 }
 a:hover {
    color: #6D4100;
    text-decoration: underline;
 }
 div.body {
    padding-bottom: 40px; /* saved for footer */
 }
 div.body h1,
 div.body h2,
 div.body h3,
 div.body h4,
 div.body h5,
 div.body h6 {
    font-family: 'Garamond', 'Georgia', serif;
    font-weight: normal;
    margin: 30px 0px 10px 0px;
    padding: 0;
 }
 {% if theme_index_logo %}
 div.indexwrapper h1 {
    text-indent: -999999px;
    background: url({{ theme_index_logo }}) no-repeat center center;
    height: {{ theme_index_logo_height }};
 }
 {% endif %}
 div.body h2 { font-size: 180%; }
 div.body h3 { font-size: 150%; }
 div.body h4 { font-size: 130%; }
 div.body h5 { font-size: 100%; }
 div.body h6 { font-size: 100%; }
 a.headerlink {
    color: white;
    padding: 0 4px;
    text-decoration: none;
 }
 a.headerlink:hover {
    color: #444;
    background: #eaeaea;
 }
 div.body p, div.body dd, div.body li {
    line-height: 1.4em;
 }
 div.admonition {
    background: #fafafa;
    margin: 20px -30px;
    padding: 10px 30px;
    border-top: 1px solid #ccc;
    border-bottom: 1px solid #ccc;
 }
 div.admonition p.admonition-title {
    font-family: 'Garamond', 'Georgia', serif;
    font-weight: normal;
    font-size: 24px;
    margin: 0 0 10px 0;
    padding: 0;
    line-height: 1;
 }
 div.admonition p.last {
    margin-bottom: 0;
 }
 div.highlight{
    background-color: white;
 }
 dt:target, .highlight {
    background: #FAF3E8;
 }
 div.note {
    background-color: #eee;
    border: 1px solid #ccc;
 }
 div.seealso {
    background-color: #ffc;
    border: 1px solid #ff6;
 }
 div.topic {
    background-color: #eee;
 }
 div.warning {
    background-color: #ffe4e4;
    border: 1px solid #f66;
 }
 p.admonition-title {
    display: inline;
 }
 p.admonition-title:after {
    content: ":";
 }
 pre, tt {
    font-family: 'Consolas', 'Menlo', 'Deja Vu Sans Mono', 'Bitstream Vera Sans Mono', monospace;
    font-size: 0.85em;
 }
 img.screenshot {
 }
 tt.descname, tt.descclassname {
    font-size: 0.95em;
 }
 tt.descname {
    padding-right: 0.08em;
 }
 img.screenshot {
    -moz-box-shadow: 2px 2px 4px #eee;
    -webkit-box-shadow: 2px 2px 4px #eee;
    box-shadow: 2px 2px 4px #eee;
 }
 table.docutils {
    border: 1px solid #888;
    -moz-box-shadow: 2px 2px 4px #eee;
    -webkit-box-shadow: 2px 2px 4px #eee;
    box-shadow: 2px 2px 4px #eee;
 }
 table.docutils td, table.docutils th {
    border: 1px solid #888;
    padding: 0.25em 0.7em;
 }
 table.field-list, table.footnote {
    border: none;
    -moz-box-shadow: none;
    -webkit-box-shadow: none;
    box-shadow: none;
 }
 table.footnote {
    margin: 15px 0;
    width: 100%;
    border: 1px solid #eee;
 }
 table.field-list th {
    padding: 0 0.8em 0 0;
 }
 table.field-list td {
    padding: 0;
 }
 table.footnote td {
    padding: 0.5em;
 }
 dl {
    margin: 0;
    padding: 0;
 }
 dl dd {
    margin-left: 30px;
 }
 pre {
    padding: 0;
    margin: 15px -30px;
    padding: 8px;
    line-height: 1.3em;
    padding: 7px 30px;
    background: #eee;
    border-radius: 2px;
    -moz-border-radius: 2px;
    -webkit-border-radius: 2px;
 }
 dl pre {
    margin-left: -60px;
    padding-left: 60px;
 }
 tt {
    background-color: #ecf0f3;
    color: #222;
    /* padding: 1px 2px; */
 }
 tt.xref, a tt {
    background-color: #FBFBFB;
 }
 a:hover tt {
    background: #EEE;
 }
--- a/docs/_themes/kr_small/theme.conf
+++ b/docs/_themes/kr_small/theme.conf
@ -1,10 +0,0 @@
 [theme]
 inherit = basic
 stylesheet = flasky.css
 nosidebar = true
 pygments_style = flask_theme_support.FlaskyStyle
 [options]
 index_logo = ''
 index_logo_height = 120px
 github_fork = ''
--- a/docs/api.rst
+++ b/docs/api.rst
@ -2,29 +2,19 @@
 API documentation
 =================
 .. toctree::
   :maxdepth: 2
 .. autofunction:: dataset.connect
 Database
 --------
 .. autoclass:: dataset.Database
-   :members: get_table, create_table, load_table, query
+   :members: get_table, create_table, load_table, query, tables
-   :undoc-members:
+   :special-members:
 Table
 -----
 Using the *Table* class you can easily store and retrieve data from database tables.
 .. autoclass:: dataset.Table
-   :members:
+   :members: columns, drop, insert, update, upsert, find, find_one, distinct, create_column, create_index, all
-   :undoc-members:
+   :special-members: __len__, __iter__
--- a/docs/conf.py
+++ b/docs/conf.py
@ -100,7 +100,7 @@ html_theme = 'kr'
 # further.  For a list of options available for each theme, see the
 # documentation.
 # html_theme_options = {
-#     'codebgcolor': ''
+#     'stickysidebar': "true"
 # }
 # Add any paths that contain custom themes here, relative to this directory.
@ -136,7 +136,11 @@ html_static_path = ['_static']
 #html_use_smartypants = True
 # Custom sidebar templates, maps document names to template names.
-#html_sidebars = {}
+html_sidebars = {
    'index': ['sidebarlogo.html', 'sourcelink.html', 'searchbox.html'],
    'api': ['sidebarlogo.html', 'autotoc.html', 'sourcelink.html', 'searchbox.html'],
    '**': ['sidebarlogo.html', 'localtoc.html', 'sourcelink.html', 'searchbox.html']
 }
 # Additional templates that should be rendered to pages, maps page names to
 # template names.
@ -152,7 +156,7 @@ html_static_path = ['_static']
 #html_split_index = False
 # If true, links to the reST sources are added to the pages.
-#html_show_sourcelink = True
+html_show_sourcelink = False
 # If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
 #html_show_sphinx = True
--- a/docs/index.rst
+++ b/docs/index.rst
@ -3,23 +3,58 @@
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.
-Welcome to dataset's documentation!
+dataset: databases for lazy people
-===================================
+==================================
-Getting the databases out of your data's way.
+.. toctree::
   :hidden:
 Although managing data in relational databases has plenty of benefits, we find they are rarely used in typical day-to-day work with small to medium scale datasets. But why is that? Why do we see an awful lot of data stored in static files in CSV or JSON format?
 The answer is that **programmers are lazy**, and thus they tend to prefer the easiest solution they find. And in **Python**, a database wasn't the simplest solution for storing a bunch of structured data. This is what **dataset** is going to change!
 In short, dataset combines the straightforwardness of NoSQL interfaces with the full power and flexibility of relational databases. It makes database management as simple as reading and writing JSON files.
 ::
   import dataset
   db = dataset.connect('sqlite:///:memory:')
   table = db['sometable']
   table.insert(dict(name='John Doe', age=37))
   table.insert(dict(name='Jane Doe', age=34, gender='female'))
   john = table.find_one(name='John Doe')
 Here is `similar code, without dataset <https://gist.github.com/gka/5296492>`_.
 Features
 --------
 * **Automatic schema**: If a table or column is written that does not
  exist in the database, it will be created automatically.
 * **Upserts**: Records are either created or updated, depending on
  whether an existing version can be found.
 * **Query helpers** for simple queries such as :py:meth:`all <dataset.Table.all>` rows in a table or
  all :py:meth:`distinct <dataset.Table.distinct>` values across a set of columns.
 * **Compatibility**: Being built on top of `SQLAlchemy <http://www.sqlalchemy.org/>`_, ``dataset`` works with all major databases, such as SQLite, PostgreSQL and MySQL.
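To give a feel for what the automatic schema and upsert features save you from writing, here is a minimal sketch using only Python's built-in ``sqlite3`` module (Python 3). It is not dataset's implementation; the helper name ``upsert`` and its signature are assumptions made for this illustration, and unlike dataset it expects the table to exist already.

```python
import sqlite3

def upsert(conn, table, row, keys):
    """Update rows matching `keys`, or insert `row` if none match.

    Hypothetical helper for illustration only. Identifiers are interpolated
    into the SQL text, so only use it with trusted table and column names.
    """
    # Automatic schema: add any column that is in the row but not in the table.
    existing = {info[1] for info in conn.execute('PRAGMA table_info("%s")' % table)}
    for column in row:
        if column not in existing:
            conn.execute('ALTER TABLE "%s" ADD COLUMN "%s"' % (table, column))

    # Try an UPDATE first, using the key columns as the match condition.
    assignments = ', '.join('"%s" = ?' % c for c in row)
    where = ' AND '.join('"%s" = ?' % k for k in keys)
    cursor = conn.execute(
        'UPDATE "%s" SET %s WHERE %s' % (table, assignments, where),
        list(row.values()) + [row[k] for k in keys],
    )

    # No existing row matched, so INSERT a new one.
    if cursor.rowcount == 0:
        columns = ', '.join('"%s"' % c for c in row)
        marks = ', '.join('?' for _ in row)
        conn.execute(
            'INSERT INTO "%s" (%s) VALUES (%s)' % (table, columns, marks),
            list(row.values()),
        )

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE person (name TEXT)')
upsert(conn, 'person', {'name': 'John Doe', 'age': 46}, ['name'])  # inserts, adds "age"
upsert(conn, 'person', {'name': 'John Doe', 'age': 47}, ['name'])  # updates the same row
people = conn.execute('SELECT name, age FROM person').fetchall()
print(people)
```

With dataset, the same effect is a single ``table.upsert(...)`` call on a table obtained from ``db['person']``.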
 Contents
 --------
 .. toctree::
   :maxdepth: 2
-* `Learn how to use dataset in five minutes <quickstart>`_
+   quickstart
-* `Browse the complete API docs <api>`_
+   api
 Contributors
 ------------
 ``dataset`` is written and maintained by `Friedrich Lindenberg <https://github.com/pudo>`_ and `Gregor Aisch <https://github.com/gka>`_. Its code is largely based on the preceding libraries `sqlaload <https://github.com/okfn/sqlaload>`_ and `datafreeze <https://github.com/spiegelonline/datafreeze>`_. And of course, we're standing on the `shoulders of giants <http://www.sqlalchemy.org/>`_.
-Indices and tables
+Our cute little `naked mole rat <http://www.youtube.com/watch?feature=player_detailpage&v=A5DcOEzW1wA#t=14s>`_ was drawn by `Johannes Koch <http://chechuchape.com/>`_.
 ==================
 * :ref:`genindex`
 * :ref:`modindex`
 * :ref:`search`
--- a/docs/install.rst
+++ b/docs/install.rst
@ -0,0 +1,20 @@
 Installation Guide
 ==================
 *— work in progress —*
 The easiest way is to install ``dataset`` from the `Python Package Index <https://pypi.python.org/pypi/dataset/>`_ using ``pip`` or ``easy_install``:
 .. code-block:: bash
   $ easy_install dataset
 To install it manually, simply download the repository from GitHub:
 .. code-block:: bash
   $ git clone git://github.com/pudo/dataset.git
   $ cd dataset/
   $ python setup.py install
--- a/docs/quickstart.rst
+++ b/docs/quickstart.rst
@ -3,41 +3,36 @@ Quickstart
 ==========
-.. toctree::
+Hi, welcome to the twelve-minute quick-start tutorial.
   :maxdepth: 2
-Hi, welcome to the five-minute quick-start tutorial.
+Connecting to a database
 ------------------------
 First you need to import the dataset package :) ::
   import dataset
-To connect to a database you need to identify it using what is called an engine url. Here are a few examples::
+To connect to a database you need to identify it by its `URL <http://docs.sqlalchemy.org/en/latest/core/engines.html#engine-creation-api>`_, which basically is a string of the form ``"dialect://user:password@host/dbname"``. Here are a few common examples::
   # connecting to a SQLite database
-   db = dataset.connect('sqlite:///factbook.db')
+   db = dataset.connect('sqlite:///mydatabase.db')
-   # connecting to a MySQL database
+   # connecting to a MySQL database with user and password
   db = dataset.connect('mysql://user:password@localhost/mydatabase')
   # connecting to a PostgreSQL database
   db = dataset.connect('postgresql://scott:tiger@localhost:5432/mydatabase')
 If you want to learn more about how to connect to various databases, have a look at the `SQLAlchemy documentation`_.
 .. _SQLAlchemy documentation: http://docs.sqlalchemy.org/en/latest/core/engines.html#engine-creation-api
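To make the ``"dialect://user:password@host/dbname"`` form more concrete, the following sketch splits a connection URL into its parts with Python's standard library (Python 3). It only illustrates the string format; it is not how dataset or SQLAlchemy parse URLs, and the helper name ``describe_url`` is made up for this example.

```python
from urllib.parse import urlsplit

def describe_url(url):
    """Split a database URL into its components (hypothetical helper)."""
    parts = urlsplit(url)
    return {
        'dialect': parts.scheme,
        'user': parts.username,
        'password': parts.password,
        'host': parts.hostname,
        'port': parts.port,
        # The path starts with a slash; what follows is the database name
        # (or the file name, in the case of SQLite).
        'dbname': parts.path.lstrip('/'),
    }

info = describe_url('postgresql://scott:tiger@localhost:5432/mydatabase')
print(info['dialect'], info['host'], info['port'], info['dbname'])

# SQLite URLs have no host part, which is why they contain three slashes.
sqlite_info = describe_url('sqlite:///mydatabase.db')
print(sqlite_info['host'], sqlite_info['dbname'])
```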
 Storing data
 ------------
-At first you need to get a reference to the table in which you want to store your data. You don't
+To store some data you need to get a reference to a table. You don't need to worry about whether the table already exists or not, since dataset will create it automatically::
 need to worry about whether the table already exists or not, since dataset will create it automatically::
   # get a reference to the table 'person'
   table = db['person']
-Now storing data in a table is a matter of a single function call. Just pass a `dict`_ to *insert*. Note
+Now storing data in a table is a matter of a single function call. Just pass a `dict`_ to *insert*. Note that you don't need to create the columns *name* and *age* – dataset will do this automatically::
 that you don't need to create the columns *name* and *age* – dataset will do this automatically::
   # Insert a new record.
   table.insert(dict(name='John Doe', age=46))
@ -54,37 +49,77 @@ Updating existing entries is easy, too::
   table.update(dict(name='John Doe', age=47), ['name'])
 Inspecting databases and tables
 -------------------------------
 When dealing with unknown databases we might want to check their structure first. To begin with, let's find out what tables are stored in the database:
   >>> print db.tables
   set([u'user', u'action'])
 Now, let's list all columns available in the table ``user``:
   >>> print db['user'].columns
   set([u'id', u'name', u'email', u'pwd', u'country'])
 Using ``len()`` we can get the total number of rows in a table:
   >>> print len(db['user'])
   187
 Reading data from tables
 ------------------------
-Checking::
+Now let's get some real data out of the table::
-   table = db['population']
+   users = db['user'].all()
-   # Let's grab a list of all items/rows/entries in the table:
+If we simply want to iterate over all rows in a table, we can omit :py:meth:`all() <dataset.Table.all>`::
   table.all()
-   table.distinct()
+   for user in db['user']:
      print user['email']
-Searching for specific entries::
+We can search for specific entries using :py:meth:`find() <dataset.Table.find>` and :py:meth:`find_one() <dataset.Table.find_one>`::
-   # Returns the first item where the column country equals 'China'
+   # All users from China
-   table.find_one(country='China')
+   users = table.find(country='China')
-   # Returns all items
+   # Get a specific user
-   table.find(country='China')
+   john = table.find_one(name='John Doe')
-Querying data
+Using :py:meth:`distinct() <dataset.Table.distinct>` we can grab a set of rows with unique values in one or more columns::
 -------------
-Querying data is easy. Dataset returns an iteratable result object::
+   # Get one user per country
   db['user'].distinct('country')
-   result = db.query('SELECT ...')
+
 Running custom SQL queries
 --------------------------
 Of course the main reason you're using a database is that you want to use the full power of SQL queries. Here's how you run them with ``dataset``::
   result = db.query('SELECT country, COUNT(*) c FROM user GROUP BY country')
   for row in result:
-      print row
+      print row['country'], row['c']
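If you would like to try the grouping query without installing anything, here is a self-contained sketch that runs the same SQL through Python's built-in ``sqlite3`` module (Python 3) instead of ``db.query``. The table contents are invented sample data, and ``sqlite3.Row`` is used only to imitate access by column name.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.row_factory = sqlite3.Row  # allows row['country'] instead of row[0]

# Invented sample data; "user" is quoted because it is a reserved word
# in some databases (although not in SQLite).
conn.execute('CREATE TABLE "user" (name TEXT, country TEXT)')
conn.executemany(
    'INSERT INTO "user" (name, country) VALUES (?, ?)',
    [('Li', 'China'), ('Wang', 'China'), ('Anna', 'Germany')],
)

# Same query as in the text: count the users per country.
counts = {}
for row in conn.execute('SELECT country, COUNT(*) c FROM "user" GROUP BY country'):
    counts[row['country']] = row['c']
print(counts)
```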
 Freezing your data
 ------------------
 Exporting data
 --------------
 While playing around with our database in Python is a nice thing, sometimes we want to use the data, or parts of it, elsewhere, say in an interactive web application. Therefore, ``dataset`` supports serializing rows of data into static files such as JSON using the :py:meth:`freeze() <dataset.freeze>` function::
   # export all users into a single JSON
   result = db['user'].all()
   dataset.freeze(result, 'users.json')
 You can create one file per row by setting ``mode`` to "item"::
   # export one JSON file per user
   dataset.freeze(result, 'users/{{ id }}.json', mode='item')
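Conceptually, the "item" mode writes one file per row, named after a value taken from that row. The sketch below shows that idea with the standard library only (Python 3). It is not the implementation of ``freeze``; the helper ``export_items`` and its parameters are hypothetical, and the rows are invented sample data.

```python
import json
import os
import tempfile

def export_items(rows, directory, key='id'):
    """Write each row to <directory>/<value of key>.json (hypothetical helper)."""
    os.makedirs(directory, exist_ok=True)
    paths = []
    for row in rows:
        # The file name is derived from the row itself, like 'users/{{ id }}.json'.
        path = os.path.join(directory, '{}.json'.format(row[key]))
        with open(path, 'w') as fh:
            json.dump(row, fh)
        paths.append(path)
    return paths

rows = [{'id': 1, 'name': 'John Doe'}, {'id': 2, 'name': 'Jane Doe'}]
out_dir = os.path.join(tempfile.mkdtemp(), 'users')
written = export_items(rows, out_dir)
print(sorted(os.listdir(out_dir)))
```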
 Since this is a common operation, we made it available via the command-line utility ``datafreeze``. Read more about the `freezefile markup <https://github.com/spiegelonline/datafreeze#example-freezefileyaml>`_.
 .. code-block:: bash
   $ datafreeze freezefile.yaml