Friedrich Lindenberg 2013-04-03 22:27:06 +02:00
commit f3533de1a7
19 changed files with 456 additions and 622 deletions


@ -1,96 +1,8 @@
SQLAlchemy Loading Tools dataset: databases for lazy people
======================== ==================================
A collection of wrappers and functions to make SQLAlchemy core easier In short, **dataset** makes reading and writing data in databases as simple as reading and writing JSON files.
to use in ETL applications. SQLAlchemy is used only for database
abstraction and not as an ORM, allowing users to write extraction
scripts that can work with multiple database backends. Functions
include:
* **Automatic schema**. If a column is written that does not [Read the docs](https://dataset.readthedocs.org/)
exist on the table, it will be created automatically.
* **Upserts**. Records are either created or updated, depending on
whether an existing version can be found.
* **Query helpers** for simple queries such as all rows in a table or
all distinct values across a set of columns.
Examples
--------
A typical use of ``sqlaload`` would look like this:
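from sqlaload import connect, get_table, distinct, update_row
engine = connect('sqlite:///customers.db')
table = get_table(engine, 'customers')
for entry in distinct(engine, table, 'post_code', 'city'):
    lon, lat = geocode(entry['post_code'], entry['city'])
    # match on post code and city, then write the coordinates
    row = {'post_code': entry['post_code'], 'city': entry['city'], 'lon': lon, 'lat': lat}
    update_row(engine, table, row, ['post_code', 'city'])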
In this example, we selected all distinct post codes and city names from an imaginary customers database, sent them through our geocoding routine and finally updated all matching rows with the returned geo information.
Another example, updating data in a datastore, might look like this:
from sqlaload import connect, get_table, upsert
engine = connect('sqlite:///things.db')
table = get_table(engine, 'data')
for item in magic_data_source_that_produces_entries():
    assert 'key1' in item
    assert 'key2' in item
    # this will either insert or update, depending on
    # whether an entry with the matching values for
    # 'key1' and 'key2' already exists:
    upsert(engine, table, item, ['key1', 'key2'])
Here's the same example, but using the object-oriented API:
import sqlaload
db = sqlaload.create('sqlite:///things.db')
table = db.get_table('data')
for item in magic_data_source_that_produces_entries():
    assert 'key1' in item
    assert 'key2' in item
    table.upsert(item, ['key1', 'key2'])
Functions
---------
The library currently exposes the following functions:
**Schema management**
* ``connect(url)``, connect to a database and return an ``engine``. See the [SQLAlchemy documentation](http://docs.sqlalchemy.org/en/rel_0_8/core/engines.html#database-urls) for information about URL schemes and formats.
* ``get_table(engine, table_name)`` will load a table configuration from the database, either reflecting the existing schema or creating a new table (with an ``id`` column).
* ``create_table(engine, table_name)`` and ``load_table(engine, table_name)`` are more explicit than ``get_table`` but otherwise serve the same purpose.
* ``drop_table(engine, table_name)`` will remove an existing table, deleting all of its contents.
* ``create_column(engine, table, column_name, type)`` adds a new column to a table; ``type`` must be a SQLAlchemy type class.
* ``create_index(engine, table, columns)`` creates an index on the given table, based on a list of strings to specify the included ``columns``.
**Queries**
* ``find(engine, table, _limit=N, _offset=N, order_by='id', **kw)`` will retrieve database records. The query will return an iterator that only loads 5000 records at any one time, even if ``_limit`` and ``_offset`` are specified, meaning that ``find`` can be run on tables of arbitrary size. ``order_by`` is a string column name; results are always sorted in ascending order. Finally, ``**kw`` can be used to filter columns for equality, e.g. ``find(…, category=5)``.
* ``find_one(engine, table, **kw)``, like ``find`` but will only return the first matching row or ``None`` if no matches were found.
* ``distinct(engine, table, *columns, **kw)`` will return the combined distinct values for ``columns``. ``**kw`` allows filtering the same way it does in ``find``.
* ``all``, alias for ``find`` without filter options.
**Adding and updating data**
* ``add_row(engine, table, row, ensure=True, types={})`` adds the values in the dictionary ``row`` to the given ``table``. ``ensure`` will check the schema and create the columns if necessary; their types can be specified using the ``types`` dictionary. If no ``types`` are given, the type will be guessed from the first submitted value of the column, defaulting to a text column.
* ``update_row(engine, table, row, unique, ensure=True, types={})`` will update a row or set of rows based on the data in the ``row`` dictionary and the column names specified in ``unique``. The remaining arguments are handled like those in ``add_row``.
* ``upsert(engine, table, row, unique, ensure=True, types={})`` will combine the semantics of ``update_row`` and ``add_row`` by first attempting to update existing data and otherwise (only if no record matching on the ``unique`` keys can be found) creating a new record.
* ``delete(engine, table, **kw)`` will remove records from a table. ``**kw`` is the same as in ``find`` and can be used to limit the set of records to be removed.
Feedback
--------
Please feel free to create issues on the GitHub tracker at [okfn/sqlaload](https://github.com/okfn/sqlaload/issues). For other discussions, join the [okfn-labs](http://lists.okfn.org/mailman/listinfo/okfn-labs) mailing list.
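The update-then-insert order described for `upsert` can be sketched with Python's standard-library `sqlite3` module. `upsert_row` is a hypothetical helper written for illustration, not part of sqlaload or dataset. It assumes the table and all columns already exist, so there is no automatic schema or index creation. It also assumes that table and column names come from trusted code, because they are interpolated into the SQL.

```python
import sqlite3


def upsert_row(conn, table, row, keys):
    """Update the rows matching ``keys``; insert ``row`` if none matched.

    Hypothetical helper: table and column names are interpolated into
    the SQL, so they must come from trusted code, never from user input.
    """
    if not keys:
        raise ValueError("at least one key column is required")
    columns = list(row)
    where = " AND ".join('"%s" = ?' % k for k in keys)
    assignments = ", ".join('"%s" = ?' % c for c in columns)
    # 1. try to update existing rows whose key columns match
    cur = conn.execute(
        'UPDATE "%s" SET %s WHERE %s' % (table, assignments, where),
        [row[c] for c in columns] + [row[k] for k in keys],
    )
    if cur.rowcount > 0:
        return "updated"
    # 2. nothing matched, so insert a new record
    conn.execute(
        'INSERT INTO "%s" (%s) VALUES (%s)' % (
            table,
            ", ".join('"%s"' % c for c in columns),
            ", ".join("?" for _ in columns),
        ),
        [row[c] for c in columns],
    )
    return "inserted"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (key1 TEXT, key2 TEXT, value INTEGER)")
print(upsert_row(conn, "data", {"key1": "a", "key2": "b", "value": 1}, ["key1", "key2"]))  # inserted
print(upsert_row(conn, "data", {"key1": "a", "key2": "b", "value": 2}, ["key1", "key2"]))  # updated
print(conn.execute("SELECT key1, key2, value FROM data").fetchall())  # [('a', 'b', 2)]
```

This sketch runs two separate statements rather than one atomic operation. Two concurrent writers could therefore both miss the update and both insert.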


@ -8,10 +8,11 @@ from dataset.persistence.table import Table
def connect(url): def connect(url):
""" Opens a new connection to a database. *url* can be any valid `SQLAlchemy engine URL`_. Returns """
an instance of :py:class:`dataset.Database. Opens a new connection to a database. *url* can be any valid `SQLAlchemy engine URL`_. Returns
an instance of :py:class:`Database <dataset.Database>`.
:: ::
db = dataset.connect('sqlite:///factbook.db') db = dataset.connect('sqlite:///factbook.db')
.. _SQLAlchemy Engine URL: http://docs.sqlalchemy.org/en/latest/core/engines.html#sqlalchemy.create_engine .. _SQLAlchemy Engine URL: http://docs.sqlalchemy.org/en/latest/core/engines.html#sqlalchemy.create_engine


@ -32,15 +32,23 @@ class Database(object):
@property @property
def tables(self): def tables(self):
""" Get a listing of all tables that exist in the database. """ """ Get a listing of all tables that exist in the database.
>>> print db.tables
set([u'user', u'action'])
"""
return set(self.metadata.tables.keys() + self._tables.keys()) return set(self.metadata.tables.keys() + self._tables.keys())
def create_table(self, table_name): def create_table(self, table_name):
""" Creates a new table. The new table will automatically have """
an `id` column, which is set to be an auto-incrementing integer Creates a new table. The new table will automatically have an `id` column, which is
as the primary key of the table. set to be an auto-incrementing integer as the primary key of the table.
Returns a :py:class:`dataset.Table` instance.""" Returns a :py:class:`Table <dataset.Table>` instance.
::
table = db.create_table('population')
"""
with self.lock: with self.lock:
log.debug("Creating table: %s on %r" % (table_name, self.engine)) log.debug("Creating table: %s on %r" % (table_name, self.engine))
table = SQLATable(table_name, self.metadata) table = SQLATable(table_name, self.metadata)
@ -51,12 +59,17 @@ class Database(object):
return Table(self, table) return Table(self, table)
def load_table(self, table_name): def load_table(self, table_name):
""" Loads a table. This will fail if the tables does not already """
Loads a table. This will fail if the table does not already
exist in the database. If the table exists, its columns will be exist in the database. If the table exists, its columns will be
reflected and are available on the :py:class:`dataset.Table` reflected and are available on the :py:class:`Table <dataset.Table>`
object. object.
Returns a :py:class:`dataset.Table` instance.""" Returns a :py:class:`Table <dataset.Table>` instance.
::
table = db.load_table('population')
"""
with self.lock: with self.lock:
log.debug("Loading table: %s on %r" % (table_name, self)) log.debug("Loading table: %s on %r" % (table_name, self))
table = SQLATable(table_name, self.metadata, autoload=True) table = SQLATable(table_name, self.metadata, autoload=True)
@ -64,9 +77,17 @@ class Database(object):
return Table(self, table) return Table(self, table)
def get_table(self, table_name): def get_table(self, table_name):
""" Loads a table or creates it if it doesn't exist yet. """
Returns a :py:class:`dataset.Table` instance. Alternatively to *get_table* Smart wrapper around *load_table* and *create_table*. Either loads a table
you can also get tables using the dict syntax.""" or creates it if it doesn't exist yet.
Returns a :py:class:`Table <dataset.Table>` instance.
::
table = db.get_table('population')
# you can also use the short-hand syntax:
table = db['population']
"""
with self.lock: with self.lock:
if table_name in self._tables: if table_name in self._tables:
return Table(self, self._tables[table_name]) return Table(self, self._tables[table_name])
@ -79,16 +100,16 @@ class Database(object):
return self.get_table(table_name) return self.get_table(table_name)
def query(self, query): def query(self, query):
""" Run a statement on the database directly, allowing for the """
Run a statement on the database directly, allowing for the
execution of arbitrary read/write queries. A query can either be execution of arbitrary read/write queries. A query can either be
a plain text string, or a SQLAlchemy expression. The returned a plain text string, or a `SQLAlchemy expression <http://docs.sqlalchemy.org/en/latest/core/tutorial.html#selecting>`_. The returned
iterator will yield each result sequentially. iterator will yield each result sequentially.
::
.. code-block:: python res = db.query('SELECT user, COUNT(*) c FROM photos GROUP BY user')
for row in res:
result = db.query('SELECT * FROM population WHERE population > 10000000') print row['user'], row['c']
for row in result:
print row
""" """
return resultiter(self.engine.execute(query)) return resultiter(self.engine.execute(query))
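The following sketch shows what "the returned iterator will yield each result sequentially" means in practice. It uses `sqlite3` instead of SQLAlchemy. `iter_rows` is a hypothetical stand-in for the `resultiter` helper, whose source is not shown in this diff. The statement runs immediately, and rows are then turned into dicts one at a time as the caller iterates.

```python
import sqlite3


def iter_rows(conn, sql, params=()):
    """Run ``sql`` now and return an iterator of result rows as dicts.

    Hypothetical stand-in for ``resultiter``; not dataset's code.
    """
    cursor = conn.execute(sql, params)  # executes immediately
    if cursor.description is None:  # e.g. INSERT/UPDATE/DELETE: no result set
        return iter(())
    names = [d[0] for d in cursor.description]
    # rows are pulled from the cursor lazily, one per iteration step
    return (dict(zip(names, values)) for values in cursor)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE photos (user TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO photos VALUES (?, ?)",
    [("anna", "a.jpg"), ("anna", "b.jpg"), ("ben", "c.jpg")],
)
res = iter_rows(conn, "SELECT user, COUNT(*) c FROM photos GROUP BY user ORDER BY user")
for row in res:
    print(row["user"], row["c"])  # anna 2, then ben 1
```

Write statements return an empty iterator here. The docstring above allows read and write queries, so the sketch has to handle both cases.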


@ -17,53 +17,69 @@ class Table(object):
self.database = database self.database = database
self.table = table self.table = table
@property
def columns(self):
"""
Get a listing of all columns that exist in the table.
>>> print 'age' in table.columns
True
"""
return set(self.table.columns.keys())
def drop(self): def drop(self):
""" Drop the table from the database, deleting both the schema """
Drop the table from the database, deleting both the schema
and all the contents within it. and all the contents within it.
Note: the object will be in an unusable state after using this Note: the object will be in an unusable state after using this
command and should not be used again. If you want to re-create command and should not be used again. If you want to re-create
the table, make sure to get a fresh instance from the the table, make sure to get a fresh instance from the
:py:class:`dataset.Database`. """ :py:class:`Database <dataset.Database>`.
"""
with self.database.lock: with self.database.lock:
self.database.tables.pop(self.table.name, None) self.database.tables.pop(self.table.name, None)
self.table.drop(engine) self.table.drop(engine)
def insert(self, row, ensure=True, types={}): def insert(self, row, ensure=True, types={}):
""" Add a row (type: dict) by inserting it into the database. """
Add a row (type: dict) by inserting it into the table.
If ``ensure`` is set, any of the keys of the row are not If ``ensure`` is set, any of the keys of the row are not
table columns, they will be created automatically. table columns, they will be created automatically.
During column creation, ``types`` will be checked for a key During column creation, ``types`` will be checked for a key
matching the name of a column to be created, and the given matching the name of a column to be created, and the given
SQLAlchemy column type will be used. Otherwise, the type is SQLAlchemy column type will be used. Otherwise, the type is
guessed from the row's value, defaulting to a simple unicode guessed from the row value, defaulting to a simple unicode
field. """ field.
::
data = dict(id=10, title='I am a banana!')
table.insert(data)
"""
if ensure: if ensure:
self._ensure_columns(row, types=types) self._ensure_columns(row, types=types)
self.database.engine.execute(self.table.insert(row)) self.database.engine.execute(self.table.insert(row))
def update(self, row, unique, ensure=True, types={}): def update(self, row, keys, ensure=True, types={}):
""" Update a row in the database. The update is managed via """
the set of column names stated in ``unique``: they will be Update a row in the table. The update is managed via
the set of column names stated in ``keys``: they will be
used as filters for the data to be updated, using the values used as filters for the data to be updated, using the values
in ``row``. Example: in ``row``.
::
.. code-block:: python
# update all entries with id matching 10, setting their title columns
data = dict(id=10, title='I am a banana!') data = dict(id=10, title='I am a banana!')
table.update(data, ['id']) table.update(data, ['id'])
This will update all entries matching the given ``id``, setting
their ``title`` column.
If keys in ``row`` update columns not present in the table, If keys in ``row`` update columns not present in the table,
they will be created based on the settings of ``ensure`` and they will be created based on the settings of ``ensure`` and
``types``, matching the behaviour of ``insert``. ``types``, matching the behaviour of :py:meth:`insert() <dataset.Table.insert>`.
""" """
if not len(unique): if not len(keys):
return False return False
clause = [(u, row.get(u)) for u in unique] clause = [(u, row.get(u)) for u in keys]
if ensure: if ensure:
self._ensure_columns(row, types=types) self._ensure_columns(row, types=types)
try: try:
@ -71,17 +87,25 @@ class Table(object):
stmt = self.table.update(filters, row) stmt = self.table.update(filters, row)
rp = self.database.engine.execute(stmt) rp = self.database.engine.execute(stmt)
return rp.rowcount > 0 return rp.rowcount > 0
except KeyError, ke: except KeyError:
return False return False
def upsert(self, row, unique, ensure=True, types={}): def upsert(self, row, keys, ensure=True, types={}):
if ensure: """
self.create_index(unique) An UPSERT is a smart combination of insert and update. If rows with matching ``keys`` exist
they will be updated, otherwise a new row is inserted in the table.
::
if not self.update(row, unique, ensure=ensure, types=types): data = dict(id=10, title='I am a banana!')
table.upsert(data, ['id'])
"""
if ensure:
self.create_index(keys)
if not self.update(row, keys, ensure=ensure, types=types):
self.insert(row, ensure=ensure, types=types) self.insert(row, ensure=ensure, types=types)
def delete(self, **kw): def delete(self, **filter):
""" Delete rows from the table. Keyword arguments can be used """ Delete rows from the table. Keyword arguments can be used
to add column-based filters. The filter criterion will always to add column-based filters. The filter criterion will always
be equality: be equality:
@ -92,7 +116,7 @@ class Table(object):
If no arguments are given, all records are deleted. If no arguments are given, all records are deleted.
""" """
q = self._args_to_clause(kw) q = self._args_to_clause(filter)
stmt = self.table.delete(q) stmt = self.table.delete(q)
self.database.engine.execute(stmt) self.database.engine.execute(stmt)
@ -114,6 +138,13 @@ class Table(object):
return and_(*clauses) return and_(*clauses)
def create_column(self, name, type): def create_column(self, name, type):
"""
Explicitly create a new column ``name`` of a specified type.
``type`` must be a `SQLAlchemy column type <http://docs.sqlalchemy.org/en/rel_0_8/core/types.html>`_.
::
table.create_column('person', sqlalchemy.String)
"""
with self.database.lock: with self.database.lock:
if name not in self.table.columns.keys(): if name not in self.table.columns.keys():
col = Column(name, type) col = Column(name, type)
@ -121,6 +152,12 @@ class Table(object):
connection=self.database.engine) connection=self.database.engine)
def create_index(self, columns, name=None): def create_index(self, columns, name=None):
"""
Create an index to speed up queries on a table. If no ``name`` is given, one is generated automatically.
::
table.create_index(['name', 'country'])
"""
with self.database.lock: with self.database.lock:
if not name: if not name:
sig = abs(hash('||'.join(columns))) sig = abs(hash('||'.join(columns)))
@ -136,16 +173,53 @@ class Table(object):
self.indexes[name] = idx self.indexes[name] = idx
return idx return idx
def find_one(self, **kw): def find_one(self, **filter):
res = list(self.find(_limit=1, **kw)) """
Works just like :py:meth:`find() <dataset.Table.find>` but returns only the first matching row, or ``None`` if there is no match.
::
row = table.find_one(country='United States')
"""
res = list(self.find(_limit=1, **filter))
if not len(res): if not len(res):
return None return None
return res[0] return res[0]
def find(self, _limit=None, _step=5000, _offset=0, def _args_to_order_by(self, order_by):
order_by='id', **kw): if order_by[0] == '-':
order_by = [self.table.c[order_by].asc()] return self.table.c[order_by[1:]].desc()
args = self._args_to_clause(kw) else:
return self.table.c[order_by].asc()
def find(self, _limit=None, _offset=0, _step=5000,
order_by='id', **filter):
"""
Performs a simple search on the table. Pass keyword arguments as ``filter`` to match columns by equality.
::
results = table.find(country='France')
results = table.find(country='France', year=1980)
Using ``_limit``::
# just return the first 10 rows
results = table.find(country='France', _limit=10)
You can sort the results by single or multiple columns. Append a minus sign
to the column name for descending order::
# sort results by a column 'year'
results = table.find(country='France', order_by='year')
# return all rows sorted by multiple columns (by year in descending order)
results = table.find(order_by=['country', '-year'])
For more complex queries, please use :py:meth:`db.query() <dataset.Database.query>`
instead."""
if isinstance(order_by, (str, unicode)):
order_by = [order_by]
order_by = [self._args_to_order_by(o) for o in order_by]
args = self._args_to_clause(filter)
for i in count(): for i in count():
qoffset = _offset + (_step * i) qoffset = _offset + (_step * i)
@ -163,14 +237,29 @@ class Table(object):
yield row yield row
def __len__(self): def __len__(self):
"""
Returns the number of rows in the table.
"""
d = self.database.query(self.table.count()).next() d = self.database.query(self.table.count()).next()
return d.values().pop() return d.values().pop()
def distinct(self, *columns, **kw): def distinct(self, *columns, **filter):
"""
Returns all rows of a table, but removes rows with duplicate values in ``columns``.
Internally this creates a `DISTINCT statement <http://www.w3schools.com/sql/sql_distinct.asp>`_.
::
# returns only one row per year, ignoring the rest
table.distinct('year')
# works with multiple columns, too
table.distinct('year', 'country')
# you can also combine this with a filter
table.distinct('year', country='China')
"""
qargs = [] qargs = []
try: try:
columns = [self.table.c[c] for c in columns] columns = [self.table.c[c] for c in columns]
for col, val in kw.items(): for col, val in filter.items():
qargs.append(self.table.c[col] == val) qargs.append(self.table.c[col] == val)
except KeyError: except KeyError:
return [] return []
@ -181,8 +270,22 @@ class Table(object):
return self.database.query(q) return self.database.query(q)
def all(self): def all(self):
""" Return all records in the table, ordered by their """
``id``. This is an alias for calling ``find`` without Returns all rows of the table as simple dictionaries. This is simply a shortcut
any arguments. """ to *find()* called with no arguments.
::
rows = table.all()"""
return self.find() return self.find()
def __iter__(self):
"""
Allows iterating over all rows in the table without explicitly
calling :py:meth:`all() <dataset.Table.all>`.
::
for row in table:
print row
"""
for row in self.all():
yield row
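The paging loop in `find()` computes a new offset of `_offset + _step * i` on each pass. That loop and the new `-column` ordering syntax can be sketched together with `sqlite3`. This `find` is a hypothetical standalone function, not dataset's implementation. It assumes equality-only filters and trusted table and column names. Stopping once a batch comes back shorter than requested is a choice made for this sketch, because the original loop's exit condition is not shown above. A stable `ORDER BY` matters: LIMIT/OFFSET paging over an unordered query could return rows in a different order on each pass.

```python
import sqlite3
from itertools import count


def order_clause(order_by):
    """Build an ORDER BY clause; a leading '-' means descending order."""
    if isinstance(order_by, str):
        order_by = [order_by]
    parts = []
    for name in order_by:
        if name.startswith("-"):
            parts.append('"%s" DESC' % name[1:])
        else:
            parts.append('"%s" ASC' % name)
    return ", ".join(parts)


def find(conn, table, _limit=None, _offset=0, _step=5000, order_by="id", **filters):
    """Yield matching rows as dicts, fetching at most ``_step`` rows per query.

    Hypothetical sketch; equality filters only, trusted identifiers only.
    """
    where = " AND ".join('"%s" = ?' % k for k in filters) or "1"
    sql = 'SELECT * FROM "%s" WHERE %s ORDER BY %s LIMIT ? OFFSET ?' % (
        table, where, order_clause(order_by))
    params = list(filters.values())
    returned = 0
    for i in count():
        step = _step
        if _limit is not None:
            # never fetch more rows than the overall limit still allows
            step = min(_step, _limit - returned)
            if step <= 0:
                return
        # every earlier batch was full, so the next page starts here
        cursor = conn.execute(sql, params + [step, _offset + _step * i])
        names = [d[0] for d in cursor.description]
        rows = cursor.fetchall()
        for values in rows:
            returned += 1
            yield dict(zip(names, values))
        if len(rows) < step:  # short batch: no more matching rows
            return


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE population (id INTEGER PRIMARY KEY, country TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO population (country, year) VALUES (?, ?)",
    [("France", 1980), ("France", 1990), ("Spain", 1980), ("France", 2000)],
)
# _step=2 forces several small queries; prints 2000, 1990, 1980
for row in find(conn, "population", country="France", order_by="-year", _step=2):
    print(row["year"])
```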

docs/_static/dataset-logo.png (new binary file, 45 KiB)

docs/_static/knight_mozilla_on.jpg (new binary file, 5.8 KiB)


@ -9,11 +9,11 @@ this guide:
1. put this folder as _themes into your docs folder. Alternatively 1. put this folder as _themes into your docs folder. Alternatively
you can also use git submodules to check out the contents there. you can also use git submodules to check out the contents there.
2. add this to your conf.py: :: 2. add this to your conf.py:
sys.path.append(os.path.abspath('_themes')) sys.path.append(os.path.abspath('_themes'))
html_theme_path = ['_themes'] html_theme_path = ['_themes']
html_theme = 'flask' html_theme = 'kr'
The following themes exist: The following themes exist:

docs/_themes/kr/autotoc.html (new file, 25 lines)

@ -0,0 +1,25 @@
<h3><a href="{{ pathto(master_doc) }}">{{ _('Table Of Contents') }}</a></h3>
<ul class="custom-index container"></ul>
<script type="text/javascript">
$(function() {
var ul = $('ul.custom-index');
$('dl.class').each(function(i, el) {
var name = $('tt.descname', el).html(), id = $('dt', el).attr('id'),
li = $('<li><a href="#' + id + '">' + name + '</a></li>');
ul.append(li);
var ul_ = $('<ul />');
li.append(ul_);
$('dl.method', el).each(function(i, el) {
var name = $('tt.descname', el).html(), id = $('dt', el).attr('id');
ul_.append('<li><a href="#' + id + '">' + name + '</a></li>');
});
});
});
</script>


@ -1,71 +1,45 @@
{%- extends "basic/layout.html" %} {%- extends "basic/layout.html" %}
{%- block extrahead %} {%- block extrahead %}
{{ super() }} {{ super() }}
<link href="//fonts.googleapis.com/css?family=Open+Sans:400|Antic+Slab" rel="stylesheet" type="text/css">
{% if theme_touch_icon %} {% if theme_touch_icon %}
<link rel="apple-touch-icon" href="{{ pathto('_static/' ~ theme_touch_icon, 1) }}" /> <link rel="apple-touch-icon" href="{{ pathto('_static/' ~ theme_touch_icon, 1) }}" />
{% endif %} {% endif %}
<meta name="viewport" content="width=device-width, initial-scale=0.9, maximum-scale=0.9"> <meta name="viewport" content="width=device-width, initial-scale=0.9, maximum-scale=0.9">
{% endblock %} {% endblock %}
{%- block relbar2 %}{% endblock %}
{% block sidebar2 %}
{{ sidebar() }}
{% endblock %}
{%- block footer %} {%- block footer %}
<div class="footer"> <div class="footer" style="position:relative">
<a href="http://www.mozillaopennews.org/" style="position:absolute;left:10px;bottom:-10px"><img src="_static/knight_mozilla_on.jpg" /></a>
&copy; Copyright {{ copyright }}. &copy; Copyright {{ copyright }}.
</div> </div>
<a href="https://github.com/kennethreitz/requests" class="github"> <div>
<img style="position: absolute; top: 0; right: 0; border: 0;" src="http://s3.amazonaws.com/github/ribbons/forkme_right_darkblue_121621.png" alt="Fork me on GitHub" class="github"/>
</div>
<a style="position: absolute; top: 0; right: 0; border: 0;" href="https://github.com/pudo/dataset" class="github">
<img src="http://s3.amazonaws.com/github/ribbons/forkme_right_darkblue_121621.png" alt="Fork me on GitHub" class="github"/>
</a> </a>
<script type="text/javascript">
/* <![CDATA[ */
(function() {
var s = document.createElement('script'), t = document.getElementsByTagName('script')[0];
s.type = 'text/javascript';
s.async = true;
s.src = 'http://api.flattr.com/js/0.6/load.js?mode=auto';
t.parentNode.insertBefore(s, t);
})();
/* ]]> */
</script>
<script type="text/javascript">
setTimeout(function(){var a=document.createElement("script");
var b=document.getElementsByTagName("script")[0];
a.src=document.location.protocol+"//dnn506yrbagrg.cloudfront.net/pages/scripts/0013/7219.js?"+Math.floor(new Date().getTime()/3600000);
a.async=true;a.type="text/javascript";b.parentNode.insertBefore(a,b)}, 1);
</script>
<script type="text/javascript">
new HelloBar(36402,48802);
</script>
<!-- Piwik -->
<script type="text/javascript"> <script type="text/javascript">
var _paq = _paq || [];
var _gaq = _gaq || []; _paq.push(["trackPageView"]);
_gaq.push(['_setAccount', 'UA-8742933-11']); _paq.push(["enableLinkTracking"]);
_gaq.push(['_setDomainName', 'none']);
_gaq.push(['_setAllowLinker', true]);
_gaq.push(['_trackPageview']);
(function() { (function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true; var u=(("https:" == document.location.protocol) ? "https" : "http") + "://piwik.vis4.net/";
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js'; _paq.push(["setTrackerUrl", u+"piwik.php"]);
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s); _paq.push(["setSiteId", "17"]);
})(); var d=document, g=d.createElement("script"), s=d.getElementsByTagName("script")[0]; g.type="text/javascript";
g.defer=true; g.async=true; g.src=u+"piwik.js"; s.parentNode.insertBefore(g,s);
</script>
<script type="text/javascript">
(function() {
var t = document.createElement('script');
t.type = 'text/javascript';
t.async = true;
t.id = 'gauges-tracker';
t.setAttribute('data-site-id',
'4ddc27f6613f5d186d000007');
t.src = '//secure.gaug.es/track.js';
var s = document.getElementsByTagName('script')[0];
s.parentNode.insertBefore(t, s);
})(); })();
</script> </script>
<!-- End Piwik Code -->
{%- endblock %} {%- endblock %}

docs/_themes/kr/sidebarlogo.html (new file, 15 lines)

@ -0,0 +1,15 @@
<a style="border:0" href="index.html"><img src="_static/dataset-logo.png" alt="dataset" /></a>
<p style="font-style:italic;font-size:0.9em; text-align:center; margin-bottom:1.5em">Because managing databases in Python should be as simple as reading and writing JSON files.</p>
<iframe width="200px" scrolling="0" height="20px" frameborder="0" allowtransparency="true" src="http://ghbtns.com/github-btn.html?user=pudo&amp;repo=dataset&amp;type=watch&amp;count=true&amp;size=small"></iframe>
<h3>Overview</h3>
<ul>
<li><a href="quickstart.html">Quickstart Guide</a></li>
<li><a href="api.html">API-Documentation</a></li>
<li><a href="install.html">Installation</a></li>
</ul>


@ -14,9 +14,10 @@
/* -- page layout ----------------------------------------------------------- */ /* -- page layout ----------------------------------------------------------- */
body { body {
font-family: 'goudy old style', 'minion pro', 'bell mt', Georgia, 'Hiragino Mincho Pro'; font-family: "Georgia", "Open Sans", OpenSansRegular, sans-serif;
font-size: 17px; font-size: 16px;
background-color: white; background: #fff;
font-weight: 400;
color: #000; color: #000;
margin: 0; margin: 0;
padding: 0; padding: 0;
@ -45,7 +46,7 @@ hr {
} }
div.body { div.body {
background-color: #ffffff; background-color: white;
color: #3E4349; color: #3E4349;
padding: 0 30px 0 30px; padding: 0 30px 0 30px;
} }
@ -98,11 +99,11 @@ div.sphinxsidebarwrapper p.logo {
div.sphinxsidebar h3, div.sphinxsidebar h3,
div.sphinxsidebar h4 { div.sphinxsidebar h4 {
font-family: 'Garamond', 'Georgia', serif; font-family: 'Antic Slab' ,'Garamond', 'Georgia', serif;
color: #444; color: #000;
font-size: 24px; font-size: 24px;
font-weight: normal; font-weight: normal;
margin: 0 0 5px 0; margin: 30px 0 5px 0;
padding: 0; padding: 0;
} }
@ -111,7 +112,7 @@ div.sphinxsidebar h4 {
} }
div.sphinxsidebar h3 a { div.sphinxsidebar h3 a {
color: #444; color: #000;
} }
div.sphinxsidebar p.logo a, div.sphinxsidebar p.logo a,
@ -127,7 +128,7 @@ div.sphinxsidebar p {
} }
div.sphinxsidebar ul { div.sphinxsidebar ul {
margin: 10px 0; margin: 10px 0px;
padding: 0; padding: 0;
color: #000; color: #000;
} }
@ -156,18 +157,20 @@ div.body h3,
div.body h4, div.body h4,
div.body h5, div.body h5,
div.body h6 { div.body h6 {
font-family: 'Garamond', 'Georgia', serif; font-family: 'Antic Slab', serif;
font-weight: normal; font-weight: normal;
margin: 30px 0px 10px 0px; margin: 30px 0px 10px 0px;
padding: 0; padding: 0;
text-shadow: 1px 1px 3px #ddd;
color: #000;
} }
div.body h1 { margin-top: 0; padding-top: 0; font-size: 240%; } div.body h1 { margin-top: 0; padding-top: 0; font-size: 250%; }
div.body h2 { font-size: 180%; } div.body h2 { font-size: 190%; }
div.body h3 { font-size: 150%; } div.body h3 { font-size: 160%; }
div.body h4 { font-size: 130%; } div.body h4 { font-size: 140%; }
div.body h5 { font-size: 100%; } div.body h5 { font-size: 110%; }
div.body h6 { font-size: 100%; } div.body h6 { font-size: 110%; }
a.headerlink { a.headerlink {
color: #ddd; color: #ddd;
@ -244,9 +247,14 @@ p.admonition-title:after {
content: ":"; content: ":";
} }
pre, tt { pre {
font-family: 'Consolas', 'Menlo', 'Deja Vu Sans Mono', 'Bitstream Vera Sans Mono', monospace; font-family: 'Consolas', 'Menlo', 'Deja Vu Sans Mono', 'Bitstream Vera Sans Mono', monospace;
font-size: 0.9em; font-size: 0.88em;
}
tt {
font-family: 'Consolas', 'Menlo', 'Deja Vu Sans Mono', 'Bitstream Vera Sans Mono', monospace;
font-size: 0.95em;
} }
img.screenshot { img.screenshot {
@ -342,13 +350,13 @@ pre {
} }
dl pre, blockquote pre, li pre { dl pre, blockquote pre, li pre {
margin-left: -60px; margin-left: 0px;
padding-left: 60px; padding-left: 15px;
} }
dl dl pre { dl dl pre {
margin-left: -90px; margin-left: 0px;
padding-left: 90px; padding-left: 15px;
} }
tt { tt {
@ -359,6 +367,7 @@ tt {
tt.xref, a tt { tt.xref, a tt {
background-color: #FBFBFB; background-color: #FBFBFB;
color: #2277bb;
border-bottom: 1px solid white; border-bottom: 1px solid white;
} }
@ -534,3 +543,12 @@ a:hover tt {
.revsys-inline { .revsys-inline {
display: none!important; display: none!important;
} }
div.sphinxsidebar #searchbox input[type="text"] {
width: 140px;
padding: 4px 3px;
}
.highlight .nv {
color: #C65D09!important;
}


@ -1,22 +0,0 @@
{% extends "basic/layout.html" %}
{% block header %}
{{ super() }}
{% if pagename == 'index' %}
<div class=indexwrapper>
{% endif %}
{% endblock %}
{% block footer %}
{% if pagename == 'index' %}
</div>
{% endif %}
{% endblock %}
{# do not display relbars #}
{% block relbar1 %}{% endblock %}
{% block relbar2 %}
{% if theme_github_fork %}
<a href="http://github.com/{{ theme_github_fork }}"><img style="position: fixed; top: 0; right: 0; border: 0;"
src="http://s3.amazonaws.com/github/ribbons/forkme_right_darkblue_121621.png" alt="Fork me on GitHub" /></a>
{% endif %}
{% endblock %}
{% block sidebar1 %}{% endblock %}
{% block sidebar2 %}{% endblock %}


@ -1,287 +0,0 @@
/*
* flasky.css_t
* ~~~~~~~~~~~~
*
* Sphinx stylesheet -- flasky theme based on nature theme.
*
* :copyright: Copyright 2007-2010 by the Sphinx team, see AUTHORS.
* :license: BSD, see LICENSE for details.
*
*/
@import url("basic.css");
/* -- page layout ----------------------------------------------------------- */
body {
font-family: 'Georgia', serif;
font-size: 17px;
color: #000;
background: white;
margin: 0;
padding: 0;
}
div.documentwrapper {
float: left;
width: 100%;
}
div.bodywrapper {
margin: 40px auto 0 auto;
width: 700px;
}
hr {
border: 1px solid #B1B4B6;
}
div.body {
background-color: #ffffff;
color: #3E4349;
padding: 0 30px 30px 30px;
}
img.floatingflask {
padding: 0 0 10px 10px;
float: right;
}
div.footer {
text-align: right;
color: #888;
padding: 10px;
font-size: 14px;
width: 650px;
margin: 0 auto 40px auto;
}
div.footer a {
color: #888;
text-decoration: underline;
}
div.related {
line-height: 32px;
color: #888;
}
div.related ul {
padding: 0 0 0 10px;
}
div.related a {
color: #444;
}
/* -- body styles ----------------------------------------------------------- */
a {
color: #004B6B;
text-decoration: underline;
}
a:hover {
color: #6D4100;
text-decoration: underline;
}
div.body {
padding-bottom: 40px; /* saved for footer */
}
div.body h1,
div.body h2,
div.body h3,
div.body h4,
div.body h5,
div.body h6 {
font-family: 'Garamond', 'Georgia', serif;
font-weight: normal;
margin: 30px 0px 10px 0px;
padding: 0;
}
{% if theme_index_logo %}
div.indexwrapper h1 {
text-indent: -999999px;
background: url({{ theme_index_logo }}) no-repeat center center;
height: {{ theme_index_logo_height }};
}
{% endif %}
div.body h2 { font-size: 180%; }
div.body h3 { font-size: 150%; }
div.body h4 { font-size: 130%; }
div.body h5 { font-size: 100%; }
div.body h6 { font-size: 100%; }
a.headerlink {
color: white;
padding: 0 4px;
text-decoration: none;
}
a.headerlink:hover {
color: #444;
background: #eaeaea;
}
div.body p, div.body dd, div.body li {
line-height: 1.4em;
}
div.admonition {
background: #fafafa;
margin: 20px -30px;
padding: 10px 30px;
border-top: 1px solid #ccc;
border-bottom: 1px solid #ccc;
}
div.admonition p.admonition-title {
font-family: 'Garamond', 'Georgia', serif;
font-weight: normal;
font-size: 24px;
margin: 0 0 10px 0;
padding: 0;
line-height: 1;
}
div.admonition p.last {
margin-bottom: 0;
}
div.highlight{
background-color: white;
}
dt:target, .highlight {
background: #FAF3E8;
}
div.note {
background-color: #eee;
border: 1px solid #ccc;
}
div.seealso {
background-color: #ffc;
border: 1px solid #ff6;
}
div.topic {
background-color: #eee;
}
div.warning {
background-color: #ffe4e4;
border: 1px solid #f66;
}
p.admonition-title {
display: inline;
}
p.admonition-title:after {
content: ":";
}
pre, tt {
font-family: 'Consolas', 'Menlo', 'Deja Vu Sans Mono', 'Bitstream Vera Sans Mono', monospace;
font-size: 0.85em;
}
img.screenshot {
}
tt.descname, tt.descclassname {
font-size: 0.95em;
}
tt.descname {
padding-right: 0.08em;
}
img.screenshot {
-moz-box-shadow: 2px 2px 4px #eee;
-webkit-box-shadow: 2px 2px 4px #eee;
box-shadow: 2px 2px 4px #eee;
}
table.docutils {
border: 1px solid #888;
-moz-box-shadow: 2px 2px 4px #eee;
-webkit-box-shadow: 2px 2px 4px #eee;
box-shadow: 2px 2px 4px #eee;
}
table.docutils td, table.docutils th {
border: 1px solid #888;
padding: 0.25em 0.7em;
}
table.field-list, table.footnote {
border: none;
-moz-box-shadow: none;
-webkit-box-shadow: none;
box-shadow: none;
}
table.footnote {
margin: 15px 0;
width: 100%;
border: 1px solid #eee;
}
table.field-list th {
padding: 0 0.8em 0 0;
}
table.field-list td {
padding: 0;
}
table.footnote td {
padding: 0.5em;
}
dl {
margin: 0;
padding: 0;
}
dl dd {
margin-left: 30px;
}
pre {
padding: 0;
margin: 15px -30px;
padding: 8px;
line-height: 1.3em;
padding: 7px 30px;
background: #eee;
border-radius: 2px;
-moz-border-radius: 2px;
-webkit-border-radius: 2px;
}
dl pre {
margin-left: -60px;
padding-left: 60px;
}
tt {
background-color: #ecf0f3;
color: #222;
/* padding: 1px 2px; */
}
tt.xref, a tt {
background-color: #FBFBFB;
}
a:hover tt {
background: #EEE;
}


@@ -1,10 +0,0 @@
[theme]
inherit = basic
stylesheet = flasky.css
nosidebar = true
pygments_style = flask_theme_support.FlaskyStyle
[options]
index_logo = ''
index_logo_height = 120px
github_fork = ''


@@ -2,29 +2,19 @@
API documentation API documentation
================= =================
.. toctree::
:maxdepth: 2
.. autofunction:: dataset.connect .. autofunction:: dataset.connect
Database Database
-------- --------
.. autoclass:: dataset.Database .. autoclass:: dataset.Database
:members: get_table, create_table, load_table, query :members: get_table, create_table, load_table, query, tables
:undoc-members: :special-members:
Table Table
----- -----
Using the *Table* class you can easily store and retrieve data from database tables.
.. autoclass:: dataset.Table .. autoclass:: dataset.Table
:members: :members: columns, drop, insert, update, upsert, find, find_one, distinct, create_column, create_index, all
:undoc-members: :special-members: __len__, __iter__


@@ -100,7 +100,7 @@ html_theme = 'kr'
# further. For a list of options available for each theme, see the # further. For a list of options available for each theme, see the
# documentation. # documentation.
# html_theme_options = { # html_theme_options = {
# 'codebgcolor': '' # 'stickysidebar': "true"
# } # }
# Add any paths that contain custom themes here, relative to this directory. # Add any paths that contain custom themes here, relative to this directory.
@@ -136,7 +136,11 @@ html_static_path = ['_static']
#html_use_smartypants = True #html_use_smartypants = True
# Custom sidebar templates, maps document names to template names. # Custom sidebar templates, maps document names to template names.
#html_sidebars = {} html_sidebars = {
'index': ['sidebarlogo.html', 'sourcelink.html', 'searchbox.html'],
'api': ['sidebarlogo.html', 'autotoc.html', 'sourcelink.html', 'searchbox.html'],
'**': ['sidebarlogo.html', 'localtoc.html', 'sourcelink.html', 'searchbox.html']
}
# Additional templates that should be rendered to pages, maps page names to # Additional templates that should be rendered to pages, maps page names to
# template names. # template names.
@@ -152,7 +156,7 @@ html_static_path = ['_static']
#html_split_index = False #html_split_index = False
# If true, links to the reST sources are added to the pages. # If true, links to the reST sources are added to the pages.
#html_show_sourcelink = True html_show_sourcelink = False
# If true, "Created using Sphinx" is shown in the HTML footer. Default is True. # If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
#html_show_sphinx = True #html_show_sphinx = True


@@ -3,23 +3,58 @@
You can adapt this file completely to your liking, but it should at least You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive. contain the root `toctree` directive.
Welcome to dataset's documentation! dataset: databases for lazy people
=================================== ==================================
Getting the databases out of your data's way. .. toctree::
:hidden:
Although managing data in relational databases has plenty of benefits, we rarely find them used in typical day-to-day work with small to medium-scale datasets. But why is that? Why do we see an awful lot of data stored in static files in CSV or JSON format?
The answer is that **programmers are lazy**, and thus they tend to prefer the easiest solution they find. And in **Python**, a database wasn't the simplest solution for storing a bunch of structured data. This is what **dataset** is going to change!
In short, dataset combines the straightforwardness of NoSQL interfaces with the full power and flexibility of relational databases. It makes database management as simple as reading and writing JSON files.
::
import dataset
db = dataset.connect('sqlite:///:memory:')
table = db['sometable']
table.insert(dict(name='John Doe', age=37))
table.insert(dict(name='Jane Doe', age=34, gender='female'))
john = table.find_one(name='John Doe')
Here is `similar code, without dataset <https://gist.github.com/gka/5296492>`_.
Features
--------
* **Automatic schema**: If a table or column is written that does not
exist in the database, it will be created automatically.
* **Upserts**: Records are either created or updated, depending on
whether an existing version can be found.
* **Query helpers** for simple queries such as :py:meth:`all <dataset.Table.all>` rows in a table or
all :py:meth:`distinct <dataset.Table.distinct>` values across a set of columns.
* **Compatibility**: Being built on top of `SQLAlchemy <http://www.sqlalchemy.org/>`_, ``dataset`` works with all major databases, such as SQLite, PostgreSQL and MySQL.
Contents
--------
.. toctree:: .. toctree::
:maxdepth: 2 :maxdepth: 2
* `Learn how to use dataset in five minutes <quickstart>`_ quickstart
* `Browse the complete API docs <api>`_ api
Contributors
------------
``dataset`` is written and maintained by `Friedrich Lindenberg <https://github.com/pudo>`_ and `Gregor Aisch <https://github.com/gka>`_. Its code is largely based on the preceding libraries `sqlaload <https://github.com/okfn/sqlaload>`_ and `datafreeze <https://github.com/spiegelonline/datafreeze>`_. And of course, we're standing on the `shoulders of giants <http://www.sqlalchemy.org/>`_.
Indices and tables Our cute little `naked mole rat <http://www.youtube.com/watch?feature=player_detailpage&v=A5DcOEzW1wA#t=14s>`_ was drawn by `Johannes Koch <http://chechuchape.com/>`_.
==================
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`

docs/install.rst

@@ -0,0 +1,20 @@
Installation Guide
==================
*— work in progress —*
The easiest way is to install ``dataset`` from the `Python Package Index <https://pypi.python.org/pypi/dataset/>`_ using ``pip`` or ``easy_install``:
.. code-block:: bash
$ easy_install dataset
To install it manually, clone the repository from GitHub:
.. code-block:: bash
$ git clone git://github.com/pudo/dataset.git
$ cd dataset/
$ python setup.py install


@@ -3,41 +3,36 @@ Quickstart
========== ==========
.. toctree:: Hi, welcome to the twelve-minute quick-start tutorial.
:maxdepth: 2
Hi, welcome to the five-minute quick-start tutorial. Connecting to a database
------------------------
At first you need to import the dataset package :) :: At first you need to import the dataset package :) ::
import dataset import dataset
To connect to a database you need to identify it using what is called an engine url. Here are a few examples:: To connect to a database you need to identify it by its `URL <http://docs.sqlalchemy.org/en/latest/core/engines.html#engine-creation-api>`_, which basically is a string of the form ``"dialect://user:password@host/dbname"``. Here are a few common examples::
# connecting to a SQLite database # connecting to a SQLite database
db = dataset.connect('sqlite:///factbook.db') db = dataset.connect('sqlite:///mydatabase.db')
# connecting to a MySQL database # connecting to a MySQL database with user and password
db = dataset.connect('mysql://user:password@localhost/mydatabase') db = dataset.connect('mysql://user:password@localhost/mydatabase')
# connecting to a PostgreSQL database # connecting to a PostgreSQL database
db = dataset.connect('postgresql://scott:tiger@localhost:5432/mydatabase') db = dataset.connect('postgresql://scott:tiger@localhost:5432/mydatabase')
If you want to learn more about how to connect to various databases, have a look at the `SQLAlchemy documentation`_.
.. _SQLAlchemy documentation: http://docs.sqlalchemy.org/en/latest/core/engines.html#engine-creation-api
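The URL anatomy above (``dialect://user:password@host/dbname``) can be illustrated with a small sketch. The helper ``describe_url`` is hypothetical and uses only the standard library's ``urllib.parse.urlsplit``; SQLAlchemy parses engine URLs with its own code, so treat this as a picture of the string's parts rather than of how ``dataset.connect`` works internally.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split a database URL of the form dialect://user:password@host:port/dbname.

    Hypothetical helper for illustration only; SQLAlchemy parses URLs itself.
    """
    parts = urlsplit(url)
    return {
        "dialect": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        # The path starts with '/', which is not part of the database name.
        "database": parts.path[1:] or None,
    }


print(describe_url("postgresql://scott:tiger@localhost:5432/mydatabase"))
print(describe_url("sqlite:///mydatabase.db"))
```

Note how the SQLite URL has an empty host part, which is why it is written with three slashes.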
Storing data Storing data
------------ ------------
At first you need to get a reference to the table in which you want to store your data. You don't To store some data you need to get a reference to a table. You don't need to worry about whether the table already exists or not, since dataset will create it automatically::
need to worry about whether the table already exists or not, since dataset will create it automatically::
# get a reference to the table 'person' # get a reference to the table 'person'
table = db['person'] table = db['person']
Now storing data in a table is a matter of a single function call. Just pass a `dict`_ to *insert*. Note Now storing data in a table is a matter of a single function call. Just pass a `dict`_ to *insert*. Note that you don't need to create the columns *name* and *age*; dataset will do this automatically::
that you don't need to create the columns *name* and *age*; dataset will do this automatically::
# Insert a new record. # Insert a new record.
table.insert(dict(name='John Doe', age=46)) table.insert(dict(name='John Doe', age=46))
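To illustrate what "automatic schema" means, here is a minimal sketch using only Python's built-in ``sqlite3`` module. It is not how ``dataset`` is implemented (that goes through SQLAlchemy). The helper ``insert_row`` is hypothetical, leaves column types unspecified (which SQLite happens to allow), and assumes table and column names come from trusted code, because they are interpolated directly into the SQL.

```python
import sqlite3


def insert_row(conn, table, row):
    """Insert a dict, creating the table and any missing columns first.

    Hypothetical sketch: names are assumed trusted (they are not escaped),
    and new columns get no declared type.
    """
    conn.execute(
        f'CREATE TABLE IF NOT EXISTS "{table}" (id INTEGER PRIMARY KEY AUTOINCREMENT)'
    )
    # PRAGMA table_info yields (cid, name, type, notnull, default, pk) tuples.
    existing = {info[1] for info in conn.execute(f'PRAGMA table_info("{table}")')}
    for column in row:
        if column not in existing:
            conn.execute(f'ALTER TABLE "{table}" ADD COLUMN "{column}"')
    columns = ", ".join(f'"{c}"' for c in row)
    placeholders = ", ".join("?" for _ in row)
    cur = conn.execute(
        f'INSERT INTO "{table}" ({columns}) VALUES ({placeholders})',
        list(row.values()),
    )
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
insert_row(conn, "person", dict(name="John Doe", age=46))
# The second row introduces a new column, which is added on the fly.
insert_row(conn, "person", dict(name="Jane Doe", age=34, gender="female"))
```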
@@ -54,37 +49,77 @@ Updating existing entries is easy, too::
table.update(dict(name='John Doe', age=47), ['name']) table.update(dict(name='John Doe', age=47), ['name'])
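The ``['name']`` argument lists the columns used to find the rows to update. A minimal ``sqlite3`` sketch of that idea, plus an upsert built on top of it (update, then insert if nothing matched), might look like this. Both helpers are hypothetical and only approximate the behaviour of ``Table.update`` and ``Table.upsert``.

```python
import sqlite3


def update_rows(conn, table, row, keys):
    """UPDATE rows whose `keys` columns equal the values in `row`.

    Returns the number of rows changed. Hypothetical sketch: `= ?` never
    matches NULL, and names are assumed trusted.
    """
    values = [c for c in row if c not in keys]
    if not values:
        raise ValueError("row must contain at least one non-key column")
    assignments = ", ".join(f'"{c}" = ?' for c in values)
    conditions = " AND ".join(f'"{k}" = ?' for k in keys)
    params = [row[c] for c in values] + [row[k] for k in keys]
    cur = conn.execute(f'UPDATE "{table}" SET {assignments} WHERE {conditions}', params)
    return cur.rowcount


def upsert_row(conn, table, row, keys):
    """Update matching rows, or insert `row` if none matched. Returns True on insert."""
    if update_rows(conn, table, row, keys) > 0:
        return False
    columns = ", ".join(f'"{c}"' for c in row)
    placeholders = ", ".join("?" for _ in row)
    conn.execute(f'INSERT INTO "{table}" ({columns}) VALUES ({placeholders})', list(row.values()))
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.execute("INSERT INTO person (name, age) VALUES ('John Doe', 46)")
update_rows(conn, "person", dict(name="John Doe", age=47), ["name"])
inserted = upsert_row(conn, "person", dict(name="Jane Doe", age=34), ["name"])
```

Update-then-insert keeps the sketch portable, but it is not atomic under concurrent writers; databases with a native upsert statement avoid that gap.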
Inspecting databases and tables
-------------------------------
When dealing with an unknown database, we might want to check its structure first. To begin with, let's find out which tables are stored in the database:
>>> print db.tables
set([u'user', u'action'])
Now, let's list all columns available in the table ``user``:
>>> print db['user'].columns
set([u'id', u'name', u'email', u'pwd', u'country'])
Using ``len()`` we can get the total number of rows in a table:
>>> print len(db['user'])
187
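For SQLite, the same three questions (which tables, which columns, how many rows) can be answered with plain SQL. This sketch uses the ``sqlite_master`` catalog, ``PRAGMA table_info`` and ``COUNT(*)``; the helper names and sample data are hypothetical, and other databases expose this information through their own catalogs.

```python
import sqlite3


def list_tables(conn):
    """Names of user tables, skipping SQLite's internal sqlite_* tables."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    )
    return {name for (name,) in rows}


def list_columns(conn, table):
    """Column names of `table` (second field of each PRAGMA table_info row)."""
    return {info[1] for info in conn.execute(f'PRAGMA table_info("{table}")')}


def count_rows(conn, table):
    """Total number of rows, the equivalent of len(db[table])."""
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "user" (id INTEGER PRIMARY KEY, name TEXT, email TEXT)')
conn.execute('CREATE TABLE "action" (id INTEGER PRIMARY KEY, user_id INTEGER)')
conn.executemany(
    'INSERT INTO "user" (name, email) VALUES (?, ?)',
    [("John", "john@example.com"), ("Jane", "jane@example.com")],
)
print(list_tables(conn), list_columns(conn, "user"), count_rows(conn, "user"))
```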
Reading data from tables Reading data from tables
------------------------ ------------------------
Checking:: Now let's get some real data out of the table::
table = db['population'] users = db['user'].all()
# Let's grab a list of all items/rows/entries in the table: If we simply want to iterate over all rows in a table, we can omit :py:meth:`all() <dataset.Table.all>`::
table.all()
table.distinct() for user in db['user']:
print user['email']
Searching for specific entries:: We can search for specific entries using :py:meth:`find() <dataset.Table.find>` and :py:meth:`find_one() <dataset.Table.find_one>`::
# Returns the first item where the column country equals 'China' # All users from China
table.find_one(country='China') users = table.find(country='China')
# Returns all items # Get a specific user
table.find(country='China') john = table.find_one(name='John Doe')
Querying data Using :py:meth:`distinct() <dataset.Table.distinct>` we can grab a set of rows with unique values in one or more columns::
-------------
Querying data is easy. Dataset returns an iterable result object:: # Get one user per country
db['user'].distinct('country')
result = db.query('SELECT ...')
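Roughly speaking, keyword filters like ``country='China'`` correspond to ``WHERE column = value`` clauses, and a distinct lookup corresponds to ``SELECT DISTINCT``. Below is a rough ``sqlite3`` sketch of that translation; the helpers ``find``, ``find_one`` and ``distinct`` here are hypothetical stand-ins that support equality filters only, and the sample rows are made up.

```python
import sqlite3


def find(conn, table, **filters):
    """Yield rows (as dicts) whose columns equal the given keyword values."""
    where = " AND ".join(f'"{c}" = ?' for c in filters) or "1"  # "1" matches every row
    cur = conn.execute(f'SELECT * FROM "{table}" WHERE {where}', list(filters.values()))
    names = [d[0] for d in cur.description]
    for values in cur:
        yield dict(zip(names, values))


def find_one(conn, table, **filters):
    """First matching row, or None if nothing matches."""
    return next(find(conn, table, **filters), None)


def distinct(conn, table, *columns):
    """One dict per unique combination of values in `columns`."""
    cols = ", ".join(f'"{c}"' for c in columns)
    cur = conn.execute(f'SELECT DISTINCT {cols} FROM "{table}"')
    return [dict(zip(columns, values)) for values in cur]


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "user" (name TEXT, country TEXT)')
conn.executemany(
    'INSERT INTO "user" VALUES (?, ?)',
    [("Li", "China"), ("Wang", "China"), ("John Doe", "USA")],
)
chinese_users = list(find(conn, "user", country="China"))
john = find_one(conn, "user", name="John Doe")
```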
Running custom SQL queries
--------------------------
Of course, the main reason you're using a database is that you want to use the full power of SQL queries. Here's how you run them with ``dataset``::
result = db.query('SELECT country, COUNT(*) c FROM user GROUP BY country')
for row in result: for row in result:
print row print row['country'], row['c']
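The example accesses result columns by name (``row['country']``, ``row['c']``). With the standard library alone, ``sqlite3.Row`` gives rows the same kind of lookup. This sketch runs the same ``GROUP BY`` query against a small in-memory table whose contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows now support row["column"] lookups
conn.execute('CREATE TABLE "user" (name TEXT, country TEXT)')
conn.executemany(
    'INSERT INTO "user" VALUES (?, ?)',
    [("Li", "China"), ("Wang", "China"), ("John Doe", "USA")],
)

result = conn.execute(
    'SELECT country, COUNT(*) c FROM "user" GROUP BY country ORDER BY country'
)
# The alias "c" from the query becomes a lookup key on each row.
counts = {row["country"]: row["c"] for row in result}
for country, c in counts.items():
    print(country, c)
```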
Freezing your data
------------------
Exporting data
--------------
While playing around with our database in Python is nice, sometimes we want to use the data or parts of it elsewhere, say in an interactive web application. Therefore, ``dataset`` supports serializing rows of data into static files such as JSON using the :py:meth:`freeze() <dataset.freeze>` function::
# export all users into a single JSON file
result = db['user'].all()
dataset.freeze(result, 'users.json')
You can create one file per row by setting ``mode`` to "item"::
# export one JSON file per user
dataset.freeze(result, 'users/{{ id }}.json', mode='item')
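The two modes can be sketched with the standard library as follows. ``freeze_rows`` is a hypothetical stand-in, not ``dataset.freeze``: it writes either one JSON file holding all rows or one file per row, filling ``{{ column }}`` placeholders in the path from each row with a simple regular expression instead of a real template engine. Naming the all-in-one mode ``"list"`` is an assumption of this sketch.

```python
import json
import os
import re
import tempfile

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def fill_path(template, row):
    """Replace each {{ column }} in `template` with that column's value from `row`."""
    return PLACEHOLDER.sub(lambda m: str(row[m.group(1)]), template)


def freeze_rows(rows, path, mode="list"):
    """Write rows (dicts) as JSON; return the list of files written.

    mode="list": all rows go into a single file at `path`.
    mode="item": one file per row, with `path` filled from that row.
    """
    rows = list(rows)
    if mode == "list":
        targets = [(path, rows)]
    elif mode == "item":
        targets = [(fill_path(path, row), row) for row in rows]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    for target, payload in targets:
        directory = os.path.dirname(target)
        if directory:
            os.makedirs(directory, exist_ok=True)
        with open(target, "w", encoding="utf-8") as fh:
            json.dump(payload, fh)
    return [target for target, _ in targets]


users = [{"id": 1, "name": "John Doe"}, {"id": 2, "name": "Jane Doe"}]
out = tempfile.mkdtemp()
freeze_rows(users, os.path.join(out, "users.json"))
written = freeze_rows(users, os.path.join(out, "users", "{{ id }}.json"), mode="item")
```

Materializing ``rows`` with ``list()`` up front means the sketch works with one-shot iterators, which matters if the same query result is frozen in more than one way.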
Since this is a common operation, we made it available via the command-line utility ``datafreeze``. Read more about the `freezefile markup <https://github.com/spiegelonline/datafreeze#example-freezefileyaml>`_.
.. code-block:: bash
$ datafreeze freezefile.yaml