Quickstart
==========
Hi, welcome to the twelve-minute quick-start tutorial.

Connecting to a database
------------------------
First, you need to import the dataset package :) ::

    import dataset

To connect to a database you need to identify it by its `URL <http://docs.sqlalchemy.org/en/latest/core/engines.html#engine-creation-api>`_, which is a string of the form ``"dialect://user:password@host/dbname"``. Here are a few examples for different database backends::

    # connecting to a SQLite database
    db = dataset.connect('sqlite:///mydatabase.db')

    # connecting to a MySQL database with user and password
    db = dataset.connect('mysql://user:password@localhost/mydatabase')

    # connecting to a PostgreSQL database
    db = dataset.connect('postgresql://scott:tiger@localhost:5432/mydatabase')
It is also possible to define the `URL` as an environment variable called `DATABASE_URL`
so you can initialize the database connection without explicitly passing a `URL`::

    db = dataset.connect()
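
For example, here is a minimal sketch that sets the variable from within
Python before connecting; in practice you would normally export
``DATABASE_URL`` in your shell or service configuration instead::

    import os
    import dataset

    # dataset falls back to the DATABASE_URL environment variable when
    # connect() is called without an explicit URL
    os.environ['DATABASE_URL'] = 'sqlite:///mydatabase.db'
    db = dataset.connect()
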
Depending on which database you're using, you may also have to install
the database bindings to support that database. SQLite is included in
the Python core, but PostgreSQL requires ``psycopg2`` to be installed.
MySQL can be enabled by installing the ``mysql-db`` drivers.

Storing data
------------
To store some data you need to get a reference to a table. You don't need
to worry about whether the table already exists or not, since dataset
will create it automatically::

    # get a reference to the table 'user'
    table = db['user']
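
If you prefer to be explicit, the same reference can be obtained through
methods on the database object. This is a small sketch assuming dataset's
``get_table()`` and ``load_table()`` helpers: the former creates the table
when it is missing, while the latter expects it to exist already::

    # equivalent to db['user']; creates the table if it does not exist yet
    table = db.get_table('user')

    # only loads an existing table and fails if it is missing
    table = db.load_table('user')
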
Now storing data in a table is a matter of a single function call. Just
pass a `dict`_ to *insert*. Note that you don't need to create the columns
*name* and *age* – dataset will do this automatically::

    # Insert a new record.
    table.insert(dict(name='John Doe', age=46, country='China'))

    # dataset will create "missing" columns any time you insert a dict with an unknown key
    table.insert(dict(name='Jane Doe', age=37, country='France', gender='female'))
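
``insert()`` also returns the primary key of the newly created row, which is
handy if you want to refer to the record later on; a minimal sketch, reusing
the table from above::

    # the return value is the auto-generated id of the new row
    john_id = table.insert(dict(name='John Doe', age=46, country='China'))
    print(john_id)
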
.. _dict: http://docs.python.org/2/library/stdtypes.html#dict

Updating existing entries is easy, too::

    table.update(dict(name='John Doe', age=47), ['name'])
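
For instance, to change a single row regardless of its name, you can filter
on the auto-generated ``id`` column instead (a small sketch; the id value
``1`` is only an example)::

    # update the row whose id is 1, whatever its current name may be
    table.update(dict(id=1, age=48), ['id'])
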
The list of filter columns given as the second argument determines which
rows to update: rows are matched against the values those columns have in
the data dict passed as the first argument. If you don't want to filter on
a particular column, just use the auto-generated ``id`` column.

Using Transactions
------------------
You can group a set of database updates in a transaction. In that case, all
updates are committed at once or, in case of an exception, all of them are
reverted. Transactions are supported through a context manager, so they can
be used through a ``with`` statement::

    with dataset.connect() as tx:
        tx['user'].insert(dict(name='John Doe', age=46, country='China'))

You can get the same functionality by invoking the methods :py:meth:`begin() <dataset.Database.begin>`,
:py:meth:`commit() <dataset.Database.commit>` and :py:meth:`rollback() <dataset.Database.rollback>`
explicitly::

    db = dataset.connect()
    db.begin()
    try:
        db['user'].insert(dict(name='John Doe', age=46, country='China'))
        db.commit()
    except:
        db.rollback()

Nested transactions are supported too::

    db = dataset.connect()
    with db as tx1:
        tx1['user'].insert(dict(name='John Doe', age=46, country='China'))
        with db as tx2:
            tx2['user'].insert(dict(name='Jane Doe', age=37, country='France', gender='female'))

Inspecting databases and tables
-------------------------------
When dealing with unknown databases we might want to check their structure
first. To start exploring, let's find out what tables are stored in the
database:

    >>> print(db.tables)
    [u'user']

Now, let's list all columns available in the table ``user``:

    >>> print(db['user'].columns)
    [u'id', u'country', u'age', u'name', u'gender']

Using ``len()`` we can get the total number of rows in a table:

    >>> print(len(db['user']))
    2

Reading data from tables
------------------------
Now let's get some real data out of the table::

    users = db['user'].all()

If we simply want to iterate over all rows in a table, we can omit :py:meth:`all() <dataset.Table.all>`::

    for user in db['user']:
        print(user['age'])

We can search for specific entries using :py:meth:`find() <dataset.Table.find>` and
:py:meth:`find_one() <dataset.Table.find_one>`::

    # All users from China
    chinese_users = table.find(country='China')

    # Get a specific user
    john = table.find_one(name='John Doe')
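
``find()`` also accepts a few optional arguments for sorting and paging; a
sketch, assuming dataset's ``order_by`` and ``_limit`` keywords and the
columns created above::

    # the ten oldest users, oldest first
    oldest = table.find(order_by='-age', _limit=10)
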
Using :py:meth:`distinct() <dataset.Table.distinct>` we can grab a set of rows
with unique values in one or more columns::

    # Get one user per country
    db['user'].distinct('country')

Running custom SQL queries
--------------------------
Of course the main reason you're using a database is that you want to
use the full power of SQL queries. Here's how you run them with ``dataset``::

    result = db.query('SELECT country, COUNT(*) c FROM user GROUP BY country')

    for row in result:
        print(row['country'], row['c'])

The :py:meth:`query() <dataset.Database.query>` method can also be used to
access the underlying `SQLAlchemy core API <http://docs.sqlalchemy.org/en/latest/orm/query.html#the-query-object>`_, which allows for the
programmatic construction of more complex queries::

    table = db['user'].table
    statement = table.select(table.c.name.like('%John%'))
    result = db.query(statement)

Exporting data
--------------
While playing around with our database in Python is a nice thing, it is
sometimes just a processing stage before we go on to use the data in another
place, say in an interactive web application. To make this seamless,
``dataset`` supports serializing rows of data into static JSON and CSV files
using the :py:meth:`freeze() <dataset.freeze>` function::

    # export all users into a single JSON
    result = db['user'].all()
    dataset.freeze(result, format='json', filename='users.json')

You can create one file per row by setting ``mode`` to "item"::

    # export one JSON file per user
    dataset.freeze(result, format='json', filename='users/{{ id }}.json', mode='item')
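
Exporting to CSV works the same way; only the ``format`` argument changes::

    # export all users into a single CSV file
    dataset.freeze(result, format='csv', filename='users.csv')
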
Since this is a common operation, we made it available via the command-line
utility ``datafreeze``. Read more about the :doc:`freezefile markup <freezefile>`.

.. code-block:: bash

    $ datafreeze freezefile.yaml