Quickstart
==========
Hi, welcome to the twelve-minute quick-start tutorial.

Connecting to a database
------------------------

First you need to import the dataset package :) ::

    import dataset

To connect to a database you need to identify it by its `URL <http://docs.sqlalchemy.org/en/latest/core/engines.html#engine-creation-api>`_, which basically is a string of the form ``"dialect://user:password@host/dbname"``. Here are a few examples for different database backends::

    # connecting to a SQLite database
    db = dataset.connect('sqlite:///mydatabase.db')

    # connecting to a MySQL database with user and password
    db = dataset.connect('mysql://user:password@localhost/mydatabase')

    # connecting to a PostgreSQL database
    db = dataset.connect('postgresql://scott:tiger@localhost:5432/mydatabase')
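
To make the parts of such a URL easier to see, here is a small sketch that splits one with the standard library. The helper ``describe_url`` is hypothetical and only for illustration; it is not how dataset or SQLAlchemy parse URLs internally.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split a database URL into its visible parts (illustrative helper only)."""
    parts = urlsplit(url)
    return {
        'dialect': parts.scheme,
        'user': parts.username,
        'password': parts.password,
        'host': parts.hostname,
        'port': parts.port,
        # The path starts with '/', which is not part of the database name.
        'dbname': parts.path.lstrip('/'),
    }


info = describe_url('postgresql://scott:tiger@localhost:5432/mydatabase')
print(info['dialect'], info['user'], info['host'], info['port'], info['dbname'])

# SQLite URLs have no user or host, only a file path.
sqlite_info = describe_url('sqlite:///mydatabase.db')
print(sqlite_info['dialect'], sqlite_info['dbname'])
```

The three slashes in the SQLite URL follow from this layout: the host part between ``//`` and the next ``/`` is simply empty.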

It is also possible to define the URL as an environment variable called ``DATABASE_URL``,
so you can initialize the database connection without explicitly passing a URL::

    db = dataset.connect()

Depending on which database you're using, you may also have to install
the database bindings to support that database. SQLite is included in
the Python core, but PostgreSQL requires ``psycopg2`` to be installed.
MySQL can be enabled by installing a MySQLdb-compatible driver such as ``mysqlclient``.

Storing data
------------

To store some data you need to get a reference to a table. You don't need
to worry about whether the table already exists or not, since dataset
will create it automatically::

    # get a reference to the table 'user'
    table = db['user']

Now storing data in a table is a matter of a single function call. Just
pass a `dict`_ to *insert*. Note that you don't need to create the columns
*name* and *age* – dataset will do this automatically::

    # Insert a new record.
    table.insert(dict(name='John Doe', age=46, country='China'))

    # dataset will create "missing" columns any time you insert a dict with an unknown key
    table.insert(dict(name='Jane Doe', age=37, country='France', gender='female'))

.. _dict: http://docs.python.org/2/library/stdtypes.html#dict

Updating existing entries is easy, too::

    table.update(dict(name='John Doe', age=47), ['name'])

The list of filter columns given as the second argument selects the rows
to update, using the values of those columns in the first argument. If you
don't want to match on a particular value, just use the auto-generated
``id`` column.

Using Transactions
------------------

You can group a set of database updates in a transaction. In that case, all updates
are committed at once or, if an exception is raised, all of them are reverted. Transactions
are supported through a context manager, so they can be used through a ``with``
statement::

    with dataset.connect() as tx:
        tx['user'].insert(dict(name='John Doe', age=46, country='China'))

You can get the same functionality by invoking the methods :py:meth:`begin() <dataset.Database.begin>`,
:py:meth:`commit() <dataset.Database.commit>` and :py:meth:`rollback() <dataset.Database.rollback>`
explicitly::

    db = dataset.connect()
    db.begin()
    try:
        db['user'].insert(dict(name='John Doe', age=46, country='China'))
        db.commit()
    except:
        db.rollback()

Nested transactions are supported too::

    db = dataset.connect()
    with db as tx1:
        tx1['user'].insert(dict(name='John Doe', age=46, country='China'))
        with db as tx2:
            tx2['user'].insert(dict(name='Jane Doe', age=37, country='France', gender='female'))

Inspecting databases and tables
-------------------------------

When dealing with unknown databases we might want to check their structure
first. To start exploring, let's find out what tables are stored in the
database:

   >>> print(db.tables)
   [u'user']

Now, let's list all columns available in the table ``user``:

   >>> print(db['user'].columns)
   [u'id', u'country', u'age', u'name', u'gender']

Using ``len()`` we can get the total number of rows in a table:

   >>> print(len(db['user']))
   2

Reading data from tables
------------------------

Now let's get some real data out of the table::

    users = db['user'].all()

If we simply want to iterate over all rows in a table, we can omit :py:meth:`all() <dataset.Table.all>`::

    for user in db['user']:
       print(user['age'])

We can search for specific entries using :py:meth:`find() <dataset.Table.find>` and
:py:meth:`find_one() <dataset.Table.find_one>`::

    # All users from China
    chinese_users = table.find(country='China')

    # Get a specific user
    john = table.find_one(name='John Doe')

    # Find by comparison
    elderly_users = table.find(table.table.columns.age >= 70)

Using :py:meth:`distinct() <dataset.Table.distinct>` we can grab a set of rows
with unique values in one or more columns::

    # Get one user per country
    db['user'].distinct('country')

Finally, you can use the ``row_type`` parameter to choose the data type in which
results will be returned::

    import dataset
    from stuf import stuf

    db = dataset.connect('sqlite:///mydatabase.db', row_type=stuf)

Now contents will be returned in ``stuf`` objects (basically, ``dict``
objects whose elements can be accessed as attributes (``item.name``) as well as
by index (``item['name']``)).

Running custom SQL queries
--------------------------

Of course the main reason you're using a database is that you want to
use the full power of SQL queries. Here's how you run them with ``dataset``::

    result = db.query('SELECT country, COUNT(*) c FROM user GROUP BY country')
    for row in result:
       print(row['country'], row['c'])

The :py:meth:`query() <dataset.Database.query>` method can also be used to
access the underlying `SQLAlchemy core API <http://docs.sqlalchemy.org/en/latest/orm/query.html#the-query-object>`_, which allows for the
programmatic construction of more complex queries::

    table = db['user'].table
    statement = table.select().where(table.c.name.like('%John%'))
    result = db.query(statement)