Copy edit some of the docs.

Friedrich Lindenberg 2013-04-03 22:47:28 +02:00
parent f3533de1a7
commit 75230128f2
2 changed files with 48 additions and 19 deletions


@@ -10,11 +10,12 @@ dataset: databases for lazy people
:hidden:
Although managing data in relational databases has plenty of benefits, they're rarely used in day-to-day work with small to medium scale datasets. But why is that? Why do we see an awful lot of data stored in static files in CSV or JSON format, even though they are hard to query and update incrementally?
The answer is that **programmers are lazy**, and thus they tend to prefer the easiest solution they find. And in **Python**, a database isn't the simplest solution for storing a bunch of structured data. This is what **dataset** is going to change!
In short, dataset combines the straightforwardness of JSON files or a NoSQL store with the full power and flexibility of relational databases.
::


@@ -12,7 +12,7 @@ At first you need to import the dataset package :) ::
    import dataset
To connect to a database you need to identify it by its `URL <http://docs.sqlalchemy.org/en/latest/core/engines.html#engine-creation-api>`_, which basically is a string of the form ``"dialect://user:password@host/dbname"``. Here are a few examples for different database backends::
    # connecting to a SQLite database
    db = dataset.connect('sqlite:///mydatabase.db')
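
    # a small extra sketch, not in the original docs: SQLAlchemy also
    # accepts an in-memory SQLite URL, which is handy for quick experiments
    db = dataset.connect('sqlite:///:memory:')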
@@ -23,16 +23,25 @@ To connect to a database you need to identify it by its `URL <http://docs.sqlalc
    # connecting to a PostgreSQL database
    db = dataset.connect('postgresql://scott:tiger@localhost:5432/mydatabase')
Depending on which database you're using, you may also have to install
the appropriate database bindings. SQLite support is included in
the Python core, but PostgreSQL requires ``psycopg2`` to be installed.
MySQL can be enabled by installing the ``MySQLdb`` driver.
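
For completeness, a MySQL connection then looks much the same; the URL below is
a sketch following the same SQLAlchemy URL scheme, with placeholder credentials::

    # connecting to a MySQL database (placeholder user, password and database name)
    db = dataset.connect('mysql://user:password@localhost/mydatabase')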
Storing data
------------
To store some data you need to get a reference to a table. You don't need
to worry about whether the table already exists or not, since dataset
will create it automatically::
    # get a reference to the table 'person'
    table = db['person']
Now storing data in a table is a matter of a single function call. Just
pass a `dict`_ to *insert*. Note that you don't need to create the columns
*name* and *age*; dataset will do this automatically::
    # Insert a new record.
    table.insert(dict(name='John Doe', age=46))
@@ -40,19 +49,22 @@ Now storing data in a table is a matter of a single function call. Just pass a `
    # dataset will create "missing" columns any time you insert a dict with an unknown key
    table.insert(dict(name='Jane Doe', age=37, gender='female'))
    # If you need to insert many items at once, you can speed things up by using insert_many:
    table.insert_many(list_of_persons)
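    # where list_of_persons is simply a list of dicts, e.g. (made-up data):
    # list_of_persons = [dict(name='Alice', age=33), dict(name='Bob', age=41)]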
.. _dict: http://docs.python.org/2/library/stdtypes.html#dict
Updating existing entries is easy, too::
    table.update(dict(name='John Doe', age=47), ['name'])
The list of filter columns given as the second argument is used to select
the rows to be updated, matching on the values those columns carry in the
first argument. If you don't want to filter on a particular value, just use
the auto-generated ``id`` column.
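
For example, here's a sketch (with a made-up row id) that filters on ``id``
instead of ``name``::

    # update whatever row currently has id 3
    table.update(dict(id=3, name='John Doe', age=47), ['id'])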
Inspecting databases and tables
-------------------------------
When dealing with unknown databases we might want to check their structure
first. To start exploring, let's find out what tables are stored in the
database:
    >>> print db.tables
    set([u'user', u'action'])
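
To dig one level deeper, a table's column names can be inspected as well;
this sketch assumes the ``columns`` property on tables (an assumption not
covered on this page, so check your dataset version), and the output is
made up:

    >>> print db['user'].columns
    [u'id', u'name', u'email', u'country']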
@@ -74,12 +86,13 @@ Now let's get some real data out of the table::
    users = db['user'].all()
If we simply want to iterate over all rows in a table, we can omit :py:meth:`all() <dataset.Table.all>`::
    for user in db['user']:
        print user['email']
We can search for specific entries using :py:meth:`find() <dataset.Table.find>` and
:py:meth:`find_one() <dataset.Table.find_one>`::
    # All users from China
    users = table.find(country='China')
@@ -87,7 +100,8 @@ We can search for specific entries using :py:meth:`find() <dataset.Table.find>`
    # Get a specific user
    john = table.find_one(name='John Doe')
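
Several keyword arguments can also be combined, in which case each one adds
another equality filter; a quick sketch with made-up values::

    # All 37-year-old users from China
    users = table.find(country='China', age=37)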
Using :py:meth:`distinct() <dataset.Table.distinct>` we can grab a set of rows
with unique values in one or more columns::
    # Get one user per country
    db['user'].distinct('country')
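
Passing several column names to ``distinct()`` should return one row per
unique combination of values; a brief sketch::

    # Get one user per unique combination of country and gender
    db['user'].distinct('country', 'gender')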
@@ -96,29 +110,43 @@ Using :py:meth:`distinct() <dataset.Table.distinct>` we can grab a set of rows
Running custom SQL queries
--------------------------
Of course the main reason you're using a database is that you want to
use the full power of SQL queries. Here's how you run them with ``dataset``::
    result = db.query('SELECT country, COUNT(*) c FROM user GROUP BY country')
    for row in result:
        print row['country'], row['c']
The :py:meth:`query() <dataset.Database.query>` method can also be used to
access the underlying SQLAlchemy core API, which allows for the
programmatic construction of more complex queries::
    table = db['users'].table
    statement = table.select(table.c.name.like('%Snoopy%'))
    result = db.query(statement)
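
Any expression assembled through the core API can be passed to ``query()``
in the same way. As an illustration (a sketch, not from the original docs),
here is a query combining two conditions with SQLAlchemy's ``and_``::

    from sqlalchemy import and_

    # users whose name matches 'Snoopy' and who are younger than 30
    statement = table.select(and_(table.c.name.like('%Snoopy%'),
                                  table.c.age < 30))
    result = db.query(statement)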
Exporting data
--------------
While playing around with our database in Python is a nice thing, the data
is often just passing through a processing stage on its way to another
place, say an interactive web application. To make this seamless,
``dataset`` supports serializing rows of data into static JSON and CSV files
using the :py:meth:`freeze() <dataset.freeze>` function::
    # export all users into a single JSON
    result = db['users'].all()
    dataset.freeze(result, 'users.json', format='json')
You can create one file per row by setting ``mode`` to "item"::
    # export one JSON file per user
    dataset.freeze(result, 'users/{{ id }}.json', format='json', mode='item')
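
Since ``freeze()`` also understands CSV (as mentioned above), switching the
``format`` argument is all it takes to get a spreadsheet-friendly export;
a short sketch::

    # export all users into a single CSV file
    dataset.freeze(result, 'users.csv', format='csv')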
Since this is a common operation we made it available via the command line
utility ``datafreeze``. Read more about the `freezefile markup <https://github.com/spiegelonline/datafreeze#example-freezefileyaml>`_.
.. code-block:: bash