diff --git a/README.md b/README.md index 1022772..7150117 100644 --- a/README.md +++ b/README.md @@ -1,96 +1,8 @@ -SQLAlchemy Loading Tools -======================== +dataset: databases for lazy people +================================== -A collection of wrappers and functions to make SQLAlchemy core easier -to use in ETL applications. SQLAlchemy is used only for database -abstraction and not as an ORM, allowing users to write extraction -scripts that can work with multiple database backends. Functions -include: +In short, **dataset** makes reading and writing data in databases as simple as reading and writing JSON files. -* **Automatic schema**. If a column is written that does not - exist on the table, it will be created automatically. -* **Upserts**. Records are either created or updated, depdending on - whether an existing version can be found. -* **Query helpers** for simple queries such as all rows in a table or - all distinct values across a set of columns. - -Examples --------- - -A typical use of ``sqlaload`` would look like this: - - from sqlaload import connect, get_table, distinct, update - - engine = connect('sqlite:///customers.db') - table = get_table(engine, 'customers') - for entry in distinct(engine, table, 'post_code', 'city') - lon, lat = geocode(entry['post_code'], entry['city']) - update(entry, {'lon': lon, 'lat': lat}) - -In this example, we selected all distinct post codes and city names from an imaginary customers database, send them through our geocoding routine and finally updated all matching rows with the returned geo information. 
- -Another example, updating data in a datastore, might look like this: - - from sqlaload import connect, get_table, upsert - - engine = connect('sqlite:///things.db') - table = get_table(engine, 'data') - - for item in magic_data_source_that_produces_entries(): - assert 'key1' in item - assert 'key2' in item - # this will either insert or update, depending on - # whether an entry with the matching values for - # 'key1' and 'key2' already exists: - upsert(engine, table, item, ['key1', 'key2']) - - -Here's the same example, but using the object-oriented API: - - import sqlaload - - db = sqlaload.create('sqlite:///things.db') - table = db.get_table('data') - - for item in magic_data_source_that_produces_entries(): - assert 'key1' in item - assert 'key2' in item - table.upsert(item, ['key1', 'key2']) - - -Functions ---------- - -The library currently exposes the following functions: - -**Schema management** - -* ``connect(url)``, connect to a database and return an ``engine``. See the [SQLAlchemy documentation](http://docs.sqlalchemy.org/en/rel_0_8/core/engines.html#database-urls) for information about URL schemes and formats. -* ``get_table(engine, table_name)`` will load a table configuration from the database, either reflecting the existing schema or creating a new table (with an ``id`` column). -* ``create_table(engine, table_name)`` and ``load_table(engine, table_name)`` are more explicit than ``get_table`` but allow the same functions. -* ``drop_table(engine, table_name)`` will remove an existing table, deleting all of its contents. -* ``create_column(engine, table, column_name, type)`` adds a new column to a table, ``type`` must be a SQLAlchemy type class. -* ``create_index(engine, table, columns)`` creates an index on the given table, based on a list of strings to specify the included ``columns``. - -**Queries** - -* ``find(engine, table, _limit=N, _offset=N, order_by='id', **kw)`` will retrieve database records. 
The query will return an iterator that only loads 5000 records at any one time, even if ``_limit`` and ``_offset`` are specified - meaning that ``find`` can be run on tables of arbitrary size. ``order_by`` is a string column name, always returned in ascending order. Finally ``**kw`` can be used to filter columns for equality, e.g. ``find(…, category=5)``. -* ``find_one(engine, table, **kw)``, like ``find`` but will only return the first matching row or ``None`` if no matches were found. -* ``distinct(engine, table, *columns, **kw)`` will return the combined distinct values for ``columns``. ``**kw`` allows filtering the same way it does in ``find``. -* ``all``, alias for ``find`` without filter options. - -**Adding and updating data** - -* ``add_row(engine, table, row, ensure=True, types={})`` add the values in the dictionary ``row`` to the given ``table``. ``ensure`` will check the schema and create the columns if necessary, their types can be specified using the ``types`` dictionary. If no ``types`` are given, the type will be guessed from the first submitted value of the column, defaulting to a text column. -* ``update_row(engine, table, row, unique, ensure=True, types={})`` will update a row or set of rows based on the data in the ``row`` dictionary and the column names specified in ``unique``. The remaining arguments are handled like those in ``add_row``. -* ``upsert(engine, table, row, unique, ensure=True, types={})`` will combine the semantics of ``update_row`` and ``add_row`` by first attempting to update existing data and otherwise (only if no record matching on the ``unique`` keys can be found) creating a new record. -* ``delete(engine, table, **kw)`` will remove records from a table. ``**kw`` is the same as in ``find`` and can be used to limit the set of records to be removed. - - - -Feedback --------- - -Please feel free create issues on the GitHub tracker at [okfn/sqlaload](https://github.com/okfn/sqlaload/issues). 
For other discussions, join the [okfn-labs](http://lists.okfn.org/mailman/listinfo/okfn-labs) mailing list. +[Read the docs](https://dataset.readthedocs.org/) diff --git a/dataset/__init__.py b/dataset/__init__.py index 99007a5..74d6bc1 100644 --- a/dataset/__init__.py +++ b/dataset/__init__.py @@ -8,10 +8,11 @@ from dataset.persistence.table import Table def connect(url): - """ Opens a new connection to a database. *url* can be any valid `SQLAlchemy engine URL`_. Returns - an instance of :py:class:`dataset.Database. - + """ + Opens a new connection to a database. *url* can be any valid `SQLAlchemy engine URL`_. Returns + an instance of :py:class:`Database `. :: + db = dataset.connect('sqlite:///factbook.db') .. _SQLAlchemy Engine URL: http://docs.sqlalchemy.org/en/latest/core/engines.html#sqlalchemy.create_engine diff --git a/dataset/persistence/database.py b/dataset/persistence/database.py index 84de871..8241f05 100644 --- a/dataset/persistence/database.py +++ b/dataset/persistence/database.py @@ -32,15 +32,23 @@ class Database(object): @property def tables(self): - """ Get a listing of all tables that exist in the database. """ + """ Get a listing of all tables that exist in the database. + + >>> print db.tables + set([u'user', u'action']) + """ return set(self.metadata.tables.keys() + self._tables.keys()) def create_table(self, table_name): - """ Creates a new table. The new table will automatically have - an `id` column, which is set to be an auto-incrementing integer - as the primary key of the table. + """ + Creates a new table. The new table will automatically have an `id` column, which is + set to be an auto-incrementing integer as the primary key of the table. - Returns a :py:class:`dataset.Table` instance.""" + Returns a :py:class:`Table ` instance. 
+ :: + + table = db.create_table('population') + """ with self.lock: log.debug("Creating table: %s on %r" % (table_name, self.engine)) table = SQLATable(table_name, self.metadata) @@ -51,12 +59,17 @@ class Database(object): return Table(self, table) def load_table(self, table_name): - """ Loads a table. This will fail if the tables does not already + """ + Loads a table. This will fail if the table does not already exist in the database. If the table exists, its columns will be - reflected and are available on the :py:class:`dataset.Table` + reflected and are available on the :py:class:`Table ` object. - Returns a :py:class:`dataset.Table` instance.""" + Returns a :py:class:`Table ` instance. + :: + + table = db.load_table('population') + """ with self.lock: log.debug("Loading table: %s on %r" % (table_name, self)) table = SQLATable(table_name, self.metadata, autoload=True) @@ -64,9 +77,17 @@ class Database(object): return Table(self, table) def get_table(self, table_name): - """ Loads a table or creates it if it doesn't exist yet. - Returns a :py:class:`dataset.Table` instance. Alternatively to *get_table* - you can also get tables using the dict syntax.""" + """ + Smart wrapper around *load_table* and *create_table*. Either loads a table + or creates it if it doesn't exist yet. + + Returns a :py:class:`Table ` instance. + :: + + table = db.get_table('population') + # you can also use the short-hand syntax: + table = db['population'] + """ with self.lock: if table_name in self._tables: return Table(self, self._tables[table_name]) @@ -79,16 +100,16 @@ class Database(object): return self.get_table(table_name) def query(self, query): - """ Run a statement on the database directly, allowing for the + """ + Run a statement on the database directly, allowing for the execution of arbitrary read/write queries. A query can either be - a plain text string, or a SQLAlchemy expression. The returned + a plain text string, or a `SQLAlchemy expression `_. 
The returned iterator will yield each result sequentially. + :: - .. code-block:: python - - result = db.query('SELECT * FROM population WHERE population > 10000000') - for row in result: - print row + res = db.query('SELECT user, COUNT(*) c FROM photos GROUP BY user') + for row in res: + print row['user'], row['c'] """ return resultiter(self.engine.execute(query)) diff --git a/dataset/persistence/table.py b/dataset/persistence/table.py index 742e5ac..c3565a9 100644 --- a/dataset/persistence/table.py +++ b/dataset/persistence/table.py @@ -17,53 +17,69 @@ class Table(object): self.database = database self.table = table + @property + def columns(self): + """ + Get a listing of all columns that exist in the table. + + >>> print 'age' in table.columns + True + """ + return set(self.table.columns.keys()) + def drop(self): - """ Drop the table from the database, deleting both the schema + """ + Drop the table from the database, deleting both the schema and all the contents within it. - + Note: the object will be in an unusable state after using this command and should not be used again. If you want to re-create the table, make sure to get a fresh instance from the - :py:class:`dataset.Database`. """ + :py:class:`Database `. + """ with self.database.lock: self.database.tables.pop(self.table.name, None) self.table.drop(engine) def insert(self, row, ensure=True, types={}): - """ Add a row (type: dict) by inserting it into the database. + """ + Add a row (type: dict) by inserting it into the table. If ``ensure`` is set, any of the keys of the row are not - table columns, they will be created automatically. - + table columns, they will be created automatically. + During column creation, ``types`` will be checked for a key - matching the name of a column to be created, and the given + matching the name of a column to be created, and the given SQLAlchemy column type will be used. Otherwise, the type is - guessed from the row's value, defaulting to a simple unicode - field. 
""" + guessed from the row value, defaulting to a simple unicode + field. + :: + + data = dict(id=10, title='I am a banana!') + table.insert(data) + """ if ensure: self._ensure_columns(row, types=types) self.database.engine.execute(self.table.insert(row)) - def update(self, row, unique, ensure=True, types={}): - """ Update a row in the database. The update is managed via - the set of column names stated in ``unique``: they will be + def update(self, row, keys, ensure=True, types={}): + """ + Update a row in the table. The update is managed via + the set of column names stated in ``keys``: they will be used as filters for the data to be updated, using the values - in ``row``. Example: - - .. code-block:: python + in ``row``. + :: + # update all entries with id matching 10, setting their title columns data = dict(id=10, title='I am a banana!') table.update(data, ['id']) - This will update all entries matching the given ``id``, setting - their ``title`` column. - - If keys in ``row`` update columns not present in the table, - they will be created based on the settings of ``ensure`` and - ``types``, matching the behaviour of ``insert``. + If keys in ``row`` update columns not present in the table, + they will be created based on the settings of ``ensure`` and + ``types``, matching the behaviour of :py:meth:`insert() `. """ - if not len(unique): + if not len(keys): return False - clause = [(u, row.get(u)) for u in unique] + clause = [(u, row.get(u)) for u in keys] if ensure: self._ensure_columns(row, types=types) try: @@ -71,17 +87,25 @@ class Table(object): stmt = self.table.update(filters, row) rp = self.database.engine.execute(stmt) return rp.rowcount > 0 - except KeyError, ke: + except KeyError: return False - def upsert(self, row, unique, ensure=True, types={}): - if ensure: - self.create_index(unique) + def upsert(self, row, keys, ensure=True, types={}): + """ + An UPSERT is a smart combination of insert and update. 
If rows with matching ``keys`` exist + they will be updated, otherwise a new row is inserted in the table. + :: - if not self.update(row, unique, ensure=ensure, types=types): + data = dict(id=10, title='I am a banana!') + table.upsert(data, ['id']) + """ + if ensure: + self.create_index(keys) + + if not self.update(row, keys, ensure=ensure, types=types): self.insert(row, ensure=ensure, types=types) - def delete(self, **kw): + def delete(self, **filter): """ Delete rows from the table. Keyword arguments can be used to add column-based filters. The filter criterion will always be equality: @@ -92,7 +116,7 @@ class Table(object): If no arguments are given, all records are deleted. """ - q = self._args_to_clause(kw) + q = self._args_to_clause(filter) stmt = self.table.delete(q) self.database.engine.execute(stmt) @@ -102,8 +126,8 @@ class Table(object): _type = types[column] else: _type = guess_type(row[column]) - log.debug("Creating column: %s (%s) on %r" % (column, - _type, self.table.name)) + log.debug("Creating column: %s (%s) on %r" % (column, + _type, self.table.name)) self.create_column(column, _type) def _args_to_clause(self, args): @@ -114,13 +138,26 @@ class Table(object): return and_(*clauses) def create_column(self, name, type): + """ + Explicitly create a new column ``name`` of a specified type. + ``type`` must be a `SQLAlchemy column type `_. + :: + + table.create_column('person', sqlalchemy.String) + """ with self.database.lock: if name not in self.table.columns.keys(): col = Column(name, type) col.create(self.table, - connection=self.database.engine) + connection=self.database.engine) def create_index(self, columns, name=None): + """ + Create an index to speed up queries on a table. If no ``name`` is given, a name is derived from a hash of the column names. 
+ :: + + table.create_index(['name', 'country']) + """ with self.database.lock: if not name: sig = abs(hash('||'.join(columns))) @@ -136,53 +173,119 @@ class Table(object): self.indexes[name] = idx return idx - def find_one(self, **kw): - res = list(self.find(_limit=1, **kw)) + def find_one(self, **filter): + """ + Works just like :py:meth:`find() ` but returns only one result. + :: + + row = table.find_one(country='United States') + """ + res = list(self.find(_limit=1, **filter)) if not len(res): return None return res[0] - def find(self, _limit=None, _step=5000, _offset=0, - order_by='id', **kw): - order_by = [self.table.c[order_by].asc()] - args = self._args_to_clause(kw) + def _args_to_order_by(self, order_by): + if order_by[0] == '-': + return self.table.c[order_by[1:]].desc() + else: + return self.table.c[order_by].asc() + + def find(self, _limit=None, _offset=0, _step=5000, + order_by='id', **filter): + """ + Performs a simple search on the table. Simply pass keyword arguments as ``filter``. + :: + + results = table.find(country='France') + results = table.find(country='France', year=1980) + + Using ``_limit``:: + + # just return the first 10 rows + results = table.find(country='France', _limit=10) + + You can sort the results by single or multiple columns. 
Prepend a minus sign + to the column name for descending order:: + + # sort results by a column 'year' + results = table.find(country='France', order_by='year') + # return all rows sorted by multiple columns (by year in descending order) + results = table.find(order_by=['country', '-year']) + + For more complex queries, please use :py:meth:`db.query() ` + instead.""" + if isinstance(order_by, (str, unicode)): + order_by = [order_by] + order_by = [self._args_to_order_by(o) for o in order_by] + + args = self._args_to_clause(filter) for i in count(): qoffset = _offset + (_step * i) qlimit = _step if _limit is not None: - qlimit = min(_limit-(_step*i), _step) + qlimit = min(_limit - (_step * i), _step) if qlimit <= 0: break q = self.table.select(whereclause=args, limit=qlimit, - offset=qoffset, order_by=order_by) + offset=qoffset, order_by=order_by) rows = list(self.database.query(q)) if not len(rows): - return + return for row in rows: yield row def __len__(self): + """ + Returns the number of rows in the table. + """ d = self.database.query(self.table.count()).next() return d.values().pop() - def distinct(self, *columns, **kw): + def distinct(self, *columns, **filter): + """ + Returns all rows of a table, but removes rows with duplicate values in ``columns``. + Internally this creates a `DISTINCT statement `_. 
+ :: + + # returns only one row per year, ignoring the rest + table.distinct('year') + # works with multiple columns, too + table.distinct('year', 'country') + # you can also combine this with a filter + table.distinct('year', country='China') + """ qargs = [] try: columns = [self.table.c[c] for c in columns] - for col, val in kw.items(): - qargs.append(self.table.c[col]==val) + for col, val in filter.items(): + qargs.append(self.table.c[col] == val) except KeyError: return [] q = expression.select(columns, distinct=True, - whereclause=and_(*qargs), - order_by=[c.asc() for c in columns]) + whereclause=and_(*qargs), + order_by=[c.asc() for c in columns]) return self.database.query(q) def all(self): - """ Return all records in the table, ordered by their - ``id``. This is an alias for calling ``find`` without - any arguments. """ + """ + Returns all rows of the table as simple dictionaries. This is simply a shortcut + to *find()* called with no arguments. + :: + + rows = table.all()""" return self.find() + def __iter__(self): + """ + Allows for iterating over all rows in the table without explicitly + calling :py:meth:`all() `. + :: + + for row in table: + print row + """ + for row in self.all(): + yield row diff --git a/docs/_static/dataset-logo.png b/docs/_static/dataset-logo.png new file mode 100644 index 0000000..aa7e3b7 Binary files /dev/null and b/docs/_static/dataset-logo.png differ diff --git a/docs/_static/knight_mozilla_on.jpg b/docs/_static/knight_mozilla_on.jpg new file mode 100644 index 0000000..90f256a Binary files /dev/null and b/docs/_static/knight_mozilla_on.jpg differ diff --git a/docs/_themes/README.rst b/docs/_themes/README.md similarity index 83% rename from docs/_themes/README.rst rename to docs/_themes/README.md index dd8d7c0..91f0d6d 100755 --- a/docs/_themes/README.rst +++ b/docs/_themes/README.md @@ -6,14 +6,14 @@ his projects. It is a derivative of Mitsuhiko's themes for Flask and Flask relat projects. 
To use this style in your Sphinx documentation, follow this guide: -1. put this folder as _themes into your docs folder. Alternatively +1. put this folder as _themes into your docs folder. Alternatively you can also use git submodules to check out the contents there. -2. add this to your conf.py: :: +2. add this to your conf.py: sys.path.append(os.path.abspath('_themes')) html_theme_path = ['_themes'] - html_theme = 'flask' + html_theme = 'kr' The following themes exist: diff --git a/docs/_themes/kr/autotoc.html b/docs/_themes/kr/autotoc.html new file mode 100644 index 0000000..3e1ab00 --- /dev/null +++ b/docs/_themes/kr/autotoc.html @@ -0,0 +1,25 @@ +

{{ _('Table Of Contents') }}

+ + +
    + + + diff --git a/docs/_themes/kr/layout.html b/docs/_themes/kr/layout.html index 4f6bf1d..7b67150 100755 --- a/docs/_themes/kr/layout.html +++ b/docs/_themes/kr/layout.html @@ -1,71 +1,45 @@ {%- extends "basic/layout.html" %} {%- block extrahead %} {{ super() }} + {% if theme_touch_icon %} {% endif %} {% endblock %} -{%- block relbar2 %}{% endblock %} + + +{% block sidebar2 %} + {{ sidebar() }} +{% endblock %} + {%- block footer %} -