diff options
author | Nishanth Amuluru | 2011-01-08 11:20:57 +0530 |
---|---|---|
committer | Nishanth Amuluru | 2011-01-08 11:20:57 +0530 |
commit | 65411d01d448ff0cd4abd14eee14cf60b5f8fc20 (patch) | |
tree | b4c404363c4c63a61d6e2f8bd26c5b057c1fb09d /parts/django/docs/topics/db/aggregation.txt | |
parent | 2e35094d43b4cc6974172e1febf76abb50f086ec (diff) | |
download | pytask-65411d01d448ff0cd4abd14eee14cf60b5f8fc20.tar.gz pytask-65411d01d448ff0cd4abd14eee14cf60b5f8fc20.tar.bz2 pytask-65411d01d448ff0cd4abd14eee14cf60b5f8fc20.zip |
Added buildout stuff and made changes accordingly
--HG--
rename : profile/management/__init__.py => eggs/djangorecipe-0.20-py2.6.egg/EGG-INFO/dependency_links.txt
rename : profile/management/__init__.py => eggs/djangorecipe-0.20-py2.6.egg/EGG-INFO/not-zip-safe
rename : profile/management/__init__.py => eggs/infrae.subversion-1.4.5-py2.6.egg/EGG-INFO/dependency_links.txt
rename : profile/management/__init__.py => eggs/infrae.subversion-1.4.5-py2.6.egg/EGG-INFO/not-zip-safe
rename : profile/management/__init__.py => eggs/mercurial-1.7.3-py2.6-linux-x86_64.egg/EGG-INFO/dependency_links.txt
rename : profile/management/__init__.py => eggs/mercurial-1.7.3-py2.6-linux-x86_64.egg/EGG-INFO/not-zip-safe
rename : profile/management/__init__.py => eggs/py-1.4.0-py2.6.egg/EGG-INFO/dependency_links.txt
rename : profile/management/__init__.py => eggs/py-1.4.0-py2.6.egg/EGG-INFO/not-zip-safe
rename : profile/management/__init__.py => eggs/zc.buildout-1.5.2-py2.6.egg/EGG-INFO/dependency_links.txt
rename : profile/management/__init__.py => eggs/zc.buildout-1.5.2-py2.6.egg/EGG-INFO/not-zip-safe
rename : profile/management/__init__.py => eggs/zc.recipe.egg-1.3.2-py2.6.egg/EGG-INFO/dependency_links.txt
rename : profile/management/__init__.py => eggs/zc.recipe.egg-1.3.2-py2.6.egg/EGG-INFO/not-zip-safe
rename : profile/management/__init__.py => parts/django/Django.egg-info/dependency_links.txt
rename : taskapp/models.py => parts/django/django/conf/app_template/models.py
rename : taskapp/tests.py => parts/django/django/conf/app_template/tests.py
rename : taskapp/views.py => parts/django/django/conf/app_template/views.py
rename : taskapp/views.py => parts/django/django/contrib/gis/tests/geo3d/views.py
rename : profile/management/__init__.py => parts/django/tests/modeltests/delete/__init__.py
rename : profile/management/__init__.py => parts/django/tests/modeltests/files/__init__.py
rename : profile/management/__init__.py => parts/django/tests/modeltests/invalid_models/__init__.py
rename : profile/management/__init__.py => parts/django/tests/modeltests/m2m_signals/__init__.py
rename : profile/management/__init__.py => parts/django/tests/modeltests/model_package/__init__.py
rename : profile/management/__init__.py => parts/django/tests/regressiontests/bash_completion/__init__.py
rename : profile/management/__init__.py => parts/django/tests/regressiontests/bash_completion/management/__init__.py
rename : profile/management/__init__.py => parts/django/tests/regressiontests/bash_completion/management/commands/__init__.py
rename : profile/management/__init__.py => parts/django/tests/regressiontests/bash_completion/models.py
rename : profile/management/__init__.py => parts/django/tests/regressiontests/delete_regress/__init__.py
rename : profile/management/__init__.py => parts/django/tests/regressiontests/file_storage/__init__.py
rename : profile/management/__init__.py => parts/django/tests/regressiontests/max_lengths/__init__.py
rename : profile/forms.py => pytask/profile/forms.py
rename : profile/management/__init__.py => pytask/profile/management/__init__.py
rename : profile/management/commands/seed_db.py => pytask/profile/management/commands/seed_db.py
rename : profile/models.py => pytask/profile/models.py
rename : profile/templatetags/user_tags.py => pytask/profile/templatetags/user_tags.py
rename : taskapp/tests.py => pytask/profile/tests.py
rename : profile/urls.py => pytask/profile/urls.py
rename : profile/utils.py => pytask/profile/utils.py
rename : profile/views.py => pytask/profile/views.py
rename : static/css/base.css => pytask/static/css/base.css
rename : taskapp/tests.py => pytask/taskapp/tests.py
rename : taskapp/views.py => pytask/taskapp/views.py
rename : templates/base.html => pytask/templates/base.html
rename : templates/profile/browse_notifications.html => pytask/templates/profile/browse_notifications.html
rename : templates/profile/edit.html => pytask/templates/profile/edit.html
rename : templates/profile/view.html => pytask/templates/profile/view.html
rename : templates/profile/view_notification.html => pytask/templates/profile/view_notification.html
rename : templates/registration/activate.html => pytask/templates/registration/activate.html
rename : templates/registration/activation_email.txt => pytask/templates/registration/activation_email.txt
rename : templates/registration/activation_email_subject.txt => pytask/templates/registration/activation_email_subject.txt
rename : templates/registration/logged_out.html => pytask/templates/registration/logged_out.html
rename : templates/registration/login.html => pytask/templates/registration/login.html
rename : templates/registration/logout.html => pytask/templates/registration/logout.html
rename : templates/registration/password_change_done.html => pytask/templates/registration/password_change_done.html
rename : templates/registration/password_change_form.html => pytask/templates/registration/password_change_form.html
rename : templates/registration/password_reset_complete.html => pytask/templates/registration/password_reset_complete.html
rename : templates/registration/password_reset_confirm.html => pytask/templates/registration/password_reset_confirm.html
rename : templates/registration/password_reset_done.html => pytask/templates/registration/password_reset_done.html
rename : templates/registration/password_reset_email.html => pytask/templates/registration/password_reset_email.html
rename : templates/registration/password_reset_form.html => pytask/templates/registration/password_reset_form.html
rename : templates/registration/registration_complete.html => pytask/templates/registration/registration_complete.html
rename : templates/registration/registration_form.html => pytask/templates/registration/registration_form.html
rename : utils.py => pytask/utils.py
Diffstat (limited to 'parts/django/docs/topics/db/aggregation.txt')
-rw-r--r-- | parts/django/docs/topics/db/aggregation.txt | 378 |
1 files changed, 378 insertions, 0 deletions
diff --git a/parts/django/docs/topics/db/aggregation.txt b/parts/django/docs/topics/db/aggregation.txt new file mode 100644 index 0000000..eb21021 --- /dev/null +++ b/parts/django/docs/topics/db/aggregation.txt @@ -0,0 +1,378 @@ +=========== +Aggregation +=========== + +.. versionadded:: 1.1 + +.. currentmodule:: django.db.models + +The topic guide on :doc:`Django's database-abstraction API </topics/db/queries>` +described the way that you can use Django queries that create, +retrieve, update and delete individual objects. However, sometimes you will +need to retrieve values that are derived by summarizing or *aggregating* a +collection of objects. This topic guide describes the ways that aggregate values +can be generated and returned using Django queries. + +Throughout this guide, we'll refer to the following models. These models are +used to track the inventory for a series of online bookstores: + +.. _queryset-model-example: + +.. code-block:: python + + class Author(models.Model): + name = models.CharField(max_length=100) + age = models.IntegerField() + friends = models.ManyToManyField('self', blank=True) + + class Publisher(models.Model): + name = models.CharField(max_length=300) + num_awards = models.IntegerField() + + class Book(models.Model): + isbn = models.CharField(max_length=9) + name = models.CharField(max_length=300) + pages = models.IntegerField() + price = models.DecimalField(max_digits=10, decimal_places=2) + rating = models.FloatField() + authors = models.ManyToManyField(Author) + publisher = models.ForeignKey(Publisher) + pubdate = models.DateField() + + class Store(models.Model): + name = models.CharField(max_length=300) + books = models.ManyToManyField(Book) + + +Generating aggregates over a QuerySet +===================================== + +Django provides two ways to generate aggregates. The first way is to generate +summary values over an entire ``QuerySet``. For example, say you wanted to +calculate the average price of all books available for sale. Django's query +syntax provides a means for describing the set of all books:: + + >>> Book.objects.all() + +What we need is a way to calculate summary values over the objects that +belong to this ``QuerySet``. This is done by appending an ``aggregate()`` +clause onto the ``QuerySet``:: + + >>> from django.db.models import Avg + >>> Book.objects.all().aggregate(Avg('price')) + {'price__avg': 34.35} + +The ``all()`` is redundant in this example, so this could be simplified to:: + + >>> Book.objects.aggregate(Avg('price')) + {'price__avg': 34.35} + +The argument to the ``aggregate()`` clause describes the aggregate value that +we want to compute - in this case, the average of the ``price`` field on the +``Book`` model. A list of the aggregate functions that are available can be +found in the :ref:`QuerySet reference <aggregation-functions>`. + +``aggregate()`` is a terminal clause for a ``QuerySet`` that, when invoked, +returns a dictionary of name-value pairs. The name is an identifier for the +aggregate value; the value is the computed aggregate. The name is +automatically generated from the name of the field and the aggregate function. +If you want to manually specify a name for the aggregate value, you can do so +by providing that name when you specify the aggregate clause:: + + >>> Book.objects.aggregate(average_price=Avg('price')) + {'average_price': 34.35} + +If you want to generate more than one aggregate, you just add another +argument to the ``aggregate()`` clause. So, if we also wanted to know +the maximum and minimum price of all books, we would issue the query:: + + >>> from django.db.models import Avg, Max, Min, Count + >>> Book.objects.aggregate(Avg('price'), Max('price'), Min('price')) + {'price__avg': 34.35, 'price__max': Decimal('81.20'), 'price__min': Decimal('12.99')} + +Generating aggregates for each item in a QuerySet +================================================= + +The second way to generate summary values is to generate an independent +summary for each object in a ``QuerySet``. For example, if you are retrieving +a list of books, you may want to know how many authors contributed to +each book. Each Book has a many-to-many relationship with the Author; we +want to summarize this relationship for each book in the ``QuerySet``. + +Per-object summaries can be generated using the ``annotate()`` clause. +When an ``annotate()`` clause is specified, each object in the ``QuerySet`` +will be annotated with the specified values. + +The syntax for these annotations is identical to that used for the +``aggregate()`` clause. Each argument to ``annotate()`` describes an +aggregate that is to be calculated. For example, to annotate Books with +the number of authors:: + + # Build an annotated queryset + >>> q = Book.objects.annotate(Count('authors')) + # Interrogate the first object in the queryset + >>> q[0] + <Book: The Definitive Guide to Django> + >>> q[0].authors__count + 2 + # Interrogate the second object in the queryset + >>> q[1] + <Book: Practical Django Projects> + >>> q[1].authors__count + 1 + +As with ``aggregate()``, the name for the annotation is automatically derived +from the name of the aggregate function and the name of the field being +aggregated. You can override this default name by providing an alias when you +specify the annotation:: + + >>> q = Book.objects.annotate(num_authors=Count('authors')) + >>> q[0].num_authors + 2 + >>> q[1].num_authors + 1 + +Unlike ``aggregate()``, ``annotate()`` is *not* a terminal clause. The output +of the ``annotate()`` clause is a ``QuerySet``; this ``QuerySet`` can be +modified using any other ``QuerySet`` operation, including ``filter()``, +``order_by``, or even additional calls to ``annotate()``. + +Joins and aggregates +==================== + +So far, we have dealt with aggregates over fields that belong to the +model being queried. However, sometimes the value you want to aggregate +will belong to a model that is related to the model you are querying. + +When specifying the field to be aggregated in an aggregate function, Django +will allow you to use the same :ref:`double underscore notation +<field-lookups-intro>` that is used when referring to related fields in +filters. Django will then handle any table joins that are required to retrieve +and aggregate the related value. + +For example, to find the price range of books offered in each store, +you could use the annotation:: + + >>> Store.objects.annotate(min_price=Min('books__price'), max_price=Max('books__price')) + +This tells Django to retrieve the Store model, join (through the +many-to-many relationship) with the Book model, and aggregate on the +price field of the book model to produce a minimum and maximum value. + +The same rules apply to the ``aggregate()`` clause. If you wanted to +know the lowest and highest price of any book that is available for sale +in a store, you could use the aggregate:: + + >>> Store.objects.aggregate(min_price=Min('books__price'), max_price=Max('books__price')) + +Join chains can be as deep as you require. For example, to extract the +age of the youngest author of any book available for sale, you could +issue the query:: + + >>> Store.objects.aggregate(youngest_age=Min('books__authors__age')) + +Aggregations and other QuerySet clauses +======================================= + +``filter()`` and ``exclude()`` +------------------------------ + +Aggregates can also participate in filters. Any ``filter()`` (or +``exclude()``) applied to normal model fields will have the effect of +constraining the objects that are considered for aggregation. + +When used with an ``annotate()`` clause, a filter has the effect of +constraining the objects for which an annotation is calculated. For example, +you can generate an annotated list of all books that have a title starting +with "Django" using the query:: + + >>> Book.objects.filter(name__startswith="Django").annotate(num_authors=Count('authors')) + +When used with an ``aggregate()`` clause, a filter has the effect of +constraining the objects over which the aggregate is calculated. +For example, you can generate the average price of all books with a +title that starts with "Django" using the query:: + + >>> Book.objects.filter(name__startswith="Django").aggregate(Avg('price')) + +Filtering on annotations +~~~~~~~~~~~~~~~~~~~~~~~~ + +Annotated values can also be filtered. The alias for the annotation can be +used in ``filter()`` and ``exclude()`` clauses in the same way as any other +model field. + +For example, to generate a list of books that have more than one author, +you can issue the query:: + + >>> Book.objects.annotate(num_authors=Count('authors')).filter(num_authors__gt=1) + +This query generates an annotated result set, and then generates a filter +based upon that annotation. + +Order of ``annotate()`` and ``filter()`` clauses +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +When developing a complex query that involves both ``annotate()`` and +``filter()`` clauses, particular attention should be paid to the order +in which the clauses are applied to the ``QuerySet``. + +When an ``annotate()`` clause is applied to a query, the annotation is +computed over the state of the query up to the point where the annotation +is requested. The practical implication of this is that ``filter()`` and +``annotate()`` are not commutative operations -- that is, there is a +difference between the query:: + + >>> Publisher.objects.annotate(num_books=Count('book')).filter(book__rating__gt=3.0) + +and the query:: + + >>> Publisher.objects.filter(book__rating__gt=3.0).annotate(num_books=Count('book')) + +Both queries will return a list of Publishers that have at least one good +book (i.e., a book with a rating exceeding 3.0). However, the annotation in +the first query will provide the total number of all books published by the +publisher; the second query will only include good books in the annotated +count. In the first query, the annotation precedes the filter, so the +filter has no effect on the annotation. In the second query, the filter +preceeds the annotation, and as a result, the filter constrains the objects +considered when calculating the annotation. + +``order_by()`` +-------------- + +Annotations can be used as a basis for ordering. When you +define an ``order_by()`` clause, the aggregates you provide can reference +any alias defined as part of an ``annotate()`` clause in the query. + +For example, to order a ``QuerySet`` of books by the number of authors +that have contributed to the book, you could use the following query:: + + >>> Book.objects.annotate(num_authors=Count('authors')).order_by('num_authors') + +``values()`` +------------ + +Ordinarily, annotations are generated on a per-object basis - an annotated +``QuerySet`` will return one result for each object in the original +``QuerySet``. However, when a ``values()`` clause is used to constrain the +columns that are returned in the result set, the method for evaluating +annotations is slightly different. Instead of returning an annotated result +for each result in the original ``QuerySet``, the original results are +grouped according to the unique combinations of the fields specified in the +``values()`` clause. An annotation is then provided for each unique group; +the annotation is computed over all members of the group. + +For example, consider an author query that attempts to find out the average +rating of books written by each author: + + >>> Author.objects.annotate(average_rating=Avg('book__rating')) + +This will return one result for each author in the database, annotated with +their average book rating. + +However, the result will be slightly different if you use a ``values()`` clause:: + + >>> Author.objects.values('name').annotate(average_rating=Avg('book__rating')) + +In this example, the authors will be grouped by name, so you will only get +an annotated result for each *unique* author name. This means if you have +two authors with the same name, their results will be merged into a single +result in the output of the query; the average will be computed as the +average over the books written by both authors. + +Order of ``annotate()`` and ``values()`` clauses +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +As with the ``filter()`` clause, the order in which ``annotate()`` and +``values()`` clauses are applied to a query is significant. If the +``values()`` clause precedes the ``annotate()``, the annotation will be +computed using the grouping described by the ``values()`` clause. + +However, if the ``annotate()`` clause precedes the ``values()`` clause, +the annotations will be generated over the entire query set. In this case, +the ``values()`` clause only constrains the fields that are generated on +output. + +For example, if we reverse the order of the ``values()`` and ``annotate()`` +clause from our previous example:: + + >>> Author.objects.annotate(average_rating=Avg('book__rating')).values('name', 'average_rating') + +This will now yield one unique result for each author; however, only +the author's name and the ``average_rating`` annotation will be returned +in the output data. + +You should also note that ``average_rating`` has been explicitly included +in the list of values to be returned. This is required because of the +ordering of the ``values()`` and ``annotate()`` clause. + +If the ``values()`` clause precedes the ``annotate()`` clause, any annotations +will be automatically added to the result set. However, if the ``values()`` +clause is applied after the ``annotate()`` clause, you need to explicitly +include the aggregate column. + +Interaction with default ordering or ``order_by()`` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Fields that are mentioned in the ``order_by()`` part of a queryset (or which +are used in the default ordering on a model) are used when selecting the +output data, even if they are not otherwise specified in the ``values()`` +call. These extra fields are used to group "like" results together and they +can make otherwise identical result rows appear to be separate. This shows up, +particularly, when counting things. + +By way of example, suppose you have a model like this:: + + class Item(models.Model): + name = models.CharField(max_length=10) + data = models.IntegerField() + + class Meta: + ordering = ["name"] + +The important part here is the default ordering on the ``name`` field. If you +want to count how many times each distinct ``data`` value appears, you might +try this:: + + # Warning: not quite correct! + Item.objects.values("data").annotate(Count("id")) + +...which will group the ``Item`` objects by their common ``data`` values and +then count the number of ``id`` values in each group. Except that it won't +quite work. The default ordering by ``name`` will also play a part in the +grouping, so this query will group by distinct ``(data, name)`` pairs, which +isn't what you want. Instead, you should construct this queryset:: + + Item.objects.values("data").annotate(Count("id")).order_by() + +...clearing any ordering in the query. You could also order by, say, ``data`` +without any harmful effects, since that is already playing a role in the +query. + +This behavior is the same as that noted in the queryset documentation for +:meth:`~django.db.models.QuerySet.distinct` and the general rule is the same: +normally you won't want extra columns playing a part in the result, so clear +out the ordering, or at least make sure it's restricted only to those fields +you also select in a ``values()`` call. + +.. note:: + You might reasonably ask why Django doesn't remove the extraneous columns + for you. The main reason is consistency with ``distinct()`` and other + places: Django **never** removes ordering constraints that you have + specified (and we can't change those other methods' behavior, as that + would violate our :doc:`/misc/api-stability` policy). + +Aggregating annotations +----------------------- + +You can also generate an aggregate on the result of an annotation. When you +define an ``aggregate()`` clause, the aggregates you provide can reference +any alias defined as part of an ``annotate()`` clause in the query. + +For example, if you wanted to calculate the average number of authors per +book you first annotate the set of books with the author count, then +aggregate that author count, referencing the annotation field:: + + >>> Book.objects.annotate(num_authors=Count('authors')).aggregate(Avg('num_authors')) + {'num_authors__avg': 1.66} |