diff options
Diffstat (limited to 'parts/django/docs/topics/db/aggregation.txt')
-rw-r--r-- | parts/django/docs/topics/db/aggregation.txt | 378 |
1 files changed, 0 insertions, 378 deletions
diff --git a/parts/django/docs/topics/db/aggregation.txt b/parts/django/docs/topics/db/aggregation.txt deleted file mode 100644 index eb21021..0000000 --- a/parts/django/docs/topics/db/aggregation.txt +++ /dev/null @@ -1,378 +0,0 @@ -=========== -Aggregation -=========== - -.. versionadded:: 1.1 - -.. currentmodule:: django.db.models - -The topic guide on :doc:`Django's database-abstraction API </topics/db/queries>` -described the way that you can use Django queries that create, -retrieve, update and delete individual objects. However, sometimes you will -need to retrieve values that are derived by summarizing or *aggregating* a -collection of objects. This topic guide describes the ways that aggregate values -can be generated and returned using Django queries. - -Throughout this guide, we'll refer to the following models. These models are -used to track the inventory for a series of online bookstores: - -.. _queryset-model-example: - -.. code-block:: python - - class Author(models.Model): - name = models.CharField(max_length=100) - age = models.IntegerField() - friends = models.ManyToManyField('self', blank=True) - - class Publisher(models.Model): - name = models.CharField(max_length=300) - num_awards = models.IntegerField() - - class Book(models.Model): - isbn = models.CharField(max_length=9) - name = models.CharField(max_length=300) - pages = models.IntegerField() - price = models.DecimalField(max_digits=10, decimal_places=2) - rating = models.FloatField() - authors = models.ManyToManyField(Author) - publisher = models.ForeignKey(Publisher) - pubdate = models.DateField() - - class Store(models.Model): - name = models.CharField(max_length=300) - books = models.ManyToManyField(Book) - - -Generating aggregates over a QuerySet -===================================== - -Django provides two ways to generate aggregates. The first way is to generate -summary values over an entire ``QuerySet``. For example, say you wanted to -calculate the average price of all books available for sale. Django's query -syntax provides a means for describing the set of all books:: - - >>> Book.objects.all() - -What we need is a way to calculate summary values over the objects that -belong to this ``QuerySet``. This is done by appending an ``aggregate()`` -clause onto the ``QuerySet``:: - - >>> from django.db.models import Avg - >>> Book.objects.all().aggregate(Avg('price')) - {'price__avg': 34.35} - -The ``all()`` is redundant in this example, so this could be simplified to:: - - >>> Book.objects.aggregate(Avg('price')) - {'price__avg': 34.35} - -The argument to the ``aggregate()`` clause describes the aggregate value that -we want to compute - in this case, the average of the ``price`` field on the -``Book`` model. A list of the aggregate functions that are available can be -found in the :ref:`QuerySet reference <aggregation-functions>`. - -``aggregate()`` is a terminal clause for a ``QuerySet`` that, when invoked, -returns a dictionary of name-value pairs. The name is an identifier for the -aggregate value; the value is the computed aggregate. The name is -automatically generated from the name of the field and the aggregate function. -If you want to manually specify a name for the aggregate value, you can do so -by providing that name when you specify the aggregate clause:: - - >>> Book.objects.aggregate(average_price=Avg('price')) - {'average_price': 34.35} - -If you want to generate more than one aggregate, you just add another -argument to the ``aggregate()`` clause. So, if we also wanted to know -the maximum and minimum price of all books, we would issue the query:: - - >>> from django.db.models import Avg, Max, Min, Count - >>> Book.objects.aggregate(Avg('price'), Max('price'), Min('price')) - {'price__avg': 34.35, 'price__max': Decimal('81.20'), 'price__min': Decimal('12.99')} - -Generating aggregates for each item in a QuerySet -================================================= - -The second way to generate summary values is to generate an independent -summary for each object in a ``QuerySet``. For example, if you are retrieving -a list of books, you may want to know how many authors contributed to -each book. Each Book has a many-to-many relationship with the Author; we -want to summarize this relationship for each book in the ``QuerySet``. - -Per-object summaries can be generated using the ``annotate()`` clause. -When an ``annotate()`` clause is specified, each object in the ``QuerySet`` -will be annotated with the specified values. - -The syntax for these annotations is identical to that used for the -``aggregate()`` clause. Each argument to ``annotate()`` describes an -aggregate that is to be calculated. For example, to annotate Books with -the number of authors:: - - # Build an annotated queryset - >>> q = Book.objects.annotate(Count('authors')) - # Interrogate the first object in the queryset - >>> q[0] - <Book: The Definitive Guide to Django> - >>> q[0].authors__count - 2 - # Interrogate the second object in the queryset - >>> q[1] - <Book: Practical Django Projects> - >>> q[1].authors__count - 1 - -As with ``aggregate()``, the name for the annotation is automatically derived -from the name of the aggregate function and the name of the field being -aggregated. You can override this default name by providing an alias when you -specify the annotation:: - - >>> q = Book.objects.annotate(num_authors=Count('authors')) - >>> q[0].num_authors - 2 - >>> q[1].num_authors - 1 - -Unlike ``aggregate()``, ``annotate()`` is *not* a terminal clause. The output -of the ``annotate()`` clause is a ``QuerySet``; this ``QuerySet`` can be -modified using any other ``QuerySet`` operation, including ``filter()``, -``order_by``, or even additional calls to ``annotate()``. - -Joins and aggregates -==================== - -So far, we have dealt with aggregates over fields that belong to the -model being queried. However, sometimes the value you want to aggregate -will belong to a model that is related to the model you are querying. - -When specifying the field to be aggregated in an aggregate function, Django -will allow you to use the same :ref:`double underscore notation -<field-lookups-intro>` that is used when referring to related fields in -filters. Django will then handle any table joins that are required to retrieve -and aggregate the related value. - -For example, to find the price range of books offered in each store, -you could use the annotation:: - - >>> Store.objects.annotate(min_price=Min('books__price'), max_price=Max('books__price')) - -This tells Django to retrieve the Store model, join (through the -many-to-many relationship) with the Book model, and aggregate on the -price field of the book model to produce a minimum and maximum value. - -The same rules apply to the ``aggregate()`` clause. If you wanted to -know the lowest and highest price of any book that is available for sale -in a store, you could use the aggregate:: - - >>> Store.objects.aggregate(min_price=Min('books__price'), max_price=Max('books__price')) - -Join chains can be as deep as you require. For example, to extract the -age of the youngest author of any book available for sale, you could -issue the query:: - - >>> Store.objects.aggregate(youngest_age=Min('books__authors__age')) - -Aggregations and other QuerySet clauses -======================================= - -``filter()`` and ``exclude()`` ------------------------------- - -Aggregates can also participate in filters. Any ``filter()`` (or -``exclude()``) applied to normal model fields will have the effect of -constraining the objects that are considered for aggregation. - -When used with an ``annotate()`` clause, a filter has the effect of -constraining the objects for which an annotation is calculated. For example, -you can generate an annotated list of all books that have a title starting -with "Django" using the query:: - - >>> Book.objects.filter(name__startswith="Django").annotate(num_authors=Count('authors')) - -When used with an ``aggregate()`` clause, a filter has the effect of -constraining the objects over which the aggregate is calculated. -For example, you can generate the average price of all books with a -title that starts with "Django" using the query:: - - >>> Book.objects.filter(name__startswith="Django").aggregate(Avg('price')) - -Filtering on annotations -~~~~~~~~~~~~~~~~~~~~~~~~ - -Annotated values can also be filtered. The alias for the annotation can be -used in ``filter()`` and ``exclude()`` clauses in the same way as any other -model field. - -For example, to generate a list of books that have more than one author, -you can issue the query:: - - >>> Book.objects.annotate(num_authors=Count('authors')).filter(num_authors__gt=1) - -This query generates an annotated result set, and then generates a filter -based upon that annotation. - -Order of ``annotate()`` and ``filter()`` clauses -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -When developing a complex query that involves both ``annotate()`` and -``filter()`` clauses, particular attention should be paid to the order -in which the clauses are applied to the ``QuerySet``. - -When an ``annotate()`` clause is applied to a query, the annotation is -computed over the state of the query up to the point where the annotation -is requested. The practical implication of this is that ``filter()`` and -``annotate()`` are not commutative operations -- that is, there is a -difference between the query:: - - >>> Publisher.objects.annotate(num_books=Count('book')).filter(book__rating__gt=3.0) - -and the query:: - - >>> Publisher.objects.filter(book__rating__gt=3.0).annotate(num_books=Count('book')) - -Both queries will return a list of Publishers that have at least one good -book (i.e., a book with a rating exceeding 3.0). However, the annotation in -the first query will provide the total number of all books published by the -publisher; the second query will only include good books in the annotated -count. In the first query, the annotation precedes the filter, so the -filter has no effect on the annotation. In the second query, the filter -preceeds the annotation, and as a result, the filter constrains the objects -considered when calculating the annotation. - -``order_by()`` --------------- - -Annotations can be used as a basis for ordering. When you -define an ``order_by()`` clause, the aggregates you provide can reference -any alias defined as part of an ``annotate()`` clause in the query. - -For example, to order a ``QuerySet`` of books by the number of authors -that have contributed to the book, you could use the following query:: - - >>> Book.objects.annotate(num_authors=Count('authors')).order_by('num_authors') - -``values()`` ------------- - -Ordinarily, annotations are generated on a per-object basis - an annotated -``QuerySet`` will return one result for each object in the original -``QuerySet``. However, when a ``values()`` clause is used to constrain the -columns that are returned in the result set, the method for evaluating -annotations is slightly different. Instead of returning an annotated result -for each result in the original ``QuerySet``, the original results are -grouped according to the unique combinations of the fields specified in the -``values()`` clause. An annotation is then provided for each unique group; -the annotation is computed over all members of the group. - -For example, consider an author query that attempts to find out the average -rating of books written by each author: - - >>> Author.objects.annotate(average_rating=Avg('book__rating')) - -This will return one result for each author in the database, annotated with -their average book rating. - -However, the result will be slightly different if you use a ``values()`` clause:: - - >>> Author.objects.values('name').annotate(average_rating=Avg('book__rating')) - -In this example, the authors will be grouped by name, so you will only get -an annotated result for each *unique* author name. This means if you have -two authors with the same name, their results will be merged into a single -result in the output of the query; the average will be computed as the -average over the books written by both authors. - -Order of ``annotate()`` and ``values()`` clauses -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -As with the ``filter()`` clause, the order in which ``annotate()`` and -``values()`` clauses are applied to a query is significant. If the -``values()`` clause precedes the ``annotate()``, the annotation will be -computed using the grouping described by the ``values()`` clause. - -However, if the ``annotate()`` clause precedes the ``values()`` clause, -the annotations will be generated over the entire query set. In this case, -the ``values()`` clause only constrains the fields that are generated on -output. - -For example, if we reverse the order of the ``values()`` and ``annotate()`` -clause from our previous example:: - - >>> Author.objects.annotate(average_rating=Avg('book__rating')).values('name', 'average_rating') - -This will now yield one unique result for each author; however, only -the author's name and the ``average_rating`` annotation will be returned -in the output data. - -You should also note that ``average_rating`` has been explicitly included -in the list of values to be returned. This is required because of the -ordering of the ``values()`` and ``annotate()`` clause. - -If the ``values()`` clause precedes the ``annotate()`` clause, any annotations -will be automatically added to the result set. However, if the ``values()`` -clause is applied after the ``annotate()`` clause, you need to explicitly -include the aggregate column. - -Interaction with default ordering or ``order_by()`` -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Fields that are mentioned in the ``order_by()`` part of a queryset (or which -are used in the default ordering on a model) are used when selecting the -output data, even if they are not otherwise specified in the ``values()`` -call. These extra fields are used to group "like" results together and they -can make otherwise identical result rows appear to be separate. This shows up, -particularly, when counting things. - -By way of example, suppose you have a model like this:: - - class Item(models.Model): - name = models.CharField(max_length=10) - data = models.IntegerField() - - class Meta: - ordering = ["name"] - -The important part here is the default ordering on the ``name`` field. If you -want to count how many times each distinct ``data`` value appears, you might -try this:: - - # Warning: not quite correct! - Item.objects.values("data").annotate(Count("id")) - -...which will group the ``Item`` objects by their common ``data`` values and -then count the number of ``id`` values in each group. Except that it won't -quite work. The default ordering by ``name`` will also play a part in the -grouping, so this query will group by distinct ``(data, name)`` pairs, which -isn't what you want. Instead, you should construct this queryset:: - - Item.objects.values("data").annotate(Count("id")).order_by() - -...clearing any ordering in the query. You could also order by, say, ``data`` -without any harmful effects, since that is already playing a role in the -query. - -This behavior is the same as that noted in the queryset documentation for -:meth:`~django.db.models.QuerySet.distinct` and the general rule is the same: -normally you won't want extra columns playing a part in the result, so clear -out the ordering, or at least make sure it's restricted only to those fields -you also select in a ``values()`` call. - -.. note:: - You might reasonably ask why Django doesn't remove the extraneous columns - for you. The main reason is consistency with ``distinct()`` and other - places: Django **never** removes ordering constraints that you have - specified (and we can't change those other methods' behavior, as that - would violate our :doc:`/misc/api-stability` policy). - -Aggregating annotations ------------------------ - -You can also generate an aggregate on the result of an annotation. When you -define an ``aggregate()`` clause, the aggregates you provide can reference -any alias defined as part of an ``annotate()`` clause in the query. - -For example, if you wanted to calculate the average number of authors per -book you first annotate the set of books with the author count, then -aggregate that author count, referencing the annotation field:: - - >>> Book.objects.annotate(num_authors=Count('authors')).aggregate(Avg('num_authors')) - {'num_authors__avg': 1.66} |