Commit b6dc3ff

issue2551328/issue2551264 unneeded next link and total_count incorrect
Fix:

issue2551328 - REST results show next link if number of results is a multiple of page size. (Found by members of team 3 in the UMass-Boston CS682 Spring 2024 class.)

issue2551264 - REST X-Total-Count header and @total_size count incorrect when paginated

These issues arose because we retrieved the exact number of rows from the database as requested by the user using the @page_size parameter.

With this changeset, we retrieve up to 10 million + 1 rows from the database. If the total number of rows exceeds 10 million, we set the total_count indicators to -1 as an invalid size. (The max number of requested rows (default 10 million + 1) can be modified by the admin through interfaces.py.)

By retrieving more data than necessary, we can calculate the total count by adding @page_index*@page_size to the number of rows returned by the query. Furthermore, since we return more than @page_size rows, we can determine the existence of a row at @page_size+1 and use that information to determine if a next link should be provided. Previously, a next link was returned if @page_size rows were retrieved.

This change does not guarantee that the user will get @page_size rows returned. Access policy filtering occurs after the rows are returned, and discards rows inaccessible by the user. Using the current @page_index/@page_size it would be difficult to have the roundup code refetch data and make sure that a full @page_size set of rows is returned.

E.g. @page_size=100 and 5 of them are dropped due to access restrictions. We then fetch 10 items and add items 1-4 and 6 (5 is inaccessible). There is no way to calculate the new database offset at: @page_index*@page_size + 6 from the URL. We would need to add an @page_offset=6 or something. This could work since the client isn't adding 1 to @page_index to get the next page. Thanks to HATEOAS, the client just uses the 'next' url. But I am not going to cross that bridge without a concrete use case.

This can also be handled client side by merging a short response with the next response and re-paginating client side.

Also added extra index markers to the docs to highlight use of interfaces.py.
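
The over-fetch idea described in the commit message can be illustrated with a small standalone sketch. It is not the Roundup implementation: `paginate` and its arguments are hypothetical names, the database query is modelled as a list slice, and access filtering is ignored. The offset is computed as `(page_index - 1) * page_size`, following the code in the `roundup/rest.py` hunk of this commit.

```python
def paginate(matching_ids, page_index, page_size, max_rows=10_000_001):
    """Return (page, total_size, has_next) for a 1-based page_index.

    matching_ids stands in for every row the query would match.
    """
    offset = (page_index - 1) * page_size
    # Fetch up to max_rows starting at the offset, not just page_size rows.
    fetched = matching_ids[offset:offset + max_rows]
    # Hitting the cap means the true total is unknown: report -1.
    overflow = len(fetched) == max_rows
    total_size = -1 if overflow else offset + len(fetched)
    # A next link is only justified if a row exists past this page.
    has_next = len(fetched) > page_size
    return fetched[:page_size], total_size, has_next


ids = list(range(1, 21))          # 20 rows, an exact multiple of the page size
page, total, has_next = paginate(ids, page_index=2, page_size=10)
print(page[0], page[-1], total, has_next)
```

With 20 rows and a page size of 10, the second page is full but no row follows it, so no next link would be produced. That is the case issue2551328 describes.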
1 parent d21ee54 commit b6dc3ff

7 files changed

+313
-20
lines changed

CHANGES.txt

Lines changed: 8 additions & 0 deletions
@@ -105,6 +105,14 @@ Fixed:
105105
which clobbers the info about the template issue. As a stop-gap set
106106
the line number to -1 so the original traceback can be seen. This
107107
could be a bug in ZopeTAL. (John Rouillard)
108+
- issue2551328 - REST results show next link if number of results is a
109+
multiple of page size. There should be no next link. (Found by Patel
110+
Malav and ... of the UMass-Boston CS682 Spring 2024 class; fix John
111+
Rouillard)
112+
- issue2551264 - REST X-Total-Count header and @total_size count
113+
incorrect when paginated - correct values are now returned.
114+
(John Rouillard)
115+
108116

109117
Features:
110118

doc/admin_guide.txt

Lines changed: 27 additions & 0 deletions
@@ -259,6 +259,8 @@ to compress responses on the fly. The python standard library includes
259259
gzip support. For brotli or zstd you will need to install packages. See
260260
the `installation documentation`_ for details.
261261

262+
.. index:: single: interfaces.py; configuring http compression
263+
262264
Some assets will not be compressed on the fly. Assets with mime types
263265
of "image/png" or "image/jpeg" will not be compressed. You
264266
can add mime types to the list by using ``interfaces.py`` as discussed
@@ -320,6 +322,31 @@ get the gzip version and not a brotli compressed version. This
320322
mechanism allows the admin to allow use of brotli and zstd for
321323
dynamic content, but not for static content.
322324

325+
.. index:: single: interfaces.py; setting REST maximum result limit
326+
327+
Configuring REST Maximum Result Limit
328+
=====================================
329+
330+
To prevent denial of service (DOS) and limit user wait time for an
331+
unbounded request, the REST endpoint has a maximum limit on the number
332+
of rows that can be returned. By default, this is set to 10 million.
333+
This setting applies to all users of the REST interface. If you want
334+
to change this limit, you can add the following code to the
335+
``interfaces.py`` file in your tracker::
336+
337+
# change max response rows
338+
from roundup.rest import RestfulInstance
339+
RestfulInstance.max_response_row_size = 26
340+
341+
This code will set the maximum number of rows to 25 (one less than the
342+
value). Note that this setting is rarely used and is not available in
343+
the tracker's ``config.ini`` file. Setting it through this mechanism
344+
allows you to enter a string or number that may break Roundup, such as
345+
"asdf" or 0. In general, it is recommended to keep the limit at its
346+
default value. However, this option is available for cases when a
347+
request requires more than 10 million rows and pagination using
348+
``@page_index`` and ``@page_size=9999999`` is not possible.
349+
323350
Adding a Web Content Security Policy (CSP)
324351
==========================================
325352
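
The admin guide text in this hunk warns that a value such as "asdf" or 0 can break Roundup when assigned through ``interfaces.py``. One way to guard against that is to validate the value before assigning it. This is only a sketch: `checked_row_limit` is a hypothetical helper, not part of Roundup, and the minimum of 2 is an assumption based on the effective limit being one less than the configured value.

```python
def checked_row_limit(value):
    """Return value if it is usable as a row limit, else raise ValueError."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError("row limit must be an integer, got %r" % (value,))
    # Assumption: the effective limit is value - 1, so anything below 2
    # would leave no room for even a single-row page.
    if value < 2:
        raise ValueError("row limit must be at least 2, got %r" % (value,))
    return value


# In the tracker's interfaces.py (requires Roundup, so shown as comments):
# from roundup.rest import RestfulInstance
# RestfulInstance.max_response_row_size = checked_row_limit(26)
```

A failed check then surfaces when the tracker loads ``interfaces.py`` rather than on the first REST request.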

doc/customizing.txt

Lines changed: 5 additions & 0 deletions
@@ -2176,6 +2176,8 @@ string in the nosy reaction function.
21762176
Changing How the Core Code Works
21772177
--------------------------------
21782178

2179+
.. index:: single: interfaces.py; cache-control headers
2180+
21792181
Changing Cache-Control Headers
21802182
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
21812183

@@ -2202,6 +2204,7 @@ path in the tracker's `config.ini`. In the example above:
22022204

22032205
Note that a file name match overrides the mime type settings.
22042206

2207+
.. index:: single: interfaces.py; password complexity checking
22052208

22062209
Implement Password Complexity Checking
22072210
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -2241,6 +2244,8 @@ it replaces the setPassword method in the Password class. The new
22412244
version validates that the password is sufficiently complex. Then it
22422245
passes off the setting of password to the original method.
22432246

2247+
.. index:: single: interfaces.py; interpreting time interval values
2248+
22442249
Enhance Time Intervals Forms
22452250
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
22462251

doc/reference.txt

Lines changed: 2 additions & 0 deletions
@@ -1556,6 +1556,8 @@ flexible extension point mechanism.
15561556
.. _interfaces.py:
15571557
.. _modifying the core of Roundup:
15581558

1559+
.. index:: single: interfaces.py; hooking into the roundup core
1560+
15591561
interfaces.py - hooking into the core of Roundup
15601562
================================================
15611563

doc/rest.txt

Lines changed: 27 additions & 12 deletions
@@ -491,11 +491,18 @@ Details are in the sections below.
491491
/data/\ *class* Collection
492492
--------------------------
493493

494-
When performing the ``GET`` method on a class (e.g. ``/data/issue``),
495-
the ``data`` object includes the number of items available in
496-
``@total_size``. A a ``collection`` list follows which contains the id
497-
and link to the respective item. For example a get on
498-
https://.../rest/data/issue returns::
494+
When you use the ``GET`` method on a class (like ``/data/issue``), the
495+
``data`` will include the number of available items in
496+
``@total_size``. If the size exceeds the administrative limit (which
497+
is 10 million by default), ``@total_size`` will be set to ``-1``. To
498+
navigate to the last page of results, you can use the ``next`` links
499+
or increment ``@page_index`` until the result does not include a
500+
``next`` ``@link`` or ``@total_size`` is not ``-1``. The value of the
501+
HTTP header ``X-Count-Total`` is the same as ``@total_size``.
502+
503+
A ``collection`` list contains the id and link to the
504+
respective item. For example a get on https://.../rest/data/issue
505+
returns::
499506

500507
{
501508
"data": {
@@ -517,11 +524,17 @@ https://.../rest/data/issue returns::
517524
Collection endpoints support a number of features as seen in the next
518525
sections.
519526

520-
A server may implement a default maximum number of items in the
521-
collection. This can be used to prevent denial of service (DOS). As
522-
a result all clients must be programmed to expect pagination
523-
decorations in the response. See the section on pagination below for
524-
details.
527+
Having an empty ``collection`` does not mean the next link will not
528+
return more data. The row limit is applied when the query is made to
529+
the database. The result set is then filtered, removing rows that the
530+
user does not have permission to access. So it is possible to have no
531+
data items on a page because the user does not have access to them. If
532+
you use ``@page_size`` near the administrative limit, you may receive
533+
fewer rows than requested. However, this does not mean you are out of
534+
data.
535+
536+
All clients must be programmed to expect pagination decorations in the
537+
response. See the section on pagination below for details.
525538

526539
Searching
527540
~~~~~~~~~
@@ -591,7 +604,9 @@ searching keyword class not issue class) will return matches for
591604
``Foo``, ``foobar``, ``foo taz`` etc.
592605

593606
In all cases the field ``@total_size`` is reported which is the total
594-
number of items available if you were to retrieve all of them.
607+
number of items available if you were to retrieve all of them. See
608+
more details in the parent section about ``@total_size`` and when it
609+
can return ``-1``.
595610

596611
Other data types: Date, Interval, Integer, Number need examples and may
597612
need work to allow range searches. Full text search (e.g. over the
@@ -1055,7 +1070,7 @@ Does not support PUT, DELETE or PATCH.
10551070
/data/\ *class*/\ *id* item
10561071
---------------------------
10571072

1058-
When performing the ``GET`` method on an item
1073+
When you use the ``GET`` method on an item
10591074
(e.g. ``/data/issue/42``), a ``link`` attribute contains the link to
10601075
the item, ``id`` contains the id, ``type`` contains the class name
10611076
(e.g. ``issue`` in the example) and an ``etag`` property can be used
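
The doc/rest.txt changes above imply that a client should treat a missing ``next`` link, not a short or empty page, as the end of the data. A minimal client-side sketch follows. `fetch_all` and `get_page` are hypothetical names. `get_page` stands for whatever function performs the HTTP GET and returns the decoded JSON. The sketch also assumes each entry under ``@links`` is a list of objects with a ``uri`` key.

```python
def fetch_all(get_page, first_url):
    """Collect collection items by following 'next' links until none remains."""
    items = []
    url = first_url
    while url is not None:
        data = get_page(url)["data"]
        # A page may be short or even empty after access filtering;
        # that alone does not mean the data is exhausted.
        items.extend(data["collection"])
        next_links = data.get("@links", {}).get("next", [])
        url = next_links[0]["uri"] if next_links else None
    return items


# Canned responses standing in for a server; the middle page is empty.
pages = {
    "p1": {"data": {"collection": [{"id": "1"}],
                    "@links": {"next": [{"rel": "next", "uri": "p2"}]}}},
    "p2": {"data": {"collection": [],
                    "@links": {"next": [{"rel": "next", "uri": "p3"}]}}},
    "p3": {"data": {"collection": [{"id": "7"}], "@links": {}}},
}
print(fetch_all(pages.__getitem__, "p1"))
```

Merging the pages into one list this way also lets the client re-paginate locally when full pages are required.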

roundup/rest.py

Lines changed: 42 additions & 8 deletions
@@ -440,6 +440,10 @@ class RestfulInstance(object):
440440

441441
api_version = None
442442

443+
# allow 10M row response - can change using interfaces.py
444+
# limit is 1 less than this size.
445+
max_response_row_size = 10000001
446+
443447
def __init__(self, client, db):
444448
self.client = client
445449
self.db = db
@@ -795,7 +799,7 @@ def get_collection(self, class_name, input):
795799
exact_props = {}
796800
page = {
797801
'size': None,
798-
'index': 1 # setting just size starts at page 1
802+
'index': 1, # setting just size starts at page 1
799803
}
800804
verbose = 1
801805
display_props = set()
@@ -910,12 +914,26 @@ def get_collection(self, class_name, input):
910914
l.append(sort)
911915
if exact_props:
912916
kw['exact_match_spec'] = exact_props
913-
if page['size'] is not None and page['size'] > 0:
914-
kw['limit'] = page['size']
917+
if page['size'] is None:
918+
kw['limit'] = self.max_response_row_size
919+
elif page['size'] > 0:
920+
if page['size'] >= self.max_response_row_size:
921+
raise UsageError(_(
922+
"Page size %(page_size)s must be less than admin "
923+
"limit on query result size: %(max_size)s.") % {
924+
"page_size": page['size'],
925+
"max_size": self.max_response_row_size,
926+
})
927+
kw['limit'] = self.max_response_row_size
915928
if page['index'] is not None and page['index'] > 1:
916929
kw['offset'] = (page['index'] - 1) * page['size']
917930
obj_list = class_obj.filter(None, *l, **kw)
918931

932+
# Have we hit the max number of returned rows?
933+
# If so there may be more data that the client
934+
# has to explicitly page through using offset/@page_index.
935+
overflow = len(obj_list) == self.max_response_row_size
936+
919937
# Note: We don't sort explicitly in python. The filter implementation
920938
# of the DB already sorts by ID if no sort option was given.
921939

@@ -930,7 +948,7 @@ def get_collection(self, class_name, input):
930948
for item_id in obj_list:
931949
r = {}
932950
if self.db.security.hasPermission(
933-
'View', uid, class_name, itemid=item_id, property='id'
951+
'View', uid, class_name, itemid=item_id, property='id',
934952
):
935953
r = {'id': item_id, 'link': class_path + item_id}
936954
if display_props:
@@ -942,14 +960,30 @@ def get_collection(self, class_name, input):
942960

943961
result_len = len(result['collection'])
944962

963+
if not overflow: # noqa: SIM108 - no nested ternary
964+
# add back the number of items in the offset.
965+
total_len = kw['offset'] + result_len if 'offset' in kw \
966+
else result_len
967+
else:
968+
# we have hit the max number of rows configured to be
969+
# returned. We have no idea how many rows can match. We
970+
# could use 0 as the sentinel, but a filter could match 0
971+
# rows. So return -1 indicating we exceeded the result
972+
# max size on this query.
973+
total_len = -1
974+
975+
# truncate result['collection'] to page size
976+
if page['size'] is not None and page['size'] > 0:
977+
result['collection'] = result['collection'][:page['size']]
978+
945979
# pagination - page_index from 1...N
946980
if page['size'] is not None and page['size'] > 0:
947981
result['@links'] = {}
948982
for rel in ('next', 'prev', 'self'):
949983
if rel == 'next':
950984
# if current index includes all data, continue
951-
if page['size'] > result_len: continue # noqa: E701
952-
index = page['index']+1
985+
if page['size'] >= result_len: continue # noqa: E701
986+
index = page['index'] + 1
953987
if rel == 'prev':
954988
if page['index'] <= 1: continue # noqa: E701
955989
index = page['index'] - 1
@@ -964,8 +998,8 @@ def get_collection(self, class_name, input):
964998
for field in input.value
965999
if field.name != "@page_index"])})
9661000

967-
result['@total_size'] = result_len
968-
self.client.setHeader("X-Count-Total", str(result_len))
1001+
result['@total_size'] = total_len
1002+
self.client.setHeader("X-Count-Total", str(total_len))
9691003
self.client.setHeader("Allow", "OPTIONS, GET, POST")
9701004
return 200, result
9711005
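
The limit and offset handling changed in the `get_collection` hunk can be read as the following standalone sketch. It is an illustration under assumptions, not the Roundup code. `query_window` is a hypothetical name and `UsageError` here is a local stand-in for Roundup's exception. The indentation of the offset check is lost in this diff view, so nesting it under the positive page size branch is an assumption.

```python
class UsageError(Exception):
    """Local stand-in for Roundup's UsageError."""


def query_window(page_size, page_index, max_rows=10_000_001):
    """Build the limit/offset keyword arguments for the database filter."""
    kw = {}
    if page_size is None:
        # No pagination requested: still cap the query at the admin limit.
        kw["limit"] = max_rows
    elif page_size > 0:
        if page_size >= max_rows:
            raise UsageError(
                "Page size %s must be less than admin limit on query "
                "result size: %s." % (page_size, max_rows))
        # Over-fetch so the total count and next link can be computed.
        kw["limit"] = max_rows
        if page_index is not None and page_index > 1:
            kw["offset"] = (page_index - 1) * page_size
    return kw


print(query_window(50, 3))
```

Note that the limit no longer depends on the page size. Only the offset does, which is what makes the later total count and next link calculations possible.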
