Skip to content

Commit 0d41c5b

Browse files
author
Ralf Schlatterbeck
committed
Multilink fixes and optimizations:
- Optimisation: Late evaluation of Multilinks (only in rdbms backends): previously we materialized each multilink in a Node -- this creates an SQL query for each multilink (e.g. 'files' and 'messages' for each line in the issue index display) -- even if the multilinks aren't displayed. Now we compute multilinks only if they're accessed (and keep them cached). - Add a filter_iter similar to the existing filter call. This feature is considered experimental. This is currently not used in the web-interface but passes all tests for the filter call except sorting by Multilinks (which isn't supported by SQL and isn't a sane concept anyway). When using filter_iter instead of filter this saves a *lot* of SQL queries: Filter returns only the IDs of Nodes in the database, the additional content of a Node has to be fetched in a separate SQL call. The new filter_iter also returns the IDs of Nodes (one by one, it's an iterator) but pre-seeds the cache with the content of the Node. The information needed for seeding the cache is retrieved in the same SQL query as the ids.
1 parent f7cc42e commit 0d41c5b

File tree

8 files changed

+762
-436
lines changed

8 files changed

+762
-436
lines changed

CHANGES.txt

Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -19,6 +19,23 @@ Features:
1919
configured default class, or the -c option to the mailgw, or the class
2020
resulting from mail subject parsing. We also accept multiple -S
2121
options for the same class now. (Ralf)
22+
- Optimisation: Late evaluation of Multilinks (only in rdbms backends):
23+
previously we materialized each multilink in a Node -- this creates an
24+
SQL query for each multilink (e.g. 'files' and 'messages' for each
25+
line in the issue index display) -- even if the multilinks aren't
26+
displayed. Now we compute multilinks only if they're accessed (and
27+
keep them cached).
28+
- Add a filter_iter similar to the existing filter call. This feature is
29+
considered experimental. This is currently not used in the
30+
web-interface but passes all tests for the filter call except sorting
31+
by Multilinks (which isn't supported by SQL and isn't a sane concept
32+
anyway). When using filter_iter instead of filter this saves a *lot*
33+
of SQL queries: Filter returns only the IDs of Nodes in the database,
34+
the additional content of a Node has to be fetched in a separate SQL
35+
call. The new filter_iter also returns the IDs of Nodes (one by one,
36+
it's an iterator) but pre-seeds the cache with the content of the
37+
Node. The information needed for seeding the cache is retrieved in the
38+
same SQL query as the ids.
2239

2340
Fixed:
2441

roundup/backends/rdbms_common.py

Lines changed: 172 additions & 91 deletions
Large diffs are not rendered by default.

roundup/date.py

Lines changed: 13 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -249,14 +249,22 @@ def __init__(self, spec='.', offset=0, add_granularity=False,
249249
serving as translation functions.
250250
"""
251251
self.setTranslator(translator)
252+
# Python 2.3+ datetime object
253+
# common case when reading from database: avoid double-conversion
254+
if isinstance(spec, datetime.datetime):
255+
if offset == 0:
256+
self.year, self.month, self.day, self.hour, self.minute, \
257+
self.second = spec.timetuple()[:6]
258+
else:
259+
TZ = get_timezone(tz)
260+
self.year, self.month, self.day, self.hour, self.minute, \
261+
self.second = TZ.localize(spec).utctimetuple()[:6]
262+
self.second += spec.microsecond/1000000.
263+
return
264+
252265
if type(spec) == type(''):
253266
self.set(spec, offset=offset, add_granularity=add_granularity)
254267
return
255-
elif isinstance(spec, datetime.datetime):
256-
# Python 2.3+ datetime object
257-
y,m,d,H,M,S,x,x,x = spec.timetuple()
258-
S += spec.microsecond/1000000.
259-
spec = (y,m,d,H,M,S,x,x,x)
260268
elif hasattr(spec, 'tuple'):
261269
spec = spec.tuple()
262270
elif isinstance(spec, Date):

roundup/hyperdb.py

Lines changed: 54 additions & 23 deletions
Original file line numberDiff line numberDiff line change
@@ -284,18 +284,17 @@ class Proptree(object):
284284
""" Simple tree data structure for optimizing searching of
285285
properties. Each node in the tree represents a roundup Class
286286
Property that has to be navigated for finding the given search
287-
or sort properties. The sort_type attribute is used for
288-
distinguishing nodes in the tree used for sorting or searching: If
289-
it is 0 for a node, that node is not used for sorting. If it is 1,
290-
it is used for both, sorting and searching. If it is 2 it is used
291-
for sorting only.
287+
or sort properties. The need_for attribute is used for
288+
distinguishing nodes in the tree used for sorting, searching or
289+
retrieval: The attribute is a dictionary containing one or several
290+
of the values 'sort', 'search', 'retrieve'.
292291
293292
The Proptree is also used for transitively searching attributes for
294293
backends that do not support transitive search (e.g. anydbm). The
295294
_val attribute with set_val is used for this.
296295
"""
297296

298-
def __init__(self, db, cls, name, props, parent = None):
297+
def __init__(self, db, cls, name, props, parent=None, retr=False):
299298
self.db = db
300299
self.name = name
301300
self.props = props
@@ -308,7 +307,7 @@ def __init__(self, db, cls, name, props, parent = None):
308307
self.children = []
309308
self.sortattr = []
310309
self.propdict = {}
311-
self.sort_type = 0
310+
self.need_for = {'search' : True}
312311
self.sort_direction = None
313312
self.sort_ids = None
314313
self.sort_ids_needed = False
@@ -317,30 +316,34 @@ def __init__(self, db, cls, name, props, parent = None):
317316
self.tree_sort_done = False
318317
self.propclass = None
319318
self.orderby = []
319+
self.sql_idx = None # index of retrieved column in sql result
320320
if parent:
321321
self.root = parent.root
322322
self.depth = parent.depth + 1
323323
else:
324324
self.root = self
325325
self.seqno = 1
326326
self.depth = 0
327-
self.sort_type = 1
327+
self.need_for['sort'] = True
328328
self.id = self.root.seqno
329329
self.root.seqno += 1
330330
if self.cls:
331331
self.classname = self.cls.classname
332332
self.uniqname = '%s%s' % (self.cls.classname, self.id)
333333
if not self.parent:
334334
self.uniqname = self.cls.classname
335+
if retr:
336+
self.append_retr_props()
335337

336-
def append(self, name, sort_type = 0):
338+
def append(self, name, need_for='search', retr=False):
337339
"""Append a property to self.children. Will create a new
338340
propclass for the child.
339341
"""
340342
if name in self.propdict:
341343
pt = self.propdict[name]
342-
if sort_type and not pt.sort_type:
343-
pt.sort_type = 1
344+
pt.need_for[need_for] = True
345+
if retr and isinstance(pt.propclass, Link):
346+
pt.append_retr_props()
344347
return pt
345348
propclass = self.props[name]
346349
cls = None
@@ -349,15 +352,24 @@ def append(self, name, sort_type = 0):
349352
cls = self.db.getclass(propclass.classname)
350353
props = cls.getprops()
351354
child = self.__class__(self.db, cls, name, props, parent = self)
352-
child.sort_type = sort_type
355+
child.need_for = {need_for : True}
353356
child.propclass = propclass
354357
self.children.append(child)
355358
self.propdict[name] = child
359+
if retr and isinstance(child.propclass, Link):
360+
child.append_retr_props()
356361
return child
357362

363+
def append_retr_props(self):
364+
"""Append properties for retrieval."""
365+
for name, prop in self.cls.getprops(protected=1).iteritems():
366+
if isinstance(prop, Multilink):
367+
continue
368+
self.append(name, need_for='retrieve')
369+
358370
def compute_sort_done(self, mlseen=False):
359371
""" Recursively check if attribute is needed for sorting
360-
(self.sort_type > 0) or all children have tree_sort_done set and
372+
('sort' in self.need_for) or all children have tree_sort_done set and
361373
sort_ids_needed unset: set self.tree_sort_done if one of the conditions
362374
holds. Also remove sort_ids_needed recursively once having seen a
363375
Multilink.
@@ -371,7 +383,7 @@ def compute_sort_done(self, mlseen=False):
371383
p.compute_sort_done(mlseen)
372384
if not p.tree_sort_done:
373385
self.tree_sort_done = False
374-
if not self.sort_type:
386+
if 'sort' not in self.need_for:
375387
self.tree_sort_done = True
376388
if mlseen:
377389
self.tree_sort_done = False
@@ -389,7 +401,7 @@ def search(self, search_matches=None, sort=True):
389401
"""
390402
filterspec = {}
391403
for p in self.children:
392-
if p.sort_type < 2:
404+
if 'search' in p.need_for:
393405
if p.children:
394406
p.search(sort = False)
395407
filterspec[p.name] = p.val
@@ -413,7 +425,7 @@ def sortable_children(self, intermediate=False):
413425
too.
414426
"""
415427
return [p for p in self.children
416-
if p.sort_type > 0 and (intermediate or p.sort_direction)]
428+
if 'sort' in p.need_for and (intermediate or p.sort_direction)]
417429

418430
def __iter__(self):
419431
""" Yield nodes in depth-first order -- visited nodes first """
@@ -534,7 +546,6 @@ def _sort(self, val):
534546
curdir = sa.sort_direction
535547
idx += 1
536548
sortattr.append (val)
537-
#print >> sys.stderr, "\nsortattr", sortattr
538549
sortattr = zip (*sortattr)
539550
for dir, i in reversed(zip(directions, dir_idx)):
540551
rev = dir == '-'
@@ -1055,27 +1066,40 @@ def _filter(self, search_matches, filterspec, sort=(None,None),
10551066
"""
10561067
raise NotImplementedError
10571068

1058-
def _proptree(self, filterspec, sortattr=[]):
1069+
def _proptree(self, filterspec, sortattr=[], retr=False):
10591070
"""Build a tree of all transitive properties in the given
10601071
filterspec.
1072+
If we retrieve (retr is True) linked items we don't follow
1073+
across multilinks. We also don't follow if the searched value
1074+
can contain NULL values.
10611075
"""
1062-
proptree = Proptree(self.db, self, '', self.getprops())
1076+
proptree = Proptree(self.db, self, '', self.getprops(), retr=retr)
10631077
for key, v in filterspec.iteritems():
10641078
keys = key.split('.')
10651079
p = proptree
1080+
mlseen = False
10661081
for k in keys:
1067-
p = p.append(k)
1082+
if isinstance (p.propclass, Multilink):
1083+
mlseen = True
1084+
isnull = v == '-1' or v is None
1085+
nullin = isinstance(v, type([])) and ('-1' in v or None in v)
1086+
r = retr and not mlseen and not isnull and not nullin
1087+
p = p.append(k, retr=r)
10681088
p.val = v
10691089
multilinks = {}
10701090
for s in sortattr:
10711091
keys = s[1].split('.')
10721092
p = proptree
1093+
mlseen = False
10731094
for k in keys:
1074-
p = p.append(k, sort_type = 2)
1095+
if isinstance (p.propclass, Multilink):
1096+
mlseen = True
1097+
r = retr and not mlseen
1098+
p = p.append(k, need_for='sort', retr=r)
10751099
if isinstance (p.propclass, Multilink):
10761100
multilinks[p] = True
10771101
if p.cls:
1078-
p = p.append(p.cls.orderprop(), sort_type = 2)
1102+
p = p.append(p.cls.orderprop(), need_for='sort')
10791103
if p.sort_direction: # if an orderprop is also specified explicitly
10801104
continue
10811105
p.sort_direction = s[0]
@@ -1158,14 +1182,21 @@ def filter(self, search_matches, filterspec, sort=[], group=[]):
11581182
This implements a non-optimized version of Transitive search
11591183
using _filter implemented in a backend class. A more efficient
11601184
version can be implemented in the individual backends -- e.g.,
1161-
an SQL backen will want to create a single SQL statement and
1185+
an SQL backend will want to create a single SQL statement and
11621186
override the filter method instead of implementing _filter.
11631187
"""
11641188
sortattr = self._sortattr(sort = sort, group = group)
11651189
proptree = self._proptree(filterspec, sortattr)
11661190
proptree.search(search_matches)
11671191
return proptree.sort()
11681192

1193+
# non-optimized filter_iter, a backend may chose to implement a
1194+
# better version that provides a real iterator that pre-fills the
1195+
# cache for each id returned. Note that the filter_iter doesn't
1196+
# promise to correctly sort by multilink (which isn't sane to do
1197+
# anyway).
1198+
filter_iter = filter
1199+
11691200
def count(self):
11701201
"""Get the number of nodes in this class.
11711202

0 commit comments

Comments
 (0)