Commit 65bed3f

Try to handle multiple connections better.
The session database is a hot spot. When multiple requests (e.g. 20) come in at the same time, session database contention can get severe.

The original code didn't retry session database access when the open failed, which resulted in errors at the client. The second pass delayed 0.01 seconds and retried. It was better, but we still had multi-second stalls. I think the first request got in, everybody else backed up and then retried at the same time, so they stepped on each other again. With logging I would see many counters go all the way down to low single digits, or to -1 indicating failure.

This pass uses random.randint to generate delays from 0 to 0.125 seconds in 5 ms increments. This performs better in testing: I rarely saw a counter less than 13 (2 failed retries). Current logging starts after 6 failures and counts down until success or failure.
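The staggered retry described above can be sketched in isolation. This is a minimal illustration, not the Roundup code itself: `jitter_delay`, `open_with_jitter`, and the `opener` and `sleep` parameters are hypothetical names introduced here, and the defaults (15 retries, 0 to 25 steps of 5 ms) simply mirror the numbers in this commit.

```python
import random
import time


def jitter_delay(steps=25, step_seconds=0.005):
    """Return a random delay between 0 and steps * step_seconds seconds."""
    # random.randint is inclusive at both ends, so this picks one of
    # steps + 1 evenly spaced delays (0, 5 ms, 10 ms, ... 125 ms by default).
    return random.randint(0, steps) * step_seconds


def open_with_jitter(opener, retries_left=15, sleep=time.sleep):
    """Call opener() until it succeeds, sleeping a random delay between tries.

    Hypothetical helper for illustration only. Once the retry budget is
    used up, the OSError from the last attempt is re-raised.
    """
    while True:
        try:
            return opener()
        except OSError:
            if retries_left < 0:
                # Out of retries: re-raise the exception that got us here.
                raise
            # Stagger the retry so concurrent callers do not all wake
            # up and collide again at the same instant.
            sleep(jitter_delay())
            retries_left -= 1
```

Passing a replacement for `sleep` (for example a list's `append` method) lets the retry logic be exercised in a test without actually waiting.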
1 parent d60655d commit 65bed3f

2 files changed

+11
-4
lines changed

CHANGES.txt

Lines changed: 4 additions & 0 deletions
@@ -59,6 +59,10 @@ Fixed:
 - handle configparser.InterpolationSyntaxError raised if value
   has a single %. Seems to afect python 3 only. Reported by
   nomicon on IRC. (John Rouillard)
+- add random delay to session database retry code between 0 and .125
+  seconds. This seems to help reduce stalled connections when a
+  number of connections are made at the same time. Log remaining
+  retries once 5 of them have been used. (John Rouillard)
 
 Features:
 

roundup/backends/sessions_dbm.py

Lines changed: 7 additions & 4 deletions
@@ -6,7 +6,7 @@
 """
 __docformat__ = 'restructuredtext'
 
-import os, marshal, time
+import os, marshal, time, logging, random
 
 from roundup.anypy.html import html_escape as escape
 
@@ -132,21 +132,24 @@ def opendb(self, mode):
         dbm = __import__(db_type)
 
         retries_left = 15
+        logger = logging.getLogger('roundup.hyperdb.backend.sessions')
         while True:
             try:
                 handle = dbm.open(path, mode)
                 break
-            except OSError:
+            except OSError as e:
                 # Primarily we want to catch and retry:
                 # [Errno 11] Resource temporarily unavailable retry
                 # FIXME: make this more specific
+                if retries_left < 10:
+                    logger.warning('dbm.open failed, retrying %s left: %s'%(retries_left,e))
                 if retries_left < 0:
                     # We have used up the retries. Reraise the exception
                     # that got us here.
                     raise
                 else:
-                    # delay retry a bit
-                    time.sleep(0.01)
+                    # stagger retry to try to get around thundering herd issue.
+                    time.sleep(random.randint(0,25)*.005)
                     retries_left = retries_left - 1
                     continue # the while loop
         return handle
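
The FIXME in the hunk above asks for a more specific check than a bare OSError. One possible direction, offered only as a sketch: inspect the exception's errno and retry only when it signals a transient condition. The names `RETRYABLE_ERRNOS` and `is_retryable` are hypothetical and not part of Roundup. Treating EWOULDBLOCK like EAGAIN is an assumption of this sketch, as is the expectation that the dbm module in use sets errno on the exception it raises.

```python
import errno

# Errno values treated as "try again". EAGAIN is the
# "[Errno 11] Resource temporarily unavailable" case on Linux.
# Including EWOULDBLOCK is an assumption of this sketch.
RETRYABLE_ERRNOS = {errno.EAGAIN, errno.EWOULDBLOCK}


def is_retryable(exc):
    """Return True if exc is an OSError whose errno marks a transient failure."""
    return isinstance(exc, OSError) and exc.errno in RETRYABLE_ERRNOS
```

With a check like this, the except block could re-raise immediately when `is_retryable(e)` is false instead of spending retries on errors such as a missing file or a permissions problem.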
