
Commit 63439b3

fix: out of memory error when importing under postgresql
If you try importing more than 20k items under postgresql you can run
out of memory:

  psycopg2.errors.OutOfMemory: out of shared memory
  HINT: You might need to increase max_locks_per_transaction.

Tuning memory may help, it's unknown at this point.

This checkin forces a commit to the postgres database after 10,000
rows have been added. This clears out the savepoints for each row and
starts a new transaction.

back_postgresql.py: Implement commit mechanism in checkpoint_data().
  Add two class level attributes for tracking the number of savepoints
  and the limit when the commit should happen.

roundup_admin.py: implement pragma and dynamically create the config
  item RDBMS_SAVEPOINT_LIMIT used by checkpoint_data. Also fixed
  formatting of descriptions when using pragma list in verbose mode.

admin_guide.txt, upgrading.txt: Document change and use of pragma
  savepoint_limit in roundup-admin for changing the default of 10,000.

test/db_test_base.py: add some more asserts. In existing
  testAdminImportExport, set the savepoint limit to 5 to test setting
  method and so that the commit code will be run by existing tests.
  This provides coverage, but does not actually test that the commit
  is done every 5 savepoints 8-(. The verification of every 5
  savepoints was done manually using a pdb breakpoint just before the
  commit.

acknowledgements.txt: Added 2.4.0 section mentioning Norbert as he has
  done a ton of testing with much larger datasets than I can test with.
1 parent 69c68b5 commit 63439b3
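
As a rough usage illustration (not part of the commit), the new pragma would
typically be set in roundup-admin's interactive mode before running the
import; the tracker and export paths below are placeholders:

  $ roundup-admin -i /path/to/tracker
  roundup> pragma savepoint_limit=20000
  roundup> import /path/to/exported_tracker
  roundup> quit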

File tree

7 files changed: +140 -25 lines changed

  CHANGES.txt
  doc/acknowledgements.txt
  doc/admin_guide.txt
  doc/upgrading.txt
  roundup/admin.py
  roundup/backends/back_postgresql.py
  test/db_test_base.py

CHANGES.txt

Lines changed: 3 additions & 0 deletions
@@ -59,6 +59,9 @@ Fixed:
 - Fix error handling so failure during import of a non-user item
   doesn't cause a second traceback. (Found by Norbert Schlemmer, fix
   John Rouillard)
+- Handle out of memory error when importing large trackers in
+  PostgreSQL. (Found by Norbert Schlemmer, extensive testing by
+  Norbert, fix John Rouillard)
 
 Features:
 
doc/acknowledgements.txt

Lines changed: 18 additions & 0 deletions
@@ -16,6 +16,24 @@ ideas and everything else that helped!
 
 .. _`Announcement with changelog for current release.`: announcement.html
 
+2.4
+---
+
+2.4.0
+~~~~~
+
+Maintainer: John Rouillard
+
+Release Manager: John Rouillard
+
+Developer activity by changesets::
+
+  TBD
+
+Other contributers
+
+  Norbert Schlemmer
+
 2.3
 ---
 

doc/admin_guide.txt

Lines changed: 10 additions & 0 deletions
@@ -962,6 +962,16 @@ Migrating Backends
    move the new tracker home into its place.
 9. Restart web and email frontends.
 
+If you are importing into PostgreSQL, it autocommits the data every
+10000 objects/rows by default. This can slow down importing, but it
+prevents an out of memory error caused by using a savepoint for each
+object. You can control the commit frequency by using::
+
+  pragma savepoint_limit=20000
+
+to set a higher or lower number in roundup-admin. In this example a
+commit will be done every 20,000 objects/rows. The pragma can also be
+set on the roundup-admin command line as described below.
 
 Moving a Tracker
 ----------------

doc/upgrading.txt

Lines changed: 14 additions & 0 deletions
@@ -141,6 +141,20 @@ It is unlikey that you will care unless you have done some expert
 level Roundup customization. If you have, use one of the imports above
 if you plan on running on Python 3.13 (expected in 2024) or newer.
 
+Fixing PostgreSQL Out of Memory Errors when Importing Tracker (info)
+--------------------------------------------------------------------
+
+Importing a tracker into PostgreSQL can run out of memory with the
+error::
+
+  psycopg2.errors.OutOfMemory: out of shared memory
+  HINT: You might need to increase max_locks_per_transaction.
+
+before changing your PostgreSQL configuration, try changing the pragma
+``savepoint_limit`` to a lower value. By default it is set to
+``10000``. In some cases this may be too high. See the `administration
+guide`_ for further details.
+
 .. index:: Upgrading; 2.2.0 to 2.3.0
 
 Migrating from 2.2.0 to 2.3.0
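
The HINT quoted above points at the server-side alternative. If lowering
``savepoint_limit`` is not enough, or not an option, the PostgreSQL parameter
itself can be raised instead; a sketch with an illustrative value, noting
that this setting defaults to 64 and only takes effect after a server
restart:

  ALTER SYSTEM SET max_locks_per_transaction = 256;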

roundup/admin.py

Lines changed: 29 additions & 15 deletions
@@ -35,8 +35,9 @@
 from roundup import date, hyperdb, init, password, token_r
 from roundup import __version__ as roundup_version
 import roundup.instance
-from roundup.configuration import (CoreConfig, NoConfigError, OptionUnsetError,
-                                   OptionValueError, ParsingOptionError, UserConfig)
+from roundup.configuration import (CoreConfig, NoConfigError, Option,
+                                   OptionUnsetError, OptionValueError,
+                                   ParsingOptionError, UserConfig)
 from roundup.i18n import _, get_translation
 from roundup.exceptions import UsageError
 from roundup.anypy.my_input import my_input
@@ -108,6 +109,7 @@ def __init__(self):
             'display_protected': False,
             'indexer_backend': "as set in config.ini",
             '_reopen_tracker': False,
+            'savepoint_limit': 10000,
             'show_retired': "no",
             '_retired_val': False,
             'verbose': False,
@@ -116,25 +118,29 @@ def __init__(self):
         }
         self.settings_help = {
             'display_header':
-            _("Have 'display designator[,designator*]' show header inside "
-              " []'s before items. Includes retired/active status."),
+            _("Have 'display designator[,designator*]' show header inside\n"
+              " []'s before items. Includes retired/active status.\n"),
 
             'display_protected':
-            _("Have 'display designator' and 'specification class' show "
-              "protected fields: creator, id etc."),
+            _("Have 'display designator' and 'specification class' show\n"
+              " protected fields: creator, id etc.\n"),
 
             'indexer_backend':
-            _("Set indexer to use when running 'reindex' NYI"),
+            _("Set indexer to use when running 'reindex' NYI\n"),
 
             '_reopen_tracker':
-            _("Force reopening of tracker when running each command."),
-
-            'show_retired': _("Show retired items in table, list etc. One of 'no', 'only', 'both'"),
-            '_retired_val': _("internal mapping for show_retired."),
-            'verbose': _("Enable verbose output: tracing, descriptions..."),
-
-            '_inttest': "Integer valued setting. For testing only.",
-            '_floattest': "Float valued setting. For testing only.",
+            _("Force reopening of tracker when running each command.\n"),
+
+            'savepoint_limit':
+            _("set the number of rows imported before a database commit is\n"
+              " done. Used only for imports on PostgreSQL.\n"),
+            'show_retired': _("Show retired items in table, list etc. "
+                              "One of 'no', 'only', 'both'\n"),
+            '_retired_val': _("internal mapping for show_retired.\n"),
+            'verbose': _("Enable verbose output: tracing, descriptions...\n"),
+
+            '_inttest': "Integer valued setting. For testing only.\n",
+            '_floattest': "Float valued setting. For testing only.\n",
         }
 
     def get_class(self, classname):
@@ -1049,6 +1055,14 @@ def do_import(self, args, import_files=True):
         if hasattr(csv, 'field_size_limit'):
             csv.field_size_limit(self.db.config.CSV_FIELD_SIZE)
 
+        # default value is 10000, only go through this if default
+        # is different.
+        if self.settings['savepoint_limit'] != 10000:
+            self.db.config.add_option(Option(self.db.config,
+                                             "rdbms", "savepoint_limit"))
+            self.db.config.options["RDBMS_SAVEPOINT_LIMIT"].set(
+                self.settings['savepoint_limit'])
+
         # directory to import from
         dir = args[0]
 
roundup/backends/back_postgresql.py

Lines changed: 52 additions & 10 deletions
@@ -152,12 +152,25 @@ class Database(rdbms_common.Database):
         holds the value for the type of db. It is used by indexer to
         identify the database type so it can import the correct indexer
         module when using native text search mode.
+
+    import_savepoint_count:
+       count the number of savepoints that have been created during
+       import. Once the limit of savepoints is reached, a commit is
+       done and this is reset to 0.
+
     """
 
     arg = '%s'
 
     dbtype = "postgres"
 
+    import_savepoint_count = 0
+
+    # Value is set from roundup-admin using db.config["RDBMS_SAVEPOINT_LIMIT"]
+    # or to the default of 10_000 at runtime. Use 0 here to trigger
+    # initialization.
+    savepoint_limit = 0
+
     # used by some code to switch styles of query
     implements_intersect = 1
 
@@ -218,20 +231,49 @@ def checkpoint_data(self, savepoint="importing"):
            of operation in exception handler because
            postgres requires a rollback in case of error
            generating exception. Used with
-           restore_connecion_on_error to handle uniqueness
+           restore_connection_on_error to handle uniqueness
            conflict in import_table().
+
+           Savepoints take memory resources. Postgres keeps all
+           savepoints (rather than overwriting) until a
+           commit(). Commit every ~10,000 savepoints to prevent
+           running out of memory on import.
+
+           NOTE: a commit outside of this method will not reset the
+           import_savepoint_count. This can result in an unneeded
+           commit on a new cursor (that has no savepoints) as there is
+           no way to find out if there is a savepoint or how many
+           savepoints are opened on a db connection/cursor.
+
+           Because an import is a one shot deal and not part of a long
+           running daemon (e.g. the roundup-server), I am not too
+           worried about it. It will just slow the import down a tad.
         """
-        # Savepoints take resources. Postgres keeps all
-        # savepoints (rather than overwriting) until a
-        # commit(). If an import fails because of a resource
-        # issue with savepoints, uncomment this line. I
-        # expect it will slow down the import but it should
-        # eliminate any issue with stored savepoints and
-        # resource use.
-        #
-        # self.sql('RELEASE SAVEPOINT %s' % savepoint)
+
         self.sql('SAVEPOINT %s' % savepoint)
 
+        self.import_savepoint_count += 1
+
+        if not self.savepoint_limit:
+            if "RDBMS_SAVEPOINT_LIMIT" in self.config.keys():
+                # note this config option is created on the fly
+                # by admin.py::do_import. It is never listed in
+                # config.ini.
+                self.savepoint_limit = self.config["RDBMS_SAVEPOINT_LIMIT"]
+            else:
+                self.savepoint_limit = 10000
+
+        if self.import_savepoint_count > self.savepoint_limit:
+            # track savepoints and commit every 10000 (or user value)
+            # so we don't run postgres out of memory. An import of a
+            # customer's tracker ran out of memory after importing
+            # ~23000 items with: psycopg2.errors.OutOfMemory: out of
+            # shared memory HINT: You might need to increase
+            # max_locks_per_transaction.
+
+            self.commit()
+            self.import_savepoint_count = 0
+
     def restore_connection_on_error(self, savepoint="importing"):
         """Postgres leaves a connection/cursor in an unusable state
            after an error. Rollback the transaction to a
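
To make the mechanism concrete, here is a minimal standalone sketch (not
Roundup code) of the pattern checkpoint_data() now follows: one SAVEPOINT per
imported row, a rollback to that savepoint on a per-row failure, and a commit
every savepoint_limit rows so PostgreSQL can release the accumulated
savepoints and their locks. The DSN, table and row source are placeholder
assumptions:

    # Standalone illustration only (not Roundup code). The DSN and the
    # temporary table are placeholders.
    import psycopg2

    conn = psycopg2.connect("dbname=test")      # placeholder DSN
    cur = conn.cursor()
    cur.execute("CREATE TEMP TABLE items (id integer PRIMARY KEY)")

    SAVEPOINT_LIMIT = 10000                     # mirrors the pragma's default
    savepoint_count = 0

    for item_id in range(25000):                # enough rows to exceed the limit
        cur.execute("SAVEPOINT importing")
        try:
            cur.execute("INSERT INTO items (id) VALUES (%s)", (item_id,))
        except psycopg2.IntegrityError:
            # undo only this row; the rest of the transaction survives
            cur.execute("ROLLBACK TO SAVEPOINT importing")
        savepoint_count += 1
        if savepoint_count > SAVEPOINT_LIMIT:
            conn.commit()                       # releases accumulated savepoints
            savepoint_count = 0

    conn.commit()
    conn.close()

The real backend does the same bookkeeping inside checkpoint_data(), with the
limit coming from the dynamically created RDBMS_SAVEPOINT_LIMIT option.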

test/db_test_base.py

Lines changed: 14 additions & 0 deletions
@@ -3061,6 +3061,7 @@ def testImportExport(self):
             self.db.commit()
 
             self.assertEqual(self.db.user.lookup("duplicate"), active_dupe_id)
+            self.assertEqual(self.db.user.is_retired(retired_dupe_id), True)
 
         finally:
             shutil.rmtree('_test_export')
@@ -3151,12 +3152,25 @@ def stderrwrite(s):
             self.assertRaises(csv.Error, tool.do_import, ['_test_export'])
 
             self.nukeAndCreate()
+
+            # make sure we have an empty db
+            with self.assertRaises(IndexError) as e:
+                # users 1 and 2 always are created on schema load.
+                # so don't use them.
+                self.db.user.getnode("5").values()
+
             self.db.config.CSV_FIELD_SIZE = 3200
             tool = roundup.admin.AdminTool()
             tool.tracker_home = home
             tool.db = self.db
+            # Force import code to commit when more than 5
+            # savepoints have been created.
+            tool.settings['savepoint_limit'] = 5
             tool.verbose = False
             tool.do_import(['_test_export'])
+
+            # verify the data is loaded.
+            self.db.user.getnode("5").values()
         finally:
             roundup.admin.sys = sys
             shutil.rmtree('_test_export')
