Commit 709a80b

issue2551115/issue2551282 - utf8mb4 support in roundup
Fix issues with utf8 support in Roundup. By default using:

  utf8mb4 charset
  utf8mb4_unicode_ci collation (case insensitive)
  utf8mb4_0900_bin collation (case sensitive)

which are settable from config.ini. Sadly I couldn't come up with a way to manage these from one parameter. Doing a compatibility lookup table would have increased the maintenance burden and have me chasing MySQL changes. So I opted for the easy path and have the admins (with more MySQL experience) make the choices.

Conversion directions added to upgrading.txt.

I don't have any good testing for this. I was able to generate utf8/utf8mb3 tables and load a little data and convert. However this is a poor substitute for a conversion on a working tracker 8-(.
1 parent 128a14e commit 709a80b

5 files changed: +172 -8 lines changed

CHANGES.txt

Lines changed: 6 additions & 0 deletions
@@ -16,6 +16,12 @@ python 3.6 or newer (3.4/3.5 might work, but they are not tested).
 
 Fixed:
 
+- issue2551282 - MySQL utf8mb4 issues and
+  issue2551115 - Use utf8mb4 as a default for MySQL instead of utf8
+  The default character set and collations have been set to:
+  utf8mb4, utf8mb4_unicode_ci and utf8mb4_0900_bin. They are (sadly)
+  configurable from config.ini. Required directions on upgrading the
+  MySQL db have been documented in upgrading.txt.
 - issue2551063 - Rest/Xmlrpc interfaces needs failed login protection.
   Failed API login rate limiting with expiring lockout added. (John
   Rouillard)

doc/upgrading.txt

Lines changed: 122 additions & 0 deletions
@@ -112,6 +112,128 @@ values or if a value must be changed manually.
 
 This will insert the bad API login rate limiting settings.
 
+Update MySQL character set/collations (required)
+------------------------------------------------
+
+issue2551282_ and issue2551115_ discuss issues with MySQL's utf8
+support. MySQL has variations on utf8 character support. This
+version of Roundup expects to use utf8mb4 which is a version of
+utf8 that covers all characters, not just the ones in the basic
+multilingual plane. Previous versions of Roundup used latin1 or
+utf8mb3 (also known as just utf8). Newer versions of MySQL are
+supposed to make utf8mb4 and not utf8mb3 the default.
+
+To convert your database, you need to have MySQL 8.0.11 or newer
+(April 2018) and a mysql client.
+
+.. warning::
+
+   This conversion can damage your database. Back up your
+   database using mysqldump or other tools, preferably on a quiet
+   database. Verify that your database can be restored (or at
+   least look up directions for restoring it). This is very
+   important.
+
+We suggest shutting down Roundup's interfaces:
+
+* web
+* email
+* cron jobs that use Python or roundup-admin
+
+then make your backup.
+
+Then connect to your mysql instance using ``mysql`` with the
+information in ``config.ini``. If your tracker's ``config.ini``
+includes::
+
+  name = roundupdb
+  host = localhost
+  user = roundupuser
+  password = rounduppw
+
+you would run some version of::
+
+  mysql -u roundupuser --host localhost -p roundupdb
+
+and supply ``rounduppw`` when prompted.
+
+With the Roundup database quiet, convert the character set for the
+database and then for all the tables. To convert the tables you
+need a list of them. To get this run::
+
+  mysql -sN -u roundupuser --host localhost -p \
+      -e 'show tables;' roundupdb > /tmp/tracker.tables
+
+The ``-sN`` removes line drawing characters and column headers
+from the output. For each table ``<t>`` in the file, run::
+
+  ALTER TABLE `<t>` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
+
+You can automate this conversion using sed::
+
+  sed -e 's/^/ALTER TABLE `/' \
+      -e 's/$/` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;/' \
+      /tmp/tracker.tables > /tmp/tracker.tables.sql
+
+The backticks "`" are required as some of the table names became
+MySQL reserved words during Roundup's lifetime.
+
+Inspect ``/tmp/tracker.tables.sql`` to see if all the lines look
+correct. If so then we can start the conversion.
+
+First convert the character set for the database by running::
+
+  mysql -u roundupuser --host localhost -p roundupdb
+
+Then at the ``mysql>`` prompt run::
+
+  ALTER DATABASE roundupdb CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
+
+You should see: ``Query OK, 1 row affected (0.01 sec)``.
+
+Now to modify all the tables run::
+
+  \. /tmp/tracker.tables.sql
+
+You will see output similar to::
+
+  Query OK, 5 rows affected (0.01 sec)
+  Records: 5  Duplicates: 0  Warnings: 0
+
+for each table. The rows/records will depend on the number of
+entries in the table. This can take a while.
+
+Once you have successfully completed this, copy your tracker's
+config.ini to a backup file. Edit ``config.ini`` to use the defaults:
+
+* mysql_charset = utf8mb4
+* mysql_collation = utf8mb4_unicode_ci
+* mysql_binary_collation = utf8mb4_0900_bin
+
+Also look for a ``~/.my.cnf`` for the roundup user and make sure
+that the settings for character set (charset) are utf8mb4 compatible.
+
+To test, run ``roundup-admin -i tracker_home`` and display an
+issue designator: e.g. ``display issue10``. Check that the text
+fields are properly displayed (e.g. title). Start the web
+interface and browse some issues. Again, check that the text
+fields display correctly, that the history at the bottom of the
+issues displays correctly and if you are using the default full
+text search, make sure that it works.
+
+If this works, bring email, cron jobs, etc. back online.
+
+If this fails, take down the web interface, restore the database
+from backup, and restore the old config.ini. Then test again and
+reach out to the mailing list for help.
+
+We can use assistance in getting these directions corrected or
+enhanced. The core Roundup developers don't use MySQL for their
+production workloads so we count on users to help us with this.
+
+.. _issue2551282: https://issues.roundup-tracker.org/issue2551282
+.. _issue2551115: https://issues.roundup-tracker.org/issue2551115
+
 Disable performance improvement for wsgi mode (optional)
 --------------------------------------------------------
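The sed step in the directions above could also be done in Python. What follows is a minimal sketch and not part of this commit: `alter_statements` is a hypothetical helper, the table names in the usage example are placeholders, and the input is assumed to be one table name per line, as in `/tmp/tracker.tables`.

```python
def alter_statements(table_names, charset="utf8mb4",
                     collation="utf8mb4_unicode_ci"):
    """Return one ALTER TABLE statement per non-empty table name."""
    statements = []
    for name in table_names:
        name = name.strip()
        if not name:
            continue  # skip blank lines in the table list
        # Backtick-quote the name (doubling any embedded backtick) because
        # some Roundup table names are MySQL reserved words.
        quoted = "`%s`" % name.replace("`", "``")
        statements.append(
            "ALTER TABLE %s CONVERT TO CHARACTER SET %s COLLATE %s;"
            % (quoted, charset, collation))
    return statements


if __name__ == "__main__":
    # Placeholder names standing in for the contents of /tmp/tracker.tables.
    for statement in alter_statements(["_issue", "_user", "schema"]):
        print(statement)
```

As with the sed version, the generated statements should be inspected before they are fed to the `mysql` client.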

roundup/backends/back_mysql.py

Lines changed: 11 additions & 3 deletions
@@ -100,9 +100,13 @@ def db_create(config):
     kwargs = connection_dict(config)
     conn = MySQLdb.connect(**kwargs)
     cursor = conn.cursor()
-    command = "CREATE DATABASE %s COLLATE utf8_general_ci" % config.RDBMS_NAME
+    command = "CREATE DATABASE %s COLLATE %s" % (config.RDBMS_NAME,
+                                                 config.RDBMS_MYSQL_COLLATION)
     if sys.version_info[0] > 2:
-        command += ' CHARACTER SET utf8'
+        charset = config.RDBMS_MYSQL_CHARSET
+        if charset == 'default':
+            charset = 'utf8mb4'  # use full utf set.
+        command += ' CHARACTER SET %s' % charset
     logging.info(command)
     cursor.execute(command)
     conn.commit()
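To see what statement this hunk produces for a given configuration, the string building can be pulled out into a standalone function. This is only an illustrative sketch: `create_db_command` and its `python_major` parameter are hypothetical and do not exist in `back_mysql.py`, and no database connection is made.

```python
def create_db_command(name, collation, charset, python_major=3):
    """Build the CREATE DATABASE statement the way db_create() does."""
    command = "CREATE DATABASE %s COLLATE %s" % (name, collation)
    if python_major > 2:
        if charset == 'default':
            # 'default' leaves the connection charset alone, but the
            # database itself is still created as utf8mb4.
            charset = 'utf8mb4'
        command += ' CHARACTER SET %s' % charset
    return command


if __name__ == "__main__":
    print(create_db_command("roundupdb", "utf8mb4_unicode_ci", "default"))
    # prints: CREATE DATABASE roundupdb COLLATE utf8mb4_unicode_ci CHARACTER SET utf8mb4
```

Note that under Python 2 no `CHARACTER SET` clause is appended, so the charset is then implied by the collation alone.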
@@ -652,11 +656,15 @@ def sql_close(self):
 
 
 class MysqlClass:
-    case_sensitive_equal = 'COLLATE utf8_bin ='
+
+    case_sensitive_equal = None  # defined by self.get_case_sensitive_equal()
 
     # TODO: AFAIK its version dependent for MySQL
     supports_subselects = False
 
+    def get_case_sensitive_equal(self):
+        return 'COLLATE %s =' % self.db.config.RDBMS_MYSQL_BINARY_COLLATION
+
     def _subselect(self, proptree):
         ''' "I can't believe it's not a toy RDBMS"
            see, even toy RDBMSes like gadfly and sqlite can do sub-selects...

roundup/backends/rdbms_common.py

Lines changed: 18 additions & 2 deletions
@@ -1643,7 +1643,9 @@ class Class(hyperdb.Class):
     case_insensitive_like = 'LIKE'
 
     # For some databases (mysql) the = operator for strings ignores case.
-    # We define the default here, can be changed in derivative class
+    # We define the default here, can be changed in derivative class.
+    # If set to any false value, self.get_case_sensitive_equal() is
+    # called to set its value.
     case_sensitive_equal = '='
 
     # Some DBs order NULL values last. Set this variable in the backend
# Some DBs order NULL values last. Set this variable in the backend
@@ -1675,6 +1677,16 @@ def disableJournalling(self):
         """
         self.do_journal = 0
 
+    def get_case_sensitive_equal(self):
+        """ For some databases (mysql) the = operator for strings ignores
+            case. We define the default here, can be changed in derivative class.
+
+            It reads the collation from config because mysql has multiple collations.
+            The admin sets both the primary and case sensitive collation in
+            config.ini for mysql.
+        """
+        raise ValueError("get_case_sensitive_equal called in error")
+
     # Editing nodes:
     def create(self, **propvalues):
         """ Create a new node of this class and return its id.
""" Create a new node of this class and return its id.
@@ -2800,10 +2812,14 @@ def _filter_sql(self, search_matches, filterspec, srt=[], grp=[], retr=0,
 
                 # now add to the where clause
                 w = []
+                if not self.case_sensitive_equal:
+                    self.case_sensitive_equal = \
+                        self.get_case_sensitive_equal()
+                cse = self.case_sensitive_equal
                 for vv, ex in zip(v, exact):
                     if ex:
                         w.append("_%s._%s %s %s" % (
-                            pln, k, self.case_sensitive_equal, a))
+                            pln, k, cse, a))
                         args.append(vv)
                     else:
                         w.append("_%s._%s %s %s ESCAPE %s" % (
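The lazy lookup that the three `rdbms_common.py` hunks and the `MysqlClass` override set up can be shown in isolation. This is a simplified sketch rather than the Roundup classes: `exact_match_clause` and the `binary_collation` constructor argument are hypothetical stand-ins for `_filter_sql` and `self.db.config.RDBMS_MYSQL_BINARY_COLLATION`.

```python
class Class:
    # Default for backends where '=' already compares case sensitively.
    case_sensitive_equal = '='

    def get_case_sensitive_equal(self):
        # Only backends that set case_sensitive_equal to a false value
        # are expected to override this.
        raise ValueError("get_case_sensitive_equal called in error")

    def exact_match_clause(self, column, placeholder='%s'):
        # Resolve the operator on first use if the class left it unset.
        if not self.case_sensitive_equal:
            self.case_sensitive_equal = self.get_case_sensitive_equal()
        return "%s %s %s" % (column, self.case_sensitive_equal, placeholder)


class MysqlClass(Class):
    case_sensitive_equal = None  # resolved lazily from the configuration

    def __init__(self, binary_collation):
        # Stand-in for the value read from the tracker's config.ini.
        self.binary_collation = binary_collation

    def get_case_sensitive_equal(self):
        return 'COLLATE %s =' % self.binary_collation


if __name__ == "__main__":
    print(Class().exact_match_clause("_issue._title"))
    print(MysqlClass("utf8mb4_0900_bin").exact_match_clause("_issue._title"))
```

Because the assignment stores the result on the instance, the lookup runs once per instance and the class attribute stays `None` for other instances.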

roundup/configuration.py

Lines changed: 15 additions & 3 deletions
@@ -1495,11 +1495,23 @@ def str2value(self, value):
             "Name of the group to use in the MySQL defaults file (.my.cnf).\n"
             "Only used in MySQL connections."),
         (Option, 'mysql_charset', 'utf8mb4',
-            "Charset to use for mysql connection,\n"
-            "use 'default' for the mysql default, no charset option\n"
-            "is used when creating the connection in that case.\n"
+            "Charset to use for mysql connection and databases.\n"
+            "If set to 'default', no charset option is used when\n"
+            "creating the db connection and utf8mb4 is used for the\n"
+            "database charset.\n"
             "Otherwise any permissible mysql charset is allowed here.\n"
             "Only used in MySQL connections."),
+        (Option, 'mysql_collation', 'utf8mb4_unicode_ci',
+            "Comparison/order to use for mysql database/table collations.\n"
+            "When upgrading, you can use 'utf8' to match the\n"
+            "deprecated 'utf8mb3'. This must be compatible with the\n"
+            "mysql_charset setting above. Only used by MySQL."),
+        (Option, 'mysql_binary_collation', 'utf8mb4_0900_bin',
+            "Comparison/order to use for mysql database/table collations\n"
+            "when matching case. When upgrading, you can use 'utf8_bin'\n"
+            "to match the deprecated 'utf8mb3_bin' collation. This must\n"
+            "be compatible with the mysql_collation above. Only used\n"
+            "by MySQL."),
         (IntegerNumberGeqZeroOption, 'sqlite_timeout', '30',
             "Number of seconds to wait when the SQLite database is locked\n"
             "Default: use a 30 second timeout (extraordinarily generous)\n"

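Since the three settings must be kept compatible by hand, an admin might want a quick sanity check before restarting the tracker. The sketch below is not part of Roundup: `check_collations` is a hypothetical helper, the `[rdbms]` section name is an assumption inferred from the `RDBMS_` prefix of the config attributes, and the check relies only on the MySQL convention that a collation name begins with its character set name. It is a heuristic and may misreport aliases such as `utf8` versus `utf8mb3`.

```python
import configparser

# Sample settings matching the new defaults in this commit.
SAMPLE = """
[rdbms]
mysql_charset = utf8mb4
mysql_collation = utf8mb4_unicode_ci
mysql_binary_collation = utf8mb4_0900_bin
"""


def check_collations(ini_text):
    """Return the names of collation settings that do not match the charset."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    section = parser["rdbms"]
    charset = section.get("mysql_charset", "utf8mb4")
    if charset == "default":
        charset = "utf8mb4"  # same fallback the commit uses for CREATE DATABASE
    problems = []
    for key in ("mysql_collation", "mysql_binary_collation"):
        value = section.get(key, "")
        # MySQL collation names start with "<charset>_".
        if not value.startswith(charset + "_"):
            problems.append(key)
    return problems


if __name__ == "__main__":
    print(check_collations(SAMPLE))
```

An empty list means both collations carry the charset prefix; any names returned are the settings worth a second look.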