@@ -150,3 +150,61 @@ NOTHING DONE
150 150 ./roundup/cgi/__init__.py
151 151 ./roundup/cgi/apache.py
152 152 ./roundup/cgi/client.py
153+
154+ Joseph S. Myers notes:
155+ >The key difficulty is undoubtedly dealing with the changes to string types
156+ >- combined with how the extensibility of Roundup means people will have
157+ >Python code in their instances (detectors, etc.), both directly and
158+ >embedded in HTML - which passes strings to Roundup interfaces and gets
159+ >strings from Roundup interfaces.
160+ >
161+ >Roundup makes heavy use of string objects that really are text strings -
162+ >logically, sequences of Unicode code points. Right now, those strings,
163+ >with Python 2, are str objects, encoded in UTF-8. This means that
164+ >people's Python code in their instances, running under Python 2, will
165+ >expect str objects encoded in UTF-8 (and if their code is e.g. generating
166+ >HTML text encoded in UTF-8 to be sent to the user, it never actually has
167+ >to deal with the encoding explicitly, just passes the text through).
168+ >(The experimental Jinja2 templating engine then explicitly converts those
169+ >UTF-8 encoded str objects to unicode objects because that's what Jinja2
170+ >expects to deal with.)
171+ >
172+ >It's quite plausible people's code in their instances will work fine with
173+ >Python 3 if it gets str objects for both Python 2 and Python 3 (UTF-8
174+ >encoded str for Python 2, ordinary Unicode string objects for Python 3).
175+ >It's more likely to break if it gets Python 2 unicode objects, although
176+ >using such objects in Python 2 seems to be how a lot of people do their
177+ >porting to Python 3. And certainly if, when an instance is running with
178+ >Python 3, it gets an object that's not a native sequence of Unicode code
179+ >points, but has each UTF-8 byte as a separate element of the str object,
180+ >things will break.
181+ >
182+ >(I have an instance that uses Unicode collation via PyICU on data from
183+ >Roundup, for example. That works fine with UTF-8 str objects in Python 2,
184+ >would work fine with Python 2 unicode objects though I don't use those,
185+ >works fine with Python 3 str objects when used in their native way - the
186+ >same code has a large part also used outside of Roundup that works with
187+ >both Python 2 and Python 3. Actually, I'd like to have a way to make
188+ >Roundup's built-in sorting of database objects use Unicode collation, or
189+ >otherwise have a way of computing a sort key that isn't simply naming a
190+ >particular property as the sort key, but that's another matter.)
191+ >
192+ >But Roundup *also* has strings that are sequences of bytes: String()
193+ >database fields can hold either kind. Many are data displayed directly on
194+ >web pages and edited there by the user - those are ordinary strings (UTF-8
195+ >at present). But FileClass objects have a String() content property which
196+ >is arbitrary binary data such as an attached file - which logically should
197+ >appear to the user as a bytes object in Python 3. Except that some
198+ >FileClass objects use that data to store text (e.g. the msg class in the
199+ >classic scheme). So you definitely need a Bytes() alternative to String()
200+ >fields, for binary data, and may or may not also need separate text and
201+ >binary variants of FileClass.
202+ >
203+ >I've found that for text-heavy code, always using str objects for text and
204+ >having them be normal Unicode strings in Python 3 but UTF-8-encoded in
205+ >Python 2 works well with the vast bulk of code being encoding-agnostic and
206+ >just passing the strings around. Obviously things are different for the
207+ >sort of code that mixes text and binary data - that is, the sort of thing
208+ >you describe as systems programs in your porting HOWTO. I don't think
209+ >Roundup really is such a systems program, except in limited areas such as
210+ >dealing with attached files.
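
The "native str everywhere" convention described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names to_native_str and to_bytes are hypothetical and are not part of Roundup's API, and UTF-8 is assumed as the encoding because that is what the note says Roundup uses for Python 2 str objects.

```python
import sys

PY3 = sys.version_info[0] >= 3


def to_native_str(value, encoding="utf-8"):
    """Return `value` as the native str type (hypothetical helper).

    Python 3: str is a sequence of Unicode code points; bytes are decoded.
    Python 2: str is a byte string; unicode objects are encoded to UTF-8.
    """
    if isinstance(value, str):
        return value
    if PY3:
        # bytes -> text (Unicode code points)
        return value.decode(encoding)
    # Python 2: unicode -> UTF-8 encoded str
    return value.encode(encoding)


def to_bytes(value, encoding="utf-8"):
    """Return `value` as bytes, e.g. for binary file content (hypothetical helper)."""
    if isinstance(value, bytes):
        return value
    return value.encode(encoding)


if __name__ == "__main__":
    raw = b"caf\xc3\xa9"          # UTF-8 bytes as they might be stored on disk
    text = to_native_str(raw)      # what instance code would receive
    # Under Python 3 the length counts code points; under Python 2 it counts bytes.
    print(len(text))
    print(to_bytes(text) == raw)
```

With helpers of this shape, text-handling code in an instance only ever sees str objects, while code dealing with attached files would ask for bytes explicitly, which is the split the note suggests between String() and a separate Bytes() field.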