Some porting advice from Joseph Myers.

eric-s-raymond · eric-s-raymond · commit f7feda65daac · 2017-08-24T14:41:00.000-04:00
diff --git a/2to3-done.txt b/2to3-done.txt
@@ -150,3 +150,61 @@ NOTHING DONE
 ./roundup/cgi/__init__.py
 ./roundup/cgi/apache.py
 ./roundup/cgi/client.py
+
+Joseph S. Myers notes:
+>The key difficulty is undoubtedly dealing with the changes to string types
+>- combined with how the extensibility of Roundup means people will have
+>Python code in their instances (detectors, etc.), both directly and
+>embedded in HTML - which passes strings to Roundup interfaces and gets
+>strings from Roundup interfaces.
+>
+>Roundup makes heavy use of string objects that really are text strings -
+>logically, sequences of Unicode code points.  Right now, those strings,
+>with Python 2, are str objects, encoded in UTF-8.  This means that
+>people's Python code in their instances, running under Python 2, will
+>expect str objects encoded in UTF-8 (and if their code is e.g. generating
+>HTML text encoded in UTF-8 to be sent to the user, it never actually has
+>to deal with the encoding explicitly, just passes the text through).
+>(The experimental Jinja2 templating engine then explicitly converts those
+>UTF-8 encoded str objects to unicode objects because that's what Jinja2
+>expects to deal with.)
+>
+>It's quite plausible people's code in their instances will work fine with
+>Python 3 if it gets str objects for both Python 2 and Python 3 (UTF-8
+>encoded str for Python 2, ordinary Unicode string objects for Python 3).
+>It's more likely to break if it gets Python 2 unicode objects, although
+>using such objects in Python 2 seems to be how a lot of people do their
+>porting to Python 3.  And certainly if, when an instance is running with
+>Python 3, it gets an object that's not a native sequence of Unicode code
+>points, but has each UTF-8 byte as a separate element of the str object,
+>things will break.
+>
+>(I have an instance that uses Unicode collation via PyICU on data from
+>Roundup, for example.  That works fine with UTF-8 str objects in Python 2,
+>would work fine with Python 2 unicode objects though I don't use those,
+>works fine with Python 3 str objects when used in their native way - the
+>same code has a large part also used outside of Roundup that works with
+>both Python 2 and Python 3.  Actually, I'd like to have a way to make
+>Roundup's built-in sorting of database objects use Unicode collation, or
+>otherwise have a way of computing a sort key that isn't simply naming a
+>particular property as the sort key, but that's another matter.)
+>
+>But Roundup *also* has strings that are sequences of bytes - String()
+>database fields, which can be either.  Many are data displayed directly on
+>web pages and edited there by the user - those are ordinary strings (UTF-8
+>at present).  But FileClass objects have a String() content property which
+>is arbitrary binary data such as an attached file - which logically should
+>appear to the user as a bytes object in Python 3.  Except that some
+>FileClass objects use that data to store text (e.g. the msg class in the
+>classic schema).  So you definitely need a Bytes() alternative to String()
+>fields, for binary data, and may or may not also need separate text and
+>binary variants of FileClass.
+>
+>I've found that for text-heavy code, always using str objects for text and
+>having them be normal Unicode strings in Python 3 but UTF-8-encoded in
+>Python 2 works well with the vast bulk of code being encoding-agnostic and
+>just passing the strings around.  Obviously things are different for the
+>sort of code that mixes text and binary data - that is, the sort of thing
+>you describe as systems programs in your porting HOWTO.  I don't think
+>Roundup really is such a systems program, except in limited areas such as
+>dealing with attached files.
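
The "native str" convention described in the note above (text is always a str object: Unicode on Python 3, UTF-8-encoded bytes on Python 2) can be sketched roughly as follows. This is only an illustration under that assumption; the helper names to_native_str and to_bytes are hypothetical and are not part of Roundup's API.

```python
import sys

PY3 = sys.version_info[0] >= 3
# The Unicode text type: str on Python 3, unicode on Python 2.
# The name `unicode` is only evaluated when running under Python 2.
text_type = str if PY3 else unicode  # noqa: F821


def to_native_str(data, encoding="utf-8"):
    """Hypothetical helper: return text as the native str type.

    Python 3: bytes are decoded to a Unicode str.
    Python 2: unicode objects are encoded to a UTF-8 str.
    Anything already in native form is passed through unchanged.
    """
    if PY3:
        return data.decode(encoding) if isinstance(data, bytes) else data
    return data.encode(encoding) if isinstance(data, text_type) else data


def to_bytes(data, encoding="utf-8"):
    """Hypothetical helper: return a byte string for binary storage or I/O.

    Unicode text is encoded; byte strings are passed through unchanged.
    """
    return data.encode(encoding) if isinstance(data, text_type) else data
```

With helpers like these confined to the boundaries (database, network, attached files), the bulk of the code can pass str objects around without caring about the encoding, which is the property the note relies on.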