Fixed the dbm indexer test for Unicode under Python 2.

rouilj · rouilj · commit 2f2b83e0b24c · 2019-10-30T17:48:48.000-04:00
Replaced str(text).upper() with text.upper(). The text variable is
already a str or unicode object, so the str() call is unnecessary.
Also changed the final line in the method from:

   re.findall(pat,text)

to

   re.findall(pat,text,re.UNICODE).

Otherwise, under Python 2, \w matches only ASCII word characters, so
u'Spr\xfcnge' was split at the \xfc into a wordlist of two "words"
(something like ['SPR', 'NGE'] after upper-casing). Those two "words"
went into the index and didn't match the search for u'Spr\xfcnge'.
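
The splitting behaviour described above can be reproduced outside Roundup
with a minimal sketch. The standalone function split_words and the length
limits of 2 and 25 are assumptions made for illustration, not taken from
the indexer. The sketch targets Python 3, where str patterns are
Unicode-aware by default, so re.ASCII (which does not exist in Python 2)
stands in for the old Python 2 default.

```python
import re


def split_words(text, minlength=2, maxlength=25, flags=re.UNICODE):
    """Hypothetical stand-in for text_splitter(): upper-case, then split.

    minlength and maxlength are assumed values for this illustration.
    """
    text = text.upper()
    pat = r'\b\w{%d,%d}\b' % (minlength, maxlength)
    return re.findall(pat, text, flags)


word = u'Spr\xfcnge'

# With re.UNICODE, \w matches the non-ASCII letter and the word stays whole.
unicode_words = split_words(word)

# With ASCII-only matching (approximating Python 2 without re.UNICODE),
# the word breaks apart at the non-ASCII letter.
ascii_words = split_words(word, flags=re.ASCII)

print(unicode_words)
print(ascii_words)
```

With ASCII-only matching the index ends up holding two fragments, neither
of which equals the single term produced when the search string is split
with Unicode-aware matching.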
diff --git a/roundup/backends/indexer_dbm.py b/roundup/backends/indexer_dbm.py
@@ -132,11 +132,11 @@ def text_splitter(self, text):
         """Split text/plain string into a list of words
         """
         # case insensitive
-        text = str(text).upper()
+        text = text.upper()
 
         # Split the raw text
         return re.findall(r'\b\w{%d,%d}\b' % (self.minlength, self.maxlength),
-                          text)
+                          text, re.UNICODE)
 
     # we override this to ignore too short and too long words
     # and also to fix a bug - the (fail) case.