Commit 2f2b83e

Fixed the dbm indexer test for unicode under Python 2.
Replaced str(text).upper() with text.upper(). The text variable is already a string or unicode. Also changed the final line in the method from re.findall(pat, text) to re.findall(pat, text, re.UNICODE). Without the flag, \w does not match the non-ASCII character, so u'Spr\xfcnge' was being split into a wordlist of two "words", something like ['SPR', 'NGE']. Those two "words" were in the index and didn't match the search for u'Spr\xfcnge'.
1 parent 3edb7ee commit 2f2b83e

1 file changed: +2 -2 lines changed
roundup/backends/indexer_dbm.py

Lines changed: 2 additions & 2 deletions
@@ -132,11 +132,11 @@ def text_splitter(self, text):
         """Split text/plain string into a list of words
         """
         # case insensitive
-        text = str(text).upper()
+        text = text.upper()
 
         # Split the raw text
         return re.findall(r'\b\w{%d,%d}\b' % (self.minlength, self.maxlength),
-                          text)
+                          text, re.UNICODE)
 
     # we override this to ignore too short and too long words
     # and also to fix a bug - the (fail) case.
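
The splitting problem described in the commit message can be reproduced with a small standalone sketch. It is not the Roundup code: `split_words` is a hypothetical stand-in for `text_splitter`, and the length bounds of 2 and 25 are assumed values. The sketch targets Python 3, where `str` patterns are Unicode-aware by default, so `re.ASCII` is used to imitate what Python 2 did when `re.UNICODE` was not passed.

```python
import re


def split_words(text, minlength=2, maxlength=25, flags=0):
    """Hypothetical stand-in for text_splitter (length bounds are assumed)."""
    # case insensitive, as in the patched method
    text = text.upper()
    pattern = r'\b\w{%d,%d}\b' % (minlength, maxlength)
    return re.findall(pattern, text, flags)


# re.ASCII restricts \w and \b to ASCII, which imitates the
# Python 2 behaviour without re.UNICODE: the word breaks at the u-umlaut.
ascii_words = split_words('Spr\xfcnge', flags=re.ASCII)

# With Unicode-aware matching the word stays in one piece.
unicode_words = split_words('Spr\xfcnge', flags=re.UNICODE)

print(ascii_words)    # ['SPR', 'NGE']
print(unicode_words)  # ['SPRÜNGE']
```

Because the index is built from the splitter's output, the fragments from the first call would never match a search for the whole word, which is consistent with the failure the commit message describes.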
