Commit f9fc7b8

Author: Richard Jones
Fix RDBMS indexer indexing UTF-8 words that encode to > 30 chars
(a better fix would be nice)
1 parent f110473 commit f9fc7b8

2 files changed: 3 additions, 2 deletions

CHANGES.txt

Lines changed: 1 addition & 0 deletions
@@ -9,6 +9,7 @@ Fixed:
 - Display of Multilinks where linked Class labelprop values are None
 - Fix references to the old * Registration Permissions
 - Fix missing merge of fix to sf bug 1177057
+- Fix RDBMS indexer indexing UTF-8 words that encode to > 30 chars
 
 
 2005-07-18 0.8.4

roundup/backends/indexer_rdbms.py

Lines changed: 2 additions & 2 deletions
@@ -56,8 +56,8 @@ def add_text(self, identifier, text, mime_type='text/plain'):
                 for w in re.findall(r'(?u)\b\w{2,25}\b', text)]
         words = {}
         for word in wordlist:
-            if is_stopword(word):
-                continue
+            if is_stopword(word): continue
+            if len(word) > 25: continue
             words[word] = 1
         words = words.keys()
 
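
The hunk above can be read as follows: the regex limits each word to 25 characters, but once a word is encoded to UTF-8 its length is counted in bytes, so a word made of multi-byte characters can still exceed the size the commit message mentions. The sketch below illustrates that effect and the added length check. It is a Python 3 approximation, not the original code: the `STOPWORDS` set, this `is_stopword` stand-in, and the `index_words` helper are hypothetical names introduced for illustration, and the assumption that words are encoded to UTF-8 before the loop is inferred from the commit message rather than shown in the diff.

```python
import re

# Hypothetical stand-in for the indexer's stopword list (not from the commit).
STOPWORDS = {"THE", "AND"}


def is_stopword(word):
    """Hypothetical stopword check operating on UTF-8 encoded words."""
    return word.decode("utf-8") in STOPWORDS


def index_words(text, max_len=25):
    """Return the unique UTF-8 encoded words that survive both filters."""
    # The regex caps each word at 25 *characters*, not bytes.
    wordlist = [w.encode("utf-8")
                for w in re.findall(r"\b\w{2,25}\b", text.upper())]
    words = {}
    for word in wordlist:
        if is_stopword(word):
            continue
        # After encoding, len() counts bytes, so multi-byte words that
        # passed the regex can still be rejected here.
        if len(word) > max_len:
            continue
        words[word] = 1
    return sorted(words)


long_word = "é" * 20  # 20 characters, but 40 bytes once encoded as UTF-8
print(len(long_word), len(long_word.encode("utf-8")))
print(index_words("the " + long_word + " indexer"))
```

With this input the 20-character word passes the regex but is dropped by the byte-length check, which matches the behaviour the commit describes. Skipping such words entirely is the simple approach taken in the diff; the commit message itself notes that a better fix would be welcome.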
