Commit 8b4eb73

issue2550653: xapian search, stemming is not working
This is a partial fix for the issue. It makes stemming work, so searching for "silent" also returns documents containing "silently". To do this, the text must be lowercased before it reaches the Porter stemmer, which means capitalization is not preserved. All tests in test/test_indexer for the Xapian backend pass. David Wolever (wolever) did the work.
1 parent b354961 commit 8b4eb73
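
The case sensitivity described in the commit message can be illustrated with a minimal sketch. The `toy_stem` function and its suffix list are hypothetical stand-ins written for this illustration; they are not Xapian's stemmer and not the Porter algorithm. They only mimic the one property that matters here: the suffix rules are written in lowercase.

```python
# Toy stand-in for a Porter-style stemmer. It only knows lowercase suffixes,
# which is the property the commit message relies on. NOT Xapian's stemmer.
SUFFIXES = ("ly", "ing", "ed", "s")


def toy_stem(word):
    """Strip the first matching lowercase suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Compare stemming the raw word with stemming its lowercased form.
for raw in ("silently", "SILENTLY", "Silently"):
    print(raw, "->", toy_stem(raw), "| lowercased first ->", toy_stem(raw.lower()))
```

In this sketch an all-uppercase word passes through unstemmed, and a capitalized word yields a stem that differs in case from the lowercase one. Lowercasing first makes all three spellings collapse to the same term.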

4 files changed, 29 additions and 3 deletions

CHANGES.txt

Lines changed: 8 additions & 1 deletion
@@ -169,7 +169,14 @@ Fixed:
   the same exact nosy list. Fixed a missing reinitialization that has
   to occur every time though the loop in do_set. Manual tests work.
   (John Rouillard)
-
+- issue2550653: xapian search, stemming is not working
+  This is a partial fix for the issue. It does make stemming work
+  (so searching for silent will also return docs with silently in
+  them). However to do this we need to lowercase the text so the
+  porter stemmer will work. This means capitalization is not
+  preserved. Fix done by David Wolever (wolever). Committed and doc
+  updates John Rouillard.
+
 2016-01-11: 1.5.1
 
 Pay attention:

doc/installation.txt

Lines changed: 5 additions & 0 deletions
@@ -67,6 +67,11 @@ Xapian full-text indexer
 
 Roundup requires Xapian 1.0.0 or newer.
 
+Note that capitalization is not preserved by the Xapian search.
+This is required to make the porter stemmer work so that searching
+for silent also returns documents with the word silently. Note that
+the current stemming implementation is designed for English.
+
 Whoosh full-text indexer
 The Whoosh_ full-text indexer is also supported and will be used by
 default if it is available (and Xapian is not installed). This is

doc/upgrading.txt

Lines changed: 14 additions & 0 deletions
@@ -65,6 +65,20 @@ setting this value in the [main] section of the tracker's
 # Possible values: xapian, whoosh, native (internal).
 indexer =
 
+Stemming improved in Xapian Indexer
+-----------------------------------
+
+Stemming allows a search for "silent" also match silently. The Porter
+stemmer in Xapian works with lowercase English text. In this release we
+lowercase the documents as they are put into the indexer.
+
+This means capitalization is not preserved, but produces more hits by
+using the stemmer.
+
+You will need to do a roundup-admin reindex if you are using the
+Xapian full text indexer on your tracker.
+
+
 New config file option 'replyto_address' added
 ----------------------------------------------

roundup/backends/indexer_xapian.py

Lines changed: 2 additions & 2 deletions
@@ -82,7 +82,7 @@ def add_text(self, identifier, text, mime_type='text/plain'):
                 word = match.group(0)
                 if self.is_stopword(word):
                     continue
-                term = stemmer(word)
+                term = stemmer(word.lower())
                 doc.add_posting(term, match.start(0))
 
         database.replace_document(identifier, doc)
@@ -103,7 +103,7 @@ def find(self, wordlist):
         for term in [word.upper() for word in wordlist
                           if self.minlength <= len(word) <= self.maxlength]:
             if not self.is_stopword(term):
-                terms.append(stemmer(term))
+                terms.append(stemmer(term.lower()))
         query = xapian.Query(xapian.Query.OP_AND, terms)
 
         enquire.set_query(query)
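
The two hunks apply the same normalization on the indexing side and on the query side, which is what lets a query term meet the stored term. A rough in-memory sketch of that symmetry follows. `ToyIndex` and `toy_stem` are hypothetical names invented for this illustration; the real code uses a Xapian database and stemmer, and the stopword and word-length checks from the diff are left out here for brevity.

```python
import re
from collections import defaultdict


def toy_stem(word):
    # Hypothetical stand-in for the Xapian stemmer: strips a lowercase "ly" only.
    if word.endswith("ly") and len(word) > 4:
        return word[:-2]
    return word


class ToyIndex:
    """In-memory sketch of the patched add_text/find pair (not Roundup's class)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> identifiers of matching docs

    def add_text(self, identifier, text):
        for match in re.finditer(r"\b\w+\b", text):
            term = toy_stem(match.group(0).lower())  # same call shape as the diff
            self.postings[term].add(identifier)

    def find(self, wordlist):
        # The diff uppercases query words and lowercases them again before
        # stemming; the net effect is that query terms are lowercase.
        terms = [toy_stem(word.upper().lower()) for word in wordlist]
        if not terms:
            return set()
        # AND semantics, like xapian.Query.OP_AND in the diff.
        return set.intersection(*(self.postings.get(t, set()) for t in terms))


index = ToyIndex()
index.add_text("msg1", "The room fell silently quiet")
index.add_text("msg2", "A SILENT alarm")

print(sorted(index.find(["silent"])))
print(sorted(index.find(["Silent", "alarm"])))
print(sorted(index.postings))
```

Every stored term in the sketch is lowercase, so a query cannot distinguish "SILENT" from "silent". That is the loss of capitalization the commit message describes, traded for the extra hits that stemming provides.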
