Skip to content

Commit ca28160

Browse files
committed
Allow Roundup to use PostgreSQL database native full text search
back_postgreql.py - schema version changes for schema version 7. configuration.py - added indexer_language checks for postgresql. Hardcoded list for now. Docs admin_guide and upgrading Tests. This also restructures the version upgrade tests for the rdbms backends. They can run all of them now as the proper cascade is developed to roll back changes to version 6.
1 parent 2c6c932 commit ca28160

File tree

11 files changed

+699
-45
lines changed

11 files changed

+699
-45
lines changed

doc/admin_guide.txt

Lines changed: 167 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -276,6 +276,173 @@ get the gzip version and not a brotli compressed version. This
276276
mechanism allows the admin to allow use of brotli and zstd for
277277
dynamic content, but not for static content.
278278

279+
Configuring native-fts Full Text Search
280+
=======================================
281+
282+
Roundup release 2.2.0 can use database native full text search.
283+
SQLite (minimum version 3.9.0) with FTS5 and PostgreSQL (minimum
284+
version 11.0) with websearch_to_tsvector are supported.
285+
286+
To enable this method, change the ``indexer`` setting in the tracker's
287+
config.ini to ``native-fts``. Then reindex using ``roundup-admin -i
288+
tracker_home reindex``. The amount of time it takes to reindex
289+
depends on the amount of data in your tracker, the speed of your
290+
disks, etc. It can take hours.
291+
292+
SQLite details
293+
--------------
294+
295+
The SQLite native-fts changes the full text search query a little bit.
296+
For the other search methods, the search terms are split on white
297+
space and each item in the index: a field (e.g. title), message
298+
content and file content is searched for all the terms. If any term is
299+
missing that item is ignored. Once the items are found they are mapped
300+
to an issue and the normal issue index is displayed.
301+
302+
When using FTS5, the search terms can use the full text search query
303+
language described at:
304+
https://www.sqlite.org/fts5.html#full_text_query_syntax. This
305+
supports:
306+
307+
* plain word search (joined with and similar to other search methods)
308+
* phrase search with terms enclosed in ``"`` quotes
309+
* proximity search with varying distances using ``NEAR()``
310+
* boolean operations by grouping with parentheses and using ``AND``
311+
and ``OR``
312+
* exclusion using ``NOT``
313+
* prefix searching by prefixing the term with``^``
314+
315+
All of the data that is indexed is in a single column, so when column
316+
specifiers are used they usually result in an error which is detected
317+
and an enhanced error message is produced.
318+
319+
Unlike the native, xapian and whoosh indexers, there are no stopwords,
320+
and there is no limit to the length of terms that are indexed. Keeping
321+
these would break proximity and phrase searching. This may be helpful
322+
or problematic for your particular tracker.
323+
324+
To support the most languages available, the unicode61 tokenizer is
325+
used without porter stemming. Using the ``indexer_language`` setting
326+
to enable stemming for ``english`` is not available in this
327+
implementation. Also ranking information is not used in this
328+
implementation. These are areas for improvement.
329+
330+
PostgreSQL info
331+
---------------
332+
333+
The PostgreSQL native-fts changes the full text search query a little
334+
bit. When using PostgreSQL full text search, two different query
335+
languages are supported.
336+
337+
1. websearch - described at the end of
338+
`Parsing Queries`_ under websearch_to_tsquery. This is the default.
339+
340+
2. tsquery - described at the beginning of `Parsing Queries`_ with
341+
to_tsquery. It is enabled by starting the search phrase with ``ts:``.
342+
343+
.. _Parsing Queries: \
344+
https://www.postgresql.org/docs/14/textsearch-controls.html#TEXTSEARCH-PARSING-QUERIES
345+
346+
Websearch provides a more natural style of search and supports:
347+
348+
* plain word search (stemmed in most cases)
349+
* phrase search with terms enclosed in quotes
350+
* exclusion by prefixing a term/phrase with ``-``
351+
* alternative/or searching with ``or`` between terms
352+
* ignores non-word characters including punctuation
353+
354+
Tsquery supports:
355+
356+
* a strict query syntax
357+
* plain word search
358+
* phrase search with the ``<->`` operator or enclosing the phrase in
359+
``'`` single quotes (note this will use a stemmer on the terms
360+
in the phrase).
361+
* proximity search with varying distances using ``<N>``
362+
* boolean operations by grouping with parentheses and using ``&``
363+
and ``|``
364+
* exclusion using ``!``
365+
* prefix searching using ``:*`` at the end of the prefix
366+
367+
All of the data that is indexed is in a single column and input
368+
weighing is not used.
369+
370+
Depending on the FTS configuration (determined by the
371+
``indexer_language`` setting), stopwords are supported. PostgreSQL
372+
takes the stopwords into account when calculating the data needed for
373+
proximity/near searches. Like SQLite FTS, there is no limit to the
374+
length of terms that are indexed. Again this may be helpful or
375+
problematic for your particular tracker.
376+
377+
The config.ini ``indexer_language`` setting is used to define the
378+
configuration used for indexing. For example with the default
379+
``english`` setting a snowball stemmer (english_stem) is used. So
380+
words like 'drive' and 'driving' and 'drive-in' will all match a
381+
search for 'drive' but will not match 'driver'.
382+
383+
The indexer_language is used as the configuration name for every call
384+
to the text search functions (to_tsvector, to_tsquery). Changing this
385+
requires reindexing.
386+
387+
The `configuration list can be obtained using using psql's`_
388+
``\dF`` command.
389+
390+
.. _configuration list can be obtained using using psql's: \
391+
https://www.postgresql.org/docs/current/textsearch-psql.html
392+
393+
Roundup includes a hardcoded list for all languages supported by
394+
PostgreSQL 14.1. The list includes 5 custom "languages"
395+
``custom1`` ... ``custom5`` to allow you to set up your `own textsearch
396+
configuration`_ using one of the custom names. Depending on your
397+
PostgreSQL version, we may allow an invalid language to be configured.
398+
You will see an error about ``text search configuration ... does not
399+
exist``.
400+
401+
.. _own textsearch configuration: \
402+
https://www.postgresql.org/docs/14/textsearch-configuration.html
403+
404+
It may be possible to append to this list using the tracker's
405+
interfaces.py. For details, see ``test/test_indexer.py`` in the
406+
roundup distribution and search for ``valid_langs``. If you succeed
407+
please email roundup-users AT lists.sourceforge.net with a description
408+
of your success.
409+
410+
After changing the configuration language, you must reindex the
411+
tracker since the index must match the configuration language used for
412+
querying.
413+
414+
Also there are various `dictionaries`_ that allow you to:
415+
416+
* add stopwords
417+
* override stemming for a term
418+
* add synonyms (e.g. a search for "pg" can also match 'psql'
419+
"postgresql")
420+
* add terms that expand/contract the search space (Thesaurus
421+
dictionary)
422+
* additional transforms
423+
424+
.. _dictionaries: https://www.postgresql.org/docs/14/textsearch-dictionaries.html
425+
426+
Use of these is beyond this documentation. Please visit the
427+
appropriate PostgreSQL documents.
428+
429+
Ranking information is not used in this implementation. Also stop
430+
words set in config.ini are ignored. These are areas for improvement.
431+
432+
Cleaning up old native indexes
433+
------------------------------
434+
435+
If you are happy with the database fts indexing, you can save some space by
436+
removing the data from the native text indexing tables. This requires
437+
using the ``sqlite3`` or ``psql`` commands to execute SQL to delete the
438+
rows in the ``__textids`` and ``__words`` tables. You can do this with
439+
the following SQL commands::
440+
441+
delete from __words;
442+
delete from __textids;
443+
444+
Note this deletes data from the tables and does *not* delete the
445+
table.
279446

280447
Users and Security
281448
==================

doc/upgrading.txt

Lines changed: 32 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -94,44 +94,48 @@ want to merge the search form and footer into your template.
9494
Enhanced full-text search (optional)
9595
------------------------------------
9696

97-
SQLite's `FTS5 full-text search engine`_ is available. It should work
98-
with any recent sqlite3 installation. Any full text search field will
99-
allow searching using the MATCH query format described at:
100-
https://www.sqlite.org/fts5.html#full_text_query_syntax. A list of
101-
words behaves almost the same as the default text search
97+
SQLite's `FTS5 full-text search engine`_ is available as is
98+
`PostgreSQL's full text search`_. Both require a schema upgrade so you
99+
should run::
100+
101+
roundup-admin -i tracker_home migrate
102+
103+
to create FTS specific tables before restarting the roundup-web or
104+
email interfaces.
105+
106+
SQLite 3.9.0+ or PostgreSQL 11.0+ are required to use this feature.
107+
When using SQLite, all full text search fields will allow searching
108+
using the MATCH query format described at:
109+
https://www.sqlite.org/fts5.html#full_text_query_syntax. When using
110+
PostgreSQL either the websearch_to_tsquery or to_tsquery formats
111+
described on
112+
https://www.postgresql.org/docs/14/textsearch-controls.html#TEXTSEARCH-PARSING-QUERIES
113+
can be used. The default is websearch. Prefixing the search with
114+
``ts:`` enables tsquery mode.
115+
116+
A list of words behaves almost the same as the default text search
102117
(`native`). So the search string `fts search` will find all issues
103118
that have both of those words (an AND search) in a text-field (like
104119
title) or in a message (or file) attached to the issue.
105120

106-
One thing to note is that the fts search does not ignore words longer
107-
than 25 characters or less than 2 characters. Also it does not filter
108-
out common works (i.e. there is no stoplist). So words like "and",
109-
"or", "then", "with" ... are included in the FTS5 search. The native
110-
search applies both filters.
121+
One thing to note is that native-fts searches do not ignore words
122+
longer than 25 characters or less than 2 characters. Also SQLite does
123+
not filter out common works (i.e. there is no stopword list). So words
124+
like "and", "or", "then", "with" ... are included in the FTS5 search.
111125

112-
Using SQLite FTS requires a schema change so you should run
113-
``roundup-admin -i tracker_home migrate`` as the FTS specific tables
114-
need to be created.
115-
116-
Then you must explicitly enable it by changing the ``indexer`` setting
117-
in ``config.ini`` to ``native-fts``. Native-fts is never chosen by default
118-
like xapian or whoosh. This prevents the existing native indexing from
119-
being discarded if ``indexer`` is not set.
126+
You must explicitly enable this search mechanism by changing the
127+
``indexer`` setting in ``config.ini`` to ``native-fts``. Native-fts is
128+
never chosen by default like xapian or whoosh. This prevents the
129+
existing native indexing from being discarded if ``indexer`` is not
130+
set.
120131

121132
Next re-index your data with ``roundup-admin -i tracker_home
122133
reindex``. This can take a while depending on the size of the tracker.
123134

124135
You may want to update your ``config.ini`` by following the directions
125136
above to get the latest documentation.
126137

127-
If you are happy with the fts indexing, you can save some space by
128-
removing the data from the native text indexing tables. This requires
129-
using the ``sqlite3`` command to delete the rows in the ``__textids`` and
130-
``__words`` tables. You can do this with the following sqlite3
131-
commands::
132-
133-
delete from __words;
134-
delete from __textids;
138+
See the `administration guide notes on native-fts`_ for further details.
135139

136140
Adding error reporting templates (optional)
137141
-------------------------------------------
@@ -3362,4 +3366,6 @@ copy.
33623366
.. _`administration guide`: admin_guide.html
33633367
.. _`xmlrpc guide`: xmlrpc.html
33643368
.. _FTS5 full-text search engine: https://www.sqlite.org/fts5.html
3369+
.. _PostgreSQL's full text search: https://www.postgresql.org/docs/current/textsearch.html
3370+
.. _`administration guide notes on native-fts`: admin_guide.html#configuring-native-fts-full-text-search
33653371
.. _Configuring Compression: admin_guide.html#configuring-compression

roundup/backends/back_postgresql.py

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -200,6 +200,8 @@ def open_connection(self):
200200
# Need to commit here, otherwise otk/session will not find
201201
# the necessary tables (in a parallel connection!)
202202
self.commit()
203+
self._add_fts_table()
204+
self.commit()
203205

204206
def checkpoint_data(self):
205207
"""Commit the state of the database. Allows recovery/retry
@@ -264,6 +266,13 @@ def fix_version_3_tables(self):
264266
self.sql('''CREATE INDEX words_both_idx ON public.__words
265267
USING btree (_word, _textid)''')
266268

269+
def _add_fts_table(self):
270+
self.sql('CREATE TABLE __fts (_class VARCHAR(255), '
271+
'_itemid VARCHAR(255), _prop VARCHAR(255), _tsv tsvector)'
272+
)
273+
274+
self.sql('CREATE INDEX __fts_idx ON __fts USING GIN (_tsv)')
275+
267276
def fix_version_6_tables(self):
268277
# Modify length for __words._word column.
269278
c = self.cursor
@@ -273,6 +282,8 @@ def fix_version_6_tables(self):
273282
# column length and maxlength.
274283
c.execute(sql, (self.indexer.maxlength + 5,))
275284

285+
self._add_fts_table()
286+
276287
def add_actor_column(self):
277288
# update existing tables to have the new actor column
278289
tables = self.database_schema['tables']

0 commit comments

Comments
 (0)