@@ -276,6 +276,173 @@ get the gzip version and not a brotli compressed version. This
276276mechanism allows the admin to allow use of brotli and zstd for
277277dynamic content, but not for static content.
278278
279+ Configuring native-fts Full Text Search
280+ =======================================
281+
282+ Roundup release 2.2.0 can use database native full text search.
283+ SQLite (minimum version 3.9.0) with FTS5 and PostgreSQL (minimum
284+ version 11.0) with websearch_to_tsvector are supported.
285+
286+ To enable this method, change the ``indexer`` setting in the tracker's
287+ config.ini to ``native-fts``. Then reindex using ``roundup-admin -i
288+ tracker_home reindex``. The amount of time it takes to reindex
289+ depends on the amount of data in your tracker, the speed of your
290+ disks, etc. It can take hours.
291+
292+ SQLite details
293+ --------------
294+
295+ The SQLite native-fts changes the full text search query a little bit.
296+ For the other search methods, the search terms are split on white
297+ space and each item in the index: a field (e.g. title), message
298+ content and file content is searched for all the terms. If any term is
299+ missing that item is ignored. Once the items are found they are mapped
300+ to an issue and the normal issue index is displayed.
301+
302+ When using FTS5, the search terms can use the full text search query
303+ language described at:
304+ https://www.sqlite.org/fts5.html#full_text_query_syntax. This
305+ supports:
306+
307+ * plain word search (joined with and similar to other search methods)
308+ * phrase search with terms enclosed in ``"`` quotes
309+ * proximity search with varying distances using ``NEAR()``
310+ * boolean operations by grouping with parentheses and using ``AND``
311+ and ``OR``
312+ * exclusion using ``NOT``
313+ * prefix searching by prefixing the term with``^``
314+
315+ All of the data that is indexed is in a single column, so when column
316+ specifiers are used they usually result in an error which is detected
317+ and an enhanced error message is produced.
318+
319+ Unlike the native, xapian and whoosh indexers, there are no stopwords,
320+ and there is no limit to the length of terms that are indexed. Keeping
321+ these would break proximity and phrase searching. This may be helpful
322+ or problematic for your particular tracker.
323+
324+ To support the most languages available, the unicode61 tokenizer is
325+ used without porter stemming. Using the ``indexer_language`` setting
326+ to enable stemming for ``english`` is not available in this
327+ implementation. Also ranking information is not used in this
328+ implementation. These are areas for improvement.
329+
330+ PostgreSQL info
331+ ---------------
332+
333+ The PostgreSQL native-fts changes the full text search query a little
334+ bit. When using PostgreSQL full text search, two different query
335+ languages are supported.
336+
337+ 1. websearch - described at the end of
338+ `Parsing Queries`_ under websearch_to_tsquery. This is the default.
339+
340+ 2. tsquery - described at the beginning of `Parsing Queries`_ with
341+ to_tsquery. It is enabled by starting the search phrase with ``ts:``.
342+
343+ .. _Parsing Queries: \
344+ https://www.postgresql.org/docs/14/textsearch-controls.html#TEXTSEARCH-PARSING-QUERIES
345+
346+ Websearch provides a more natural style of search and supports:
347+
348+ * plain word search (stemmed in most cases)
349+ * phrase search with terms enclosed in quotes
350+ * exclusion by prefixing a term/phrase with ``-``
351+ * alternative/or searching with ``or`` between terms
352+ * ignores non-word characters including punctuation
353+
354+ Tsquery supports:
355+
356+ * a strict query syntax
357+ * plain word search
358+ * phrase search with the ``<->`` operator or enclosing the phrase in
359+ ``'`` single quotes (note this will use a stemmer on the terms
360+ in the phrase).
361+ * proximity search with varying distances using ``<N>``
362+ * boolean operations by grouping with parentheses and using ``&``
363+ and ``|``
364+ * exclusion using ``!``
365+ * prefix searching using ``:*`` at the end of the prefix
366+
367+ All of the data that is indexed is in a single column and input
368+ weighing is not used.
369+
370+ Depending on the FTS configuration (determined by the
371+ ``indexer_language`` setting), stopwords are supported. PostgreSQL
372+ takes the stopwords into account when calculating the data needed for
373+ proximity/near searches. Like SQLite FTS, there is no limit to the
374+ length of terms that are indexed. Again this may be helpful or
375+ problematic for your particular tracker.
376+
377+ The config.ini ``indexer_language`` setting is used to define the
378+ configuration used for indexing. For example with the default
379+ ``english`` setting a snowball stemmer (english_stem) is used. So
380+ words like 'drive' and 'driving' and 'drive-in' will all match a
381+ search for 'drive' but will not match 'driver'.
382+
383+ The indexer_language is used as the configuration name for every call
384+ to the text search functions (to_tsvector, to_tsquery). Changing this
385+ requires reindexing.
386+
387+ The `configuration list can be obtained using using psql's`_
388+ ``\dF`` command.
389+
390+ .. _configuration list can be obtained using using psql's: \
391+ https://www.postgresql.org/docs/current/textsearch-psql.html
392+
393+ Roundup includes a hardcoded list for all languages supported by
394+ PostgreSQL 14.1. The list includes 5 custom "languages"
395+ ``custom1`` ... ``custom5`` to allow you to set up your `own textsearch
396+ configuration`_ using one of the custom names. Depending on your
397+ PostgreSQL version, we may allow an invalid language to be configured.
398+ You will see an error about ``text search configuration ... does not
399+ exist``.
400+
401+ .. _own textsearch configuration: \
402+ https://www.postgresql.org/docs/14/textsearch-configuration.html
403+
404+ It may be possible to append to this list using the tracker's
405+ interfaces.py. For details, see ``test/test_indexer.py`` in the
406+ roundup distribution and search for ``valid_langs``. If you succeed
407+ please email roundup-users AT lists.sourceforge.net with a description
408+ of your success.
409+
410+ After changing the configuration language, you must reindex the
411+ tracker since the index must match the configuration language used for
412+ querying.
413+
414+ Also there are various `dictionaries`_ that allow you to:
415+
416+ * add stopwords
417+ * override stemming for a term
418+ * add synonyms (e.g. a search for "pg" can also match 'psql'
419+ "postgresql")
420+ * add terms that expand/contract the search space (Thesaurus
421+ dictionary)
422+ * additional transforms
423+
424+ .. _dictionaries: https://www.postgresql.org/docs/14/textsearch-dictionaries.html
425+
426+ Use of these is beyond this documentation. Please visit the
427+ appropriate PostgreSQL documents.
428+
429+ Ranking information is not used in this implementation. Also stop
430+ words set in config.ini are ignored. These are areas for improvement.
431+
432+ Cleaning up old native indexes
433+ ------------------------------
434+
435+ If you are happy with the database fts indexing, you can save some space by
436+ removing the data from the native text indexing tables. This requires
437+ using the ``sqlite3`` or ``psql`` commands to execute SQL to delete the
438+ rows in the ``__textids`` and ``__words`` tables. You can do this with
439+ the following SQL commands::
440+
441+ delete from __words;
442+ delete from __textids;
443+
444+ Note this deletes data from the tables and does *not* delete the
445+ table.
279446
280447Users and Security
281448==================
0 commit comments