---
myst:
  html_meta:
    "description": ""
    "property=og:description": ""
    "property=og:title": ""
    "keywords": ""
---


# Catalog indexing strategies

You may have two different goals when indexing your custom content type objects:

- Making particular fields searchable via Plone's main search facility;
- Indexing particular fields for custom lookup.

## Making content searchable

Plone's main index is called *SearchableText*. This is the index that is searched when you use the main portal search. Fields in your custom content types are not necessarily added to SearchableText: fields added via Dublin Core behaviors are automatically part of SearchableText; others are not.

So you may need to explicitly add fields to SearchableText if you want their content to be findable via the main search. There are many highly customizable ways to do this, but the easiest is to use the `plone.textindexer` behavior that ships with plone.app.dexterity.

This behavior lets you add fields to SearchableText. Once you turn it on, you need to specify which fields should be added to SearchableText.

:::{note}
If you turn on the `Full-Text Indexing` behavior for a content type, you must specify all fields that need SearchableText indexing. Dublin Core fields like Title and Description are no longer handled automatically.
:::

Once you have turned on the indexer behavior, edit the XML field model to add `indexer:searchable="true"` to the `field` tag for each field you wish to add to the SearchableText index.

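
As a rough mental model only, the behavior can be pictured as joining the values of the fields you have flagged as searchable into one block of text. The sketch below is plain Python and not the `plone.textindexer` implementation. The `SCHEMA` flags and the field names are hypothetical. Note how `title` is left out because it is not flagged, which is the situation the note above warns about.

```python
# Toy model (not Plone API): which field values end up in SearchableText
# once the full-text indexing behavior is enabled.
SCHEMA = {
    # field name -> marked with indexer:searchable="true"?
    "title": False,
    "description": False,
    "body": True,
    "speaker": True,
}


def searchable_text(obj: dict, schema: dict) -> str:
    """Join the values of all non-empty fields flagged as searchable."""
    parts = [str(obj[name]) for name, flagged in schema.items()
             if flagged and obj.get(name)]
    return " ".join(parts)


item = {"title": "Keynote", "description": "Opening talk",
        "body": "Welcome to the conference", "speaker": "Jane"}
print(searchable_text(item, SCHEMA))  # Welcome to the conference Jane
```
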

See the [indexing documentation](https://6-dev-docs.plone.org/backend/indexing) for details, and for how to use the behavior with a Python schema.

## Creating and using custom indexes

**How to create custom catalog indexes**

The ZODB is a hierarchical object store where objects of different schemata and sizes can live side by side.
This is great for managing individual content items, but not optimal for searching across the content repository.
A naive search would need to walk the entire object graph, loading each object into memory and comparing object metadata with search criteria.
On a large site, this would quickly become prohibitive.

Luckily, Zope comes with a technology called the *ZCatalog*, which is basically a table structure optimised for searching.
In Plone, there’s a ZCatalog instance called `portal_catalog`.
Standard event handlers index content in the catalog when it is created or modified, and unindex it when it is removed.

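
The toy model below makes the contrast concrete. It is plain Python, and the `Node` and `ToyCatalog` classes are hypothetical, not ZODB or ZCatalog APIs. The naive search visits every object on each query. The catalog-style index is updated whenever an object is indexed or unindexed, and answers a query with a single dictionary lookup.

```python
from collections import defaultdict


class Node:
    """A stand-in for a content object in a hierarchical store."""

    def __init__(self, path, portal_type, children=()):
        self.path = path
        self.portal_type = portal_type
        self.children = list(children)


def walk(root):
    """Yield every node in the tree, depth first."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        stack.extend(node.children)


def naive_search(root, portal_type):
    # Touches (i.e. would load) every object on every search.
    return sorted(n.path for n in walk(root) if n.portal_type == portal_type)


class ToyCatalog:
    """Maps an indexed value to a set of paths, kept current by (un)index calls."""

    def __init__(self):
        self._by_type = defaultdict(set)

    def index_object(self, node):
        self._by_type[node.portal_type].add(node.path)

    def unindex_object(self, node):
        self._by_type[node.portal_type].discard(node.path)

    def search(self, portal_type):
        # One dictionary lookup, independent of the size of the tree.
        return sorted(self._by_type.get(portal_type, ()))


site = Node("/site", "Plone Site", [
    Node("/site/news", "Folder", [Node("/site/news/a", "News Item")]),
    Node("/site/b", "News Item"),
])
catalog = ToyCatalog()
for node in walk(site):  # stands in for the "object created" event handlers
    catalog.index_object(node)

print(naive_search(site, "News Item"))  # ['/site/b', '/site/news/a']
print(catalog.search("News Item"))      # ['/site/b', '/site/news/a']
```
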

The catalog manages *indexes*, which can be searched, and *metadata* (also known as *columns*), which are object attributes whose values are copied into the catalog.
When we perform a search, the result is a lazily loaded list of objects known as *catalog brains*.
Catalog brains contain the values of metadata columns (but not indexes) as attributes.
The methods `getURL()`, `getPath()` and `getObject()` can be used to get the URL and path of the indexed content item, and to load the full item into memory.

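
The toy model below shows the idea behind a brain. Metadata columns are copied onto the result as attributes, and the full object is loaded only when `getObject()` is called. `ToyBrain`, `SITE_URL` and the `load` function are hypothetical. Real brains are produced by the catalog.

```python
SITE_URL = "http://localhost:8080"  # hypothetical site root


class ToyBrain:
    """A lightweight search result: metadata copies plus a path."""

    def __init__(self, path, metadata, loader):
        self._path = path
        self._loader = loader
        for column, value in metadata.items():
            setattr(self, column, value)  # metadata columns become attributes

    def getPath(self):
        return self._path

    def getURL(self):
        return SITE_URL + self._path

    def getObject(self):
        # The expensive step: load the full object from storage.
        return self._loader(self._path)


storage = {"/site/keynote": {"title": "Keynote", "body": "a long body text"}}
loads = []


def load(path):
    loads.append(path)  # record each (expensive) object load
    return storage[path]


brain = ToyBrain("/site/keynote", {"Title": "Keynote"}, load)
print(brain.Title, brain.getURL())  # Keynote http://localhost:8080/site/keynote
print(len(loads))                   # 0: no object loaded so far
obj = brain.getObject()
print(len(loads))                   # 1
```
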

:::{note}
Dexterity objects are more lightweight than Archetypes objects.
This means that loading objects into memory is not quite as undesirable as is sometimes assumed.
If you’re working with references, parent objects, or a small number of child objects, it is usually OK to load objects directly to work with them.
However, if you are working with a large or unknown-but-potentially-large number of objects, you should consider using catalog searches to find them and using catalog metadata to store frequently used values.
There is an important trade-off to be made between limiting object access and bloating the catalog with unneeded indexes and metadata, though.
In particular, large strings (such as the body text of a document) or binary data (such as the contents of image or file fields) should not be stored as catalog metadata.
:::

Plone comes with a number of standard indexes and metadata columns.
These correspond to much of the *Dublin Core* set of metadata as well as several Plone-specific attributes.
You can view the indexes, columns and the contents of the catalog through the ZMI pages of the `portal_catalog` tool.
If you’ve never done this, it is probably instructive to have a look, both to understand how the indexes and columns may apply to your own content types, and to learn what searches are already possible.

Indexes come in various types. The most common ones are:

`FieldIndex`

: the most common type, used to index a single value.

`KeywordIndex`

: used to index lists of values where you want to be able to search for a subset of the values.
  As the name implies, commonly used for keyword fields, such as the `Subject` Dublin Core metadata field.

`DateIndex`

: used to index Zope 2 `DateTime` objects.
  Note that if your type uses a *Python* `datetime` object, you’ll need to convert it to a Zope 2 `DateTime` using a custom indexer!

`DateRangeIndex`

: used mainly for the effective date range.

`ZCTextIndex`

: used mainly for the `SearchableText` index.
  This is the index used for full-text search.

`ExtendedPathIndex`

: a variant of `PathIndex`, which is used for the `path` index.
  This is used to search for content by path and optionally depth.

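
A toy model of how the two most common index types match queries may help. It is plain Python, not the ZCatalog implementation, and the sample objects are hypothetical. A `FieldIndex` maps each object's single value to that object. A `KeywordIndex` maps every item of a list-valued attribute separately, so a query matches objects that have any of the requested keywords.

```python
from collections import defaultdict


def build_field_index(objects, attr):
    """One value per object: value -> set of object ids."""
    index = defaultdict(set)
    for oid, obj in objects.items():
        index[obj[attr]].add(oid)
    return index


def build_keyword_index(objects, attr):
    """A list of values per object: each keyword -> set of object ids."""
    index = defaultdict(set)
    for oid, obj in objects.items():
        for keyword in obj[attr]:
            index[keyword].add(oid)
    return index


def query(index, terms):
    """Match objects that have any of the given terms."""
    if isinstance(terms, str):
        terms = [terms]
    result = set()
    for term in terms:
        result |= index.get(term, set())
    return result


objects = {
    1: {"track": "Track 1", "Subject": ["python", "plone"]},
    2: {"track": "Track 2", "Subject": ["plone"]},
    3: {"track": "Track 1", "Subject": ["zope"]},
}
track = build_field_index(objects, "track")
subject = build_keyword_index(objects, "Subject")
print(sorted(query(track, "Track 1")))              # [1, 3]
print(sorted(query(subject, "plone")))              # [1, 2]
print(sorted(query(subject, ["zope", "python"])))   # [1, 3]
```
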

### Adding new indexes and metadata columns

When an object is indexed, the catalog will by default attempt to find attributes and methods on the object whose names match the index and column names. Methods are called (with no arguments) in an attempt to get a value.
If a value is found, it is indexed.


:::{note}
Objects are normally acquisition-wrapped when they are indexed, which means that an indexed value may be acquired from a parent.
This can be confusing, especially if you are building container types and creating new indexes for them.
If child objects don’t have attributes/methods with names corresponding to indexes, the parent object’s value will be indexed for all children as well.
:::

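
The toy model below shows why this happens. It is plain Python with a hypothetical `Item` class and `acquire` method. Real acquisition is provided by Zope's `Acquisition` package. A child without its own value ends up with its parent's value.

```python
class Item:
    """A content object that can fall back to its container's values."""

    def __init__(self, parent=None, **attrs):
        self.parent = parent
        self.attrs = attrs

    def acquire(self, name):
        # Walk up the containment chain until some object has `name`.
        obj = self
        while obj is not None:
            if name in obj.attrs:
                return obj.attrs[name]
            obj = obj.parent
        raise AttributeError(name)


conference = Item(track="Main track")            # container with a `track`
keynote = Item(parent=conference)                # no `track` of its own
workshop = Item(parent=conference, track="Lab")  # its own value wins

print(keynote.acquire("track"))   # Main track  (acquired from the parent)
print(workshop.acquire("track"))  # Lab
```
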

Catalog indexes and metadata can be installed with the `catalog.xml` GenericSetup import step. It is useful to look at the one in Plone (`parts/omelette/Products/CMFPlone/profiles/default/catalog.xml`).

As an example, let’s index the `track` property of a `Session` in the catalog, and add a metadata column for this property as well. In `profiles/default/catalog.xml`, we have:

```xml
<?xml version="1.0"?>
<object name="portal_catalog">
  <index name="track" meta_type="FieldIndex">
    <indexed_attr value="track"/>
  </index>
  <column value="track"/>
</object>
```

Notice how we specify both the index name and the indexed attribute.
It is possible to use an index name (the key you use when searching) that is different to the indexed attribute, although they are usually the same.
The metadata column is just the name of an attribute.

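
The toy model below illustrates the difference between an index name and its indexed attribute. It is plain Python. `INDEX_CONFIG` mirrors the `name` and `indexed_attr` pair from `catalog.xml`. The second index name, `session_track`, is hypothetical and only shows that the two can differ.

```python
from collections import defaultdict

# index name (used in queries) -> indexed attribute (read from objects)
INDEX_CONFIG = {
    "track": "track",
    "session_track": "track",  # hypothetical alternative index name
}


def build_indexes(objects):
    indexes = {name: defaultdict(set) for name in INDEX_CONFIG}
    for oid, obj in objects.items():
        for name, attr in INDEX_CONFIG.items():
            indexes[name][obj[attr]].add(oid)
    return indexes


sessions = {"s1": {"track": "Track 1"}, "s2": {"track": "Track 2"}}
indexes = build_indexes(sessions)
print(sorted(indexes["session_track"]["Track 1"]))  # ['s1']
print(sorted(indexes["track"]["Track 2"]))          # ['s2']
```
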

### Creating custom indexers

Indexing based on attributes can sometimes be limiting.
First of all, the catalog is indiscriminate in that it attempts to index every attribute that’s listed against an index or metadata column for every object.
Secondly, it is not always feasible to add a method or attribute to a class just to calculate an indexed value.

Plone 3.3 and later ship with a package called [plone.indexer] that makes it easier to write custom indexers:
components that are invoked to calculate the value that the catalog sees when it tries to index a given attribute.
Indexers can be used to index a different value to the one stored on the object, or to allow indexing of a “virtual” attribute that does not actually exist on the object in question.
Indexers are usually registered on a per-type basis, so you can have different implementations for different types of content.

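
The idea of a per-type indexer can be modelled in plain Python as a registry keyed by content type and attribute name. This is only a toy. `register`, `INDEXERS` and `indexed_value` are hypothetical, and in Plone, plone.indexer performs the lookup through named adapter registrations instead. If an indexer is registered, its result is what gets indexed. Otherwise, the plain attribute is used.

```python
class Program:
    def __init__(self, tracks):
        self.tracks = tracks


class Page:
    def __init__(self, subject):
        self.Subject = subject


# (content type, indexed attribute name) -> indexer function
INDEXERS = {}


def register(type_, name):
    """Hypothetical decorator that records an indexer for one type."""
    def decorator(func):
        INDEXERS[(type_, name)] = func
        return func
    return decorator


@register(Program, "Subject")
def program_subject(obj):
    # A "virtual" attribute: Program has no Subject attribute at all.
    return obj.tracks


def indexed_value(obj, name):
    indexer = INDEXERS.get((type(obj), name))
    if indexer is not None:
        return indexer(obj)                 # the indexer decides the value
    return getattr(obj, name, None)         # fall back to the attribute


print(indexed_value(Program(["Track 1", "Track 2"]), "Subject"))  # ['Track 1', 'Track 2']
print(indexed_value(Page(["news"]), "Subject"))                    # ['news']
```
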

To illustrate indexers, we will add three indexers to `program.py`.
Two will provide values for the `start` and `end` indexes, normally used by Plone’s `Event` type.
We actually have attributes with the correct names for these already, but they use Python `datetime` objects, whereas the `DateIndex` requires a Zope 2 `DateTime.DateTime` object.
(Python didn’t have a `datetime` module when this part of Zope was created!)
The third indexer will be used to provide a value for the `Subject` index that takes its value from the `tracks` list.

```python
from DateTime import DateTime
from plone.indexer import indexer
...

@indexer(IProgram)
def startIndexer(obj):
    # DateIndex expects a Zope DateTime, so convert the Python datetime.
    if obj.start is None:
        return None
    return DateTime(obj.start.isoformat())

@indexer(IProgram)
def endIndexer(obj):
    if obj.end is None:
        return None
    return DateTime(obj.end.isoformat())

@indexer(IProgram)
def tracksIndexer(obj):
    # Index the list of track names as the Subject keywords.
    return obj.tracks
```

And we need to register the indexers in ZCML:

```xml
<adapter factory=".program.startIndexer" name="start" />
<adapter factory=".program.endIndexer" name="end" />
<adapter factory=".program.tracksIndexer" name="Subject" />
```

Here, we use the `@indexer` decorator to create an indexer.
This doesn’t register the indexer component, though, so we need to use ZCML to finalise the registration.
Crucially, this is where the indexer’s `name` is defined.
This is the name of the indexed attribute for which the indexer is providing a value.


:::{note}
Since all of these indexes are part of a standard Plone installation, we won’t register them in `catalog.xml`.
If you are creating custom indexers and need to add new catalog indexes or columns for them, remember that the “indexed attribute” name (and the column name) must match the name of the indexer as set in its adapter registration.
:::

### Searching using your indexes

Once we have registered our indexers and re-installed our product (to ensure that the `catalog.xml` import step is allowed to install new indexes in the catalog), we can use our new indexes just like any of the default indexes.

The pattern is always the same:

```python
from plone import api

# Get the catalog tool.
catalog = api.portal.get_tool(name='portal_catalog')
# Execute a search.
results = catalog(track='Track 1')
# Examine the results.
for brain in results:
    start = brain.start  # read from a metadata column, no object load
    url = brain.getURL()
    obj = brain.getObject()  # Performance hit: loads the full object!
```

This shows a simple search using the `portal_catalog` tool, which we look up with `plone.api`.
We call the tool to perform a search, passing search criteria as keyword arguments: each keyword names an installed index, and its value is the search term.

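
The toy catalog below gives a rough illustration of the keyword-argument pattern. It is plain Python with hypothetical data, not the `portal_catalog` API. It treats each keyword as an index name and requires an object to match every criterion given.

```python
class ToyCatalog:
    def __init__(self, records):
        self.records = records  # path -> {index name: indexed value}

    def __call__(self, **query):
        # Every keyword argument must match for a record to be returned.
        return sorted(
            path for path, values in self.records.items()
            if all(values.get(index) == term for index, term in query.items())
        )


catalog = ToyCatalog({
    "/site/talk-1": {"portal_type": "Session", "track": "Track 1"},
    "/site/talk-2": {"portal_type": "Session", "track": "Track 2"},
    "/site/news-1": {"portal_type": "News Item", "track": None},
})
print(catalog(track="Track 1"))                          # ['/site/talk-1']
print(catalog(portal_type="Session"))                    # ['/site/talk-1', '/site/talk-2']
print(catalog(portal_type="Session", track="Track 2"))   # ['/site/talk-2']
```
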

Some of the more commonly used indexes are:

`Title`

: the object’s title.

`Description`

: the object’s description.

`path`

: the object’s path. The argument is a string like `/foo/bar`.
  To get the path of an object (e.g. a parent folder), use
  `'/'.join(folder.getPhysicalPath())`.
  Searching for an object’s path will return the object and everything inside it.
  To limit the search depth, e.g. to get only objects one level deep,
  use a compound query, e.g.
  `path={'query': '/'.join(folder.getPhysicalPath()), 'depth': 1}`.
  If a depth greater than zero is specified, the object at the given path itself is not returned
  (but any children within the depth limit are).

`object_provides`

: used to match interfaces provided by the object.
  The argument is an interface name or list of interface names (of
  which any one may match).
  To get the name of a given interface, you can use
  `ISomeInterface.__identifier__`.

`portal_type`

: used to match the portal type.
  Note that users can rename portal types,
  so it is often better not to hardcode these.
  Often, using an `object_provides` search for a type-specific
  interface will be better.
  Conversely, if you are asking the user to select a particular type to
  search for, then they should be choosing from the currently installed
  `portal_types`.

`SearchableText`

: used for full-text searches.
  This supports operators like `AND` and `OR` in the search string.

`Creator`

: the username of the creator of a content item.

`Subject`

: a `KeywordIndex` of object keywords.

`review_state`

: an object’s workflow state.

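
The path and depth behavior described for the `path` index can be modelled in plain Python. `path_query` is a hypothetical helper, not the `ExtendedPathIndex` implementation. It covers only the two cases described: no depth (the object and everything inside it) and a positive depth (descendants only, up to that many levels down). The tuple passed to `'/'.join()` stands in for what `getPhysicalPath()` returns.

```python
def path_query(paths, query, depth=None):
    """Return the indexed paths matching `query`, optionally depth-limited."""
    base = query.rstrip("/").split("/")
    matches = []
    for path in paths:
        parts = path.rstrip("/").split("/")
        if parts[:len(base)] != base:
            continue                    # not at or below the queried path
        level = len(parts) - len(base)  # 0 means the object itself
        if depth is None or 0 < level <= depth:
            matches.append(path)
    return sorted(matches)


folder_path = "/".join(("", "site", "folder"))  # like getPhysicalPath()
indexed = ["/site/folder", "/site/folder/a", "/site/folder/a/b", "/site/other"]
print(path_query(indexed, folder_path))
# ['/site/folder', '/site/folder/a', '/site/folder/a/b']
print(path_query(indexed, folder_path, depth=1))
# ['/site/folder/a']
```
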

In addition, the search results can be sorted based on any `FieldIndex`,
`KeywordIndex` or `DateIndex` using the following keyword arguments:

- Use `sort_on='<index name>'` to sort on a particular index.
  For example, `sort_on='sortable_title'` will produce a sensible title-based sort.
  `sort_on='Date'` will sort on the publication date, or the creation date if this is not set.
- Add `sort_order='reverse'` to sort in reverse.
  The default is `sort_order='ascending'`.
  `'descending'` can be used as an alias for `'reverse'`.
- Add `sort_limit=10` to limit the search to approximately 10 results.
  Note that it is possible to get more results due to index optimisations.
  To be sure that you get no more than the maximum number of results, use a list slice on the catalog search results, e.g.
  `results = catalog(…, sort_limit=10)[:10]`.
  Also note that the use of `sort_limit` requires a `sort_on` as well.

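
The sorting options can be emulated on a plain list to see how they combine. This is a toy in plain Python with hypothetical records, not the catalog's implementation. Unlike the real `sort_limit`, the cut-off here is exact, which is the guarantee that the final list slice in the advice above provides.

```python
def sorted_results(records, sort_on, sort_order="ascending", sort_limit=None):
    # sort_on is a required argument here, mirroring the rule that
    # sort_limit is only meaningful together with sort_on.
    reverse = sort_order in ("reverse", "descending")
    results = sorted(records, key=lambda r: r[sort_on], reverse=reverse)
    if sort_limit is not None:
        results = results[:sort_limit]  # exact limit, like the list slice
    return results


talks = [
    {"sortable_title": "zope basics", "id": "t3"},
    {"sortable_title": "catalog tips", "id": "t1"},
    {"sortable_title": "plone intro", "id": "t2"},
]
print([r["id"] for r in sorted_results(talks, "sortable_title")])
# ['t1', 't2', 't3']
print([r["id"] for r in sorted_results(talks, "sortable_title", "reverse", 2)])
# ['t3', 't2']
```
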

Some of the more commonly used metadata columns are:

`Creator`

: the user who created the content object.

`Date`

: the publication date, or the creation date if no publication date is set.

`Title`

: the object’s title.

`Description`

: the object’s description.

`getId`

: the object’s id (note that this is an attribute, not a function).

`review_state`

: the object’s workflow state.

`portal_type`

: the object’s portal type.

For more information about catalog indexes and searching, see the
[ZCatalog chapter in the Zope 2 book].


#### How to set up the index TTW

Once your fields are indexable, the index itself has to exist in the catalog. Instead of using `catalog.xml`, you can also create it through the web (TTW):

- Go to the Zope Management Interface (ZMI).
- Open `portal_catalog`.
- Click the *Indexes* tab.
- Use the drop-down menu at the top right to choose the type of index to add. If you are indexing a plain text string field, select *FieldIndex*.
- As the *id*, enter the programmatic name of the Dexterity type field that you want to index.
- Click *OK*, tick your new index, and click *Reindex*.

You should now see content being indexed.

See the {doc}`documentation </develop/plone/searching_and_indexing/indexing>` for further information.

[plone.indexer]: http://pypi.python.org/pypi/plone.indexer
[zcatalog chapter in the zope 2 book]: https://zope.readthedocs.io/en/latest/zopebook/SearchingZCatalog.html