High-traffic sites that depend heavily on search can run into server resource issues, which leads to a sluggish experience for users. A good solution is to let a third-party service handle search for your site. We recently had a client that needed just that, and ElasticSearch was a good fit. We used 10up’s ElasticPress plugin to integrate it with WordPress.
ElasticSearch provides a wide variety of filters and tokenizers that will fit nearly every project. The ones covered below helped us resolve search issues during the project.
Searching with special characters
ElasticSearch analyzers use a tokenizer to split text into search tokens. The problem is that the standard tokenizer doesn’t generate tokens for punctuation like ampersands. Using the whitespace tokenizer instead splits text on whitespace only and preserves punctuation. To make the change, update the ElasticSearch mapping using ElasticPress filters and a WP-CLI command. The default mapping can be found in includes/mappings.php, and the ElasticPress ep_config_mapping_file filter lets you return the location of an updated mapping file. This fixes some special characters, but others, such as hyphens and periods, have to be added to a word_delimiter filter:
'wds_es_word_delimiter' => array(
	'type'              => 'word_delimiter',
	'preserve_original' => true,
	'type_table'        => array(
		'.' => 'ALPHA',
		'-' => 'ALPHA',
		'#' => 'ALPHA',
	),
),
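As a rough sketch of how these pieces might fit together, the snippet below points ElasticPress at a custom mapping file and defines an analyzer that uses the whitespace tokenizer. The file name wds-mappings.php, the analyzer name wds_es_analyzer, and both function names are assumptions made for illustration; only the ep_config_mapping_file filter and the wds_es_word_delimiter filter name come from the text above.

```php
<?php
/**
 * Return the location of a custom mapping file, falling back to the
 * ElasticPress default when the custom file is missing.
 * The file name is hypothetical.
 */
function wds_es_mapping_file( $default_file ) {
	$custom_file = __DIR__ . '/wds-mappings.php';

	return file_exists( $custom_file ) ? $custom_file : $default_file;
}

/**
 * Analyzer definition intended for the 'analyzer' section of the custom
 * mapping file. The analyzer name is hypothetical.
 */
function wds_es_analyzer_settings() {
	return array(
		'wds_es_analyzer' => array(
			'type'      => 'custom',
			// Split on whitespace only, so punctuation such as "&" survives.
			'tokenizer' => 'whitespace',
			'filter'    => array( 'wds_es_word_delimiter', 'lowercase' ),
		),
	);
}

// Register the hook only when running inside WordPress.
if ( function_exists( 'add_filter' ) ) {
	add_filter( 'ep_config_mapping_file', 'wds_es_mapping_file' );
}
```

After the mapping changes, the index has to be rebuilt before the new analyzer takes effect. In the ElasticPress versions we are aware of, a WP-CLI command such as wp elasticpress index --setup does this, but check the plugin documentation for your version.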
Narrowing search results
Title-based searching needed to be narrower, because the site offered a variety of other searches and title-based searches were returning too many results. The default ElasticPress minimum similarity value can be altered via the ep_min_similarity filter. ElasticSearch scores similarity by how closely a string matches the search term, so raising this value drops results whose similarity is too low. The ngram filter can also be removed from the ES mapping to disable partial-word matching.
For search strings like “NYC” to match “New York City”, a synonym filter needs to be used. It lets ES treat these strings as equivalent during tokenization and search. A filter can be added to the mapping and to your custom analyzer for cases such as these.
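A minimal sketch of raising the minimum similarity through the ep_min_similarity filter is shown below. The function name and the 0.9 threshold are assumptions for illustration, and the filter only applies in ElasticPress versions that still build their query around a minimum similarity value.

```php
<?php
/**
 * Raise the minimum similarity so loosely matching results are dropped.
 * The 0.9 threshold is a hypothetical value; tune it for your content.
 */
function wds_es_min_similarity( $similarity ) {
	// Never lower a stricter value that is already in place.
	return max( (float) $similarity, 0.9 );
}

// Register the hook only when running inside WordPress.
if ( function_exists( 'add_filter' ) ) {
	add_filter( 'ep_min_similarity', 'wds_es_min_similarity' );
}
```

A higher value trades recall for precision, so it is worth testing against real search terms before settling on a number.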
'wds_es_synonym_filter' => array(
	'type'     => 'synonym',
	'synonyms' => array(
		'nyc, ny, new york city, new york',
		'half, 1/2',
		'quarter, 1/4',
		'\'n, \'n\', n, and, &',
	),
),
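One way to wire a synonym filter into an analyzer without maintaining a full mapping file is to adjust the mapping array in code. The sketch below assumes ElasticPress passes the mapping through an ep_config_mapping filter and that the analyzer to change is named default; both should be checked against your ElasticPress version. The function name is hypothetical and the synonym list is shortened.

```php
<?php
/**
 * Add a synonym filter to the mapping and place it in an analyzer's
 * filter chain. The analyzer name 'default' is an assumption.
 */
function wds_es_add_synonyms( array $mapping ) {
	$name = 'wds_es_synonym_filter';

	$mapping['settings']['analysis']['filter'][ $name ] = array(
		'type'     => 'synonym',
		'synonyms' => array( 'nyc, ny, new york city, new york' ),
	);

	$filters = isset( $mapping['settings']['analysis']['analyzer']['default']['filter'] )
		? $mapping['settings']['analysis']['analyzer']['default']['filter']
		: array();

	if ( ! in_array( $name, $filters, true ) ) {
		// Place synonyms right after lowercasing so they match lowercase
		// tokens; append to the end when there is no lowercase filter.
		$position = array_search( 'lowercase', $filters, true );
		if ( false === $position ) {
			$filters[] = $name;
		} else {
			array_splice( $filters, $position + 1, 0, array( $name ) );
		}
	}

	$mapping['settings']['analysis']['analyzer']['default']['filter'] = $filters;

	return $mapping;
}

// Register the hook only when running inside WordPress.
if ( function_exists( 'add_filter' ) ) {
	add_filter( 'ep_config_mapping', 'wds_es_add_synonyms' );
}
```

The position of the synonym filter in the chain matters: synonyms are matched against whatever tokens the earlier filters produce, so the list should be written in the same case and form as those tokens.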
Duplicate results during pagination
When paginating results with a custom sort across multiple shards, the same result can sometimes be returned more than once, because different shard copies may order results with identical sort values differently. Adding a preference query string parameter to the request, with a value that is unique to the visitor but stays the same from page to page, ensures the same shards are used. ElasticPress has an ep_search_request_path filter that allows the preference parameter to be added.
ElasticSearch is an excellent solution for sites that depend heavily on search. It’s capable of scaling with your site and is much more efficient than regular WordPress search. ElasticPress can integrate ElasticSearch into your WordPress site seamlessly and has a bunch of helpful hooks, filters, and functions that make tweaking search easy.
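The preference parameter could be appended along these lines. This is a sketch only: both function names are hypothetical, and hashing the visitor's IP address is just one assumed way to get a value that stays stable across page requests; a session identifier would work equally well.

```php
<?php
/**
 * Append a preference parameter to an ElasticSearch request path.
 */
function wds_es_append_preference( $path, $preference ) {
	// Use "&" when the path already carries a query string.
	$separator = ( false === strpos( $path, '?' ) ) ? '?' : '&';

	return $path . $separator . 'preference=' . rawurlencode( $preference );
}

/**
 * Filter callback: derive a per-visitor value (assumption: hashed IP
 * address) so every page of results is served by the same shard copies.
 */
function wds_es_search_request_path( $path ) {
	$visitor = isset( $_SERVER['REMOTE_ADDR'] ) ? $_SERVER['REMOTE_ADDR'] : 'wds-default';

	return wds_es_append_preference( $path, md5( $visitor ) );
}

// Register the hook only when running inside WordPress.
if ( function_exists( 'add_filter' ) ) {
	add_filter( 'ep_search_request_path', 'wds_es_search_request_path' );
}
```

The value itself is arbitrary; what matters is that it does not change between page requests for the same visitor and does not start with an underscore, which ElasticSearch reserves for its built-in preference values.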