Emulate a SQL LIKE search with Elasticsearch
I'm new to Elasticsearch and trying to implement an autocomplete feature based on it. I have an autocomplete index with a field city of type string. Here's an example of a document stored in the index:
{
  "_index": "autocomplete_1435797593949",
  "_type": "listing",
  "_id": "40716",
  "_source": {
    "city": "rome",
    "tags": [ "listings" ]
  }
}
The analysis configuration looks like this:
{
  "analyzer": {
    "autocomplete_term": {
      "tokenizer": "autocomplete_edge",
      "filter": [ "lowercase" ]
    },
    "autocomplete_search": {
      "tokenizer": "keyword",
      "filter": [ "lowercase" ]
    }
  },
  "tokenizer": {
    "autocomplete_edge": {
      "type": "ngram",
      "min_gram": 1,
      "max_gram": 100
    }
  }
}
the mappings:
{
  "autocomplete_1435795884170": {
    "mappings": {
      "listing": {
        "properties": {
          "city": {
            "type": "string",
            "analyzer": "autocomplete_term"
          }
        }
      }
    }
  }
}
I'm sending the following query to ES:
{
  "query": {
    "multi_match": {
      "query": "rio",
      "analyzer": "autocomplete_search",
      "fields": [ "city" ]
    }
  }
}
As a result, I get the following:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 2.7742395,
    "hits": [
      {
        "_index": "autocomplete_1435795884170",
        "_type": "listing",
        "_id": "53581",
        "_score": 2.7742395,
        "_source": {
          "city": "rio",
          "tags": [ "listings" ]
        }
      }
    ]
  }
}
For this part, it works: it finds the document with city = "rio" before the user has typed the whole word ("ri" is enough).
And here lies the problem. I want it to return "rio de janeiro", too. To get "rio de janeiro", I need to send the following query:
{
  "query": {
    "multi_match": {
      "query": "rio d",
      "analyzer": "standard",
      "fields": [ "city" ]
    }
  }
}
Notice the "<whitespace>d" there.
Another related problem is that I'd expect at least all cities starting with "r" to be returned by the following query:
{
  "query": {
    "multi_match": {
      "query": "r",
      "analyzer": "standard",
      "fields": [ "city" ]
    }
  }
}
I'd expect "rome", etc. (a document which exists in the index); however, I only get "rio" again. I want it to behave like a SQL LIKE condition, i.e. LIKE 'cityname%'.
What am I doing wrong?
I would do this:
- Change your tokenizer to edge_ngram, since you said you need something like LIKE 'cityname%' (meaning a prefix match):
"tokenizer": {
  "autocomplete_edge": {
    "type": "edge_ngram",
    "min_gram": 1,
    "max_gram": 100
  }
}
- Have your field specify autocomplete_search as its search_analyzer. I think it's a good choice to have keyword together with lowercase there:
"mappings": {
  "listing": {
    "properties": {
      "city": {
        "type": "string",
        "index_analyzer": "autocomplete_term",
        "search_analyzer": "autocomplete_search"
      }
    }
  }
}
- And the query can be as simple as:
{
  "query": {
    "multi_match": {
      "query": "r",
      "fields": [ "city" ]
    }
  }
}
The detailed explanation goes like this: you split the city names into edge ngrams. For example, for rio de janeiro you'll index something like:
"city": [ "r", "ri", "rio", "rio ", "rio d", "rio de", "rio de ", "rio de j", "rio de ja", "rio de jan", "rio de jane", "rio de janei", "rio de janeir", "rio de janeiro" ]
You'll notice they are all lowercased. Now, you'd want the query to take the input text (lowercased or not) and match it against what's in the index. So, r should match the list above.
For this to happen, you want the input text to be lowercased but otherwise kept as the user typed it, meaning it shouldn't be analyzed any further. Why would you want this? Because you have already split the city names into ngrams and you don't want the same done to the input text. If the user inputs "RI", Elasticsearch will lowercase it - to ri - and match it against what it has in the index.
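To make the mechanics concrete, here is a small Python sketch (not Elasticsearch itself, just an emulation of the edge_ngram indexing plus keyword/lowercase searching described above) showing why "r" and "rio d" behave as prefix matches; the three city names are taken from the question:

```python
def edge_ngrams(text, min_gram=1, max_gram=100):
    # Emulates the edge_ngram tokenizer: every prefix of the lowercased
    # input whose length lies between min_gram and max_gram.
    text = text.lower()
    return [text[:n] for n in range(min_gram, min(max_gram, len(text)) + 1)]

# Index time: each city is stored as its set of edge ngrams.
index = {city: set(edge_ngrams(city)) for city in ["Rio", "Rio de Janeiro", "Rome"]}

def autocomplete(user_input):
    # Search time: keyword tokenizer + lowercase filter means the input is
    # lowercased as a whole and compared verbatim against the stored ngrams.
    term = user_input.lower()
    return sorted(city for city, ngrams in index.items() if term in ngrams)

print(autocomplete("r"))      # → ['Rio', 'Rio de Janeiro', 'Rome']
print(autocomplete("rio d"))  # → ['Rio de Janeiro']
```

Note that "rio d" only matches "Rio de Janeiro", because "Rio" has no ngram longer than three characters: exactly the LIKE 'cityname%' behavior the question asks for.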
A faster alternative to multi_match is to use term, but it requires your application/website to lowercase the text itself. The reason is that term doesn't analyze the input text at all:
{
  "query": {
    "filtered": {
      "filter": {
        "term": {
          "city": {
            "value": "ri"
          }
        }
      }
    }
  }
}
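As a sketch of that client-side step (a hypothetical helper, assuming the filtered/term query shape above), the application would lowercase the input itself before building the query body:

```python
def term_autocomplete_query(user_input):
    # term does not analyze its input, so the client must lowercase it
    # to match the lowercased edge ngrams stored in the index.
    return {
        "query": {
            "filtered": {
                "filter": {
                    "term": {"city": {"value": user_input.lower()}}
                }
            }
        }
    }

term_autocomplete_query("Ri")  # builds the query above with value "ri"
```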