Based on Elasticsearch 7.6.1 and Kibana 7.6.1.
This article explains each strategy through worked examples, so please read through them patiently.
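I. Introduction
- Field-centric queries treat the field as the unit of matching; best_fields and most_fields are the representatives. The whole query string is run against every field separately, and the per-field results are then combined. As a simple example, a home address is usually not stored as a single string such as “湖北省武汉市东湖高新区”. It is split into province, city, and district and stored in three fields: "provice" (spelled this way in the sample data), "city", and "area". Searching for “湖北省 武汉市 东湖高新区” then returns every document in which some field contains “湖北省”, “武汉市”, or “东湖高新区”, which brings back a large amount of redundant, irrelevant data.
- Term-centric queries treat the term as the unit of matching; cross_fields is the representative. It addresses the problem above by searching term by term and looking for each term across several fields.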
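II. best_fields and most_fields multi-field query strategies (field-centric queries)
1. The best_fields multi-field query strategy
best_fields uses the score of the best-matching field: that field's score becomes the document's final score.
A. Example
Search for employees by address (“湖北省” or “开封市”) and by content (“elasticsearch” or “beginner”).
a. Build the sample data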
POST /multi_search_staff/_bulk
{"index":{"_id":1}}
{"empId":"1","name":"员工001","age":20,"sex":"男","mobile":"","salary":23343,"deptName":"技术部","address":"湖北省武汉市洪山区光谷大厦","content":"i like to write best elasticsearch article"}
{"index":{"_id":2}}
{"empId":"2","name":"员工002","age":25,"sex":"男","mobile":"","salary":15963,"deptName":"销售部","address":"湖北省武汉市江汉路","content":"i think java is the best programming language"}
{"index":{"_id":3}}
{"empId":"3","name":"员工003","age":30,"sex":"男","mobile":"","salary":20000,"deptName":"技术部","address":"湖北省武汉市经济开发区","content":"i am only an elasticsearch beginner"}
{"index":{"_id":4}}
{"empId":"4","name":"员工004","age":20,"sex":"女","mobile":"","salary":15600,"deptName":"销售部","address":"湖北省武汉市沌口开发区","content":"elasticsearch and hadoop are all very good solution, i am a beginner"}
{"index":{"_id":5}}
{"empId":"5","name":"员工005","age":20,"sex":"男","mobile":"","salary":19665,"deptName":"测试部","address":"湖北省武汉市东湖隧道","content":"spark is best big data solution based on scala, an programming language similar to java"}
{"index":{"_id":6}}
{"empId":"6","name":"员工006","age":30,"sex":"女","mobile":"","salary":30000,"deptName":"技术部","address":"湖北省武汉市江汉路","content":"i like java developer"}
{"index":{"_id":7}}
{"empId":"7","name":"员工007","age":60,"sex":"女","mobile":"","salary":52130,"deptName":"测试部","address":"湖北省黄冈市边城区","content":"i like elasticsearch developer"}
{"index":{"_id":8}}
{"empId":"8","name":"员工008","age":19,"sex":"女","mobile":"","salary":60000,"deptName":"技术部","address":"湖北省武汉市江汉大学","content":"i like spark language"}
{"index":{"_id":9}}
{"empId":"9","name":"员工009","age":40,"sex":"男","mobile":"","salary":23000,"deptName":"销售部","address":"河南省郑州市郑州大学","content":"i like java developer"}
{"index":{"_id":10}}
{"empId":"10","name":"张湖北","age":35,"sex":"男","mobile":"","salary":18000,"deptName":"测试部","address":"湖北省武汉市东湖高新","content":"i like java developer, i also like elasticsearch"}
{"index":{"_id":11}}
{"empId":"11","name":"王河南","age":61,"sex":"男","mobile":"","salary":10000,"deptName":"销售部","address":"河南省开封市河南大学","content":"i am not like java"}
{"index":{"_id":12}}
{"empId":"12","name":"张大学","age":26,"sex":"女","mobile":"","salary":11321,"deptName":"测试部","address":"河南省开封市河南大学","content":"i am java developer, java is good"}
{"index":{"_id":13}}
{"empId":"13","name":"李江汉","age":36,"sex":"男","mobile":"","salary":11215,"deptName":"销售部","address":"河南省郑州市二七区","content":"i like java and java is very best, i like it, do you like java"}
{"index":{"_id":14}}
{"empId":"14","name":"王技术","age":45,"sex":"女","mobile":"","salary":16222,"deptName":"测试部","address":"河南省郑州市金水区","content":"i like c++"}
{"index":{"_id":15}}
{"empId":"15","name":"张测试","age":18,"sex":"男","mobile":"","salary":20000,"deptName":"技术部","address":"河南省郑州市高新开发区","content":"i think spark is good"}
b. Build the query
GET /multi_search_staff/_search
{
  "query": {
    "multi_match": {
      "query": "elasticsearch beginner 湖北省 开封市",
      "type": "best_fields",
      "fields": [ "content", "address" ]
    }
  },
  "size": 15
}

# View the score explanation for document 3
GET /multi_search_staff/_explain/3
{
  "query": {
    "multi_match": {
      "query": "elasticsearch beginner 湖北省 开封市",
      "type": "best_fields",
      "fields": [ "content", "address" ]
    }
  }
}
Score explanation for name=员工003:
# address contributes 2.
# content contributes 3.
# final_score = max(2., 3.) = 3.
c. Build the query (demonstrating the tie_breaker parameter)
GET /multi_search_staff/_search
{
  "query": {
    "multi_match": {
      "query": "elasticsearch beginner 湖北省 开封市",
      "type": "best_fields",
      "fields": [ "address", "content" ],
      "tie_breaker": 0.1
    }
  }
}

# View the score explanation for document 3
GET /multi_search_staff/_explain/3
{
  "query": {
    "multi_match": {
      "query": "elasticsearch beginner 湖北省 开封市",
      "type": "best_fields",
      "fields": [ "address", "content" ],
      "tie_breaker": 0.1
    }
  }
}
# The tie_breaker parameter lets the other matching fields take part in the final_score calculation.
# This query uses best_fields with tie_breaker=0.1.
# If other fields also match, final_score = best-matching field score + (scores of the other matching fields) * 0.1
Score explanation for name=员工003:
# address contributes 2.
# content contributes 3.
# final_score = 3. + 2. * 0.1 = 3.
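The scoring rule described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Elasticsearch code: the function name dis_max_score is made up for this example, and the field scores 2.0 and 3.0 are illustrative values, not numbers copied from a real _explain response.

```python
def dis_max_score(field_scores, tie_breaker=0.0):
    """Combine per-field scores: best score plus tie_breaker times the rest.

    field_scores holds the scores of the fields that matched; fields that
    did not match are simply left out.
    """
    field_scores = list(field_scores)
    if not field_scores:
        return 0.0
    best = max(field_scores)
    others = sum(field_scores) - best
    return best + tie_breaker * others


# Illustrative scores only (hypothetical, not real _explain output).
scores = {"address": 2.0, "content": 3.0}

print(dis_max_score(scores.values()))                   # best field only
print(dis_max_score(scores.values(), tie_breaker=0.1))  # best + 0.1 * others
```

With tie_breaker=0 the result is the plain maximum; with tie_breaker=1 every matching field would count in full, which is the same sum that most_fields produces.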
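d. Build the query (demonstrating the minimum_should_match parameter)
GET /multi_search_staff/_search
{
  "query": {
    "multi_match": {
      "query": "elasticsearch beginner 湖北省 开封市",
      "type": "best_fields",
      "fields": [ "address", "content" ],
      "tie_breaker": 0.1,
      "minimum_should_match": 5
    }
  },
  "size": 15
}
# minimum_should_match raises the matching threshold and so improves precision.
# No analyzer or search_analyzer is specified, so both indexing and searching use the default standard analyzer.
# The query string “elasticsearch beginner 湖北省 开封市” is split into “elasticsearch”, “beginner”, “湖”, “北”, “省”, “开”, “封”, “市”, so the multi_match query produces 8 should clauses for each field.
# minimum_should_match=5 means at least 5 of the 8 should clauses must match. The threshold applies to each field separately: address needs 5 matching terms, and so does content. In this data set only the address field can reach 5.
# Therefore only the employees with id=3 and id=4 satisfy the query (“湖”, “北”, “省”, “市”, “开” all match their address field).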
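The per-field counting can be checked with a small Python sketch. It assumes a simplified stand-in for the standard analyzer (lowercase Latin words kept whole, each CJK character as its own token); the real analyzer does more than this. The helper names analyze, matched_terms and docs_matching are hypothetical, and only four of the fifteen sample documents are included.

```python
import re

# Simplified stand-in for the standard analyzer (an assumption of this sketch):
# Latin letters/digits form one token, every CJK character is its own token.
TOKEN_RE = re.compile(r"[a-z0-9]+|[\u4e00-\u9fff]")


def analyze(text):
    return TOKEN_RE.findall(text.lower())


def matched_terms(query, field_text):
    """Count how many distinct query terms occur in one field."""
    field_tokens = set(analyze(field_text))
    return sum(1 for term in dict.fromkeys(analyze(query)) if term in field_tokens)


def docs_matching(query, docs, fields, minimum_should_match):
    """A document matches if at least one field reaches the threshold."""
    return [
        doc_id
        for doc_id, doc in docs.items()
        if any(matched_terms(query, doc[f]) >= minimum_should_match for f in fields)
    ]


QUERY = "elasticsearch beginner 湖北省 开封市"
DOCS = {  # subset of the sample data above
    1: {"address": "湖北省武汉市洪山区光谷大厦", "content": "i like to write best elasticsearch article"},
    3: {"address": "湖北省武汉市经济开发区", "content": "i am only an elasticsearch beginner"},
    4: {"address": "湖北省武汉市沌口开发区", "content": "elasticsearch and hadoop are all very good solution, i am a beginner"},
    11: {"address": "河南省开封市河南大学", "content": "i am not like java"},
}

print(analyze(QUERY))
for doc_id, doc in DOCS.items():
    print(doc_id, {f: matched_terms(QUERY, doc[f]) for f in ("address", "content")})
print(docs_matching(QUERY, DOCS, ["address", "content"], 5))
```

In this subset, documents 3 and 4 reach five matching terms in address because their addresses also contain “开”; no content field gets beyond two.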
B. Additional notes
a. A best_fields query is equivalent to a dis_max query (important)
# best_fields form
GET /multi_search_staff/_search
{
  "query": {
    "multi_match": {
      "query": "elasticsearch beginner 湖北省 开封市",
      "type": "best_fields",
      "fields": [ "address", "content" ]
    }
  },
  "size": 15
}

# dis_max form
GET /multi_search_staff/_search
{
  "query": {
    "dis_max": {
      "queries": [
        {
          "match": {
            "address": {
              "query": "elasticsearch beginner 湖北省 开封市",
              "operator": "or"
            }
          }
        },
        {
          "match": {
            "content": {
              "query": "elasticsearch beginner 湖北省 开封市",
              "operator": "or"
            }
          }
        }
      ]
    }
  },
  "size": 15
}
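To make the equivalence concrete, here is a minimal Python sketch that expands the best_fields parameters into the dis_max request body shown above. The function multi_match_to_dis_max is a hypothetical helper written for this article; it only builds a dictionary and does not talk to a cluster.

```python
import json


def multi_match_to_dis_max(query, fields, operator="or",
                           minimum_should_match=None, tie_breaker=None):
    """Build the dis_max body that mirrors a best_fields multi_match."""
    clauses = []
    for field in fields:
        params = {"query": query, "operator": operator}
        # Per-field options are repeated inside every match clause.
        if minimum_should_match is not None:
            params["minimum_should_match"] = minimum_should_match
        clauses.append({"match": {field: params}})
    dis_max = {"queries": clauses}
    # tie_breaker belongs to dis_max itself, not to the match clauses.
    if tie_breaker is not None:
        dis_max["tie_breaker"] = tie_breaker
    return {"query": {"dis_max": dis_max}}


body = multi_match_to_dis_max(
    "elasticsearch beginner 湖北省 开封市", ["address", "content"]
)
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The printed JSON can be pasted after GET /multi_search_staff/_search in the Kibana console. Note where each option lands: operator and minimum_should_match go into every match clause, while tie_breaker stays at the dis_max level.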
b. With the following DSL, final_score does not appear to be the score of the best-matching field. Has best_fields scoring stopped working? Is minimum_should_match=5 the cause?
GET /multi_search_staff/_explain/3
{
  "query": {
    "multi_match": {
      "query": "elasticsearch beginner 湖北省 开封市",
      "type": "best_fields",
      "fields": [ "address", "content" ],
      "minimum_should_match": 5
    }
  }
}

# The equivalent DSL below answers the question: the content field does not satisfy minimum_should_match=5,
# so it takes no part in the final_score calculation, which creates the illusion that best_fields scoring has failed.
GET /multi_search_staff/_explain/3
{
  "query": {
    "dis_max": {
      "queries": [
        {
          "match": {
            "address": {
              "query": "elasticsearch beginner 湖北省 开封市",
              "operator": "or",
              "minimum_should_match": 5
            }
          }
        },
        {
          "match": {
            "content": {
              "query": "elasticsearch beginner 湖北省 开封市",
              "operator": "or",
              "minimum_should_match": 5
            }
          }
        }
      ]
    }
  }
}
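The effect can be reproduced with a small Python sketch: a field is dropped before the maximum is taken if it does not reach the threshold. The function best_fields_score and the scores 2.0 and 3.0 are illustrative assumptions; the matched-term counts (5 for address, 2 for content) follow from the tokenization described earlier.

```python
def best_fields_score(per_field, minimum_should_match=0, tie_breaker=0.0):
    """per_field maps field name -> (matched_term_count, score).

    Fields that match nothing, or fewer terms than minimum_should_match,
    are excluded before the best score is chosen.
    """
    eligible = [
        score
        for matched, score in per_field.values()
        if matched > 0 and matched >= minimum_should_match
    ]
    if not eligible:
        return None  # the document does not match at all
    best = max(eligible)
    return best + tie_breaker * (sum(eligible) - best)


# Document 3: address matches 5 terms, content matches 2 (scores are illustrative).
doc3 = {"address": (5, 2.0), "content": (2, 3.0)}

print(best_fields_score(doc3))                          # content is the best field
print(best_fields_score(doc3, minimum_should_match=5))  # only address is eligible
```

Without the threshold, content wins; with minimum_should_match=5 the final score is the address score, even though content would have scored higher.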
2. The most_fields multi-field query strategy
most_fields adds up the scores of all matching fields, so the final score combines every matching field's score.
A. Example
Search for employees by address (“湖北省” or “开封市”) and by content (“elasticsearch” or “beginner”).
a. Build the query (example)
GET /multi_search_staff/_search
{
  "query": {
    "multi_match": {
      "query": "elasticsearch beginner 湖北省 开封市",
      "type": "most_fields",
      "fields": [ "address", "content" ]
    }
  }
}

# View the score explanation for document 3
GET /multi_search_staff/_explain/3
{
  "query": {
    "multi_match": {
      "query": "elasticsearch beginner 湖北省 开封市",
      "type": "most_fields",
      "fields": [ "address", "content" ]
    }
  }
}
Score explanation for name=员工003:
# address contributes 0.(湖) + 0.(北) + 0.0(省) + 1.0(开) + 0.0(市) = 2.0
# content contributes 1.(elasticsearch) + 1.(beginner) = 3.
# final_score = 2.0 + 3. = 5.
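Put side by side, the two field-centric strategies differ only in how the per-field scores are combined. The following Python sketch shows that difference; the function names and the scores 2.0 and 3.0 are illustrative assumptions, not output from Elasticsearch.

```python
def best_fields_score(field_scores):
    """best_fields without tie_breaker: the highest field score wins."""
    return max(field_scores.values())


def most_fields_score(field_scores):
    """most_fields: the scores of all matching fields are added together."""
    return sum(field_scores.values())


# Illustrative per-field scores for one document (hypothetical values).
doc_scores = {"address": 2.0, "content": 3.0}

print("best_fields:", best_fields_score(doc_scores))
print("most_fields:", most_fields_score(doc_scores))
```

A document that matches weakly in many fields can therefore outrank, under most_fields, a document that matches strongly in a single field.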
b. Build the query (demonstrating the minimum_should_match parameter)
GET /multi_search_staff/_search
{
  "query": {
    "multi_match": {
      "query": "elasticsearch beginner 湖北省 开封市",
      "type": "most_fields",
      "fields": [ "address", "content" ],
      "minimum_should_match": 5
    }
  }
}

GET /multi_search_staff/_explain/3
{
  "query": {
    "multi_match": {
      "query": "elasticsearch beginner 湖北省 开封市",
      "type": "most_fields",
      "fields": [ "address", "content" ],
      "minimum_should_match": 5
    }
  }
}
# minimum_should_match raises the matching threshold and so improves precision.
# final_score here can likewise give the impression that most_fields scoring has failed. As with best_fields, writing out the equivalent DSL makes the result clear.
B. Additional notes
a. A most_fields query is equivalent to a bool should query (important)
# most_fields form
GET /multi_search_staff/_search
{
  "query": {
    "multi_match": {
      "query": "elasticsearch beginner 湖北省 开封市",
      "type": "most_fields",
      "fields": [ "address", "content" ]
    }
  },
  "size": 15
}

# bool should form
GET /multi_search_staff/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "content": {
              "query": "elasticsearch beginner 湖北省 开封市",
              "operator": "or"
            }
          }
        },
        {
          "match": {
            "address": {
              "query": "elasticsearch beginner 湖北省 开封市",
              "operator": "or"
            }
          }
        }
      ]
    }
  },
  "size": 15
}
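The bool should form can also be generated mechanically, which helps when working out the equivalent DSL for a most_fields query with minimum_should_match. The sketch below is a hypothetical helper (multi_match_to_bool_should is not an Elasticsearch API); it only builds the request body as a Python dictionary.

```python
import json


def multi_match_to_bool_should(query, fields, operator="or",
                               minimum_should_match=None):
    """Build the bool/should body that mirrors a most_fields multi_match."""
    should = []
    for field in fields:
        params = {"query": query, "operator": operator}
        # The threshold is applied per field, so it goes inside each match
        # clause rather than on the enclosing bool query.
        if minimum_should_match is not None:
            params["minimum_should_match"] = minimum_should_match
        should.append({"match": {field: params}})
    return {"query": {"bool": {"should": should}}}


body = multi_match_to_bool_should(
    "elasticsearch beginner 湖北省 开封市",
    ["address", "content"],
    minimum_should_match=5,
)
print(json.dumps(body, ensure_ascii=False, indent=2))
```

Running the generated body through _explain shows which match clauses actually contribute to the sum, and a field that misses the threshold contributes nothing.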
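III. The cross_fields multi-field query strategy (term-centric queries)
# As a simple example, an address is not stored as a single string such as “湖北省武汉市东湖高新区”; a complete address is identified by several fields together.
# It is usually split into three fields: "provice", "city", and "area".
# So when searching for the complete address “湖北省 武汉市 江汉区”, how do we query for provice=“湖北省”, city=“武汉市”, area=“江汉区”?
# What happens if we use most_fields? Again, an example makes this clear.
A. Example
a. Build the sample data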
POST /multi_search_staff_2/_bulk
{"index":{"_id":1}}
{"empId":"1","name":"员工001","age":20,"sex":"男","mobile":"","salary":17333,"deptName":"技术部","provice":"湖北省","city":"武汉市","area":"光谷大道","address":"湖北省武汉市洪山区光谷大厦","content":"i like to write best elasticsearch article"}
{"index":{"_id":2}}
{"empId":"2","name":"员工002","age":25,"sex":"男","mobile":"","salary":15963,"deptName":"销售部","provice":"湖北省","city":"武汉市","area":"江汉区","address":"湖北省武汉市江汉路","content":"i think java is the best programming language"}
{"index":{"_id":3}}
{"empId":"3","name":"员工003","age":30,"sex":"男","mobile":"","salary":20000,"deptName":"技术部","provice":"湖北省","city":"武汉市","area":"经济技术开发区","address":"湖北省武汉市经济开发区","content":"i am only an elasticsearch beginner"}
{"index":{"_id":4}}
{"empId":"4","name":"员工004","age":20,"sex":"女","mobile":"","salary":15600,"deptName":"销售部","provice":"湖北省","city":"武汉市","area":"沌口开发区","address":"湖北省武汉市沌口开发区","content":"elasticsearch and hadoop are all very good solution, i am a beginner"}
{"index":{"_id":5}}
{"empId":"5","name":"员工005","age":20,"sex":"男","mobile":"","salary":19665,"deptName":"测试部","provice":"湖北省","city":"高新开发区","area":"武汉市","address":"湖北省武汉市东湖隧道","content":"spark is best big data solution based on scala, an programming language similar to java"}
{"index":{"_id":6}}
{"empId":"6","name":"员工006","age":30,"sex":"女","mobile":"","salary":30000,"deptName":"技术部","provice":"武汉市","city":"湖北省","area":"江汉区","address":"湖北省武汉市江汉路","content":"i like java developer"}
{"index":{"_id":7}}
{"empId":"7","name":"员工007","age":60,"sex":"女","mobile":"","salary":52130,"deptName":"测试部","provice":"湖北省","city":"黄冈市","area":"边城区","address":"湖北省黄冈市边城区","content":"i like elasticsearch developer"}
{"index":{"_id":8}}
{"empId":"8","name":"员工008","age":19,"sex":"女","mobile":"","salary":60000,"deptName":"技术部","provice":"湖北省","city":"武汉市","area":"汉阳区","address":"湖北省武汉市江汉大学","content":"i like spark language"}
{"index":{"_id":9}}
{"empId":"9","name":"员工009","age":40,"sex":"男","mobile":"","salary":23000,"deptName":"销售部","provice":"河南省","city":"郑州市","area":"二七区","address":"河南省郑州市郑州大学","content":"i like java developer"}
{"index":{"_id":10}}
{"empId":"10","name":"张湖北","age":35,"sex":"男","mobile":"","salary":18000,"deptName":"测试部","provice":"湖北省","city":"武汉市","area":"高新开发区","address":"湖北省武汉市东湖高新","content":"i like java developer i also like elasticsearch"}
{"index":{"_id":11}}
{"empId":"11","name":"王河南","age":61,"sex":"男","mobile":"","salary":10000,"deptName":"销售部","provice":"河南省","city":"开封市","area":"金明区","address":"河南省开封市河南大学","content":"i am not like java "}
{"index":{"_id":12}}
{"empId":"12","name":"张大学","age":26,"sex":"女","mobile":"","salary":14321,"deptName":"测试部","provice":"河南省","city":"开封市","area":"金明区","address":"河南省开封市河南大学","content":"i am java developer, thing java is good"}
{"index":{"_id":13}}
{"empId":"13","name":"李江汉","age":36,"sex":"男","mobile":"","salary":11215,"deptName":"销售部","provice":"河南省","city":"郑州市","area":"二七区","address":"河南省郑州市二七区","content":"i like java and java is very best, i like it do you like java "}
{"index":{"_id":14}}
{"empId":"14","name":"王技术","age":45,"sex":"女","mobile":"","salary":16222,"deptName":"测试部","provice":"河南省","city":"郑州市","area":"金水区","address":"河南省郑州市金水区","content":"i like c++"}
{"index":{"_id":15}}
{"empId":"15","name":"张测试","age":18,"sex":"男","mobile":"","salary":20000,"deptName":"技术部","provice":"河南省","city":"郑州市","area":"高新开发区","address":"河南省郑州高新开发区","content":"i think spark is good"}
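b. Build the query (example)
# Query with most_fields and operator=and
GET /multi_search_staff_2/_search
{
  "query": {
    "multi_match": {
      "query": "湖北省 武汉市 江汉区",
      "fields": [ "provice", "city", "area" ],
      "type": "most_fields",
      "operator": "and"
    }
  }
}

# The same query with operator=or
GET /multi_search_staff_2/_search
{
  "query": {
    "multi_match": {
      "query": "湖北省 武汉市 江汉区",
      "fields": [ "provice", "city", "area" ],
      "type": "most_fields",
      "operator": "or"
    }
  }
}
# With operator=and, no document matches, because the query terms are spread across several fields.
# For most_fields the operator is applied inside each field: provice must contain every term of “湖北省 武汉市 江汉区”, or city must contain every term, or area must contain every term. No single field holds the whole address.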
c. Build the query (example)
GET /multi_search_staff_2/_search
{
  "query": {
    "multi_match": {
      "query": "湖北省 武汉 江汉区",
      "fields": [ "provice", "city", "area" ],
      "type": "cross_fields",
      "operator": "and"
    }
  }
}

# View the score explanation for the employee with id=6
GET /multi_search_staff_2/_explain/6
{
  "query": {
    "multi_match": {
      "query": "湖北省 武汉 江汉区",
      "fields": [ "provice", "city", "area" ],
      "type": "cross_fields",
      "operator": "and"
    }
  }
}
# How cross_fields matches
# In a term-centric query, every term of “湖北省”, “武汉”, “江汉区” must appear, but each term may appear in any of the fields.
# provice may supply any of the terms of “湖北省”, “武汉”, “江汉区”
# city may supply any of the terms of “湖北省”, “武汉”, “江汉区”
# area may supply any of the terms of “湖北省”, “武汉”, “江汉区”
# Matching logic:
# The query string is analyzed into 8 terms, each term becomes one clause, and with operator=and every clause must be satisfied.
# The overall logic is: {provice=“湖” || city=“湖” || area=“湖”} && {provice=“北” || city=“北” || area=“北”} ... && {provice=“区” || city=“区” || area=“区”}
# The employee with id=6 is still found, even though its province and city values are swapped.
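The difference between the two matching styles under operator=and can be sketched in Python. This is a simplified model with several assumptions: the tokenizer is only a rough stand-in for the standard analyzer (one token per CJK character), scoring is ignored entirely, all three fields are treated as one group, and the function names are made up for the illustration. Only four of the sample documents are included.

```python
import re

# Rough stand-in for the standard analyzer (an assumption of this sketch).
TOKEN_RE = re.compile(r"[a-z0-9]+|[\u4e00-\u9fff]")


def analyze(text):
    return TOKEN_RE.findall(text.lower())


def field_centric_and(query, doc, fields):
    """best_fields / most_fields with operator=and:
    some single field must contain every query term."""
    terms = analyze(query)
    return any(all(t in set(analyze(doc[f])) for t in terms) for f in fields)


def term_centric_and(query, doc, fields):
    """cross_fields with operator=and:
    every query term must occur in at least one of the fields."""
    terms = analyze(query)
    field_tokens = [set(analyze(doc[f])) for f in fields]
    return all(any(t in tokens for tokens in field_tokens) for t in terms)


FIELDS = ["provice", "city", "area"]
QUERY = "湖北省 武汉 江汉区"
DOCS = {  # subset of the multi_search_staff_2 sample data
    1: {"provice": "湖北省", "city": "武汉市", "area": "光谷大道"},
    2: {"provice": "湖北省", "city": "武汉市", "area": "江汉区"},
    6: {"provice": "武汉市", "city": "湖北省", "area": "江汉区"},
    8: {"provice": "湖北省", "city": "武汉市", "area": "汉阳区"},
}

print([i for i, d in DOCS.items() if field_centric_and(QUERY, d, FIELDS)])
print([i for i, d in DOCS.items() if term_centric_and(QUERY, d, FIELDS)])
```

In this subset the field-centric check finds nothing, while the term-centric check finds documents 2 and 6. Document 6 matches despite the swapped province and city because the check never asks which field a term came from.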