Elasticsearch Query DSL Explained

Query DSL

0. Basic operations

List the available cat APIs for inspecting the cluster: GET /_cat

Check cluster health:

# The v parameter asks for column headers in the output
GET /_cat/health?v

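Meaning of the health status values:

Green - best state: every primary and replica shard is allocated
Yellow - all primary shards are allocated, so the data is searchable, but one or more replica shards are unassigned
Red - at least one primary shard is unassigned, so part of the data is unavailable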

Check node status: GET /_cat/nodes?v
List all indices: GET /_cat/indices?v

List the mappings (and, on older versions, the types) of every index:

# The pretty parameter makes Elasticsearch pretty-print the JSON response so it is easier to read
GET /_mapping?pretty=true

1. Index

Note: index names must be lowercase and must be unique.

List all indices: GET /_cat/indices?v
Get an index: GET /test or GET /test*
Create an index named test: PUT /test
Delete the index test: DELETE /test

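2. Document

A single record in an index is called a document.

2.1 Creating a document

2.1.1 Creating with an explicit ID (PUT)

Format: /{index}/{type}/{id} (in 7.x the type segment is _doc)

# Document ID = 1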
PUT /test/_doc/1
{
    "name": "lxy1",
    "ename": "lxy1",
    "age": 18,
    "about": "I am a good coder",
    "interest": [
        "eat",
        "coding"
    ],
    "interest_count": 2
}
# Document ID = 2
PUT /test/_doc/2
{
    "name": "lxy2",
    "ename": "lxy2",
    "age": 29,
    "about": "I am a tester",
    "interest": [
        "eat",
        "testing"
    ],
    "interest_count": 2,
    "job": null
}
# Document ID = 3
PUT /test/_doc/3
{
    "name": "lxy3",
    "ename": "lxy3",
    "age": 3,
    "about": "I am a baby",
    "interest": [
        "eat",
        "play",
        "sleep"
    ],
    "interest_count": 3,
    "job": []
}

2.1.2 Letting ES generate the ID (create the document with POST; ES generates a 20-character ID)

POST /test/_doc
{
    "name": "小康康",
    "age": 2,
    "about": "I am a cat",
    "interest": [
        "eat"
    ],
    "interest_count": 1,
    "job": [
        null,
        "get mouse"
    ]
}

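2.1.3 The op_type parameter

When indexing a document you can also pass op_type. With op_type=create the request fails if a document with the same ID already exists, instead of overwriting it.

# Set the operation type with op_type=create
# ID 3 already exists, so this fails with: document already exists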
PUT /test/_doc/3?op_type=create
{
    "name": "lxy3",
    "ename": "lxy3",
    "age": 3,
    "about": "I am a baby",
    "interest": [
        "eat",
        "play",
        "sleep"
    ],
    "interest_count": 3,
    "job": []
}
# Equivalent: use the _create endpoint (7.x path; the older typed form was PUT /test/_doc/3/_create)
PUT /test/_create/3
{
    "name": "lxy3",
    "ename": "lxy3",
    "age": 3,
    "about": "I am a baby",
    "interest": [
        "eat",
        "play",
        "sleep"
    ],
    "interest_count": 3,
    "job": []
}

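2.1.4 Concurrency control with versions (newer versions use seq_no and primary_term instead of version, so only the seq_no approach is shown here)

Every document's "_version" starts at 1 and increases by 1 after each successful write to that document.

The sequence number "_seq_no" is not per document: it is a counter kept by the primary shard. It is 0 for the first write and increases by 1 for every successful write on that shard, and each document records the _seq_no of the operation that last changed it.

# First create a document with ID=100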
PUT /test/_doc/100
{
  "name": "羽哥"
}
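# The response includes _seq_no and _primary_term

To update safely, pass if_seq_no and if_primary_term with the values we believe the document currently has. If they match what ES has stored, the write succeeds; otherwise ES rejects it with a version conflict.

# We expect the document to be at _seq_no=17 before the update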
PUT /test/_doc/100?if_seq_no=17&if_primary_term=1
{
  "name": "羽哥2"
}
# The document with id=100 was at _seq_no=17, so the update succeeds and its _seq_no becomes 18
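
The compare-and-set behaviour of if_seq_no can be modelled in a few lines of Python. This is only a toy in-memory sketch, not Elasticsearch code: ToyShard, index and VersionConflict are hypothetical names, and the real server also checks _primary_term.

```python
class VersionConflict(Exception):
    """Raised when the caller's expected _seq_no is stale (HTTP 409 in Elasticsearch)."""


class ToyShard:
    """Hypothetical in-memory model of one shard with a shared sequence counter."""

    def __init__(self):
        self.next_seq_no = 0   # shared by every document on the shard
        self.docs = {}         # doc id -> (seq_no, version, body)

    def index(self, doc_id, body, if_seq_no=None):
        current = self.docs.get(doc_id)
        if if_seq_no is not None:
            # Reject the write if the document changed since the caller read it
            if current is None or current[0] != if_seq_no:
                raise VersionConflict(f"expected seq_no {if_seq_no}, found {current}")
        version = 1 if current is None else current[1] + 1
        seq_no = self.next_seq_no
        self.next_seq_no += 1
        self.docs[doc_id] = (seq_no, version, body)
        return {"_seq_no": seq_no, "_version": version}


shard = ToyShard()
first = shard.index("100", {"name": "羽哥"})
second = shard.index("100", {"name": "羽哥2"}, if_seq_no=first["_seq_no"])
```

A second writer that still holds the old _seq_no would get VersionConflict and has to re-read the document before retrying.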

2.2 Querying documents

2.2.1 All documents in an index

GET /test/_search

2.2.2 By ID (the document with id=1)

GET /test/_doc/1?pretty

2.2.3 Limiting the number of hits with size

# Return a single hit; the default is 10
GET /test/_search
{
  "query": {"match_all": {}},
  "size": 1
}

2.2.4 Sorting with sort

# Sort by age in descending order
GET /test/_search
{
  "query": {"match_all": {}},
  "sort": [{"age": {"order": "desc"}}]
}
# Or, equivalently
GET /test/_search
{
  "query": {"match_all": {}},
  "sort": {"age": { "order": "desc"}}
}

2.2.5 Paging with from and size

# Return the first 2 hits; from is a 0-based offset
GET /test/_search
{
  "query": {"match_all": {}},
  "sort": [{"age": {"order": "desc"}}],
  "from": 0,
  "size": 2
}
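
Turning a page number into from and size is simple arithmetic. The sketch below builds the request body as a plain dict; page_body is a hypothetical helper name and the 1-based page numbering is an assumption of this example.

```python
import json


def page_body(page, page_size, sort_field="age"):
    """Build a search body for a 1-based page number (hypothetical helper)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    return {
        "query": {"match_all": {}},
        "sort": [{sort_field: {"order": "desc"}}],
        "from": (page - 1) * page_size,  # from is a 0-based offset
        "size": page_size,
    }


body = page_body(page=3, page_size=2)
print(json.dumps(body, indent=2))
```

By default from + size may not exceed 10,000 (index.max_result_window), so this style suits shallow paging only.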

2.2.6 Checking whether an ID exists

# Returns 200 if the document exists, 404 if it does not
HEAD /test/_doc/1

2.2.7 Filtering returned fields with _source

# Do not return _source
GET /test/_doc/1?_source=false

# Return only _source
GET /test/_doc/1/_source

# Return only name and age of the document with ID=1
GET /test/_doc/1?_source=name,age

# Return only some fields of _source
GET /test/_search
{
  "_source": {
    "includes": ["name", "age"]
  }
}

# When only includes is needed, this shorthand works
GET /test/_search
{
  "_source": ["name", "age"]
}

# Return only fields that start with a
GET /test/_search
{
  "_source": ["a*"]
}

# excludes: hide some fields
GET /test/_search
{
  "_source": {
    "excludes": ["name", "age"]
  }
}

# When both are given, excludes takes priority over includes
# includes lists name, but excludes lists it too, so only age is returned.
GET /test/_search
{
  "_source": {
    "includes": ["name", "age"],
    "excludes": ["name", "interest"]
  }
}

# Return only fields that start with a and do not end with e
GET /test/_search
{
  "_source": {
    "includes": "a*",
    "excludes": "*e"
  }
}
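
How includes and excludes interact can be sketched with shell-style pattern matching. This is a toy model for top-level fields only, not the server's implementation; filter_source is a hypothetical name.

```python
from fnmatch import fnmatchcase


def filter_source(source, includes=None, excludes=None):
    """Toy model of _source filtering for top-level fields only."""
    includes = [includes] if isinstance(includes, str) else (includes or ["*"])
    excludes = [excludes] if isinstance(excludes, str) else (excludes or [])
    kept = {}
    for field, value in source.items():
        included = any(fnmatchcase(field, p) for p in includes)
        excluded = any(fnmatchcase(field, p) for p in excludes)
        if included and not excluded:   # excludes win over includes
            kept[field] = value
    return kept


doc = {"name": "lxy1", "ename": "lxy1", "age": 18, "about": "I am a good coder"}
only_about = filter_source(doc, includes="a*", excludes="*e")
only_age = filter_source(doc, includes=["name", "age"], excludes=["name", "interest"])
```

With includes "a*" and excludes "*e", age is dropped because it ends with e, which leaves only about.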

# Combine a query with _source filtering
GET /test/_search
{
  "query": {
    "bool": {
      "filter": {
        "term": {
          "age": 3
        }
      }
    }
  }, 
  "_source": ["name", "age"]
}

# Exclude some fields only
GET /test/_doc/1?_source_excludes=about,interest

# Check whether _source exists: 200 if it does, 404 if it does not
HEAD test/_doc/1/_source

# In ES 7.x the following form is also available
GET test/_source/1

GET test/_source/1?_source=name

2.3 Updating documents

2.3.1 Replacing a document with PUT

# First look at the document with id=1
GET /test/_doc/1
# Overwrite the document with id=1 using PUT
PUT /test/_doc/1
{
  "name": "羽哥"
}
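# Look at the document again after the write
GET /test/_doc/1
# PUT replaced the whole document; it did not just update the name field

2.3.2 Partial update with POST

First restore the document with id=1 by re-running its PUT request from 2.1.1.

# Change name and add a new field, interesting
# Older form: POST /test/_doc/1/_update; newer versions use /{index}/_update/{id}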
POST /test/_update/1
{
  "doc":{
      "name": "羽哥",
      "interesting": "watching TV"
  }
}
# POST _update changes only the listed fields: existing fields are updated and missing ones are added

2.3.3 Updating with a script

# The person with id=1 currently has age 18; add 1 to it
POST /test/_update/1
{
  "script": "ctx._source.age += 1"
}

2.3.4 More script update examples

# age - 1
POST /test/_update/1
{
  "script": {
    "source": "ctx._source.age -= 1"
  }
}

# age=30
POST /test/_update/1
{
  "script": {
    "source": "ctx._source.age = 30"
  }
}

# name = 'lxyking'
POST /test/_update/1
{
  "script": {
    "source": "ctx._source.name = 'lxyking'"
  }
}

# Append a value to the array even if it is already present (Painless)
POST /test/_update/1
{
  "script": {
    "source": "ctx._source.interest.add(params.interest)",
    "lang": "painless",
    "params": {
      "interest": "sleep"
    }
  }
}

# Append a value to the array only if it is not already present (Painless)
POST /test/_update/1
{
  "script": {
    "source": "if(!ctx._source.interest.contains(params.interest)) {ctx._source.interest.add(params.interest)}",
    "lang": "painless",
    "params": {
      "interest": "sleeping"
    }
  }
}

# Add a new field, new_name, to the document
POST /test/_update/1
{
  "script": {
    "source": "ctx._source.new_name = '天才'",
    "lang": "painless"
  }
}

# Copy one field into another
POST /test/_update/1
{
  "script": {
    "source": "ctx._source.new_name = ctx._source.name",
    "lang": "painless"
  }
}

# Remove a field (the mapping is not changed)
POST /test/_update/1
{
  "script": "ctx._source.remove('new_name')"
}

2.4 Deleting documents

2.4.1 Delete by ID

DELETE /test/_doc/1

2.4.2 Delete everything a query matches (delete_by_query)

# delete_by_query
POST /test/_delete_by_query
{
  "query": {
    "term": {
      "_id": "5"
    }
  }
}

# Ignore version conflicts and keep deleting
POST /test/_delete_by_query?conflicts=proceed
{
  "query": {
    "term": {
      "_id": "5"
    }
  }
}

3. Querying documents with the DSL

3.1 Test data

PUT /test/_doc/1
{
  "name": "lxy",
  "age": 18,
  "hometown": "杭州",
  "gender": "male",
  "interesting": "watching TV"
}
PUT /test/_doc/2
{
  "name": "yfw",
  "age": 28,
  "hometown": "福建",
  "gender": "female",
  "interesting": "watching movie"
}
PUT /test/_doc/3
{
  "name": "yb",
  "age": 30,
  "hometown": "苏州",
  "gender": "female"
}

3.2 Returning only some fields

GET /test/_search
{
  "_source": ["name", "age"]
}

3.3 match

When the search text contains several terms, Elasticsearch combines them with OR by default.

# Find documents whose interesting matches "watching TV" (that is, watching OR TV)
GET /test/_search
{
  "query": {
    "match": {
      "interesting": "watching TV"
    }
  }
}
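# Or send the same query over HTTP to port 9200 (Elasticsearch 6+ requires the Content-Type header)
curl -XPOST 'localhost:9200/test/_search?pretty' -H 'Content-Type: application/json' -d '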
{
  "query" : { "match" : { "interesting" : "watching TV" }}
}'

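The result also contains the document with "interesting" = "watching movie". The _score of id=1 is higher than that of id=2: _score measures how well a document matches, and id=1 matches both terms while id=2 matches only one.

# Find documents whose interesting matches "TV" or "movie"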
GET /test/_search
{
  "query": {
    "match": {
      "interesting": "TV movie"
    }
  }
}

This query hits 2 people. In the author's run both had a "_score" of 0.6931472, meaning they match equally well.

# Find documents with age=30
GET /test/_search
{
  "query": {
    "match": {
      "age": 30
    }
  }
}

3.4 match_phrase

# Phrase query: "watching TV" is matched as a phrase, so the terms must appear together and in this order
GET /test/_search
{
  "query": {
    "match_phrase": {
      "interesting": "watching TV"
    }
  }
}

3.5 must (AND)

# interesting matches "watching TV" AND gender matches "female"
GET /test/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": {"interesting": "watching TV" }},
        { "match": {"gender": "female" }}
      ]
    }
  }
}

{ "match": {"interesting": "watching TV" }} on its own returns id=1 and id=2, and { "match": {"gender": "female" }} returns id=2 and id=3. The two clauses are combined with AND, so only id=2 is returned.

3.6 should (OR)

3.6.1 interesting matches "watching movie" OR gender matches "female"

# interesting matches "watching movie" OR gender matches "female"
GET /test/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": {"interesting": "watching movie" }},
        { "match": {"gender": "female" }}
      ]
    }
  }
}

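In this result id=2 has the highest score and the other two documents score the same. Note that this query searches for "watching movie", not "watching TV".

3.6.2 minimum_should_match

minimum_should_match sets how many of the should clauses a document must satisfy.

# minimum_should_match=2 requires both clauses to match, and no document does, so nothing is returned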
GET /test/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "range": {
            "age": {
              "gte": 0,
              "lte": 3
            }
          }
        },
        {
         "match": { "hometown.keyword": "杭州" }
        }
      ],
      "minimum_should_match": 2
    }
  }
}
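
The counting rule behind minimum_should_match can be modelled with ordinary predicates. This is a toy sketch over the data from 3.1, with the two should clauses rewritten as hypothetical Python lambdas; scoring is ignored.

```python
def bool_should(docs, should, minimum_should_match=1):
    """Return ids of docs satisfying at least N of the should predicates (toy model)."""
    hits = []
    for doc_id, doc in docs.items():
        matched = sum(1 for clause in should if clause(doc))
        if matched >= minimum_should_match:
            hits.append(doc_id)
    return hits


# Data from section 3.1 (only the fields used here)
docs = {
    1: {"age": 18, "hometown": "杭州"},
    2: {"age": 28, "hometown": "福建"},
    3: {"age": 30, "hometown": "苏州"},
}
should = [
    lambda d: 0 <= d["age"] <= 3,        # the range clause
    lambda d: d["hometown"] == "杭州",    # exact match on hometown.keyword
]
at_least_one = bool_should(docs, should, minimum_should_match=1)
at_least_two = bool_should(docs, should, minimum_should_match=2)
```

With a minimum of 1 only id=1 qualifies (through hometown); with a minimum of 2 nothing does, because nobody is aged 0 to 3.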

3.7 must_not (NOT)

# interesting does not match "watching TV" AND gender does not match "female"
GET /test/_search
{
  "query": {
    "bool": {
      "must_not": [
        { "match": {"interesting": "watching TV"} },
        { "match": {"gender": "female"} }
      ]
    }
  }
}

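The result is empty: no document satisfies both conditions.

3.8 Combining clauses with bool

# In SQL: where gender != 'male' and ((age >= 0 and age <= 3) or hometown = '福建')
# (with no must or filter clause, at least one should clause has to match)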
GET /test/_search
{
  "query": {
    "bool": {
      "must_not": [{"match": {"gender": "male"}}],
      "should": [
        {
          "range": {
            "age": {
              "gte": 0,
              "lte": 3
            }
          }
        },
        {
         "match": {"hometown.keyword": "福建"}
        }
      ]
    }
  }
}

3.9 filter: conditions that do not affect scoring

# To keep the age comparison out of the score, put it in filter
GET /test/_search
{
  "query": {
    "bool": {
      "must": [{"match": {"interesting": "watching TV"}}], 
      "filter": {
        "range": {
          "age": {
            "gte": 10,
            "lte": 29
          }
        }
      }
    }
  }
}

4. Aggregations with the DSL

4.1 Test data

Same data as in 3.1.

4.2 group by (example: group by gender)

# SQL
SELECT gender,COUNT(1) 
FROM test 
GROUP BY gender 
ORDER BY COUNT(1) DESC 
# DSL
GET /test/_search
{
  "size": 0,
  "aggs": {
    "group_by_gender": {
      "terms": {
        "field": "gender.keyword"
      }
    }
  }
}

4.3 avg (example: average age)

# SQL
SELECT AVG(age) FROM test
# DSL
# "size": 0 returns only the aggregation result, no hits
GET /test/_search
{
  "size":0,
  "aggs": {
    "avg_age": {
      "avg": {
        "field": "age"
      }
    }
  }
}

4.4 avg combined with group by (example: average age per gender)

# SQL
SELECT gender, AVG(age) AS avg_age
FROM test
GROUP BY gender
ORDER BY avg_age DESC
# DSL
GET /test/_search
{
  "size": 0, 
  "aggs": {
    "group_by_gender":{
      "terms": {
        "field": "gender.keyword",
        "order": {
          "avg_age": "desc"
        }
      },
      "aggs": {
        "avg_age": {
          "avg": {
            "field": "age"
          }
        }
      }
    }
  }
}
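
For readers who think in code rather than SQL, the same grouping can be written out in plain Python. This is just an illustration over the three documents from 3.1 and says nothing about how Elasticsearch computes it internally.

```python
from collections import defaultdict
from statistics import mean

people = [
    {"name": "lxy", "age": 18, "gender": "male"},
    {"name": "yfw", "age": 28, "gender": "female"},
    {"name": "yb", "age": 30, "gender": "female"},
]

ages_by_gender = defaultdict(list)
for person in people:
    ages_by_gender[person["gender"]].append(person["age"])   # one terms bucket per gender

buckets = sorted(
    ({"key": g, "doc_count": len(a), "avg_age": mean(a)} for g, a in ages_by_gender.items()),
    key=lambda b: b["avg_age"],
    reverse=True,                                            # order: {"avg_age": "desc"}
)
```

The nested avg_age aggregation corresponds to mean(a), and the terms order clause corresponds to the final sort.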

4.5 date_range

# Bucket the documents of the last 7 days (assumes the index has a date field named date)
GET /test/_search
{
  "size": 0,
  "aggs": {
    "range": {
      "date_range": {
        "field": "date",
        "format": "yyyy-MM-dd",
        "ranges": [
          {
            "from": "now-7d/d",
            "to": "now"
          }
        ]
      }
    }
  }
}

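4.6 date_histogram

Buckets documents by a date field at a regular date interval. The available units are year, quarter, month, week, day, hour, minute and second. Fractional values such as 1.5h are not supported; use a smaller unit instead (90m). From 7.2 on, the interval parameter used below is deprecated in favour of calendar_interval and fixed_interval.

# Count people per month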
GET /test/_search
{
  "size": 0,
  "aggs": {
    "dates": {
      "date_histogram": {
        "field": "date",
        "interval": "month",
        "format": "yyyy-MM-dd"
      }
    }
  }
}
# Drop buckets with doc_count=0
GET /test/_search
{
  "size": 0,
  "aggs": {
    "dates": {
      "date_histogram": {
        "field": "date",
        "interval": "month",
        "format": "yyyy-MM",
        "min_doc_count": 1
      }
    }
  }
}
# Restrict the date range first, then aggregate (data from January 2021 onwards)
GET /test/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": {
        "range": {
          "date": {
            "gte": "2021-01-01"
          }
        }
      }
    }
  },
  "aggs": {
    "dates": {
      "date_histogram": {
        "field": "date",
        "interval": "month",
        "format": "yyyy-MM",
        "min_doc_count": 1
      }
    }
  }
}

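4.7 missing: a bucket for documents without a value

# In the author's data set there are 7 people and 2 of them have a friend, so the bucket's doc_count is 5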
POST /test/_search?size=0
{
  "aggs": {
    "account_without_friend": {
      "missing": {"field": "friend.keyword"}
    }
  }
}

4.8 collapse: field collapsing

# Collapse by group: each group is represented by its oldest member, and inner_hits shows the 2 oldest members of each group
GET /test/_search
{
  "collapse": {
    "field": "group.keyword",
    "inner_hits": {
      "name": "old_age",
      "size": 2,
      "sort": [{"age": "desc"}]
    }
  },
  "sort": [{"age": {"order": "desc"}}]
}

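ES aggregations are far more powerful than these examples show. Besides AVG there are MAX, MIN, SUM, COUNT, STATS and more.

5. Bucket aggregations (Terms Aggregation)

There are too many bucket aggregations to cover in a short article, so this section covers only the terms aggregation (grouping by field value). It is the most common one and resembles GROUP BY in MySQL. Note that the field must not be of type text (unless fielddata is enabled on it); use a keyword field instead.

5.1 Test data

PUT /user/_doc/1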
{
  "id": "1",
  "name": "张一",
  "dept": "web",
  "path": "dept1,man1",
  "birthday": "2008-11-16",
  "status": "1"
}

PUT /user/_doc/2
{
  "id": "2",
  "name": "张二",
  "dept": "web",
  "path": "dept1,man2",
  "birthday": "2008-12-17",
  "status": "1"
}

PUT /user/_doc/3
{
  "id": "3",
  "name": "张三",
  "dept": "web",
  "path": "dept1,man3",
  "birthday": "2009-10-10",
  "status": "1"
}

PUT /user/_doc/4
{
  "id": "4",
  "name": "李四",
  "dept": "java",
  "path": "dept2,man4",
  "birthday": "2012-01-01",
  "status": "1"
}

PUT /user/_doc/5
{
  "id": "5",
  "name": "王五",
  "dept": "java",
  "path": "dept2,man5",
  "birthday": "2012-07-01",
  "status": "0"
}

PUT /user/_doc/6
{
  "id": "6",
  "name": "王六",
  "dept": "data",
  "status": "0",
  "path": "dept3,man6",
  "birthday": "2009-12-12",
  "gender": "man"
}

5.2 Terms Aggregation (grouping by field value)

The terms aggregation groups documents by the value of a field.

5.2.1 count: group by dept and count each department

# SQL
GET _sql?format=txt
{
  "query": "SELECT dept, COUNT(*) num FROM user GROUP BY dept" 
}

# DSL (typed search paths such as /user/_doc/_search are deprecated in 7.x)
GET /user/_search
{
  "size": 0,
  "aggs": {
    "depts": {
      "terms": {
        "field": "dept.keyword"
      }
    }
  }
}

5.2.2 count + order: group by dept, count each department, and sort

# SQL
GET _sql?format=txt
{
  "query": "SELECT dept, COUNT(*) num FROM user GROUP BY dept ORDER BY num DESC"
}
# DSL ("_count" sorts by document count; use "_key" to sort by the bucket key, here dept)
GET /user/_search
{
  "size": 0,
  "aggs": {
    "depts": {
      "terms": {
        "field": "dept.keyword",
        "order": {"_count": "desc"}
      }
    }
  }
}

5.2.3 having: filtering the aggregated result (aggregate first, then filter)

# SQL
GET _sql?format=txt
{
  "query": "SELECT dept, COUNT(*) num FROM user GROUP BY dept HAVING num > 1 ORDER BY num DESC"
}

# Translate the SQL into DSL
POST /_sql/translate
{
  "query": "SELECT dept, COUNT(*) num FROM user GROUP BY dept HAVING num > 1 ORDER BY num DESC"
}
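
A hand-written DSL counterpart of the HAVING clause can use a bucket_selector pipeline aggregation inside the terms aggregation. The sketch below only builds the request body; terms_having_count_gt and the aggregation names depts and having are hypothetical, and the body that /_sql/translate returns will look different.

```python
import json


def terms_having_count_gt(field, min_count, agg_name="depts"):
    """Build a terms aggregation that keeps only buckets with doc_count > min_count."""
    return {
        "size": 0,
        "aggs": {
            agg_name: {
                "terms": {"field": field, "order": {"_count": "desc"}},
                "aggs": {
                    "having": {
                        "bucket_selector": {
                            # expose each bucket's doc_count to the script as params.num
                            "buckets_path": {"num": "_count"},
                            "script": f"params.num > {int(min_count)}",
                        }
                    }
                },
            }
        },
    }


body = terms_having_count_gt("dept.keyword", 1)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of GET /user/_search. With the data from 5.1 it should keep web and java and drop data, which has a single document.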

5.2.4 Filter first, then aggregate (like where)

# SQL
GET _sql?format=txt
{
  "query": "SELECT dept, COUNT(*) num FROM user WHERE status = 1 GROUP BY dept ORDER BY num DESC"
}
# DSL
GET /user/_search
{
  "size": 0,
  "query": {
    "term": {
      "status": 1
    }
  },
  "aggs": {
    "dept_count": {
      "terms": {
        "field": "dept.keyword",
        "order": {
          "_count": "desc"
        }
      }
    }
  }
}

6. Index API

6.1 Overview

The Index API adds a JSON document to an index, or replaces an existing one, so that it becomes searchable. The example below adds a department with id 1 to an index named "dept".

PUT /dept/_doc/1
{
  "name": "部门1",
  "code": "dept1"
}

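The response looks like this:

{
  "_index" : "dept",// name of the index
  "_type" : "_doc",// name of the type
  "_id" : "1",// the ID we specified
  "_version" : 1,// document version; running the PUT again increases it by 1
  "result" : "created",// created means a new document; running the PUT again returns updated
  "_shards" : {
    "total" : 2,// number of shard copies (primary plus replicas) the write should run on; 2 means one primary and one replica
    "successful" : 1,// 1 because only one ES node is running locally: the replica is unassigned and only the primary succeeded
    "failed" : 0// number of shard copies on which the write failed; the replica was never attempted, so 0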
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

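7. Query DSL tutorial

ES provides a JSON-based query language, the Query DSL. It has two kinds of clauses: leaf queries and compound queries.

7.1 Leaf queries

A leaf query looks for a particular value in a particular field, for example match, term or range. Leaf queries can be used on their own. For example: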

GET /test/_search
{
  "query": {
    "term": {
      "age": 18
    }
  }
}

7.2 Compound queries

A compound query wraps leaf queries or other compound queries and combines them logically. For example:

GET /test/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "age": 18
          }
        },
        {
          "match": {
            "name": "lxy"
          }
        }
      ]
    }
  }
}

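7.3 Term-level queries

Term-level queries look for exact values in a field instead of doing full-text matching (as match does). Commonly used ones are term, terms, ids, range, prefix and exists, each covered below.

7.3.1 Looking at the mapping

GET /test/_mapping

No mapping was defined in advance, so ES generated one dynamically. In it, name is a text field with a keyword sub-field; use name.keyword for exact matching.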

7.3.2 term

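(term means "contains this exact term", not "equals")

A term query checks whether a field of the document contains one exact value. It is meant for structured data such as keyword, integer, long, ip or date fields. Avoid it on text fields; use match there.

# Test data
PUT /test/_doc/10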
{
  "name":"刘一",
  "age":18
}
# Find age=18
GET /test/_search
{
  "query": {
    "term": {
      "age": 18
    }
  }
}

# Find name.keyword="刘一" (returns a hit)
GET /test/_search
{
  "query": {
    "term": {
      "name.keyword": "刘一"
    }
  }
}

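Why term-level queries should be avoided on text fields

name is a text field, so ES analyzes it by default. "刘一" is split into the two terms "刘" and "一", and those are what the inverted index stores.

A term query for name="刘一" therefore finds nothing, because the index contains no term "刘一".

# Find name="刘一" (no hits)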
GET /test/_search
{
  "query": {
    "term": {
      "name": "刘一"
    }
  }
}
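
The difference between the analyzed name field and name.keyword can be illustrated with a very rough stand-in for the standard analyzer. toy_analyze is a hypothetical simplification (lowercase, split on whitespace, one token per CJK character), not the real tokenizer.

```python
def toy_analyze(text):
    """Very rough stand-in for the standard analyzer: lowercase, split on
    whitespace, and emit each CJK character as its own token."""
    tokens = []
    for word in text.lower().split():
        buffer = ""
        for ch in word:
            if "\u4e00" <= ch <= "\u9fff":      # CJK unified ideographs
                if buffer:
                    tokens.append(buffer)
                    buffer = ""
                tokens.append(ch)
            else:
                buffer += ch
        if buffer:
            tokens.append(buffer)
    return tokens


name = "刘一"
text_terms = set(toy_analyze(name))   # what the text field indexes
keyword_terms = {name}                # what name.keyword indexes

term_on_text = "刘一" in text_terms            # term query on name
term_on_keyword = "刘一" in keyword_terms      # term query on name.keyword
match_on_text = any(t in text_terms for t in toy_analyze("刘一"))  # match analyzes its input too
```

The term query looks up the literal string "刘一", which exists only in the keyword terms, while match analyzes the query text first and so finds the individual characters.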

7.3.3 terms

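- terms is like term, but checks whether a field **contains one or more of the given values**.
- The values are passed as an array; **a document matches as soon as one of them is found**.
- The array can hold at most 65536 values by default; the limit can be changed with the index.max_terms_count setting.

# Contains play or sleep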
GET /test/_search
{
  "query": {
    "terms": {
      "interest": ["play", "sleep"]
    }
  }
}
# Use must to find documents whose interest is exactly ["eat", "sleep"]
GET /test/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "interest.keyword": "eat"
          }
        },
        {
          "term": {
            "interest.keyword": "sleep"
          }
        },
        {
          "term": {
            "interest_count": 2
          }
        }
      ]
    }
  }
}

7.3.4 terms_set

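terms_set is like terms but adds a minimum number of matches. With 3 values in the array and a minimum of 2, a document matches only if its field contains at least 2 of them, whereas terms matches as soon as any 1 is found.

# Match at least 2 of ["eat", "play", "sleep"]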
GET /test/_search
{
  "query": {
    "terms_set": {
      "interest.keyword": {
        "terms": ["eat","play","sleep"],
        "minimum_should_match_script": {
          "source": "2"
        }
      }
    }
  }
}
# Use terms_set to find interest = ["eat", "sleep"]
GET /test/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "terms_set": {
            "interest.keyword": {
              "terms": ["eat","sleep"],
              "minimum_should_match_script": {
                "source": "2"
              }
            }
          }
        },
        {
          "term": {
            "interest_count": {"value": 2}
          }
        }
      ]
    }
  }
}
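# Use terms_set to find interest = ["eat", "sleep"] without scoring
# As in 3.9, clauses in filter context are not scored, which also helps performance; conditions that need no score are best placed in filter.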
GET /test/_search
{
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "must": [
            {
              "terms_set": {
                "interest.keyword": {
                  "terms": ["eat","sleep"],
                  "minimum_should_match_script": {
                    "source": "2"
                  }
                }
              }
            },
            {
              "term": {
                "interest_count": {"value": 2}
              }
            }
          ]
        }
      }
    }
  }
}
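
The contrast between terms and terms_set comes down to how many of the wanted values must be present. A toy sketch over the interest arrays from 2.1.1 follows; the two function names simply mirror the query names and are not a client API.

```python
def terms(values, wanted):
    """terms: at least one wanted value is present."""
    return len(set(values) & set(wanted)) >= 1


def terms_set(values, wanted, minimum_should_match):
    """terms_set: at least minimum_should_match wanted values are present."""
    return len(set(values) & set(wanted)) >= minimum_should_match


interests = {
    1: ["eat", "coding"],
    2: ["eat", "testing"],
    3: ["eat", "play", "sleep"],
}
wanted = ["eat", "play", "sleep"]
any_hit = [i for i, v in interests.items() if terms(v, wanted)]
two_hits = [i for i, v in interests.items() if terms_set(v, wanted, 2)]
```

All three documents contain eat, so terms matches them all; only id=3 contains at least two of the wanted values.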

7.3.5 ids

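Fetch documents by ID.

# IDs that do not exist are skipped: in the author's index this returned documents 1 and 4, because id=100 did not exist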
GET /test/_search
{
  "query": {
    "ids": {
      "values": ["1","4","100"]
    }
  }
}

7.3.6 exists

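Where MySQL uses is null and is not null, ES uses exists.
- A field is treated as missing when:
  - its value in the source JSON is null or []
  - it is mapped with "index": false
  - its value is longer than the ignore_above set in the mapping
  - its value is malformed and ignore_malformed is set in the mapping
- A field is treated as present in these special cases:
  - an empty string such as "" or "-"
  - an array containing null and another value, such as [null, "foo"]
  - a custom null_value defined in the mapping

# Find documents where interesting has a value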
GET /test/_search
{
  "query": {
    "exists": {
      "field": "interesting"
    }
  }
}
# Find documents where gender has no value
# must_not + exists replaces the missing query of older ES versions
GET /test/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "exists": {
            "field": "gender"
          }
        }
      ]
    }
  }
}

7.3.7 range

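A range query checks whether a field's value lies within a range. It takes these parameters:

Parameter	Meaning
gte	>=
gt	>
lte	<=
lt	<
boost	Query weight, default 1.0

# Find documents with age in [3, 30)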
GET /test/_search
{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "age": {
            "gte": 3,
            "lt": 30
          }
        }
      }
    }
  }
}

7.3.8 prefix

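A prefix query returns documents whose field contains a term starting with the given prefix, like like "xx%" in SQL.
The index_prefixes mapping parameter speeds up prefix queries. When it is enabled, Elasticsearch indexes prefixes of 2 to 5 characters (the defaults) in a separate field, which makes prefix queries cheaper on large indices at the cost of a larger index.

# Find documents whose name starts with "y"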
GET /test/_search
{
  "query": {
    "prefix": {
      "name.keyword": "y"
    }
  }
}

7.3.9 wildcard

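A wildcard query works like LIKE in MySQL. It is slow, especially with a leading wildcard, and is rarely a good choice.

# Find people whose hometown contains "州"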
GET /test/_search
{
  "query": {
    "wildcard": {
      "hometown.keyword": {
        "value": "*州*"
      }
    }
  },
  "_source": ["name","hometown"]
}

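8. Mapping field types (keyword, text, date, numeric)

A mapping is like a table schema in MySQL. ES can create mappings dynamically, but what it generates is not always what we want.

8.1 keyword

keyword is the type for exact values: ES stores the whole value as a single term in the inverted index without analyzing it.
keyword suits structured data such as names, gender, phone numbers, status flags, tags and HTTP codes (404, 200, 500).
A field that is mainly used for exact matching, filtering, sorting or aggregation should be mapped as keyword even if its values look numeric, unless range queries on it are needed.
A single term can be at most 32766 bytes in UTF-8. The ignore_above parameter sets a length limit: strings longer than it are not truncated but left out of the index altogether (they remain in _source).

8.2 text

text is the type for full-text content: map a string as text when it should be analyzed into terms.

ES ships with many analyzers; for Chinese, the ik analysis plugin can be installed.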

# See how a value of the name field is analyzed
GET /test/_analyze
{
  "field": "name",
  "text": "I am a coder"
}
# Result: the phrase is split into 4 terms, and the uppercase "I" is lowercased to "i"
{
  "tokens" : [
    {
      "token" : "i",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "am",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "a",
      "start_offset" : 5,
      "end_offset" : 6,
      "type" : "<ALPHANUM>",
      "position" : 2
    },
    {
      "token" : "coder",
      "start_offset" : 7,
      "end_offset" : 12,
      "type" : "<ALPHANUM>",
      "position" : 3
    }
  ]
}

8.3 Boolean

Meaning	Values ES accepts
True	true, "true"
False	false, "false", "" (empty string)

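8.4 Date

The format of a date field can be configured. Without an explicit setting the default is "strict_date_optional_time||epoch_millis".

This means a value must match either strict_date_optional_time or epoch_millis.

8.4.1 What is epoch_millis?

epoch_millis is the number of milliseconds since the epoch (1970-01-01 00:00:00 UTC), stored as a long.

8.4.2 What is strict_date_optional_time?

strict_date_optional_time is the strict variant of date_optional_time. Strict means that year, month and day must be written with 4, 2 and 2 digits, zero-padded where needed.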
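- Common forms:
  - yyyy
  - yyyy-MM
  - yyyy-MM-dd
  - yyyy-MM-ddTHH:mm:ss
  - yyyy-MM-ddTHH:mm:ss.SSS
  - yyyy-MM-ddTHH:mm:ss.SSSZ
- Digit-only forms without separators (yyyyMM, yyyyMMdd, yyyyMMddHHmmss) are not part of strict_date_optional_time; with the default format a string of digits is read as epoch_millis instead.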

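In everyday work "yyyy-MM-dd HH:mm:ss" is very common, but the default format does not accept it: a T is required between the date and the time. The uppercase "Z" in the last form above stands for the time zone.

Indexing a value in "yyyy-MM-dd HH:mm:ss" form into a date field with the default format is rejected.

8.4.3 How to accept yyyy-MM-dd HH:mm:ss anyway

The date type has a format parameter for custom date formats.
If format is set to "format A||format B||format C", an incoming value is tried against the formats from left to right until one matches.

# Create the index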
PUT /test_date_index
{
  "mappings": {
    "properties": {
      "birthday": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
      }
    }
  }
}

# Millisecond timestamp of 2020/03/01 17:44:09 (UTC+8)
PUT /test_date_index/_doc/1
{
  "birthday": 1583055849000
}

PUT /test_date_index/_doc/2
{
  "birthday": "2020-03-01 16:29:41"
}

PUT /test_date_index/_doc/3
{
  "birthday": "2020-02-29"
}
# All 3 requests above succeed
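
The left-to-right fallback across formats can be mimicked in Python. This is a toy model under two assumptions: the Java patterns are hand-translated into strptime patterns, and every value is interpreted as UTC. parse_date is a hypothetical name.

```python
from datetime import datetime, timezone


def parse_date(value):
    """Try 'yyyy-MM-dd HH:mm:ss', then 'yyyy-MM-dd', then epoch_millis (toy model, UTC)."""
    text = str(value)
    for pattern in ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d"):
        try:
            return datetime.strptime(text, pattern).replace(tzinfo=timezone.utc)
        except ValueError:
            continue   # fall through to the next format, left to right
    return datetime.fromtimestamp(int(text) / 1000, tz=timezone.utc)


from_millis = parse_date(1583055849000)
from_datetime = parse_date("2020-03-01 16:29:41")
from_date = parse_date("2020-02-29")
```

The millisecond value corresponds to 2020-03-01 09:44:09 in UTC, which is 17:44:09 in UTC+8.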

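8.5 Numeric

To save space and improve performance, pick the smallest type that fits the data. Regional population figures, for example, fit in integer; long is unnecessary.

Type	Description
byte	8-bit, -128 ~ 127
short	16-bit, -32768 ~ 32767
integer	32-bit, -2^31 ~ 2^31-1
long	64-bit, -2^63 ~ 2^63-1
float	Single-precision 32-bit IEEE 754 floating point
double	Double-precision 64-bit IEEE 754 floating point
half_float	Half-precision 16-bit IEEE 754 floating point
scaled_float	Floating point stored as a long with a fixed scaling factor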

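8.6 binary

ES accepts a binary value as a Base64-encoded string. By default a binary field is not indexed and cannot be searched: the value is just a long opaque string, so analyzing it would be pointless, and it is kept exactly as supplied.

binary can be used to store images, but that is rare, and ES is not a good place to store images.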

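8.7 Range types

8.7.1 Overview

A range field holds a range of values rather than a single value.
For example, if a range field holds [10, 20], a query for 12 on that field matches the document.
The range types are:

Type	Description
integer_range	[-2^31, 2^31-1]
long_range	[-2^63, 2^63-1]
float_range	Range of single-precision IEEE 754 floating point values
double_range	Range of double-precision IEEE 754 floating point values
date_range	Range of dates, stored internally as 64-bit long milliseconds
ip_range	Range of IP addresses, IPv4 or IPv6

Open and closed bounds
- A range may include or exclude each bound, and may be unbounded on one side.
- Use gt, gte, lt and lte to say whether the lower and upper bounds are open or closed.

Examples:

Range	Notation	Expression
Closed on both sides	[10, 20]	gte: 10, lte: 20
Closed below, open above	[10, 20)	gte: 10, lt: 20
Closed below, unbounded above	[10, +∞)	gte: 10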

8.7.2 Test data

# Create a new index
PUT test_range
{
  "mappings": {
    "properties": {
      "range_of_integer": {
        "type": "integer_range"
      },
      "range_of_date": {
        "type": "date_range", 
        "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
      }
    }
  }
}
# Add test documents
PUT test_range/_doc/1
{
  "range_of_integer": {
    "gte": 10,
    "lte": 20
  },
  "range_of_date": {
    "gte": "2021-05-01 10:20:00",
    "lte": "2021-05-02"
  }
}
PUT test_range/_doc/2
{
  "range_of_integer": {
    "gt": 10,
    "lt": 20
  },
  "range_of_date": {
    "gt": "2021-05-01 10:20:00",
    "lt": "2021-05-02"
  }
}
PUT test_range/_doc/3
{
  "range_of_integer": {
    "gt": 20
  },
  "range_of_date": {
    "gt": "2021-05-01 10:20:00"
  }
}

8.7.3 Matching with a term query

# Find documents whose range contains 10
GET test_range/_search
{
  "query": {
    "term": {
      "range_of_integer": {
        "value": 10
      }
    }
  }
}

# Find documents whose date range contains "2021-05-01 10:20:00"
GET test_range/_search
{
  "query": {
    "term": {
      "range_of_date": {
        "value": "2021-05-01 10:20:00"
      }
    }
  }
}

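8.7.4 Matching with a range query

On range fields, a range query takes a relation parameter with 3 possible values: INTERSECTS, CONTAINS and WITHIN.

Value	Meaning
INTERSECTS	Default: the document's range overlaps the query range
CONTAINS	The document's range fully contains the query range
WITHIN	The document's range lies fully within the query range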

GET test_range/_search
{
  "query": {
    "range": {
      "range_of_integer": {
        "gte": 18,
        "lte": 20,
        "relation" : "INTERSECTS"
      }
    }
  }
}

GET test_range/_search
{
  "query": {
    "range": {
      "range_of_integer": {
        "gte": 18,
        "lte": 20,
        "relation" : "CONTAINS"
      }
    }
  }
}

GET test_range/_search
{
  "query": {
    "range": {
      "range_of_integer": {
        "gte": 18,
        "lte": 20,
        "relation" : "WITHIN"
      }
    }
  }
}
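
The three relation values reduce to simple interval comparisons. The sketch below is a toy model that rewrites the integer ranges from 8.7.2 as closed intervals (gt 10 becomes 11 because the values are integers); matches is a hypothetical helper.

```python
import math


def matches(doc_range, query_range, relation="INTERSECTS"):
    """Compare two closed intervals given as (low, high) tuples."""
    d_lo, d_hi = doc_range
    q_lo, q_hi = query_range
    if relation == "INTERSECTS":
        return d_lo <= q_hi and q_lo <= d_hi
    if relation == "CONTAINS":          # document range contains the query range
        return d_lo <= q_lo and q_hi <= d_hi
    if relation == "WITHIN":            # document range lies inside the query range
        return q_lo <= d_lo and d_hi <= q_hi
    raise ValueError(f"unknown relation: {relation}")


# The integer ranges from section 8.7.2, rewritten as closed intervals
docs = {
    1: (10, 20),         # gte 10, lte 20
    2: (11, 19),         # gt 10, lt 20
    3: (21, math.inf),   # gt 20, no upper bound
}
query = (18, 20)
result = {
    rel: [i for i, r in docs.items() if matches(r, query, rel)]
    for rel in ("INTERSECTS", "CONTAINS", "WITHIN")
}
```

Under this model the query [18, 20] intersects documents 1 and 2, is contained only by document 1, and no document range lies within it.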

8.8 object

Elasticsearch lets a field hold a JSON object; the type of such a field is object.

8.9 nested

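9. reindex: rebuilding an index (copying data)

reindex does not copy the source index's configuration, so set up the settings and mapping of the destination index first, then run reindex.

9.1 A simple reindex

# source names the source index, dest the destination index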
POST _reindex
{
  "source": {
    "index": "test"
  },
  "dest": {
    "index": "test2"
  }
}

9.2 Creating only the documents missing from the destination

POST _reindex
{
  "source": {
    "index": "test"
  },
  "dest": {
    "index": "test2",
    "op_type": "create"
  }
}

9.3 Setting the batch size

# reindex uses scroll under the hood; the default batch is 1000 documents and can be raised
POST _reindex
{
  "source": {
    "index": "test",
    "size": 5000
  },
  "dest": {
    "index": "test2"
  }
}

9.4 Continuing on conflicts

POST _reindex
{
  "conflicts": "proceed", 
  "source": {
    "index": "test"
  },
  "dest": {
    "index": "test2",
    "op_type": "create"
  }
}

9.5 Reindexing only matching documents

POST _reindex
{
  "source": {
    "index": "test",
    "query": {
      "term": {
        "name.keyword": {
          "value": "lxy"
        }
      }
    }
  },
  "dest": {
    "index": "test2"
  }
}

9.6 Copying only some fields of the source index

POST _reindex
{
  "source": {
    "index": "test",
    "_source": ["name", "age"]
  },
  "dest": {
    "index": "test2"
  }
}

9.7 Excluding fields from the copy

POST _reindex
{
  "source": {
    "index": "test",
    "_source": {
      "excludes": ["name"]
    }
  },
  "dest": {
    "index": "test2"
  }
}

9.8 Transforming documents with a script

POST _reindex
{
  "source": {
    "index": "test"
  },
  "dest": {
    "index": "test2"
  },
  "script": {
    "source": "ctx._source.age += 2",
    "lang": "painless"
  }
}

9.9 Renaming a field

# Again with a script: rename the name field to newName
POST _reindex
{
  "source": {
    "index": "test"
  },
  "dest": {
    "index": "test2"
  },
  "script": {
    "source": "ctx._source.newName = ctx._source.remove(\"name\")",
    "lang": "painless"
  }
}
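
The rename script relies on remove returning the removed value. The same idiom in Python, as a hypothetical helper operating on a plain dict rather than on ctx._source:

```python
def rename_field(source, old, new):
    """Move source[old] to source[new], mirroring
    ctx._source.newName = ctx._source.remove("name")."""
    renamed = dict(source)                 # leave the caller's dict untouched
    renamed[new] = renamed.pop(old, None)  # a missing field yields None
    return renamed


doc = {"name": "lxy", "age": 18}
migrated = rename_field(doc, "name", "newName")
```

Because the value is moved in one step, the old key disappears and the new key carries its value, which is what the reindex script does for every copied document.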