Elasticsearch: Counting the Tokens Produced by Analysis, with Examples

Date: 2023-02-03  Elasticsearch

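When a text field is analyzed, the analyzer breaks its value into individual tokens. If analyzers are still unfamiliar to you, see my earlier article “Elasticsearch: analyzer”. Elasticsearch also provides a field type called token_count. A field of this type is a token counter: it analyzes the string at index time and stores the number of tokens produced as an integer, so we can later query on how many tokens a field value contains.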
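
To make the idea of "number of tokens" concrete, here is a minimal Python sketch that imitates the count outside Elasticsearch. It is only an approximation: the helper names `approx_standard_tokens` and `approx_token_count` are hypothetical, the sample strings are my own, and the regular expression merely stands in for the standard analyzer, which follows Unicode text segmentation rules and may differ on more complex input.

```python
import re


def approx_standard_tokens(text: str) -> list[str]:
    """Rough stand-in for the `standard` analyzer on plain Latin-script text.

    Assumption: runs of word characters approximate the analyzer's
    word-boundary rules. The analyzer also lowercases each token.
    """
    return [token.lower() for token in re.findall(r"\w+", text)]


def approx_token_count(text: str) -> int:
    # The count is the number of tokens, not the number of distinct tokens.
    return len(approx_standard_tokens(text))


print(approx_standard_tokens("Hello, token count!"))  # ['hello', 'token', 'count']
print(approx_token_count("to be or not to be"))       # 6
```

Repeated words are counted each time they occur, and the standard analyzer removes no stop words by default, which is why the second string counts as 6.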

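Let's demonstrate with a simple example. We will index a few book titles and then filter for the books whose title contains exactly 2 tokens. First, we create an index in which the book_name field has a sub-field named size of type token_count, analyzed with the standard analyzer: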

```
PUT book_token_count_test
{
  "mappings": {
    "properties": {
      "book_name": {
        "type": "text",
        "fields": {
          "size": {
            "type": "token_count",
            "analyzer": "standard"
          }
        }
      }
    }
  }
}
```

Indexing the documents

We index a few documents with the following bulk request:

    POST book_token_count_test/_bulk
    {"index":{}}
    { "book_name": "Ulysses" }
    {"index":{}}
    { "book_name": "Don Quixote" }
    {"index":{}}
    { "book_name": "One Hundred Years of Solitude" }
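
As a sanity check on what the book_name.size sub-field should hold for each of these documents, the counts can be approximated locally. This is a sketch, not what Elasticsearch runs internally: `approx_token_count` is a hypothetical helper, and it assumes that for plain English titles like these, counting runs of word characters gives the same number as the standard analyzer.

```python
import re

# Titles taken from the bulk request above.
titles = ["Ulysses", "Don Quixote", "One Hundred Years of Solitude"]


def approx_token_count(text: str) -> int:
    # Assumption: runs of word characters approximate `standard` analyzer tokens.
    return len(re.findall(r"\w+", text))


size_by_title = {title: approx_token_count(title) for title in titles}

for title, size in size_by_title.items():
    print(f"{size}  {title}")
# 1  Ulysses
# 2  Don Quixote
# 5  One Hundred Years of Solitude
```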

Searching by token count

The following query finds the documents whose token count is 2:

    GET book_token_count_test/_search
    {
      "query": {
        "term": {
          "book_name.size": {
            "value": "2"
          }
        }
      }
    }

The search returns:

```
{
  "took": 273,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "book_token_count_test",
        "_id": "cxczBoYB6OPboMnB7TQu",
        "_score": 1,
        "_source": {
          "book_name": "Don Quixote"
        }
      }
    ]
  }
}
```

We can also use a range query to retrieve the documents whose book_name contains 3 or more tokens. Only the document titled “One Hundred Years of Solitude” is returned.

    GET book_token_count_test/_search
    {
      "query": {
        "range": {
          "book_name.size": {
            "gte": 3
          }
        }
      }
    }

The search returns:

```
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "book_token_count_test",
        "_id": "dBczBoYB6OPboMnB7TQu",
        "_score": 1,
        "_source": {
          "book_name": "One Hundred Years of Solitude"
        }
      }
    ]
  }
}
```
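
Because the token count is indexed as an ordinary integer, the term and range queries behave like an equality test and a comparison on that number. The sketch below mimics both filters in plain Python. The helpers `term_match` and `range_match` are hypothetical and only model the matching logic, not scoring or anything else Elasticsearch does; the counts 1, 2 and 5 are the token counts of the three titles indexed in this article.

```python
from typing import Optional

# Token counts of the three indexed titles.
sizes = {
    "Ulysses": 1,
    "Don Quixote": 2,
    "One Hundred Years of Solitude": 5,
}


def term_match(sizes: dict[str, int], value: int) -> list[str]:
    """Titles whose token count equals `value`, like the term query."""
    return [title for title, size in sizes.items() if size == value]


def range_match(
    sizes: dict[str, int],
    gte: Optional[int] = None,
    lte: Optional[int] = None,
) -> list[str]:
    """Titles whose token count lies within the optional inclusive bounds."""
    return [
        title
        for title, size in sizes.items()
        if (gte is None or size >= gte) and (lte is None or size <= lte)
    ]


print(term_match(sizes, 2))       # ['Don Quixote']
print(range_match(sizes, gte=3))  # ['One Hundred Years of Solitude']
```

The two printed lists correspond to the hits of the term query and the range query shown earlier.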
