Elasticsearch (Part 1)

Published 2024-12-25, updated 2025-02-20. Word count: 12k. Reading time ≈ 11 minutes

Study notes from the Heima (黑马) course on the Elasticsearch search engine

Source: day08-Elasticsearch - Feishu Docs (feishu.cn)

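I. Introduction

Elasticsearch is a search engine developed by Elastic, and it is one part of the Elastic Stack.

Elasticsearch: stores, computes on, and searches data
Logstash/Beats: collect data
Kibana: visualizes data

The core of the stack is Elasticsearch, which handles storage, search, and computation, so it is also the focus of these notes.

Kibana provides a developer console (DevTools) that offers syntax hints for Elasticsearch's RESTful API.

So we need to install both Elasticsearch and Kibana.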

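1. Install Elasticsearch

# discovery.type=single-node: run as a single node
# --privileged: grant the container extended system privileges
# --network hm-net: the Docker network hm-net must already exist
docker run -d \
  --name es \
  -e "ES_JAVA_OPTS=-Xms512m -Xmx512m" \
  -e "discovery.type=single-node" \
  -v es-data:/usr/share/elasticsearch/data \
  -v es-plugins:/usr/share/elasticsearch/plugins \
  --privileged \
  --network hm-net \
  -p 9200:9200 \
  -p 9300:9300 \
  elasticsearch:7.12.1

Once the container is running, open the console to check Elasticsearch's basic information, which is served on port 9200.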

2. Install Kibana

# ELASTICSEARCH_HOSTS: address of the Elasticsearch node
docker run -d \
  --name kibana \
  -e ELASTICSEARCH_HOSTS=http://es:9200 \
  --network=hm-net \
  -p 5601:5601 \
  kibana:7.12.1

In Kibana's DevTools, run GET / to see the same basic information about Elasticsearch.

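3. Inverted index

Elasticsearch owes its fast search performance to the inverted index underneath it.

Two concepts are central to an inverted index:

Document: a unit of data to be searched. Each record is one document, for example a web page or a product record.
Term: a meaningful word produced by running a tokenization algorithm over document data or over the user's search input. For example, 我是中国人 ("I am Chinese") can be split into the terms 我, 是, 中国人, 中国, and 国人.

Building an inverted index:

Split each document into terms with a tokenization algorithm.
Create a table in which each row holds a term, the ids of the documents containing that term, positions, and similar information.
Because terms are unique, an index can be created on the term column so that looking up a term is fast.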

Term (indexed)	Document ids
Xiaomi	1, 3, 4
phone	1, 2
Huawei	2, 3
charger	3
wristband	4
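
The build steps above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: tokenization is a plain whitespace split standing in for a real analyzer, the four product titles are made up so that the result matches the term table above, and the class and method names are hypothetical. It is not how Elasticsearch stores its index internally.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class InvertedIndexSketch {

    // Builds term -> list of ids of the documents containing that term.
    // Assumption: a whitespace split stands in for a real analyzer.
    static Map<String, List<Integer>> build(Map<Integer, String> docs) {
        Map<String, List<Integer>> index = new TreeMap<>();
        for (Map.Entry<Integer, String> doc : docs.entrySet()) {
            for (String term : doc.getValue().trim().toLowerCase().split("\\s+")) {
                List<Integer> ids = index.computeIfAbsent(term, t -> new ArrayList<>());
                if (!ids.contains(doc.getKey())) {
                    ids.add(doc.getKey());
                }
            }
        }
        return index;
    }

    // Tokenizes the query the same way, looks up each term,
    // and keeps only the ids that appear for every term.
    static List<Integer> search(Map<String, List<Integer>> index, String query) {
        List<Integer> result = null;
        for (String term : query.trim().toLowerCase().split("\\s+")) {
            List<Integer> ids = index.getOrDefault(term, List.of());
            if (result == null) {
                result = new ArrayList<>(ids);
            } else {
                result.retainAll(ids);
            }
        }
        return result == null ? List.of() : result;
    }

    // Hypothetical sample data chosen to reproduce the term table.
    static Map<Integer, String> sampleDocs() {
        Map<Integer, String> docs = new LinkedHashMap<>();
        docs.put(1, "Xiaomi phone");
        docs.put(2, "Huawei phone");
        docs.put(3, "Huawei Xiaomi charger");
        docs.put(4, "Xiaomi wristband");
        return docs;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> index = build(sampleDocs());
        System.out.println(index);
        System.out.println(search(index, "Xiaomi phone"));
    }
}
```

Searching for "Xiaomi phone" looks up two terms and intersects their id lists, which leaves only document 1. The search never scans document text, which is why term-based lookups stay fast as the data grows.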

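Forward index

Advantages:

Indexes can be created on multiple fields.
Searching on an indexed field is very fast.

Disadvantages:

Searching on a non-indexed field is very slow, because it falls back to scanning every record.

Inverted index

Advantages:

Lookups go through terms, so fuzzy (keyword-style) search is very fast.

Disadvantages:

Indexes can only be built on terms, not on fields.
Results cannot be sorted by field.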

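4. Basic concepts

Document: what would be one row of a database table is one document in Elasticsearch.

Field: a column of the table corresponds to a field of the document.

Index: a collection of documents of the same type is called an index.

Mapping: the constraints on the fields of the documents in an index, similar to a table's schema.

MySQL vs. Elasticsearch

MySQL	Elasticsearch	Description
Table	Index	An index is a collection of documents, similar to a database table.
Row	Document	A document is a single record, similar to a database row. Documents are in JSON format.
Column	Field	A field is a field of the JSON document, similar to a database column.
Schema	Mapping	A mapping constrains the documents in an index, for example their field types. It is similar to a table schema.
SQL	DSL	DSL is the JSON-style request language Elasticsearch provides for operating on it, covering CRUD.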

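5. IK analyzer

Tokenizing document content requires a good tokenization algorithm. The IK analyzer is an efficient, accurate Chinese tokenizer. Tokenization happens at two points:

When the inverted index is built, documents are tokenized.
When a user searches, the input is tokenized.

Online installation:

docker exec -it es ./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.12.1/elasticsearch-analysis-ik-7.12.1.zip

Offline installation:

The Elasticsearch plugin directory was mounted earlier at /var/lib/docker/volumes/es-plugins/_data. Upload the downloaded IK analyzer to that directory.

Either way, restart ES after installing.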

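1. Usage

The built-in analyzers handle Chinese poorly, so let's try the IK analyzer. It offers two modes:

ik_smart: smart segmentation, coarse-grained
ik_max_word: finest segmentation, fine-grained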

POST /_analyze
{
  "analyzer": "ik_smart",
  "text": "黑马程序员学习java太棒了"
}

The result is good. The output is omitted here.

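2. Extension dictionaries

Even the best tokenization algorithm cannot keep up with newly coined internet phrases. The name 蔡徐坤, for example, is not recognized and gets split into three tokens.

The IK analyzer supports extending its dictionary:

1. Open the plugin directory on the virtual machine.

2. Open IKAnalyzer.cfg.xml and configure the analyzer's dictionary.

3. Add the new words to the dictionary file.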

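II. Index operations

Common mapping properties

type: the field's data type. Common simple types:
- String: text (tokenized text), keyword (exact values such as brand, country, IP address)
- Numeric: long, integer, short, byte, double, float
- Boolean: boolean
- Date: date
- Object: object
index: whether to index the field, default true
analyzer: which analyzer to use
properties: the sub-fields of this field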

1. Create an index and its mapping

PUT /sy
{
  "mappings": { 
    "properties": {
      "info":{ 
        "type": "text", 
        "analyzer": "ik_smart"
      },
      "email":{
        "type": "keyword",
        "index": false
      },
      "name":{
        "properties": {
          "firstName": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

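2. Get the index

GET /sy

This returns information about the index, such as its mappings and settings.

3. Modify the index

Rebuilding an inverted index is expensive, so once an index is created the mapping of its existing fields cannot be changed. New fields can still be added to the mapping: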

PUT /sy/_mapping
{
  "properties": {
    "age":{
      "type": "integer"
    }
  }
}

4. Delete the index

DELETE /sy

III. Document operations

1. Add a document

Add a document to the index:

POST /sy/_doc/1
{
    "info": "黑马程序员Java讲师",
    "email": "zy@itcast.cn",
    "name": {
        "firstName": "云",
        "lastName": "赵"
    }
}

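This adds a document with id 1 to the sy index. The document's content must conform to the mapping.

2. Get a document

GET /sy/_doc/1

3. Delete a document

DELETE /sy/_doc/1

4. Update a document

1. Full replacement: the old document with that id is deleted and the new one is added. If the id does not exist yet, the document is simply created.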

PUT /sy/_doc/1
{
    "info": "黑马程序员高级Java讲师",
    "email": "zy@itcast.cn",
    "name": {
        "firstName": "云",
        "lastName": "赵"
    }
}

2. Partial update

POST /sy/_update/1
{
  "doc": {
    "email": "ZhaoYun@itcast.cn"
  }
}

Only the email field is changed.
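
The difference between the two update styles can be modelled with plain maps. The sketch below is a conceptual model only, not Elasticsearch code: the class and method names are hypothetical, and the stored document is represented as a `Map` in place of the JSON source.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class UpdateSemanticsSketch {

    // Models PUT /index/_doc/id: the stored source is replaced as a whole,
    // so any field missing from the new body is gone afterwards.
    static Map<String, Object> fullReplace(Map<String, Object> stored, Map<String, Object> body) {
        return new LinkedHashMap<>(body);
    }

    // Models POST /index/_update/id with a "doc" body: fields in the partial
    // document overwrite the stored ones, nested objects are merged, and
    // everything else is kept.
    @SuppressWarnings("unchecked")
    static Map<String, Object> partialUpdate(Map<String, Object> stored, Map<String, Object> partialDoc) {
        Map<String, Object> merged = new LinkedHashMap<>(stored);
        for (Map.Entry<String, Object> e : partialDoc.entrySet()) {
            Object old = merged.get(e.getKey());
            if (old instanceof Map && e.getValue() instanceof Map) {
                merged.put(e.getKey(), partialUpdate(
                        (Map<String, Object>) old, (Map<String, Object>) e.getValue()));
            } else {
                merged.put(e.getKey(), e.getValue());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = Map.of("info", "Java lecturer", "email", "zy@itcast.cn");
        Map<String, Object> change = Map.of("email", "ZhaoYun@itcast.cn");
        System.out.println(fullReplace(stored, change).containsKey("info"));
        System.out.println(partialUpdate(stored, change).containsKey("info"));
    }
}
```

With the same one-field body, the full replacement loses the info field while the partial update keeps it. This is the practical reason to prefer _update when only a few fields change.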

5. Bulk operations

POST _bulk
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_id" : "2" } }
{ "create" : { "_index" : "test", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }

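_index: the name of the index

_id: the id of the document to operate on

index is an add operation
- { "field1" : "value1" }: the content of the document to add
create is also an add operation, but it fails if the id already exists
delete is a delete operation
update is an update operation
- { "doc" : {"field2" : "value2"} }: the fields to update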
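
The line-oriented format described above can be assembled with a small helper. This is a minimal sketch, assuming index names and ids contain no characters that need JSON escaping. `BulkBodySketch` and its methods are hypothetical names, not part of any Elasticsearch client.

```java
final class BulkBodySketch {

    private final StringBuilder body = new StringBuilder();

    // Writes the action line shared by all operations, e.g.
    // {"index":{"_index":"test","_id":"1"}}
    private void action(String op, String index, String id) {
        body.append("{\"").append(op).append("\":{\"_index\":\"").append(index)
            .append("\",\"_id\":\"").append(id).append("\"}}\n");
    }

    // index: action line followed by the document source.
    BulkBodySketch index(String index, String id, String sourceJson) {
        action("index", index, id);
        body.append(sourceJson).append('\n');
        return this;
    }

    // create: same shape as index.
    BulkBodySketch create(String index, String id, String sourceJson) {
        action("create", index, id);
        body.append(sourceJson).append('\n');
        return this;
    }

    // update: action line followed by the partial document wrapped in "doc".
    BulkBodySketch update(String index, String id, String partialDocJson) {
        action("update", index, id);
        body.append("{\"doc\":").append(partialDocJson).append("}\n");
        return this;
    }

    // delete: action line only, no second line.
    BulkBodySketch delete(String index, String id) {
        action("delete", index, id);
        return this;
    }

    // Every line, including the last, ends with a newline.
    String build() {
        return body.toString();
    }

    public static void main(String[] args) {
        String ndjson = new BulkBodySketch()
                .index("test", "1", "{\"field1\":\"value1\"}")
                .delete("test", "2")
                .create("test", "3", "{\"field1\":\"value3\"}")
                .update("test", "1", "{\"field2\":\"value2\"}")
                .build();
        System.out.print(ndjson);
    }
}
```

The main method rebuilds the same four operations as the request above. Note that delete contributes one line while the other operations contribute two, and that the body must end with a newline for the bulk endpoint to accept it.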

Bulk add:

POST /_bulk
{"index": {"_index":"heima", "_id": "3"}}
{"info": "黑马程序员C++讲师", "email": "ww@itcast.cn", "name":{"firstName": "五", "lastName":"王"}}
{"index": {"_index":"heima", "_id": "4"}}
{"info": "黑马程序员前端讲师", "email": "zhangsan@itcast.cn", "name":{"firstName": "三", "lastName":"张"}}

Bulk delete:

POST /_bulk
{"delete":{"_index":"heima", "_id": "3"}}
{"delete":{"_index":"heima", "_id": "4"}}

IV. RestAPI

1. Operating ES from a client

Elastic provides official clients for many languages. In essence they all assemble DSL statements and send them to ES over HTTP. https://www.elastic.co/guide/en/elasticsearch/client/index.html
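
To make "assemble DSL and send it over HTTP" concrete, here is a minimal sketch that builds such a request with the JDK's own `java.net.http` API (Java 11+) instead of an Elasticsearch client. The class name, the helper method, and the `http://localhost:9200` address are assumptions for illustration. The sketch only builds the request and does not send it.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class RawHttpSketch {

    // Builds the HTTP equivalent of: PUT /{index}/_doc/{id} with a JSON body.
    static HttpRequest indexDocRequest(String baseUrl, String index, String id, String json) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/_doc/" + id))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) {
        // Placeholder address; point it at a real node before sending.
        HttpRequest request = indexDocRequest(
                "http://localhost:9200", "sy", "1", "{\"email\":\"zy@itcast.cn\"}");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Sending it would take one more call, `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`, against a reachable node. The official clients add typed request objects, response parsing, and connection handling on top of this.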

ES can also be operated through a UI.

2. Operating ES from Java code

<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
</dependency>

The Spring Boot version used here manages Elasticsearch 7.17.10 by default, so the default ES version has to be overridden:

<properties>
    <maven.compiler.source>11</maven.compiler.source>
    <maven.compiler.target>11</maven.compiler.target>
    <elasticsearch.version>7.12.1</elasticsearch.version>
</properties>

Initialize RestHighLevelClient:

RestHighLevelClient client = new RestHighLevelClient(RestClient.builder(
        HttpHost.create("http://192.168.150.101:9200")
));

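Create a test class:

import java.io.IOException;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;

public class IndexTest {

    private RestHighLevelClient client;

    @BeforeEach // runs before each test
    void setUp() { // initialize RestHighLevelClient
        this.client = new RestHighLevelClient(RestClient.builder(
                HttpHost.create("http://192.168.150.101:9200")
        ));
    }

    @Test
    void testConnect() {
        System.out.println(client);
    }

    @AfterEach // runs after each test
    void tearDown() throws IOException {
        this.client.close(); // close the connection
    }
}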

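3. Create an index

Don't blindly turn every MySQL column into a mapping field. First work out which fields actually need to be searched.

Create-index API

Create the Request object
- This operation creates an index, so the request is a CreateIndexRequest.
Add the request parameters
- These are simply the mapping in JSON form. Because the JSON string is long, it is defined as a static string constant, MAPPING_TEMPLATE, to keep the code tidy.
Send the request

client.indices() returns an IndicesClient, which wraps all index-related methods, for example creating an index, deleting an index, and checking whether an index exists.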

@Test
void testCreateIndex() throws IOException {
    // 1. Create the Request object
    CreateIndexRequest request = new CreateIndexRequest("items");
    // 2. Prepare the request parameters
    request.source(MAPPING_TEMPLATE, XContentType.JSON);
    // 3. Send the request
    client.indices().create(request, RequestOptions.DEFAULT);
}

private static final  String MAPPING_TEMPLATE = "{\n" +
        "  \"mappings\": {\n" +
        "    \"properties\": {\n" +
        "      \"id\":{\n" +
        "        \"type\": \"keyword\"\n" +
        "      },\n" +
        "      \"name\":{\n" +
        "        \"type\": \"text\"\n" +
        "        , \"analyzer\": \"ik_smart\"\n" +
        "      },\n" +
        "      \"price\":{\n" +
        "        \"type\":\"integer\"\n" +
        "      },\n" +
        "      \"image\":{\n" +
        "        \"type\": \"keyword\"\n" +
        "        ,\"index\": false\n" +
        "      },\n" +
        "      \"category\":{\n" +
        "        \"type\": \"keyword\"\n" +
        "      },\n" +
        "      \"brand\":{\n" +
        "        \"type\": \"keyword\"\n" +
        "      },\n" +
        "      \"sold\":{\n" +
        "        \"type\": \"integer\"\n" +
        "      },\n" +
        "      \"commentCount\":{\n" +
        "        \"type\":\"integer\"\n" +
        "        , \"index\": false\n" +
        "        \n" +
        "      },\n" +
        "      \"isAD\":{\n" +
        "        \"type\": \"boolean\"\n" +
        "      },\n" +
        "      \"updateTime\":{\n" +
        "        \"type\":\"date\"\n" +
        "      }\n" +
        "    }\n" +
        "  }\n" +
        "}";

4. Delete an index

@Test
void testDeleteIndex() throws IOException {
    // 1. Create the Request object
    DeleteIndexRequest request = new DeleteIndexRequest("items");
    // 2. Send the request
    client.indices().delete(request, RequestOptions.DEFAULT);
}

5. Check whether an index exists

@Test
void testExistsIndex() throws IOException {
    // 1. Create the Request object
    GetIndexRequest request = new GetIndexRequest("items");
    // 2. Send the request
    boolean exists = client.indices().exists(request, RequestOptions.DEFAULT);
    // 3. Print the result
    System.err.println(exists ? "The index already exists!" : "The index does not exist!");
}

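Basic steps for index operations:

Initialize RestHighLevelClient.
Create an XxxIndexRequest, where Xxx is Create, Get, or Delete.
Prepare the request parameters (needed for Create; the others take none, so this step can be skipped).
Send the request by calling RestHighLevelClient#indices().xxx(), where xxx is create, exists, or delete.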

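V. RestClient

1. Add a document

First create an entity class that matches the document (doc).
Query the database to get the database entity, then convert it to the doc entity.
Convert the doc entity to JSON and index it.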

@Test
void testIndexDoc() throws IOException {
    // 0. Prepare the document data
    // 0.1 Query the database by id
    Item item = itemService.getById(1533902L);
    // 0.2 Convert the database entity to the document entity
    ItemDoc itemDoc = BeanUtil.copyProperties(item, ItemDoc.class);

    // 1. Prepare the Request
    IndexRequest request = new IndexRequest("items").id(itemDoc.getId());
    // 2. Prepare the request parameters
    request.source(JSONUtil.toJsonStr(itemDoc), XContentType.JSON);
    // 3. Send the request
    client.index(request, RequestOptions.DEFAULT);
}

2. Get a document

@Test
void testGetDocumentById() throws IOException {
    // 1. Prepare the Request object
    GetRequest request = new GetRequest("items").id("100002644680");
    // 2. Send the request
    GetResponse response = client.get(request, RequestOptions.DEFAULT);
    // 3. Read the source from the response
    String json = response.getSourceAsString();
    // 4. Convert the JSON back into the document entity
    ItemDoc itemDoc = JSONUtil.toBean(json, ItemDoc.class);
    System.out.println("itemDoc= " + itemDoc);
}

3. Delete a document

@Test
void testDeleteDoc() throws IOException {
    // 1. Prepare the Request
    DeleteRequest request = new DeleteRequest("items", "1533902");
    // 2. Send the request
    client.delete(request, RequestOptions.DEFAULT);
}

4. Update a document

@Test
void testUpdateDoc() throws IOException {
    // 1. Prepare the Request
    UpdateRequest request = new UpdateRequest("items", "1533902");
    // 2. Prepare the request parameters: field name and new value
    request.doc(
            "price", 25600
    );
    // 3. Send the request
    client.update(request, RequestOptions.DEFAULT);
}

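5. Bulk import

Create the Request, a BulkRequest.
Prepare the request parameters.
Send the request with client.bulk().

A BulkRequest has no parameters of its own. It bundles several ordinary CRUD requests and sends them together, which is why BulkRequest has an add method for adding document CRUD requests.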

@Test
void testBulkDoc() throws IOException {
    // Page through the database, 500 rows per page
    int pageNo = 1, pageSize = 500;
    while (true) {
        // 1. Prepare the document data
        Page<Item> page = itemService.lambdaQuery()
                .eq(Item::getStatus, 1)  // only items that are enabled
                .page(Page.of(pageNo, pageSize)); // pagination

        List<Item> records = page.getRecords();
        if (records == null || records.isEmpty()) {
            // No more data: stop
            return;
        }
        // 2. Prepare the Request
        BulkRequest request = new BulkRequest();
        // 3. Prepare the request parameters: one IndexRequest per database row
        for (Item item : records) {
            ItemDoc itemDoc = BeanUtil.copyProperties(item, ItemDoc.class);

            request.add(new IndexRequest("items")
                    .id(item.getId().toString())
                    .source(JSONUtil.toJsonStr(itemDoc), XContentType.JSON));
        }

        // 4. Send the request
        client.bulk(request, RequestOptions.DEFAULT);

        // 5. Move to the next page
        pageNo++;
    }
}

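Basic steps for document operations:

Initialize RestHighLevelClient.
Create an XxxRequest.
- Xxx is Index, Get, Update, Delete, or Bulk.
Prepare the parameters (needed for Index, Update, and Bulk).
Send the request.
- Call RestHighLevelClient#xxx(), where xxx is index, get, update, delete, or bulk.
Parse the result (needed for Get).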