Elasticsearch production environment configuration

Posted on 2016-12-01 Edited on 2020-09-16 In linux , elastic

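Introduction to Elasticsearch

Elasticsearch is an open-source, distributed, RESTful search engine built on Lucene. It is designed for cloud computing and provides real-time search that is stable, reliable, and fast.

Some Elasticsearch concepts:

Cluster

In a distributed system, multiple running Elasticsearch instances can form a cluster. One node in the cluster is the master node. Elasticsearch is decentralized, so the master is elected dynamically and there is no single point of failure.

Within the same subnet, you only need to set the same cluster name on every node; Elasticsearch automatically groups nodes with the same cluster name into one cluster. Communication between nodes, as well as data allocation and balancing across nodes, is managed automatically by Elasticsearch.
From the outside, the cluster appears as a single unit.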

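Node

Each running instance is called a node. Instances can run on the same machine or on different machines. A running instance is simply a server process. In a test environment you can run several server processes on one server; in production, running one server process per server is recommended.

Index

"Index" here is a noun, not a verb. Elasticsearch supports multiple indices, much as a relational database server supports multiple databases. Each index in turn supports multiple types, much as one database in a relational system can contain multiple tables. The two models differ substantially in nature, but the analogy is good enough for now.

Shards

An index is split into several smaller indices, each of which is called a shard. Once an index is sharded, its shards can be allocated to different nodes.

Replicas

Each shard can have zero or more replicas, and every replica is a complete copy of its shard. Replicas can improve speed and also improve fault tolerance: if the data on one node is damaged, other nodes can take its place.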
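As a rough illustration of how shards and replicas combine, the sketch below counts the shard copies one index ends up with. The function name `total_shard_copies` is made up for this example; the arithmetic only follows the definitions above (each primary shard plus its replicas).

```python
def total_shard_copies(primary_shards: int, replicas: int) -> int:
    """Return the number of shard copies (primaries plus replicas) for one index."""
    if primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    if replicas < 0:
        raise ValueError("the replica count cannot be negative")
    # Every primary shard exists once, plus one extra copy per replica.
    return primary_shards * (1 + replicas)


if __name__ == "__main__":
    # Values used later in this article: 5 shards with 1 replica each.
    print(total_shard_copies(5, 1))
```

With 5 shards and 1 replica the cluster has to place 10 shard copies; a single-node setup with 1 shard and 0 replicas has exactly one.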

Lab environment

Lab environment plan

Hostname	IP address	Operating system	Role
linux-node1.example.com	192.168.56.11	CentOS 7	elastic-master, nfs-server
linux-node2.example.com	192.168.56.12	CentOS 7	elastic-slave

System version

[root@linux-node1 mount10:20:25]#uname -r
3.10.0-229.el7.x86_64
[root@linux-node1 mount12:07:36]#uname -m
x86_64
[root@linux-node1 mount12:07:41]#cat  /etc/redhat-release
CentOS Linux release 7.1.1503 (Core)

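elastic installation procedure

Installing the elastic software

Install the Java environment first

Use SaltStack to install the Java environment quickly

Installing elastic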

cd /usr/local/src/
curl -L -O https://download.elastic.co/elasticsearch/release/org/elasticsearch/distribution/tar/elasticsearch/2.1.1/elasticsearch-2.1.1.tar.gz
tar -zxf elasticsearch-2.1.1.tar.gz
mv elasticsearch-2.1.1 /usr/local/
cd ../
ln -s elasticsearch-2.1.1/ elastic

Run the steps above on both linux-node1 and linux-node2.

Installing Elasticsearch with yum

rpm --import https://packages.elastic.co/GPG-KEY-elasticsearch
cat >/etc/yum.repos.d/elasticsearch.repo <<EOF
[elasticsearch-2.x]
name=Elasticsearch repository for 2.x packages
baseurl=https://packages.elastic.co/elasticsearch/2.x/centos
gpgcheck=1
gpgkey=https://packages.elastic.co/GPG-KEY-elasticsearch
enabled=1
EOF
yum install elasticsearch

Editing the elastic configuration file

Complete configuration files

elastic master node configuration

grep '^[a-z]' /usr/local/elastic/config/elasticsearch.yml
cluster.name: biglittleant
node.name: "linux-node1"
index.number_of_shards: 5
index.number_of_replicas: 1
path.conf: /usr/local/elastic/config
path.data: /usr/local/elastic/data
path.work: /usr/local/elastic/work
path.logs:  /usr/local/elastic/logs
path.plugins: /usr/local/elastic/plugins
bootstrap.mlockall: true
transport.tcp.port: 9300
http.port: 9200
network.host: 192.168.56.11
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["192.168.56.11", "192.168.56.12"]
path.repo: ["/data/mount/"]

elastic slave node configuration

cluster.name: biglittleant
node.name: "linux-node2"
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["192.168.56.11", "192.168.56.12"]
path.repo: ["/data/mount/"]

elastic single-node configuration

cluster.name: biglittleant
node.name: "linux-node1"
index.number_of_shards: 1
index.number_of_replicas: 0 ## a single node must use 0 replicas
path.conf: /usr/local/elastic/config
path.data: /usr/local/elastic/data
path.work: /usr/local/elastic/work
path.logs:  /usr/local/elastic/logs
path.plugins: /usr/local/elastic/plugins
bootstrap.mlockall: true
http.port: 9200
network.host: 192.168.56.11
path.repo: ["/data/mount/"]

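Explanation of the Elasticsearch configuration file

cluster.name: biglittleant ## name of the cluster; it cannot be changed once configured.
node.name: "linux-node1" # name of the current node
index.number_of_shards: 5 ## number of shards per index.
index.number_of_replicas: 1 ## number of replicas to create.
path.conf: /usr/local/elastic/config ## location of the configuration files
path.data: /usr/local/elastic/data ## location of the data
#path.data: /path/to/data1,/path/to/data2 ### multiple paths can be configured.
path.work: /usr/local/elastic/work ## path for temporary files
path.logs:  /usr/local/elastic/logs ## path for log files
path.plugins: /usr/local/elastic/plugins ## location of the plugins
bootstrap.mlockall: true # lock the process memory
#transport.tcp.port: 9300 ### port for communication within the cluster.
http.port: 9200 ## port exposed to clients
network.host: 192.168.56.11 # address to listen on; defaults to 127.0.0.1 if not set
discovery.zen.ping.multicast.enabled: false ## disable multicast
discovery.zen.ping.unicast.hosts: ["192.168.56.11", "192.168.56.12"] # IP list of the cluster servers
path.repo: ["/data/mount/"] # backup repository for the cluster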

Installing the elastic server management plugins

Use the head plugin to view index data

/usr/local/elastic/bin/plugin install mobz/elasticsearch-head
# Access URL: http://192.168.56.11:9200/_plugin/head/

Use kopf to back up cluster nodes

/usr/local/elastic/bin/plugin install lmenezes/elasticsearch-kopf
# Access URL: http://192.168.56.11:9200/_plugin/kopf/

Use bigdesk to view cluster performance

/usr/local/elastic/bin/plugin install hlstudio/bigdesk
# Access URL: http://192.168.56.11:9200/_plugin/bigdesk/

Installing the Chinese word segmentation plugin

Step 1: build the segmentation plugin:

yum install maven
git clone https://github.com/medcl/elasticsearch-analysis-ik
cd elasticsearch-analysis-ik
mvn package
# Build finished:
Downloaded: http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-utils/2.0.1/plexus-utils-2.0.1.jar (217 KB at 43.5 KB/sec)
[INFO] Reading assembly descriptor: /usr/local/src/elasticsearch-analysis-ik-1.10.0/src/main/assemblies/plugin.xml
[INFO] Building zip: /usr/local/src/elasticsearch-analysis-ik-1.10.0/target/releases/elasticsearch-analysis-ik-1.10.0.zip
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 16:26.127s
[INFO] Finished at: Mon Oct 10 16:58:59 CST 2016
[INFO] Final Memory: 25M/61M
[INFO] ------------------------------------------------------------------------

## Copy the compiled IK analyzer into the plugins directory.
cd target/releases/
unzip elasticsearch-analysis-ik-1.10.0.zip -d ik
mv ik /data/app/elastic/plugins/ik

Step 2: edit the ES configuration file. Open config/elasticsearch.yml and append the following setting:

index.analysis.analyzer.default.type: ik

Step 3: restart the elastic server and check the log

systemctl restart elasticsearch
# Restarting elasticsearch is enough; after the restart the log shows the following plugins line
[2016-03-14 23:47:33,184][INFO ][plugins                  ] [Lightspeed] modules [lang-expression, lang-groovy], plugins [elasticsearch-analysis-ik], sites []

Installing the analysis-pinyin Chinese plugin (optional; the steps are the same as for the IK plugin)

git clone https://github.com/medcl/elasticsearch-analysis-pinyin.git
cd elasticsearch-analysis-pinyin
mvn clean install -Dmaven.test.skip
# Copy the *-pinyin.zip from target/releases and unzip it into elasticsearch/plugins/

index.analysis.analyzer.default.type: keyword

Testing the segmentation plugin

http://192.168.0.211:9200/_analyze?analyzer=ik&pretty=true&text=helloworld,欢迎你

For more on analyzers, see the detailed explanation of elastic analyzers at the end of the article.

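Tuning before starting the server

Common tuning parameters

Indexing performance and speed can be improved in the following ways:

Increase the index refresh interval: index.engine.robin.refresh_interval: 10s (default 1s).
Increase the indexing memory buffer: indices.memory.index_buffer_size: 20% (default 10% of the heap).
Raise the translog flush threshold: index.translog.flush_threshold: 10000 (default 5000).
Increase the memory allocated to ES; the default is 1g.
Reduce the number of replicas. It can be set to 0 while indexing and restored to the desired value once indexing finishes.
Add more machines.
Set index.merge.policy.use_compound_file to false, which reduces merging (make sure the open-file limit is large enough).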
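Two of the tips above (dropping replicas while indexing and lengthening the refresh interval) can be sketched as a pair of index-settings bodies: one applied before a bulk load and one applied afterwards. This is only an illustration. The helper name `bulk_load_settings` is hypothetical, and it assumes the dynamic index settings `index.number_of_replicas` and `index.refresh_interval`, which may be named differently in your version. The bodies would be sent with PUT to an index's `_settings` endpoint.

```python
import json
from typing import Dict, Tuple


def bulk_load_settings(
    final_replicas: int,
    load_refresh: str = "10s",
    normal_refresh: str = "1s",
) -> Tuple[Dict, Dict]:
    """Return (before, after) settings bodies for a bulk-indexing window."""
    if final_replicas < 0:
        raise ValueError("the replica count cannot be negative")
    # During the load: no replicas and a longer refresh interval.
    before = {"index": {"number_of_replicas": 0, "refresh_interval": load_refresh}}
    # After the load: restore the desired replica count and refresh interval.
    after = {
        "index": {
            "number_of_replicas": final_replicas,
            "refresh_interval": normal_refresh,
        }
    }
    return before, after


if __name__ == "__main__":
    before, after = bulk_load_settings(final_replicas=1)
    print(json.dumps(before))
    print(json.dumps(after))
```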

Configuration summary

## Part 1
index.analysis.analyzer.default.type: ik
index.cache.field.type: soft
index.cache.field.max_size: 50000
index.cache.field.expire: 5m

## Part 2
index.cache.query.enable: true
indices.cache.query.size: 5%

## Part 3
index.search.slowlog.level: TRACE

## Part 4
index.store.compress.stored: true
index.store.compress.tv: true

## Part 5
#indices.store.throttle.type: none
indices.store.throttle.max_bytes_per_sec: 100mb
#index.routing.allocation.total_shards_per_node: 2
#script.disable_dynamic: false

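Configuration explained

Part 1

1. Setting the ES cache type to Soft Reference: a soft reference is a relatively strong kind of reference that is reclaimed only when memory runs short, so it usually survives while memory is sufficient. Soft references are also guaranteed to be set to null before Java throws an OutOfMemory error. They are suited to implementing caches (for example, of frequently used images), making maximum use of memory without causing OutOfMemory. To enable this, add index.cache.field.type: soft to the ES configuration file.
2. index.cache.field.type: soft ## use as much memory as possible.
3. index.cache.field.max_size: 50000 ## maximum number of entries ES caches.
4. index.cache.field.expire: 5m ## set the expiry time to 5 minutes.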

Part 2

index.cache.query.enable: true ## the default is false.

indices.cache.query.size: 5%  ## the default is 1%.

Part 3

index.search.slowlog.level: TRACE

Slow query log level: TRACE enables trace mode. It can also be set to INFO.

Part 4: index-related settings

index.store.compress.stored: true

index.store.compress.tv: true

Setting these two properties in elasticsearch.yml compresses the data files and greatly reduces their size.

Part 5: disk write rate settings

#indices.store.throttle.type: none

indices.store.throttle.max_bytes_per_sec: 100mb

#index.routing.allocation.total_shards_per_node: 2

#script.disable_dynamic: false

Disk write rate: the default of 20 MB/s suits mechanical hard disks; 100 MB/s to 200 MB/s suits SSDs. See the reference documentation.

Configuring NFS to store elastic backups

# Run on linux-node1
yum install nfs-utils rpcbind -y
cat >> /etc/exports<<EOF
/data/backup 192.168.56.0/24(rw,sync,all_squash)
EOF
mkdir /data/{backup,mount}  -p
chown -R nfsnobody:nfsnobody /data/backup
systemctl start rpcbind
systemctl start nfs
mount.nfs 192.168.56.11:/data/backup /data/mount/
# Run on linux-node2
yum install nfs-utils rpcbind -y
mkdir /data/mount -p
mount.nfs 192.168.56.11:/data/backup /data/mount/

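Java parameter tuning

Keep the heap at no more than half of the system's available memory, and no more than 32 GB. What about JVM parameters? Beginners do not need to make special adjustments; follow the official recommendation and set Xms and Xmx to the same value as the heap size to avoid dynamic heap resizing. Although targeted JVM tuning can bring a slight improvement in GC efficiency, when there are "bad" use cases these adjustments will not magically relieve heap pressure and may even make the problem worse.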

vim /usr/local/elastic/bin/elasticsearch.in.sh
if [ "x$ES_MIN_MEM" = "x" ]; then
    ES_MIN_MEM=256m
fi
if [ "x$ES_MAX_MEM" = "x" ]; then
    ES_MAX_MEM=256m
fi
# This is a virtual machine, so only 256 MB is configured; on a physical machine, adjust the value to the amount of memory available.
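The heap rule from this section (at most half of the machine's memory, never more than 32 GB, with the minimum and maximum set to the same value) can be sketched as a small calculation. The function names are hypothetical, and the 32 GiB cap is taken directly from the text above rather than from any tuning of my own.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def recommended_heap_bytes(total_ram_bytes: int, cap_bytes: int = 32 * GIB) -> int:
    """Return half of the machine's RAM, capped at cap_bytes."""
    if total_ram_bytes <= 0:
        raise ValueError("total RAM must be positive")
    return min(total_ram_bytes // 2, cap_bytes)


def to_heap_option(heap_bytes: int) -> str:
    """Format a heap size in whole megabytes, e.g. '256m'."""
    return f"{heap_bytes // MIB}m"


if __name__ == "__main__":
    heap = recommended_heap_bytes(16 * GIB)
    # Use the same value for ES_MIN_MEM and ES_MAX_MEM to avoid heap resizing.
    print(f"ES_MIN_MEM={to_heap_option(heap)}")
    print(f"ES_MAX_MEM={to_heap_option(heap)}")
```

For a 16 GiB machine this yields 8192m for both values; a machine with 128 GiB is held at the 32 GiB cap.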

Enabling JMX monitoring for elastic

/usr/local/elastic/bin/elasticsearch.in.sh
JMX_PORT=9305
JAVA_OPTS="$JAVA_OPTS -Dcom.sun.management.jmxremote.port=$JMX_PORT"
JAVA_OPTS="$JAVA_OPTS -Dcom.sun.management.jmxremote.ssl=false"
JAVA_OPTS="$JAVA_OPTS -Dcom.sun.management.jmxremote.authenticate=false"
JAVA_OPTS="$JAVA_OPTS -Djava.rmi.server.hostname=192.168.56.11"

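Server tuning

sysctl -w vm.max_map_count=262144 ## in production, be sure to raise this limit on memory-mapped areas.
mkdir /usr/local/elastic/{data,logs,work,plugins} -p ## create the required directories
useradd elastic ## create the user that runs the service
chown -R elastic:elastic /usr/local/elasticsearch-2.1.1/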

Starting the elastic service

## Start the service
su -c '/usr/local/elastic/bin/elasticsearch -d ' elastic
## Check that the ports are listening
ss -lntup |grep 9300
tcp    LISTEN     0      50                    :::9300                 :::*      users:(("java",9757,56))
ss -lntup |grep 9200
tcp    LISTEN     0      50                    :::9200                 :::*      users:(("java",9757,94))

## Check the result with curl:
curl http://192.168.56.11:9200
{
  "status" : 200,
  "name" : "linux-node1",
  "cluster_name" : "biglittleant",
  "version" : {
    "number" : "1.7.1",
    "build_hash" : "b88f43fc40b0bcd7f173a1f9ee2e97816de80b19",
    "build_timestamp" : "2015-07-29T09:54:16Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.4"
  },
  "tagline" : "You Know, for Search"
}

Managing cluster settings

Viewing cluster settings

curl -XGET http://10.10.160.129:9200/_cluster/settings

Stopping shard synchronization

curl -XPUT http://10.10.160.129:9200/_cluster/settings -d '{
  "transient" : {
    "cluster.routing.allocation.enable" : "none"
  }
}'

Starting shard synchronization

curl -XPUT http://10.10.160.129:9200/_cluster/settings -d '{
  "transient" : {
    "cluster.routing.allocation.enable" : "all"
  }
}'
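The two requests above differ only in the value of `cluster.routing.allocation.enable`. A minimal sketch of a helper that builds the request body and rejects unknown values follows; the function name is hypothetical, and the list of accepted values (`all`, `primaries`, `new_primaries`, `none`) is my understanding of the setting, so check it against the documentation for your version.

```python
import json

# Values assumed to be accepted by cluster.routing.allocation.enable.
ALLOCATION_MODES = ("all", "primaries", "new_primaries", "none")


def allocation_settings(mode: str, persistent: bool = False) -> str:
    """Build the JSON body for PUT /_cluster/settings."""
    if mode not in ALLOCATION_MODES:
        raise ValueError(f"unknown allocation mode: {mode!r}")
    # Transient settings are lost on a full cluster restart; persistent ones survive.
    scope = "persistent" if persistent else "transient"
    return json.dumps({scope: {"cluster.routing.allocation.enable": mode}})


if __name__ == "__main__":
    print(allocation_settings("none"))  # before maintenance
    print(allocation_settings("all"))   # after maintenance
```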

Backing up Elasticsearch data

First import some data to back up

curl -XPOST 'http://192.168.56.11:9200/bank/account/_bulk?pretty' --data-binary @accounts.json
curl -XPOST 'http://192.168.56.11:9200/shakespeare/_bulk?pretty' --data-binary @shakespeare.json
curl -XPOST 'http://192.168.56.11:9200/_bulk?pretty' --data-binary @logs.jsonl

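Creating a snapshot repository with the API

curl -XPOST http://192.168.56.11:9200/_snapshot/my_backup -d '
{
    "type": "fs",
    "settings": {
        "location": "/data/mount",
        "compress":  true
    }
}'
## Explanation:
Repository name: my_backup
Repository type: fs. Other types such as url and hdfs are also supported.
Repository location: /data/mount. This location must be declared in the configuration file.
Compression: compress: true enables compression.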

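Checking the configuration before backing up

The directory used for backups must be declared in the configuration file (path.repo); otherwise the following error is returned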

{
  "error": {
    "root_cause": [
      {
        "type": "repository_exception",
        "reason": "[test-bakcup] failed to create repository"
      }
    ],
    "type": "repository_exception",
    "reason": "[test-bakcup] failed to create repository",
    "caused_by": {
      "type": "creation_exception",
      "reason": "Guice creation errors:\n\n1) Error injecting constructor, RepositoryException[[test-bakcup] location [/data/mount] doesn't match any of the locations specified by path.repo because this setting is empty]\n  at org.elasticsearch.repositories.fs.FsRepository.<init>(Unknown Source)\n  while locating org.elasticsearch.repositories.fs.FsRepository\n  while locating org.elasticsearch.repositories.Repository\n\n1 error",
      "caused_by": {
        "type": "repository_exception",
        "reason": "[test-bakcup] location [/data/mount] doesn't match any of the locations specified by path.repo because this setting is empty"
      }
    }
  },
  "status": 500
}
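The error above appears when the repository location is not covered by `path.repo`. A rough pre-check can be sketched as below. This only approximates the idea (the location must equal or sit beneath one of the configured paths); Elasticsearch's own check may resolve paths differently, and the function name is hypothetical.

```python
from pathlib import PurePosixPath
from typing import Sequence


def location_allowed(location: str, path_repo: Sequence[str]) -> bool:
    """Return True if location equals or lies under one of the path.repo entries."""
    loc = PurePosixPath(location)
    for entry in path_repo:
        root = PurePosixPath(entry)  # a trailing slash is normalised away
        if loc == root or root in loc.parents:
            return True
    return False


if __name__ == "__main__":
    # path.repo as configured in this article.
    print(location_allowed("/data/mount", ["/data/mount/"]))
    # An empty path.repo corresponds to the error shown above.
    print(location_allowed("/data/mount", []))
```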

Creating a snapshot

## Create a snapshot in the background
curl -XPUT  http://192.168.56.20:9200/_snapshot/my_backup/snapshot_1
## It can also run in the foreground.
curl -XPUT  'http://192.168.56.11:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true'
## The commands above create a snapshot named snapshot_1 in the my_backup repository.

Backing up selected indices only

curl -XPUT  http://192.168.56.20:9200/_snapshot/my_backup/snapshot_2 -d '
{
    "indices": "bank,logstash-2015.05.18"
}'
## Explanation:
Creates a snapshot named snapshot_2 that backs up only the two indices bank and logstash-2015.05.18.
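The snapshot requests in this section all follow the same shape, so they can be generated from a few parameters. The helper below is a hypothetical sketch that only builds the method, URL, and body; it does not send anything, and the parameter names are my own.

```python
import json
from typing import Optional, Sequence, Tuple


def snapshot_request(
    base_url: str,
    repository: str,
    snapshot: str,
    indices: Optional[Sequence[str]] = None,
    wait: bool = False,
) -> Tuple[str, str, Optional[str]]:
    """Return (method, url, body) for creating a snapshot."""
    url = f"{base_url.rstrip('/')}/_snapshot/{repository}/{snapshot}"
    if wait:
        # Run in the foreground and block until the snapshot finishes.
        url += "?wait_for_completion=true"
    # Without an index list, all indices are included.
    body = json.dumps({"indices": ",".join(indices)}) if indices else None
    return "PUT", url, body


if __name__ == "__main__":
    print(snapshot_request(
        "http://192.168.56.11:9200", "my_backup", "snapshot_2",
        indices=["bank", "logstash-2015.05.18"],
    ))
```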

Checking backup status

During the backup, progress can be checked with the following command

curl -XGET http://192.168.0.1:9200/_snapshot/my_backup/snapshot_20150812/_status

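The main states are:

INITIALIZING: the cluster state is being checked to see whether a snapshot can be taken; this is usually very fast
STARTED: data is being transferred to the repository
FINALIZING: data transfer is complete and metadata is being transferred
DONE: finished
FAILED: the backup failed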
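When polling the status endpoint from a script, the states listed above split naturally into "still running" and "finished". A minimal sketch, covering only the five states named in this article (other states may exist in your version) and using a made-up function name:

```python
# States listed in this article; anything else is treated as unknown.
IN_PROGRESS_STATES = {"INITIALIZING", "STARTED", "FINALIZING"}
TERMINAL_STATES = {"DONE", "FAILED"}


def snapshot_finished(state: str) -> bool:
    """Return True once a snapshot has reached a terminal state."""
    normalised = state.strip().upper()
    if normalised in TERMINAL_STATES:
        return True
    if normalised in IN_PROGRESS_STATES:
        return False
    raise ValueError(f"unknown snapshot state: {state!r}")


if __name__ == "__main__":
    for state in ("INITIALIZING", "STARTED", "FINALIZING", "DONE", "FAILED"):
        print(state, snapshot_finished(state))
```

A polling loop would keep requesting the status until `snapshot_finished` returns True, then check whether the final state was DONE or FAILED.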

Cancelling a backup

curl -XDELETE http://192.168.0.1:9200/_snapshot/my_backup/snapshot_20150812

Getting information on all snapshots

curl -XGET http://192.168.56.20:9200/_snapshot/my_backup/_all |python -mjson.tool
## Explanation
Lists all snapshots in the my_backup repository.

Deleting a snapshot manually

curl -XDELETE http://192.168.56.20:9200/_snapshot/my_backup/snapshot_2
## Explanation
Deletes the snapshot snapshot_2 from the my_backup repository.

Restoring from backup

Restoring a backup

curl -XPOST http://192.168.0.1:9200/_snapshot/my_backup/snapshot_20150812/_restore

As with backups, wait_for_completion=true can be set to wait for the restore result

curl -XPOST 'http://192.168.0.1:9200/_snapshot/my_backup/snapshot_20150812/_restore?wait_for_completion=true'

By default all indices are restored. Parameters can be set to choose which indices to restore and to rename them on restore, which avoids overwriting existing data.

curl -XPOST http://192.168.0.1:9200/_snapshot/my_backup/snapshot_20150812/_restore -d '
{
    "indices": "index_1",
    "rename_pattern": "index_(.+)",
    "rename_replacement": "restored_index_$1"
}'
indices: restore only the index 'index_1'.
rename_pattern: selects indices whose names start with 'index_' for renaming.
rename_replacement: renames the matched indices to 'restored_index_xxx'; for example, index_1 becomes restored_index_1.
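To preview which name a restored index will receive, the pattern and replacement can be tried out locally. The sketch below approximates the behaviour with Python's `re` module by translating `$1`-style group references into Python's `\g<1>` form. It is an approximation under that assumption, not Elasticsearch's actual implementation, and regular expression dialects differ in corner cases.

```python
import re


def restored_name(index: str, rename_pattern: str, rename_replacement: str) -> str:
    """Approximate the index name produced by rename_pattern/rename_replacement."""
    # Convert "$1" style group references to Python's "\g<1>" syntax.
    py_replacement = re.sub(r"\$(\d+)", r"\\g<\1>", rename_replacement)
    # Names that do not match the pattern are returned unchanged.
    return re.sub(rename_pattern, py_replacement, index)


if __name__ == "__main__":
    print(restored_name("index_1", "index_(.+)", "restored_index_$1"))
```

With the values from the request above, `index_1` maps to `restored_index_1`, while an index such as `bank` that does not match the pattern keeps its name.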

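Checking the restore progress of all indices

curl -XGET http://192.168.0.1:9200/_recovery/

Checking the restore progress of the index restored_index_1

curl -XGET http://192.168.0.1:9200/restored_index_1/_recovery

Cancelling a restore

Deleting the index being restored cancels the restore

curl -XDELETE http://192.168.0.1:9200/restored_index_1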

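Dynamically shrinking or expanding the number of replica shards

The number of replicas can be changed dynamically on a running cluster, which lets us scale up or down as needed.

For example, the following request sets the number of replicas to 3:

curl -XPUT  http://192.168.56.12:9200/shakespeare/_settings -d '
{
   "number_of_replicas" : 3
}'
Result returned:
{
    "acknowledged": true
}

The shard layout is then adjusted again: the primary shards are on nodes es-node1, es-node3, and es-node4, and the replica shards are on es-node2, es-node3, and es-node4.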
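Whether every requested replica can actually be placed depends on the number of nodes, because Elasticsearch does not put two copies of the same shard on one node. The sketch below estimates, per shard, how many replicas can be assigned and how many stay unassigned. It ignores disk limits and allocation filtering, and the function name is hypothetical.

```python
from typing import Tuple


def replica_allocation(nodes: int, replicas: int) -> Tuple[int, int]:
    """Return (assignable, unassigned) replicas per shard for a cluster size."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    if replicas < 0:
        raise ValueError("the replica count cannot be negative")
    # One node holds the primary; each remaining node can hold at most one replica.
    assignable = min(replicas, nodes - 1)
    return assignable, replicas - assignable


if __name__ == "__main__":
    print(replica_allocation(nodes=4, replicas=3))
    print(replica_allocation(nodes=2, replicas=3))
```

With four nodes, all three replicas of each shard can be placed. On the two-node lab cluster from this article only one replica per shard fits, and the other two would remain unassigned.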

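Operations

How to restart a single elastic node

If writes are stopped before the node is restarted, shard synchronization after startup is fast.
If writes continue while the node is restarted, shard synchronization takes a long time.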

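elastic help documentation

elastic tuning reference

elastic monitoring

Mastering Elasticsearch (Chinese edition)

ELK: The Definitive Guide

Elasticsearch: The Definitive Guide

ELK part 2: advanced use of Elasticsearch and Logstash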

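Problems encountered when deploying elastic in production

Out of memory errors

By default, the ES field data cache is unbounded. Queries load field values into memory, and facet queries in particular are very memory-hungry: they hold all results in memory and then sort them, and they keep consuming memory until it runs out. When memory is insufficient, an out of memory error can occur.

How the fix works

Set the ES cache type to Soft Reference. A soft reference is a relatively strong kind of reference that is reclaimed only when memory runs short, so it usually survives while memory is sufficient. Soft references are also guaranteed to be set to null before Java throws an OutOfMemory error. They are suited to implementing caches (for example, of frequently used images), making maximum use of memory without causing OutOfMemory. To enable this, add index.cache.field.type: soft to the ES configuration file.
Also set the maximum number of cached entries and the cache expiry time: index.cache.field.max_size: 50000 caps the field cache at 50000 entries, and index.cache.field.expire: 5m sets the expiry time to 5 minutes.

Solution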

index.cache.field.type: soft
index.cache.field.max_size: 50000
index.cache.field.expire: 5m