Before using this tool, please read the introductions to the data model and data format.

Overview

Sensors Analytics supports importing back-end data in real time using Logstash + Filebeat.

Logstash is an open source, server-side data processing pipeline from Elastic that captures data from multiple sources simultaneously, transforms it, and sends it to a specified repository. See the official Logstash introduction.

Filebeat is a lightweight log collector from Elastic, designed to relieve the load on Logstash. The Logstash + Filebeat combination can collect logs generated by a large number of servers, VMs, and containers. See the official Filebeat introduction.

The data collection process based on Logstash + Filebeat is as follows: back-end SDK generates data files => Filebeat reads the files => Logstash beats input => Logstash sensors_analytics output => Sensors Analytics.

The structure is shown in the following figure:

This article describes how to use Logstash + Filebeat in the following three scenarios to collect data and send it to Sensors Analytics.

Before reading the details, please read the usage instructions for Logstash and Filebeat, as well as the version support information.

Logstash usage instructions

In every scenario, Logstash must have the sensors_analytics output plugin installed.

Download and install Logstash

Please refer to the official Installing Logstash documentation and choose your preferred download and installation method.

Install the logstash-output-sensors_analytics plugin

The plugin checks whether the data is valid JSON, adds field values that Sensors Analytics requires (such as lib and data), then packages the data, compresses it, Base64-encodes it, and sends it to the Sensors Analytics data receiving URL.
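The check, compress, and encode steps described above can be sketched roughly as follows. This is an illustration only, not the plugin's code: the gzip compression, the batch-as-JSON-array layout, the helper names `pack_records` and `unpack_records`, and the sample field names are all assumptions.

```python
import base64
import gzip
import json


def pack_records(lines):
    """Parse JSON lines, then gzip and Base64-encode them as one batch."""
    records = []
    for line in lines:
        record = json.loads(line)  # raises ValueError if the line is not valid JSON
        if not isinstance(record, dict):
            raise ValueError("each record must be a JSON object")
        records.append(record)
    raw = json.dumps(records).encode("utf-8")
    return base64.b64encode(gzip.compress(raw)).decode("ascii")


def unpack_records(payload):
    """Reverse pack_records; handy for inspecting a payload by hand."""
    return json.loads(gzip.decompress(base64.b64decode(payload)))


# Hypothetical sample record; the field names are illustrative.
sample = ['{"distinct_id": "user-1", "type": "track", "event": "UserLogin"}']
payload = pack_records(sample)
```

Decoding the payload with `unpack_records` returns the original records, which is a quick way to confirm that nothing is lost in the round trip.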
The plugin has been published to the official public Ruby gem repository; its GitHub repository is logstash-output-sensors_analytics. Run the installation from the Logstash directory. The installation takes some time.

bin/logstash-plugin install logstash-output-sensors_analytics
CODE

After the installation completes, run:

bin/logstash-plugin list
CODE


If logstash-output-sensors_analytics appears in the list, the installation succeeded.

To use the plugin, configure it directly in the output section.

output {
  sensors_analytics {
    url => "https://example.sensorsdata.cn/sa"
  }
}
CODE


Sensors Analytics parameter description:

Parameter Name | Type | Required | Description
url | list | Yes | The data receiving URL of Sensors Analytics: the complete URL ending with sa, including the port number if there is one. For example, url => "https://example.sensorsdata.cn/sa". If the cluster receives data on intranet IPs, multiple data receiving URLs can be configured at once, for example: url => ["http://10.120.157.227:8106/sa","http://10.120.72.166:8106/sa"].
project | string | No | Project name; the default is default. If configured, it overrides the project specified in the event and in the url. The priority is: project configuration > project specified in the event > project specified in the url.
flush_interval_sec | number | No | The time interval (in seconds) that triggers a flush; the default is 2.
flush_batch_size | number | No | The maximum number of records that triggers a batch send; the default is 100.
enable_filebeat_status_report | boolean | No | Enabled by default. The log shows the Filebeat read status for activity within the last minute.
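The project priority described for the project parameter can be sketched as a small lookup. This is an illustration of the documented order only, not the plugin's code; the function name `resolve_project` and the assumption that an event carries its project in a top-level "project" key are hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def resolve_project(configured=None, event=None, url=""):
    """Pick the project in the order: plugin config > event > URL > "default"."""
    if configured:
        return configured
    if event and event.get("project"):
        return event["project"]
    from_url = parse_qs(urlparse(url).query).get("project")
    if from_url:
        return from_url[0]
    return "default"


url = "http://10.42.34.189:8106/sa?project=from_url"
chosen = resolve_project(configured=None, event={"project": "from_event"}, url=url)
```

Here `chosen` is "from_event", because no project is set in the plugin configuration and the event takes priority over the URL.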

Logstash Configuration

Logstash Pipeline Configuration

Logstash supports running multiple pipelines simultaneously, with each pipeline being independent and having its own input and output configurations. The pipeline configuration file is config/pipelines.yml. If you already use Logstash for other log collection tasks, you can add a pipeline dedicated to collecting Sensors Analytics log data and sending it to Sensors Analytics.

  • An example of pipelines.yml is as follows:
# Existing pipeline configuration
- pipeline.id: elastic-output
  pipeline.workers: 4
  path.config: "/home/app/logstash/elastic_output.config"
# New pipeline configuration for sensorsdata
- pipeline.id: sensorsdata-output
  # Use different Logstash settings
  pipeline.workers: 1
  queue.type: persisted
  # Use a different input/output configuration
  path.config: "/home/app/logstash/beat_sa_output.config"
YML

Note: The input for Sensors Analytics logs must be separate from the inputs of the other Logstash pipelines. For example, if Filebeat previously sent logs to port 5044 of Logstash, the Filebeat instance that collects Sensors Analytics logs can send data to port 5055 so that it is handled by the pipeline with id = sensorsdata-output.

For more information on Pipelines, please refer to the Multiple-Pipelines official documentation.

Logstash Input and Output Configuration

The configuration mainly includes the input, filter, and output sections. When processing Sensors Analytics log data, only the input and output need to be configured.

  • An example of beat_sa_output.conf is as follows:
# Use beats as the input
input {
  beats {
    port => "5044"
  }
}
# Use sensors_analytics as the output
output {
  sensors_analytics {
    url => "https://example.sensorsdata.cn/sa"
  }
}
YML

Reminder: When using the logstash-file-input-plugin, renaming files that have already been read (so that they still match the pattern) while Logstash is stopped causes them to be read again on startup.

Logstash runtime configuration

By default, Logstash uses config/logstash.yml as its runtime configuration.

Here are some things to note:

1. To preserve the order in which data is imported, set pipeline.workers to 1. By default, pipeline.workers equals the number of CPU cores. When it is greater than 1, the order in which data is processed may change.
2. To avoid losing data when the program terminates unexpectedly, set queue.type: persisted. This setting selects the type of buffer queue that Logstash uses; with a persisted queue, Logstash can continue sending the data in the buffer queue after it restarts. The default value of queue.type is memory (memory-based).
3. Setting queue.drain to true is recommended. This setting makes Logstash send all the data in the buffer queue before it exits normally.
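As a quick way to compare a set of runtime settings against the three recommendations above, a check along these lines may help. It is a sketch only: the function name `check_settings` is hypothetical, and the settings are assumed to have already been loaded from logstash.yml into a flat dictionary.

```python
# Recommended values taken from the three notes above.
RECOMMENDED = {
    "pipeline.workers": 1,
    "queue.type": "persisted",
    "queue.drain": True,
}


def check_settings(settings):
    """Return the keys whose values differ from the recommendations."""
    return [key for key, want in RECOMMENDED.items() if settings.get(key) != want]


# Example: workers left at a multi-core default and queue.drain not set.
problems = check_settings({"pipeline.workers": 4, "queue.type": "persisted"})
```

In this example `problems` lists pipeline.workers and queue.drain, the two settings that still need attention.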

For more information about logstash.yml, please refer to the official logstash.yml documentation.

Starting Logstash

  • Start directly; Logstash uses config/pipelines.yml as the pipeline configuration and runtime settings.
bin/logstash
CODE
  • Start with ~/logstash/beat_sa_output.conf as the input/output configuration file; config/logstash.yml is used as the runtime configuration.
bin/logstash -f ~/logstash/beat_sa_output.conf
CODE
  • Specify the input/output configuration through a command-line parameter; config/logstash.yml is used as the runtime configuration.
bin/logstash -e 'output { sensors_analytics { url => "https://example.sensorsdata.cn/sa" }}'
CODE

For more startup information, please refer to the official Getting Started with Logstash documentation.

Logstash progress

When Logstash uses Filebeat as input, the file read progress is controlled by Filebeat. With other input methods, such as Logstash reading files or consuming Kafka, the read progress is stored in the data/plugins directory under the Logstash directory, and the disk-based data buffer queue is stored in data/queue. You can set path.data in logstash.yml to specify the location of the data/ directory that Logstash uses at startup.

Upgrading and rolling back the sensors_analytics output plugin

  • Upgrade an installed plugin to the latest version
bin/logstash-plugin update logstash-output-sensors_analytics
CODE
  • Install a specific version of the plugin
# v0.1.0
bin/logstash-plugin install --version 0.1.0 logstash-output-sensors_analytics
# v0.1.2
bin/logstash-plugin install --version 0.1.2 logstash-output-sensors_analytics
# v0.1.4
bin/logstash-plugin install --version 0.1.4 logstash-output-sensors_analytics
CODE
  • Uninstall the plugin
bin/logstash-plugin remove logstash-output-sensors_analytics
CODE


Filebeat usage instructions

Download and install Filebeat

Please refer to the official Install Filebeat documentation and choose your preferred download and installation method.

Filebeat Configuration

Use Filebeat to read the tracking log files generated by the back-end SDK. The default Filebeat configuration file is filebeat.yml. When modifying the configuration file, use the log type as the Filebeat input. paths specifies the location of the data files; use `*` to match the file names output by the back-end SDK.

  • Reference example of the input/output configuration in filebeat.yml:
# Filebeat collects all data files in /var/logs/ whose names start with service_log.
filebeat.shutdown_timeout: 5s
filebeat.inputs:
- type: log
  paths:
    - /var/logs/service_log.*
# Send the data to the Logstash at 10.42.32.70:5044 or 10.42.50.1:5044
output.logstash:
  hosts: ["10.42.32.70:5044","10.42.50.1:5044"]
YML

Things to note:

  1. The imported data must be in the Sensors Analytics data format.
  2. If you need to preserve the import order, do not set loadbalance: true. When multiple Logstash hosts are configured as data receivers, this setting distributes data across all Logstash hosts in round-robin fashion, which may disrupt the data order. The Filebeat default is loadbalance: false.
  3. The Filebeat file read progress is stored in the data/registry directory and is used to restore progress at startup.
  4. While Filebeat is running, avoid editing files with editors such as vim that create copies of files, because Filebeat may read the temporary files created in the directory.
  5. You are advised to set filebeat.shutdown_timeout: 5s. Without it, a small amount of duplicate data may occur when Filebeat exits.

For more configuration information, please refer to the official Filebeat documentation.

Start Filebeat

./filebeat -e -c filebeat.yml 
CODE

-c specifies the location of the filebeat.yml configuration file; -e displays the Filebeat logs on the terminal.

Filebeat progress

If the directory contains multiple unread files, Filebeat reads them at the same time. The read progress of each file is stored in the data/registry directory under the Filebeat directory; when Filebeat is restarted, it continues sending from the recorded progress.
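The idea of resuming from recorded progress can be illustrated with a deliberately simplified model. This is not Filebeat's registry format or code: the in-memory `registry` dictionary, the function name `read_new_lines`, and the byte-offset bookkeeping are assumptions made for the sketch.

```python
import os
import tempfile


def read_new_lines(path, registry):
    """Return complete lines appended since the offset recorded in registry."""
    offset = registry.get(path, 0)
    with open(path, "rb") as handle:
        handle.seek(offset)
        data = handle.read()
    # Consume only up to the last newline so a half-written line is retried later.
    end = data.rfind(b"\n") + 1
    registry[path] = offset + end
    return data[:end].decode("utf-8").splitlines()


with tempfile.TemporaryDirectory() as workdir:
    log_file = os.path.join(workdir, "service_log.20170923")
    registry = {}
    with open(log_file, "w") as handle:
        handle.write('{"event": "UserLogin"}\n')
    first = read_new_lines(log_file, registry)
    with open(log_file, "a") as handle:
        handle.write('{"event": "ViewProduct"}\n')
    second = read_new_lines(log_file, registry)
```

The second call returns only the newly appended line, because the offset stored after the first call marks where reading should resume.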

Data collection in the server scenario

If the back-end application that produces logs is deployed directly on a server, this section describes how to use Filebeat + Logstash to collect the generated log data. In this scenario you can also use LogAgent to collect the logs.

Deploy Logstash

If you are already using Logstash for other log collection tasks, please refer to Logstash Configuration.

Refer to the Logstash usage instructions to deploy Logstash directly on one or more of your servers.

  • Logstash input/output configuration logstash.conf example:
# Use beats as the input
input {
  beats {
    port => "5044"
  }
}
# Use sensors_analytics as the output
output {
  sensors_analytics {
    url => "https://example.sensorsdata.cn/sa"
  }
}
CODE
  • Logstash runtime configuration logstash.yml example:
pipeline.workers: 1
queue.type: persisted
queue.drain: true
CODE
  • Start Logstash with logstash.conf. For more startup methods, please refer to Starting Logstash and the official Logstash documentation.
bin/logstash -f logstash.conf
CODE

Deploy Filebeat

Deploy Filebeat on the servers that generate tracking logs. It collects the logs from the specified directories and sends them on to Sensors Analytics through Logstash.

The Sensors Analytics SDKs for the various back-end languages all support writing data to files, such as ConcurrentLoggingConsumer in the Java SDK, FileConsumer in the PHP SDK, and LoggingConsumer in the Python SDK. They can write log files to the specified directories.

  • Using Java SDK as an example:
// Initialize SensorsAnalytics with ConcurrentLoggingConsumer.
// Data is written to files in /data/sa_log whose names start with service_log, one file per day.
final SensorsAnalytics sa = new SensorsAnalytics(
        new SensorsAnalytics.ConcurrentLoggingConsumer("/data/sa_log/service_log"));

// Record user behavior data with Sensors Analytics.
sa.track(distinctId, true, "UserLogin");
sa.track(distinctId, true, "ViewProduct");

// Before the program exits, stop all Sensors Analytics SDK services.
sa.shutdown();
JAVA

The above configuration will generate data files in the directory /data/sa_log, one file per day, with the file list as follows:

service_log.20170923
service_log.20170924
service_log.20170925
CODE
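The daily file naming seen in this listing (prefix, a dot, then the date as YYYYMMDD) can be reproduced with a one-line helper, which is useful for predicting which file Filebeat should be picking up on a given day. The helper name `daily_log_name` is hypothetical, and the snippet mirrors the observed naming rather than the SDK's own implementation.

```python
from datetime import date


def daily_log_name(prefix, day):
    """Build a data file name such as service_log.20170923 for the given day."""
    return f"{prefix}.{day:%Y%m%d}"


name = daily_log_name("/data/sa_log/service_log", date(2017, 9, 23))
```

For 23 September 2017 this produces /data/sa_log/service_log.20170923, matching the first entry in the listing.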

Filebeat reads the log files whose names start with service_log. in the directory /data/sa_log and sends them to the deployed Logstash.

  • filebeat.yml reference example:
# Filebeat collects all data files in /data/sa_log whose names start with service_log.
filebeat.inputs:
- type: log
  paths:
    - /data/sa_log/service_log.*
# Send the data to the Logstash at 10.42.32.70:5044 or 10.42.50.1:5044
output.logstash:
  hosts: ["10.42.32.70:5044","10.42.50.1:5044"]
YML

When there are multiple log-generating directories on a server, Filebeat can be configured to read multiple directories simultaneously.

  • filebeat.yml reference example for reading multiple directories:
filebeat.inputs:
- type: log
  paths:
    # Collect all data files in /data/sa_log whose names start with service_log.
    - /data/sa_log/service_log.*
    # Collect all data files in /another/logs/ whose names start with sdk_log.
    - /another/logs/sdk_log.*
# Send the data to the Logstash at 10.42.32.70:5044 or 10.42.50.1:5044
output.logstash:
  hosts: ["10.42.32.70:5044","10.42.50.1:5044"]
YML
  • Start Filebeat in the background
nohup ./filebeat -c filebeat.yml > /dev/null 2>&1 &
CODE

Data collection in Docker containerized scenario

Deploy Logstash

To ensure the stable operation of Logstash, it is recommended to deploy Logstash directly on a server. The Docker deployment method below is for reference only.

If you are already using Logstash for other log collection tasks, please refer to Logstash Configuration. To avoid data loss when the container closes unexpectedly, persist the data in the buffer queue wherever possible.

First, obtain a Logstash image that includes the sensors_analytics output plugin.

Method 1: Directly download the Logstash image with the installed plugin.

docker pull sensorsdata/logstash:latest
CODE

Method 2: Create your own Logstash image with the sensors_analytics output plugin.

  • Dockerfile example:
FROM docker.elastic.co/logstash/logstash:7.2.0
RUN /usr/share/logstash/bin/logstash-plugin install logstash-output-sensors_analytics
CODE

Prepare the required configuration file.

  • Logstash input and output configuration logstash.conf example:
input {
  beats {
    port => "5044"
  }
}
output {
  sensors_analytics {
    url => "http://10.42.34.189:8106/sa?project=default"
    project => "default"
  }
}
CODE
  • Logstash runtime configuration logstash.yml example:
pipeline.workers: 1
queue.type: persisted
queue.drain: true
CODE

Since Logstash needs to use disk as a buffer queue, we create a Volume specifically for saving Logstash's progress and buffer queue. When restarting the Logstash container, please reuse this Volume.

docker volume create logstash-data
CODE

Mount the configuration file and the volume for storing the cache queue when starting the container.

  • Reference example for starting the container:
docker run -d -p 5044:5044 --name logstash \
  --mount source=logstash-data,target=/usr/share/logstash/data \
  -v ~/local/logstash/logstash.conf:/usr/share/logstash/pipeline/logstash.conf \
  -v ~/local/logstash/logstash.yml:/usr/share/logstash/config/logstash.yml \
  sensorsdata/logstash:latest
CODE

Deploy Filebeat

Solution 1: Install Filebeat in the SDK container to collect logs and send them to Logstash (recommended)

Install Filebeat in the container that generates tracking logs and have it send them to the deployed Logstash. Filebeat is a lightweight log collector that uses approximately 10 MB of memory and does not place much extra load on your working container.

  • Advantages: Easy deployment; no need to worry about persisting Filebeat's progress.
  • Disadvantages: Intrudes on the original SDK container.

An example using a Java SDK container as the working container:

  • Dockerfile example:
FROM centos
ADD jdk-8u211-linux-x64.tar.gz /usr/local/
ENV JAVA_HOME /usr/local/jdk1.8.0_211
ENV CLASSPATH $JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
ENV PATH $PATH:$JAVA_HOME/bin
COPY javasdk.jar /home
# Install Filebeat in the container
ADD filebeat-7.2.0-linux-x86_64.tar.gz /home
# A default configuration file
COPY filebeat.yml /etc
COPY run.sh /home
WORKDIR /home
# run.sh starts the SDK process and the Filebeat process
CMD ["/bin/bash","-e","run.sh"]
CODE
  • Contents of run.sh:
#!/bin/bash
nohup java -jar javasdk.jar > /dev/null 2>&1 &
nohup filebeat-7.2.0-linux-x86_64/filebeat -c /etc/filebeat.yml > /dev/null 2>&1 &
while [[ true ]]; do
  sleep 10000
done
CODE

In the container, ensure that the log writing path of the SDK is the same as the log reading path of Filebeat.

  • Using Java SDK as an example:
// Initialize SensorsAnalytics with ConcurrentLoggingConsumer.
// Data is written to files in /data/sa_log whose names start with service_log, one file per day.
final SensorsAnalytics sa = new SensorsAnalytics(
        new SensorsAnalytics.ConcurrentLoggingConsumer("/data/sa_log/service_log"));

// Record user behavior data with Sensors Analytics.
sa.track(distinctId, true, "UserLogin");
sa.track(distinctId, true, "ViewProduct");

// Before the program exits, stop all Sensors Analytics SDK services.
sa.shutdown();
JAVA
  • Filebeat configuration example:
# Filebeat collects all data files in /data/sa_log whose names start with service_log.
filebeat.inputs:
- type: log
  paths:
    - /data/sa_log/service_log.*
# Send the data to the Logstash at 10.42.32.70:5044 or 10.42.50.1:5044
output.logstash:
  hosts: ["10.42.32.70:5044","10.42.50.1:5044"]
YML
  • Reference example for starting:
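# Mount the configuration at /etc/filebeat.yml, the path that run.sh passes to Filebeat with -c
docker run -d --name sdk-beat \
  -v ~/local/filebeat/filebeat.yml:/etc/filebeat.yml \
  sdk-beat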
CODE

Solution 2: SDK uses shared data to save logs for Filebeat to read

Backend SDK and Filebeat run on separate containers. The SDK stores the generated logs on a data volume, and Filebeat reads the data from the volume and sends it to the deployed Logstash.

  • Advantages: No intrusion into the existing SDK container.
  • Disadvantages: More complicated to use.

First, create a data volume and choose your preferred storage method. The following example uses a local disk. Ensure that your container has write permissions to the data volume.

docker volume create sa-logs
CODE

Launch the Backend SDK container and mount the log directory to the data volume.

docker run -d --name sdk \
  --mount source=sa-logs,target=/your/logs/path \
  your-sdk-image
CODE

Launch the Filebeat container and mount the log reading directory to the data volume. Also, mount the directory for storing file reading progress to the data volume. Each data volume will have its own reading progress. When Filebeat is restarted, you can reuse this progress to resume sending logs.

  • Example configuration file filebeat.yml:
filebeat.inputs:
- type: log
  paths:
    - /usr/share/filebeat/input/service_log.*
# Send the data to the Logstash at 10.42.32.70:5044 or 10.42.50.1:5044
output.logstash:
  hosts: ["10.42.32.70:5044","10.42.50.1:5044"]
YML

Mount both the log reading directory and the progress directory to the data volume.

docker run -d --name filebeat \
  --mount source=sa-logs,target=/usr/share/filebeat/input \
  --mount source=sa-logs,target=/usr/share/filebeat/data/ \
  -v ~/docker_workspace/filebeat/filebeat.yml:/usr/share/filebeat/filebeat.yml \
  docker.elastic.co/beats/filebeat:7.2.0
CODE

If you want to mount the same data volume to multiple SDK containers, it is recommended to use the environment variable HOSTNAME as the path name to store the log files. Then, mount the parent directory to the data volume.

  • For example:

The output path for the log within the container: /mount/${HOSTNAME}-logs/service_log.20190708

Mount the /mount directory in the container to the volume.

Therefore, the storage format for the log directory in the volume is:

|-- Volume
|   |-- c1369239e7ba-logs
|   |-- fcdfdb3bdb2b-logs
|   |   |-- service_log.20190702
|   |   |-- service_log.20190703
|   |-- da86e6ba6ca1-logs
|   |   |-- service_log.20190701
|   |   |-- service_log.20190702
|   |   |-- service_log.20190703
CODE

Change the file reading path for Filebeat:

filebeat.inputs:
- type: log
  paths:
    - /usr/share/filebeat/input/*/service_log.*
CODE
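To see which files a path pattern such as /usr/share/filebeat/input/*/service_log.* would select, the matching can be approximated segment by segment. This is a sketch under the assumption that `*` matches within a single path segment and does not cross `/`; it uses Python's fnmatch rather than Filebeat's own matcher, and the helper name `matches` and the candidate paths are illustrative.

```python
from fnmatch import fnmatchcase

PATTERN = "/usr/share/filebeat/input/*/service_log.*"


def matches(path, pattern):
    """Compare path and pattern one segment at a time, so * never crosses '/'."""
    path_parts = path.split("/")
    pattern_parts = pattern.split("/")
    if len(path_parts) != len(pattern_parts):
        return False
    return all(fnmatchcase(part, pat) for part, pat in zip(path_parts, pattern_parts))


candidates = [
    "/usr/share/filebeat/input/fcdfdb3bdb2b-logs/service_log.20190702",
    "/usr/share/filebeat/input/da86e6ba6ca1-logs/service_log.20190701",
    "/usr/share/filebeat/input/service_log.20190701",
    "/usr/share/filebeat/input/da86e6ba6ca1-logs/other.log",
]
matched = [path for path in candidates if matches(path, PATTERN)]
```

Only the first two candidates are selected: the third sits directly in the input directory rather than in a per-container subdirectory, and the fourth does not start with service_log.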

Using Java SDK as an example, generate a path with the hostname to store the logs:

// Read HOSTNAME
String hostname = System.getenv("HOSTNAME");
File logPath = new File("/mount/" + hostname + "-logs/");
if (!logPath.exists()) {
    logPath.mkdirs();
}
// Initialize SensorsAnalytics with ConcurrentLoggingConsumer.
// Data is written to /mount/${HOSTNAME}-logs/ in files whose names start with service_log; mount /mount when running the container.
final SensorsAnalytics sa = new SensorsAnalytics(
        new SensorsAnalytics.ConcurrentLoggingConsumer(logPath.getAbsolutePath() + "/service_log"));

// Record user behavior data with Sensors Analytics.
sa.track(distinctId, true, "ViewProduct");

// Before the program exits, stop all Sensors Analytics SDK services.
sa.shutdown();
JAVA

If you do not want to change the log storage method of the original container, you can create a symbolic link pointing to the log directory when the container starts and mount the symbolic link to the volume.

rm -rf /your/logs/path \
  && mkdir -p /mount/${HOSTNAME}_logs \
  && ln -s /mount/${HOSTNAME}_logs /your/logs/path \
  && bin/sdk start
CODE

Mount /mount to the volume when starting the container.

docker run -d --name sdk \
  --mount source=sa-logs,target=/mount \
  your-sdk-image
CODE

Data Collection in Kubernetes (K8s) container orchestration scenarios

Logstash deployment

To ensure stable operation of Logstash, you are advised to deploy Logstash directly on a server. The K8s deployment described below is for reference only.
If you are already using Logstash for other log collection tasks, please refer to Logstash Configuration. To avoid data loss when the container closes unexpectedly, persist the data in the buffer queue wherever possible.

Create ConfigMaps for the Logstash configuration files: use Filebeat as the input and sensors_analytics as the output, and specify the runtime configuration.

  • Configuration file logstash-conf.yaml reference example:
apiVersion: v1
kind: ConfigMap
metadata:
  name: logstash-set
  labels:
    sa-app: logstash
data:
  logstash.yml: |-
    http.host: 0.0.0.0
    pipeline.workers: 1
    queue.type: persisted
    # Limits the size of the buffer queue (default 1024mb); set it smaller than the volume used by the Pod.
    queue.max_bytes: 900mb
    queue.drain: true
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: logstash-pipe-conf
  labels:
    sa-app: logstash
data:
  logstash.conf: |-
    input {
      beats {
        port => "5044"
      }
    }
    output {
      sensors_analytics {
        url => "http://10.42.34.189:8106/sa?project=default"
      }
    }
YML

To avoid losing data, a disk-based data buffer queue is used (queue.type: persisted). You therefore need to store the Logstash progress information outside the container, so that when Logstash restarts it can finish sending the remaining data.

It is recommended to deploy Logstash as a StatefulSet so that its state is preserved.

First, create a StorageClass that provisions the PVs used to store progress, and set its reclaim policy to Retain (manual reclamation). The following takes NFS as an example.

  • logstash-sc.yaml reference example:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: logstash-nfs-storage
provisioner: nfs-provisioner
reclaimPolicy: Retain
YML

Then create a StatefulSet that uses logstash-nfs-storage, and provide network access to each Logstash Pod through a Headless Service.

  • logstash-sts.yaml reference example:
apiVersion: v1
kind: Service
metadata:
  name: logstash
  labels:
    app: logstash
spec:
  ports:
  - port: 5044
    name: beat-in
  clusterIP: None
  selector:
    app: logstash
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: logstash
spec:
  # Use the Headless Service defined above
  serviceName: "logstash"
  selector:
    matchLabels:
      app: logstash
  replicas: 3
  template:
    metadata:
      labels:
        app: logstash
    spec:
      containers:
      - name: logstash
        image: sensorsdata/logstash:latest
        ports:
        - containerPort: 5044
          name: beat-in
        volumeMounts:
        - name: logstash-pipe-conf
          mountPath: /usr/share/logstash/pipeline/logstash.conf
          subPath: logstash.conf
        - name: logstash-set
          mountPath: /usr/share/logstash/config/logstash.yml
          subPath: logstash.yml
        # /usr/share/logstash/data in the container holds the buffer queue and the progress information.
        - name: ldata
          mountPath: /usr/share/logstash/data
      volumes:
      - name: logstash-pipe-conf
        configMap:
          name: logstash-pipe-conf
      - name: logstash-set
        configMap:
          name: logstash-set
  volumeClaimTemplates:
  # PVC template used for the Logstash progress data
  - metadata:
      name: ldata
    spec:
      accessModes: [ "ReadWriteOnce" ]
      # Name of the StorageClass to use; it must be created in advance.
      storageClassName: "logstash-nfs-storage"
      resources:
        requests:
          # Must be larger than the maximum size of the buffer queue
          storage: 1Gi
YML

After creation, a StatefulSet names its Pods according to the rule (StatefulSet name)-(Pod ordinal).

The above configuration file generates Pods named logstash-0, logstash-1, and logstash-2. Pod replicas are created in order from 0 to N-1 and deleted in order from N-1 to 0.

The Headless Service creates a DNS domain name for each Pod replica it controls. The full domain name rule is (pod name).(headless service name).namespace.svc.cluster.local, so Filebeat finds Logstash by domain name rather than by IP. If the default namespace is used, you can omit namespace.svc.cluster.local.

Based on volumeClaimTemplates, the StatefulSet creates a PVC for each Pod, named (volumeClaimTemplates name)-(pod name), for example ldata-logstash-0. Deleting a Pod replica does not delete its PVC, and the new Pod reuses the progress stored in the previous PVC after restarting.
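The naming rules for Pods, their DNS names, and their PVCs can be summarized in a few helper functions, which may be handy for generating the hosts list in filebeat.yml. The function names are hypothetical; the expected names follow the kubectl and nslookup output shown in this section (logstash-0, ldata-logstash-0, logstash-0.logstash.default.svc.cluster.local).

```python
def pod_names(statefulset, replicas):
    """Pods are numbered from 0 to replicas - 1."""
    return [f"{statefulset}-{ordinal}" for ordinal in range(replicas)]


def pod_dns(pod, service, namespace="default"):
    """Full in-cluster DNS name of a Pod behind a Headless Service."""
    return f"{pod}.{service}.{namespace}.svc.cluster.local"


def pvc_name(claim_template, pod):
    """Name of the PVC created for a Pod from a volume claim template."""
    return f"{claim_template}-{pod}"


def filebeat_hosts(statefulset, service, replicas, port=5044):
    """Short host:port entries, usable when Filebeat runs in the same namespace."""
    return [f"{pod}.{service}:{port}" for pod in pod_names(statefulset, replicas)]


hosts = filebeat_hosts("logstash", "logstash", 3)
```

With three replicas, `hosts` contains logstash-0.logstash:5044 through logstash-2.logstash:5044, the same form used in the Filebeat output.logstash configuration for Kubernetes.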

After the creation is complete, check the running status:

kubectl get pods -l app=logstash
NAME         READY   STATUS    RESTARTS   AGE
logstash-0   1/1     Running   0          3h56m
logstash-1   1/1     Running   0          3h56m
logstash-2   1/1     Running   0          3h56m
CODE

Check that the data volumes were created:

kubectl get pvc -l app=logstash
NAME               STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS           AGE
ldata-logstash-0   Bound    pvc-c1833d35-d2ee-49a5-ae16-3a9d3227ebe5   1Gi        RWO            logstash-nfs-storage   3h56m
ldata-logstash-1   Bound    pvc-9aa4b50c-45f7-4b64-9e4d-056838906675   1Gi        RWO            logstash-nfs-storage   3h56m
ldata-logstash-2   Bound    pvc-95bcdbf0-e84d-4068-9967-3c69c731311b   1Gi        RWO            logstash-nfs-storage   3h56m
CODE

Check the DNS creation inside the cluster:

for i in 0 1 2; do kubectl exec logstash-$i -- sh -c 'hostname'; done
logstash-0
logstash-1
logstash-2

kubectl run -i --tty --image busybox:1.28.3 dns-test --restart=Never --rm /bin/sh
nslookup logstash-0.logstash
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name:      logstash-0.logstash
Address 1: 10.244.7.54 logstash-0.logstash.default.svc.cluster.local

nslookup logstash-1.logstash
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name:      logstash-1.logstash
Address 1: 10.244.5.150 logstash-1.logstash.default.svc.cluster.local

nslookup logstash-2.logstash
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name:      logstash-2.logstash
Address 1: 10.244.34.177 logstash-2.logstash.default.svc.cluster.local
CODE
  • Scaling Logstash up and down

Scaling a StatefulSet is also sequential.

Scale the StatefulSet from the 3 replicas set earlier to 5.

kubectl scale sts logstash --replicas=5
kubectl get pods -l app=logstash
NAME         READY   STATUS    RESTARTS   AGE
logstash-0   1/1     Running   0          6h1m
logstash-1   1/1     Running   0          6h1m
logstash-2   1/1     Running   0          6h1m
logstash-3   1/1     Running   0          1h3m
logstash-4   1/1     Running   0          1h3m
CODE

The ordinals of the new Pods continue from the existing ones.

Scale the StatefulSet from 5 back to 3.

kubectl scale sts logstash --replicas=3
CODE

The PVCs added earlier are not deleted and are reused the next time the StatefulSet is scaled up to that size. You do not need to worry about Filebeat sending data to a deleted Logstash; Filebeat finds another Logstash that is working.
Since queue.drain: true is set, a deleted Logstash sends all the data in its buffer before closing.

Deploy Filebeat

Solution 1: Run Filebeat and the back-end SDK in the same Pod to collect log files (recommended)

Configure the Filebeat container in the same Pod as the back-end SDK container that generates logs. The back-end SDK writes the logs to an emptyDir volume, which Filebeat reads and sends to Logstash.

  • Advantages: Easy to deploy; add a Filebeat container to any Pod that produces logs.
  • Disadvantages: Coupled to the SDK Pod; there may be many Filebeat instances, which is somewhat redundant.

The Sensors Analytics SDKs for the various back-end languages all support writing data to files, for example ConcurrentLoggingConsumer in the Java SDK, FileConsumer in the PHP SDK, and LoggingConsumer in the Python SDK.

  • Take the Java SDK as an example:
// Initialize SensorsAnalytics with ConcurrentLoggingConsumer.
// Data is written to files in /data/sa_log whose names start with service_log, one file per day.
final SensorsAnalytics sa = new SensorsAnalytics(
        new SensorsAnalytics.ConcurrentLoggingConsumer("/data/sa_log/service_log"));

// Record user behavior data with Sensors Analytics.
sa.track(distinctId, true, "UserLogin");
sa.track(distinctId, true, "ViewProduct");

// Before the program exits, stop all Sensors Analytics SDK services.
sa.shutdown();
JAVA

The above configuration will generate data files in the /data/sa_log directory, one file a day, the file list is as follows:

service_log.20170923
service_log.20170924
service_log.20170925
YML

When deploying the Pod, first mount the SDK container's /data/sa_log directory on an emptyDir: {} volume. Then set the path that Filebeat reads to /var/log/containers/service_log.* so that Filebeat reads all files in that directory whose names start with service_log. Finally, mount the Filebeat container's /var/log/containers/ directory on the same emptyDir: {} volume, so that Filebeat can read the log files generated by the SDK container at runtime.

  • Deployment file pod.yaml reference example:
apiVersion: v1
kind: ConfigMap
metadata:
  name: filebeat-config-in
  labels:
    sa-app: filebeat
data:
  filebeat.yml: |-
    filebeat.inputs:
    - type: log
      # Read the files in /var/log/containers whose names start with service_log.
      paths:
        - /var/log/containers/service_log.*
    output.logstash:
      # Logstash on the cluster's internal network
      hosts: ["logstash-0.logstash:5044","logstash-1.logstash:5044"]
---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: javasdk-beat
  labels:
    sa-app: javasdk-beat
spec:
  replicas: 3
  template:
    metadata:
      name: javasdk-beat
      labels:
        sa-app: javasdk-beat
    spec:
      containers:
      - name: javasdk
        image: javasdk:20190705
        command: ["/bin/bash", "-c"]
        args:
        - "bin/javasdk start"
        volumeMounts:
        - name: log-path
          # /data/sa_log is where the back-end SDK stores logs; mount it on the emptyDir
          mountPath: /data/sa_log
      - name: filebeat
        image: docker.elastic.co/beats/filebeat:7.2.0
        args: [
          "-c", "/etc/filebeat.yml",
          "-e",
        ]
        volumeMounts:
        - name: config
          mountPath: /etc/filebeat.yml
          readOnly: true
          subPath: filebeat.yml
        - name: log-path
          # The directory Filebeat reads from is also mounted on the emptyDir
          mountPath: /var/log/containers
          readOnly: true
      volumes:
      - name: log-path
        emptyDir: {}
      - name: config
        configMap:
          name: filebeat-config-in
YML

Solution 2: Filebeat is deployed on the K8s node to collect log files

Filebeat is deployed in DaemonSet mode on the K8s node to collect log data. The back-end SDK running on the node stores logs in the specified directory of the host machine, which Filebeat reads and sends to Logstash.

  • Advantages: Filebeat is easy to deploy and is decoupled from the SDK Pods.
  • Disadvantages: The directory layout needs to be worked out, and additional log files are left on the host.

Given that there may be multiple identical back-end SDK containers on the same host, you need to make each container use a different directory when writing logs to the host directory. When starting the container, you are advised to use the system environment variable HOSTNAME as the path name to store log files, and then mount the upper-level directory to the host directory.

  • For example:

The output path of the log in the container: /mount/${HOSTNAME}-logs/service_log.20190708

The path where the host stores the logs: /home/data/javasdk_logs/

Mount the container's /mount directory on /home/data/javasdk_logs/ on the host.

Therefore, the contents stored in the /home/data/javasdk_logs/ directory on the host are roughly as follows:

[root@node-1 javasdk_logs]$ pwd
/home/data/javasdk_logs/
[root@node-1 javasdk_logs]$ ls -l
drwxr-xr-x 2 root root 22 Jul 8 12:06 javasdk-7d878c784d-5fpjz_logs
drwxr-xr-x 2 root root 22 Jul 6 18:33 javasdk-7d878c784d-7xmbb_logs
drwxr-xr-x 2 root root 22 Jul 6 18:52 javasdk-7d878c784d-vv9fz_logs
drwxr-xr-x 2 root root 22 Jul 8 12:08 javasdk-7d878c784d-w7q65_logs
drwxr-xr-x 2 root root 22 Jul 8 11:19 javasdk-7d878c784d-wkvxd_logs
[root@node-1 javasdk_logs]$ cd javasdk-7d878c784d-5fpjz_logs
[root@node-1 javasdk-7d878c784d-5fpjz_logs]$ ls -l
-rw-r--r-- 1 root root 6592991 Jul 8 23:59 service_log.20190706
-rw-r--r-- 1 root root 4777188 Jul 8 23:58 service_log.20190707
-rw-r--r-- 1 root root 137778 Jul 8 12:03 service_log.20190708
CODE
  • Taking the Java SDK as an example, logs are saved under a path that includes HOSTNAME, as shown below:
// Get HOSTNAME
String hostname = System.getenv("HOSTNAME");
File logPath = new File("/mount/" + hostname + "-logs/");
if (!logPath.exists()) {
    logPath.mkdirs();
}
// Initialize SensorsAnalytics with ConcurrentLoggingConsumer.
// Data is written under /mount/${HOSTNAME}-logs/ to files whose names start with service_log;
// mount the /mount directory to the host when running the container.
final SensorsAnalytics sa = new SensorsAnalytics(
        new SensorsAnalytics.ConcurrentLoggingConsumer(logPath.getAbsolutePath() + "/service_log"));
// Record user behavior data with Sensors Analytics
sa.track(distinctId, true, "ViewProduct");
// Before the program exits, stop all Sensors Analytics SDK services
sa.shutdown();
JAVA
  • Reference configuration, javasdk.yaml:
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: javasdk
  labels:
    k8s-app: javasdk
spec:
  replicas: 3
  template:
    metadata:
      name: javasdk
      labels:
        k8s-app: javasdk
    spec:
      containers:
      - name: javasdk
        image: java-sdk-host:0715
        command: ["/bin/bash", "-c"]
        args:
        - "bin/javasdk start"
        volumeMounts:
        - name: logfile
          mountPath: /mount
      volumes:
      - name: logfile
        hostPath:
          path: /home/data/javasdk_logs/
          type: DirectoryOrCreate
YML
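
The Java snippet above reads HOSTNAME directly, so an unset variable would produce a directory named null-logs. The following is a minimal sketch of one way to guard against that. The names LogDirResolver, resolveLogDir and fallbackHostname, and the "unknown-host" placeholder, are hypothetical and used only for illustration; they are not part of the Sensors Analytics SDK.

```java
import java.io.File;
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper: builds the per-container log directory <mountRoot>/<hostname>-logs.
final class LogDirResolver {

    // Returns the log directory for this container; falls back to the
    // machine's host name when the HOSTNAME value is missing or blank.
    static File resolveLogDir(String mountRoot, String hostname) {
        String name = (hostname == null || hostname.trim().isEmpty())
                ? fallbackHostname()
                : hostname.trim();
        return new File(mountRoot, name + "-logs");
    }

    // Assumption: the local host name is an acceptable substitute for HOSTNAME.
    static String fallbackHostname() {
        try {
            return InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            return "unknown-host"; // placeholder chosen for this sketch
        }
    }

    public static void main(String[] args) {
        File logDir = resolveLogDir("/mount", System.getenv("HOSTNAME"));
        if (!logDir.exists() && !logDir.mkdirs()) {
            System.err.println("Could not create " + logDir.getPath());
        }
        System.out.println("Log directory: " + logDir.getPath());
    }
}
```

The resulting directory can then be passed to ConcurrentLoggingConsumer in the same way as logPath in the snippet above.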

If you would rather not change the path where the container originally writes its logs, you can create a soft link when the container starts that points the original log directory to a per-container directory under /mount, which is mounted from the host.

  • Reference configuration, javasdk.yaml:
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: javasdk
  labels:
    k8s-app: javasdk
spec:
  replicas: 3
  template:
    metadata:
      name: javasdk
      labels:
        k8s-app: javasdk
    spec:
      containers:
      - name: javasdk
        image: java-sdk:0712
        command: ["/bin/bash", "-c"]
        args:
        - "rm -rf /your/logs/path && mkdir -p /mount/${HOSTNAME}_logs && ln -s /mount/${HOSTNAME}_logs /your/logs/path && bin/javasdk start"
        volumeMounts:
        - name: logfile
          mountPath: /mount
      volumes:
      - name: logfile
        hostPath:
          path: /home/data/javasdk_logs/
          type: DirectoryOrCreate
YML

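Configure Filebeat to match /home/data/javasdk_logs/*/service_log.* on the host. Because the host directory is mounted at /var/log/containers inside the Filebeat container, the path in filebeat.yml is /var/log/containers/*/service_log.*. Also mount the directory where Filebeat stores its read progress (/usr/share/filebeat/data) from the host, so that when the DaemonSet is restarted, the Filebeat on each node resumes from its previous progress.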

  • The DaemonSet configuration file, filebeat-ds.yaml, is as follows:
apiVersion: v1
kind: ConfigMap
metadata:
  name: filebeat-config
  labels:
    sa-app: filebeat
data:
  filebeat.yml: |-
    filebeat.inputs:
    - type: log
      paths:
        # Collect log files whose names start with service_log
        - /var/log/containers/*/service_log.*
    output.logstash:
      # The Logstash instances deployed earlier
      hosts: ["logstash-0.logstash:5044","logstash-1.logstash:5044"]
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: filebeat
  labels:
    sa-app: filebeat
spec:
  template:
    metadata:
      labels:
        sa-app: filebeat
    spec:
      serviceAccountName: filebeat
      terminationGracePeriodSeconds: 30
      containers:
      - name: filebeat
        image: docker.elastic.co/beats/filebeat:7.2.0
        imagePullPolicy: IfNotPresent
        args: [
          "-c", "/etc/filebeat.yml",
          "-e",
        ]
        volumeMounts:
        - name: config
          mountPath: /etc/filebeat.yml
          readOnly: true
          subPath: filebeat.yml
        - name: inputs
          mountPath: /var/log/containers
          readOnly: true
        - name: data
          mountPath: /usr/share/filebeat/data
      volumes:
      - name: config
        configMap:
          name: filebeat-config
      - name: inputs
        hostPath:
          path: /home/data/javasdk_logs/
      - name: data
        hostPath:
          path: /home/data/filebeat_data/
          type: DirectoryOrCreate
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: filebeat
subjects:
- kind: ServiceAccount
  name: filebeat
  namespace: default
roleRef:
  kind: ClusterRole
  name: filebeat
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: filebeat
  labels:
    sa-app: filebeat
rules:
- apiGroups: [""]
  resources:
  - namespaces
  - pods
  verbs:
  - get
  - watch
  - list
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: filebeat
  namespace: default
  labels:
    sa-app: filebeat
CODE
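
To see which files the host-side pattern /home/data/javasdk_logs/*/service_log.* selects, the following sketch applies the same pattern with Java's built-in glob matcher. It is only an approximation for illustration: Filebeat uses its own glob implementation, the class name LogPathGlobCheck is hypothetical, and the sample paths are made up from the directory listing earlier in this section. A Unix-like file system is assumed.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

// Hypothetical helper: checks host paths against the collection pattern from the text.
final class LogPathGlobCheck {

    // Pattern quoted in the text; "*" here does not cross a directory separator.
    static final String PATTERN = "/home/data/javasdk_logs/*/service_log.*";

    // Returns true when the given path would be selected by PATTERN
    // under Java's glob rules (used as a stand-in for Filebeat's matcher).
    static boolean matches(String path) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + PATTERN);
        return matcher.matches(Paths.get(path));
    }

    public static void main(String[] args) {
        String[] candidates = {
            // One directory per container, file name starts with service_log
            "/home/data/javasdk_logs/javasdk-7d878c784d-5fpjz_logs/service_log.20190708",
            // No per-container directory level
            "/home/data/javasdk_logs/service_log.20190708",
            // Right directory, different file name
            "/home/data/javasdk_logs/javasdk-7d878c784d-5fpjz_logs/other.log",
        };
        for (String candidate : candidates) {
            System.out.println(matches(candidate) + "  " + candidate);
        }
    }
}
```

Under these assumptions only the first sample path is selected: the pattern requires exactly one directory level between javasdk_logs and the log file, which is the per-container directory named after HOSTNAME.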

Logstash Data Format Description

The Sensors Analytics plugin parses events in the standard data format provided by Logstash and reports the data they carry. When configuring the input and output plugins, make sure the data stays in this standard Logstash format, which looks as follows:

{
    "host" => "localhost",
    "@version" => 1,
    "@timestamp" => 2023-01-01T00:00:00,
    "message" => the JSON data to be reported
}
CODE

Frequently Asked Questions:

:exceptionMessage=>"no implicit conversion of nil into String"

This error generally occurs when the data to be reported is not in the message field, which causes a JSON parsing error in the plugin. Check the configuration file: for example, parsing the data into JSON before it reaches the plugin leaves message empty. No pre-parsing is needed, because the Sensors Analytics plugin parses and reports the data itself.