# file

The file source collects logs from files.

!!! example

    ```yaml
    sources:
      - type: file
        name: accesslog
    ```

!!! tips

    If you collect container logs with logconfig/clusterlogconfig, the file source supports additional fields. See [here](../../discovery/kubernetes/logconfig.md#sources).

## paths

| `Field` | `Type` | `Required` | `Default` | `Description` |
| ---------- | ----------- | ----------- | --------- | -------- |
| paths | string array | required | none | Paths of the files to collect, matched with glob expressions. The extended glob expressions `Brace Expansion` and `Glob Star` are supported |

!!! example

    Target files to collect:

    ```yaml
    /tmp/loggie/service/order/access.log
    /tmp/loggie/service/order/access.log.2022-04-11
    /tmp/loggie/service/pay/access.log
    /tmp/loggie/service/pay/access.log.2022-04-11
    ```

    Corresponding configuration:

    ```yaml
    sources:
      - type: file
        paths:
          - /tmp/loggie/**/access.log{,.[2-9][0-9][0-9][0-9]-[01][0-9]-[0123][0-9]}
    ```

## excludeFiles

| `Field` | `Type` | `Required` | `Default` | `Description` |
| ------------ | ---------- | ---------- | -------- | ------------------------ |
| excludeFiles | string array | optional | none | Regular expressions matching files to exclude from collection |

!!! example

    ```yaml
    sources:
      - type: file
        paths:
          - /tmp/*.log
        excludeFiles:
          - \.gz$
    ```

## ignoreOlder

| `Field` | `Type` | `Required` | `Default` | `Description` |
| ----------- | ------------- | ---------- | -------- | ----------------------------------------------------- |
| ignoreOlder | time.Duration | optional | none | For example, 48h means that files last modified more than 2 days ago are ignored and not collected |

## ignoreSymlink

| `Field` | `Type` | `Required` | `Default` | `Description` |
| ------------- | ------ | ---------- | -------- | -------------------------------- |
| ignoreSymlink | bool | optional | false | Whether to ignore files that are symbolic links (soft links) |

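
`ignoreOlder` and `ignoreSymlink` can be set alongside `paths` on the same source. The following is a minimal sketch, assuming the source name and path from the earlier examples; the values are illustrative, not recommendations.

!!! example

    ```yaml
    sources:
      - type: file
        name: accesslog
        paths:
          - /tmp/loggie/**/access.log
        # skip files last modified more than 2 days ago
        ignoreOlder: 48h
        # do not collect files that are symbolic links
        ignoreSymlink: true
    ```
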
## addonMeta

| `Field` | `Type` | `Required` | `Default` | `Description` |
| ------------- | ------ | ---------- | -------- | -------------------------------- |
| addonMeta | bool | optional | false | Whether to add the default log collection state metadata to each event |

!!! example "Event example"

    ```json
    {
      "body": "this is test",
      "state": {
        "pipeline": "local",
        "source": "demo",
        "filename": "/var/log/a.log",
        "timestamp": "2006-01-02T15:04:05.000Z",
        "offset": 1024,
        "bytes": 4096,
        "hostname": "node-1"
      }
    }
    ```

    Meaning of the state fields:

    - pipeline: name of the pipeline the event belongs to
    - source: name of the source the event belongs to
    - filename: name of the collected file
    - timestamp: timestamp of the moment of collection
    - offset: offset of the collected data within the file
    - bytes: number of bytes collected
    - hostname: name of the node

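
A minimal sketch of a source that would produce state metadata like the event above, assuming a source named `demo` that reads `/var/log/a.log` (both names are taken from the event example, not from a real deployment):

!!! example

    ```yaml
    sources:
      - type: file
        name: demo
        paths:
          - /var/log/a.log
        # attach the state block (pipeline, source, filename, offset, ...) to each event
        addonMeta: true
    ```
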
## workerCount

| `Field` | `Type` | `Required` | `Default` | `Description` |
| ----------- | ------ | ---------- | -------- | ------------------------------------------------------------ |
| workerCount | int | optional | 1 | Number of worker goroutines that read file contents. Consider increasing it when a single node has more than 100 files |

## readBufferSize

| `Field` | `Type` | `Required` | `Default` | `Description` |
| -------------- | ------ | ---------- | -------- | ----------------------------------- |
| readBufferSize | int | optional | 65536 | Amount of data read from a file in a single read. The default is 64 KiB (65536 bytes) |

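## maxContinueRead

| `Field` | `Type` | `Required` | `Default` | `Description` |
| --------------- | ------ | ---------- | -------- | ------------------------------------------------------------ |
| maxContinueRead | int | optional | 16 | Maximum number of consecutive reads from the same file. After this many reads, the reader is forced to switch to the next file. This prevents active files from monopolizing read resources while inactive files go uncollected for a long time |
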
## maxContinueReadTimeout

| `Field` | `Type` | `Required` | `Default` | `Description` |
| ---------------------- | ------------- | ---------- | -------- | ------------------------------------------------------------ |
| maxContinueReadTimeout | time.Duration | optional | 3s | Maximum time spent reading the same file continuously. After it elapses, the reader is forced to switch to the next file. It serves a purpose similar to `maxContinueRead` |

## inactiveTimeout

| `Field` | `Type` | `Required` | `Default` | `Description` |
| --------------- | ------------- | ---------- | -------- | ------------------------------------------------------------ |
| inactiveTimeout | time.Duration | optional | 3s | If more than inactiveTimeout has elapsed since the file was last collected, the file is considered inactive (that is, its last log line has been completely written), so the last line can be collected safely |

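## firstNBytesForIdentifier

| `Field` | `Type` | `Required` | `Default` | `Description` |
| ------------------------ | ------ | ---------- | -------- | ------------------------------------------------------------ |
| firstNBytesForIdentifier | int | optional | 128 | The first n bytes of the target file are used to generate a unique code for the file. **If the file is smaller than n bytes, it is not collected for the time being**. Combined with the file's inode, this code precisely identifies a file and helps determine whether a file has been deleted or renamed |
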
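
The read-related options above are all set directly on the source. The following sketch is illustrative only: it raises `workerCount` for a node with many files and otherwise writes out the documented defaults, assuming the `accesslog` source and path from the earlier examples.

!!! example

    ```yaml
    sources:
      - type: file
        name: accesslog
        paths:
          - /tmp/loggie/**/access.log
        # more than 100 files on the node, so use several read goroutines
        workerCount: 4
        # the values below are the documented defaults, shown for reference
        readBufferSize: 65536
        maxContinueRead: 16
        maxContinueReadTimeout: 3s
        inactiveTimeout: 3s
        firstNBytesForIdentifier: 128
    ```

Note that with `firstNBytesForIdentifier: 128`, a file is not collected until it holds at least 128 bytes.
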
## multi

Multi-line collection settings.

!!! example

    ```yaml
    sources:
      - type: file
        name: accesslog
        multi:
          active: true
    ```

### active

| `Field` | `Type` | `Required` | `Default` | `Description` |
| ------ | ------ | ---------- | -------- | -------------------- |
| active | bool | optional | false | Whether to enable multi-line collection mode |

### pattern

| `Field` | `Type` | `Required` | `Default` | `Description` |
| ------- | ------ | ----------------------------- | -------- | ------------------------------------------------------------ |
| pattern | string | required when multi.active=true | none | Regular expression that identifies the start of a new log entry. For example, with `'^\['`, only a line beginning with `[` starts a new log entry; any other line is appended to the previous entry |

### maxLines

| `Field` | `Type` | `Required` | `Default` | `Description` |
| -------- | ------ | ---------- | -------- | ------------------------------------------------------------ |
| maxLines | int | optional | 500 | Maximum number of lines in a single log entry. The default is 500. When the limit is exceeded, the current entry is sent immediately and the remaining lines start a new entry |

### maxBytes

| `Field` | `Type` | `Required` | `Default` | `Description` |
| -------- | ------ | ---------- | -------- | ------------------------------------------------------------ |
| maxBytes | int64 | optional | 131072 | Maximum number of bytes in a single log entry. The default is 128 KiB. When the limit is exceeded, the current entry is sent immediately and the remaining bytes start a new entry |

### timeout

| `Field` | `Type` | `Required` | `Default` | `Description` |
| ------- | ------------- | ---------- | -------- | ------------------------------------------------------------ |
| timeout | time.Duration | optional | 5s | Maximum time to wait for a log entry to become complete. The default is 5s. When it is exceeded, the current entry is sent immediately and any later lines start a new entry |

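
A fuller multi-line sketch, assuming log entries that begin with `[` (the pattern used in the `pattern` description). The limits shown are the documented defaults, written out only for illustration.

!!! example

    ```yaml
    sources:
      - type: file
        name: accesslog
        multi:
          active: true
          # a line starting with "[" begins a new entry; other lines are appended
          pattern: '^\['
          maxLines: 500
          maxBytes: 131072
          timeout: 5s
    ```

With this configuration, a hypothetical line such as `[2022-04-11 10:00:00] ERROR ...` starts a new entry, and stack-trace lines that follow it without a leading `[` are merged into that entry.
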
## ack

Settings for the source acknowledgment mechanism. To guarantee `at least once` delivery, ack must be enabled, at some cost in performance.

!!! caution

    This can only be configured in defaults.

!!! example

    ```yaml
    defaults:
      sources:
        - type: file
          ack:
            enable: true
    ```

### enable

| `Field` | `Type` | `Required` | `Default` | `Description` |
| ------ | ------ | ---------- | -------- | ---------------- |
| enable | bool | optional | true | Whether to enable the acknowledgment mechanism |

### maintenanceInterval

| `Field` | `Type` | `Required` | `Default` | `Description` |
| ------------------- | ------------- | ---------- | -------- | ------------------------------------------------------------ |
| maintenanceInterval | time.Duration | optional | 20h | Maintenance interval. Periodically cleans up expired acknowledgment data (for example, ack information for files that are no longer collected) |

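
A sketch that sets both ack fields together in defaults. The `maintenanceInterval` value is the documented default and is written out only for illustration.

!!! example

    ```yaml
    defaults:
      sources:
        - type: file
          ack:
            enable: true
            # clean up expired ack data every 20 hours
            maintenanceInterval: 20h
    ```
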
## db

Uses `sqlite3` as the database. It stores file names, file inodes, collection offsets, and similar information recorded during collection, so that the previous collection progress can be restored after a loggie reload or restart.

!!! caution

    This can only be configured in defaults.

!!! example

    ```yaml
    defaults:
      sources:
        - type: file
          db:
            file: "./data/loggie.db"
    ```

### file

| `Field` | `Type` | `Required` | `Default` | `Description` |
| ------ | ------ | ---------- | ---------------- | -------------- |
| file | string | optional | ./data/loggie.db | Path of the database file |

### tableName

| `Field` | `Type` | `Required` | `Default` | `Description` |
| --------- | ------ | ---------- | -------- | ------------ |
| tableName | string | optional | registry | Name of the database table |

### flushTimeout

| `Field` | `Type` | `Required` | `Default` | `Description` |
| ------------ | ------------- | ---------- | -------- | -------------------------- |
| flushTimeout | time.Duration | optional | 2s | Interval at which collection information is written to the database |

### bufferSize

| `Field` | `Type` | `Required` | `Default` | `Description` |
| ---------- | ------ | ---------- | -------- | -------------------------------- |
| bufferSize | int | optional | 2048 | Size of the buffer that holds collection information waiting to be written to the database |

### cleanInactiveTimeout

| `Field` | `Type` | `Required` | `Default` | `Description` |
| -------------------- | ------------- | ---------- | -------- | ------------------------------------------------------------ |
| cleanInactiveTimeout | time.Duration | optional | 504h | Cleans up expired data in the database. A record whose last update is older than this value is deleted. Records are kept for 21 days by default |

### cleanScanInterval

| `Field` | `Type` | `Required` | `Default` | `Description` |
| ----------------- | ------------- | ---------- | -------- | ----------------------------------------------------- |
| cleanScanInterval | time.Duration | optional | 1h | Interval at which the database is checked for expired data. The default is once every hour |

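
A sketch with every db field written out. All values are the documented defaults, so this block is equivalent to omitting `db` entirely; it is meant only as a starting point for tuning.

!!! example

    ```yaml
    defaults:
      sources:
        - type: file
          db:
            file: "./data/loggie.db"
            tableName: registry
            # write collection progress to the database every 2 seconds
            flushTimeout: 2s
            bufferSize: 2048
            # delete records not updated for 21 days (504h)
            cleanInactiveTimeout: 504h
            # look for expired records once an hour
            cleanScanInterval: 1h
    ```
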
## watcher

Settings for monitoring file changes.

!!! caution

    This can only be configured in defaults.

!!! example

    ```yaml
    defaults:
      sources:
        - type: file
          watcher:
            enableOsWatch: true
    ```

### enableOsWatch

| `Field` | `Type` | `Required` | `Default` | `Description` |
| ------------- | ------ | ---------- | -------- | ------------------------------------------------ |
| enableOsWatch | bool | optional | true | Whether to enable the OS file-change notification mechanism, such as inotify on Linux |

### scanTimeInterval

| `Field` | `Type` | `Required` | `Default` | `Description` |
| ---------------- | ------------- | ---------- | -------- | ------------------------------------------------------------ |
| scanTimeInterval | time.Duration | optional | 10s | Interval for periodically checking file state changes (for example, file creation and deletion). The default is every 10s |

### maintenanceInterval

| `Field` | `Type` | `Required` | `Default` | `Description` |
| ------------------- | ------------- | ---------- | -------- | ------------------------------------------------------------ |
| maintenanceInterval | time.Duration | optional | 5m | Interval for periodic maintenance work (for example, reporting collection statistics and cleaning up files). The default is every 5m |

### fdHoldTimeoutWhenInactive

| `Field` | `Type` | `Required` | `Default` | `Description` |
| ------------------------- | ------------- | ---------- | -------- | ------------------------------------------------------------ |
| fdHoldTimeoutWhenInactive | time.Duration | optional | 5m | When the time since a file was last collected exceeds this limit (the file has not been written to for a long time and most likely will not be written to again), its file handle is released to free system resources |

### fdHoldTimeoutWhenRemove

| `Field` | `Type` | `Required` | `Default` | `Description` |
| ----------------------- | ------------- | ---------- | -------- | ------------------------------------------------------------ |
| fdHoldTimeoutWhenRemove | time.Duration | optional | 5m | Maximum time to wait for collection to finish when a file is deleted before it has been fully collected. After the limit, the file handle is released and collection stops, whether or not the file was fully collected |

### maxOpenFds

| `Field` | `Type` | `Required` | `Default` | `Description` |
| ---------- | ------ | ---------- | -------- | -------------------------------------------------- |
| maxOpenFds | int | optional | 512 | Maximum number of open file handles. Files beyond this limit are not collected for the time being |

### maxEofCount

| `Field` | `Type` | `Required` | `Default` | `Description` |
| ----------- | ------ | ---------- | -------- | ------------------------------------------------------------ |
| maxEofCount | int | optional | 3 | Maximum number of consecutive EOFs encountered while reading a file. Beyond this limit, the file is considered temporarily inactive and is placed in a "zombie" queue until an update event reactivates it |

### cleanWhenRemoved

| `Field` | `Type` | `Required` | `Default` | `Description` |
| ---------------- | ------ | ---------- | -------- | ---------------------------------------------- |
| cleanWhenRemoved | bool | optional | true | When a file is deleted, whether to also delete its collection information from the db |

### readFromTail

| `Field` | `Type` | `Required` | `Default` | `Description` |
| ------------ | ------ | ---------- | -------- | ------------------------------------------------------------ |
| readFromTail | bool | optional | false | Whether to start collecting from the newest line of the file, ignoring content written earlier. Suitable for scenarios such as migrating from another collection system |

### taskStopTimeout

| `Field` | `Type` | `Required` | `Default` | `Description` |
| --------------- | ------------- | ---------- | -------- | ------------------------------------------------------------ |
| taskStopTimeout | time.Duration | optional | 30s | Timeout for a collection task to exit. It is a fallback that prevents a hung collection task from blocking reload |

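
A sketch of a watcher block for a migration scenario, where historical file content should be skipped. Only `readFromTail` and `maxOpenFds` differ from the documented defaults; the chosen values are assumptions for illustration, not recommendations.

!!! example

    ```yaml
    defaults:
      sources:
        - type: file
          watcher:
            enableOsWatch: true
            scanTimeInterval: 10s
            maintenanceInterval: 5m
            # release handles of files idle for 5 minutes
            fdHoldTimeoutWhenInactive: 5m
            # wait at most 5 minutes to finish reading a deleted file
            fdHoldTimeoutWhenRemove: 5m
            # allow more open files than the default of 512
            maxOpenFds: 1024
            # start from the newest line instead of the beginning of each file
            readFromTail: true
    ```
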
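### cleanFiles

File cleanup settings. Files that have expired and have been fully collected are deleted directly from disk to free disk space.

#### maxHistoryDays

| `Field` | `Type` | `Required` | `Default` | `Description` |
| -------------- | ------ | ---------- | -------- | ------------------------------------------------------------ |
| maxHistoryDays | int | optional | none | Maximum number of days to keep a file (after it has been fully collected). Beyond the limit, the file is deleted directly from disk. If not configured, files are never deleted |
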
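
A sketch of enabling file cleanup, assuming `cleanFiles` nests under `watcher` as the heading hierarchy suggests. The 7-day retention is an illustrative value.

!!! example

    ```yaml
    defaults:
      sources:
        - type: file
          watcher:
            cleanFiles:
              # delete fully collected files from disk after 7 days
              maxHistoryDays: 7
    ```

Because expired files are removed from disk, this is worth enabling only when loggie is allowed to manage the lifecycle of the log files.
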
## charset

Encoding conversion. It converts log content in other encodings to UTF-8.

!!! example

    ```yaml
    sources:
      - type: file
        name: demo
        paths:
          - /tmp/log/*.log
        fields:
          topic: "loggie"
        charset: "gbk"
    ```

| `Field` | `Type` | `Required` | `Default` | `Description` |
| ---------- | ----------- |--------|-------| -------- |
| charset | string | optional | utf-8 | Encoding of the collected files, which is converted to UTF-8 |

The following encodings can currently be converted to UTF-8:

- `nop`
- `plain`
- `utf-8`
- `gbk`
- `big5`
- `euc-jp`
- `iso2022-jp`
- `shift-jis`
- `euc-kr`
- `iso8859-6e`
- `iso8859-6i`
- `iso8859-8e`
- `iso8859-8i`
- `iso8859-1`
- `iso8859-2`
- `iso8859-3`
- `iso8859-4`
- `iso8859-5`
- `iso8859-6`
- `iso8859-7`
- `iso8859-8`
- `iso8859-9`
- `iso8859-10`
- `iso8859-13`
- `iso8859-14`
- `iso8859-15`
- `iso8859-16`
- `cp437`
- `cp850`
- `cp852`
- `cp855`
- `cp858`
- `cp860`
- `cp862`
- `cp863`
- `cp865`
- `cp866`
- `ebcdic-037`
- `ebcdic-1040`
- `ebcdic-1047`
- `koi8r`
- `koi8u`
- `macintosh`
- `macintosh-cyrillic`
- `windows1250`
- `windows1251`
- `windows1252`
- `windows1253`
- `windows1254`
- `windows1255`
- `windows1256`
- `windows1257`
- `windows1258`
- `windows874`
- `utf-16be-bom`
- `utf-16le-bom`

## lineDelimiter

Line delimiter settings.

!!! example

    ```yaml
    sources:
      - type: file
        name: demo
        lineDelimiter:
          type: carriage_return_line_feed
          value: "\r\n"
          charset: gbk
    ```

### type

| `Field` | `Type` | `Required` | `Default` | `Description` |
| ---------------- | ------ | ---------- |-------|----------------------------|
| type | string | optional | auto | Type of the line delimiter. `value` takes effect only when type is `custom` |

The following types are currently supported:

- `auto`
- `line_feed`
- `vertical_tab`
- `form_feed`
- `carriage_return`
- `carriage_return_line_feed`
- `next_line`
- `line_separator`
- `paragraph_separator`
- `null_terminator`

The corresponding line delimiters are:

```
auto: {'\u000A'},
line_feed: {'\u000A'},
vertical_tab: {'\u000B'},
form_feed: {'\u000C'},
carriage_return: {'\u000D'},
carriage_return_line_feed: []byte("\u000D\u000A"),
next_line: {'\u0085'},
line_separator: []byte("\u2028"),
paragraph_separator: []byte("\u2029"),
null_terminator: {'\u0000'},
```

### value

| `Field` | `Type` | `Required` | `Default` | `Description` |
| ------------ |--------| ---------- |-------|--------|
| value | string | optional | \n | Content of the line delimiter |

### charset

| `Field` | `Type` | `Required` | `Default` | `Description` |
| --------------- |--------| ---------- |-------|-------|
| charset | string | optional | utf-8 | Encoding of the line delimiter |

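
A sketch of a user-defined delimiter. It assumes that `custom` is accepted as a `type` value, as the `type` description implies, even though it is not in the list of built-in types; the `;` delimiter and the source name are hypothetical.

!!! example

    ```yaml
    sources:
      - type: file
        name: demo
        lineDelimiter:
          # assumption: "custom" makes loggie use the delimiter given in value
          type: custom
          value: ";"
          charset: utf-8
    ```
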