diff --git a/content/docs/3.0/reference/clients.md b/content/docs/3.0/reference/clients.md deleted file mode 100644 index f052438..0000000 --- a/content/docs/3.0/reference/clients.md +++ /dev/null @@ -1,4 +0,0 @@ ---- -title: Clients -draft: true ---- diff --git a/content/docs/3.0/reference/clients/_index.md b/content/docs/3.0/reference/clients/_index.md new file mode 100644 index 0000000..617ecfb --- /dev/null +++ b/content/docs/3.0/reference/clients/_index.md @@ -0,0 +1,7 @@ +--- +title: APIs +description: Interact with TiKV using the raw key-value API or the transactional key-value API +menu: + docs: + parent: Reference +--- \ No newline at end of file diff --git a/content/docs/3.0/concepts/apis.md b/content/docs/3.0/reference/clients/apis.md similarity index 100% rename from content/docs/3.0/concepts/apis.md rename to content/docs/3.0/reference/clients/apis.md diff --git a/content/docs/3.0/reference/clients/go-client-api.md b/content/docs/3.0/reference/clients/go-client-api.md new file mode 100644 index 0000000..e608433 --- /dev/null +++ b/content/docs/3.0/reference/clients/go-client-api.md @@ -0,0 +1,360 @@ +--- +title: Try Two Types of APIs +description: Learn how to use the Raw Key-Value API and the Transactional Key-Value API in TiKV. +menu: + docs: + parent: Clients +--- + +# Try Two Types of APIs + +To apply to different scenarios, TiKV provides [two types of APIs](../../overview.md#two-types-of-apis) for developers: the Raw Key-Value API and the Transactional Key-Value API. This document uses two examples to guide you through how to use the two APIs in TiKV. The usage examples are based on multiple nodes for testing. You can also quickly try the two types of APIs on a single machine. + +> **Warning:** Do not use these two APIs together in the same cluster, otherwise they might corrupt each other's data. + +## Try the Raw Key-Value API + +To use the Raw Key-Value API in applications developed in the Go language, take the following steps: + +1. Install the necessary packages. + + ```bash + export GO111MODULE=on + go mod init rawkv-demo + go get github.com/pingcap/tidb@master + ``` + +2. Import the dependency packages. + + ```go + import ( + "fmt" + "github.com/pingcap/tidb/config" + "github.com/pingcap/tidb/store/tikv" + ) + ``` + +3. Create a Raw Key-Value client. + + ```go + cli, err := tikv.NewRawKVClient([]string{"192.168.199.113:2379"}, config.Security{}) + ``` + + Description of two parameters in the above command: + + - `string`: a list of PD servers’ addresses + - `config.Security`: used to establish TLS connections, usually left empty when you do not need TLS + +4. Call the Raw Key-Value client methods to access the data on TiKV. The Raw Key-Value API contains the following methods, and you can also find them at [GoDoc](https://godoc.org/github.com/pingcap/tidb/store/tikv#RawKVClient). 
+ + ```go + type RawKVClient struct + func (c *RawKVClient) Close() error + func (c *RawKVClient) ClusterID() uint64 + func (c *RawKVClient) Delete(key []byte) error + func (c *RawKVClient) Get(key []byte) ([]byte, error) + func (c *RawKVClient) Put(key, value []byte) error + func (c *RawKVClient) Scan(startKey, endKey []byte, limit int) (keys [][]byte, values [][]byte, err error) + ``` + +### Usage example of the Raw Key-Value API + +```go +package main + +import ( + "fmt" + + "github.com/pingcap/tidb/config" + "github.com/pingcap/tidb/store/tikv" +) + +func main() { + cli, err := tikv.NewRawKVClient([]string{"192.168.199.113:2379"}, config.Security{}) + if err != nil { + panic(err) + } + defer cli.Close() + + fmt.Printf("cluster ID: %d\n", cli.ClusterID()) + + key := []byte("Company") + val := []byte("PingCAP") + + // put key into tikv + err = cli.Put(key, val) + if err != nil { + panic(err) + } + fmt.Printf("Successfully put %s:%s to tikv\n", key, val) + + // get key from tikv + val, err = cli.Get(key) + if err != nil { + panic(err) + } + fmt.Printf("found val: %s for key: %s\n", val, key) + + // delete key from tikv + err = cli.Delete(key) + if err != nil { + panic(err) + } + fmt.Printf("key: %s deleted\n", key) + + // get key again from tikv + val, err = cli.Get(key) + if err != nil { + panic(err) + } + fmt.Printf("found val: %s for key: %s\n", val, key) +} +``` + +The result is like: + +```bash +INFO[0000] [pd] create pd client with endpoints [192.168.199.113:2379] +INFO[0000] [pd] leader switches to: http://127.0.0.1:2379, previous: +INFO[0000] [pd] init cluster id 6554145799874853483 +cluster ID: 6554145799874853483 +Successfully put Company:PingCAP to tikv +found val: PingCAP for key: Company +key: Company deleted +found val: for key: Company +``` + +RawKVClient is a client of the TiKV server and only supports the GET/PUT/DELETE/SCAN commands. The RawKVClient can be safely and concurrently accessed by multiple goroutines, as long as it is not closed. Therefore, for one process, one client is enough generally. + +### Possible Error + +- If you see this error: + + ```bash + build rawkv-demo: cannot load github.com/pingcap/pd/pd-client: cannot find module providing package github.com/pingcap/pd/pd-client + ``` + + You can run `GO111MODULE=on go get -u github.com/pingcap/tidb@master` to fix it. + +- If you got this error when you run `go get -u github.com/pingcap/tidb@master`: + + ``` + go: github.com/golang/lint@v0.0.0-20190409202823-959b441ac422: parsing go.mod: unexpected module path "golang.org/x/lint" + ``` + + You can run `go mod edit -replace github.com/golang/lint=golang.org/x/lint@latest` to fix it. [Refer Link](https://github.com/golang/lint/issues/446#issuecomment-483638233) + +## Try the Transactional Key-Value API + +The Transactional Key-Value API is more complicated than the Raw Key-Value API. Some transaction related concepts are listed as follows. For more details, see the [KV package](https://github.com/pingcap/tidb/tree/master/kv). + +- Storage + + Like the RawKVClient, a Storage is an abstract TiKV cluster. + +- Snapshot + + A Snapshot is the state of a Storage at a particular point of time, which provides some readonly methods. The multiple times read from a same Snapshot is guaranteed consistent. + +- Transaction + + Like the transactions in SQL, a Transaction symbolizes a series of read and write operations performed within the Storage. Internally, a Transaction consists of a Snapshot for reads, and a MemBuffer for all writes. 
The default isolation level of a Transaction is Snapshot Isolation. + +To use the Transactional Key-Value API in applications developed by golang, take the following steps: + +1. Install the necessary packages. + + ```bash + export GO111MODULE=on + go mod init txnkv-demo + go get github.com/pingcap/tidb@master + ``` + +2. Import the dependency packages. + + ```go + import ( + "flag" + "fmt" + "os" + + "github.com/juju/errors" + "github.com/pingcap/tidb/kv" + "github.com/pingcap/tidb/store/tikv" + "github.com/pingcap/tidb/terror" + + goctx "golang.org/x/net/context" + ) + ``` + +3. Create Storage using a URL scheme. + + ```go + driver := tikv.Driver{} + storage, err := driver.Open("tikv://192.168.199.113:2379") + ``` + +4. (Optional) Modify the Storage using a Transaction. + + The lifecycle of a Transaction is: _begin → {get, set, delete, scan} → {commit, rollback}_. + +5. Call the Transactional Key-Value API's methods to access the data on TiKV. The Transactional Key-Value API contains the following methods: + + ```go + Begin() -> Txn + Txn.Get(key []byte) -> (value []byte) + Txn.Set(key []byte, value []byte) + Txn.Iter(begin, end []byte) -> Iterator + Txn.Delete(key []byte) + Txn.Commit() + ``` + +### Usage example of the Transactional Key-Value API + +```go +package main + +import ( + "flag" + "fmt" + "os" + + "github.com/juju/errors" + "github.com/pingcap/tidb/kv" + "github.com/pingcap/tidb/store/tikv" + "github.com/pingcap/tidb/terror" + + goctx "golang.org/x/net/context" +) + +type KV struct { + K, V []byte +} + +func (kv KV) String() string { + return fmt.Sprintf("%s => %s (%v)", kv.K, kv.V, kv.V) +} + +var ( + store kv.Storage + pdAddr = flag.String("pd", "192.168.199.113:2379", "pd address:192.168.199.113:2379") +) + +// Init initializes information. +func initStore() { + driver := tikv.Driver{} + var err error + store, err = driver.Open(fmt.Sprintf("tikv://%s", *pdAddr)) + terror.MustNil(err) +} + +// key1 val1 key2 val2 ... 
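// puts writes the given key/value pairs in a single transaction. Arguments are
// interpreted as alternating keys and values (key1, val1, key2, val2, ...);
// the writes are buffered in the transaction's MemBuffer and take effect
// atomically only when tx.Commit succeeds.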
+func puts(args ...[]byte) error { + tx, err := store.Begin() + if err != nil { + return errors.Trace(err) + } + + for i := 0; i < len(args); i += 2 { + key, val := args[i], args[i+1] + err := tx.Set(key, val) + if err != nil { + return errors.Trace(err) + } + } + err = tx.Commit(goctx.Background()) + if err != nil { + return errors.Trace(err) + } + + return nil +} + +func get(k []byte) (KV, error) { + tx, err := store.Begin() + if err != nil { + return KV{}, errors.Trace(err) + } + v, err := tx.Get(k) + if err != nil { + return KV{}, errors.Trace(err) + } + return KV{K: k, V: v}, nil +} + +func dels(keys ...[]byte) error { + tx, err := store.Begin() + if err != nil { + return errors.Trace(err) + } + for _, key := range keys { + err := tx.Delete(key) + if err != nil { + return errors.Trace(err) + } + } + err = tx.Commit(goctx.Background()) + if err != nil { + return errors.Trace(err) + } + return nil +} + +func scan(keyPrefix []byte, limit int) ([]KV, error) { + tx, err := store.Begin() + if err != nil { + return nil, errors.Trace(err) + } + it, err := tx.Iter(kv.Key(keyPrefix), nil) + if err != nil { + return nil, errors.Trace(err) + } + defer it.Close() + var ret []KV + for it.Valid() && limit > 0 { + ret = append(ret, KV{K: it.Key()[:], V: it.Value()[:]}) + limit-- + it.Next() + } + return ret, nil +} + +func main() { + pdAddr := os.Getenv("PD_ADDR") + if pdAddr != "" { + os.Args = append(os.Args, "-pd", pdAddr) + } + flag.Parse() + initStore() + + // set + err := puts([]byte("key1"), []byte("value1"), []byte("key2"), []byte("value2")) + terror.MustNil(err) + + // get + kv, err := get([]byte("key1")) + terror.MustNil(err) + fmt.Println(kv) + + // scan + ret, err := scan([]byte("key"), 10) + for _, kv := range ret { + fmt.Println(kv) + } + + // delete + err = dels([]byte("key1"), []byte("key2")) + terror.MustNil(err) +} +``` + +The result is like: + +```bash +INFO[0000] [pd] create pd client with endpoints [192.168.199.113:2379] +INFO[0000] [pd] leader switches to: http://192.168.199.113:2379, previous: +INFO[0000] [pd] init cluster id 6563858376412119197 +key1 => value1 ([118 97 108 117 101 49]) +key1 => value1 ([118 97 108 117 101 49]) +key2 => value2 ([118 97 108 117 101 50]) +``` diff --git a/content/docs/3.0/reference/configuration/_index.md b/content/docs/3.0/reference/configuration/_index.md new file mode 100644 index 0000000..86baa30 --- /dev/null +++ b/content/docs/3.0/reference/configuration/_index.md @@ -0,0 +1,7 @@ +--- +title: Configuration +description: How to configure TiKV +menu: + docs: + parent: Reference +--- \ No newline at end of file diff --git a/content/docs/3.0/reference/configuration/coprocessor-config.md b/content/docs/3.0/reference/configuration/coprocessor-config.md new file mode 100644 index 0000000..0f28bcf --- /dev/null +++ b/content/docs/3.0/reference/configuration/coprocessor-config.md @@ -0,0 +1,85 @@ +--- +title: TiKV Coprocessor Configuration +description: Learn how to configure Coprocessor in TiKV. +menu: + docs: + parent: Configuration +--- + +# TiKV Coprocessor Configuration + +Coprocessor is the component that handles most of the read requests from TiDB. Unlike Storage, it is more high-leveled that it not only fetches KV data but also does computing like filter or aggregation. TiKV is used as a distribution computing engine and Coprocessor is also used to reduce data serialization and traffic. This document describes how to configure TiKV Coprocessor. 
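TiKV reads these settings from its configuration file. As a quick orientation before the item-by-item reference below, here is a minimal sketch of the two sections this document covers. The values are illustrative only, not recommendations, and the defaults shipped with your TiKV version may differ:

```toml
# Thread pools that serve Coprocessor read requests
[readpool.coprocessor]
high-concurrency = 8
normal-concurrency = 8
low-concurrency = 8

# Coprocessor-related limits live under the server section
[server]
end-point-recursion-limit = 1000
end-point-request-max-handle-duration = "60s"
```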
## Configuration

Most Coprocessor configurations are in the `[readpool.coprocessor]` section and some are in the `[server]` section.

### `[readpool.coprocessor]`

There are three thread pools for handling high-priority, normal-priority, and low-priority requests respectively. TiDB point selects are high priority, range scans are normal priority, and background jobs such as table analyzing are low priority.

#### `high-concurrency`

- Specifies the thread pool size for handling high priority Coprocessor requests
- Default value: number of cores * 0.8 (> 8 cores) or 8 (<= 8 cores)
- Minimum value: 1
- It must be larger than zero but should not exceed the number of CPU cores of the host machine
- On a machine with more than 8 CPU cores, its default value is NUM_CPUS * 0.8. Otherwise it is 8
- If you are running multiple TiKV instances on the same machine, make sure that the sum of this configuration item across instances does not exceed the number of CPU cores. For example, assuming that you have a 48-core server running 3 TiKV instances, the `high-concurrency` value for each instance should be less than 16
- Do not set it too small, otherwise your read request QPS is limited. On the other hand, a larger value is not always the optimal choice because there might be more resource contention

#### `normal-concurrency`

- Specifies the thread pool size for handling normal priority Coprocessor requests
- Default value: number of cores * 0.8 (> 8 cores) or 8 (<= 8 cores)
- Minimum value: 1

#### `low-concurrency`

- Specifies the thread pool size for handling low priority Coprocessor requests
- Default value: number of cores * 0.8 (> 8 cores) or 8 (<= 8 cores)
- Minimum value: 1
- Generally, you don't need to ensure that the sum of high + normal + low is less than the number of CPU cores, because a single Coprocessor request is handled by only one of the pools

#### `max-tasks-per-worker-high`

- Specifies the max number of running operations for each thread in the high priority thread pool
- Default value: number of cores * 0.8 (> 8 cores) or 8 (<= 8 cores)
- Minimum value: 1
- Because throttling is performed at the thread-pool level rather than at the single-thread level, the max number of running operations for the whole pool is limited to `max-tasks-per-worker-high * high-concurrency`. If the number of running operations exceeds this limit, new operations are simply rejected without being handled, and the response contains an error header indicating that TiKV is busy
- Generally, you don't need to adjust this configuration unless you are following trustworthy advice

#### `max-tasks-per-worker-normal`

- Specifies the max running operations for each thread in the normal priority thread pool
- Default value: 2000
- Minimum value: 2000

#### `max-tasks-per-worker-low`

- Specifies the max running operations for each thread in the low priority thread pool
- Default value: 2000
- Minimum value: 2000

#### `stack-size`

- Sets the stack size for each thread in the three thread pools
- Default value: 10MB
- Minimum value: 2MB
- Large requests need a large stack to handle them.
Some Coprocessor requests are extremely large, change with caution + +### `[server]` + +#### `end-point-recursion-limit` + +- Sets the max allowed recursions when decoding Coprocessor DAG expressions +- Default value: 1000 +- Minimum value: 100 +- Smaller value might cause large Coprocessor DAG requests to fail + +#### `end-point-request-max-handle-duration` + +- Sets the max allowed waiting time for each request +- Default value: 60s +- Minimum value: 60s +- When there are many backlog Coprocessor requests, new requests might wait in queue. If the waiting time of a request exceeds this configuration, it is rejected with the TiKV busy error and is not handled \ No newline at end of file diff --git a/content/docs/3.0/reference/configuration/grpc-config.md b/content/docs/3.0/reference/configuration/grpc-config.md new file mode 100644 index 0000000..1d599dd --- /dev/null +++ b/content/docs/3.0/reference/configuration/grpc-config.md @@ -0,0 +1,51 @@ +--- +title: gRPC Configuration +description: Learn how to configure gRPC. +menu: + docs: + parent: Configuration +--- + +# gRPC Configuration + +TiKV uses gRPC, a remote procedure call (RPC) framework, to build a distributed transactional key-value database. gRPC is designed to be high-performance, but ill-configured gRPC leads to performance regression of TiKV. This document describes how to configure gRPC. + +## grpc-compression-type + +- Compression type for the gRPC channel +- Default: "none" +- Available values are “none”, “deflate” and “gzip” +- To exchange the CPU time for network I/O, you can set it to “deflate” or “gzip”. It is useful when the network bandwidth is limited + +## grpc-concurrency + +- The size of the thread pool that drives gRPC +- Default: 4. It is suitable for a commodity computer. You can double the size if TiKV is deployed in a high-end server (32 core+ CPU) +- Higher concurrency is for higher QPS, but it consumes more CPU + +## grpc-concurrent-stream + +- The number of max concurrent streams/requests on a connection +- Default: 1024. It is suitable for most workload +- Increase the number if you find that most of your requests are not time consuming, e.g., RawKV Get + +## grpc-keepalive-time + +- Time to wait before sending out a ping to check whether the server is still alive. This is only for the communication between TiKV instances +- Default: 10s + +## grpc-keepalive-timeout + +- Time to wait before closing the connection without receiving the `keepalive` ping ACK +- Default: 3s + +## grpc-raft-conn-num + +- The number of connections with each TiKV server to send Raft messages +- Default: 10 + +## grpc-stream-initial-window-size + +- Amount to Read Ahead on individual gRPC streams +- Default: 2MB +- Larger values can help throughput on high-latency connections \ No newline at end of file diff --git a/content/docs/3.0/reference/configuration/pd-scheduler-config.md b/content/docs/3.0/reference/configuration/pd-scheduler-config.md new file mode 100644 index 0000000..fcae739 --- /dev/null +++ b/content/docs/3.0/reference/configuration/pd-scheduler-config.md @@ -0,0 +1,102 @@ +--- +title: PD Scheduler Configuration +description: Learn how to configure PD Scheduler. +menu: + docs: + parent: Configuration +--- + +# PD Scheduler Configuration + +PD Scheduler is responsible for scheduling the storage and computing resources. PD has many kinds of schedulers to meet the requirements in different scenarios. PD Scheduler is one of the most important component in PD. + +The basic workflow of PD Scheduler is as follows. 
First, the scheduler is triggered according to `minAdjacentSchedulerInterval` defined in `ScheduleController`. Then it tries to select the source store and the target store, create the corresponding operators and send a message to TiKV to do some operations. + +## Usage description + +This section describes the usage of PD Scheduler parameters. + +### `max-merge-region-keys && max-merge-region-size` + +If the Region size is smaller than `max-merge-region-size` and the number of keys in the Region is smaller than `max-merge-region-keys` at the same time, the Region will try to merge with adjacent Regions. The default value of both the two parameters is 0. Currently, `merge` is not enabled by default. + +### `split-merge-interval` + +`split-merge-interval` is the minimum interval time to allow merging after split. The default value is "1h". + +### `max-snapshot-count` + +If the snapshot count of one store is larger than the value of `max-snapshot-count`, it will never be used as a source or target store. The default value is 3. + +### `max-pending-peer-count` + +If the pending peer count of one store is larger than the value of `max-pending-peer-count`, it will never be used as a source or target store. The default value is 16. + +### `max-store-down-time` + +`max-store-down-time` is the maximum duration after which a store is considered to be down if it has not reported heartbeats. The default value is “30m”. + +### `leader-schedule-limit` + +`leader-schedule-limit` is the maximum number of coexistent leaders that are under scheduling. The default value is 4. + +### `region-schedule-limit` + +`region-schedule-limit` is the maximum number of coexistent Regions that are under scheduling. The default value is 4. + +### `replica-schedule-limit` + +`replica-schedule-limit` is the maximum number of coexistent replicas that are under scheduling. The default value is 8. + +### `merge-schedule-limit` + +`merge-schedule-limit` is the maximum number of coexistent merges that are under scheduling. The default value is 8. + +### `tolerant-size-ratio` + +`tolerant-size-ratio` is the ratio of buffer size for the balance scheduler. The default value is 5.0. + +### `low-space-ratio` + +`low-space-ratio` is the lowest usage ratio of a storage which can be regarded as low space. When a storage is in low space, the score turns to be high and varies inversely with the available size. + +### `high-space-ratio` + +`high-space-ratio` is the highest usage ratio of storage which can be regarded as high space. High space means there is a lot of available space of the storage, and the score varies directly with the used size. + +### `disable-raft-learner` + +`disable-raft-learner` is the option to disable `AddNode` and use `AddLearnerNode` instead. + +### `disable-remove-down-replica` + +`disable-remove-down-replica` is the option to prevent replica checker from removing replicas whose status are down. + +### `disable-replace-offline-replica` + +`disable-replace-offline-replica` is the option to prevent the replica checker from replacing offline replicas. + +### `disable-make-up-replica` + +`disable-make-up-replica` is the option to prevent the replica checker from making up replicas when the count of replicas is less than expected. + +### `disable-remove-extra-replica` + +`disable-remove-extra-replica` is the option to prevent the replica checker from removing extra replicas. 
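The options described above live in the `[schedule]` section of the PD configuration file, and most of them can also be changed at runtime through `pd-ctl` (see the `config` command in the PD Control guide). A minimal sketch for orientation only, using the defaults stated above:

```toml
[schedule]
# 0 keeps Region merge disabled, matching the defaults described above
max-merge-region-size = 0
max-merge-region-keys = 0
split-merge-interval = "1h"
max-snapshot-count = 3
max-pending-peer-count = 16
max-store-down-time = "30m"
leader-schedule-limit = 4
region-schedule-limit = 4
replica-schedule-limit = 8
merge-schedule-limit = 8
tolerant-size-ratio = 5.0
```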
+ +### `disable-location-replacement` + +`disable-location-replacement` is the option to prevent the replica checker from moving the replica to a better location. + +## Customization + +The default schedulers include `balance-leader`, `balance-region` and `hot-region`. In addition, you can also customize the schedulers. For each scheduler, the configuration has three variables: `type`, `args` and `disable`. + +Here is an example to enable the `evict-leader` scheduler in the `config.toml` file: + +``` +[[schedule.schedulers]] +type = "evict-leader" +args = ["1"] +disable = false +``` \ No newline at end of file diff --git a/content/docs/3.0/reference/configuration/raftstore-config.md b/content/docs/3.0/reference/configuration/raftstore-config.md new file mode 100644 index 0000000..375b6ab --- /dev/null +++ b/content/docs/3.0/reference/configuration/raftstore-config.md @@ -0,0 +1,70 @@ +--- +title: Raftstore Configuration +description: Learn about Raftstore configuration in TiKV. +menu: + docs: + parent: Configuration +--- + +# Raftstore Configurations + +Raftstore is TiKV's implementation of [Multi-raft](https://tikv.org/deep-dive/scalability/multi-raft/) to manage multiple Raft peers on one node. Raftstore is comprised of two major components: + +- **Raftstore** component writes Raft logs into RaftDB. +- **Apply** component resolves Raft logs and flush the data in the log into the underlying storage engine. + +This document introduces the following features of Raftstore and their configurations: + +- [Multi-thread Raftstore](#multi-thread-raftstore) +- [Hibernate Region](#hibernate-region) + +## Multi-thread Raftstore + + Multi-thread support for the Raftstore and the Apply components means higher throughput and lower latency per each single node. In the multi-thread mode, each thread obtains peers from the queue in batch, so that small writes of multiple peers can be consolidated into a big write for better throughput. + +![Multi-thread Raftstore Model](../../images/multi-thread-raftstore.png) + +> **Note:** +> +> In the multi-thread mode, peers are obtained in batch, so pressure from hot write Regions cannot be scattered evenly to each CPU. For better load balancing, it is recommended you use smaller batch sizes. + +### Configuration items + +You can specify the following items in the TiKV configuration file to configure multi-thread Raftstore: + +**`raftstore.store_max_batch_size`** + +Determines the maximum number of peers that a single thread can obtain in a batch. The value must be a positive integer. A smaller value provides better load balancing for CPU, but may cause more frequent writes. + +**`raftstore.store_pool_size`** + +Determines the number of threads to process peers in batch. The value must be a positive integer. For better performance, it is recommended that you set a value less than or equal to the number of CPU cores on your machine. + +**`raftstore.apply_max_batch_size`** + +Determines the maximum number of ApplyDelegates requests that a single thread can resolve in a batch. The value must be a positive integer. A smaller value provides better load balancing for CPU, but may cause more frequent writes. + +**`raftstore.apply_pool_size`** + +Determines the number of threads. The value must be a positive integer. For better performance, it is recommended that you set a value less than or equal to the number of CPU cores on your machine. 
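For reference, here is a minimal sketch of how these items might look in the TiKV configuration file. The hyphenated TOML spelling and the values shown are assumptions for illustration, not recommendations; check the default configuration shipped with your TiKV version:

```toml
[raftstore]
# Threads that drive Raft peers, and how many peers each thread takes per batch
store-pool-size = 2
store-max-batch-size = 256
# Threads that apply committed Raft logs, and the batch size per thread
apply-pool-size = 2
apply-max-batch-size = 256
```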
+ +## Hibernate Region + +Hibernate Region is a Raftstore feature to reduce the extra overhead caused by heartbeat messages between the Raft leader and the followers for idle Regions. With this feature enabled, a Region idle for a long time is automatically set as hibernated. The heartbeat interval for the leader to maintain its lease becomes much longer, and the followers do not initiate elections simply because they cannot receive heartbeats from the leader. + +> **Note:** +> +> - Hibernate Region is still an Experimental feature and is disabled by default. +> - Any requests from the client or disconnections will activate the Region from the hibernated state. + +### Configuration items + +You can specify the following items in the TiKV configuration file to configure Hibernate Region: + +**`raftstore.hibernate-regions`** + +Enables or disables Hibernate Region. Possible values are true and false. The default value is false. + +**`raftstore.peer_stale_state_check_interval`** + +Modifies the state check interval for hibernated Regions. The default value is 5 minutes. This value also determines the heartbeat interval between the leader and followers of the hibernated Regions. \ No newline at end of file diff --git a/content/docs/3.0/reference/configuration/rocksdb-option-config.md b/content/docs/3.0/reference/configuration/rocksdb-option-config.md new file mode 100644 index 0000000..8ecd2bd --- /dev/null +++ b/content/docs/3.0/reference/configuration/rocksdb-option-config.md @@ -0,0 +1,383 @@ +--- +title: RocksDB Option Configuration +description: Learn how to configure RocksDB options. +menu: + docs: + parent: Configuration +--- + +# RocksDB Option Configuration + +TiKV uses RocksDB as its underlying storage engine for storing both Raft logs and KV (key-value) pairs. [RocksDB](https://github.com/facebook/rocksdb/wiki) is a highly customizable persistent key-value store that can be tuned to run on a variety of production environments, including pure memory, Flash, hard disks or HDFS. It supports various compression algorithms and good tools for production support and debugging. + +## Configuration + +TiKV creates two RocksDB instances called `rocksdb` and `raftdb` separately. + +- `rocksdb` has three column families: + + - `rocksdb.defaultcf` is used to store actual KV pairs of TiKV + - `rocksdb.writecf` is used to store the commit information in the MVCC model + - `rocksdb.lockcf` is used to store the lock information in the MVCC model + +- `raftdb` has only one column family called `raftdb.defaultcf`, which is used to store the Raft logs. + +Each RocksDB instance and column family is configurable. Below explains the details of DBOptions for tuning the RocksDB instance and CFOptions for tuning the column family. + +### DBOptions + +#### max-background-jobs + +- The maximum number of concurrent background jobs (compactions and flushes) + +#### max-sub-compactions + +- The maximum number of threads that will concurrently perform a compaction job by breaking the job into multiple smaller ones that run simultaneously + +#### max-open-files + +- The number of open files that can be used by RocksDB. You may need to increase this if your database has a large working set +- Value -1 means files opened are always kept open. 
You can estimate the number of files based on `target_file_size_base` and `target_file_size_multiplier` for level-based compaction
- If max-open-files = -1, RocksDB prefetches index blocks and filter blocks into the block cache at startup, so if your database has a large working set, it can take several minutes to open RocksDB

#### max-manifest-file-size

- The maximum size of RocksDB's MANIFEST file. For details, see [MANIFEST](https://github.com/facebook/rocksdb/wiki/MANIFEST)

#### create-if-missing

- If it is true, the database will be created when it is missing

#### wal-recovery-mode

RocksDB WAL (write-ahead log) recovery mode:

- `0`: TolerateCorruptedTailRecords, tolerates incomplete trailing records in any log
- `1`: AbsoluteConsistency, tolerates no corruption in the WAL (any I/O error is treated as corruption)
- `2`: PointInTimeRecovery, recovers to point-in-time consistency
- `3`: SkipAnyCorruptedRecords, recovery after a disaster

#### wal-dir

- RocksDB write-ahead logs directory path. This specifies the absolute directory path for write-ahead logs
- If it is empty, the log files will be in the same directory as data
- When you set the path to the RocksDB directory in memory, such as `/dev/shm`, you may want to set `wal-dir` to a directory on persistent storage. For details, see [RocksDB documentation](https://github.com/facebook/rocksdb/wiki/How-to-persist-in-memory-RocksDB-database)

#### wal-ttl-seconds

See [wal-size-limit](#wal-size-limit)

#### wal-size-limit

`wal-ttl-seconds` and `wal-size-limit` affect how archived write-ahead logs are deleted:

- If both are set to 0, logs are deleted immediately and do not get into the archive
- If `wal-ttl-seconds` is 0 and `wal-size-limit` is not 0, WAL files are checked every 10 minutes, and if the total size is greater than `wal-size-limit`, they are deleted starting from the earliest file until the total size is no longer greater than `wal-size-limit`. All empty files are deleted
- If `wal-ttl-seconds` is not 0 and `wal-size-limit` is 0, WAL files are checked every `wal-ttl-seconds / 2`, and those older than `wal-ttl-seconds` are deleted
- If both are not 0, WAL files are checked every 10 minutes, and both the TTL and the size checks are performed, with the TTL check first
- When you set the path to the RocksDB directory in memory, such as `/dev/shm`, you may want to set `wal-ttl-seconds` to a value greater than 0 (such as 86400) and back up RocksDB on a regular basis. For details, see [RocksDB documentation](https://github.com/facebook/rocksdb/wiki/How-to-persist-in-memory-RocksDB-database)

#### wal-bytes-per-sync

- Allows the OS to incrementally synchronize the WAL to the disk while the log is being written

#### max-total-wal-size

- Once the total size of write-ahead logs exceeds this size, RocksDB starts forcing the flush of column families whose memtables are backed by the oldest live WAL file
- If it is set to 0, the limit is dynamically set to [sum of all write_buffer_size * max_write_buffer_number] * 4

#### enable-statistics

- RocksDB statistics provide cumulative statistics over time.
Turning statistics on will introduce about 5%-10% overhead for RocksDB, but it is worthwhile to know the internal status of RocksDB + +#### stats-dump-period + +- Dumps statistics periodically in information logs + +#### compaction-readahead-size + +- According to [RocksDB FAQ](https://github.com/facebook/rocksdb/wiki/RocksDB-FAQ): if you want to use RocksDB on multi disks or spinning disks, you should set this value to at least 2MB + +#### writable-file-max-buffer-size + +- The maximum buffer size that is used by `WritableFileWrite` + +#### use-direct-io-for-flush-and-compaction + +- Uses `O_DIRECT` for both reads and writes in background flush and compactions + +#### rate-bytes-per-sec + +- Limits the disk I/O of compaction and flush +- Compaction and flush can cause terrible spikes if they exceed a certain threshold. It is recommended to set this to 50% ~ 80% of the disk throughput for a more stable result. But for heavy write workload, limiting compaction and flush speed can cause write stalls too + +#### enable-pipelined-write + +- Enables/Disables the pipelined write. For details, see [Pipelined Write](https://github.com/facebook/rocksdb/wiki/Pipelined-Write) + +#### bytes-per-sync + +- Allows OS to incrementally synchronize files to the disk while the files are being written asynchronously in the background + +#### info-log-max-size + +- Specifies the maximum size of the RocksDB log file +- If the log file is larger than `max_log_file_size`, a new log file will be created +- If max_log_file_size == 0, all logs will be written to one log file + +#### info-log-roll-time + +- Time for the RocksDB log file to roll (in seconds) +- If it is specified with non-zero value, the log file will be rolled when its active time is longer than `log_file_time_to_roll` + +#### info-log-keep-log-file-num + +- The maximum number of RocksDB log files to be kept + +#### info-log-dir + +- Specifies the RocksDB info log directory +- If it is empty, the log files will be in the same directory as data +- If it is non-empty, the log files will be in the specified directory, and the absolute path of RocksDB data directory will be used as the prefix of the log file name + +### CFOptions + +#### compression-per-level + +- Per level compression. The compression method (if any) is used to compress a block + + - no: kNoCompression + - snappy: kSnappyCompression + - zlib: kZlibCompression + - bzip2: kBZip2Compression + - lz4: kLZ4Compression + - lz4hc: kLZ4HCCompression + - zstd: kZSTD + +- For details, see [Compression of RocksDB](https://github.com/facebook/rocksdb/wiki/Compression) + +#### block-size + +- Approximate size of user data packed per block. The block size specified here corresponds to the uncompressed data + +#### bloom-filter-bits-per-key + +- If you're doing point lookups, you definitely want to turn bloom filters on. Bloom filter is used to avoid unnecessary disk read +- Default: 10, which yields ~1% false positive rate +- Larger values will reduce false positive rate, but will increase memory usage and space amplification + +#### block-based-bloom-filter + +- False: one `sst` file has a corresponding bloom filter +- True: every block has a corresponding bloom filter + +#### level0-file-num-compaction-trigger + +- The number of files to trigger level-0 compaction +- A value less than 0 means that level-0 compaction will not be triggered by the number of files + +#### level0-slowdown-writes-trigger + +- Soft limit on the number of level-0 files. 
The write performance is slowed down at this point

#### level0-stop-writes-trigger

- The maximum number of level-0 files. Write operations are stopped at this point

#### write-buffer-size

- The amount of data to build up in memory (backed by an unsorted log on the disk) before it is converted to a sorted on-disk file

#### max-write-buffer-number

- The maximum number of write buffers that are built up in memory

#### min-write-buffer-number-to-merge

- The minimum number of write buffers that will be merged together before writing to the storage

#### max-bytes-for-level-base

- Controls the maximum total data size for the base level (level 1)

#### target-file-size-base

- Target file size for compaction

#### max-compaction-bytes

- The maximum total bytes of a single compaction job (RocksDB's `max_compaction_bytes`)

#### compaction-pri

There are four different algorithms to pick files to compact:

- `0`: ByCompensatedSize
- `1`: OldestLargestSeqFirst
- `2`: OldestSmallestSeqFirst
- `3`: MinOverlappingRatio

#### block-cache-size

- Caches uncompressed blocks
- A big block cache can speed up read performance. Generally, this should be set to 30%-50% of the system's total memory

#### cache-index-and-filter-blocks

- Indicates whether index/filter blocks will be put into the block cache
- If it is not specified, each "table reader" object pre-loads the index/filter blocks during table initialization

#### pin-l0-filter-and-index-blocks

- Pins level-0 filter and index blocks in the cache

#### read-amp-bytes-per-bit

- Enables read amplification statistics
- value => memory usage (percentage of loaded blocks memory)
  - 0 => disabled
  - 1 => 12.50 %
  - 2 => 06.25 %
  - 4 => 03.12 %
  - 8 => 01.56 %
  - 16 => 00.78 %

#### dynamic-level-bytes

- Picks the target size of each level dynamically
- This feature can reduce space amplification. It is highly recommended to set it to true. For details, see [Dynamic Level Size for Level-Based Compaction](https://rocksdb.org/blog/2015/07/23/dynamic-level.html)

## Template

This template shows the default RocksDB configuration for TiKV:

```
[rocksdb]
max-background-jobs = 8
max-sub-compactions = 1
max-open-files = 40960
max-manifest-file-size = "20MB"
create-if-missing = true
wal-recovery-mode = 2
wal-dir = "/tmp/tikv/store"
wal-ttl-seconds = 0
wal-size-limit = 0
max-total-wal-size = "4GB"
enable-statistics = true
stats-dump-period = "10m"
compaction-readahead-size = 0
writable-file-max-buffer-size = "1MB"
use-direct-io-for-flush-and-compaction = false
rate-bytes-per-sec = 0
enable-pipelined-write = true
bytes-per-sync = "0MB"
wal-bytes-per-sync = "0KB"
info-log-max-size = "1GB"
info-log-roll-time = "0"
info-log-keep-log-file-num = 10
info-log-dir = ""

# Column Family default is used to store the actual data of the database.
+[rocksdb.defaultcf] +compression-per-level = ["no", "no", "lz4", "lz4", "lz4", "zstd", "zstd"] +block-size = "64KB" +bloom-filter-bits-per-key = 10 +block-based-bloom-filter = false +level0-file-num-compaction-trigger = 4 +level0-slowdown-writes-trigger = 20 +level0-stop-writes-trigger = 36 +write-buffer-size = "128MB" +max-write-buffer-number = 5 +min-write-buffer-number-to-merge = 1 +max-bytes-for-level-base = "512MB" +target-file-size-base = "8MB" +max-compaction-bytes = "2GB" +compaction-pri = 3 +block-cache-size = "1GB" +cache-index-and-filter-blocks = true +pin-l0-filter-and-index-blocks = true +read-amp-bytes-per-bit = 0 +dynamic-level-bytes = true + +# Options for Column Family write +# Column Family write used to store commit information in MVCC model +[rocksdb.writecf] +compression-per-level = ["no", "no", "lz4", "lz4", "lz4", "zstd", "zstd"] +block-size = "64KB" +write-buffer-size = "128MB" +max-write-buffer-number = 5 +min-write-buffer-number-to-merge = 1 +max-bytes-for-level-base = "512MB" +target-file-size-base = "8MB" +# In normal cases it should be tuned to 10%-30% of the system's total memory. +block-cache-size = "256MB" +level0-file-num-compaction-trigger = 4 +level0-slowdown-writes-trigger = 20 +level0-stop-writes-trigger = 36 +cache-index-and-filter-blocks = true +pin-l0-filter-and-index-blocks = true +compaction-pri = 3 +read-amp-bytes-per-bit = 0 +dynamic-level-bytes = true + +[rocksdb.lockcf] +compression-per-level = ["no", "no", "no", "no", "no", "no", "no"] +block-size = "16KB" +write-buffer-size = "128MB" +max-write-buffer-number = 5 +min-write-buffer-number-to-merge = 1 +max-bytes-for-level-base = "128MB" +target-file-size-base = "8MB" +block-cache-size = "256MB" +level0-file-num-compaction-trigger = 1 +level0-slowdown-writes-trigger = 20 +level0-stop-writes-trigger = 36 +cache-index-and-filter-blocks = true +pin-l0-filter-and-index-blocks = true +compaction-pri = 0 +read-amp-bytes-per-bit = 0 +dynamic-level-bytes = true + +[raftdb] +max-sub-compactions = 1 +max-open-files = 40960 +max-manifest-file-size = "20MB" +create-if-missing = true +enable-statistics = true +stats-dump-period = "10m" +compaction-readahead-size = 0 +writable-file-max-buffer-size = "1MB" +use-direct-io-for-flush-and-compaction = false +enable-pipelined-write = true +allow-concurrent-memtable-write = false +bytes-per-sync = "0MB" +wal-bytes-per-sync = "0KB" +info-log-max-size = "1GB" +info-log-roll-time = "0" +info-log-keep-log-file-num = 10 +info-log-dir = "" + +[raftdb.defaultcf] +compression-per-level = ["no", "no", "lz4", "lz4", "lz4", "zstd", "zstd"] +block-size = "64KB" +write-buffer-size = "128MB" +max-write-buffer-number = 5 +min-write-buffer-number-to-merge = 1 +max-bytes-for-level-base = "512MB" +target-file-size-base = "8MB" +# should tune to 256MB~2GB. +block-cache-size = "256MB" +level0-file-num-compaction-trigger = 4 +level0-slowdown-writes-trigger = 20 +level0-stop-writes-trigger = 36 +cache-index-and-filter-blocks = true +pin-l0-filter-and-index-blocks = true +compaction-pri = 0 +read-amp-bytes-per-bit = 0 +dynamic-level-bytes = true +``` \ No newline at end of file diff --git a/content/docs/3.0/reference/configuration/security-config.md b/content/docs/3.0/reference/configuration/security-config.md new file mode 100644 index 0000000..b16b3bb --- /dev/null +++ b/content/docs/3.0/reference/configuration/security-config.md @@ -0,0 +1,23 @@ +--- +title: TiKV Security Configuration +description: Learn about the security configuration in TiKV. 
+menu: + docs: + parent: Configuration +--- + +# TiKV Security Configuration + +TiKV has SSL/TLS integration to encrypt the data exchanged between nodes. This document describes the security configuration in the TiKV cluster. + +## ca-path = "/path/to/ca.pem" + +The path to the file that contains the PEM encoding of the server’s CA certificates. + +## cert-path = "/path/to/cert.pem" + +The path to the file that contains the PEM encoding of the server’s certificate chain. + +## key-path = "/path/to/key.pem" + +The path to the file that contains the PEM encoding of the server’s private key. \ No newline at end of file diff --git a/content/docs/3.0/reference/configuration/storage-config.md b/content/docs/3.0/reference/configuration/storage-config.md new file mode 100644 index 0000000..f86b8f4 --- /dev/null +++ b/content/docs/3.0/reference/configuration/storage-config.md @@ -0,0 +1,110 @@ +--- +title: TiKV Storage Configuration +description: Learn how to configure TiKV Storage. +menu: + docs: + parent: Configuration +--- + +# TiKV Storage Configuration + +In TiKV, Storage is the component responsible for handling read and write requests. Note that if you are using TiKV with TiDB, most read requests are handled by the Coprocessor component instead of Storage. + +## Configuration + +There are two sections related to Storage: `[readpool.storage]` and `[storage]`. + +### `[readpool.storage]` + +This configuration section mainly affects storage read operations. Most read requests from TiDB are not controlled by this configuration section. For configuring the read requests from TiDB, see [Coprocessor configurations](coprocessor-config.md). + +There are 3 thread pools for handling read operations, namely read-high, read-normal and read-low, which process high-priority, normal-priority and low-priority read requests respectively. The priority can be specified by corresponding fields in the gRPC request. + +#### `high-concurrency` + +- Specifies the thread pool size for handling high priority requests +- Default value: 4. It means at most 4 CPU cores are used +- Minimum value: 1 +- It must be larger than zero but should not exceed the number of CPU cores of the host machine +- If you are running multiple TiKV instances on the same machine, make sure that the sum of this configuration item does not exceed number of CPU cores. For example, assuming that you have a 48 core server running 3 TiKVs, then the `high-concurrency` value for each instance should be less than 16 +- Do not set this configuration item to a too small value, otherwise your read request QPS is limited. On the other hand, larger value is not always the most optimal choice because there could be larger resource contention + + +#### `normal-concurrency` + +- Specifies the thread pool size for handling normal priority requests +- Default value: 4 +- Minimum value: 1 + +#### `low-concurrency` + +- Specifies the thread pool size for handling low priority requests +- Default value: 4 +- Minimum value: 1 +- Generally, you don’t need to ensure that the sum of high + normal + low < number of CPU cores, because a single request is handled by only one of them + +#### `max-tasks-per-worker-high` + +- Specifies the max number of running operations for each thread in the read-high thread pool, which handles high priority read requests. 
Because a throttle of the thread-pool level instead of single thread level is performed, the max number of running operations for the read-high thread pool is limited to `max-tasks-per-worker-high * high-concurrency` +- Default value: 2000 +- Minimum value: 2000 +- If the number of running operations exceeds this configuration, new operations are simply rejected without being handled and it will contain an error header telling that TiKV is busy +- Generally, you don’t need to adjust this configuration unless you are following trustworthy advice + +#### `max-tasks-per-worker-normal` + +- Specifies the max running operations for each thread in the read-normal thread pool, which handles normal priority read requests. +- Default value: 2000 +- Minimum value: 2000 + +#### `max-tasks-per-worker-low` + +- Specifies the max running operations for each thread in the read-low thread pool, which handles low priority read requests +- Default value: 2000 +- Minimum value: 2000 + +#### `stack-size` + +- Sets the stack size for each thread in the three thread pools. For large requests, you need a large stack to handle +- Default value: 10MB +- Minimum value: 2MB + +### `[storage]` + +This configuration section mainly affects storage write operations, including where data is stored and the TiKV component Scheduler. Scheduler is the core component in Storage that coordinates and processes write requests. It contains a channel to coordinate requests and a thread pool to process requests. + +#### `data-dir` + +- Specifies the path to the data directory +- Default value: /tmp/tikv/store +- Make sure that the data directory is moved before changing this configuration + +#### `scheduler-notify-capacity` + +- Specifies the Scheduler channel size +- Default value: 10240 +- Do not set it too small, otherwise TiKV might crash +- Do not set it too large, because it might consume more memory +- Generally, you don’t need to adjust this configuration unless you are following trustworthy advice + +#### `scheduler-concurrency` + +- Specifies the number of slots of Scheduler’s latch, which controls concurrent write requests +- Default value: 2048000 +- You can set it to a larger value to reduce latch contention if there are a lot of write requests. But it will consume more memory + +#### `scheduler-worker-pool-size` + +- Specifies the Scheduler’s thread pool size. Write requests are finally handled by each worker thread of this thread pool +- Default value: 8 (>= 16 cores) or 4 (< 16 cores) +- Minimum value: 1 +- This configuration must be set larger than zero but should not exceed the number of CPU cores of the host machine +- On machines with more than 16 CPU cores, the default value of this configuration is 8, otherwise 4 +- If you have heavy write requests, you can set this configuration to a larger value. If you are running multiple TiKV instances on the same machine, make sure that the sum of this configuration item does not exceed the number of CPU cores +- You should not set this configuration item to a too small value, otherwise your write request QPS is limited. 
On the other hand, a larger value is not always the most optimal choice because there could be larger resource contention + +#### `scheduler-pending-write-threshold` + +- Specifies the maximum allowed byte size of pending writes +- Default value: 100MB +- If the size of pending write bytes exceeds this threshold, new requests are simply rejected with the “scheduler too busy” error and not be handled \ No newline at end of file diff --git a/content/docs/3.0/reference/tools.md b/content/docs/3.0/reference/tools.md deleted file mode 100644 index a8a47ed..0000000 --- a/content/docs/3.0/reference/tools.md +++ /dev/null @@ -1,7 +0,0 @@ ---- -title: Tools -draft: true -menu: - docs: - parent: Reference ---- diff --git a/content/docs/3.0/reference/tools/_index.md b/content/docs/3.0/reference/tools/_index.md new file mode 100644 index 0000000..6feefc7 --- /dev/null +++ b/content/docs/3.0/reference/tools/_index.md @@ -0,0 +1,7 @@ +--- +title: Tools +description: Tools which can be used to administrate TiKV +menu: + docs: + parent: Reference +--- \ No newline at end of file diff --git a/content/docs/3.0/reference/tools/pd-control.md b/content/docs/3.0/reference/tools/pd-control.md new file mode 100644 index 0000000..79f0a1e --- /dev/null +++ b/content/docs/3.0/reference/tools/pd-control.md @@ -0,0 +1,933 @@ +--- +title: PD Control User Guide +description: Use PD Control to obtain the state information of a cluster and tune a cluster. +menu: + docs: + parent: Tools +--- + +# PD Control User Guide + +As a command line tool of PD, PD Control obtains the state information of the cluster and tunes the cluster. + +## Source code compiling + +1. [Go](https://golang.org/) Version 1.11 or later +2. In the root directory of the [PD project](https://github.com/pingcap/pd), use the `make` command to compile and generate `bin/pd-ctl` + +> **Note:** Generally, you do not need to compile source code because the PD Control tool already exists in the released Binary or Docker. For developer users, the `make` command can be used to compile source code. + +## Usage + +Single-command mode: + +```bash +./pd-ctl store -d -u http://127.0.0.1:2379 +``` + +Interactive mode: + +```bash +./pd-ctl -u http://127.0.0.1:2379 +``` + +Use environment variables: + +```bash +export PD_ADDR=http://127.0.0.1:2379 +./pd-ctl +``` + +Use TLS to encrypt: + +```bash +./pd-ctl -u https://127.0.0.1:2379 --cacert="path/to/ca" --cert="path/to/cert" --key="path/to/key" +``` + +## Command line flags + +### \-\-pd,-u + ++ PD address ++ Default address: http://127.0.0.1:2379 ++ Environment variable: PD_ADDR + +### \-\-detach,-d + ++ Use single command line mode (not entering readline) ++ Default: false + +### --cacert + ++ Specify the path to the certificate file of the trusted CA in PEM format ++ Default: "" + +### --cert + ++ Specify the path to the certificate of SSL in PEM format ++ Default: "" + +### --key + ++ Specify the path to the certificate key file of SSL in PEM format, which is the private key of the certificate specified by `--cert` ++ Default: "" + +### --version,-V + ++ Print the version information and exit ++ Default: false + +## Command + +### `cluster` + +Use this command to view the basic information of the cluster. + +Usage: + +```bash +>> cluster // To show the cluster information +{ + "id": 6493707687106161130, + "max_peer_count": 3 +} +``` + +### `config [show | set