Added generated API doc (#747)

Yinan Li 2019-12-19 14:34:49 -08:00 committed by GitHub
parent 984dec3a1c
commit b2a326ecaf
10 changed files with 2720 additions and 180 deletions


@@ -36,3 +36,10 @@ install-sparkctl: | sparkctl/sparkctl-darwin-amd64 sparkctl/sparkctl-linux-amd64
	else \
		echo "$(UNAME) not supported"; \
	fi

build-api-docs:
	hack/api-ref-docs \
		-config hack/api-docs-config.json \
		-api-dir github.com/GoogleCloudPlatform/spark-on-k8s-operator/pkg/apis/sparkoperator.k8s.io/v1beta2 \
		-template-dir hack/api-docs-template \
		-out-file docs/api-docs.md


@@ -57,7 +57,7 @@ Get started quickly with the Kubernetes Operator for Apache Spark using the [Qui
If you are running the Kubernetes Operator for Apache Spark on Google Kubernetes Engine and want to use Google Cloud Storage (GCS) and/or BigQuery for reading/writing data, also refer to the [GCP guide](docs/gcp.md).
For more information, check the [Design](docs/design.md), [API Specification](docs/api.md) and detailed [User Guide](docs/user-guide.md).
For more information, check the [Design](docs/design.md), [API Specification](docs/api-docs.md) and detailed [User Guide](docs/user-guide.md).
## Overview

docs/api-docs.md (new file, 2519 lines)

File diff suppressed because it is too large.


@@ -1,179 +0,0 @@
# SparkApplication API
The Kubernetes Operator for Apache Spark uses [CustomResourceDefinitions](https://kubernetes.io/docs/concepts/api-extension/custom-resources/) named `SparkApplication` and `ScheduledSparkApplication` for specifying one-time Spark applications and Spark applications that run on a standard [cron](https://en.wikipedia.org/wiki/Cron) schedule, respectively. Similarly to other kinds of Kubernetes resources, they consist of a specification in a `Spec` field and a `Status` field. The definitions are organized in the following structure; a minimal example manifest follows the outline. The v1beta2 version of the API definition is implemented [here](../pkg/apis/sparkoperator.k8s.io/v1beta2/types.go).
```
ScheduledSparkApplication
|__ ScheduledSparkApplicationSpec
    |__ SparkApplication
|__ ScheduledSparkApplicationStatus

SparkApplication
|__ SparkApplicationSpec
    |__ DriverSpec
        |__ SparkPodSpec
    |__ ExecutorSpec
        |__ SparkPodSpec
    |__ Dependencies
    |__ MonitoringSpec
        |__ PrometheusSpec
|__ SparkApplicationStatus
    |__ DriverInfo
```
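The following is a minimal sketch of a `SparkApplication` manifest, loosely based on the repository's `spark-pi` example; the image tag, file paths, and names are illustrative, not prescriptive:
```yaml
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi            # illustrative name
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: "gcr.io/spark-operator/spark:v2.4.4"   # illustrative image tag
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.11-2.4.4.jar"
  sparkVersion: "2.4.4"
  restartPolicy:
    type: Never
  driver:
    cores: 1
    memory: "512m"
    serviceAccount: spark   # assumes a pre-created service account
  executor:
    cores: 1
    instances: 1
    memory: "512m"
```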
## API Definition
### `SparkApplicationSpec`
A `SparkApplicationSpec` has the following top-level fields:
| Field | Spark configuration property or `spark-submit` option | Note |
| ------------- | ------------- | ------------- |
| `Type` | N/A | The type of the Spark application. Valid values are `Java`, `Scala`, `Python`, and `R`. |
| `PythonVersion` | `spark.kubernetes.pyspark.pythonVersion` | The major Python version of the Docker image used to run the driver and executor containers. It can be either 2 or 3; the default is 2. |
| `Mode` | `--mode` | Spark deployment mode. Valid values are `cluster` and `client`. |
| `Image` | `spark.kubernetes.container.image` | Unified container image for the driver, executor, and init-container. |
| `InitContainerImage` | `spark.kubernetes.initContainer.image` | Custom init-container image. |
| `ImagePullPolicy` | `spark.kubernetes.container.image.pullPolicy` | Container image pull policy. |
| `ImagePullSecrets` | `spark.kubernetes.container.image.pullSecrets` | Container image pull secrets. |
| `MainClass` | `--class` | Main application class to run. |
| `MainApplicationFile` | N/A | Main application file, e.g., a bundled jar containing the main class and its dependencies. |
| `Arguments` | N/A | List of application arguments. |
| `SparkConf` | N/A | A map of extra Spark configuration properties. |
| `HadoopConf` | N/A | A map of Hadoop configuration properties. The operator adds the prefix `spark.hadoop.` to these properties when passing them through the `--conf` option. |
| `SparkConfigMap` | N/A | Name of a Kubernetes ConfigMap carrying Spark configuration files, e.g., `spark-env.sh`. The controller sets the environment variable `SPARK_CONF_DIR` to where the ConfigMap is mounted. |
| `HadoopConfigMap` | N/A | Name of a Kubernetes ConfigMap carrying Hadoop configuration files, e.g., `core-site.xml`. The controller sets the environment variable `HADOOP_CONF_DIR` to where the ConfigMap is mounted. |
| `Volumes` | N/A | List of Kubernetes [volumes](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.9/#volume-v1-core) the driver and executors need collectively. |
| `Driver` | N/A | A [`DriverSpec`](#driverspec) field. |
| `Executor` | N/A | An [`ExecutorSpec`](#executorspec) field. |
| `Deps` | N/A | A [`Dependencies`](#dependencies) field. |
| `RestartPolicy` | N/A | The policy governing whether and under which conditions the controller should restart a terminated application. |
| `NodeSelector` | `spark.kubernetes.node.selector.[labelKey]` | Node selector for the driver and executor pods, specified as key-value pairs where `labelKey` is the label key and the value is the label value. |
| `MemoryOverheadFactor` | `spark.kubernetes.memoryOverheadFactor` | The memory overhead factor used to allocate non-JVM memory. It defaults to 0.10 for JVM-based jobs and 0.40 for non-JVM jobs. This value is overridden by `Spec.Driver.MemoryOverhead` and `Spec.Executor.MemoryOverhead` if they are set. |
| `Monitoring` | N/A | This specifies how monitoring of the Spark application should be handled, e.g., how driver and executor metrics are to be exposed. Currently only exposing metrics to Prometheus is supported. |
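As a sketch of how some of these spec-level fields surface in a manifest (the YAML keys are assumed to follow the lowerCamelCase JSON tags of the Go types; the values shown are hypothetical):
```yaml
spec:
  sparkConf:
    "spark.ui.port": "4045"            # extra Spark property, hypothetical
    "spark.eventLog.enabled": "true"
  hadoopConf:
    "fs.gs.project.id": "my-project"   # gets prefixed with spark.hadoop.
  nodeSelector:
    disktype: ssd                      # hypothetical node label
```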
#### `DriverSpec`
A `DriverSpec` embeds a [`SparkPodSpec`](#sparkpodspec) and additionally has the following fields:
| Field | Spark configuration property or `spark-submit` option | Note |
| ------------- | ------------- | ------------- |
| `PodName` | `spark.kubernetes.driver.pod.name` | Name of the driver pod. |
| `ServiceAccount` | `spark.kubernetes.authenticate.driver.serviceAccountName` | Name of the Kubernetes service account to use for the driver pod. |
#### `ExecutorSpec`
Similarly to the `DriverSpec`, an `ExecutorSpec` also embeds a [`SparkPodSpec`](#sparkpodspec) and additionally has the following fields:
| Field | Spark configuration property or `spark-submit` option | Note |
| ------------- | ------------- | ------------- |
| `Instances` | `spark.executor.instances` | Number of executor instances to request. |
| `CoreRequest` | `spark.kubernetes.executor.request.cores` | Physical CPU request for the executors. |
#### `SparkPodSpec`
A `SparkPodSpec` defines common attributes of a driver or executor pod, summarized in the following table.
| Field | Spark configuration property or `spark-submit` option | Note |
| ------------- | ------------- | ------------- |
| `Cores` | `spark.driver.cores` or `spark.executor.cores` | Number of CPU cores for the driver or executor pod. |
| `CoreLimit` | `spark.kubernetes.driver.limit.cores` or `spark.kubernetes.executor.limit.cores` | Hard limit on the number of CPU cores for the driver or executor pod. |
| `Memory` | `spark.driver.memory` or `spark.executor.memory` | Amount of memory to request for the driver or executor pod. |
| `MemoryOverhead` | `spark.driver.memoryOverhead` or `spark.executor.memoryOverhead` | Amount of off-heap memory to allocate for the driver or executor pod in cluster mode, in `MiB` unless otherwise specified. |
| `Image` | `spark.kubernetes.driver.container.image` or `spark.kubernetes.executor.container.image` | Custom container image for the driver or executor. |
| `ConfigMaps` | N/A | A map of Kubernetes ConfigMaps to mount into the driver or executor pod. Keys are ConfigMap names and values are mount paths. |
| `Secrets` | `spark.kubernetes.driver.secrets.[SecretName]` or `spark.kubernetes.executor.secrets.[SecretName]` | A map of Kubernetes secrets to mount into the driver or executor pod. Keys are secret names and values specify the mount paths and secret types. |
| `EnvVars` | `spark.kubernetes.driverEnv.[EnvironmentVariableName]` or `spark.executorEnv.[EnvironmentVariableName]` | A map of environment variables to add to the driver or executor pod. Keys are variable names and values are variable values. |
| `EnvSecretKeyRefs` | `spark.kubernetes.driver.secretKeyRef.[EnvironmentVariableName]` or `spark.kubernetes.executor.secretKeyRef.[EnvironmentVariableName]` | A map of environment variables to SecretKeyRefs. Keys are variable names and values are pairs of a secret name and a secret key. |
| `Labels` | `spark.kubernetes.driver.label.[LabelName]` or `spark.kubernetes.executor.label.[LabelName]` | A map of Kubernetes labels to add to the driver or executor pod. Keys are label names and values are label values. |
| `Annotations` | `spark.kubernetes.driver.annotation.[AnnotationName]` or `spark.kubernetes.executor.annotation.[AnnotationName]` | A map of Kubernetes annotations to add to the driver or executor pod. Keys are annotation names and values are annotation values. |
| `VolumeMounts` | N/A | List of Kubernetes [volume mounts](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.9/#volumemount-v1-core) for volumes that should be mounted to the pod. |
| `Tolerations` | N/A | List of Kubernetes [tolerations](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.9/#toleration-v1-core) that should be applied to the pod. |
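The sketch below shows how these per-pod fields might appear under `driver` and `executor` (YAML keys assumed from the JSON tags in `types.go`; all values are hypothetical):
```yaml
spec:
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: "2.4.4"       # hypothetical label
    envVars:
      FOO: "bar"             # hypothetical environment variable
  executor:
    instances: 2
    cores: 1
    memory: "512m"
    labels:
      version: "2.4.4"
```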
#### `Dependencies`
A `Dependencies` specifies the various types of dependencies of a Spark application in a central place.
| Field | Spark configuration property or `spark-submit` option | Note |
| ------------- | ------------- | ------------- |
| `Jars` | `spark.jars` or `--jars` | List of jars the application depends on. |
| `Files` | `spark.files` or `--files` | List of files the application depends on. |
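For example, dependencies might be declared like this (the jar and file locations are hypothetical):
```yaml
spec:
  deps:
    jars:
      - "local:///opt/spark-jars/extra-lib.jar"
    files:
      - "gs://my-bucket/conf/extra.conf"
```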
#### `MonitoringSpec`
A `MonitoringSpec` specifies how monitoring of the Spark application should be handled, e.g., how driver and executor metrics are to be exposed. Currently only exposing metrics to Prometheus is supported.
| Field | Spark configuration property or `spark-submit` option | Note |
| ------------- | ------------- | ------------- |
| `ExposeDriverMetrics` | N/A | This specifies if driver metrics should be exposed. Defaults to `false`. |
| `ExposeExecutorMetrics` | N/A | This specifies if executor metrics should be exposed. Defaults to `false`. |
| `MetricsProperties` | N/A | If specified, this contains the content of a custom `metrics.properties` that configures the Spark metrics system. Otherwise, the content of `spark-docker/conf/metrics.properties` will be used. |
| `PrometheusSpec` | N/A | If specified, this configures how metrics are exposed to Prometheus. |
#### `PrometheusSpec`
A `PrometheusSpec` configures how metrics are exposed to Prometheus.
| Field | Spark configuration property or `spark-submit` option | Note |
| ------------- | ------------- | ------------- |
| `JmxExporterJar` | N/A | This specifies the path to the [Prometheus JMX exporter](https://github.com/prometheus/jmx_exporter) jar. |
| `Port` | N/A | Port the Java agent of the Prometheus JMX exporter binds to. Defaults to `8090` if not specified. |
| `ConfigFile` | N/A | This specifies the full path of the Prometheus configuration file in the Spark image. If specified, it will override the default configurations and take precedence over `Configuration` shown below. |
| `Configuration` | N/A | If specified, this contains the contents of a custom Prometheus configuration used by the Prometheus JMX exporter. Otherwise, the contents of `spark-docker/conf/prometheus.yaml` will be used, unless `ConfigFile` is specified. |
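Putting `MonitoringSpec` and `PrometheusSpec` together, a monitoring section might look like the following sketch (the exporter jar path and port are illustrative, following the conventions used elsewhere in the operator docs):
```yaml
spec:
  monitoring:
    exposeDriverMetrics: true
    exposeExecutorMetrics: true
    prometheus:
      jmxExporterJar: "/prometheus/jmx_prometheus_javaagent-0.11.0.jar"  # illustrative path
      port: 8090
```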
### `SparkApplicationStatus`
A `SparkApplicationStatus` captures the status of a Spark application, including the state of every executor.
| Field | Note |
| ------------- | ------------- |
| `AppID` | A randomly generated ID used to group all Kubernetes resources of an application. |
| `LastSubmissionAttemptTime` | Time of the last application submission attempt. |
| `CompletionTime` | Time the application completes (if it does). |
| `DriverInfo` | A [`DriverInfo`](#driverinfo) field. |
| `AppState` | Current state of the application. |
| `ExecutorState` | A map of executor pod names to executor state. |
| `ExecutionAttempts` | The number of execution attempts made for the application. |
| `SubmissionAttempts` | The number of submission attempts made for an application. |
#### `DriverInfo`
A `DriverInfo` captures information about the driver pod and the Spark web UI running in the driver.
| Field | Note |
| ------------- | ------------- |
| `WebUIServiceName` | Name of the service for the Spark web UI. |
| `WebUIPort` | Node port on which the Spark web UI is exposed. |
| `WebUIAddress` | Address to access the web UI from within the cluster. |
| `WebUIIngressName` | Name of the ingress for the Spark web UI. |
| `WebUIIngressAddress` | Address to access the web UI via the Ingress. |
| `PodName` | Name of the driver pod. |
### `ScheduledSparkApplicationSpec`
A `ScheduledSparkApplicationSpec` has the following top-level fields:
| Field | Optional | Default | Note |
| ------------- | ------------- | ------------- | ------------- |
| `Schedule` | No | N/A | The cron schedule on which the application should run. |
| `Template` | No | N/A | A template from which `SparkApplication` instances of scheduled runs of the application can be created. |
| `Suspend` | Yes | `false` | A flag telling the controller to suspend subsequent runs of the application if set to `true`. |
| `ConcurrencyPolicy` | Yes | `Allow` | The policy governing concurrent runs of the application. Valid values are `Allow`, `Forbid`, and `Replace`. |
| `SuccessfulRunHistoryLimit` | Yes | 1 | The number of past successful runs of the application to keep track of. |
| `FailedRunHistoryLimit` | Yes | 1 | The number of past failed runs of the application to keep track of. |
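A sketch of a `ScheduledSparkApplication` using these fields (the schedule, history limits, and template values are illustrative):
```yaml
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: ScheduledSparkApplication
metadata:
  name: spark-pi-scheduled   # illustrative name
spec:
  schedule: "@every 5m"
  concurrencyPolicy: Allow
  successfulRunHistoryLimit: 1
  failedRunHistoryLimit: 3
  template:                  # a SparkApplicationSpec, as described above
    type: Scala
    mode: cluster
    image: "gcr.io/spark-operator/spark:v2.4.4"   # illustrative image tag
    mainClass: org.apache.spark.examples.SparkPi
    mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.11-2.4.4.jar"
    sparkVersion: "2.4.4"
    restartPolicy:
      type: Never
```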
### `ScheduledSparkApplicationStatus`
A `ScheduledSparkApplicationStatus` captures the status of a `ScheduledSparkApplication`, including the statuses of its past and upcoming runs.
| Field | Note |
| ------------- | ------------- |
| `LastRun` | The time when the last run of the application started. |
| `NextRun` | The time when the next run of the application is estimated to start. |
| `PastSuccessfulRunNames` | The names of `SparkApplication` objects of past successful runs of the application. The maximum number of names to keep track of is controlled by `SuccessfulRunHistoryLimit`. |
| `PastFailedRunNames` | The names of `SparkApplication` objects of past failed runs of the application. The maximum number of names to keep track of is controlled by `FailedRunHistoryLimit`. |
| `ScheduleState` | The current scheduling state of the application. Valid values are `FailedValidation` and `Scheduled`. |
| `Reason` | Human readable message on why the `ScheduledSparkApplication` is in the particular `ScheduleState`. |


@@ -83,3 +83,13 @@ To run unit tests, run the following command:
```bash
$ go test ./...
```
## Build the API Specification Doc
When you update the API, specifically the `SparkApplication` and `ScheduledSparkApplication` specifications, the API specification doc needs to be regenerated. To regenerate it, run the following command:
```bash
make build-api-docs
```
Running the above command will update the file `docs/api-docs.md`.

hack/api-docs-config.json (new file, 28 lines)

@@ -0,0 +1,28 @@
{
  "hideMemberFields": [
    "TypeMeta"
  ],
  "hideTypePatterns": [
    "ParseError$",
    "List$"
  ],
  "externalPackages": [
    {
      "typeMatchPrefix": "^k8s\\.io/apimachinery/pkg/apis/meta/v1\\.Duration$",
      "docsURLTemplate": "https://godoc.org/k8s.io/apimachinery/pkg/apis/meta/v1#Duration"
    },
    {
      "typeMatchPrefix": "^k8s\\.io/(api|apimachinery/pkg/apis)/",
      "docsURLTemplate": "https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.13/#{{lower .TypeIdentifier}}-{{arrIndex .PackageSegments -1}}-{{arrIndex .PackageSegments -2}}"
    },
    {
      "typeMatchPrefix": "^github\\.com/knative/pkg/apis/duck/",
      "docsURLTemplate": "https://godoc.org/github.com/knative/pkg/apis/duck/{{arrIndex .PackageSegments -1}}#{{.TypeIdentifier}}"
    }
  ],
  "typeDisplayNamePrefixOverrides": {
    "k8s.io/api/": "Kubernetes ",
    "k8s.io/apimachinery/pkg/apis/": "Kubernetes "
  },
  "markdownDisabled": false
}

@@ -0,0 +1,48 @@
{{ define "members" }}
{{ range .Members }}
{{ if not (hiddenMember .)}}
<tr>
<td>
<code>{{ fieldName . }}</code></br>
<em>
{{ if linkForType .Type }}
<a href="{{ linkForType .Type}}">
{{ typeDisplayName .Type }}
</a>
{{ else }}
{{ typeDisplayName .Type }}
{{ end }}
</em>
</td>
<td>
{{ if fieldEmbedded . }}
<p>
(Members of <code>{{ fieldName . }}</code> are embedded into this type.)
</p>
{{ end}}
{{ if isOptionalMember .}}
<em>(Optional)</em>
{{ end }}
{{ safe (renderComments .CommentLines) }}
{{ if and (eq (.Type.Name.Name) "ObjectMeta") }}
Refer to the Kubernetes API documentation for the fields of the
<code>metadata</code> field.
{{ end }}
{{ if or (eq (fieldName .) "spec") }}
<br/>
<br/>
<table>
{{ template "members" .Type }}
</table>
{{ end }}
</td>
</tr>
{{ end }}
{{ end }}
{{ end }}

@@ -0,0 +1,49 @@
{{ define "packages" }}
{{ with .packages}}
<p>Packages:</p>
<ul>
{{ range . }}
<li>
<a href="#{{- packageAnchorID . -}}">{{ packageDisplayName . }}</a>
</li>
{{ end }}
</ul>
{{ end}}
{{ range .packages }}
<h2 id="{{- packageAnchorID . -}}">
{{- packageDisplayName . -}}
</h2>
{{ with (index .GoPackages 0 )}}
{{ with .DocComments }}
<p>
{{ safe (renderComments .) }}
</p>
{{ end }}
{{ end }}
Resource Types:
<ul>
{{- range (visibleTypes (sortedTypes .Types)) -}}
{{ if isExportedType . -}}
<li>
<a href="{{ linkForType . }}">{{ typeDisplayName . }}</a>
</li>
{{- end }}
{{- end -}}
</ul>
{{ range (visibleTypes (sortedTypes .Types))}}
{{ template "type" . }}
{{ end }}
<hr/>
{{ end }}
<p><em>
Generated with <code>gen-crd-api-reference-docs</code>
{{ with .gitCommit }} on git commit <code>{{ . }}</code>{{end}}.
</em></p>
{{ end }}

@@ -0,0 +1,58 @@
{{ define "type" }}
<h3 id="{{ anchorIDForType . }}">
{{- .Name.Name }}
{{ if eq .Kind "Alias" }}(<code>{{.Underlying}}</code> alias)</p>{{ end -}}
</h3>
{{ with (typeReferences .) }}
<p>
(<em>Appears on:</em>
{{- $prev := "" -}}
{{- range . -}}
{{- if $prev -}}, {{ end -}}
{{ $prev = . }}
<a href="{{ linkForType . }}">{{ typeDisplayName . }}</a>
{{- end -}}
)
</p>
{{ end }}
<p>
{{ safe (renderComments .CommentLines) }}
</p>
{{ if .Members }}
<table>
<thead>
<tr>
<th>Field</th>
<th>Description</th>
</tr>
</thead>
<tbody>
{{ if isExportedType . }}
<tr>
<td>
<code>apiVersion</code></br>
string</td>
<td>
<code>
{{apiGroup .}}
</code>
</td>
</tr>
<tr>
<td>
<code>kind</code></br>
string
</td>
<td><code>{{.Name.Name}}</code></td>
</tr>
{{ end }}
{{ template "members" .}}
</tbody>
</table>
{{ end }}
{{ end }}

hack/api-ref-docs (new executable file)

Binary file not shown.