Commit Graph

145 Commits

Author SHA1 Message Date
Yinan Li 0600e96b1c Fixed issues after merging 2020-04-18 11:21:12 -07:00
Yinan Li 7a34c91f6c Use a Kubernetes Job to run spark-submit for multi-version support 2020-04-18 11:03:26 -07:00
Shiqi Sun e1d70afe9c
Add total SparkApplication count metric (#856)
* Add total SparkApplication count metric

Total SparkApplication count is the total number of SparkApplications
that have been processed by the operator. This metric can be used to
track how many SparkApplications users have submitted to the K8s API
server, and can also be used as the denominator when computing the job
success rate, for example.

* Export SparkApp count metric in sparkapp_metrics.go

Invoke the export of the SparkApp count metric in exportMetrics() in
sparkapp_metrics.go instead of syncSparkApplication() in controller.go,
to align with the metric exporting convention in the code base.
2020-04-05 18:03:59 -07:00
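The success-rate use mentioned in e1d70afe9c can be sketched as below. This is only an illustration: `successRate` and the counter values are hypothetical and are not taken from the operator's code. It assumes a success counter is available alongside the total count.

```go
package main

import "fmt"

// successRate divides a success counter by the total SparkApplication count.
// Both arguments are hypothetical counter readings. ok is false when nothing
// has been processed yet or when the inputs are inconsistent.
func successRate(succeeded, total uint64) (rate float64, ok bool) {
	if total == 0 || succeeded > total {
		return 0, false
	}
	return float64(succeeded) / float64(total), true
}

func main() {
	// Example values only, as they might be read from a metrics endpoint.
	if rate, ok := successRate(45, 50); ok {
		fmt.Printf("job success rate: %.1f%%\n", rate*100)
	}
}
```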
Shiqi Sun 8a4a991a81
Fix typos and links in several docs (#853)
Trivial fixes on typos, links and formats in several docs.
2020-04-01 13:48:47 -07:00
Shiqi Sun 0e0867f0b4
Add job start latency metrics and add namespace tag in metrics (#852)
* Add new metric for job start latency

* Add job latency histogram metric and namespace tag

Job start latency is defined as the time difference between when the
job is submitted by the user and when the job reaches the running state
or any of the terminal states. We use a histogram with configurable
boundaries so that users can provide the boundaries they are interested
in. They can use one of the boundaries as their SLO/SLA and use the
histogram values to compute the percentage of jobs that meet the SLA. We
also add the namespace label to all metrics, where applicable, when
users specify it in the command line option. In addition, we fixed
the controller state machine diagram.

* Add start latency metrics doc, fix based on review

Added start latency summary and histogram metrics doc in
quick-start-guide.md. Added fixes based on the code review comments in
the PR.

Co-authored-by: Vaishnavi Giridaran <vgiridaran@salesforce.com>
2020-04-01 13:13:36 -07:00
jinxingwang 5afcce2919
add fix for metricsProperties when HasPrometheusConfigFile is true. (#847)
* add fix for metricsProperties when HasPrometheusConfigFile is true.

* add new config MetricsPropertiesFile.

* add missing auto-generated code from previous PRs.

* fix monitoring_config_test.go test condition, redo the configmap logic in monitoring_config.go.

* redo the configmap & javaOption logic in monitoring_config.go.

* set back the configmap & javaOption logic in monitoring_config.go

* update log.
2020-03-31 09:14:10 -07:00
Yinan Li aae36546e5
Fix for #826 and some refactoring (#832) 2020-03-11 10:32:42 -07:00
isan_rivkin dd3df7a86e
Update links (#837)
Updated a broken link and reference on the page.
2020-03-11 10:32:28 -07:00
roitvt bcf75c9b25
add Nielsen Identity Engine to who is using (#836) 2020-03-10 13:02:25 -07:00
Yinan Li 18572fa33b
Added terminationGracePeriodSeconds and pod/container lifecycle hook to driver pods (#811)
* feat: delete driver pods with a grace period

* feat: adding lifecycle pod spec for driver pods

* adding tests for grace period and lifecycle

* fix: adding user guide for termination grace period and container hooks
2020-03-05 14:29:25 -08:00
Yinan Li 60676326be
Update user-guide.md (#830)
Expand the description of when to use volumes instead of `/tmp` for scratch space.
2020-03-04 21:52:39 -08:00
Jim Kleckner d968a5a287
Make spark-local-dir-1 be spark-local-dir-2 (#829)
This would make it match the location.
2020-03-03 17:56:18 -08:00
Yinan Li 4e0c778951
Updated api-docs.md (#799) 2020-02-10 16:11:56 -08:00
Yinan Li c8ae9e3520
Upgraded to Spark 2.4.5 (#798) 2020-02-10 09:01:06 -08:00
Yinan Li f0f1b1768e
Added support for using envFrom (#794) 2020-02-07 11:08:24 -08:00
Berkay Öztürk c11e54a3e1
Fix broken Kubernetes documentation link (#791)
Fixes the broken link to the Kubernetes API documentation.
2020-02-05 10:48:46 -08:00
Jim Kleckner d212b29961
Minor typo in user guide soptionally (#788) 2020-02-03 21:00:59 -08:00
Ce Gao 959c67061e feat: Add caicloud as an adopter (#777)
Signed-off-by: Ce Gao <gaoce@caicloud.io>
2020-01-20 10:29:51 -08:00
Yinan Li e1570cf6c3
Update user-guide.md 2020-01-15 19:40:08 -08:00
Yinan Li ef12de0d41
Fixed broken link to the API spec doc 2020-01-02 12:00:43 -08:00
Tom Lous 608174e4b6 Added Shell (#759)
* Added Shell

* Update who-is-using.md
2020-01-02 09:37:33 -08:00
Yinan Li 1c4486cdda
Removed legacy init-container related fields (#750) 2019-12-20 10:08:05 -08:00
Yinan Li bbc3e71442
Updated CRD yamls and API docs (#749) 2019-12-19 16:21:09 -08:00
Yinan Li 52dc9a6412
Added support for separate key for driver physical CPU request (#748) 2019-12-19 15:15:18 -08:00
Yinan Li b2a326ecaf
Added generated API doc (#747) 2019-12-19 14:34:49 -08:00
Yinan Li 86fb3bd51d
Added support for specifying init-containers for driver/executors (#740) 2019-12-16 16:15:52 -08:00
Jiaxin Shan 0bd7592035 Add volumes support for Spark scratch space spark.local.dir (#707) 2019-12-15 15:05:40 -08:00
avnerl 16ee3061fe Update who-is-using.md (#704) 2019-11-22 17:43:43 -08:00
Iftach Schonbaum 314657dd45 Update who-is-using.md (#700) 2019-11-20 08:18:49 -08:00
Vaishnavi Giridaran c40596e9b1 Fix typo in the k8s code gen in the developer guide. (#668) 2019-10-21 22:10:43 -07:00
akhurana001 85180bc593 Update Ingress details in quick-start-guide.md (#667)
* Update quick-start-guide.md

* Update Ingress Setup docs
2019-10-21 09:15:38 -07:00
Hen Ben Hemo 2b260cb6af Add Riskified to who-is-using.md (#663) 2019-10-17 07:38:19 -07:00
Pasalietis d615901d19 Adding exacaster (#659) 2019-10-14 21:18:43 -07:00
Yinan Li 74bd887581
Replaced 2.4.0 with 2.4.4 (#634) 2019-09-24 16:52:12 -07:00
Yinan Li 86ee076aab
Upgraded default Spark version from 2.4.0 to 2.4.4 (#625) 2019-09-20 11:03:55 -07:00
Hu Sheng a2403c2c39 Add batchSchedulerOptions support (#606)
* Add batchSchedulerOptions support
2019-09-18 07:51:54 -07:00
Yinan Li e704c7b15d
Added TTL for SparkApplications (#615) 2019-09-13 14:21:58 -07:00
kevin hogeland 55a1eebc0c Generate CRD specs, bump to v1beta2 (#578)
* Generate CRD specs, bump to v1beta2

* Add short/singular CRD names

* Merge upstream/master

* Tweak Cores validation

* Fix typo, merge upstream

* Update remaining docs for v1beta2
2019-09-13 10:37:21 -07:00
Michael Marshall fb59a3aed6 Improve documentation for SparkJobNamespace (#608)
* Improve documentation for SparkJobNamespace

* Modifications after review
2019-09-11 08:14:43 -07:00
Hu Sheng 1a1fa21672 Add volcano integration docs (#599)
* Add volcano integration docs

* Fix comment issues
2019-09-06 08:29:06 -07:00
Ramya Rajasekaran b7361477c5 Updated prefixes in documentation to match produced metrics. (#574)
* Modified metrics documentation to reflect code

* Updated metrics prefix

* Updated prefixes to match produced metrics
2019-08-29 11:51:26 -07:00
Sam Clinckspoor 99e1ab337c Update api doc structure (#588) 2019-08-29 07:24:07 -07:00
kevin hogeland edcf4cdc32 Resource quota enforcement webhook (#544)
* Cert configuration and reloading

* Add support for strict webhook error handling

* Improve webhook error handling

* Don't deregister the webhook when failure policy is strict

* standard error message capitalization

* have the webhook parse its own configuration from flags

* clean up cert provider code

* Add explanation for skipping deregistration

* Resource Quota enforcement webhook

* Fix bad merge

* Cleanup, fixes

* Cleanup

* Document the quota enforcer
2019-08-28 14:11:38 -07:00
Yinan Li c5d04ab4a4
Added documentation for leader election and HA mode (#575) 2019-08-19 15:54:43 -07:00
Ramya Rajasekaran f9370be2a3 Modified metrics documentation to reflect code (#570) 2019-08-15 13:36:53 -07:00
kevin hogeland 249860f0cc Webhook enhancements (#543)
* Cert configuration and reloading

* Add support for strict webhook error handling

* Improve webhook error handling

* Don't deregister the webhook when failure policy is strict

* standard error message capitalization

* have the webhook parse its own configuration from flags

* clean up cert provider code

* Add explanation for skipping deregistration

* Tests and fixes
2019-08-15 10:46:44 -07:00
Martijn Zwennes 79ec720f7e Update docs with GKE private cluster info (#568) 2019-08-13 13:23:33 -07:00
Yinan Li 0c4cab5b02
Revert "Switched to use go modules (#562)" (#565)
This reverts commit 173a9d524c.
2019-08-12 11:44:08 -07:00
Yinan Li 173a9d524c
Switched to use go modules (#562) 2019-08-08 10:27:45 -07:00
shencheng 7abb0e247e Update user-guide.md (#552)
Fix sparkConf: both keys and values need double quotes.
2019-07-22 23:56:14 -07:00