Yinan Li
7a34c91f6c
Use a Kubernetes Job to run spark-submit for multi-version support
2020-04-18 11:03:26 -07:00
Shiqi Sun
e1d70afe9c
Add total SparkApplication count metric ( #856 )
...
* Add total SparkApplication count metric
Total SparkApplication count is the total number of SparkApplications
that have been processed by the operator. This metric can be used to
track how many SparkApplications users have submitted to the K8s API
server, and can also be used as the denominator when computing job success
rate, for example.
* Export SparkApp count metric in sparkapp_metrics.go
Invoke the export of the SparkApp count metric in exportMetrics() in
sparkapp_metrics.go instead of syncSparkApplication() in controller.go,
to align with the metric-exporting convention in the code base.
2020-04-05 18:03:59 -07:00
Shiqi Sun
0e0867f0b4
Add job start latency metrics and add namespace tag in metrics ( #852 )
...
* Add new metric for job start latency
* Add job latency histogram metric and namespace tag
Job start latency is defined as the time between when the user submits
the job and when the job reaches a running state or any of the terminal
states. We use a histogram with configurable boundaries because users can
provide the boundaries they are interested in. They can use one of them
as their SLO/SLA and use the histogram values to compute the percentage
of jobs that meet the SLA. We also added the namespace label to all
metrics where applicable, when users specify it in the command line
option. In addition, we fixed the controller state machine diagram.
* Add start latency metrics doc, fix based on review
Added start latency summary and histogram metrics doc in
quick-start-guide.md. Added fixes based on the code review comments in
the PR.
Co-authored-by: Vaishnavi Giridaran <vgiridaran@salesforce.com>
2020-04-01 13:13:36 -07:00
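The SLA calculation described in commit 0e0867f0b4 above can be sketched roughly as follows. This is a minimal illustration, not the operator's code: the function names, the sample latencies, and the bucket boundaries are all hypothetical, and the buckets are assumed to be cumulative with "less than or equal" upper bounds, in the style of a Prometheus histogram.

```python
from bisect import bisect_left


def cumulative_buckets(latencies_s, boundaries_s):
    """Return (sorted bounds, cumulative counts); the last count is the +Inf bucket.

    Hypothetical helper: mimics cumulative "le" buckets, not the operator's exporter.
    """
    bounds = sorted(boundaries_s)
    counts = [0] * (len(bounds) + 1)  # one extra slot for values above every bound
    for value in latencies_s:
        # bisect_left returns the first bound >= value, i.e. the smallest bucket
        # whose upper bound still contains the value.
        counts[bisect_left(bounds, value)] += 1
    cumulative, running = [], 0
    for count in counts:
        running += count
        cumulative.append(running)
    return bounds, cumulative


def percent_within_sla(bounds, cumulative, sla_s):
    """Percentage of jobs whose start latency is <= sla_s.

    sla_s must be one of the configured boundaries, because a histogram only
    records counts at its bucket bounds.
    """
    if sla_s not in bounds:
        raise ValueError("SLA must match a configured histogram boundary")
    total = cumulative[-1]
    if total == 0:
        return 0.0
    return 100.0 * cumulative[bounds.index(sla_s)] / total


# Hypothetical start latencies (seconds) and boundaries.
bounds, cumulative = cumulative_buckets([5, 12, 30, 45, 90, 200], [30, 60, 120])
print(percent_within_sla(bounds, cumulative, 30))
```

Because only bucket bounds are recorded, the SLA threshold has to coincide with one of the configured boundaries, which is why the commit makes the boundaries configurable.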
Yinan Li
c8ae9e3520
Upgraded to Spark 2.4.5 ( #798 )
2020-02-10 09:01:06 -08:00
akhurana001
85180bc593
Update Ingress details in quick-start-guide.md ( #667 )
...
* Update quick-start-guide.md
* Update Ingress Setup docs
2019-10-21 09:15:38 -07:00
Yinan Li
86ee076aab
Upgraded default Spark version from 2.4.0 to 2.4.4 ( #625 )
2019-09-20 11:03:55 -07:00
kevin hogeland
55a1eebc0c
Generate CRD specs, bump to v1beta2 ( #578 )
...
* Generate CRD specs, bump to v1beta2
* Add short/singular CRD names
* Merge upstream/master
* Tweak Cores validation
* Fix typo, merge upstream
* Update remaining docs for v1beta2
2019-09-13 10:37:21 -07:00
Michael Marshall
fb59a3aed6
Improve documentation for SparkJobNamespace ( #608 )
...
* Improve documentation for SparkJobNamespace
* Modifications after review
2019-09-11 08:14:43 -07:00
Ramya Rajasekaran
b7361477c5
Updated prefixes in documentation to match produced metrics. ( #574 )
...
* Modified metrics documentation to reflect code
* Updated metrics prefix
* Updated prefixes to match produced metrics
2019-08-29 11:51:26 -07:00
Yinan Li
c5d04ab4a4
Added documentation for leader election and HA mode ( #575 )
2019-08-19 15:54:43 -07:00
Ramya Rajasekaran
f9370be2a3
Modified metrics documentation to reflect code ( #570 )
2019-08-15 13:36:53 -07:00
kevin hogeland
249860f0cc
Webhook enhancements ( #543 )
...
* Cert configuration and reloading
* Add support for strict webhook error handling
* Improve webhook error handling
* Don't deregister the webhook when failure policy is strict
* standard error message capitalization
* have the webhook parse its own configuration from flags
* clean up cert provider code
* Add explanation for skipping deregistration
* Tests and fixes
2019-08-15 10:46:44 -07:00
Martijn Zwennes
79ec720f7e
Update docs with GKE private cluster info ( #568 )
2019-08-13 13:23:33 -07:00
akhurana001
6fc512301d
Replace NodePort Service with ClusterIP ( #520 )
...
* Remove NodePort Service
* Docs update
2019-06-17 12:05:35 -07:00
Xingjun Wang
566d917a64
Fix a typo
2019-04-28 16:14:57 +08:00
Alex Glikson
86bfeaf994
typo in quick-start-guide.md
2019-03-01 16:25:55 -05:00
Chaoran Yu
cc70d3c979
Fixed doc to remove createSparkJobNamespace
2019-01-25 09:51:27 +01:00
Yinan Li
2db484a82e
Updated the quick start guide
2019-01-23 11:22:18 -08:00
Yinan Li
c41576b5ff
Updated to use the v1beta1 version of the APIs
2019-01-17 10:45:54 -08:00
Yinan Li
8a84d6f39f
Merge pull request #343 from lightbend/fixes
...
OpenShift image and fixed bug on running multiple operators/webhooks
2019-01-14 12:38:29 -08:00
Yinan Li
e802f66ea5
Some refactoring to main.go and the quick start guide
2018-12-18 11:30:46 -08:00
Chaoran Yu
73d09c8c4d
Added support for running the containers using a non-root user; fixed an issue with running multiple instances of operator; fixed doc to reflect webhook new default in the chart
2018-12-11 23:53:07 +08:00
Yinan Li
d909c246cd
Added yamls for CRDs
2018-11-27 14:59:23 -08:00
akhurana001
8c7fdbb306
Operator State Management + Ingress Creation ( #291 )
...
* SparkOperator: Prometheus Metrics Integration
* Prometheus Metric Update
* Spark Operator:Prometheus Metric Integration
* PositiveGauge rework
* remove unwanted dependencies
* Propagating ScheduledSpark App Labels
* Doc update
* Metric Description update
* fix app wait
* SparkOperator: Prometheus Metrics Integration
* Spark Operator metrics:PR Comments
* SparkOperator: Set completion time for Failed App
* Operator Metrics: PR comments
* Spark Operator: PR Comments
* Controller Update
* PR Comments
* Docs Update
* Driver State Transition Check Update
* Operator State Management
* Clean-up
* Exposing Spark Application Id in Operator
* SparkAppId updates
* Add Lyft as a user and contributor to operator
* Spark Operator Rework
* Reworking Restart-Policy
* Documentation update
* PR comments
* PR comments
* Ingress impl
* Ingress Tests + Updates
* go fmt
* PR Comments
* missing files
* AppId removal: Doc Update
* Doc update
* Delete UI/Ingress + Other minor changes
* Add PENDING_RETRY State
* PR comments
* PR comments
* Clean-up
* Update controller.go
* Add Terminal State
* Terminal State
* Spark improvements
* event type
* Events update
* Update controller.go
* Update controller.go
* PR Comments
* PR comments
* Support Best-effort Spec updates
* New State
* PR comments
* PR comments
* go fmt
* Docs update
* PR feedback
* PR Feedback
* PR comments
2018-11-19 15:25:55 -08:00
Chaoran Yu
61663a2ce6
Documentation updates
2018-11-06 11:49:22 -05:00
Chaoran Yu
77772a1987
Refactored tests to use generated client instead of kubectl
2018-11-06 11:49:22 -05:00
Chaoran Yu
83cb8dd90e
Documentation update corresponding to Helm chart v0.1.3
2018-11-06 11:49:22 -05:00
Yinan Li
b575f307ad
Upgraded to use the Spark 2.4.0 image
2018-11-05 07:51:32 -08:00
Sergey Samsonov
f350a23efe
Add support for tolerations
2018-10-20 17:21:18 -07:00
Piotr Mrowczynski
bc9bf22a73
Add batch job to initialize the webhook secret to manifest, remove requirement of manual step
2018-10-05 10:16:41 -07:00
Yinan Li
d277088c65
Added a developer guide and notes on a code gen issue
2018-09-28 14:08:52 -07:00
Sarjeet Singh
18dafddd64
Fix the namespace for sparkoperator deployment
2018-09-25 19:28:21 -07:00
Yinan Li
33f8cd1031
Fixed documentation after the go import path change
2018-09-23 21:28:25 -07:00
Yinan Li
c43465e832
Used "Kubernetes Operator for Apache Spark" in documentation
2018-09-19 09:11:59 -07:00
Yinan Li
29a4329ea4
Make the base spark image an argument
2018-09-10 11:51:38 -07:00
Chaoran Yu
71e5048a14
Moved to the install section
2018-09-06 21:54:26 -07:00
Chaoran Yu
cb51a5e7f2
Explain additional k8s resources that get created for webhook
2018-09-06 21:54:26 -07:00
Chaoran Yu
d60cf87f8c
Updated doc to use Helm chart
2018-09-06 21:54:26 -07:00
akhurana001
ceb230481e
SparkOperator: Prometheus Metrics Integration ( #227 )
...
* SparkOperator: Prometheus Metrics Integration
* Spark Operator metrics:PR Comments
* SparkOperator: Set completion time for Failed App
* Operator Metrics: PR comments
* Spark Operator: PR Comments
* Controller Update
* PR Comments
* Docs Update
* Driver State Transition Check Update
2018-08-03 13:56:58 -07:00
barney-s
7e878578ad
typo
2018-07-19 14:48:54 -07:00
Yinan Li
9a08ae361c
Get rid of the initializer (replaced by the mutating webhook)
2018-07-16 12:05:12 -07:00
Yinan Li
368b4a5ac3
Add mutating admission webhook ( #211 )
...
* Preparation to support affinity/anti-affinity
* Initial commit for the mutating admission webhook
2018-07-16 11:43:46 -07:00
Yinan Li
971de3b542
Made DNS check optional (on by default)
2018-06-09 22:28:45 -07:00
scotthew1
21567725f2
use multi-stage Dockerfile for reliable builds ( #174 )
...
* use multi-stage Dockerfile for reliable builds
* fixed date in codegen boilerplate
* clean up whitespace in quick-start-guide.md
* update quick-start-guide.md with new docker build
2018-06-01 10:15:06 -07:00
Scott Reisor
e941c18f11
added namespace to initializer, some updates from review
2018-06-01 07:23:09 -07:00
Jirka Kremser
7d78a21655
Fix the typos in quick start guide and adding the namespace explicitly
2018-05-10 09:32:21 -07:00
Palak Dalal
b5344d3ad5
docs changes
2018-05-03 13:59:37 -07:00
Yinan Li
74af880633
Updated spark-operator.yaml
2018-04-25 09:50:27 -07:00
Yinan Li
eff4c59cb2
Add cron support
2018-04-19 12:01:37 -07:00
Yinan Li
3624ad80de
Update quick-start-guide.md
2018-04-17 10:28:10 -07:00