SIG Big Data: updating resources documentation page with latest information

2018-10-04 12:01:22 +02:00 · 2018-10-04 12:01:22 +02:00 · 183dd773b8
parent ea516c11f5
commit 183dd773b8
1 changed files with 74 additions and 4 deletions
--- a/sig-big-data/resources.md
+++ b/sig-big-data/resources.md
@@ -1,15 +1,85 @@
 # Resources

+## Kubernetes Integration status by Big Data product
+
 ### Spark
-* [Spark on Kubernetes Design Proposal](https://docs.google.com/document/d/1_bBzOZ8rKiOSjQg78DXOA3ZBIo_KkDJjqxVuq0yXdew/edit#)
-* [Spark Dynamic Allocation Proposal](https://docs.google.com/document/d/1S9OMnFaeSf_UUxWpMpvC7ERcWx-jDr2g85MWri3Hccc/edit?usp=sharing)
-* [SPARK-JIRA](https://issues.apache.org/jira/browse/SPARK-18278)
-* [Kubernetes Issue #34377](https://github.com/kubernetes/kubernetes/issues/34377)
+
+[Apache Spark](https://spark.apache.org) is a distributed data processing framework. 
+
+##### Status
+
+Kubernetes has been supported as a mainline Spark scheduler since [release 2.3](https://spark.apache.org/releases/spark-release-2-3-0.html); see [the detailed documentation](https://spark.apache.org/docs/latest/running-on-kubernetes.html).
+
+* [Spark on Kubernetes original Design Proposal](https://docs.google.com/document/d/1_bBzOZ8rKiOSjQg78DXOA3ZBIo_KkDJjqxVuq0yXdew/edit#)
 * [External Repository](https://github.com/apache-spark-on-k8s/spark)

+##### Activities 
+
+Work is underway for Spark 2.4 to improve support and integration with HDFS.
+* Design document: [How Spark on Kubernetes will access Secure HDFS](https://docs.google.com/document/d/1RBnXD9jMDjGonOdKJ2bA1lN4AAV_1RwpU_ewFuCNWKg/edit#heading=h.verdza2f4fyd)
+* [][]  
+* [][]  
+
 ### HDFS
+
+[Apache Hadoop HDFS](https://hadoop.apache.org/hdfs) is a distributed file system, the persistence layer for Hadoop.
+
+##### Status
+
+TODO, e.g. "No release yet."
+
+##### Activities
+
 * [Data Locality Doc](https://docs.google.com/document/d/1TAC6UQDS3M2sin2msFcZ9UBBQFyyz4jFKWw5BM54cQo/edit)
 * [External Repository](https://github.com/apache-spark-on-k8s/kubernetes-HDFS)

 ### Airflow
+
+[Apache Airflow](https://airflow.apache.org) is a platform to programmatically author, schedule and monitor workflows.
+
+##### Status
+
+The [Kubernetes executor](https://airflow.apache.org/kubernetes.html) was introduced in Airflow [release 1.10.0](https://github.com/apache/incubator-airflow/blob/master/CHANGELOG.txt), with support for Kubernetes 1.10.
+
+##### Activities
+
 * [Airflow roadmap](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=71013666)
+
+### Flink
+
+[Apache Flink](https://flink.apache.org) is a distributed data processing framework.
+
+##### Status
+
+Flink 1.6 supports [running a session or job cluster on Kubernetes](https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/kubernetes.html).
+
+##### Activities
+
+* [Native support for Kubernetes as a Flink runtime](https://issues.apache.org/jira/browse/FLINK-9953)
+* [Lyft is working on an operator](https://lists.apache.org/thread.html/aa941030440c1d9e34c35c0caf5ddd2456755337fc34a4edebb32929@%3Cdev.flink.apache.org%3E)
+
+### Kafka
+
+[Apache Kafka](https://kafka.apache.org/) is a distributed streaming platform.
+
+##### Status
+
+Confluent is working on an operator for Kafka.
+
+##### Activities
+
+* [Confluent blog post](https://www.confluent.io/blog/getting-started-apache-kafka-kubernetes/)
+* [Confluent operator landing page](https://www.confluent.io/confluent-operator/)
+
+### Pulsar
+
+[Apache Pulsar](https://pulsar.apache.org) is an open-source distributed pub-sub messaging system.
+
+##### Status
+
+[Pulsar supports running on Kubernetes](https://pulsar.apache.org/docs/latest/deployment/Kubernetes/).
+
+##### Activities
+
+TODO
+