Merge pull request #2200 from Yikun/spark
Add Apache Spark Docker Official Image doc
Commit 43e8bc6484
Apache Spark - A unified analytics engine for large-scale data processing
# What is Apache Spark™?
Apache Spark™ is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, pandas API on Spark for pandas workloads, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for stream processing.
%%LOGO%%
## Online Documentation
You can find the latest Spark documentation, including a programming guide, on the [project web page](https://spark.apache.org/documentation.html). This README file only contains basic setup instructions.
## Interactive Scala Shell
The easiest way to start using Spark is through the Scala shell:
```console
docker run -it %%IMAGE%% /opt/spark/bin/spark-shell
```
Try the following command, which should return 1,000,000,000:
```scala
scala> spark.range(1000 * 1000 * 1000).count()
```
## Interactive Python Shell
The easiest way to start using PySpark is through the Python shell:
```console
docker run -it %%IMAGE%%:python3 /opt/spark/bin/pyspark
```
Then run the following command, which should also return 1,000,000,000:
```python
>>> spark.range(1000 * 1000 * 1000).count()
```
## Interactive R Shell
The easiest way to start using R on Spark is through the R shell:
```console
docker run -it %%IMAGE%%:r /opt/spark/bin/sparkR
```
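
Interactive shells aside, the same image can run batch applications with `spark-submit`. The following is a minimal sketch, assuming the bundled SparkPi example is present under `/opt/spark/examples/jars/`; replace the `<version>` placeholder with the jar name actually shipped in the image:

```console
docker run -it %%IMAGE%% /opt/spark/bin/spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master 'local[*]' \
    /opt/spark/examples/jars/spark-examples_<version>.jar 100
```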
## Running Spark on Kubernetes
See the [Running Spark on Kubernetes](https://spark.apache.org/docs/latest/running-on-kubernetes.html) documentation.
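
As a rough sketch based on that guide (the API server address and jar name are placeholders, not values from this image), a cluster-mode submission that uses this image for the driver and executors looks like:

```console
/opt/spark/bin/spark-submit \
    --master k8s://https://<k8s-apiserver-host>:<port> \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=2 \
    --conf spark.kubernetes.container.image=%%IMAGE%% \
    local:///opt/spark/examples/jars/spark-examples_<version>.jar
```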
## Configuration and Environment Variables
For supported environment variables, see [OVERVIEW.md](https://github.com/apache/spark-docker/blob/master/OVERVIEW.md#environment-variable) in the apache/spark-docker repository.
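
Spark also reads configuration files from `$SPARK_HOME/conf`, which in this image would be `/opt/spark/conf` (an assumption consistent with the `/opt/spark/bin` paths used above). Under that assumption, one way to supply a custom `spark-defaults.conf` is to bind-mount it into the container:

```console
docker run -it \
    -v "$PWD/spark-defaults.conf:/opt/spark/conf/spark-defaults.conf" \
    %%IMAGE%% /opt/spark/bin/spark-shell
```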
[Apache Spark™ community](https://spark.apache.org/community.html)
https://github.com/apache/spark-docker
https://issues.apache.org/jira/browse/SPARK
Apache Spark, Spark, Apache, the Apache feather logo, and the Apache Spark project logo are trademarks of The Apache Software Foundation.
Licensed under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0).
[Apache Spark](https://spark.apache.org/committers.html)