diff --git a/spark/README-short.txt b/spark/README-short.txt
new file mode 100644
index 000000000..1b5f8f525
--- /dev/null
+++ b/spark/README-short.txt
@@ -0,0 +1 @@
+Apache Spark - A unified analytics engine for large-scale data processing
diff --git a/spark/content.md b/spark/content.md
new file mode 100644
index 000000000..380ac1aa6
--- /dev/null
+++ b/spark/content.md
@@ -0,0 +1,53 @@
+# What is Apache Spark™?
+
+Apache Spark™ is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools, including Spark SQL for SQL and DataFrames, pandas API on Spark for pandas workloads, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for stream processing.
+
+%%LOGO%%
+
+## Online Documentation
+
+You can find the latest Spark documentation, including a programming guide, on the [project web page](https://spark.apache.org/documentation.html). This README contains only basic setup instructions.
+
+## Interactive Scala Shell
+
+The easiest way to start using Spark is through the Scala shell:
+
+```console
+docker run -it %%IMAGE%% /opt/spark/bin/spark-shell
+```
+
+Try the following command, which should return 1,000,000,000:
+
+```scala
+scala> spark.range(1000 * 1000 * 1000).count()
+```
+
+## Interactive Python Shell
+
+The easiest way to start using PySpark is through the Python shell:
+
+```console
+docker run -it %%IMAGE%%:python3 /opt/spark/bin/pyspark
+```
+
+Then run the following command, which should also return 1,000,000,000:
+
+```python
+>>> spark.range(1000 * 1000 * 1000).count()
+```
+
+## Interactive R Shell
+
+The easiest way to start using R on Spark is through the R shell:
+
+```console
+docker run -it %%IMAGE%%:r /opt/spark/bin/sparkR
+```
+
+## Running Spark on Kubernetes
+
+See [Running Spark on Kubernetes](https://spark.apache.org/docs/latest/running-on-kubernetes.html) in the Spark documentation.
+
+## Configuration and Environment Variables
+
+See [OVERVIEW.md](https://github.com/apache/spark-docker/blob/master/OVERVIEW.md#environment-variable) for supported environment variables and configuration.
diff --git a/spark/get-help.md b/spark/get-help.md
new file mode 100644
index 000000000..f4569f262
--- /dev/null
+++ b/spark/get-help.md
@@ -0,0 +1 @@
+[Apache Spark™ community](https://spark.apache.org/community.html)
diff --git a/spark/github-repo b/spark/github-repo
new file mode 100644
index 000000000..56646b9d2
--- /dev/null
+++ b/spark/github-repo
@@ -0,0 +1 @@
+https://github.com/apache/spark-docker
diff --git a/spark/issues.md b/spark/issues.md
new file mode 100644
index 000000000..3222af653
--- /dev/null
+++ b/spark/issues.md
@@ -0,0 +1 @@
+https://issues.apache.org/jira/browse/SPARK
diff --git a/spark/license.md b/spark/license.md
new file mode 100644
index 000000000..4170f2532
--- /dev/null
+++ b/spark/license.md
@@ -0,0 +1,3 @@
+Apache Spark, Spark, Apache, the Apache feather logo, and the Apache Spark project logo are trademarks of The Apache Software Foundation.
+
+Licensed under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0).
diff --git a/spark/logo.png b/spark/logo.png
new file mode 100644
index 000000000..464eda547
Binary files /dev/null and b/spark/logo.png differ
diff --git a/spark/maintainer.md b/spark/maintainer.md
new file mode 100644
index 000000000..e4ef7ed7a
--- /dev/null
+++ b/spark/maintainer.md
@@ -0,0 +1 @@
+[Apache Spark](https://spark.apache.org/committers.html)
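
The Kubernetes section of `content.md` above links to the guide without showing a command. As a rough sketch only, not part of the files in this patch: a cluster-mode `spark-submit` from this image typically follows the pattern below. The API-server address, executor count, and examples-jar filename are placeholders rather than values from this repository, and the command assumes the container can authenticate to the cluster (for example, via a mounted kubeconfig).

```console
# Placeholders: <k8s-apiserver-host>, <port>, and the examples jar name depend
# on your cluster and Spark version; %%IMAGE%% is substituted at publish time.
docker run -it %%IMAGE%% /opt/spark/bin/spark-submit \
  --master k8s://https://<k8s-apiserver-host>:<port> \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.container.image=%%IMAGE%% \
  local:///opt/spark/examples/jars/<spark-examples jar>
```

The `local://` scheme tells Spark the jar is already inside the container image, so nothing needs to be uploaded to the cluster.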