Merge pull request #2200 from Yikun/spark
Add Apache Spark Docker Official Image doc
Commit 43e8bc6484
Apache Spark - A unified analytics engine for large-scale data processing
# What is Apache Spark™?
Apache Spark™ is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, pandas API on Spark for pandas workloads, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for stream processing.
%%LOGO%%
## Online Documentation
You can find the latest Spark documentation, including a programming guide, on the [project web page](https://spark.apache.org/documentation.html). This README file only contains basic setup instructions.
## Interactive Scala Shell
The easiest way to start using Spark is through the Scala shell:
```console
docker run -it %%IMAGE%% /opt/spark/bin/spark-shell
```
Try the following command, which should return 1,000,000,000:
```scala
scala> spark.range(1000 * 1000 * 1000).count()
```
## Interactive Python Shell
The easiest way to start using PySpark is through the Python shell:
```console
docker run -it %%IMAGE%%:python3 /opt/spark/bin/pyspark
```
Then run the following command, which should also return 1,000,000,000:
```python
>>> spark.range(1000 * 1000 * 1000).count()
```
## Interactive R Shell
The easiest way to start using R on Spark is through the R shell:
```console
docker run -it %%IMAGE%%:r /opt/spark/bin/sparkR
```
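
Interactive shells aside, the same image can run batch applications with `spark-submit`. The following is a minimal sketch, assuming the bundled SparkPi example is present under `/opt/spark/examples/jars/`; replace the `<version>` placeholder with the jar name actually shipped in the image:

```console
docker run -it %%IMAGE%% /opt/spark/bin/spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master 'local[*]' \
    /opt/spark/examples/jars/spark-examples_<version>.jar 100
```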
## Running Spark on Kubernetes
See the [Running Spark on Kubernetes](https://spark.apache.org/docs/latest/running-on-kubernetes.html) documentation.
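
As a rough sketch based on that guide (the API server address and jar name are placeholders, not values from this image), a cluster-mode submission that uses this image for the driver and executors looks like:

```console
/opt/spark/bin/spark-submit \
    --master k8s://https://<k8s-apiserver-host>:<port> \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=2 \
    --conf spark.kubernetes.container.image=%%IMAGE%% \
    local:///opt/spark/examples/jars/spark-examples_<version>.jar
```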
## Configuration and Environment Variables
For supported environment variables, see [OVERVIEW.md](https://github.com/apache/spark-docker/blob/master/OVERVIEW.md#environment-variable) in the apache/spark-docker repository.
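
Spark also reads configuration files from `$SPARK_HOME/conf`, which in this image would be `/opt/spark/conf` (an assumption consistent with the `/opt/spark/bin` paths used above). Under that assumption, one way to supply a custom `spark-defaults.conf` is to bind-mount it into the container:

```console
docker run -it \
    -v "$PWD/spark-defaults.conf:/opt/spark/conf/spark-defaults.conf" \
    %%IMAGE%% /opt/spark/bin/spark-shell
```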
[Apache Spark™ community](https://spark.apache.org/community.html)
https://github.com/apache/spark-docker
https://issues.apache.org/jira/browse/SPARK
Apache Spark, Spark, Apache, the Apache feather logo, and the Apache Spark project logo are trademarks of The Apache Software Foundation.
Licensed under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0).
[Apache Spark](https://spark.apache.org/committers.html)