Update home page and introduction

Signed-off-by: JesseStutler <chenzicong4@huawei.com>
JesseStutler 2025-01-09 21:48:02 +08:00
parent 123af5c49f
commit 3c7d4b0f07
21 changed files with 300 additions and 142 deletions


@ -76,7 +76,7 @@
[[zh.menu.main_right]]
name = "<img src=\"/img/icon_slack.svg\" alt=\"slack\" style=\" font-size: 1rem; line-height: 1.25; width:20px; height:20px; margin-top: 9px;\">"
post = ""
url = "https://volcano-sh.slack.com"
url = "https://cloud-native.slack.com/archives/C011GJDQS0N"
weight = 30
# Documentation version latest


@ -52,7 +52,7 @@
name = "<img src=\"/img/icon_slack.svg\" alt=\"slack\" style=\" font-size: 1rem; line-height: 1.25; width:20px; height:20px; margin-top: 9px;\">"
post = ""
url = "https://volcano-sh.slack.com"
url = "https://cloud-native.slack.com/archives/C011GJDQS0N"
weight = 30
# Documentation version latest


@ -44,10 +44,20 @@ interests = []
link = "https://twitter.com/volcano_sh"
+++
Volcano is a system for running high-performance workloads on Kubernetes. It features powerful batch scheduling capability that Kubernetes cannot provide but is commonly required by many classes of high-performance workloads, including:
Volcano is CNCF's first cloud native batch computing project, focusing on high performance computing scenarios such as AI, big data, and genomics analysis. Its core capabilities include:
- Machine learning/Deep learning
- Bioinformatics/Genomics
- Other big data applications
• Unified Scheduling: Supports integrated job scheduling for both Kubernetes native workloads and mainstream computing frameworks (such as TensorFlow, Spark, PyTorch, Ray, Flink, etc.).
These types of applications typically run on generalized domain frameworks like TensorFlow, Spark, PyTorch, and MPI. Volcano is integrated with these frameworks to allow you to run your applications without adaptation efforts while enjoying remarkable batch scheduling.
• Queue Management: Provides multi-level queue management capabilities, enabling fine-grained resource quota control and task priority scheduling.
• Heterogeneous Device Support: Efficiently schedules heterogeneous devices like GPU and NPU, fully unleashing hardware computing potential.
• Network Topology Aware Scheduling: Reduces communication overhead between nodes, greatly enhancing model training efficiency in AI distributed training scenarios.
• Multi-cluster Scheduling: Supports cross-cluster job scheduling, improving resource pool management capabilities and achieving large-scale load balancing.
• Online and Offline Workloads Colocation: Colocates online and offline workloads, improving cluster resource utilization through intelligent scheduling strategies.
• Load Aware Descheduling: Supports load-aware descheduling, optimizing cluster load distribution and enhancing system stability.
As the industry's first cloud native batch computing engine, Volcano has been widely applied in high-performance computing scenarios such as artificial intelligence, big data, and genome sequencing, providing powerful support for enterprises to build elastic, efficient, and intelligent computing platforms.


@ -15,8 +15,12 @@ type = "docs" # Do not modify.
+++
## What is Volcano
Volcano is a cloud native system for high-performance workloads, which has been accepted by [Cloud Native Computing Foundation (CNCF)](https://www.cncf.io/) as its first and only official container batch scheduling project. Volcano supports popular computing frameworks such as [Spark](https://spark.apache.org/), [TensorFlow](https://www.tensorflow.org/), [PyTorch](https://pytorch.org/), [Flink](https://flink.apache.org/), [Argo](https://argoproj.github.io/), [MindSpore](https://www.mindspore.cn/en), and [PaddlePaddle](https://www.paddlepaddle.org.cn/). Volcano also supports scheduling of computing resources on different architecture, such as x86, Arm, and Kunpeng.
Volcano is a cloud native system for high-performance workloads, which has been accepted by [Cloud Native Computing Foundation
(CNCF)](https://www.cncf.io/) as its first and only official container batch scheduling project. Volcano supports popular computing
frameworks such as [Spark](https://spark.apache.org/), [TensorFlow](https://www.tensorflow.org/), [PyTorch](https://pytorch.org/),
[Flink](https://flink.apache.org/), [Argo](https://argoproj.github.io/), [MindSpore](https://www.mindspore.cn/en),
[PaddlePaddle](https://www.paddlepaddle.org.cn/), and [Ray](https://www.ray.io/). Volcano also provides a range of scheduling capabilities, including heterogeneous device scheduling, network topology-aware scheduling, multi-cluster scheduling, and online and offline workload colocation.
## Why Volcano
Job scheduling and management become increasingly complex and critical for high-performance batch computing. Common requirements are as follows:
@ -27,25 +31,30 @@ Job scheduling and management become increasingly complex and critical for high-
Volcano is designed to cater to these requirements. In addition, Volcano inherits the design of Kubernetes APIs, allowing you to easily run applications that require high-performance computing on Kubernetes.
## Features
### Rich scheduling policies
Volcano supports a variety of scheduling policies:
### [Unified Scheduling](/en/docs/unified_scheduling/)
* Support native Kubernetes workload scheduling
* Provide complete support for frameworks like PyTorch, TensorFlow, Spark, Flink, Ray through VolcanoJob
* Unified scheduling for both online microservices and offline batch jobs to improve cluster resource utilization
* Gang scheduling
* Fair-share scheduling
* Queue scheduling
* Preemption scheduling
* Topology-based scheduling
* Reclaim
* Backfill
* Resource reservation
### Rich Scheduling Policies
* **Gang Scheduling**: Ensure all tasks of a job start simultaneously, suitable for distributed training and big data scenarios
* **Binpack Scheduling**: Optimize resource utilization through compact task allocation
* **Heterogeneous Device Scheduling**: Efficiently share GPU resources, support both CUDA and MIG modes for GPU scheduling, and NPU scheduling
* **Proportion/Capacity Scheduling**: Resource sharing/preemption/reclaim based on queue quotas
* **NodeGroup Scheduling**: Support node group affinity scheduling, implementing binding between queues and node groups
* **DRF Scheduling**: Support fair scheduling of multi-dimensional resources
* **SLA Scheduling**: Scheduling guarantee based on service quality
* **Task-topology Scheduling**: Support task topology-aware scheduling, optimizing performance for communication-intensive applications
* **NUMA Aware Scheduling**: Supports scheduling for NUMA architecture, optimizing resource allocation for tasks on multi-core processors, enhancing memory access efficiency and computational performance.
* ...
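The all-or-nothing idea behind gang scheduling in the list above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not Volcano's implementation: the `Job` class, its `min_available` and `cpu_per_task` fields, and the `gang_admit` function are hypothetical names, and a single CPU counter stands in for real cluster resources.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    min_available: int  # hypothetical: number of tasks that must start together
    cpu_per_task: int   # hypothetical: uniform CPU request per task


def gang_admit(jobs, free_cpu):
    """Admit a job only if all of its required tasks fit at the same time."""
    admitted = []
    for job in jobs:
        needed = job.min_available * job.cpu_per_task
        if needed <= free_cpu:
            # The whole gang fits, so reserve resources for every task at once.
            free_cpu -= needed
            admitted.append(job.name)
        # Otherwise the job waits as a whole; none of its tasks start.
    return admitted, free_cpu
```

With 10 free CPUs, a job needing 4 tasks of 2 CPUs is admitted, a following job needing 3 tasks of 2 CPUs is held back entirely, and a later single-task job can still use the remaining 2 CPUs.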
You can also configure plug-ins and actions to use custom scheduling policies.
### Enhanced job management
You can use enhanced job features of Volcano for high-performance computing:
Volcano supports custom plugins and actions to implement more scheduling algorithms.
* Multi-pod jobs
* Improved error handling
* Indexed jobs
### [Queue Resource Management](/en/docs/queue_resource_management/)
* Support multi-dimensional resource quota control (CPU, Memory, GPU, etc.)
* Provide multi-level queue structure and resource inheritance
* Support resource borrowing, reclaiming and preemption between queues
* Implement multi-tenant resource isolation and priority control
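One way to picture quota control with borrowing between queues is a weight-proportional split in which capacity a queue does not need is offered to the others. The sketch below is a hedged illustration of that general idea, not Volcano's algorithm: `deserved_shares`, the `(weight, request)` tuple layout, and the single scalar resource are all assumptions made for the example, and weights are assumed to be positive.

```python
def deserved_shares(total, queues):
    """Split `total` among queues by weight, capped at each queue's request.

    `queues` maps a queue name to a (weight, request) pair (hypothetical layout).
    Capacity that a satisfied queue does not need is re-offered to the rest.
    """
    deserved = {name: 0.0 for name in queues}
    active = set(queues)
    remaining = float(total)
    while active and remaining > 1e-9:
        weight_sum = sum(queues[n][0] for n in active)
        satisfied = set()
        handed_out = 0.0
        for n in active:
            weight, request = queues[n]
            offer = remaining * weight / weight_sum
            grant = min(offer, request - deserved[n])
            deserved[n] += grant
            handed_out += grant
            if deserved[n] >= request - 1e-9:
                satisfied.add(n)
        remaining -= handed_out
        if not satisfied:
            # Every active queue took its full offer, so nothing is left over.
            break
        active -= satisfied
    return deserved
```

For example, with 12 units and queues `a` (weight 1, request 2), `b` (weight 1, request 10), and `c` (weight 2, request 10), queue `a` receives only the 2 units it asked for, and the unit it leaves unused is shared between `b` and `c` in a 1:2 ratio.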
### Multi-architecture computing
Volcano can schedule computing resources from multiple architectures:
@ -56,24 +65,50 @@ Volcano can schedule computing resources from multiple architectures:
* Ascend
* GPU
### Faster scheduling
Compared with existing queue schedulers, Volcano shortens the average scheduling delay through a series of optimizations.
### Network Topology-aware Scheduling
* Supports network topology-aware scheduling that accounts for the network bandwidth between nodes. In AI scenarios, it optimizes data transmission for communication-intensive distributed training tasks, significantly reducing communication overhead and improving model training speed and overall efficiency.
### Online and Offline Workloads Colocation
* Supports online and offline workloads colocation, enhancing resource utilization while ensuring online workloads QoS through unified scheduling, dynamic resource overcommitment, CPU burst, and resource isolation.
### Multi-cluster Scheduling
* Support cross-cluster job scheduling for larger-scale resource pool management and load balancing
> For more details about multi-cluster scheduling, see: [volcano-global](https://github.com/volcano-sh/volcano-global)
### Descheduling
* Support dynamic descheduling to optimize cluster load distribution and improve system stability
> For more details about descheduling, see: [descheduler](https://github.com/volcano-sh/descheduler)
### Monitoring and Observability
* Complete logging system
* Rich monitoring metrics
* Provides a dashboard for graphical operations
> For more details about the dashboard, see: [dashboard](https://github.com/volcano-sh/dashboard)
>
> For more details about Volcano metrics, see: [metrics](https://github.com/volcano-sh/volcano/blob/master/docs/design/metrics.md)
## Ecosystem
Volcano allows you to use mainstream computing frameworks:
Volcano has become the de facto standard in batch computing scenarios and is widely used with the following high-performance computing frameworks:
* [Spark](https://spark.apache.org/)
* [TensorFlow](https://www.tensorflow.org/)
* [PyTorch](https://pytorch.org/)
* [Flink](https://flink.apache.org/)
* [Argo](https://argoproj.github.io/)
* [MindSpore](https://www.mindspore.cn/en)
* [Ray](https://www.ray.io/)
* [MindSpore](https://www.mindspore.cn/)
* [PaddlePaddle](https://www.paddlepaddle.org.cn/)
* [Open MPI](https://www.open-mpi.org/)
* [OpenMPI](https://www.open-mpi.org/)
* [Horovod](https://horovod.readthedocs.io/)
* [MXNet](https://mxnet.apache.org/)
* [Kubeflow](https://www.kubeflow.org/)
* [KubeGene](https://github.com/volcano-sh/kubegene)
* [Cromwell](https://cromwell.readthedocs.io/)
Volcano has been commercially used as the infrastructure scheduling engine by companies and organizations.
Additionally, Volcano has been widely adopted by various enterprises and organizations in the fields of AI and big data. With its powerful resource management capabilities, efficient job management mechanisms, and rich scheduling strategies (such as Gang scheduling, heterogeneous device scheduling, and topology-aware scheduling), it effectively meets the complex demands of distributed training and data analysis tasks. At the same time, Volcano enhances scheduling performance while ensuring the flexibility and reliability of task scheduling, providing strong support for enterprises to build an efficient resource utilization system.
## Future Outlook
Volcano will continue to expand its functional boundaries through community collaboration and technical innovation, becoming a leader in high-performance computing and cloud-native batch scheduling.


@ -66,7 +66,7 @@ The community is committed to developing a system that helps running high-perfor
You can contribute in different areas, including filing issues, developing features, fixing critical bugs, and getting your work reviewed and merged.
If you have any questions about the development process, visit the [Slack Channel](https://volcano-sh.slack.com) ([sign up](https://join.slack.com/t/volcano-sh/shared_invite/enQtNTU5NTU3NDU0MTc4LTgzZTQ2MzViNTFmNDg1ZGUyMzcwNjgxZGQ1ZDdhOGE3Mzg1Y2NkZjk1MDJlZTZhZWU5MDg2MWJhMzI3Mjg3ZTk))
If you have any questions about the development process, visit the [Slack Channel](https://cloud-native.slack.com/archives/C011GJDQS0N) ([sign up](https://join.slack.com/t/volcano-sh/shared_invite/enQtNTU5NTU3NDU0MTc4LTgzZTQ2MzViNTFmNDg1ZGUyMzcwNjgxZGQ1ZDdhOGE3Mzg1Y2NkZjk1MDJlZTZhZWU5MDg2MWJhMzI3Mjg3ZTk))
or join our [mailing list](https://groups.google.com/forum/#!forum/volcano-sh).
#### Find Something to Work On


@ -65,7 +65,7 @@ The community is committed to developing a system that helps running high-perfor
You can contribute in different areas, including filing issues, developing features, fixing critical bugs, and getting your work reviewed and merged.
If you have any questions about the development process, visit the [Slack Channel](https://volcano-sh.slack.com) ([sign up](https://join.slack.com/t/volcano-sh/shared_invite/enQtNTU5NTU3NDU0MTc4LTgzZTQ2MzViNTFmNDg1ZGUyMzcwNjgxZGQ1ZDdhOGE3Mzg1Y2NkZjk1MDJlZTZhZWU5MDg2MWJhMzI3Mjg3ZTk))
If you have any questions about the development process, visit the [Slack Channel](https://cloud-native.slack.com/archives/C011GJDQS0N) ([sign up](https://join.slack.com/t/volcano-sh/shared_invite/enQtNTU5NTU3NDU0MTc4LTgzZTQ2MzViNTFmNDg1ZGUyMzcwNjgxZGQ1ZDdhOGE3Mzg1Y2NkZjk1MDJlZTZhZWU5MDg2MWJhMzI3Mjg3ZTk))
or join our [mailing list](https://groups.google.com/forum/#!forum/volcano-sh).
#### Find Something to Work On


@ -66,7 +66,7 @@ The community is committed to developing a system that helps running high-perfor
You can contribute in different areas, including filing issues, developing features, fixing critical bugs, and getting your work reviewed and merged.
If you have any questions about the development process, visit the [Slack Channel](https://volcano-sh.slack.com) ([sign up](https://join.slack.com/t/volcano-sh/shared_invite/enQtNTU5NTU3NDU0MTc4LTgzZTQ2MzViNTFmNDg1ZGUyMzcwNjgxZGQ1ZDdhOGE3Mzg1Y2NkZjk1MDJlZTZhZWU5MDg2MWJhMzI3Mjg3ZTk))
If you have any questions about the development process, visit the [Slack Channel](https://cloud-native.slack.com/archives/C011GJDQS0N) ([sign up](https://join.slack.com/t/volcano-sh/shared_invite/enQtNTU5NTU3NDU0MTc4LTgzZTQ2MzViNTFmNDg1ZGUyMzcwNjgxZGQ1ZDdhOGE3Mzg1Y2NkZjk1MDJlZTZhZWU5MDg2MWJhMzI3Mjg3ZTk))
or join our [mailing list](https://groups.google.com/forum/#!forum/volcano-sh).
#### Find Something to Work On


@ -66,7 +66,7 @@ The community is committed to developing a system that helps running high-perfor
You can contribute in different areas, including filing issues, developing features, fixing critical bugs, and getting your work reviewed and merged.
If you have any questions about the development process, visit the [Slack Channel](https://volcano-sh.slack.com) ([sign up](https://join.slack.com/t/volcano-sh/shared_invite/enQtNTU5NTU3NDU0MTc4LTgzZTQ2MzViNTFmNDg1ZGUyMzcwNjgxZGQ1ZDdhOGE3Mzg1Y2NkZjk1MDJlZTZhZWU5MDg2MWJhMzI3Mjg3ZTk))
If you have any questions about the development process, visit the [Slack Channel](https://cloud-native.slack.com/archives/C011GJDQS0N) ([sign up](https://join.slack.com/t/volcano-sh/shared_invite/enQtNTU5NTU3NDU0MTc4LTgzZTQ2MzViNTFmNDg1ZGUyMzcwNjgxZGQ1ZDdhOGE3Mzg1Y2NkZjk1MDJlZTZhZWU5MDg2MWJhMzI3Mjg3ZTk))
or join our [mailing list](https://groups.google.com/forum/#!forum/volcano-sh).
#### Find Something to Work On


@ -65,7 +65,7 @@ The community is committed to developing a system that helps running high-perfor
You can contribute in different areas, including filing issues, developing features, fixing critical bugs, and getting your work reviewed and merged.
If you have any questions about the development process, visit the [Slack Channel](https://volcano-sh.slack.com) ([sign up](https://join.slack.com/t/volcano-sh/shared_invite/enQtNTU5NTU3NDU0MTc4LTgzZTQ2MzViNTFmNDg1ZGUyMzcwNjgxZGQ1ZDdhOGE3Mzg1Y2NkZjk1MDJlZTZhZWU5MDg2MWJhMzI3Mjg3ZTk))
If you have any questions about the development process, visit the [Slack Channel](https://cloud-native.slack.com/archives/C011GJDQS0N) ([sign up](https://join.slack.com/t/volcano-sh/shared_invite/enQtNTU5NTU3NDU0MTc4LTgzZTQ2MzViNTFmNDg1ZGUyMzcwNjgxZGQ1ZDdhOGE3Mzg1Y2NkZjk1MDJlZTZhZWU5MDg2MWJhMzI3Mjg3ZTk))
or join our [mailing list](https://groups.google.com/forum/#!forum/volcano-sh).
#### Find Something to Work On


@ -10,7 +10,7 @@ weight = 1
# Slide interval.
# Use `false` to disable animation or enter a time in ms, e.g. `5000` (5s).
interval = 600000
interval = 5000
# Minimum slide height.
# Specify a height to ensure a consistent height for each slide.
@ -41,44 +41,82 @@ height = "500px"
cta_icon = "graduation-cap"
[[item]]
title = "High Performance Powered by Efficient Scheduling"
content = "Computing jobs can be converted to Kubernetes workloads and scheduled in batches to deliver optimal performance."
title = "Network Topology Aware Scheduling"
content = "Supports network topology aware scheduling, significantly reducing application communication overhead between nodes, and greatly enhancing model training efficiency in AI distributed training scenarios"
align = "left"
#overlay_color = "#555" # An HTML color value.
#overlay_img = "headers/banner_02.png" # Image path relative to your `static/img/` folder.
overlay_filter = 0.05 # Darken the image. Value in range 0-1.
overlay_filter = 0.15
[[item]]
title = "Diverse Scheduling Policies"
content = "Co-scheduling, fair-share scheduling, gang scheduling, topologies, reservation/backfill, data-aware scheduling, and more"
title = "Online and Offline Workloads Colocation"
content = "Supports online and offline workloads colocation, enhancing resource utilization while ensuring online business QoS through unified scheduling, dynamic resource overcommitment, CPU burst capabilities, and resource isolation"
align = "left"
#overlay_color = "#333" # An HTML color value.
#overlay_img = "headers/banner_02.png" # Image path relative to your `static/img/` folder.
overlay_filter = 0.05 # Darken the image. Value in range 0-1.
[[item]]
title = "Enhanced Job Management"
content = "Managing jobs with multiple templates"
align = "left"
#overlay_color = "#333" # An HTML color value.
#overlay_img = "headers/banner_02.png" # Image path relative to your `static/img/` folder.
overlay_filter = 0.05 # Darken the image. Value in range 0-1.
overlay_filter = 0.20
[[item]]
title = "Multiple Runtimes"
content = "Singularity and GPU Accelerators"
title = "Multi-Cluster Job Scheduling"
content = "Supports cross-cluster job scheduling for larger-scale resource pool management and load balancing"
content1 = "<a class=\"github-button\" href=\"https://github.com/volcano-sh/volcano-global\" data-size=\"large\" data-icon=\"octicon-star\" data-show-count=\"true\" aria-label=\"Star this on GitHub\">Star</a>"
align = "left"
#overlay_color = "#333" # An HTML color value.
#overlay_img = "headers/banner_02.png" # Image path relative to your `static/img/` folder.
overlay_filter = 0.15 # Darken the image. Value in range 0-1.
overlay_filter = 0.25
# Call to action button for multi-cluster scheduling
cta_label = "Learn more about Volcano multi-cluster scheduling"
cta_url = "https://github.com/volcano-sh/volcano-global"
cta_icon_pack = "fas"
cta_icon = "graduation-cap"
[[item]]
title = "Hierarchical Queue Management"
content = "Supports multi-level queue management for fine-grained resource quota control"
align = "left"
overlay_filter = 0.3
[[item]]
title = "Descheduling"
content = "Supports dynamic descheduling to optimize cluster load distribution and improve system stability"
content1 = "<a class=\"github-button\" href=\"https://github.com/volcano-sh/descheduler\" data-size=\"large\" data-icon=\"octicon-star\" data-show-count=\"true\" aria-label=\"Star this on GitHub\">Star</a>"
align = "left"
overlay_filter = 0.35
# Call to action button for descheduling
cta_label = "Learn more about Volcano descheduling"
cta_url = "https://github.com/volcano-sh/descheduler"
cta_icon_pack = "fas"
cta_icon = "graduation-cap"
[[item]]
title = "Multiple Scheduling Policies"
content = "Supports various scheduling strategies including Gang, Fair-Share, Binpack, DeviceShare, Capacity, Proportion, NUMA aware, and Task Topology, optimizing resource utilization"
align = "left"
overlay_filter = 0.4
[[item]]
title = "Heterogeneous Device Support"
content = "Supports scheduling for various heterogeneous devices like GPU and NPU, unleashing hardware computing power"
align = "left"
overlay_filter = 0.45
[[item]]
title = "High Performance Unified Scheduling"
content = "Supports native Kubernetes workload scheduling while providing complete support for frameworks like Ray, PyTorch, TensorFlow, Spark, and Flink through VolcanoJob, achieving unified job scheduling with excellent performance"
align = "left"
overlay_filter = 0.5
[[item]]
title = "Powerful Monitoring"
content = "Logging, metrics, and dashboard"
content = "Provides rich logging, monitoring metrics, and dashboards"
content1 = "<a class=\"github-button\" href=\"https://github.com/volcano-sh/dashboard\" data-size=\"large\" data-icon=\"octicon-star\" data-show-count=\"true\" aria-label=\"Star this on GitHub\">Star</a>"
align = "left"
#overlay_color = "#333" # An HTML color value.
#overlay_img = "headers/banner_02.png" # Image path relative to your `static/img/` folder.
overlay_filter = 0.15 # Darken the image. Value in range 0-1.
# overlay_color = "#333" # An HTML color value.
# overlay_img = "headers/volcano-slide-2.png" # Image path relative to your `static/img/` folder.
overlay_filter = 0.55 # Darken the image. Value in range 0-1.
# Call to action button for observability
cta_label = "Learn more about the Volcano dashboard"
cta_url = "https://github.com/volcano-sh/dashboard"
cta_icon_pack = "fas"
cta_icon = "graduation-cap"
+++


@ -70,12 +70,12 @@ weight = 4
description = "The all-scenario deep learning framework developed by Huawei."
[[featured]]
img_src = "volcano_paddle.PNG"
img_width = "100px"
img_height = "60px"
name = "PaddlePaddle"
url = "https://www.paddlepaddle.org.cn/ "
description = "PaddlePaddle is an open source deep learning platform derived from industrial practice initiated by Baidu."
img_src = "ray_logo.png"
img_width = "300px"
img_height = "80px"
name = "Ray"
url = "https://github.com/ray-project/ray"
description = "Ray is a high-performance distributed computing framework that supports machine learning, deep learning, and distributed applications."
[[featured]]
img_src = "kubeflow.png"
@ -110,10 +110,10 @@ weight = 4
description = "A truly open source deep learning framework suited for flexible research prototyping and production."
[[featured]]
img_src = "kubegene_logo.png"
img_src = "volcano_paddle.PNG"
img_width = "100px"
img_height = "60px"
name = "KubeGene"
url = "https://github.com/kubegene/kubegene "
description = "The KubeGene is dedicated to making genome sequencing process simple, portable, and scalable."
name = "PaddlePaddle"
url = "https://www.paddlepaddle.org.cn/ "
description = "PaddlePaddle is an open source deep learning platform derived from industrial practice initiated by Baidu."
+++


@ -51,19 +51,20 @@ interests = []
# link = "files/cv.pdf"
+++
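Volcano is a container batch computing engine for running high-performance workloads on Kubernetes.
It provides a set of mechanisms that Kubernetes currently lacks and that many high-performance
workloads commonly require, including:
Volcano is CNCF's first cloud native batch computing project, focusing on high-performance computing scenarios such as AI, big data, and genomics analysis. Its core capabilities include:
\- Machine learning/Deep learning
\- Bioinformatics/Genomics
\- Big data applications
• Unified Scheduling: Supports integrated job scheduling for both Kubernetes native workloads and mainstream computing frameworks (such as TensorFlow, Spark, PyTorch, Ray, and Flink).
• Queue Management: Provides multi-level queue management, enabling fine-grained resource quota control and task priority scheduling.
These types of applications typically run on general-purpose domain frameworks such as TensorFlow, Spark, PyTorch,
and MPI, and Volcano integrates seamlessly with these frameworks.
• Heterogeneous Device Support: Efficiently schedules heterogeneous devices such as GPUs and NPUs, fully unleashing hardware computing potential.
***
• Network Topology Aware Scheduling: Supports network topology aware scheduling, significantly reducing application communication overhead across nodes and greatly improving model training efficiency in distributed AI training scenarios.
Volcano builds on 15 years of experience running a wide variety of high-performance workloads at scale on multiple systems and platforms,
combined with the best ideas and practices from the open source community.
• Multi-cluster Scheduling: Supports cross-cluster job scheduling, improving resource pool management and achieving large-scale load balancing.
• Online and Offline Workloads Colocation: Colocates online and offline workloads, improving cluster resource utilization.
• Load Aware Descheduling: Supports load-aware descheduling, optimizing cluster load distribution and improving system stability.
As the industry's first cloud native batch computing engine, Volcano has been widely applied in high-performance computing scenarios such as artificial intelligence, big data, and genome sequencing, providing powerful support for enterprises to build elastic, efficient, and intelligent computing platforms.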


@ -2,7 +2,7 @@
title = "Introduction"
date = 2019-01-28
lastmod = 2020-09-03
lastmod = 2025-01-09
draft = false # Is this a draft? true/false
toc = true # Show table of contents? true/false
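@ -19,7 +19,7 @@ Volcano is [CNCF](https://www.cncf.io/)'s first and only Kubernetes-based
a set of mechanisms commonly required by many kinds of high-performance workloads, such as machine learning, big data applications, scientific computing, and special-effects rendering. As a general-purpose batch processing platform, Volcano integrates seamlessly with almost all mainstream computing
frameworks, such as [Spark](https://spark.apache.org/), [TensorFlow](https://tensorflow.google.cn/), [PyTorch](https://pytorch.org/),
[Flink](https://flink.apache.org/), [Argo](https://argoproj.github.io/), [MindSpore](https://www.mindspore.cn/),
and [PaddlePaddle](https://www.paddlepaddle.org.cn/). It also provides hybrid scheduling of heterogeneous devices, including CPUs and GPUs based on various mainstream architectures. Volcano's design
[PaddlePaddle](https://www.paddlepaddle.org.cn/), and [Ray](https://www.ray.io/). It also provides a variety of scheduling capabilities, including heterogeneous device scheduling, network topology-aware scheduling, multi-cluster scheduling, and online and offline workload colocation. Volcano's design
philosophy builds on 15 years of experience running a wide variety of high-performance workloads at scale on multiple systems and platforms, combined with the best ideas and practices from the open source community.
## Background
@ -33,26 +33,31 @@ Volcano is [CNCF](https://www.cncf.io/)'s first and only Kubernetes-based
Volcano was created to meet exactly these requirements. Volcano also inherits the design style and core concepts of the Kubernetes APIs, so you can enjoy Volcano's efficiency and convenience without
changing any of your existing Kubernetes habits.
## Features
### Rich scheduling policies
Volcano supports a variety of scheduling policies, including: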
* Gang-scheduling
* Fair-share scheduling
* Queue scheduling
* Preemption scheduling
* Topology-based scheduling
* Reclaims
* Backfill
* Resource Reservation
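### [Unified Scheduling](/zh/docs/unified_scheduling/)
* Supports native Kubernetes workload scheduling
* Supports integrated job scheduling for frameworks such as PyTorch, TensorFlow, Spark, Flink, and Ray through VolcanoJob
* Schedules online microservices and offline batch jobs in a unified way to improve cluster resource utilization
### Rich Scheduling Policies
* **Gang Scheduling**: Ensures all tasks of a job start simultaneously, suitable for distributed training and big data scenarios
* **Binpack Scheduling**: Optimizes resource utilization through compact task allocation
* **Heterogeneous device scheduling**: Efficiently shares GPU resources, supports both CUDA and MIG modes for GPU scheduling, and supports NPU scheduling
* **Proportion/Capacity Scheduling**: Shares, preempts, and reclaims resources based on queue quotas
* **NodeGroup Scheduling**: Supports node group affinity scheduling, binding queues to node groups
* **DRF Scheduling**: Supports fair scheduling of multi-dimensional resources
* **SLA Scheduling**: Provides scheduling guarantees based on service quality
* **Task-topology Scheduling**: Supports task topology-aware scheduling, optimizing performance for communication-intensive applications
* **NUMA Aware Scheduling**: Supports scheduling for NUMA architectures, optimizing resource allocation for tasks on multi-core processors and improving memory access efficiency and computational performance
* ...
Thanks to its extensible architecture, Volcano supports user-defined plugins and actions to enable more scheduling algorithms.
### Enhanced job management
Volcano provides enhanced job management capabilities for high-performance computing scenarios. These features are listed below:
* Multi-pod-type jobs
* Enhanced error handling
* Indexed jobs
### [Queue Resource Management](/zh/docs/queue_resource_management/)
* Supports multi-dimensional resource quota control (CPU, memory, GPU, etc.)
* Provides a multi-level queue structure and resource inheritance
* Supports resource borrowing, reclaiming, and preemption between queues
* Implements multi-tenant resource isolation and priority control
### Heterogeneous device support
Volcano provides hybrid scheduling of computing resources based on multiple architectures: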
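@ -63,24 +68,50 @@ Volcano provides hybrid scheduling of computing resources based on multiple architectures
* Ascend
* GPU
### Performance optimization
Compared with traditional queue schedulers, Volcano effectively reduces average scheduling latency through a series of optimizations.
### Network Topology-aware Scheduling
* Supports network topology-aware scheduling that fully accounts for the network bandwidth between nodes. In AI scenarios, where distributed training tasks are communication-intensive, topology-aware scheduling effectively optimizes data transmission and significantly reduces communication overhead, improving model training speed and overall efficiency.
### Online and Offline Workloads Colocation
* Supports colocating online and offline workloads, improving resource utilization while ensuring QoS for online workloads through unified scheduling, dynamic resource overcommitment, CPU Burst, and resource isolation
### Multi-cluster Scheduling
* Supports cross-cluster job scheduling, extending VolcanoJob capabilities to multiple clusters for larger-scale resource pool management
> For the Volcano multi-cluster scheduling repository, see: [volcano-global](https://github.com/volcano-sh/volcano-global)
### Descheduling
* Supports dynamic descheduling to optimize cluster load distribution and improve system stability
> For the Volcano descheduling repository, see: [descheduler](https://github.com/volcano-sh/descheduler)
### Monitoring and Observability
* Complete logging system
* Rich monitoring metrics
* Provides a visual dashboard for graphical operations
> For the Volcano Dashboard, see: [dashboard](https://github.com/volcano-sh/dashboard)
>
> For Volcano metrics, see: [metrics](https://github.com/volcano-sh/volcano/blob/master/docs/design/metrics.md)
## Ecosystem
Volcano already supports almost all mainstream computing frameworks:
Volcano has become the de facto industry standard for batch computing scenarios and is widely used with the following high-performance computing frameworks: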
* [Spark](https://spark.apache.org/)
* [TensorFlow](https://tensorflow.google.cn/)
* [PyTorch](https://pytorch.org/)
* [Flink](https://flink.apache.org/)
* [Argo](https://argoproj.github.io/)
* [Ray](https://www.ray.io/)
* [MindSpore](https://www.mindspore.cn/)
* [PaddlePaddle](https://www.paddlepaddle.org.cn/)
* [OpenMPI](https://www.open-mpi.org/)
* [Horovod](https://horovod.readthedocs.io/)
* [mxnet](https://mxnet.apache.org/)
* [Kubeflow](https://www.kubeflow.org/)
* [KubeGene](https://github.com/volcano-sh/kubegene)
* [Cromwell](https://cromwell.readthedocs.io/)
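In addition, Volcano has been adopted commercially as an infrastructure scheduling engine by multiple companies and organizations.
In addition, Volcano has been widely adopted by many enterprises and organizations in the fields of AI and big data. With powerful resource management, efficient job management, and rich scheduling policies (such as gang scheduling, heterogeneous device scheduling, and topology-aware scheduling), it effectively meets the complex demands of tasks such as distributed training and data analysis. Volcano also improves scheduling performance while preserving the flexibility and reliability of task scheduling, providing strong support for enterprises building efficient resource utilization systems.
## Future Outlook
Volcano will continue to expand its functional boundaries through community collaboration and technical innovation, becoming a leader in high-performance computing and cloud native batch scheduling.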


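@ -63,7 +63,7 @@ Volcano is a community-driven open source project committed to building a healthy, friendly, and
## Your First Contribution
We will help you contribute in different areas, such as handling issues, developing features, fixing critical bugs, and getting your code reviewed and merged. If you still have questions about the development process, visit the [Slack Channel](https://volcano-sh.slack.com) (to sign up, [click here](https://join.slack.com/t/volcano-sh/shared_invite/enQtNTU5NTU3NDU0MTc4LTgzZTQ2MzViNTFmNDg1ZGUyMzcwNjgxZGQ1ZDdhOGE3Mzg1Y2NkZjk1MDJlZTZhZWU5MDg2MWJhMzI3Mjg3ZTk))
We will help you contribute in different areas, such as handling issues, developing features, fixing critical bugs, and getting your code reviewed and merged. If you still have questions about the development process, visit the [Slack Channel](https://cloud-native.slack.com/archives/C011GJDQS0N) (to sign up, [click here](https://join.slack.com/t/volcano-sh/shared_invite/enQtNTU5NTU3NDU0MTc4LTgzZTQ2MzViNTFmNDg1ZGUyMzcwNjgxZGQ1ZDdhOGE3Mzg1Y2NkZjk1MDJlZTZhZWU5MDg2MWJhMzI3Mjg3ZTk))
or join our [mailing list](https://groups.google.com/forum/#!forum/volcano-sh).
### Find Something to Work On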


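@ -62,7 +62,7 @@ Volcano is a community-driven open source project committed to building a healthy, friendly, and
## Your First Contribution
We will help you contribute in different areas, such as handling issues, developing features, fixing critical bugs, and getting your code reviewed and merged. If you still have questions about the development process, visit the [Slack Channel](https://volcano-sh.slack.com) (to sign up, [click here](https://join.slack.com/t/volcano-sh/shared_invite/enQtNTU5NTU3NDU0MTc4LTgzZTQ2MzViNTFmNDg1ZGUyMzcwNjgxZGQ1ZDdhOGE3Mzg1Y2NkZjk1MDJlZTZhZWU5MDg2MWJhMzI3Mjg3ZTk))
We will help you contribute in different areas, such as handling issues, developing features, fixing critical bugs, and getting your code reviewed and merged. If you still have questions about the development process, visit the [Slack Channel](https://cloud-native.slack.com/archives/C011GJDQS0N) (to sign up, [click here](https://join.slack.com/t/volcano-sh/shared_invite/enQtNTU5NTU3NDU0MTc4LTgzZTQ2MzViNTFmNDg1ZGUyMzcwNjgxZGQ1ZDdhOGE3Mzg1Y2NkZjk1MDJlZTZhZWU5MDg2MWJhMzI3Mjg3ZTk))
or join our [mailing list](https://groups.google.com/forum/#!forum/volcano-sh).
### Find Something to Work On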


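@ -63,7 +63,7 @@ Volcano is a community-driven open source project committed to building a healthy, friendly, and
## Your First Contribution
We will help you contribute in different areas, such as handling issues, developing features, fixing critical bugs, and getting your code reviewed and merged. If you still have questions about the development process, visit the [Slack Channel](https://volcano-sh.slack.com) (to sign up, [click here](https://join.slack.com/t/volcano-sh/shared_invite/enQtNTU5NTU3NDU0MTc4LTgzZTQ2MzViNTFmNDg1ZGUyMzcwNjgxZGQ1ZDdhOGE3Mzg1Y2NkZjk1MDJlZTZhZWU5MDg2MWJhMzI3Mjg3ZTk))
We will help you contribute in different areas, such as handling issues, developing features, fixing critical bugs, and getting your code reviewed and merged. If you still have questions about the development process, visit the [Slack Channel](https://cloud-native.slack.com/archives/C011GJDQS0N) (to sign up, [click here](https://join.slack.com/t/volcano-sh/shared_invite/enQtNTU5NTU3NDU0MTc4LTgzZTQ2MzViNTFmNDg1ZGUyMzcwNjgxZGQ1ZDdhOGE3Mzg1Y2NkZjk1MDJlZTZhZWU5MDg2MWJhMzI3Mjg3ZTk))
or join our [mailing list](https://groups.google.com/forum/#!forum/volcano-sh).
### Find Something to Work On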


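@ -63,7 +63,7 @@ Volcano is a community-driven open source project committed to building a healthy, friendly, and
## Your First Contribution
We will help you contribute in different areas, such as handling issues, developing features, fixing critical bugs, and getting your code reviewed and merged. If you still have questions about the development process, visit the [Slack Channel](https://volcano-sh.slack.com) (to sign up, [click here](https://join.slack.com/t/volcano-sh/shared_invite/enQtNTU5NTU3NDU0MTc4LTgzZTQ2MzViNTFmNDg1ZGUyMzcwNjgxZGQ1ZDdhOGE3Mzg1Y2NkZjk1MDJlZTZhZWU5MDg2MWJhMzI3Mjg3ZTk))
We will help you contribute in different areas, such as handling issues, developing features, fixing critical bugs, and getting your code reviewed and merged. If you still have questions about the development process, visit the [Slack Channel](https://cloud-native.slack.com/archives/C011GJDQS0N) (to sign up, [click here](https://join.slack.com/t/volcano-sh/shared_invite/enQtNTU5NTU3NDU0MTc4LTgzZTQ2MzViNTFmNDg1ZGUyMzcwNjgxZGQ1ZDdhOGE3Mzg1Y2NkZjk1MDJlZTZhZWU5MDg2MWJhMzI3Mjg3ZTk))
or join our [mailing list](https://groups.google.com/forum/#!forum/volcano-sh).
### Find Something to Work On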


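@ -62,7 +62,7 @@ Volcano is a community-driven open source project committed to building a healthy, friendly, and
## Your First Contribution
We will help you contribute in different areas, such as handling issues, developing features, fixing critical bugs, and getting your code reviewed and merged. If you still have questions about the development process, visit the [Slack Channel](https://volcano-sh.slack.com) (to sign up, [click here](https://join.slack.com/t/volcano-sh/shared_invite/enQtNTU5NTU3NDU0MTc4LTgzZTQ2MzViNTFmNDg1ZGUyMzcwNjgxZGQ1ZDdhOGE3Mzg1Y2NkZjk1MDJlZTZhZWU5MDg2MWJhMzI3Mjg3ZTk))
We will help you contribute in different areas, such as handling issues, developing features, fixing critical bugs, and getting your code reviewed and merged. If you still have questions about the development process, visit the [Slack Channel](https://cloud-native.slack.com/archives/C011GJDQS0N) (to sign up, [click here](https://join.slack.com/t/volcano-sh/shared_invite/enQtNTU5NTU3NDU0MTc4LTgzZTQ2MzViNTFmNDg1ZGUyMzcwNjgxZGQ1ZDdhOGE3Mzg1Y2NkZjk1MDJlZTZhZWU5MDg2MWJhMzI3Mjg3ZTk))
or join our [mailing list](https://groups.google.com/forum/#!forum/volcano-sh).
### Find Something to Work On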


@ -10,7 +10,7 @@ weight = 1
# Slide interval.
# Use `false` to disable animation or enter a time in ms, e.g. `5000` (5s).
interval = 6000
interval = 5000
# Minimum slide height.
# Specify a height to ensure a consistent height for each slide.
@ -40,18 +40,15 @@ height = "500px"
cta_icon = "graduation-cap"
[[item]]
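title = "High-Performance Scheduling"
content = "Converts domain-specific jobs into Kubernetes workloads and schedules them with excellent performance"
title = "Network Topology Aware Scheduling"
content = "Supports network topology aware scheduling, significantly reducing application communication overhead across nodes and greatly improving model training efficiency in distributed AI training scenarios"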
align = "left"
#overlay_color = "#555" # An HTML color value.
# overlay_img = "headers/volcano-slide-2.png" # Image path relative to your `static/img/` folder.
# #overlay_img = "headers/header-edge-2.jpg" # Image path relative to your `static/img/` folder.
overlay_filter = 0.15 # Darken the image. Value in range 0-1.
overlay_filter = 0.15
[[item]]
title = "多种调度策略"
content = "Co-scheduling, Fair-Share, Gang scheduling, Topologies, Reserve/BackFill, Data-aware Scheduling等"
title = "在离线混部"
content = "支持在线和离线业务混合部署通过统一调度动态资源超卖CPU Burst资源隔离等能力提升资源利用率的同时保障在线业务QoS"
align = "left"
# overlay_color = "#333" # An HTML color value.
@ -59,31 +56,77 @@ height = "500px"
overlay_filter = 0.20 # Darken the image. Value in range 0-1.
[[item]]
title = "增强的Job管理"
content = "多模板Job管理"
title = "多集群作业调度"
content = "支持作业跨集群调度,实现更大规模的资源池管理和负载均衡"
content1 = "<a class=\"github-button\" href=\"https://github.com/volcano-sh/volcano-global\" data-size=\"large\" data-icon=\"octicon-star\" data-show-count=\"true\" aria-label=\"Star this on GitHub\">Star</a>"
align = "left"
overlay_filter = 0.25
# overlay_color = "#333" # An HTML color value.
# overlay_img = "headers/volcano-slide-2.png" # Image path relative to your `static/img/` folder.
overlay_filter = 0.25 # Darken the image. Value in range 0-1.
# Call to action button for multi-cluster scheduling
cta_label = "深入了解Volcano多集群调度"
cta_url = "https://github.com/volcano-sh/volcano-global"
cta_icon_pack = "fas"
cta_icon = "graduation-cap"
[[item]]
title = "多运行时支持"
content = "Singularity和GPU加速器"
title = "层级队列管理"
content = "支持多层级队列管理,实现更精细的资源配额控制"
align = "left"
overlay_filter = 0.3
[[item]]
title = "负载感知重调度"
content = "支持负载感知重调度,优化集群负载分布,提升系统稳定性"
content1 = "<a class=\"github-button\" href=\"https://github.com/volcano-sh/descheduler\" data-size=\"large\" data-icon=\"octicon-star\" data-show-count=\"true\" aria-label=\"Star this on GitHub\">Star</a>"
align = "left"
# overlay_color = "#333" # An HTML color value.
# overlay_img = "headers/volcano-slide-2.png" # Image path relative to your `static/img/` folder.
overlay_filter = 0.35 # Darken the image. Value in range 0-1.
# Call to action button for descheduling
cta_label = "深入了解Volcano重调度"
cta_url = "https://github.com/volcano-sh/descheduler"
cta_icon_pack = "fas"
cta_icon = "graduation-cap"
[[item]]
title = "丰富的监控手段"
content = "日志、监控指标和仪表盘等"
title = "多种调度策略"
content = "支持 Gang、Fair-Share、Binpack、DeviceShare、Capacity、Proportion、NUMA aware、Task Topology等多种调度策略优化资源利用效率"
align = "left"
# overlay_color = "#333" # An HTML color value.
# overlay_img = "headers/volcano-slide-2.png" # Image path relative to your `static/img/` folder.
overlay_filter = 0.45 # Darken the image. Value in range 0-1.
overlay_filter = 0.4 # Darken the image. Value in range 0-1.
[[item]]
title = "异构设备支持"
content = "支持 GPU、NPU 等多种异构设备的调度,释放硬件算力"
align = "left"
overlay_filter = 0.45
[[item]]
title = "高性能统一调度"
content = "支持Kubernetes原生负载调度同时通过VolcanoJob为Ray、PyTorch、TensorFlow、Spark、Flink等框架提供完整支持以绝佳性能实现作业统一调度。"
align = "left"
overlay_filter = 0.5
[[item]]
title = "可观测性"
content = "提供丰富的日志、监控指标和Dashboard等"
content1 = "<a class=\"github-button\" href=\"https://github.com/volcano-sh/dashboard\" data-size=\"large\" data-icon=\"octicon-star\" data-show-count=\"true\" aria-label=\"Star this on GitHub\">Star</a>"
align = "left"
# overlay_color = "#333" # An HTML color value.
# overlay_img = "headers/volcano-slide-2.png" # Image path relative to your `static/img/` folder.
overlay_filter = 0.55 # Darken the image. Value in range 0-1.
# Call to action button for observability
cta_label = "深入了解Volcano Dashboard"
cta_url = "https://github.com/volcano-sh/dashboard"
cta_icon_pack = "fas"
cta_icon = "graduation-cap"
+++

@ -69,12 +69,12 @@ weight = 4
description = "华为开发的全场景深度学习框架."
[[featured]]
img_src = "volcano_paddle.PNG"
img_width = "100px"
img_height = "60px"
name = "PaddlePaddle"
url = "https://www.paddlepaddle.org.cn/ "
description = "PaddlePaddle是一个由百度发起的工业实践衍生的开源深度学习平台."
img_src = "ray_logo.png"
img_width = "300px"
img_height = "80px"
name = "Ray"
url = "https://github.com/ray-project/ray"
description = "Ray是一个高性能分布式计算框架支持机器学习、深度学习和分布式应用程序。"
[[featured]]
img_src = "kubeflow.png"
@ -109,10 +109,10 @@ weight = 4
description = "一个真正的开源深度学习框架,适用于灵活的研究原型和生产."
[[featured]]
img_src = "kubegene_logo.png"
img_src = "volcano_paddle.PNG"
img_width = "100px"
img_height = "60px"
name = "KubeGene"
url = "https://github.com/kubegene/kubegene "
description = "KubeGene致力于简化便携式和可扩展的基因组测序过程."
name = "PaddlePaddle"
url = "https://www.paddlepaddle.org.cn/ "
description = "PaddlePaddle是一个由百度发起的工业实践衍生的开源深度学习平台."
+++

static/img/ray_logo.png: binary file, not shown (size: 96 KiB)