DataEngine Workflow Designer
Caiyun Changtian DataEngine is a distributed, visual, and user-friendly open-source workflow scheduling system for big data. It is designed to manage the dependencies among tasks in complex data pipelines, and it provides a visual interface for managing those tasks and workflows.
- Visual Design: Easy Creation
- Distributed Execution: High Availability
- Rich APIs: Easy Integration
Its core features and advantages:
1. Visual Drag-and-Drop Workflow Design:
• This is its most prominent feature. Users define task nodes (such as Spark, Hive, Python, or Shell) by dragging and dropping them onto a canvas and set dependencies by drawing connecting lines between them, building complex data processing workflows as directed acyclic graphs (DAGs). This significantly lowers the barrier to creating and maintaining workflows.
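To make the DAG idea concrete, here is a minimal sketch of how a set of task nodes and their dependencies determines a valid run order. It is not DataEngine's internal representation: the task names and the `execution_order` helper are hypothetical, and the ordering relies on Python's standard `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical workflow: each key is a task node, each value is the set of
# upstream tasks it depends on (the connecting lines drawn in the designer).
workflow = {
    "extract_shell": set(),
    "clean_python": {"extract_shell"},
    "aggregate_hive": {"clean_python"},
    "train_spark": {"clean_python"},
    "report_shell": {"aggregate_hive", "train_spark"},
}


def execution_order(dag):
    """Return one valid run order, or raise ValueError if the graph has a cycle."""
    try:
        return list(TopologicalSorter(dag).static_order())
    except CycleError as exc:
        # exc.args[1] holds the list of nodes that form the detected cycle.
        raise ValueError(f"workflow is not a DAG: {exc.args[1]}") from exc


order = execution_order(workflow)
print(order)
```

Every task appears after all of its upstream tasks. A dependency loop, which would make the workflow impossible to schedule, is reported as an error instead of producing an order.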
2. High Availability and Distributed Architecture:
• It adopts a decentralized Master-Worker architecture that scales horizontally and can run multiple master and worker nodes. It is fault tolerant: the failure of a single node does not interrupt the overall service.
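One common way a scheduler tolerates a worker failure is heartbeat-based reassignment. The sketch below illustrates that general technique only. It is an assumption, not a description of how DataEngine detects failures, and `reassign_tasks`, the worker names, and the 30-second timeout are all hypothetical.

```python
def reassign_tasks(assignments, last_heartbeat, now, timeout=30.0):
    """Move tasks off workers whose last heartbeat is older than `timeout` seconds.

    assignments:    worker name -> list of task names currently assigned
    last_heartbeat: worker name -> timestamp (seconds) of the last heartbeat
    """
    alive = sorted(w for w, ts in last_heartbeat.items() if now - ts <= timeout)
    if not alive:
        raise RuntimeError("no live workers available")

    # Live workers keep the tasks they already have.
    result = {w: list(assignments.get(w, [])) for w in alive}

    # Collect tasks owned by workers that are no longer considered alive.
    orphaned = [
        task
        for worker, tasks in sorted(assignments.items())
        if worker not in result
        for task in tasks
    ]

    for task in orphaned:
        # Hand each orphaned task to the currently least-loaded live worker.
        target = min(alive, key=lambda w: (len(result[w]), w))
        result[target].append(task)
    return result


assignments = {
    "worker-1": ["t1", "t2"],
    "worker-2": ["t3"],
    "worker-3": ["t4", "t5"],
}
last_heartbeat = {"worker-1": 100.0, "worker-2": 95.0, "worker-3": 40.0}

new_assignments = reassign_tasks(assignments, last_heartbeat, now=110.0)
print(new_assignments)
```

In this example `worker-3` has been silent for 70 seconds, so its two tasks are spread across the two remaining workers and the workflow keeps running.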
3. Rich Support for Task Types:
• It supports many task types, including but not limited to:
  • Data Processing: Spark, Hive, MapReduce (MR), Flink, Python, Shell
  • Data Synchronization: Sqoop, DataX
  • Data Quality: Great Expectations, Deequ
  • Notifications: Email, SMS, and Webhook, among others
• It also supports sub-workflows and custom task types.
4. Powerful Operational Controls:
• Users can schedule workflows on a timer, start or stop them with one click, monitor them in real time, view run history, and rerun or recover failed tasks.
• It provides a Gantt chart view to clearly display task execution duration and dependency relationships.
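Recovering a failed workflow usually means rerunning the failed tasks and everything downstream of them while leaving successful upstream work alone. The following sketch shows that selection step under stated assumptions: the `tasks_to_rerun` helper, the task names, and the status strings are hypothetical, and DataEngine's actual recovery semantics may differ.

```python
from collections import deque


def tasks_to_rerun(dependencies, status):
    """Return the failed tasks plus every task downstream of a failed task.

    dependencies: task name -> set of upstream task names
    status:       task name -> status string (here "SUCCESS", "FAILED", "WAITING")
    """
    # Invert the dependency map so we can walk from a task to its children.
    downstream = {task: set() for task in dependencies}
    for task, upstreams in dependencies.items():
        for upstream in upstreams:
            downstream.setdefault(upstream, set()).add(task)

    # Breadth-first walk starting from every failed task.
    queue = deque(task for task, state in status.items() if state == "FAILED")
    rerun = set(queue)
    while queue:
        current = queue.popleft()
        for child in downstream.get(current, ()):
            if child not in rerun:
                rerun.add(child)
                queue.append(child)
    return rerun


dependencies = {
    "extract_shell": set(),
    "clean_python": {"extract_shell"},
    "aggregate_hive": {"clean_python"},
    "train_spark": {"clean_python"},
    "report_shell": {"aggregate_hive", "train_spark"},
}
status = {
    "extract_shell": "SUCCESS",
    "clean_python": "SUCCESS",
    "aggregate_hive": "FAILED",
    "train_spark": "SUCCESS",
    "report_shell": "WAITING",
}

rerun_set = tasks_to_rerun(dependencies, status)
print(sorted(rerun_set))
```

Only the failed Hive task and the report that depends on it are selected. The extraction, cleaning, and Spark steps already succeeded and are not repeated.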
5. Multi-Tenancy and Permission Management:
• It supports partitioning resource files by user and provides a comprehensive permission control system. It can integrate with internal corporate identity and permission systems (such as LDAP) to help ensure data security.
6. Rich APIs:
• It provides comprehensive RESTful APIs covering all core functionality, including workflow creation, task scheduling, status monitoring, and execution history queries. Detailed API documentation and usage examples are included, making it easy to integrate with third-party systems and to write custom automation scripts.
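As an illustration of what a custom automation script might look like, the sketch below builds (but does not send) an HTTP request that starts a workflow. Everything specific in it is an assumption made for illustration: the base URL, the `/workflows/{id}/start` path, the `workflowId` field, and the bearer-token header are hypothetical and are not taken from the DataEngine API documentation, which should be consulted for the real endpoints.

```python
import json
from urllib.request import Request


def build_start_request(base_url, workflow_id, token):
    """Build a POST request that would ask the scheduler to start a workflow.

    The URL path, JSON field name, and auth header are hypothetical placeholders.
    """
    payload = json.dumps({"workflowId": workflow_id}).encode("utf-8")
    return Request(
        url=f"{base_url.rstrip('/')}/workflows/{workflow_id}/start",
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


req = build_start_request("https://dataengine.example.com/api", 42, "demo-token")
print(req.get_method(), req.full_url)
```

Against a real deployment, the request could be sent with `urllib.request.urlopen(req)` once the placeholders are replaced with the documented endpoint and credentials.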