Parallel Job
This job type is used to execute several loads or jobs simultaneously. Parallel jobs come in two forms:
- Several jobs that are started at the same time and are executed simultaneously. By default, a job of any other type is executed in single mode, i.e., without another job running simultaneously; a job that is started later is queued. Note that data previews and component tests are always executed simultaneously.
- Loads or sub-jobs that are executed in parallel, or sequentially, as in a process chain.
The definition of the job is similar to that of the standard job, with an additional "parallel" flag for each load or sub-job that defines its synchronization behavior. A job of any other type (e.g., Groovy or switch) can be parallelized by creating a new parallel job and adding the other job to it. Note that parallel access to a Microsoft Access database is not possible due to restrictions of the Microsoft Access driver.
Example:
Job with three loads. An "X" denotes that the load is defined as parallel; a "-" denotes that it is not.
Load1 | Load2 | Load3 | Explanation |
- | - | - | Executed sequentially |
X | X | X | Executed in parallel |
X | X | - | Load1 and Load2 in parallel, and when both are finished, Load3 is started |
- | X | X | When Load1 is finished, Load2 and Load3 are executed in parallel |
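The four cases in the table above can be read as a grouping rule. The following Python sketch is only an illustration of that reading, not the product's implementation: the function name `execution_stages` is hypothetical, and it assumes that consecutive loads flagged as parallel form one stage that runs together, while every unflagged load is a stage of its own. Combinations the table does not list (for example X, -, X) follow from that assumption and are not confirmed by the documentation.

```python
from itertools import groupby


def execution_stages(loads):
    """Group (name, is_parallel) pairs into stages that run one after another.

    Assumption: consecutive loads flagged parallel share one stage;
    each unflagged load gets a stage of its own.
    """
    stages = []
    for is_parallel, run in groupby(loads, key=lambda load: load[1]):
        names = [name for name, _ in run]
        if is_parallel:
            stages.append(names)                       # these run together
        else:
            stages.extend([name] for name in names)    # one after another
    return stages


# Reproduce the four rows of the table.
for flags in ["---", "XXX", "XX-", "-XX"]:
    loads = [(f"Load{i}", flag == "X") for i, flag in enumerate(flags, start=1)]
    print(flags, execution_stages(loads))
```

Each inner list is one stage; a stage starts only after every load in the previous stage has finished.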
Parallel jobs or loads are useful when several jobs or loads process distinct data, such as multiple cube-type loads, each of which writes into a distinct data area (such as various cubes, or different areas within the same cube). If the parallelized loads use the same source system, the performance gain also depends on the parallelization capabilities of that source system.
Jobs or loads that are defined as parallel will use multiple CPU cores. However, the processing inside a single component (e.g., a complex transform) is always single-threaded. Memory consumption will also be higher if parallel jobs are used, because the memory required by the various jobs is needed at the same time. By default, the number of parallel jobs that are executed at the same time is limited to 5; this limit can be changed via a configuration option. If the number of running parallel jobs reaches the configured limit, further parallel jobs are queued and started when execution slots become available.
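The slot-based queuing described above behaves much like a bounded worker pool. The sketch below is a generic Python illustration of that idea using the standard library, not the product's scheduler; `MAX_PARALLEL`, `run_jobs`, and the dummy job are hypothetical names, and the value 5 simply mirrors the documented default.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL = 5  # mirrors the documented default; the name is illustrative


def run_jobs(job_count, max_parallel=MAX_PARALLEL, duration=0.02):
    """Run job_count dummy jobs and return the peak number running at once."""
    lock = threading.Lock()
    running = 0
    peak = 0

    def job():
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(duration)  # stand-in for real work
        with lock:
            running -= 1

    # The pool starts at most max_parallel jobs; the rest wait in its queue
    # until a slot becomes free.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = [pool.submit(job) for _ in range(job_count)]
        for future in futures:
            future.result()
    return peak


print(run_jobs(12))  # peak concurrency never exceeds MAX_PARALLEL
```

Submitting twelve jobs does not raise the peak above the limit; the extra jobs simply wait, which is also why raising the limit increases peak memory use.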
If a parallel job (i.e., a job of type "parallel") and a single job (i.e., a job of any other type) are queued, the parallel job will be executed first, independent of the order in which the jobs have been started. If several single jobs or several parallel jobs are queued, the starting order is respected. Parallel jobs can also be nested: if a parallel job P is included in a single job S, the sub-jobs of P are executed in parallel. However, if another parallel job is started independently while S is running, it is queued until S has finished.
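The ordering rule for queued jobs can be expressed as a stable sort. This is a minimal Python sketch of the rule as stated, not the product's queue; the function `start_order` and the job names are made up for illustration.

```python
def start_order(queued):
    """Return queued jobs in the order they would be started.

    queued: list of (name, job_type) tuples in the order the jobs were started.
    """
    # Parallel jobs sort first (key False < True). sorted() is stable, so
    # jobs of the same kind keep their original starting order.
    return sorted(queued, key=lambda job: job[1] != "parallel")


queue = [("S1", "standard"), ("P1", "parallel"), ("S2", "groovy"), ("P2", "parallel")]
print([name for name, _ in start_order(queue)])  # ['P1', 'P2', 'S1', 'S2']
```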
During the execution of a parallel job, the Integrator Monitor shows complex parallel jobs as a nested structure, which makes their execution easier to trace. The blue lines in the Integrator Monitor give an overview of the parent/child relationships between parallel job executions.
Fail on status
If the job executes several loads or sub-jobs, the selected option defines the behavior in case of a warning or an error message in one of the loads or sub-jobs. The options are described below.
none | All subsequent loads or sub-jobs are executed even if errors or warnings occur. The job terminates with the status "Completed successfully", "Completed with warnings", or "Completed with errors". |
error | In case of an error message, the job terminates with the status "Failed" without executing subsequent loads or sub-jobs. In case of warnings, subsequent loads or sub-jobs are executed and the job terminates with the status "Completed with warnings". |
warning | In case of a warning or an error message, the job terminates with the status "Failed" without executing subsequent loads or sub-jobs. |
inherit | If the job is executed directly (without a parent job), it uses the failOnStatus option "error". If the job is used as a sub-job, it inherits the failOnStatus option of its parent job (see the descriptions of these options above). |
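The four options can be summarized as a severity threshold. The following Python sketch models a job whose loads run one after another; it is an illustration of the table above, not the product's code. The names `resolve_fail_on_status` and `run_job` are hypothetical, and how loads that are already running in parallel react to a failure is not modeled.

```python
SEVERITY = {"ok": 0, "warning": 1, "error": 2}
# Lowest severity that makes the job fail; "none" never fails.
THRESHOLD = {"none": 3, "error": 2, "warning": 1}
FINAL_STATUS = [
    "Completed successfully",
    "Completed with warnings",
    "Completed with errors",
]


def resolve_fail_on_status(own, parent=None):
    """Resolve "inherit": use the parent's option, or "error" without a parent."""
    if own != "inherit":
        return own
    return "error" if parent is None else parent


def run_job(results, fail_on_status="error", parent=None):
    """Simulate sequential loads; results lists each load's outcome in order.

    Returns (final_status, number_of_loads_executed).
    """
    threshold = THRESHOLD[resolve_fail_on_status(fail_on_status, parent)]
    worst = 0
    executed = 0
    for result in results:
        executed += 1
        worst = max(worst, SEVERITY[result])
        if worst >= threshold:
            return "Failed", executed  # remaining loads are skipped
    return FINAL_STATUS[worst], executed


print(run_job(["ok", "warning", "ok"], "error"))    # all three loads run
print(run_job(["ok", "warning", "ok"], "warning"))  # stops at the second load
```

In this model, `parent` is the already resolved option of the parent job, so a sub-job set to "inherit" under a parent set to "warning" fails on its first warning.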
Notes
- Suppose that another parallel execution is already running when a new parallel execution starts. In that case, the new one will only have one parallel subexecution, i.e., no effective parallelism happens inside the execution. As a best practice, run only one execution of a parallel job with multiple parallel subexecutions at any given moment. For this "interior parallelization", the limit maxParallelSubExecutions (default: 5) applies.
- Other parallel jobs that do not have parallel subexecutions can be efficiently executed simultaneously. For this "exterior parallelization", the limit maxParallelWriteExecutions (default: 5) applies.
Updated November 4, 2024