Using Triggers
Note: You can create triggers for batch data pipelines. You cannot create triggers for realtime pipelines.
In the Pipeline Studio, you can create a trigger on a batch data pipeline so that it runs when one or more other pipeline runs complete. The pipeline that the trigger runs is called the downstream pipeline, and the pipelines whose completion fires the trigger are called upstream pipelines. You create the trigger on the downstream pipeline so that it runs based on the completion of one or more upstream pipelines.
Triggers are useful when you want to:
Clean your data once and make it available to multiple downstream pipelines for consumption.
Share information such as runtime arguments and plugin configurations between pipelines. This is called payload configuration.
Have a set of dynamic pipelines that can run using the data of the hour/day/week/month, instead of a static pipeline that must be updated for every run.
For example, you have a dataset that contains all of the information about your company's shipments. You have several business questions that you want answered based on this data. So, you create one pipeline that cleanses the raw data about shipments, called Shipments Data Cleansing. Then you create a second pipeline, called Delayed Shipments USA, that reads the cleansed data and finds the shipments within the USA that were delayed by more than a specified threshold. The Delayed Shipments USA pipeline can be triggered as soon as the upstream Shipments Data Cleansing pipeline completes successfully.
Additionally, because the downstream pipeline consumes the output of the upstream pipeline, you want to specify that when the downstream pipeline runs using this trigger, it also receives the directory to read from (the directory where the upstream pipeline generated its output). This is called passing payload configuration, which you define with runtime arguments. It enables you to have a set of dynamic pipelines that can run using the data of the hour/day/week/month, instead of a static pipeline that must be updated for every run.
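The following sketch makes the idea concrete. It is only an illustration: the argument key, plugin property name, and paths are hypothetical, not part of the example pipelines above.

# Hypothetical illustration of payload configuration; names and paths are made up.
# Upstream ("Shipments Data Cleansing"): a runtime argument records where the
# cleansed output was written.
upstream_runtime_args = {
    "output.directory": "gs://my-bucket/shipments/cleansed/2024-05-01",
}

# Downstream ("Delayed Shipments USA"): the source plugin reads its input path
# through a ${...} macro. The trigger's payload configuration maps the upstream
# argument to the downstream runtime argument that this macro resolves at run time.
downstream_source_properties = {
    "path": "${output.directory}",
}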
Note: You can also use the Schedule Lifecycle Microservices to create inbound triggers.
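For instance, a trigger-based schedule can be created with an HTTP call to the Schedule Lifecycle Microservices. The following is a minimal sketch in Python; the endpoint path, port, pipeline names, and payload field names are assumptions based on common CDAP REST conventions, so verify them against the Schedule Lifecycle Microservices reference for your CDAP version.

import requests

# Minimal sketch: create an inbound trigger on the downstream pipeline through the
# Schedule Lifecycle Microservices. Host, port, pipeline names, and field names are
# assumptions; verify them against the API reference for your CDAP version.
CDAP = "http://localhost:11015/v3"   # assumed CDAP router endpoint
NAMESPACE = "default"

schedule = {
    "name": "runAfterShipmentsDataCleansing",       # hypothetical schedule name
    "description": "Run when the upstream pipeline succeeds",
    "program": {
        "programName": "DataPipelineWorkflow",      # workflow of the downstream batch pipeline
        "programType": "WORKFLOW",
    },
    "trigger": {
        "type": "PROGRAM_STATUS",
        "programId": {
            "namespace": NAMESPACE,
            "application": "Shipments_Data_Cleansing",   # upstream pipeline
            "type": "WORKFLOW",
            "program": "DataPipelineWorkflow",
        },
        "programStatuses": ["COMPLETED"],           # "Succeeds" in the Studio UI
    },
}

resp = requests.put(
    f"{CDAP}/namespaces/{NAMESPACE}/apps/Delayed_Shipments_USA/schedules/{schedule['name']}",
    json=schedule,
)
resp.raise_for_status()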
Before you begin
In the Pipeline Studio, deploy the pipelines that are your upstream and downstream pipelines.
Optional: Set runtime arguments for your upstream pipeline
If you want to pass payload configuration as runtime arguments, set the runtime arguments for your upstream pipeline (a REST-based alternative is sketched after these steps):
Go to the List page. In the Deployed tab, click the name of the upstream pipeline. The Deploy view for that pipeline appears.
Click the arrow to the right of the Run button.
Click the + button and fill in the Key and Value for your runtime argument.
Click Save.
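As an alternative to the Studio steps above, default runtime arguments for the upstream pipeline can also be set programmatically as application preferences. The sketch below assumes the standard CDAP Preferences endpoint; the host, namespace, pipeline name, and argument key and value are placeholders.

import requests

# Minimal sketch: set application-level preferences, which CDAP applies as default
# runtime arguments when the upstream pipeline runs. Host, namespace, app name, and
# the argument key/value are placeholders.
CDAP = "http://localhost:11015/v3"
NAMESPACE = "default"
UPSTREAM_APP = "Shipments_Data_Cleansing"

runtime_args = {
    "output.directory": "gs://my-bucket/shipments/cleansed",  # hypothetical key/value
}

resp = requests.put(
    f"{CDAP}/namespaces/{NAMESPACE}/apps/{UPSTREAM_APP}/preferences",
    json=runtime_args,
)
resp.raise_for_status()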
Create an inbound trigger on a downstream pipeline
Starting in CDAP 6.8.0, you can create OR and AND triggers. OR triggers run the downstream pipeline when the specified event (Succeeds, Stops, or Fails) occurs for any one of the upstream pipelines. AND triggers run the downstream pipeline only when the specified events occur for all of the upstream pipelines.
In CDAP 6.7.x and earlier, you can create only OR triggers.
When you upgrade to CDAP 6.8.x, all existing triggers are set as OR triggers.
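If you create triggers through the Schedule Lifecycle Microservices instead of the Studio, an AND trigger is expressed as a composite of individual program-status triggers. The shape below is a hedged sketch; the field names and pipeline names are assumptions to verify against the API reference for your CDAP version.

# Hedged sketch of the trigger portion of a schedule that fires only when both
# upstream pipelines succeed (an AND trigger). Field names and pipeline names are
# assumptions; an OR trigger would use "type": "OR" with the same structure.
and_trigger = {
    "type": "AND",
    "triggers": [
        {
            "type": "PROGRAM_STATUS",
            "programId": {
                "namespace": "default",
                "application": "Shipments_Data_Cleansing",   # hypothetical upstream pipeline
                "type": "WORKFLOW",
                "program": "DataPipelineWorkflow",
            },
            "programStatuses": ["COMPLETED"],
        },
        {
            "type": "PROGRAM_STATUS",
            "programId": {
                "namespace": "default",
                "application": "Shipments_Data_Enrichment",  # hypothetical second upstream pipeline
                "type": "WORKFLOW",
                "program": "DataPipelineWorkflow",
            },
            "programStatuses": ["COMPLETED"],
        },
    ],
}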
To create an inbound trigger on a downstream pipeline, follow these steps:
Deploy both the upstream and downstream pipelines.
Go to the List page. In the Deployed tab, click the name of the downstream pipeline. The Deploy view for that pipeline appears.