Call Kettle (PDI) from action sequence (xaction)
The Pentaho BI Server is capable of Enterprise Information Integration (EII). It contains a Data Integration engine, enabling it to run ETL transformations and jobs. You can use an action sequence to call a transformation/job from a KTR/KJB file or PDI repository.
Before we start, you should be aware of the version of the PDI engine present in your BI Server. The KTR/KJB files still need to be created in PDI, so compatibility is important. For example, Pentaho 1.6 and 1.7 embed PDI 3.0, and Pentaho 2.0 embeds PDI 3.1.
Once you have a working transformation designed in PDI, open the Design Studio to create an action sequence (xaction).
- in "Process Actions", click "+" > "Get Data From" > "Pentaho Data Integration"
- enter the location of the transformation file (this can be relative, so you don't need to specify the path if the KTR/KJB file is in the same directory as the xaction)
- specify the "Transformation Step" in the KTR/KJB from which to pull data
- define an "Output Name"
- define inputs in the "Process Inputs"
- under "Transformation Inputs" in the "Pentaho Data Integration" process action, add the same inputs (they should be available in the "+" menu) in the order they will be used in the KTR/KJB
- open the KTR/KJB file in Kettle/PDI
- add a "Get System Info" step prior to the step that needs the variables (we'll assume you are using "Table Input")
- in there, add the inputs with a Type of "command line argument 1", "command line argument 2", etc., in the order in which they'll be used
- in the next step, use a question mark ("?") to refer to each input
- inputs are added to the data stream, and so are resolved in the order defined in the "Get System Info" step
- if the input is a date, use "date(?)" instead
- in the "Table Input" step, tick the "Replace variables in script?" box and enter the name of the "Get System Info" step in the "Insert data from step" text box
- in "Process Actions", click "+" > "Report" > "Pentaho Report"
- use the "Output Name" (defined above) in the "Report Data" field
- when creating the template in Report Designer, add an empty default Pentaho Dataset
- you will have to manually type in field names, as Report Designer doesn't know about the queries in Kettle
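The "?" placeholders in the "Table Input" step are positional: each one takes the next input in the order defined in the "Get System Info" step. The sketch below is only an analogy, not Kettle itself. It uses Python's built-in sqlite3 module, whose "?" placeholders are also bound by position, to show why that order matters. The orders table, its columns, and the run_query helper are hypothetical names made up for illustration, and "date(?)" here is SQLite's date function, standing in for whatever your database provides.

```python
import sqlite3

# Hypothetical in-memory table standing in for the database behind "Table Input".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, order_date TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "EMEA", "2008-01-15"),
        (2, "EMEA", "2008-03-02"),
        (3, "APAC", "2008-03-10"),
    ],
)

# Two positional placeholders: the first is a plain value, the second a date.
QUERY = (
    "SELECT id FROM orders "
    "WHERE region = ? AND order_date >= date(?) "
    "ORDER BY id"
)


def run_query(args):
    """Bind args to the '?' placeholders strictly by position and return the ids."""
    return [row[0] for row in conn.execute(QUERY, args)]


# Arguments supplied in the same order as the placeholders: region, then date.
matching = run_query(("EMEA", "2008-02-01"))

# Same values in the wrong order: the query still runs, but no row matches.
swapped = run_query(("2008-02-01", "EMEA"))

print(matching)
print(swapped)
```

The swapped call raises no error. It simply returns no rows, which is the kind of silent failure to look for when the order of "Transformation Inputs" in the xaction does not match the order of the "command line argument" entries in the transformation.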