Apache NiFi processors are the basic building blocks of a data flow. Each processor provides a specific function that contributes to the output flowfile. The data flow shown in the image below fetches a file from one directory using the GetFile processor and stores it in another directory using the PutFile processor.
The GetFile processor fetches files of a specific format from a specific directory. It also provides other options that give the user more control over fetching, which are discussed in the properties section below.
The GetFile processor has the following settings −
In the Name setting, a user can give the processor any name, either one that follows the project's conventions or one that makes the processor's purpose clearer.
A user can enable or disable the processor using this setting.
The Penalty Duration setting lets a user specify how long a flowfile is penalized in the event of a flowfile failure.
The Yield Duration setting specifies the yield time for the processor. During this period, the processor is not scheduled to run again.
The Bulletin Level setting specifies the log level at which the processor generates bulletins.
The Automatically Terminate Relationships setting lists every available relationship of the processor, each with a check box. By checking a box, a user configures the processor to terminate flowfiles routed to that relationship instead of sending them further along the flow.
The GetFile processor offers the following scheduling options −
You can schedule the processor either on a time basis, by selecting the Timer driven option, or with a specified CRON string, by selecting the CRON driven option.
This option defines the number of concurrent tasks that the processor can run.
Using this option, a user can define whether the processor runs on all nodes or only on the primary node.
This option defines the run interval for the Timer driven strategy or the CRON expression for the CRON driven strategy.
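To make the run interval more concrete, the sketch below converts a timer-style schedule string such as `5 sec` into a number of seconds. It is only an illustration: `run_schedule_to_seconds` and the small unit table are hypothetical helpers written for this example, not part of NiFi, and the set of units NiFi itself accepts may be wider than what is assumed here.

```python
import re

# Hypothetical unit table for this sketch; NiFi may accept more unit names.
_UNIT_SECONDS = {
    "sec": 1, "secs": 1,
    "min": 60, "mins": 60,
    "hr": 3600, "hrs": 3600,
}


def run_schedule_to_seconds(value: str) -> int:
    """Convert a string like '5 sec' or '2 mins' into whole seconds."""
    match = re.fullmatch(r"\s*(\d+)\s*([a-z]+)\s*", value.lower())
    if not match:
        raise ValueError(f"not a recognised duration: {value!r}")
    amount, unit = match.groups()
    if unit not in _UNIT_SECONDS:
        raise ValueError(f"unsupported unit in this sketch: {unit!r}")
    return int(amount) * _UNIT_SECONDS[unit]
```

For example, `run_schedule_to_seconds("2 mins")` returns `120`, while a value of `0 sec` corresponds to no delay between runs.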
GetFile offers multiple properties, as shown in the image below, ranging from required properties such as Input Directory and File Filter to optional properties such as Path Filter and Maximum File Size. A user can manage the file fetching process using these properties.
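The effect of these properties can be imitated outside NiFi with a few lines of Python. The sketch below is not GetFile's implementation; `select_files` and its parameters are hypothetical names chosen for this example. It assumes the file filter is a regular expression matched against the whole file name and that the maximum size is given in bytes.

```python
import re
from pathlib import Path


def select_files(input_dir, file_filter=r"[^.].*", max_size=None):
    """Return the files in input_dir that pass the name and size checks.

    file_filter: regular expression that must match the whole file name
                 (the default used here skips names starting with a dot).
    max_size:    optional upper bound in bytes; larger files are skipped.
    """
    pattern = re.compile(file_filter)
    selected = []
    for path in sorted(Path(input_dir).iterdir()):
        if not path.is_file():
            continue  # ignore subdirectories in this simple sketch
        if not pattern.fullmatch(path.name):
            continue  # name does not match the filter
        if max_size is not None and path.stat().st_size > max_size:
            continue  # file is larger than allowed
        selected.append(path)
    return selected
```

A call such as `select_files("/tmp/in", r".*\.csv", max_size=1_000_000)` would return only the CSV files of at most one million bytes; the directory path here is a placeholder.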
This section is used to record any notes about the processor.
The PutFile processor stores a file from the data flow in a specific location.
The PutFile processor has the following settings −
In the Name setting, a user can give the processor any name, either one that follows the project's conventions or one that makes the processor's purpose clearer.
A user can enable or disable the processor using this setting.
The Penalty Duration setting lets a user specify how long a flowfile is penalized in the event of a flowfile failure.
The Yield Duration setting specifies the yield time for the processor. During this period, the processor is not scheduled to run again.
The Bulletin Level setting specifies the log level at which the processor generates bulletins.
The Automatically Terminate Relationships setting lists every available relationship of the processor, each with a check box. By checking a box, a user configures the processor to terminate flowfiles routed to that relationship instead of sending them further along the flow.
The PutFile processor offers the following scheduling options −
You can schedule the processor either on a time basis, by selecting the Timer driven option, or with a specified CRON string, by selecting the CRON driven option. There is also an experimental Event driven strategy, which triggers the processor on a specific event.
This option defines the number of concurrent tasks that the processor can run.
Using this option, a user can define whether the processor runs on all nodes or only on the primary node.
This option defines the run interval for the Timer driven strategy or the CRON expression for the CRON driven strategy.
The PutFile processor provides properties such as Directory, which specifies the output directory for the file transfer, along with others that manage the transfer, as shown in the image below.
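As a rough illustration of what writing to an output directory involves, the following Python sketch copies a file into a target directory and decides what to do when a file of the same name already exists. It is an assumption-laden imitation, not PutFile's code: `put_file` and the `on_conflict` parameter are hypothetical names for this example, loosely modelled on the idea of a conflict resolution choice between failing, ignoring, and replacing.

```python
import shutil
from pathlib import Path

_CONFLICT_MODES = ("fail", "ignore", "replace")


def put_file(source, directory, on_conflict="fail"):
    """Copy source into directory and return the destination path.

    on_conflict: "fail" raises FileExistsError, "ignore" keeps the
                 existing file untouched, "replace" overwrites it.
    """
    if on_conflict not in _CONFLICT_MODES:
        raise ValueError(f"unknown conflict mode: {on_conflict!r}")
    target_dir = Path(directory)
    # Create the output directory (and parents) if it is missing.
    target_dir.mkdir(parents=True, exist_ok=True)
    destination = target_dir / Path(source).name
    if destination.exists():
        if on_conflict == "fail":
            raise FileExistsError(str(destination))
        if on_conflict == "ignore":
            return destination  # leave the existing file as it is
    shutil.copy2(source, destination)
    return destination
```

Calling `put_file("report.csv", "/tmp/out", on_conflict="replace")` would copy the file into the output directory and overwrite any earlier copy; both paths are placeholders.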
This section is used to record any notes about the processor.