Universal Redaction
The NiFi Ingest package includes an example flow definition for universal redaction (Universal_Redaction.json).
Universal redaction accepts Knowledge Discovery document FlowFiles and redacts personally identifiable information (PII) in both image and document file formats. The output from the dataflow is a redacted PDF of the original file.
For example, when the input is a Microsoft Word document, the dataflow uses a KeyViewExportToHtml processor to replace the Word document with an HTML representation of the content. The HTML is then rendered by a RenderHTML processor and is replaced by an image. Each word in the text (with its location in the rendered image) is added to the document metadata.
Alternatively, when the input is an image, Optical Character Recognition is used to extract the words and their locations.
Following these steps, regardless of the input file type, the FlowFile contains an image and metadata about the words in the image. Next, an Eduction (Named Entity Recognition) processor locates any PII in the text. A redacted copy of the text is added to the document metadata. The image is redacted by a Python script, executed using an ExecuteDocumentPython processor. Then, another Python script generates the final PDF document.
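The link between PII matches in the text and regions of the image can be illustrated with a minimal sketch. It is not the script shipped with the flow: the `Word` class, the `locate_words` and `redact` functions, the `[REDACTED]` mask, and the assumption that words are joined with single spaces are all hypothetical. The character spans in `pii_spans` stand in for whatever matches the Eduction processor reports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One extracted word and its bounding box in the rendered image (hypothetical)."""
    text: str
    left: int
    top: int
    width: int
    height: int


def locate_words(words):
    """Join words with single spaces and record each word's character span."""
    spans, parts, offset = [], [], 0
    for word in words:
        spans.append((offset, offset + len(word.text)))
        parts.append(word.text)
        offset += len(word.text) + 1  # +1 for the joining space
    return " ".join(parts), spans


def redact(words, pii_spans, mask="[REDACTED]"):
    """Return (redacted_text, boxes) for the words that overlap any PII span.

    pii_spans holds (start, end) character offsets into the joined text.
    boxes lists (left, top, width, height) regions to black out in the image.
    """
    _, spans = locate_words(words)
    redacted_words, boxes = [], []
    for word, (start, end) in zip(words, spans):
        # Two half-open ranges overlap when each starts before the other ends.
        hit = any(start < p_end and p_start < end for p_start, p_end in pii_spans)
        if hit:
            boxes.append((word.left, word.top, word.width, word.height))
            redacted_words.append(mask)
        else:
            redacted_words.append(word.text)
    return " ".join(redacted_words), boxes


if __name__ == "__main__":
    sample = [
        Word("Call", 10, 10, 40, 12),
        Word("Jane", 55, 10, 42, 12),
        Word("Doe", 102, 10, 36, 12),
        Word("today", 143, 10, 50, 12),
    ]
    # Assume a name was matched at characters 5-13 of "Call Jane Doe today".
    print(redact(sample, [(5, 13)]))
```

Keeping the word locations alongside the text means a single set of matches can drive both the redacted copy of the text and the boxes drawn over the image.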
Universal redaction has the following outputs:
- Success - Processing was successful. The file in the input document FlowFile was replaced by a redacted PDF.
- Failure - Processing failed.
- Unprocessed - The file type is not supported. For example, a document cannot be processed if it cannot be exported by the File Content Extraction HTML Export SDK.
To deploy the universal redaction flow
- Ensure that you have installed all of the following components into the extensions folder of your NiFi instance:
  - idol-nifi-framework-api-nar
  - idol-nifi-framework-nar
  - idol-nifi-media-api-nar
  - idol-nifi-media-nar
- Add a process group containing the universal redaction flow.
  - In the NiFi web interface, drag the process group icon from the components toolbar to the canvas.
  - In the dialog box that opens, click the upload icon and select the universal redaction JSON flow definition.
  - Click ADD.
    A process group is added to the canvas, containing the dataflow described in the file.
- Obtain the ID of your IdolLicenseServiceImpl. Each NiFi instance can contain only one active IDOL License Service, so the universal redaction flow does not include one.
  - Right-click a blank area of the canvas and click Controller Services.
  - Find your IdolLicenseServiceImpl, open its menu, and click View Configuration.
    The Controller Service Details dialog box opens.
  - Click the Settings tab and copy the service ID.
- Provide the ID of the IDOL License Service to the universal redaction flow, to license all of the controller services and processors that it contains.
  - Right-click the Universal Redaction process group and click Parameters.
    The Edit Parameter Context dialog box opens.
  - On the Parameters tab, find the Idol License Service parameter, open its menu, and click Edit.
    The Edit Parameter dialog box opens.
  - In the Value box, enter the ID of your IDOL License Service that you copied in the previous step.
  - Click Ok.
- Enable the universal redaction controller services.
  - Right-click the Universal Redaction process group and click Controller Services.
    The Controller Services page opens. You should see your IDOL License Service (in the "NiFi Flow" scope), and some new services (in the "Universal_Redaction" scope).
  - Enable each of the new services in turn (open the service's menu and click Enable). The Media Service must be enabled last because it depends on KeyView.
    - KeyViewExportServiceImpl
    - KeyViewFilterServiceImpl
    - MediaServiceImpl
- In the universal redaction process group, configure the Eduction processor and choose the entities that you want to redact.
- Connect the Universal Redaction process group to the rest of your dataflow: supply some input and connect the output relationships.
You can now start the components and begin redacting files.
TIP: To start all of the universal redaction components, select the process group and then click start in the Operation area.
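As an alternative to copying the license service ID from the web interface, the ID could be looked up from a controller-services listing. The following is a sketch under stated assumptions, not part of the NiFi Ingest package: the function names are hypothetical, the endpoint path and response shape are assumed to match the NiFi REST API of your version, and the service type is assumed to end with `IdolLicenseServiceImpl`. Authentication and TLS settings vary between installations, so `fetch_root_services` is likely to need adapting.

```python
import json
import urllib.request


def find_service_ids(listing, type_suffix="IdolLicenseServiceImpl"):
    """Return the IDs of controller services whose type ends with type_suffix.

    listing is the parsed JSON body of a controller-services listing, assumed
    to look like {"controllerServices": [{"id": ..., "component": {"type": ...}}]}.
    """
    ids = []
    for entity in listing.get("controllerServices", []):
        component = entity.get("component", {})
        if component.get("type", "").endswith(type_suffix):
            ids.append(entity.get("id") or component.get("id"))
    return ids


def fetch_root_services(base_url, token=None):
    """Fetch the services visible from the root process group (assumed endpoint)."""
    request = urllib.request.Request(
        f"{base_url}/nifi-api/flow/process-groups/root/controller-services"
    )
    if token:
        # Assumes bearer-token authentication; adjust for your installation.
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, `find_service_ids(fetch_root_services("https://localhost:8443", token))` would return a list of matching IDs, where the URL and token are placeholders. Because each NiFi instance can contain only one active IDOL License Service, a result with more than one entry suggests that the listing should be checked manually.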