FilterPagesLuaScript
The path to a Lua script that contains a custom function for deciding whether to ingest each page.
NOTE: This feature is available only if your Web Connector license includes dynamic corpus functionality.
The script must contain a function named shouldIngestPage that returns true to ingest the page or false to ignore it. The function can optionally return a second value, a list of links, to override the links that the connector extracted from the page. For example, to ingest the page without following any of its links, return true and an empty list.
The function should look like this:
function shouldIngestPage(url, contentType, contentFilename, textContentFilename, links, depth)
-- do something to decide return value...
-- to ingest the page and follow links extracted by the connector
return true
-- to ingest the page but not follow any links
return true, {}
-- to ignore the page
return false
end
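Lua permits a return statement only as the last statement in a block, so the three outcomes in the template are alternatives and cannot run top to bottom as written. The following is a minimal sketch of one way to combine them in a single function. The depth limit of 5 and the text/html check are arbitrary assumptions chosen for illustration, not connector defaults.

```lua
-- Illustrative policy only: MAX_DEPTH and the text/html check are
-- assumptions for this sketch, not connector defaults.
local MAX_DEPTH = 5

function shouldIngestPage(url, contentType, contentFilename, textContentFilename, links, depth)
    -- Ignore any page whose content type is not HTML (plain substring match).
    if contentType == nil or not string.find(contentType, "text/html", 1, true) then
        return false
    end

    -- At or beyond the depth limit, ingest the page but follow no links.
    if depth >= MAX_DEPTH then
        return true, {}
    end

    -- Otherwise ingest the page and follow the links the connector extracted.
    return true
end
```

Each branch ends in exactly one return, which is what makes the script loadable.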
The arguments supplied to the function are:
| Argument | Type | Description |
|---|---|---|
| url | string | The page URL. |
| contentType | string | The MIME content type. |
| contentFilename | string | The path to the file that contains the page content. |
| textContentFilename | string | The path to the file that contains the text that was extracted from the page (or nil if text could not be extracted). |
| links | list of strings | The links that were extracted from the page. |
| depth | integer | The page depth (the number of links that were followed from the starting point in order to reach the page). |
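As an illustration of how these arguments might be used together, the sketch below skips pages with little or no extracted text and returns a filtered list of links. It assumes that links arrives as an ordinary Lua array of strings. The helper fileSize, the 200-byte threshold, and the https://www.example.com/ prefix are hypothetical values introduced for this example only.

```lua
-- Hypothetical helper: size of a file in bytes, or 0 if the path is nil
-- or the file cannot be opened.
local function fileSize(path)
    if path == nil then return 0 end
    local f = io.open(path, "rb")
    if f == nil then return 0 end
    local size = f:seek("end")
    f:close()
    return size or 0
end

local MIN_TEXT_BYTES = 200                          -- assumed threshold
local ALLOWED_PREFIX = "https://www.example.com/"   -- placeholder prefix

function shouldIngestPage(url, contentType, contentFilename, textContentFilename, links, depth)
    -- textContentFilename is nil when no text could be extracted,
    -- in which case fileSize returns 0 and the page is ignored.
    if fileSize(textContentFilename) < MIN_TEXT_BYTES then
        return false
    end

    -- Keep only the links that start with the allowed prefix.
    local kept = {}
    for _, link in ipairs(links) do
        if string.sub(link, 1, #ALLOWED_PREFIX) == ALLOWED_PREFIX then
            kept[#kept + 1] = link
        end
    end

    -- Ingest the page and override the extracted links with the filtered list.
    return true, kept
end
```

Checking textContentFilename for nil before opening the file avoids an error on pages where text extraction failed.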
An example script, FilterPages_binarycat.lua, is included with the connector. This script decides whether to ingest a page by calling the IDOL Category component and running the action BinaryCatQuery.
| Type: | String (file path) |
| Default: | |
| Required: | No |
| Configuration Section: | TaskName or FetchTasks |
| Example: | FilterPagesLuaScript=MyCustomPageSelection.lua |