Control Named Entity Recognition Processing Time
Named Entity Recognition matching is usually fast, but OpenText recommends that you set limits on processing so that your application can deal with a variety of input.
For example, for a large input text with a high density of matches, it can be time-consuming to retrieve all the matches. You must consider carefully whether your application requires all matches (which might number in the millions), or if it is enough to capture the first few hundred for any particular piece of input text.
You can control the number of matches to process by using the MaxMatchesPerDoc configuration parameter, which instructs a Named Entity Recognition session to stop searching for matches after a certain number of matches have been found. To stop searching for specific entities after a certain number of matches have been found, but continue searching for other entities, set EntityMatchLimitN.
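The difference between the two kinds of limit can be illustrated with a small, self-contained sketch. It does not use the Named Entity Recognition SDK: `limit_matches`, the `(entity, text)` match shape, and the sample data are all hypothetical, and the code only mimics the behavior described above (an overall cap stops the search, while a per-entity cap skips that entity and continues with the others).

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Optional, Tuple

Match = Tuple[str, str]  # (entity name, matched text); illustrative shape only


def limit_matches(
    matches: Iterable[Match],
    max_total: Optional[int] = None,
    per_entity: Optional[Dict[str, int]] = None,
) -> Iterator[Match]:
    """Yield matches while applying an overall cap and per-entity caps."""
    if max_total is not None and max_total <= 0:
        return
    per_entity = per_entity or {}
    seen: Counter = Counter()
    total = 0
    for entity, text in matches:
        # Per-entity cap: skip this entity but keep looking for others
        # (the idea behind EntityMatchLimitN).
        limit = per_entity.get(entity)
        if limit is not None and seen[entity] >= limit:
            continue
        seen[entity] += 1
        total += 1
        yield entity, text
        # Overall cap: stop searching entirely once it is reached
        # (the idea behind MaxMatchesPerDoc).
        if max_total is not None and total >= max_total:
            return


sample = [
    ("phone", "555-0100"),
    ("phone", "555-0101"),
    ("email", "a@example.com"),
    ("phone", "555-0102"),
    ("email", "b@example.com"),
]

capped = list(limit_matches(sample, per_entity={"phone": 1}))
first_two = list(limit_matches(sample, max_total=2))
```

In this sketch, `capped` keeps one phone match and both email matches, whereas `first_two` stops after the second match regardless of entity.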
The following configuration parameters also strongly affect the number of matches you might obtain:
To control the amount of time that Named Entity Recognition can spend processing data, you can set the RequestTimeout configuration parameter.
In the Named Entity Recognition C, .NET, and Java SDKs, you can use the following steps:
- Set the RequestTimeout configuration parameter. Alternatively:
  - For C, use EdkSessionSetRequestTimeoutPrecise on an individual session.
  - For .NET, use the ITextExtractionSession::SetRequestTimeoutPrecise method.
  - For Java, use the function TextExtractionSession::setRequestTimeoutPrecise to set a timeout for the session.
  In each case, the argument must be in milliseconds.
- Get the current time in epoch milliseconds, for example by using time() in C (which typically returns whole seconds rather than milliseconds), System.DateTimeOffset.Now.ToUnixTimeMilliseconds() in .NET, or System.currentTimeMillis() in Java.
- In the Named Entity Recognition SDK, send the time in epoch milliseconds to the session by using one of the following options:
  - For C, use EdkSessionSetStartTime, passing in the value you obtained from the previous step as the argument.
  - For .NET, use ITextExtractionSession::SetStartTimePrecise, passing in the epoch milliseconds value you obtained from the previous step as the argument.
  - For Java, use TextExtractionSession::setStartTime, passing in the epoch milliseconds value you obtained from the previous step as the argument.
- Obtain matches in the usual fashion, by calling EdkGetNextMatch in C, or by looping over the session object in .NET and Java.
  DEPRECATED: Do not use EdkGetNextMatchTimed in C, which is deprecated in Named Entity Recognition (Eduction) SDK version 12.8.0 and later.
- Check for timeouts in the match loop. You can do this by calling EdkGetMatchTimedOut in C, or TextExtractionSession::getTimedOut in Java, or ITextExtractionSession::getTimedOut in .NET. If a timeout has occurred, you can break out of the loop, as required for your application.
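The overall shape of these steps (set a timeout, record a start time, loop over matches, and check a timed-out flag) can be sketched in a language-neutral way. The following Python example is only an illustration of the pattern: `SketchSession`, its methods, and `collect_matches` are hypothetical stand-ins and are not part of the C, .NET, or Java SDKs. A scripted clock is injected so that the example behaves the same way on every run.

```python
import time
from typing import Callable, Iterable, Iterator, List, Optional


def now_ms() -> int:
    """Current time in whole milliseconds since the Unix epoch."""
    return int(time.time() * 1000)


class SketchSession:
    """Hypothetical stand-in for an SDK session; not the real API."""

    def __init__(self, matches: Iterable[str], clock: Callable[[], int] = now_ms):
        self._matches = list(matches)
        self._clock = clock
        self.timeout_ms: Optional[int] = None
        self.start_ms: Optional[int] = None
        self.timed_out = False

    def set_request_timeout_ms(self, timeout_ms: int) -> None:
        self.timeout_ms = timeout_ms

    def set_start_time_ms(self, start_ms: int) -> None:
        self.start_ms = start_ms

    def __iter__(self) -> Iterator[str]:
        for match in self._matches:
            # Update the flag before handing each match to the caller.
            if self.timeout_ms is not None and self.start_ms is not None:
                elapsed = self._clock() - self.start_ms
                self.timed_out = elapsed >= self.timeout_ms
            yield match


def collect_matches(session: SketchSession) -> List[str]:
    """Loop over matches and stop as soon as the session reports a timeout."""
    results: List[str] = []
    for match in session:
        if session.timed_out:
            break  # application-specific: here we keep earlier matches only
        results.append(match)
    return results


# Deterministic demo: a scripted clock instead of the real time.
ticks = iter([1010, 1020, 1060, 1070])
demo = SketchSession(["a", "b", "c", "d"], clock=lambda: next(ticks))
demo.set_request_timeout_ms(50)
demo.set_start_time_ms(1000)
found = collect_matches(demo)
```

Here the third clock reading is 60 ms after the start time, which exceeds the 50 ms limit, so the loop stops and `found` contains only the first two matches.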
In the Named Entity Recognition Python SDK, you can use the following steps:
- Set the RequestTimeout configuration parameter. Alternatively, assign to the .request_timeout property of the EdkSession instance. The value can be a datetime.timedelta instance, or a number of seconds. You can use fractional numbers to set sub-second timeout limits.
- Call the .start_request() method for the EdkSession to begin timing.
  Alternatively, if you want to use an overall application timeout, you can obtain the current time when your application begins processing, and then later set this as the EdkSession start time. You can do this by passing the start_time parameter when the session is created, or by assigning to the .start_time property for the session. The value must be a datetime.datetime instance, for example obtained by calling datetime.datetime.now(), or an epoch seconds value.
- Obtain matches in the usual fashion, by looping over the session instance.
- Check for timeouts by testing the .current_match_timed_out property of the session. A warning with category edk.error.EdkTimeOutWarning is raised when a match times out. You can use the Python warnings module to change how this warning is handled (for example, whether it is logged, or causes an exception).
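As an illustration of the last step, the following sketch shows how the standard Python warnings module can turn a warning category into an exception, or silence it, for a block of code. It does not import the SDK: `SketchTimeOutWarning` is a hypothetical stand-in for edk.error.EdkTimeOutWarning, and `matches_with_timeout` merely simulates a session that issues the warning partway through.

```python
import warnings
from typing import Iterator, List


class SketchTimeOutWarning(UserWarning):
    """Hypothetical stand-in for edk.error.EdkTimeOutWarning."""


def matches_with_timeout() -> Iterator[str]:
    """Yield two matches, then warn as if the third match had timed out."""
    yield "match-1"
    yield "match-2"
    warnings.warn("match timed out", SketchTimeOutWarning)
    yield "partial-match"


def collect_strict() -> List[str]:
    """Treat the timeout warning as an exception and stop at that point."""
    found: List[str] = []
    with warnings.catch_warnings():
        # Within this block, the warning category is raised as an exception.
        warnings.simplefilter("error", SketchTimeOutWarning)
        try:
            for match in matches_with_timeout():
                found.append(match)
        except SketchTimeOutWarning:
            pass  # keep whatever was collected before the timeout
    return found


def collect_quiet() -> List[str]:
    """Suppress the timeout warning and keep iterating."""
    found: List[str] = []
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", SketchTimeOutWarning)
        for match in matches_with_timeout():
            found.append(match)
    return found


strict_result = collect_strict()
quiet_result = collect_quiet()
```

Using `warnings.catch_warnings()` keeps the filter change local, so the rest of the application continues to handle warnings as before.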
You can find examples of timeout handling in the sample programs provided in the Named Entity Recognition SDK release package.
TIP: If your application does significant processing before you call Named Entity Recognition, and you want to use an overall application timeout, obtain the current time in epoch milliseconds at the very start of your application processing rather than waiting until just before you call the Named Entity Recognition functions.
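The effect of this tip can be shown with simple arithmetic. In the sketch below, `epoch_ms` and `remaining_ms` are hypothetical helper names, not SDK functions, and the timestamps are illustrative values chosen so that the result is easy to follow.

```python
import time


def epoch_ms() -> int:
    """Current time in whole milliseconds since the Unix epoch."""
    return time.time_ns() // 1_000_000


def remaining_ms(start_ms: int, timeout_ms: int, current_ms: int) -> int:
    """Milliseconds left in a budget that began at start_ms (never negative)."""
    return max(0, timeout_ms - (current_ms - start_ms))


# Capture the start time first, before any other application work.
app_start_ms = epoch_ms()

# Illustrative numbers: 200 ms of preprocessing against a 500 ms budget.
# Start time taken at the very beginning of application processing:
early_start = remaining_ms(start_ms=1_000, timeout_ms=500, current_ms=1_200)
# Start time taken only just before the matching calls:
late_start = remaining_ms(start_ms=1_200, timeout_ms=500, current_ms=1_200)
```

With the early start time, only 300 ms of the budget remain for matching, which reflects the time already spent. With the late start time, the full 500 ms are still available, so the application as a whole could run for 700 ms.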