Retrieve Documents from a Repository
The following table lists the fetch actions that retrieve information from a repository.
| Action | Description | Method to override |
|---|---|---|
| action=fetch&fetchaction=synchronize | Sends ingest commands to the ingest target to bring it up to date with what is contained in the repository. | synchronize |
| action=fetch&fetchaction=synchronize&identifiers=... | Forces a synchronize of the documents listed by the identifiers action parameter, whether they have changed or not. Ingest deletes are sent to the ingest target if the documents have been deleted. | synchronizeIds |
| action=fetch&fetchaction=Collect | Retrieves the content and metadata of specified documents from the repository. | collect |
| action=View | Retrieves a single document from the repository. | view |
The synchronize action has already been demonstrated in A Complete Synchronize Action and Make an Incremental Synchronize Action.
The synchronizeIds, collect, and view methods are passed DocInfo objects. These contain no metadata or content, but each contains the identifier of a document to retrieve. Your connector must try to set the content and metadata for these documents from the repository. For an individual DocInfo (doc), indicate whether the document was retrieved successfully by performing the operations shown in the following table:
| Method | Success operation | Failure operation |
|---|---|---|
| synchronizeIds | doc.success(); | doc.failed(message); |
| collect | doc.success(); | doc.failed(message); |
| view | Return as normal | Throw an exception from the view method. |
You can throw exceptions for any fatal errors, such as network failures that cause the retrieval of all documents to fail, from any of the methods.
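The reporting rules above can be sketched as follows. This is a minimal illustration, not the SDK's implementation: MockDocInfo, MockRepository, and synchronizeIdsSketch are hypothetical stand-ins invented for this example, and only the success() and failed(message) calls mirror the operations in the table. Per-document problems are reported on the document, while a repository-wide failure is thrown so that the whole action fails.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the SDK's DocInfo: records the outcome reported for one document.
class MockDocInfo
{
public:
    enum class Status { Unreported, Succeeded, Failed };

    explicit MockDocInfo(std::string reference) : reference_(std::move(reference)) {}

    const std::string& reference() const { return reference_; }
    void success() { status_ = Status::Succeeded; }
    void failed(const std::string& message) { status_ = Status::Failed; message_ = message; }

    Status status() const { return status_; }
    const std::string& message() const { return message_; }

private:
    std::string reference_;
    Status status_ = Status::Unreported;
    std::string message_;
};

// Hypothetical repository: a set of references that can be retrieved.
struct MockRepository
{
    std::set<std::string> available;
    bool online = true;

    void fetch(const std::string& reference) const
    {
        if (available.count(reference) == 0)
        {
            throw std::runtime_error("Not found: " + reference);
        }
    }
};

// Reports an outcome for every document; a fatal error is thrown instead.
void synchronizeIdsSketch(const MockRepository& repository, std::vector<MockDocInfo>& documents)
{
    if (!repository.online)
    {
        // Fatal: no document can be retrieved, so fail the whole action.
        throw std::runtime_error("Repository unreachable");
    }
    for (MockDocInfo& doc : documents)
    {
        try
        {
            repository.fetch(doc.reference());
            doc.success();
        }
        catch (const std::exception& ex)
        {
            // A problem with one document must not stop the others.
            doc.failed(ex.what());
        }
        // Every document leaves the loop with an explicit outcome.
        assert(doc.status() != MockDocInfo::Status::Unreported);
    }
}
```

With one available and one missing reference, the first document ends up marked as succeeded and the second as failed with the exception text as its message.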
View and Collect Example
The collect fetch action and the view action might appear very similar, and they often share most of their implementation. However, there are some important differences:
- Collect is an asynchronous fetch action. It should be able to handle stop requests if the action is likely to take some time. View is a synchronous action, so it should be quick to execute.
- Collect retrieves the content (file or text) and metadata for multiple documents when provided with the document identifiers. View retrieves the content (file) for a single document when provided with the document's identifier; metadata might also be retrieved but is discarded later.
- Collect should handle any exception that might occur from an individual document so that remaining documents are still processed. View might throw any exception caused by the attempt to retrieve the single document.
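The first difference, handling stop requests in a long-running collect, can be sketched as below. How the framework actually signals a stop request is not described here, so the std::atomic<bool> flag, the collectOne callback, and the collectWithStopCheck function are all assumptions made for illustration. The idea is only that the loop checks for a stop request between documents rather than running to completion.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Processes identifiers in order, checking a (hypothetical) stop flag before each document.
// Returns the number of documents processed before finishing or stopping.
std::size_t collectWithStopCheck(
    const std::vector<std::string>& identifiers,
    const std::atomic<bool>& stopRequested,
    const std::function<void(const std::string&)>& collectOne)
{
    std::size_t processed = 0;
    for (const std::string& identifier : identifiers)
    {
        if (stopRequested.load())
        {
            break; // Stop promptly; the remaining documents are left untouched.
        }
        collectOne(identifier);
        ++processed;
    }
    assert(processed <= identifiers.size());
    return processed;
}
```

Checking between documents keeps the stop latency to roughly the time needed for one document; a connector that retrieves very large files might also need to check during a single retrieval.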
The following sample code shows how collect and view might be implemented for a basic connector, using the file system as a repository (like the connector introduced in Make an Incremental Synchronize Action):
    void collect(const CollectTask& task)
    {
        const DocInfoList& documents = task.documents();
        for (std::size_t ii = 0; ii < documents.size(); ++ii)
        {
            DocInfo docInfo = documents[ii];
            try
            {
                collectDocument(task, docInfo);
                docInfo.success();
            }
            catch (std::exception& ex)
            {
                // Report the failure on this document and continue with the rest.
                docInfo.failed(ex.what());
            }
        }
    }

    void view(const ViewTask& task)
    {
        // Any exception thrown here propagates out of the view method.
        collectDocument(task, task.document());
    }

    private:
    // Shared by collect and view: associates the file with the document,
    // or throws if the file does not exist.
    void collectDocument(const ConnectorTask& task, DocInfo& docInfo)
    {
        std::string filename = docInfo.id().reference();
        if (boost::filesystem::exists(filename))
        {
            docInfo.setFile(task.docInfoBuilder().createDocFile(filename, false));
        }
        else
        {
            throw std::runtime_error("File Not Found: " + filename);
        }
    }
The view action is provided with a single document, task.document(), while the collect action is provided with multiple documents, task.documents(). Each document must be populated from the repository. Call setFile on a document to associate a file (if there is one), and update the document with any metadata.
If the information is successfully retrieved from the repository and set on the document, call the success method to indicate that the document was handled successfully. If there is a problem, call the failed method with a description of the error that can be reported to the user. If collect calls neither the success nor the failed method for a document, failure is assumed and a warning message is written to the logs.
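The "failure is assumed" rule can be pictured with a small sketch. This does not show how the framework implements the rule internally; TrackedDoc, Outcome, finalizeOutcomes, and the warning text are hypothetical names chosen for illustration. It only demonstrates the behavior described above: a document with no reported outcome is treated as failed and produces a warning.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Outcome { Unreported, Succeeded, Failed };

// Hypothetical document record that remembers which outcome, if any, was reported.
struct TrackedDoc
{
    std::string reference;
    Outcome outcome = Outcome::Unreported;

    void success() { outcome = Outcome::Succeeded; }
    void failed(const std::string&) { outcome = Outcome::Failed; }
};

// Marks every unreported document as failed and returns one warning per such document.
std::vector<std::string> finalizeOutcomes(std::vector<TrackedDoc>& documents)
{
    std::vector<std::string> warnings;
    for (TrackedDoc& doc : documents)
    {
        if (doc.outcome == Outcome::Unreported)
        {
            doc.outcome = Outcome::Failed;
            warnings.push_back("No outcome reported for " + doc.reference + "; assuming failure");
        }
    }
    return warnings;
}
```

Calling success or failed explicitly for every document avoids these warnings and gives the user a meaningful error description instead of a generic one.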