Image recognition performance is an important quality attribute of the system.
The relevant schema diagram can be found here: Database Schema Diagram
When performing image recognition, three entities (tables) are involved in processing WTC files: TrackerFile, TrackerFilePage, and PublicationIssuePage. Each WTC file can hold many images, so one tracker file may hold images from multiple pages. The pages (images) that a tracker file recognizes are listed in TrackerFilePage. These listings may be redundant: the same page can appear under more than one tracker file. Below is an example with multiple tracker files:
Tracker File
Tracker File ID | Is Most Recent | Tracker File Data |
---|---|---|
1 | False | Binary String |
2 | False | Binary String |
3 | False | Binary String |
4 | True | Binary String |
Publication Issue Page
Page ID | Date Created |
---|---|
1 | 1 Year ago |
2 | 1 Year ago |
3 | 1 Month ago |
4 | 1 Day ago |
5 | 1 Day ago |
Tracker File Page
Tracker File ID | Page ID |
---|---|
1 | 1 |
1 | 2 |
2 | 3 |
3 | 4 |
3 | 5 |
4 | 4 |
4 | 5 |
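In the above example, every page is already covered by a tracker file (Tracker File IDs 1 through 3), but an additional tracker file (Tracker File ID 4) has been created for the recently created pages (Page IDs 4 and 5). Pages 4 and 5 therefore appear in both Tracker File 3 and Tracker File 4.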
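The relationships in the example tables can be checked with a short Python sketch. It is only an illustration: the rows are copied from the tables above into in-memory structures, and the function names (`files_by_page`, `redundant_pages`, `pages_in_most_recent`) are hypothetical helpers, not part of the actual schema or codebase.

```python
from collections import defaultdict

# Rows copied from the example tables above (in-memory stand-ins, not real queries).
# Tracker File ID -> Is Most Recent
tracker_files = {1: False, 2: False, 3: False, 4: True}
# (Tracker File ID, Page ID) rows from Tracker File Page
tracker_file_pages = [(1, 1), (1, 2), (2, 3), (3, 4), (3, 5), (4, 4), (4, 5)]


def files_by_page(rows):
    """Map each Page ID to the sorted list of Tracker File IDs that contain it."""
    index = defaultdict(list)
    for file_id, page_id in rows:
        index[page_id].append(file_id)
    return {page: sorted(files) for page, files in index.items()}


def redundant_pages(rows):
    """Return only the pages that are listed under more than one tracker file."""
    return {page: files for page, files in files_by_page(rows).items() if len(files) > 1}


def pages_in_most_recent(files, rows):
    """Return the sorted Page IDs held by tracker files flagged as most recent."""
    return sorted({page_id for file_id, page_id in rows if files[file_id]})


print(redundant_pages(tracker_file_pages))                      # {4: [3, 4], 5: [3, 4]}
print(pages_in_most_recent(tracker_files, tracker_file_pages))  # [4, 5]
```

With the example data, pages 4 and 5 are the only redundant pages, and they are also the only pages held by the tracker file flagged as most recent.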
The most recent tracker files should be compiled together and marked "IsRecent" (shown as Is Most Recent in the example above). This allows image recognition tracker files to contain duplicated images. The job that organizes these tracker files should run one hour after the last image recognition file is uploaded to the Publisher Portal. For example, if someone uploads one file and then another 20 minutes later, the job should wait until an hour after the second upload.
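The timing rule for the job is a quiet-period (debounce) check: each new upload resets the one-hour wait. A minimal sketch follows, assuming the upload timestamps are available as a list of `datetime` values. The names `QUIET_PERIOD`, `next_run_time`, and `should_run` are hypothetical, and how the timestamps are obtained from the Publisher Portal is not specified here.

```python
from datetime import datetime, timedelta

# Assumed quiet period, taken from the one-hour rule described above.
QUIET_PERIOD = timedelta(hours=1)


def next_run_time(upload_times, quiet_period=QUIET_PERIOD):
    """Return when the job becomes due: one quiet period after the latest upload.

    Returns None when there are no uploads to process.
    """
    if not upload_times:
        return None
    return max(upload_times) + quiet_period


def should_run(upload_times, now, quiet_period=QUIET_PERIOD):
    """Return True once a full quiet period has passed since the latest upload."""
    due = next_run_time(upload_times, quiet_period)
    return due is not None and now >= due


# Two uploads 20 minutes apart: the wait is measured from the second one.
uploads = [datetime(2024, 1, 1, 12, 0), datetime(2024, 1, 1, 12, 20)]
print(next_run_time(uploads))                           # 2024-01-01 13:20:00
print(should_run(uploads, datetime(2024, 1, 1, 13, 0)))  # False
print(should_run(uploads, datetime(2024, 1, 1, 13, 20)))  # True
```

A scheduler could call `should_run` periodically, or schedule a single run at `next_run_time` and reschedule whenever a new upload arrives. Either approach gives the same behavior as the rule above.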