Image recognition performance is an important quality attribute of the system.

Overview

The relevant schema diagram can be found here: Database Schema Diagram

When performing image recognition, three entities (tables) are important for processing WTC files: TrackerFile, TrackerFilePage, and PublicationIssuePage. Each WTC file can hold many images, and one tracker file may hold images from multiple pages. The TrackerFilePage table lists the pages (images) that each tracker file recognizes. These lists may overlap, so the same page can appear in more than one tracker file. Below is an example of multiple tracker files:

Tracker File

Tracker File ID | Is Most Recent | Tracker File Data
1 | False | Binary String
2 | False | Binary String
3 | False | Binary String
4 | True | Binary String

Publication Issue Page

Page ID | Date Created
1 | 1 year ago
2 | 1 year ago
3 | 1 month ago
4 | 1 day ago
5 | 1 day ago

Tracker File Page

Tracker File ID | Page ID
1 | 1
1 | 2
2 | 3
3 | 4
3 | 5
4 | 4
4 | 5

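In the above example, tracker files 1 to 3 together cover every page (pages 1 to 5). Tracker file 4 is an additional file for the recently created pages 4 and 5, and it is the only one marked as most recent. Pages 4 and 5 therefore appear in both tracker file 3 and tracker file 4.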
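A minimal in-memory sketch of the example above, assuming plain Python dataclasses stand in for the database rows (the Tracker File Data column is left out). The class and function names here are hypothetical and are not taken from the actual schema. It shows how the TrackerFilePage links reveal which pages are recognized by more than one tracker file, and which pages the most recent tracker file covers.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackerFile:
    # Hypothetical stand-in for a TrackerFile row (binary data omitted).
    tracker_file_id: int
    is_most_recent: bool


@dataclass(frozen=True)
class TrackerFilePage:
    # Hypothetical stand-in for a TrackerFilePage link row.
    tracker_file_id: int
    page_id: int


# Example rows copied from the Tracker File and Tracker File Page tables.
tracker_files = [
    TrackerFile(1, False),
    TrackerFile(2, False),
    TrackerFile(3, False),
    TrackerFile(4, True),
]
tracker_file_pages = [
    TrackerFilePage(1, 1), TrackerFilePage(1, 2),
    TrackerFilePage(2, 3),
    TrackerFilePage(3, 4), TrackerFilePage(3, 5),
    TrackerFilePage(4, 4), TrackerFilePage(4, 5),
]


def tracker_files_by_page(links):
    """Map each page ID to the sorted tracker file IDs that recognize it."""
    result = defaultdict(list)
    for link in links:
        result[link.page_id].append(link.tracker_file_id)
    return {page: sorted(ids) for page, ids in result.items()}


def duplicated_pages(links):
    """Return only the pages listed under more than one tracker file."""
    return {page: ids for page, ids in tracker_files_by_page(links).items() if len(ids) > 1}


def most_recent_pages(files, links):
    """Return the sorted page IDs covered by tracker files flagged as most recent."""
    recent_ids = {f.tracker_file_id for f in files if f.is_most_recent}
    return sorted({link.page_id for link in links if link.tracker_file_id in recent_ids})


print(duplicated_pages(tracker_file_pages))                 # {4: [3, 4], 5: [3, 4]}
print(most_recent_pages(tracker_files, tracker_file_pages))  # [4, 5]
```

The overlap is expected rather than an error: pages 4 and 5 keep their entry in tracker file 3 while also being compiled into the most recent tracker file 4.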


The most recent tracker files should be compiled together and marked "IsRecent". This means the same image can appear in more than one tracker file. The job that organizes these tracker files should run one hour after the last image recognition file is uploaded to the Publisher Portal. For example, if one file is uploaded and a second is uploaded 20 minutes later, the job should wait until one hour after the second upload (80 minutes after the first).
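The one-hour rule behaves like a debounce: every new upload pushes the job's start time back, so only the latest upload matters. The sketch below assumes the job runner can record upload timestamps (all in the same time zone) and periodically poll whether the job is due. `TrackerJobScheduler` and its methods are hypothetical names, not part of the Publisher Portal.

```python
from datetime import datetime, timedelta
from typing import Optional


class TrackerJobScheduler:
    """Hypothetical debouncer for the tracker-file job.

    The job becomes due only once `quiet_period` has passed since the
    most recent image recognition upload.
    """

    def __init__(self, quiet_period: timedelta = timedelta(hours=1)) -> None:
        self.quiet_period = quiet_period
        self.last_upload: Optional[datetime] = None

    def record_upload(self, uploaded_at: datetime) -> None:
        # Keep only the latest upload; each new upload pushes the run back.
        if self.last_upload is None or uploaded_at > self.last_upload:
            self.last_upload = uploaded_at

    def scheduled_run(self) -> Optional[datetime]:
        # No pending upload means there is nothing to schedule.
        if self.last_upload is None:
            return None
        return self.last_upload + self.quiet_period

    def is_due(self, now: datetime) -> bool:
        run_at = self.scheduled_run()
        return run_at is not None and now >= run_at

    def mark_ran(self) -> None:
        # Clear the pending upload once the job has run.
        self.last_upload = None


# The example from the text: two uploads 20 minutes apart (arbitrary date).
scheduler = TrackerJobScheduler()
first = datetime(2024, 1, 1, 9, 0)
scheduler.record_upload(first)
scheduler.record_upload(first + timedelta(minutes=20))

print(scheduler.scheduled_run())                       # 2024-01-01 10:20:00
print(scheduler.is_due(first + timedelta(hours=1)))     # False
print(scheduler.is_due(first + timedelta(minutes=80)))  # True
```

Storing only the latest upload time keeps the state to a single timestamp. A production job would also need to handle an upload that arrives while the job is already running, which this sketch does not cover.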