Trainable OCR has a Winner

Introduction

The TrainableOCR problem allows OCR algorithm developers to test their algorithms on all the datasets provided with a minimum of effort.  Alternatively, an algorithm customer with a particular problem may submit a dataset that captures the specific features of their intended application and subsequently view a league table showing which algorithms performed best on their data.  See the example log file for an illustration of how performance is currently reported.  There are plans to give developers greater feedback by showing the images that each algorithm misclassified.

Interface

OCR algorithm developers must provide code that implements the simple interface algoval.ocr.TrainableOCRInterface.

This is a simple interface that refers to just one other interface: algoval.image.dataset.ImageDataset.  IMPORTANT: as a compromise between efficiency and convenience, images are represented as 2D arrays of bytes (byte[][]).  Unless otherwise stated for a particular dataset, a black pixel is represented by 0 and a white pixel by -1, which is the two's-complement byte value for 255.  In retrospect it may have been better to define an image as a 2D array of double or float, with black as 0.0 and white as 1.0; we chose the more efficient but also more awkward option of the 2D array of byte.
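Because Java bytes are signed, implementors typically need to mask them before doing arithmetic.  The following is a minimal sketch (not part of the algoval API) of converting the byte[][] representation into double intensities in [0, 1], assuming the stated convention of 0 for black and -1 (unsigned 255) for white.

```java
// Minimal sketch: convert the signed byte[][] image representation into
// double intensities in [0, 1], assuming 0 = black and -1 (unsigned 255) = white.
public final class PixelConversion {
    public static double[][] toDoubles(byte[][] image) {
        double[][] out = new double[image.length][];
        for (int y = 0; y < image.length; y++) {
            out[y] = new double[image[y].length];
            for (int x = 0; x < image[y].length; x++) {
                // & 0xFF reinterprets the signed byte as an unsigned value 0..255
                out[y][x] = (image[y][x] & 0xFF) / 255.0;
            }
        }
        return out;
    }
}
```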

Note that ImageDataset is a minimal interface for a dataset of this nature.  It does not embody any concept of distinct training and test sets; that is seen as the job of a higher-level system.  Implementors need not be concerned with this, but for those interested, details of how the evaluation works are described here.

A minimal legal algorithm

As a starting point for your own submission, take a look at algoval.ocr.eval.SillyOcr.  This is 'silly' in the sense that it always returns the same answer.
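For orientation, here is a hedged sketch of what such a constant classifier might look like.  The method names train and classify are placeholders chosen for illustration only; they are not the actual signatures of TrainableOCRInterface, so consult the SillyOcr source for the real contract.

```java
// Illustrative sketch only: the method names below are assumptions for
// exposition, not the real algoval.ocr.TrainableOCRInterface methods.
public class ConstantOcr {
    // A legal but useless classifier: training does nothing...
    public void train(byte[][][] images, int[] labels) {
        // deliberately empty
    }

    // ...and every image is assigned class 0, regardless of its content.
    public int classify(byte[][] image) {
        return 0;
    }
}
```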

A simple working algorithm

The source code for a simple working implementation of a Naive Bayes classifier can be found here.
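To give a flavour of the approach, here is a compact sketch of a Naive Bayes pixel classifier.  It is not the linked algoval implementation: it simply treats each pixel as an independent binary feature (black or not black), applies Laplace smoothing to the per-class counts, and classifies by the highest log posterior.  The array-based train/classify methods are for illustration only and do not follow the TrainableOCRInterface signatures.

```java
// Sketch of a Naive Bayes pixel classifier: each pixel is an independent
// binary feature (black / not black), with Laplace-smoothed per-class counts.
public class NaiveBayesSketch {
    private double[] logPrior;       // log P(class)
    private double[][] logPBlack;    // log P(pixel black | class), pixels flattened row by row
    private double[][] logPWhite;    // log P(pixel not black | class)

    public void train(byte[][][] images, int[] labels, int nClasses) {
        int nPixels = images[0].length * images[0][0].length;
        double[] classCount = new double[nClasses];
        double[][] blackCount = new double[nClasses][nPixels];
        for (int i = 0; i < images.length; i++) {
            int c = labels[i];
            classCount[c]++;
            int p = 0;
            for (byte[] row : images[i]) {
                for (byte pixel : row) {
                    if (pixel == 0) blackCount[c][p]++;   // 0 = black by convention
                    p++;
                }
            }
        }
        logPrior = new double[nClasses];
        logPBlack = new double[nClasses][nPixels];
        logPWhite = new double[nClasses][nPixels];
        for (int c = 0; c < nClasses; c++) {
            logPrior[c] = Math.log(classCount[c] / images.length);
            for (int p = 0; p < nPixels; p++) {
                // Laplace smoothing avoids zero probabilities for unseen pixel states.
                double pBlack = (blackCount[c][p] + 1.0) / (classCount[c] + 2.0);
                logPBlack[c][p] = Math.log(pBlack);
                logPWhite[c][p] = Math.log(1.0 - pBlack);
            }
        }
    }

    public int classify(byte[][] image) {
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < logPrior.length; c++) {
            double score = logPrior[c];
            int p = 0;
            for (byte[] row : image) {
                for (byte pixel : row) {
                    score += (pixel == 0) ? logPBlack[c][p] : logPWhite[c][p];
                    p++;
                }
            }
            if (score > bestScore) { bestScore = score; best = c; }
        }
        return best;
    }
}
```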

Submitting an Algorithm

Before you get started, you may want to download the developers' pack for this problem.  It includes some sample classes to help you get going, and will allow you to compile and run evaluation tests locally on your own machine.  Then, to make a submission, follow these steps:

  1. Ensure that you have properly implemented the algoval.ocr.TrainableOCRInterface.
  2. Create an archive (zip) file containing all your Java class files.  Include all the class files that you wrote as part of your submission, but not any of the algoval files.  NOTE: it is vital that your zip file preserves the package structure of your classes in the file path of each class file (see the sketch after this list).
  3. Go to the Algorithm Upload page, fill out the form there, and click to submit when you're ready.  (There are two upload alternatives at present.)
  4. When the job has been run you can view the results.  An example log file of the type that gets generated for TrainableOCR can be found here.  Note that the log file does not contain references to the actual algorithm and dataset under test.
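As noted in step 2, the package structure must be preserved inside the zip: a class my.pkg.MyOcr must appear under the entry name my/pkg/MyOcr.class.  The sketch below builds such an archive programmatically with java.util.zip; the "classes" directory name and the output file name are assumptions for illustration, and an ordinary archiver run from the root of your compiled classes directory achieves the same thing.

```java
import java.io.*;
import java.nio.file.*;
import java.util.zip.*;

// Sketch: build a submission zip so that package structure is preserved,
// i.e. a class my.pkg.MyOcr is stored under the entry name "my/pkg/MyOcr.class".
public class MakeSubmissionZip {
    public static void main(String[] args) throws IOException {
        Path classesRoot = Paths.get("classes");   // directory containing my/pkg/*.class (assumption)
        try (ZipOutputStream zip = new ZipOutputStream(new FileOutputStream("submission.zip"))) {
            Files.walk(classesRoot)
                 .filter(p -> p.toString().endsWith(".class"))
                 .forEach(p -> {
                     try {
                         // Entry name is the path relative to the classes root, with '/'
                         // separators, which preserves the package structure in the archive.
                         String entryName = classesRoot.relativize(p).toString()
                                                       .replace(File.separatorChar, '/');
                         zip.putNextEntry(new ZipEntry(entryName));
                         Files.copy(p, zip);
                         zip.closeEntry();
                     } catch (IOException e) {
                         throw new UncheckedIOException(e);
                     }
                 });
        }
    }
}
```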

Performance Criteria

  • Code size:  The total number of bytes in all the class files used to implement the algorithm.  This does not include any classes in the algoval package.
  • Trained size: The total number of bytes used to store the trained state of the classifier.
  • Train time: The training time in milliseconds per pattern in the training set.  The time is quoted per pattern to allow meaningful comparisons between datasets of different sizes.
  • Rec time: The recognition time in milliseconds per pattern in the test set.
  • Save time: The time in milliseconds to save the state of a trained classifier.
  • Load time: The time in milliseconds to load a previously trained classifier state.
  • Train Accuracy: The percentage of times the classifier gave the actual pattern class the highest output on the training set.
  • Test Accuracy: As above, but measured on the test set.

Note that other performance aspects are measured, but are not summarised in the league tables.  These other aspects, such as confusion matrices and rank-based accuracy analysis, are only available in the log file for each experiment.

The rank-based evaluation table works something like this.  Each element of the classification vector is tagged with the corresponding class id (0 .. nClasses - 1).  The tagged elements are then sorted into order of highest classification output first.  This is termed the rank order, with the highest output having rank = 1.  We then increment the count for the rank of the actual class.  A perfect classifier would always have the correct class at rank 1.  The cumulative ranked accuracy gives some idea of how useful the classifier will be in a system that utilises contextual knowledge to correct classifier outputs.
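As an illustration of the tally described above (not the actual evaluation code), the following sketch computes the rank counts and the cumulative ranked accuracy, assuming the classifier returns one output value per class for each test pattern.

```java
// Sketch of the rank-based tally: rankCount[r] counts how often the true
// class had rank r+1, where rank 1 means the highest classifier output.
public class RankAccuracySketch {
    public static int[] rankCounts(double[][] outputs, int[] trueClasses) {
        int nClasses = outputs[0].length;
        int[] rankCount = new int[nClasses];
        for (int i = 0; i < outputs.length; i++) {
            // Rank of the true class = 1 + number of classes with a strictly higher output.
            int rank = 1;
            double trueOutput = outputs[i][trueClasses[i]];
            for (int c = 0; c < nClasses; c++) {
                if (outputs[i][c] > trueOutput) rank++;
            }
            rankCount[rank - 1]++;
        }
        return rankCount;
    }

    // Cumulative ranked accuracy: fraction of patterns whose true class
    // appears at rank <= r, for r = 1 .. nClasses.
    public static double[] cumulative(int[] rankCount, int nPatterns) {
        double[] cum = new double[rankCount.length];
        int running = 0;
        for (int r = 0; r < rankCount.length; r++) {
            running += rankCount[r];
            cum[r] = (double) running / nPatterns;
        }
        return cum;
    }
}
```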

Submitting a Dataset

Please note that dataset submission has not yet been enabled for this problem, though we hope to enable this feature in the near future - we're currently reviewing possible formats for this.

Viewing Results

To view the current results on the problem go to TrainableOCR Results on PO Digits.

Appendix 1: Evaluation Notes

In any evaluation of a pattern recognition system it is important to estimate how well a trained classifier can be expected to perform on previously unseen data.  Hence we have the concept of two datasets: a training set, which is used to estimate the model's parameters, and a test set, which is used to measure performance on unseen data, and hence the generalisation performance of the classifier.

It is also common to use a third set: an evaluation set.  Some systems (e.g. neural networks) can be overtrained, meaning that they learn the peculiarities of a training set rather than its general properties.  The evaluation set is not used directly for training, but as a guide for when to stop training.  In this scenario, the training set is still used to estimate the model parameters, but training is stopped when performance on the evaluation set ceases to improve.

In the TrainableOCRInterface, there is no explicit use of an evaluation set.  If a system wishes to make use of an evaluation set, it is up to that system to partition the training set into two parts: one used for training and one used as the evaluation set.
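One simple way to do this is to shuffle the pattern indices and hold a fraction of them out as the evaluation set.  The sketch below works purely on indices, so it makes no assumptions about the ImageDataset accessors; how the indices are then used to read patterns from the dataset is left to the implementor.

```java
import java.util.Random;

// Sketch: split the indices 0..n-1 of the supplied training set into a
// training part and an internal evaluation part.
public class HoldoutSplit {
    public static int[][] split(int n, double evalFraction, long seed) {
        int[] indices = new int[n];
        for (int i = 0; i < n; i++) indices[i] = i;
        Random rng = new Random(seed);
        // Fisher-Yates shuffle so the held-out evaluation set is a random sample.
        for (int i = n - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1);
            int tmp = indices[i]; indices[i] = indices[j]; indices[j] = tmp;
        }
        int nEval = (int) Math.round(n * evalFraction);
        int[] eval = new int[nEval];
        int[] train = new int[n - nEval];
        System.arraycopy(indices, 0, eval, 0, nEval);
        System.arraycopy(indices, nEval, train, 0, n - nEval);
        return new int[][] { train, eval };
    }
}
```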