We hold this competition as part of the ICDAR 2015 competitions. ICDAR is the premier international forum for researchers and practitioners in the document analysis community for identifying, encouraging and exchanging ideas on the state-of-the-art technology in document analysis, understanding, retrieval, and performance evaluation.
In this competition we provide images labeled with transcribed text and text positions. The differences between our dataset and that of the ICDAR Robust Reading competition are:
There are two tasks to be performed on the test images: text locating and text recognition. Please refer to the Task & Evaluation section below for the evaluation metrics.
Our dataset consists of images, text line polygons, and a text annotation for each text line. Each text line falls into one of four categories:
The categories marked "Translucent" indicate the presence of translucent text, which may encode a website link, a shop name, or contact information; reading the encoded text helps determine whether the text complies with the anti-spam policy of the site hosting the images. Some samples of such images are shown below.
The categories marked "Other" indicate the presence of multilingual text comprising Chinese and English in natural or Internet images. Some examples are:
The competition participants are encouraged to report their result for each tag separately.
Each line of the ground truth file describes a text line by four fields:
<translucent> <english_only> <polygon> <text>

where:
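The field layout above can be read with a few lines of code. The following sketch assumes illustrative encodings that the description does not pin down: 0/1 integer flags and a polygon given as a comma-separated flat list of coordinates. The actual ground truth files may use a different separator or coordinate format.

```python
def parse_gt_line(line):
    """Split one annotation line into its four fields:
    translucent flag, english-only flag, polygon vertices, text.
    The 0/1 flags and "x1,y1,x2,y2,..." polygon encoding are
    assumptions for illustration, not the official format."""
    translucent, english_only, polygon, text = line.rstrip("\n").split(" ", 3)
    coords = [int(v) for v in polygon.split(",")]
    # Group the flat coordinate list into (x, y) vertex pairs.
    vertices = list(zip(coords[0::2], coords[1::2]))
    return int(translucent), int(english_only), vertices, text
```

Note that `split(" ", 3)` keeps the whole transcription (including any spaces it contains) as the final field.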
There are two tasks in this competition: Text Locating and Text Recognition.
For text locating, given an input image, you should produce a set of polygons marking text line candidates. For simplicity, we adopt the evaluation method of the ICDAR 2003 Robust Reading competition, with the only difference being that matches are computed from polygon intersection areas rather than rectangle intersection areas.
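As a rough illustration of polygon-based matching (not the organizers' exact scoring code), the sketch below clips one polygon against another with the Sutherland-Hodgman algorithm and scores the pair by intersection-over-union. It assumes both polygons are convex and given in counter-clockwise order; the official evaluation may weight matches differently, following the ICDAR 2003 definition.

```python
def shoelace_area(poly):
    """Area of a polygon given as a list of (x, y) vertices."""
    s = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def clip(subject, clipper):
    """Sutherland-Hodgman clipping: intersect `subject` with a
    CONVEX, counter-clockwise `clipper` polygon."""
    def inside(p, a, b):
        # p is on the left of (or on) the directed edge a -> b.
        return (b[0]-a[0])*(p[1]-a[1]) - (b[1]-a[1])*(p[0]-a[0]) >= 0
    def intersect(p, q, a, b):
        # Intersection of segment p-q with the line through a-b
        # (only called when the segment crosses the edge).
        x1, y1 = p; x2, y2 = q; x3, y3 = a; x4, y4 = b
        den = (x1-x2)*(y3-y4) - (y1-y2)*(x3-x4)
        t = ((x1-x3)*(y3-y4) - (y1-y3)*(x3-x4)) / den
        return (x1 + t*(x2-x1), y1 + t*(y2-y1))
    output = subject
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        inp, output = output, []
        for j in range(len(inp)):
            p, q = inp[j], inp[(j + 1) % len(inp)]
            if inside(q, a, b):
                if not inside(p, a, b):
                    output.append(intersect(p, q, a, b))
                output.append(q)
            elif inside(p, a, b):
                output.append(intersect(p, q, a, b))
    return output

def match_score(det, gt):
    """Intersection area over union area of two convex polygons."""
    inter = clip(det, gt)
    ia = shoelace_area(inter) if len(inter) >= 3 else 0.0
    return ia / (shoelace_area(det) + shoelace_area(gt) - ia)
```

For example, two unit-height squares overlapping by half share one third of their union area, so `match_score` returns 1/3 for them.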
For text recognition, given an image containing a single sentence, a word, or part of a sentence, you should output the sequence of characters appearing in that image. We evaluate results by case-sensitive normalized edit distance.
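A minimal sketch of this metric is below: the standard Levenshtein distance, computed case-sensitively and normalized here by the longer of the two string lengths. The exact normalization convention (e.g. dividing by the ground-truth length instead) is an assumption, since the description does not specify it.

```python
def normalized_edit_distance(pred, gt):
    """Case-sensitive Levenshtein distance between prediction and
    ground truth, divided by the length of the longer string."""
    m, n = len(pred), len(gt)
    if max(m, n) == 0:
        return 0.0
    # Classic dynamic program, keeping only the previous row.
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if pred[i - 1] == gt[j - 1] else 1
            cur[j] = min(prev[j] + 1,        # deletion
                         cur[j - 1] + 1,     # insertion
                         prev[j - 1] + cost) # substitution
        prev = cur
    return prev[n] / max(m, n)
```

Because the comparison is case-sensitive, "Hello" vs. "hello" incurs one substitution rather than a perfect match.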
In the training data, we provide the coordinates of the text lines. Participants interested only in cropped-image recognition are free to crop the text line images with the help of these coordinates, as long as no additional human annotation is used in the process. For example, one may crop a text line image expanded 15% beyond the given coordinates.
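Such an expanded crop can be computed from the polygon coordinates alone. This sketch (the function name and axis-aligned-box simplification are ours, not part of the competition tooling) enlarges the polygon's bounding box by a scale factor about its center and clamps it to the image bounds; a 15% expansion corresponds to `scale=1.15`.

```python
def expand_polygon_bbox(vertices, img_w, img_h, scale=1.15):
    """Axis-aligned crop box (left, top, right, bottom) around a
    text-line polygon, enlarged by `scale` about its center and
    clamped to the image. `vertices` is a list of (x, y) pairs."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    cx = (min(xs) + max(xs)) / 2.0
    cy = (min(ys) + max(ys)) / 2.0
    half_w = (max(xs) - min(xs)) * scale / 2.0
    half_h = (max(ys) - min(ys)) * scale / 2.0
    left = max(0, int(cx - half_w))
    top = max(0, int(cy - half_h))
    right = min(img_w, int(cx + half_w))
    bottom = min(img_h, int(cy + half_h))
    return left, top, right, bottom
```

The resulting box can be passed directly to any image library's crop routine.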
Submission portal will be launched soon.