An Episodic Learning Network for Text Detection on Human Bodies in Sports Images

Article Type

Research Article

Publication Title

IEEE Transactions on Circuits and Systems for Video Technology


Due to the proliferation of sports-related multimedia content on the web, effective visual search and retrieval present interesting research challenges. These challenges stem from poor image quality, a wide range of camera viewpoints, pose variations of athletes in play, deformation of text on athletes' clothing and uniforms in motion, occlusion by other objects, and similar factors. To address them, this paper presents a new method for detecting text on human bodies in sports images. Unlike most existing methods, which attempt to exploit the locations of a player's torso, face, and skin, we propose an end-to-end episodic learning approach that employs inductive learning criteria to detect clothing regions in an image, which are then used for text detection. Our method integrates a Residual Network (ResNet) and a Pyramidal Pooling Module (PPM) to generate a spatial attention map. The Progressive Scalable Expansion Algorithm (PSE) is adapted to detect text within these regions. Experimental results on our own dataset and on several benchmarks (RBNR and MMM, which contain images of marathon runners, and Re-ID, a person re-identification dataset) demonstrate that the proposed method outperforms existing methods in terms of precision and F1-score. We also present results for sports images chosen from natural scene text detection datasets such as CTW1500 and MS-COCO to show that the proposed method is effective and reliable across a range of inputs.
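
The abstract names PSE as the text detection stage but does not describe how it is adapted. As a rough illustration only, the sketch below shows the generic progressive expansion idea as it is commonly described: text instances are seeded from the smallest predicted kernel and grown breadth-first into successively larger kernels, so neighbouring instances stay separate. It is not the authors' adaptation. The function names, the 4-connectivity, and the plain nested-list mask format are assumptions made for this example.

```python
from collections import deque

# 4-connected neighbourhood (an assumption; 8-connectivity is equally plausible).
NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def label_components(mask):
    """Label 4-connected components of a binary mask; 0 means background."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for r in range(h):
        for c in range(w):
            if mask[r][c] and labels[r][c] == 0:
                next_label += 1
                labels[r][c] = next_label
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in NEIGHBORS:
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels


def progressive_scale_expansion(kernels):
    """Grow instance labels from the smallest kernel into each larger one.

    kernels: list of same-sized binary masks, ordered smallest first.
    Returns a label map where each text instance has its own integer id.
    """
    labels = label_components(kernels[0])
    h, w = len(labels), len(labels[0])
    for kernel in kernels[1:]:
        # Start the breadth-first expansion from every pixel labeled so far.
        queue = deque((r, c) for r in range(h) for c in range(w) if labels[r][c])
        while queue:
            y, x = queue.popleft()
            for dy, dx in NEIGHBORS:
                ny, nx = y + dy, x + dx
                # A pixel is claimed by whichever instance reaches it first.
                if (0 <= ny < h and 0 <= nx < w
                        and kernel[ny][nx] and labels[ny][nx] == 0):
                    labels[ny][nx] = labels[y][x]
                    queue.append((ny, nx))
    return labels


# Toy example: two seed pixels on one row, expanded into a full-width kernel.
small = [[0, 1, 0, 0, 0, 1, 0]]
large = [[1, 1, 1, 1, 1, 1, 1]]
result = progressive_scale_expansion([small, large])
```

In the toy example the two seeds would merge into a single region if the large kernel were labeled directly. Seeding from the small kernel keeps them as two instances, which is the property that makes this kind of expansion attractive for closely spaced text such as bib numbers.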
