
Automatic Image Description Generation Using Deep Learning Techniques

Jinal Butala, Dr. Brijesh Bhatt


Automatic image description generation methods are extremely useful for image retrieval, search, and organization. Previous approaches either use an existing labelled dataset to compose sentences or compose a new description for the test image by exploiting the available descriptions of the training images. In practice these methods have limited accuracy: if the most important objects in an image cannot be identified, they cannot generate a valid description. Another difficulty lies in the final description generation step, where it is crucial to generate grammatically correct sentences.

In this paper, we propose a two-stage framework. In the first stage, we predict the objects in an image using the convolutional neural network (CNN) AlexNet; in the second stage, we generate a caption for the image from these objects. We segment each image, apply AlexNet to each segment in turn, and identify the objects present in the image. We then construct sentences using the Cascade Object Detector (COD) from the Computer Vision System Toolbox. Using this technique to identify image classes, we obtain a BLEU score of 51.96 for the AlexNet-based approach and 46.01 for the CNN-based approach. BLEU (bilingual evaluation understudy) is an algorithm for evaluating the quality of machine-translated text.
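To make the reported metric concrete, the following is a minimal Python sketch of a sentence-level BLEU computation. It is illustrative only and is not the evaluation code used in this work: the abstract does not state the n-gram order, tokenization, smoothing, or whether scores are computed per sentence or over a corpus. The sketch assumes whitespace tokenization, uniform weights over 1- to `max_n`-grams, no smoothing, and a 0-100 scale (chosen because the scores above are reported as 51.96 and 46.01). The function names `ngrams` and `bleu` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU on a 0-100 scale (assumed: uniform weights, no smoothing)."""
    cand = candidate.lower().split()
    refs = [r.lower().split() for r in references]
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in refs:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # unsmoothed BLEU is zero if any n-gram precision is zero
        log_precisions.append(math.log(clipped / total))

    # Brevity penalty uses the reference length closest to the candidate length.
    ref_len = min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]
    bp = 1.0 if len(cand) > ref_len else math.exp(1 - ref_len / len(cand))
    return 100.0 * bp * math.exp(sum(log_precisions) / max_n)


# Example: a generated caption scored against one reference caption.
score = bleu("a dog on the grass", ["a dog sits on the grass"], max_n=2)
```

In this example all five unigrams and three of the four bigrams of the candidate appear in the reference, and the candidate is one token shorter than the reference, so the brevity penalty lowers the score to roughly 70.9. A caption that is identical to its reference scores 100, and a caption sharing no words with it scores 0.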


Keywords: Convolutional neural network, Deep learning, AlexNet, Artificial intelligence, Cascade object detector, Computer vision toolbox
