Clustering of Web Search Queries to Identify User's Intent.

Date of Submission

December 2015

Date of Award

Winter 12-12-2016

Institute Name (Publisher)

Indian Statistical Institute

Document Type

Master's Dissertation

Degree Name

Master of Technology

Subject Name

Computer Science


Computer Vision and Pattern Recognition Unit (CVPR-Kolkata)


Parui, Swapan Kumar (CVPR-Kolkata; ISI)

Abstract (Summary of the Work)

Over the past two decades, people's expectations of search engines have changed drastically. Today, people use a search engine to accomplish tasks such as finding a restaurant, looking for reviews, finding a solution on a forum, or booking travel, rather than to look up lists of websites. Yet search engines still respond to a query with a list of websites. In this dissertation, we propose a method to identify the user intent behind a search query. This work is a first step towards identifying, and hence serving, the user's task. First, we find a keyword representation of the queries, which is used to group the keywords into keyword groups that convey the same intent. The proposed grouping uses not only the query string but also the user ID, the user's location, clicked results, etc. as metadata for the query, to generate a set of robust keyword groups that are invariant to spelling and consistent in intent. This process is aided by a Wikipedia database used for reference. We then propose a few unsupervised clustering models to cluster the web search queries into user-intent clusters. All the resulting sets of clusters are evaluated manually to assess the pros and cons of the proposed clustering models. We conclude by presenting a user-intent clustering technique and by identifying the properties of an ideal clustering technique for web search queries.



Creative Commons License

Creative Commons Attribution 4.0 International License

