People search with textual queries

The demo application presented on this page has been developed to show how our proposed people search system works. The system searches a given set of images or videos for individuals matching a textual query about their clothing appearance. An example of such a query is: "people wearing a long-sleeved, white upper garment, and shorts".
This functionality can be used, for example, in the following applications:
  • forensic analysis, for helping an operator to find individuals in surveillance videos, based on a textual description given by an eyewitness;
  • personal photo management, for searching and tagging photos of individuals exhibiting a given clothing appearance;
  • cultural heritage applications, for example, for retrieving pictures of individuals wearing traditional clothing.
This demo consists of a web interface that allows the user to formulate queries involving a predefined set of clothing characteristics (attributes), and to retrieve the corresponding images from the VIPER data set, one of the most widely used benchmarks for person re-identification.
In this demo, the attributes are related to clothing colours of the upper part (torso and arms) and lower part (legs) of the body. These body parts are detected using a method based on pictorial structures.
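To make the descriptor extraction step concrete, here is a minimal sketch. It assumes a fixed horizontal split of the pedestrian crop instead of an actual pictorial-structures model, and a simple per-channel colour histogram as a stand-in descriptor; the real system uses a learned part model and the descriptor from our re-identification work.

```python
import numpy as np

def split_body_regions(img, upper_frac=0.5):
    """Crude stand-in for pictorial-structures part detection: split a
    pedestrian crop into an upper (torso/arms) and a lower (legs) region.
    The fixed 50/50 split is only an illustrative assumption."""
    cut = int(img.shape[0] * upper_frac)
    return img[:cut], img[cut:]

def colour_histogram(region, bins=8):
    """Per-channel colour histogram, L1-normalised, as a toy descriptor
    for one body region (assumes an 8-bit RGB image array)."""
    hists = [np.histogram(region[..., c], bins=bins, range=(0, 256))[0]
             for c in range(region.shape[-1])]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()
```

A query about, say, upper-garment colour would then be evaluated on the descriptor of the upper region only.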
For each attribute, a distinct detector is built. Given the image of an individual, each detector takes the image descriptor as input and outputs an estimate of the probability that the corresponding attribute is present in that image. Detectors are implemented with the support vector machine (SVM) classification algorithm and are trained on a separate subset of VIPER. The image descriptor used in this demo is the same one we proposed for person re-identification tasks.
The system can process simple queries involving a single attribute (e.g., "people wearing a white upper garment") as well as complex queries formed by Boolean combinations of attributes (e.g., "people wearing a white upper garment AND blue pants"). For each image, the outputs of the detectors involved in the query are combined into a final relevance score according to the corresponding Boolean operators (for a simple query, the score is simply the output of the corresponding detector). Finally, the images of the data set are shown to the operator, sorted by decreasing relevance score.
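One common way to turn Boolean operators into a soft combination of detector probabilities is the product for AND and the probabilistic sum for OR; the exact rule used by the demo is not specified here, so this is only an illustrative choice.

```python
def and_score(*probs):
    """Soft conjunction: product of the detector probabilities."""
    out = 1.0
    for p in probs:
        out *= p
    return out

def or_score(*probs):
    """Soft disjunction: probabilistic sum of the detector probabilities."""
    out = 0.0
    for p in probs:
        out = out + p - out * p
    return out

def rank_images(scores):
    """Sort image ids by decreasing relevance score, as shown to the operator."""
    return sorted(scores, key=scores.get, reverse=True)
```

For the example query "white upper garment AND blue pants", the relevance score of an image would be `and_score(p_white_upper, p_blue_pants)`, and `rank_images` produces the ordering displayed to the operator.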