The First IEEE Workshop on Visual Place Categorization (VPC '09)

In Association with CVPR 2009, Miami Beach, Florida, June 21, 2009





[Setup of Data Capture] [Get the Dataset] [Dataset organization] [Annotation of the dataset] [Development kit] [Baseline evaluation utility]

Setup of Data Capture

We want to provide a dataset consisting of videos captured autonomously. We used a rolling tripod plus an HD camcorder (JVC GR-HD1) to mimic a robot, and collected videos from 6 home environments.

The operator mimicked a robot during the data collection: he did not look at either the captured frames or the objects and furniture in the rooms. Rather, the operator just traversed all traversable areas in a room, and made sure that the tripod+camcorder system would not hit any obstacles.

We encourage VPC workshop participants to explore this dataset.

Get the dataset

We collected two types of videos from each home. The first type was collected using the tripod+camcorder system described above. We use these videos to generate the VPC dataset.

The VPC dataset was generated by extracting every third frame from the videos as JPEG images (95% quality). Each image has a resolution of 1280x720. The VPC dataset can be downloaded from here: (please notify wujx2001 AT if any of these files are broken or the bandwidth limit has been exceeded.)

Home index    File size
Home 1
Home 2
Home 3
Home 4
Home 5
Home 6

NOTE: We recommend using a download tool that supports resuming partially downloaded files, since the dataset files are very large.

The second type of video provides 360-degree views of the rooms. We placed the camcorder at a fixed position inside each room and recorded video while slowly rotating it on the tripod. Due to the storage and bandwidth limitations of our web server, we are unable to offer these videos here for public download. Please contact Jianxin Wu (wujx AT if you are interested in acquiring these videos.

Dataset organization

Each .zip file contains all the frames collected from one home. For example, the file for Home 1 contains a directory called “Home1”, which in turn has 3 sub-directories. The sub-directory “0/” contains all the frames from the basement of Home 1. Similarly, the sub-directory “1/” contains frames from the ground floor, and “2/” contains frames from the second floor. Inside each sub-directory, frames are named sequentially as “00000000.jpg”, “00000001.jpg”, etc.
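The naming scheme above can be turned into a file path with a few lines of C++. The following is a minimal sketch, not part of the development kit: the helper name framePath is hypothetical, and it assumes the “Home?” directories sit directly under the base directory and that the frame number is zero-padded to 8 digits, as in the examples above.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Build the path of one frame, e.g. <base>/Home1/0/00000000.jpg.
// `home` is the home number (1..6), `floorDir` the numeric sub-directory
// name (0, 1, 2, ...), and `frame` the sequential frame number.
// Hypothetical helper; the development kit has its own traversal code.
std::string framePath(const std::string& base, int home, int floorDir, int frame) {
    assert(home > 0 && floorDir >= 0 && frame >= 0);
    char tail[64];
    // %08d pads the frame number with leading zeros to 8 digits.
    std::snprintf(tail, sizeof(tail), "/Home%d/%d/%08d.jpg", home, floorDir, frame);
    return base + tail;
}
```

For instance, framePath("/data/VPC_Data", 1, 0, 0) would refer to the first basement frame of Home 1 under the layout assumed here.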

After downloading all the files, the first step is to create your VPC base directory. Below is an example sequence of commands under Linux, assuming “/data/VPC_Data” is the base directory:

mkdir /data/VPC_Data

cd /data/VPC_Data

# now download all Home*.zip files into this directory






Please use appropriate software under Windows or Mac to organize the downloaded data. If you want to use the provided development kit, your data needs to be organized in the structure shown in the figure on the left.

Annotation of the dataset

A file “label.txt” under each “Home?/” directory contains annotations (labels) for this dataset. There are 11 categories (see VPC.h in the development kit for category names). We used a special category name “transition” to annotate video segments that are either difficult to categorize or contain more than 2 categories.

Labels are provided for video segments (i.e. runs of consecutive frames); e.g. we can label the video segment from 00000234.jpg to 00001567.jpg as “bedroom”. The structure of label.txt can be described by the following extended Backus-Naur Form:

Annotations = { floor } , ending line

EOL = “\n”

ending line = “-1 -1 end” , EOL

floor = sub-directory name, EOL, { video segment }, ending line

sub-directory name = string, EOL

video segment = starting frame index, ending frame index, category name, EOL

starting frame index = non-negative integer

ending frame index = non-negative integer

category name = string

where “non-negative integer” and “string” have their usual meanings: a non-negative integer and a character string.
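The grammar above can be read with a small token-based parser. The sketch below is an illustration only and is not the code in VPC_IO.cpp: the names Segment, FloorLabels, parseLabels and parseLabelsFromString are hypothetical, and it assumes that fields are separated by whitespace and that sub-directory and category names contain no spaces. Because it splits on any whitespace, the exact number of line breaks between items does not matter.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One labeled run of consecutive frames (both bounds inclusive).
struct Segment {
    int first;
    int last;
    std::string category;
};

// All segments listed for one floor (one sub-directory).
struct FloorLabels {
    std::string subdir;
    std::vector<Segment> segments;
};

// Parse label.txt content following the grammar above.
// Returns false if the input does not match the grammar.
bool parseLabels(std::istream& in, std::vector<FloorLabels>& floors) {
    floors.clear();
    std::string token;
    while (in >> token) {
        if (token == "-1") {
            // File-level ending line: "-1 -1 end".
            int last = 0;
            std::string name;
            return (in >> last >> name) && last == -1 && name == "end";
        }
        FloorLabels floor;
        floor.subdir = token;  // first token of a floor is its sub-directory name
        for (;;) {
            Segment s;
            if (!(in >> s.first >> s.last >> s.category)) return false;
            // Each floor is closed by its own "-1 -1 end" line.
            if (s.first == -1 && s.last == -1 && s.category == "end") break;
            if (s.first < 0 || s.last < 0) return false;
            floor.segments.push_back(s);
        }
        floors.push_back(floor);
    }
    return false;  // the file-level ending line was missing
}

// Convenience wrapper for text already held in memory.
bool parseLabelsFromString(const std::string& text, std::vector<FloorLabels>& floors) {
    std::istringstream in(text);
    return parseLabels(in, floors);
}
```

To read a real file, an input file stream opened on the “label.txt” of one home can be passed to parseLabels. If the actual files deviate from the grammar as written (for example in the number of ending lines), the sketch would need to be adjusted accordingly.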

Development kit

We prepared a development kit, written in the C++ programming language, for reading label.txt and traversing all frames in all homes. To use the C++ code to access the VPC dataset, please carefully read the comments in VPC.h and VPC_IO.cpp. The comments also provide guidance for accessing the VPC dataset from other programming languages.

The development kit is available from here.

NOTE: If you downloaded a previous version of the toolkit, please note that I have changed a comment in VPC_IO.cpp about the meaning of the 'floor' variable. 'floor==0' means that this floor is the first floor (sub-directory) included in the label.txt file, which I have always chosen to be “1/”, i.e. the ground floor (first floor). The previous comment that “floor==0 means basement” is wrong.

Baseline evaluation utility

In the VPC workshop Call for Papers, we invite new methods for evaluating the performance of a visual place categorization system. In the meantime, a baseline evaluation system is included in the development kit. The baseline evaluation method uses a leave-one-out strategy and calculates per-frame accuracy.
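To make the two terms concrete, the following sketch shows per-frame accuracy and a leave-one-out split. It is not the baseline code shipped in the development kit: the function names are hypothetical, categories are represented as plain integer indices, and the split is assumed to hold out one home at a time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fraction of frames whose predicted category equals the annotated one.
// Returns 0.0 when there are no frames.
double perFrameAccuracy(const std::vector<int>& predicted,
                        const std::vector<int>& truth) {
    assert(predicted.size() == truth.size());
    if (truth.empty()) return 0.0;
    std::size_t correct = 0;
    for (std::size_t i = 0; i < truth.size(); ++i) {
        if (predicted[i] == truth[i]) ++correct;
    }
    return static_cast<double>(correct) / static_cast<double>(truth.size());
}

// Homes used for training when home `heldOut` is reserved for testing.
// Homes are indexed 0 .. numHomes-1 here (an assumption of this sketch).
std::vector<int> trainingHomes(int numHomes, int heldOut) {
    std::vector<int> homes;
    for (int h = 0; h < numHomes; ++h) {
        if (h != heldOut) homes.push_back(h);
    }
    return homes;
}
```

Under these assumptions, a full evaluation would loop heldOut over all 6 homes, train on trainingHomes(6, heldOut), and report perFrameAccuracy on the frames of the held-out home.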

Not all the 12 room categories are available in all homes; thus we recommend using only 5 categories at this time: bedroom, bathroom, kitchen, living_room, and dining_room. In the development kit, this is equivalent to setting categories[1]=categories[2]=categories[3]=categories[5]=categories[6]=true. Please read the README.txt and the comments in the source code for details.
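The recommended setting can be written as a boolean mask over category indices. This is a hedged sketch rather than the kit's own code: recommendedCategories and isEvaluated are hypothetical names, the mask size is supplied by the caller, and the actual category count and index-to-name mapping should be taken from VPC.h.

```cpp
#include <cstddef>
#include <vector>

// Build a mask in which only indices 1, 2, 3, 5 and 6 are enabled,
// mirroring the assignment described above. `numCategories` is an
// assumption of this sketch; check VPC.h for the real value.
std::vector<bool> recommendedCategories(std::size_t numCategories) {
    std::vector<bool> categories(numCategories, false);
    const std::size_t enabled[] = {1, 2, 3, 5, 6};
    for (std::size_t idx : enabled) {
        if (idx < numCategories) categories[idx] = true;
    }
    return categories;
}

// True if a frame annotated with category index `label` should be used.
// Out-of-range or negative labels are treated as disabled.
bool isEvaluated(int label, const std::vector<bool>& categories) {
    return label >= 0 &&
           static_cast<std::size_t>(label) < categories.size() &&
           categories[static_cast<std::size_t>(label)];
}
```

One plausible use is to skip frames for which isEvaluated returns false when counting correct predictions; whether the baseline utility handles disabled categories this way should be confirmed in the README.txt.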

The development kit was tested with g++ 4.3.2 and Microsoft Visual Studio 2008. Please read the README.txt and comments in the .cpp files for more details.

Page created and maintained by Jianxin Wu (wujx2001 AT