Surveillance cameras that can recognize what they see and understand possible interactions, thanks to visual intelligence.
Today we will talk about I2T, an acronym for “Image To Text”: a computer vision system developed by UCLA researchers in cooperation with Virginia-based ObjectVideo.
The software at the core of I2T contains vision algorithms that analyze images and compile a sort of list of whatever they can see in the frame. In practice, the image is broken down into a series of shapes, and each shape is matched with a name.
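To make the idea concrete, here is a minimal, hypothetical sketch of the “shapes to names” step, assuming each detected shape has already been reduced to a feature vector. The catalog, the feature vectors, and the nearest-neighbor matching are all illustrative assumptions, not the actual I2T implementation.

```python
# Hypothetical sketch: match shape feature vectors against a labeled
# catalog by nearest neighbor. Not the actual I2T code.
from math import dist

# Assumed catalog: feature vector -> category name
CATALOG = {
    (0.9, 0.1): "car",
    (0.2, 0.8): "person",
    (0.5, 0.5): "tree",
}

def name_shape(features):
    """Return the name of the catalog entry closest to `features`."""
    nearest = min(CATALOG, key=lambda ref: dist(ref, features))
    return CATALOG[nearest]

def describe(shapes):
    """Turn a list of shape feature vectors into a list of names."""
    return [name_shape(s) for s in shapes]

print(describe([(0.85, 0.15), (0.25, 0.75)]))  # ['car', 'person']
```

A real system would of course use far richer features and a much larger reference archive, but the matching principle is the same.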
The process of matching shapes with their names is actually based on a human factor. In 2005, Song-Chun Zhu, the system developer, took part in a project in which art students worked on a catalog of more than 2 million images, identifying and classifying their contents into over 500 categories.
In practice, once the image is split into shapes, these are matched against the archive and the corresponding names are assigned to them. Image To Text can also describe the movements of objects within the footage with automatically generated sentences like “Man1 enters car at 23:45, leaves car at 26:14”.
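The sentence generation step can be sketched very simply: given tracked events, fill a fixed template with the object label, the action, and the timestamp. The function name and the event format below are assumptions for illustration, not the actual I2T output format.

```python
# Hypothetical sketch of template-based event description,
# in the spirit of the sentences quoted above.
def event_sentence(label, action, target, timestamp):
    """Compose a sentence like 'Man1 enters car at 23:45'."""
    return f"{label} {action} {target} at {timestamp}"

events = [
    ("Man1", "enters", "car", "23:45"),
    ("Man1", "leaves", "car", "26:14"),
]
for e in events:
    print(event_sentence(*e))
```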
Recurring objects can be stored and recognized, so a car seen previously will still be called Car1 when it reenters the scene, rather than Car2.
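This persistent-labeling behavior can be sketched as a small bookkeeping class: a new object gets the next free number for its category, while a recognized one keeps its original label. The exact-match “signature” used here is a stand-in assumption for real visual re-identification.

```python
# Minimal sketch of persistent object labeling: objects re-entering
# the scene keep their original ID instead of getting a new one.
class Labeler:
    def __init__(self):
        self._seen = {}      # object signature -> assigned label
        self._counters = {}  # category -> running counter

    def label(self, category, signature):
        if signature in self._seen:
            return self._seen[signature]  # recognized: reuse old label
        n = self._counters.get(category, 0) + 1
        self._counters[category] = n
        lbl = f"{category}{n}"
        self._seen[signature] = lbl
        return lbl

tracker = Labeler()
print(tracker.label("Car", "red-sedan"))  # Car1
print(tracker.label("Car", "blue-van"))   # Car2
print(tracker.label("Car", "red-sedan"))  # Car1 again, not Car3
```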
The I2T system, which could certainly be useful in closed-circuit video surveillance, needs a lot of improvement before it can rival human cognitive capabilities and hit the market. Our friends working the night surveillance shift can rest easy: they are not losing their jobs!