
Object and Event Recognition for Aerial Surveillance

Yi Li, Indriyati Atmosukarto, Masaharu Kobashi, Jenny Yuen and Linda G. Shapiro

University of Washington, Department of Computer Science and Engineering, Box 352350, Seattle, WA 98195-2350, U.S.A.


ABSTRACT

Unmanned aerial vehicles with high quality video cameras are able to provide videos from 50,000 feet up that show a surprising amount of detail on the ground. These videos are difficult to analyze, because the airplane moves, the camera zooms in and out and vibrates, and the moving objects of interest can be in the scene, out of the scene, or partly occluded. Recognizing both the moving and static objects is important in order to find events of interest to human analysts. In this paper, we describe our approach to object and event recognition using multiple stages of classification.

Keywords: object recognition, event recognition, machine learning, aerial surveillance

1. INTRODUCTION

Unmanned aerial vehicles (UAVs) are able to provide large amounts of video data over terrain of interest to defense and intelligence agencies. These agencies are looking for significant events that may be of importance for their missions. However, most of this footage will be eventless and therefore of no interest to the analysts responsible for checking it. If a computer system could scan the videos for potential events of interest, it would greatly lessen the work of the analyst, allowing them to focus on the events of possible importance.

Several different processes are needed for the computer analysis of aerial videos. First, the static objects in the video frames must be recognized to determine the context of the events. Static objects might include forests, fields, roads, runways, and buildings, among others. Next the moving objects in the video must be detected, tracked, and identified. Moving objects include vehicles (cars, trucks, tanks, and buses) and people. Given the static objects and moving objects in a set of frames, events are defined by the actions of the moving objects and their interactions with the static objects. For example, two cars might pull off a road and stop together in a field. People might get out of the cars and approach each other for a meeting. A caravan of trucks might travel in one direction on a dirt road for a period of time and then make a U-turn and proceed in the opposite direction. A vehicle might pull up to a building and disappear into an underground garage or tunnel, then reappear some time later. In all of these cases, both the moving objects and the static objects must be recognized and their interactions noted.

We are developing a system for object and event recognition for this purpose. In this paper we describe the structure of our system and give brief overviews of the underlying algorithms.

2. SYSTEM OVERVIEW

In order to recognize events in a video, the moving objects across a sequence of frames and the static objects in each of these frames must be detected and recognized. Then using these moving and static objects as primitives, simple events can be defined in terms of the relationships among the moving objects and between the moving objects and the static ones. Finally, more complex events can be defined as sequences of simple events. For example, simple events such as a vehicle appearing on a road, moving forward for a short distance, and disappearing behind a tree would lead to a complex event as shown in the first row of Figure 1. Three other complex events (a convoy of cars making a U-turn, a car overtaking another car, and a truck making a turn and passing by a line of cars in the opposite direction) are also shown in Figure 1.
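The layering of simple and complex events can be pictured as ordered subsequence matching over a stream of simple events. The sketch below is only an illustration under that assumption: the names `SimpleEvent`, `ComplexEvent` and `matches`, and the string labels, are hypothetical and are not taken from the system described in this paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SimpleEvent:
    """A relationship between a moving object and, optionally, a static one."""
    action: str       # hypothetical label, e.g. "appear_on"
    mover: str        # moving-object label, e.g. "vehicle"
    static: str = ""  # static-object label, e.g. "road" (empty if none)


@dataclass(frozen=True)
class ComplexEvent:
    """A named complex event defined as an ordered sequence of simple events."""
    name: str
    pattern: Tuple[SimpleEvent, ...]


def matches(event: ComplexEvent, observed: List[SimpleEvent]) -> bool:
    """True if the pattern occurs in order (not necessarily contiguously)."""
    stream = iter(observed)
    # Each step consumes the stream up to its first match, so order is enforced.
    return all(any(step == seen for seen in stream) for step in event.pattern)


# First row of Figure 1, expressed with the hypothetical labels above.
hide = ComplexEvent(
    "vehicle disappears behind a tree",
    (
        SimpleEvent("appear_on", "vehicle", "road"),
        SimpleEvent("move_forward", "vehicle"),
        SimpleEvent("disappear_behind", "vehicle", "tree"),
    ),
)

observed = [
    SimpleEvent("appear_on", "vehicle", "road"),
    SimpleEvent("appear_on", "person", "field"),  # unrelated event in between
    SimpleEvent("move_forward", "vehicle"),
    SimpleEvent("disappear_behind", "vehicle", "tree"),
]

print(matches(hide, observed))
```

Allowing unrelated simple events between the steps of a pattern is a design choice of this sketch; a real system would probably also need time windows and object identities from the tracker.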

Further author information: Send correspondence to Linda Shapiro, E-mail: shapiro@cs.washington.edu, Telephone: 1 206 543 2196; Yi Li is now with Vidient Systems, Inc., Sunnyvale, CA, U.S.A.

Figure 1. Examples of events. (Row 1) Vehicle disappears behind a tree, (Row 2) Cars making a U-turn, (Row 3) Car overtaking another car, (Row 4) Truck makes a turn and passes by cars moving in the opposite direction.

Figure 2 shows the architecture of our system. The system receives a video sequence as its input. The static feature extraction module will extract region features, such as color regions, texture regions and structure regions, from the static objects in the video, while the dynamic feature extraction module will extract features from the objects that are moving in the video. The object recognition module will use the features extracted by both the static and dynamic feature extraction modules to label the objects in the frames, while the object tracking module will track the moving objects from frame to frame. Relationships between objects over time, such as the relative position of objects within a frame, will be computed by the object relationship extraction module. The results of all three modules (object recognition, object tracking and object relationship extraction) will be used by the event recognition module to recognize the events happening in the video and output the results.

Figure 2. System architecture.

Our methodology for object recognition has three main parts:

1. Select a set of features that have multiple attributes for recognition and design a unified representation for them.

2. Develop methods for encoding complex features into feature vectors that can be used by general-purpose classifiers.

3. Design a learning procedure for automating the development of classifiers for new objects.

The unified representation we have designed is called the abstract region representation. The idea is that all features will be regions, each with its own set of attributes, but with a common representation. The regions we are using in our work are color regions, texture regions and structure regions defined as follows.

Color regions are produced by a two-step procedure. The first step is color clustering using a variant of the K-means algorithm on the original color images represented in the CIELab color space [1]. The second step is an iterative merging procedure that merges multiple tiny regions into larger ones. Figure 3 illustrates this process on a football image in which the K-means algorithm produced hundreds of tiny regions for the multi-colored crowd, and the merging process merged them into a single region.

Our texture regions come from a color-guided texture segmentation process. Color segmentation is first performed using the K-means algorithm. Next, pairs of regions are merged if, after a dilation, they overlap by more than 50%. Each of the merged regions is then segmented using the same clustering algorithm on the Gabor texture coefficients. Figure 4 illustrates the texture segmentation process.
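The dilation-and-overlap merge test in the texture segmentation step can be sketched on regions stored as sets of pixel coordinates. The text does not say what the 50% is measured against, nor which structuring element is used, so this sketch makes two assumptions: overlap is taken relative to the smaller dilated region, and dilation uses a square element of radius 1. The function names are hypothetical.

```python
from typing import Set, Tuple

Pixel = Tuple[int, int]  # (row, column)


def dilate(region: Set[Pixel], radius: int = 1) -> Set[Pixel]:
    """Grow a region by `radius` pixels with a square structuring element."""
    return {
        (r + dr, c + dc)
        for (r, c) in region
        for dr in range(-radius, radius + 1)
        for dc in range(-radius, radius + 1)
    }


def should_merge(a: Set[Pixel], b: Set[Pixel],
                 radius: int = 1, threshold: float = 0.5) -> bool:
    """Merge if the dilated regions overlap by more than `threshold`.

    Assumption: the overlap fraction is relative to the smaller dilated region.
    """
    da, db = dilate(a, radius), dilate(b, radius)
    smaller = min(len(da), len(db))
    return smaller > 0 and len(da & db) / smaller > threshold


block = {(r, c) for r in range(3) for c in range(3)}  # 3x3 block
strip = {(r, 3) for r in range(3)}                    # adjacent column
far = {(r, 10) for r in range(3)}                     # distant column

print(should_merge(block, strip), should_merge(block, far))
```

In this toy case the dilated strip has 15 pixels, 10 of which fall inside the dilated block, so the adjacent pair passes the test while the distant pair does not. Image-boundary clipping is omitted for brevity.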

Figure 3. Color segmentation of a football image (panels: Original, Color, Merged).

Figure 4. The texture segmentation is color-guided: it is performed on regions of the initial color segmentation.

The features we use for recognizing man-made structures are called structure features and are obtained using the concept of a consistent line cluster [2]. These features are obtained as follows:

1. Apply the Canny edge detector [3] and ORT line detector [4] to extract line segments from the image.

2. For each line segment, compute its orientation and its color pairs (pairs of colors for which the first is on one side and the second on the other side of the line segment).

3. Cluster the line segments according to their color pairs, to obtain a set of color-consistent line clusters.

4. Within the color-consistent clusters, cluster the line segments according to their orientations to obtain a set of color-consistent orientation-consistent line clusters.

5. Within the orientation-consistent clusters, cluster the line segments according to their positions in the image to obtain a final set of consistent line clusters.
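Steps 2 to 5 above amount to a three-level nested clustering. The following sketch assumes that line segments have already been extracted and that each carries a quantized color pair, which is grouped by exact match here, a simplification of clustering in color space. The single-link grouping, the tolerances `angle_tol` and `pos_tol`, and all names are assumptions of this illustration, not the authors' implementation.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Segment:
    x1: float
    y1: float
    x2: float
    y2: float
    color_pair: Tuple[str, str]  # quantized colors on the two sides (assumed given)

    @property
    def orientation(self) -> float:
        """Undirected orientation in degrees, nominally in [0, 180)."""
        return math.degrees(math.atan2(self.y2 - self.y1, self.x2 - self.x1)) % 180.0

    @property
    def midpoint(self) -> Tuple[float, float]:
        return ((self.x1 + self.x2) / 2.0, (self.y1 + self.y2) / 2.0)


def angle_dist(a: Segment, b: Segment) -> float:
    d = abs(a.orientation - b.orientation)
    return min(d, 180.0 - d)  # orientations wrap around at 180 degrees


def pos_dist(a: Segment, b: Segment) -> float:
    (ax, ay), (bx, by) = a.midpoint, b.midpoint
    return math.hypot(ax - bx, ay - by)


def single_link(items: List[Segment],
                dist: Callable[[Segment, Segment], float],
                max_dist: float) -> List[List[Segment]]:
    """Group items so each one is within max_dist of some other group member."""
    clusters, unvisited = [], list(items)
    while unvisited:
        frontier, cluster = [unvisited.pop()], []
        while frontier:
            cur = frontier.pop()
            cluster.append(cur)
            frontier.extend(s for s in unvisited if dist(cur, s) <= max_dist)
            unvisited = [s for s in unvisited if dist(cur, s) > max_dist]
        clusters.append(cluster)
    return clusters


def consistent_line_clusters(segments: List[Segment],
                             angle_tol: float = 10.0,
                             pos_tol: float = 50.0) -> List[List[Segment]]:
    by_color = defaultdict(list)
    for s in segments:  # step 3: color-consistent groups
        by_color[tuple(sorted(s.color_pair))].append(s)
    result = []
    for group in by_color.values():
        for oriented in single_link(group, angle_dist, angle_tol):  # step 4
            result.extend(single_link(oriented, pos_dist, pos_tol))  # step 5
    return result


segments = [
    Segment(0, 0, 10, 0, ("brick", "glass")),
    Segment(0, 5, 10, 5, ("brick", "glass")),         # parallel and nearby
    Segment(0, 0, 0, 10, ("brick", "glass")),         # different orientation
    Segment(500, 500, 510, 500, ("brick", "glass")),  # same orientation, far away
    Segment(0, 2, 10, 2, ("sky", "roof")),            # different color pair
]
clusters = consistent_line_clusters(segments)
print(sorted(len(c) for c in clusters))
```

Only the two parallel, nearby segments with the same color pair end up together; each of the other three differs in color pair, orientation or position and forms its own cluster.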

Figure 5 illustrates the abstract regions for several representative images. The first image is of a large campus building at the University of Washington. Regions such as the sky, the concrete, and the large brick section of the building show up as large homogeneous regions in both color segmentation and texture segmentation. The windowed part of the building breaks up into many regions for both the color and the texture segmentations, but it becomes a single region in the structure image. The structure-finder also captures a small amount of structure at the left side of the image. The second image of a park is segmented into several large regions in both color and texture. The green trees merge into the green grass on the right side in the color image, but the texture image separates them. No structure was found. In the last image of a sailboat, both the color and texture segmentations provide some useful regions that will help to identify the sky, water, trees and sailboat. The sailboat is captured in the structure region. It is clear that no one feature type alone is sufficient to identify the objects.

In our framework for object and concept class recognition, each image is represented by sets of abstract regions, and each set is related to a particular feature type. To learn the properties of a specific object, we must know which abstract regions correspond to it. Once we have the abstract regions from an object, we extract the common characteristics of those regions as the model of that object. Then, given a new region, we can compare it to the object models in our database to decide to which it belongs. We designed the algorithm that learns correspondences between regions and objects in the training images to require only the list of objects in each training image. With such a solution, not only is the burden of constructing the training data largely relieved, but the principle of keeping the system open to new image features is also upheld.

Figure 5. The abstract regions constructed from a set of representative images using color clustering, color-guided texture clustering and consistent-line segment clustering. (Columns: Original, Color, Texture, Structure.)

3.2. EM-Variant Approach to Object Classification

Our object recognition methodology uses whole images of abstract regions, rather than single regions, for classification. A key part of our approach is that we do not need to know where in each image the objects lie; we use only the fact that objects exist in an image, not where they are located. We have designed an EM-like procedure that learns multivariate Gaussian models for object classes based on the attributes of abstract regions from multiple segmentations of color photographic images [5]. The objective of this algorithm is to produce a probability distribution for each of the object classes being learned. It uses the label information from training images to supervise EM-like iterations.

In the initialization phase of the EM-variant approach, each object is modeled as a Gaussian component, and the weight of each component is set to the frequency of the corresponding object class in the training set. Each object model is initialized using the feature vectors of all the regions in all the training images that contain the particular object, even though there may be regions in those images that do not contribute to that object. From these initial estimates, which are full of errors, the procedure iteratively re-estimates the parameters to be learned. The iteration procedure is also supervised by the label information, so that a feature vector only contributes to those Gaussian components representing objects present in its training image. The resultant components represent the learned object classes and one background class that accumulates the information from feature vectors of other objects or noise.

With the Gaussian components, the probability that an object class appears in a test image can be computed. The EM-variant algorithm is trained on a set of training images, each of which is labeled by the set of objects it contains. For each test image, it computes the probability of each of the object classes appearing in that image. Figure 6 shows some sample classifications using abstract regions with both color and texture properties.
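The label-supervised iteration described above can be illustrated with a deliberately reduced sketch. It uses one scalar feature per region and a one-dimensional Gaussian per class in place of multivariate Gaussians. The background class is assumed to be allowed in every image and to start with weight 1.0, and the variance floor `min_var` is also an assumption. None of the names come from the paper.

```python
import math
from typing import Dict, List, Set, Tuple

Image = Tuple[List[float], Set[str]]  # (one feature value per region, object labels)
BG = "background"                     # catch-all class, allowed in every image


def gauss(x: float, mean: float, var: float) -> float:
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def weighted_fit(xs: List[float], ws: List[float],
                 min_var: float = 1e-3) -> Tuple[float, float]:
    """Weighted mean and variance; the variance floor avoids collapse."""
    total = sum(ws)
    mean = sum(w * x for w, x in zip(ws, xs)) / total
    var = sum(w * (x - mean) ** 2 for w, x in zip(ws, xs)) / total
    return mean, max(var, min_var)


def em_variant(images: List[Image],
               n_iter: int = 20) -> Dict[str, Tuple[float, float, float]]:
    classes = sorted({o for _, labels in images for o in labels}) + [BG]
    n_regions = sum(len(feats) for feats, _ in images)

    def allowed(c: str, labels: Set[str]) -> bool:
        return c == BG or c in labels

    # Initialization: every region of every image containing the class
    # contributes equally; the weight is the class frequency over images.
    weight, params = {}, {}
    for c in classes:
        xs = [x for feats, labels in images if allowed(c, labels) for x in feats]
        params[c] = weighted_fit(xs, [1.0] * len(xs))
        weight[c] = sum(allowed(c, labels) for _, labels in images) / len(images)

    for _ in range(n_iter):
        acc = {c: ([], []) for c in classes}  # per class: (values, responsibilities)
        for feats, labels in images:
            cands = [c for c in classes if allowed(c, labels)]
            for x in feats:
                # E-step restricted to classes present in this image's label set.
                likes = [weight[c] * gauss(x, *params[c]) for c in cands]
                z = sum(likes)
                for c, like in zip(cands, likes):
                    acc[c][0].append(x)
                    acc[c][1].append(like / z if z > 0 else 1.0 / len(cands))
        for c in classes:  # M-step
            xs, ws = acc[c]
            if sum(ws) > 0:
                params[c] = weighted_fit(xs, ws)
                weight[c] = sum(ws) / n_regions
    return {c: (weight[c], *params[c]) for c in classes}


# Toy data: "grass" regions have values near 0, "tree" regions near 5.
images = [
    ([0.1, 0.2, 5.0, 5.1], {"grass", "tree"}),
    ([0.0, 0.15, 0.05], {"grass"}),
    ([4.9, 5.2, 5.05], {"tree"}),
]
model = em_variant(images)
for name, (w, mean, var) in model.items():
    print(f"{name}: weight={w:.3f} mean={mean:.3f} var={var:.4f}")
```

Even though the first image never says which regions are grass and which are tree, restricting each region's responsibilities to the labels of its own image is enough, on this toy data, to pull the two class means apart.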



Figure 6. Classification results for grass and tree using the EM-variant approach with regions having both color and texture attributes.

3.3. Generative/Discriminative Approach to Object Classification

Our two-phase generative/discriminative learning approach addresses three goals [6]:

1. We want to handle object classes with more variance in appearance.

2. We want to be able to handle multiple features in a completely general way.

3. We wish to investigate the use of a discriminative classifier to add more power.

Phase 1, the generative phase, is a clustering step that can be implemented with the classical EM algorithm (unsupervised) or the EM variant (partially supervised). The clusters are represented by a multivariate Gaussian mixture model and each Gaussian component represents a cluster of feature vectors that are likely to be found in the images containing a particular object class. Phase 1 also includes an aggregation step that has the effect of normalizing the description length of images that can have an arbitrary number of regions. The aggregation step produces a fixed-length feature vector for each training image whose elements represent that image’s contribution to each Gaussian component of each feature type.
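The aggregation step can be sketched as follows. The text says only that each element represents the image's contribution to a Gaussian component, so taking the maximum responsibility over the image's regions is an assumption of this sketch. The one-dimensional components and all names are likewise assumed for illustration.

```python
import math
from typing import Dict, List, Tuple

Component = Tuple[float, float, float]  # (weight, mean, variance), 1-D for brevity


def responsibilities(x: float, comps: List[Component]) -> List[float]:
    """Posterior probability of each component given one region feature."""
    likes = [
        w * math.exp(-((x - m) ** 2) / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
        for w, m, v in comps
    ]
    z = sum(likes)
    return [like / z if z > 0 else 0.0 for like in likes]


def aggregate(regions_by_type: Dict[str, List[float]],
              comps_by_type: Dict[str, List[Component]]) -> List[float]:
    """Fixed-length vector: one entry per component of each feature type."""
    vector: List[float] = []
    for ftype in sorted(comps_by_type):  # fixed order gives a stable layout
        comps = comps_by_type[ftype]
        best = [0.0] * len(comps)
        for x in regions_by_type.get(ftype, []):
            # Assumption: aggregate by max over the image's regions.
            best = [max(b, r) for b, r in zip(best, responsibilities(x, comps))]
        vector.extend(best)
    return vector


comps_by_type = {
    "color": [(0.5, 0.0, 1.0), (0.5, 5.0, 1.0)],
    "texture": [(1.0, 2.0, 1.0)],
}
image_a = {"color": [0.1, 4.9, 5.2], "texture": [2.1]}  # three color regions
image_b = {"color": [0.2]}                              # one region, no texture

vec_a = aggregate(image_a, comps_by_type)
vec_b = aggregate(image_b, comps_by_type)
print(len(vec_a), len(vec_b))
```

Both images map to vectors of length three regardless of how many regions they contain, which is what lets a general-purpose classifier consume them in the next phase.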

Phase 2, the discriminative phase, is a classification step that uses the feature vectors of Phase 1 to train a classifier to determine the probability that a given image contains a particular object class. It also generalizes to any number of different feature types in a seamless manner, making it both simple and powerful. We currently use neural net classifiers (multi-layer perceptrons) in Phase 2.
