Anomaly Detection - A Survey (PDF download)


Date: 2020-07-16 10:26  Source: http://www.java1234.com  Author: reposted
Main content:

tection, we provide a detailed discussion of the application domains where anomaly
detection techniques have been used. For each domain we discuss the notion of an
anomaly, the different aspects of the anomaly detection problem, and the challenges
faced by the anomaly detection techniques. We also provide a list of techniques
that have been applied in each application domain.
The existing surveys discuss anomaly detection techniques that detect the simplest
form of anomalies. We distinguish the simple anomalies from complex anomalies.
The discussion of applications of anomaly detection reveals that for most application
domains, the interesting anomalies are complex in nature, while most of
the algorithmic research has focussed on simple anomalies.
1.5 Organization
This survey is organized into three parts and its structure closely follows Figure
2. In Section 2 we identify the various aspects that determine the formulation
of the problem and highlight the richness and complexity associated with anomaly
detection. We distinguish simple anomalies from complex anomalies and define two
types of complex anomalies, viz., contextual and collective anomalies. In Section
3 we briefly describe the different application domains where anomaly detection
has been applied. In subsequent sections we provide a categorization of anomaly
detection techniques based on the research area to which they belong. The majority
of the techniques can be categorized into classification based (Section 4), nearest
neighbor based (Section 5), clustering based (Section 6), and statistical techniques
(Section 7). Some techniques belong to research areas such as information theory
(Section 8), and spectral theory (Section 9). For each category of techniques we also
discuss their computational complexity for training and testing phases. In Section
10 we discuss various contextual anomaly detection techniques. We discuss various
collective anomaly detection techniques in Section 11. We present some discussion
on the limitations and relative performance of various existing techniques in Section
12. Section 13 contains concluding remarks.
2. DIFFERENT ASPECTS OF AN ANOMALY DETECTION PROBLEM
This section identifies and discusses the different aspects of anomaly detection. As
mentioned earlier, a specific formulation of the problem is determined by several
different factors such as the nature of the input data, the availability (or unavailability)
of labels, as well as the constraints and requirements induced by the application
domain. This section brings forth the richness in the problem domain and justifies
the need for the broad spectrum of anomaly detection techniques.
2.1 Nature of Input Data
A key aspect of any anomaly detection technique is the nature of the input data.
Input is generally a collection of data instances (also referred to as object, record, point,
vector, pattern, event, case, sample, observation, entity) [Tan et al. 2005, Chapter
2]. Each data instance can be described using a set of attributes (also referred
to as variable, characteristic, feature, field, dimension). The attributes can be of
different types such as binary, categorical or continuous. Each data instance might
consist of only one attribute (univariate) or multiple attributes (multivariate). In
the case of multivariate data instances, all attributes might be of the same type or
might be a mixture of different data types.
To Appear in ACM Computing Surveys, 09 2009.
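The vocabulary above (data instance, attribute, univariate versus multivariate, attribute types) can be illustrated with a minimal Python sketch. The dictionary representation, the function names, and the example transaction are assumptions made for illustration only; the survey does not prescribe any particular data structure.

```python
from typing import Dict, Union

# Assumption: a data instance is modeled as a mapping from attribute name to value.
AttributeValue = Union[bool, str, int, float]
DataInstance = Dict[str, AttributeValue]


def attribute_type(value: AttributeValue) -> str:
    """Classify one attribute value as binary, categorical, or continuous."""
    # bool is checked first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "binary"
    if isinstance(value, str):
        return "categorical"
    return "continuous"


def is_univariate(instance: DataInstance) -> bool:
    """A univariate instance consists of exactly one attribute."""
    return len(instance) == 1


def has_mixed_types(instance: DataInstance) -> bool:
    """True when the attributes of a multivariate instance differ in type."""
    return len({attribute_type(v) for v in instance.values()}) > 1


# Hypothetical multivariate instance with a mixture of attribute types.
transaction = {"amount": 120.5, "merchant_category": "grocery", "online": True}
print(is_univariate(transaction))    # False
print(has_mixed_types(transaction))  # True
```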
The nature of attributes determines the applicability of anomaly detection techniques.
For example, for statistical techniques different statistical models have to
be used for continuous and categorical data. Similarly, for nearest neighbor based
techniques, the nature of attributes would determine the distance measure to be
used. Often, instead of the actual data, the pairwise distance between instances
might be provided in the form of a distance (or similarity) matrix. In such cases,
techniques that require original data instances are not applicable, e.g., many statistical
and classification based techniques.
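As a rough illustration of how attribute types drive the choice of distance measure, the sketch below combines a squared difference for continuous attributes with a 0/1 mismatch for categorical ones, builds a pairwise distance matrix, and then computes a nearest neighbor style score from the matrix alone. The function names, the particular combination rule, and the sample data are assumptions for illustration; continuous attributes are assumed to be already scaled to comparable ranges.

```python
import math
from typing import List, Sequence, Union

Value = Union[float, str]


def mixed_distance(a: Sequence[Value], b: Sequence[Value]) -> float:
    """Distance over mixed attributes (an illustrative choice, not the only one):
    squared difference for continuous values, 0/1 mismatch for categorical ones."""
    if len(a) != len(b):
        raise ValueError("instances must have the same number of attributes")
    total = 0.0
    for x, y in zip(a, b):
        if isinstance(x, str) or isinstance(y, str):
            total += 0.0 if x == y else 1.0
        else:
            total += (x - y) ** 2
    return math.sqrt(total)


def distance_matrix(instances: Sequence[Sequence[Value]]) -> List[List[float]]:
    """Symmetric matrix of pairwise distances between instances."""
    n = len(instances)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = mixed_distance(instances[i], instances[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix


def kth_nearest_distance(matrix: List[List[float]], i: int, k: int) -> float:
    """Distance from instance i to its k-th nearest neighbor.
    Only the matrix is used, never the original attribute values."""
    others = sorted(d for j, d in enumerate(matrix[i]) if j != i)
    return others[k - 1]


# Hypothetical instances: (scaled amount, merchant category).
data = [(0.0, "a"), (0.0, "a"), (3.0, "a"), (0.0, "b")]
m = distance_matrix(data)
print(kth_nearest_distance(m, 2, 1))  # 3.0
```

Once the matrix exists, `kth_nearest_distance` no longer needs the instances themselves, which is why distance based techniques remain usable when only a distance matrix is supplied.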
Input data can also be categorized based on the relationship present among data
instances [Tan et al. 2005]. Most of the existing anomaly detection techniques deal
with record data (or point data), in which no relationship is assumed among the
data instances.
In general, data instances can be related to each other. Some examples are
sequence data, spatial data, and graph data. In sequence data, the data instances
are linearly ordered, e.g., time-series data, genome sequences, protein sequences. In
spatial data, each data instance is related to its neighboring instances, e.g., vehicular
traffic data, ecological data. When the spatial data has a temporal (sequential)
component it is referred to as spatio-temporal data, e.g., climate data. In graph
data, data instances are represented as vertices in a graph and are connected to
other vertices with edges. Later in this section we will discuss situations where
such relationships among data instances become relevant for anomaly detection.
2.2 Type of Anomaly
An important aspect of an anomaly detection technique is the nature of the desired
anomaly. Anomalies can be classified into the following three categories:
2.2.1 Point Anomalies. If an individual data instance can be considered
anomalous with respect to the rest of the data, then the instance is termed a point
anomaly. This is the simplest type of anomaly and is the focus of the majority of
research on anomaly detection.
For example, in Figure 1, points o1 and o2 as well as points in region O3 lie
outside the boundary of the normal regions, and hence are point anomalies since
they are different from normal data points.
As a real-life example, consider credit card fraud detection. Let the data set
correspond to an individual’s credit card transactions. For the sake of simplicity,
let us assume that the data is defined using only one feature: amount spent. A
transaction for which the amount spent is very high compared to the normal range
of expenditure for that person will be a point anomaly.
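The credit card example can be sketched in Python as follows. The text does not say how "very high compared to the normal range" should be measured, so this sketch swaps in a common robust rule, the modified z-score based on the median and the median absolute deviation (MAD), with the conventional constant 0.6745 and cutoff 3.5. The function name, the threshold default, and the transaction amounts are all assumptions for illustration.

```python
from statistics import median
from typing import List


def point_anomalies(amounts: List[float], threshold: float = 3.5) -> List[int]:
    """Return indices of amounts whose modified z-score exceeds the threshold.
    Note: the score is two-sided, so unusually low amounts are flagged too."""
    center = median(amounts)
    mad = median(abs(x - center) for x in amounts)
    if mad == 0:
        # No usable spread estimate; this sketch declines to flag anything.
        return []
    return [
        i for i, x in enumerate(amounts)
        if 0.6745 * abs(x - center) / mad > threshold
    ]


# Hypothetical transaction amounts for one cardholder.
amounts = [42.0, 55.5, 38.2, 61.0, 47.3, 52.8, 2500.0, 44.1]
print(point_anomalies(amounts))  # [6]
```

The median and MAD are used instead of the mean and standard deviation because a single very large transaction would otherwise inflate the spread estimate and can mask itself in a small sample.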
2.2.2 Contextual Anomalies. If a data instance is anomalous in a specific context
(but not otherwise), then it is termed a contextual anomaly (also referred
to as conditional anomaly [Song et al. 2007]).
[Figure: monthly temperature ("Monthly Temp") plotted against time ("Time"; ticks Mar, Jun, Sept, Dec repeated over three years), with marked points t1 and t2. Running header: Chandola, Banerjee and Kumar.]
The notion of a context is induced by the structure in the data set and has to be
specified as a part of the problem formulation. Each data instance is defined using
the following two sets of attributes:
(1) Contextual attributes. The contextual attributes are used to determine the
context (or neighborhood) for that instance. For example, in spatial data sets,
the longitude and latitude of a location are the contextual attributes. In time-series
data, time is a contextual attribute which determines the position of an
instance in the entire sequence.
(2) Behavioral attributes. The behavioral attributes define the non-contextual
characteristics of an instance. For example, in a spatial data set describing the
average rainfall of the entire world, the amount of rainfall at any location is a
behavioral attribute.
The anomalous behavior is determined using the values for the behavioral attributes
within a specific context. A data instance might be a contextual anomaly in a given
context, but an identical data instance (in terms of behavioral attributes) could
be considered normal in a different context. This property is key in identifying
contextual and behavioral attributes for a contextual anomaly detection technique.
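The split between contextual and behavioral attributes can be sketched in Python with monthly temperature readings, treating the month as the contextual attribute and the temperature as the behavioral attribute. Each reading is compared only with readings that share its context. The scoring rule (a modified z-score based on median and MAD, cutoff 3.5), the function name, and the readings themselves are assumptions for illustration, not a method taken from the survey.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Tuple


def contextual_anomalies(
    instances: List[Tuple[str, float]], threshold: float = 3.5
) -> List[int]:
    """Flag instances whose behavioral value is unusual within their own context.
    Each instance is a (contextual attribute, behavioral attribute) pair."""
    by_context: Dict[str, List[float]] = defaultdict(list)
    for context, value in instances:
        by_context[context].append(value)

    flagged = []
    for i, (context, value) in enumerate(instances):
        values = by_context[context]
        center = median(values)
        mad = median(abs(v - center) for v in values)
        if mad == 0:
            continue  # no spread estimate for this context; skip in this sketch
        if 0.6745 * abs(value - center) / mad > threshold:
            flagged.append(i)
    return flagged


# Hypothetical readings: (month, temperature in degrees Celsius).
readings = [
    ("Dec", 2.0), ("Dec", 4.0), ("Dec", 3.0), ("Dec", 1.0), ("Dec", 3.0),
    ("Jun", 24.0), ("Jun", 26.0), ("Jun", 25.0), ("Jun", 3.0), ("Jun", 27.0),
]
print(contextual_anomalies(readings))  # [8]
```

The reading of 3.0 in June (index 8) is flagged, while the identical behavioral value of 3.0 in December (indices 2 and 4) is not, which mirrors the property described above.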

 
 