Title: Self-Similar Traffic
1Self-Similar Traffic
- COMP5416
- Advanced Network Technologies
2Why Self-Similarity?
- Trace data not consistent with queueing models
3On the Self-Similar Nature of Ethernet Traffic
Will E. Leland, Walter Willinger and Daniel V. Wilson (Bellcore); Murad S. Taqqu (Boston University)
The Classic Paper
4Overview
- What is Self Similarity?
- Ethernet Traffic is Self-Similar
- Source of Self Similarity
- Implications of Self Similarity
5Intuition of Self-Similarity
- Something feels the same regardless of scale
8Stochastic Objects
- For stochastic objects such as time series, self-similarity is meant in the distributional sense: statistics such as the mean, variance and correlation look the same at different scales (after suitable rescaling).
9Pictorial View of Self-Similarity
10Why is Self-Similarity Important?
- Recently, some network packet traffic has been identified as being self-similar
- Current network traffic modeling using Poisson distributions (etc.) does not take into account the self-similar nature of traffic
- This leads to inaccurate modeling of network traffic
- Is self-similarity always relevant?
  - This remains a hot research area!
11Problems with Current Models
- A Poisson process
  - When observed on a fine time scale, will appear bursty
  - When aggregated on a coarse time scale, will flatten (smooth) to white noise
- A Self-Similar (fractal) process
  - When aggregated over a wide range of time scales, will maintain its bursty characteristic
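The smoothing of Poisson traffic under aggregation can be checked numerically. The sketch below is only an illustration under assumed values (a rate of 5 packets per slot, 100,000 slots, block sizes 1, 10 and 100; none of these come from the slides). For independent Poisson counts, the variance of the block averages should fall roughly as 1/m, which is the "flattening" described above.

```python
import math
import random
import statistics

def poisson_sample(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def block_means(series, m):
    """Average the series over non-overlapping blocks of size m."""
    usable = len(series) - len(series) % m
    return [sum(series[i:i + m]) / m for i in range(0, usable, m)]

rng = random.Random(1)
# Hypothetical workload: independent Poisson packet counts per time slot.
counts = [poisson_sample(rng, 5.0) for _ in range(100_000)]

variances = {m: statistics.pvariance(block_means(counts, m)) for m in (1, 10, 100)}
for m, var in variances.items():
    print(f"m={m:>3}  variance of aggregated counts = {var:.4f}")
```

A self-similar trace run through the same `block_means` helper would show a much slower decline in variance as m grows.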
12Pictorial View of Current Modeling
13Consequences of Self-Similarity
- Traffic has similar statistical properties at a range of timescales: ms, secs, mins, hrs, days
- Merging of traffic (as in a statistical multiplexer) does NOT result in smoothing of traffic
Aggregation: Bursty Data Streams -> Bursty Aggregate Streams
14Side-by-side View
15Definitions and Properties
- Long-Range Dependence
  - Autocorrelation $R_X(t_1, t_2) = E[X(t_1) X(t_2)]$ decays slowly
- Hurst Parameter
  - Developed by Harold Hurst (1965)
  - Studies of Nile River flooding over 800 year period
  - H is a measure of burstiness
  - also considered a measure of self-similarity
  - 0.5 < H < 1.0
16Continuous-Time Definition
The process x(t) is self-similar with parameter H if it has the same statistical properties as the process $a^{-H} x(at)$ for any real $a > 0$.
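One consequence of this definition is worth spelling out. Assuming x(t) has finite variance (an assumption not stated on the slide), equality in distribution gives a scaling law for the variance:

```latex
x(t) \overset{d}{=} a^{-H}\, x(at)
\quad\Longrightarrow\quad
\operatorname{Var}[x(at)] = a^{2H}\,\operatorname{Var}[x(t)], \qquad a > 0 .
```

For H = 0.5 the variance grows linearly with the time scale a; for H closer to 1 it grows almost quadratically, which is one way to read "larger H means burstier".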
17Discrete-Time Definition
- $X = (X_t : t = 0, 1, 2, \ldots)$ is a random process defined at discrete points in time
- Let $X^{(m)} = (X_k^{(m)})$ denote the new process obtained by averaging the original series X in non-overlapping sub-blocks of size m.
E.g. $X^{(1)} = 4, 12, 34, 2, -6, 18, 21, 35$. Then $X^{(2)} = 8, 18, 6, 28$ and $X^{(4)} = 13, 17$.
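The block-averaging step can be written in a few lines. The sketch below reproduces the slide's numbers; the function name `aggregate` and the choice to drop a trailing partial block are assumptions made for illustration.

```python
def aggregate(series, m):
    """Return X^(m): means of non-overlapping blocks of size m (a trailing partial block is dropped)."""
    usable = len(series) - len(series) % m
    return [sum(series[i:i + m]) / m for i in range(0, usable, m)]

x1 = [4, 12, 34, 2, -6, 18, 21, 35]
print(aggregate(x1, 2))  # [8.0, 18.0, 6.0, 28.0]
print(aggregate(x1, 4))  # [13.0, 17.0]
```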
18Auto-correlation Definition
- X is exactly self-similar if
  - The aggregated processes have the same autocorrelation structure as X, i.e. $r^{(m)}(k) = r(k)$, $k \ge 0$, for all $m = 1, 2, \ldots$
- X is asymptotically self-similar if the above holds in the limit: $r^{(m)}(k) \to r(k)$ as $m \to \infty$
19Self-Similarity in Traffic Measurement(?)
Network Traffic
20Auto-correlation
- Most striking feature of self-similarity: correlation structures of the aggregated process do not degenerate as $m \to \infty$
- This is in contrast to traditional models
  - Correlation structures of their aggregated processes degenerate, i.e. $r^{(m)}(k) \to 0$ as $m \to \infty$, for $k = 1, 2, 3, \ldots$
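The degeneration of correlations in a traditional model can be illustrated with a short-range dependent process. The sketch below uses an AR(1) series as a stand-in for a "traditional model"; the coefficient 0.5, the series length and the block sizes are assumptions chosen only for the demonstration. Its lag-1 autocorrelation is about 0.5 before aggregation and should shrink toward 0 as m grows.

```python
import random

def aggregate(series, m):
    """Means of non-overlapping blocks of size m."""
    usable = len(series) - len(series) % m
    return [sum(series[i:i + m]) / m for i in range(0, usable, m)]

def autocorr(series, k):
    """Sample autocorrelation of the series at lag k."""
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series)
    cov = sum((series[i] - mean) * (series[i + k] - mean) for i in range(n - k))
    return cov / var

rng = random.Random(7)
phi = 0.5  # assumed AR(1) coefficient; r(k) = phi**k decays exponentially
x, series = 0.0, []
for _ in range(200_000):
    x = phi * x + rng.gauss(0.0, 1.0)
    series.append(x)

lag1 = {m: autocorr(aggregate(series, m), 1) for m in (1, 10, 50)}
for m, r in lag1.items():
    print(f"m={m:>2}  r^(m)(1) = {r:+.3f}")
```

For an exactly self-similar process the three printed values would be the same.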
22Long Range Dependence
- Processes with Long Range Dependence are characterized by an autocorrelation function that decays hyperbolically as k increases
- Important property: the sum of the autocorrelations diverges; this is also called non-summability of correlation
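In symbols, hyperbolic decay and non-summability can be written as follows. The constant c and the exponent β are notation introduced here for illustration (they do not appear on the slide); the relation between β and H is the one commonly used in the self-similar traffic literature.

```latex
r(k) \sim c\,k^{-\beta} \quad (k \to \infty,\ 0 < \beta < 1)
\qquad\Longrightarrow\qquad
\sum_{k=1}^{\infty} r(k) = \infty ,
\qquad H = 1 - \frac{\beta}{2} .
```

By contrast, a short-range dependent process has $r(k) \sim \rho^{k}$ with $0 < \rho < 1$, whose sum is finite. The range $0 < \beta < 1$ corresponds to $0.5 < H < 1$.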
23Recap
- Self-similarity manifests itself in several equivalent fashions
  - Non-degenerate autocorrelations
  - Slowly decaying variance
  - Long range dependence
  - Hurst effect
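"Slowly decaying variance" suggests a simple way to estimate H, often called the variance-time method: the variance of the aggregated series behaves like $m^{-\beta}$, and $H = 1 - \beta/2$. The sketch below is a minimal version under assumptions (function names, block sizes and the white-noise test input are illustrative, not taken from the paper). For independent noise the estimate should come out near 0.5.

```python
import math
import random
import statistics

def aggregate(series, m):
    """Means of non-overlapping blocks of size m."""
    usable = len(series) - len(series) % m
    return [sum(series[i:i + m]) / m for i in range(0, usable, m)]

def hurst_variance_time(series, block_sizes):
    """Estimate H from the least-squares slope of log Var(X^(m)) against log m.

    The slope estimates -beta, so H = 1 - beta/2 = 1 + slope/2.
    """
    xs = [math.log(m) for m in block_sizes]
    ys = [math.log(statistics.pvariance(aggregate(series, m))) for m in block_sizes]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return 1.0 + slope / 2.0

rng = random.Random(3)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(2 ** 16)]  # no long-range dependence
h_est = hurst_variance_time(white_noise, [2 ** j for j in range(9)])
print(f"estimated H for independent noise: {h_est:.2f}")
```

Applied to a long-range dependent trace, the same function would be expected to return a value between 0.5 and 1.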
24The Famous Data
- Leland and Wilson collected hundreds of millions of Ethernet packets without loss and with recorded time-stamps accurate to within 100 µs.
- Data collected from several Ethernet LANs at the Bellcore Morristown Research and Engineering Center at different times over the course of approximately 4 years.
25Plots Showing Self-Similarity (?)
- High Traffic: 5.0-30.7
- Mid Traffic: 3.4-18.4
- Low Traffic: 1.3-10.4
Higher Traffic, Higher H!
26Crucial Findings
- Ethernet LAN traffic is statistically self-similar
  - H measures the degree of self-similarity
  - H is a function of utilization (higher utilization, higher H)
  - H is a measure of burstiness
- Models like Poisson are not able to capture self-similarity
- As the number of Ethernet users increases, the resulting aggregate traffic becomes burstier instead of smoother!!
27Discussions
- How to explain self-similarity?
  - Heavy tailed file sizes
- How would this impact existing performance?
  - Limited effectiveness of buffering
  - Effectiveness of FEC (forward error correction)
    - error control for data transmission, whereby the sender adds redundant data to its messages, which allows the receiver to detect and correct errors without the need to ask the sender to resend
28Explaining Self-Similarity
- The superposition of many ON/OFF sources whose ON-periods and OFF-periods exhibit the Noah Effect produces aggregate network traffic that features the Joseph Effect
Noah Effect: high variability or infinite variance
Joseph Effect: self-similar or long-range dependent traffic
ON/OFF source models are also known as packet train models
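The ON/OFF construction can be sketched as a small simulation. Everything concrete below is an assumption made for illustration: time is slotted, each source sends one unit per slot while ON, both ON and OFF period lengths are drawn from a Pareto distribution with shape 1.2, and 50 sources are superposed. It is a toy model of the mechanism, not the authors' procedure.

```python
import math
import random

def on_off_source(rng, alpha, n_slots):
    """Simulate one source: 1 unit per slot while ON, 0 while OFF, Pareto(alpha) period lengths."""
    activity = []
    on = rng.random() < 0.5  # random initial state
    while len(activity) < n_slots:
        remaining = n_slots - len(activity)
        # Heavy-tailed period length in slots, truncated at the end of the simulation.
        length = min(math.ceil(rng.paretovariate(alpha)), remaining)
        activity.extend([1 if on else 0] * length)
        on = not on
    return activity

def superpose(sources):
    """Aggregate load per slot: the number of sources that are ON."""
    return [sum(slot) for slot in zip(*sources)]

rng = random.Random(5)
n_sources, n_slots, alpha = 50, 10_000, 1.2  # hypothetical parameters
sources = [on_off_source(rng, alpha, n_slots) for _ in range(n_sources)]
traffic = superpose(sources)
print("first 20 slots of aggregate load:", traffic[:20])
```

Replacing the Pareto draw with a finite-variance distribution (for example exponential period lengths) gives the traditional model, whose aggregate smooths out at coarse time scales.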
29The Noah Effect
- Noah Effect is the essential point of departure from traditional to self-similar traffic modeling
- Results in highly variable ON-OFF periods: train length and inter-train distances can be very large with non-negligible probabilities
- Infinite Variance Syndrome: many naturally occurring phenomena can be well described with infinite variance distributions
- Heavy-tail distributions, α parameter
30Traditional Models
- Traditional traffic models: finite variance ON/OFF source models
- Superposition of such sources behaves like white noise, with only short range correlations
31The heavy-tail distribution
- A distribution is said to be heavy-tailed if $P[U > u] \sim c\,u^{-\alpha}$ as $u \to \infty$, with $1 < \alpha < 2$   (1)
- Property (1) is the infinite variance syndrome or the Noah Effect.
  - $\alpha \le 2$ implies $E(U^2) = \infty$
  - $\alpha > 1$ ensures that $E(U) < \infty$
- The asymptotic shape of the distribution is hyperbolic
- The simplest heavy-tail distribution is the Pareto distribution
- For example, we consider the sizes of files transferred from a web-server
  - Heavy-tail means: a large number of small files are transferred but, crucially, the number of very large files transferred remains significant.
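The file-size remark can be made concrete with samples from a Pareto distribution. The sketch below treats each sample as a hypothetical file size (in arbitrary units, minimum 1) and assumes a shape parameter of 1.2; neither value comes from the slides. It compares the empirical tail with the hyperbolic form $u^{-\alpha}$ and reports how much of the total volume sits in the largest 1% of files.

```python
import random

rng = random.Random(11)
alpha = 1.2  # assumed tail index, between 1 and 2
# random.paretovariate draws from a Pareto law with P[U > u] = u**(-alpha) for u >= 1.
sizes = [rng.paretovariate(alpha) for _ in range(100_000)]

def tail_fraction(samples, u):
    """Empirical estimate of P[U > u]."""
    return sum(1 for s in samples if s > u) / len(samples)

for u in (10, 100):
    print(f"u={u:>3}  empirical P[U>u] = {tail_fraction(sizes, u):.4f}  theory = {u ** -alpha:.4f}")

# Share of the total volume carried by the largest 1% of the samples.
largest_share = sum(sorted(sizes)[-1000:]) / sum(sizes)
print(f"share of total volume in the largest 1% of files: {largest_share:.2f}")
```

Most samples are small, yet a handful of very large ones account for a substantial part of the total, which is the behaviour the slide describes.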
32http://statistik.wu-wien.ac.at/cgi-bin/anuran.pl
33Important Findings
- Most surprising result: the Noah Effect is extremely widespread, regardless of source machine (fileserver or client machine)
- Explanations
  - Hyperbolic tail behavior for the sizes of stored files
  - Pareto-like tail behavior for UNIX process run times
  - Human-computer interactions occur over a wide range of timescales
- Although network traffic is intrinsically complex, parsimonious modeling is still possible.
  - Estimating a single parameter α (intensity of the Noah Effect) is enough
34An example File size Distribution on a Win2000
machine
35Impact of Self Similarity
36Conclusion
- The presence of the Noah Effect in measured Ethernet LAN traffic is confirmed
- The superposition of many ON/OFF models with the Noah Effect results in aggregate packet streams that are consistent with measured network traffic and exhibit self-similar or fractal properties
- Self-similarity in packetised data networks is caused by the distribution of file sizes, human interactions and/or Ethernet dynamics
- This work spawned research across the networking community
37Self-similarity and long range dependence in
networks
- Vern Paxson and Sally Floyd, Wide-Area Traffic: The Failure of Poisson Modeling
- Mark E. Crovella and Azer Bestavros, Self-Similarity in World Wide Web Traffic: Evidence and Possible Causes
  - It shows that self-similarity in Web traffic can be explained based on the underlying distribution of transferred document sizes, the effects of caching and user preference in file transfer, the effect of user "think time", and the superimposition of many such transfers in a local area network.
- A. Feldmann, A. C. Gilbert, W. Willinger, and T. G. Kurtz, The Changing Nature of Network Traffic: Scaling Phenomena
- Mark Garrett and Walter Willinger, Analysis, Modeling and Generation of Self-Similar VBR Video Traffic
  - The paper shows that the marginal bandwidth distribution can be described as being heavy-tailed and that the video sequence itself is long-range dependent and can be modeled using a self-similar process
  - The paper presents a new source model for VBR video traffic and describes how it may be used to generate VBR traffic synthetically.
38Heavy tailed distributions in network traffic
- Gordon Irlam, Unix File Size Survey
- Will Leland and Teun Ott, Load-balancing Heuristics and Process Behavior
- Mor Harchol-Balter and Allen Downey, Exploiting Process Lifetime Distributions for Dynamic Load Balancing
- Carlos Cunha, Azer Bestavros, Mark Crovella, Characteristics of WWW Client-based Traces
  - This paper presents some of the first Web client measurements ever made. It characterizes traces taken using an instrumented version of Mosaic from a university computer lab and shows that a number of Web properties can be modeled using heavy tailed distributions.
  - These properties include document size, user requests for a document, and document popularity.