
Self-Similar Traffic

- COMP5416
- Advanced Network Technologies

Why Self-Similarity?

- Trace data not consistent with queueing models

On the Self-Similar Nature of Ethernet Traffic

Will E. Leland, Walter Willinger and Daniel V. Wilson (Bellcore); Murad S. Taqqu (Boston University)

The Classic Paper

Overview

- What Is Self-Similarity?
- Ethernet Traffic Is Self-Similar
- Sources of Self-Similarity
- Implications of Self-Similarity

Intuition of Self-Similarity

- Something feels the same regardless of scale

Stochastic Objects

- For stochastic objects such as time series, self-similarity is meant in the distributional sense: their statistics (mean, variance, correlation, etc.) are preserved when the process is rescaled.

Pictorial View of Self-Similarity

Why is Self-Similarity Important?

- Recently, some network packet traffic has been identified as self-similar
- Current network traffic modeling using Poisson distributions (etc.) does not take into account the self-similar nature of traffic
- This leads to inaccurate modeling of network traffic
- Is self-similarity relevant every time?
  - Remains a hot research area!

Problems with Current Models

- A Poisson process
  - When observed on a fine time scale, will appear bursty
  - When aggregated on a coarse time scale, will flatten (smooth) to white noise
- A self-similar (fractal) process
  - When aggregated over a wide range of time scales, will maintain its bursty characteristic
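The smoothing of a Poisson process under aggregation can be checked numerically. This is a minimal Python sketch, assuming Poisson arrivals at an illustrative rate of 2 packets per slot (rate, number of slots and seed are our choices, not from the slides). It averages the counts over blocks of m slots and reports the coefficient of variation, which for Poisson counts should shrink roughly like $1/\sqrt{m}$.

```python
import math
import random
import statistics


def poisson_sample(lam, rng):
    """Draw one Poisson(lam) count with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def aggregate(series, m):
    """Block means over non-overlapping blocks of size m (a trailing partial block is dropped)."""
    n_blocks = len(series) // m
    return [sum(series[k * m:(k + 1) * m]) / m for k in range(n_blocks)]


rng = random.Random(1)
counts = [poisson_sample(2.0, rng) for _ in range(2 ** 16)]  # arrivals per fine-grained slot

cv = {}
for m in (1, 16, 256):
    blocks = aggregate(counts, m)
    cv[m] = statistics.pstdev(blocks) / statistics.fmean(blocks)
    print(m, round(cv[m], 3))  # falls roughly like 0.707 / sqrt(m)
```

A self-similar trace would instead keep a much larger coefficient of variation at coarse m, which is the "maintains its bursty characteristic" point above.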

Pictorial View of Current Modeling

Consequences of Self-Similarity

- Traffic has similar statistical properties at a range of timescales: ms, secs, mins, hrs, days
- Merging of traffic (as in a statistical multiplexer) does NOT result in smoothing of traffic

Aggregation

Bursty Data Streams

Bursty Aggregate Streams

Side-by-side View

Definitions and Properties

- Long-Range Dependence
  - Autocorrelation $R_X(t_1, t_2) = E[X(t_1)X(t_2)]$ decays slowly
- Hurst Parameter
  - Developed by Harold Hurst (1965)
  - Studies of Nile River flooding over an 800-year period
  - H is a measure of burstiness
  - Also considered a measure of self-similarity
  - $0.5 < H < 1.0$
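Hurst's own tool for estimating H is rescaled-range (R/S) analysis: in a series with long-range dependence, the mean R/S statistic over blocks of length n grows like $n^H$. The sketch below is a minimal version, assuming dyadic block sizes and an ordinary least-squares fit on log-log axes (both our choices). R/S is known to be biased upward on short blocks, so uncorrelated data typically comes out slightly above 0.5.

```python
import itertools
import math
import random


def rescaled_range(block):
    """R/S of one block: range of cumulative deviations from the block mean, divided by its std dev."""
    mean = sum(block) / len(block)
    cum = lo = hi = 0.0
    for x in block:
        cum += x - mean
        lo, hi = min(lo, cum), max(hi, cum)
    std = math.sqrt(sum((x - mean) ** 2 for x in block) / len(block))
    return (hi - lo) / std


def hurst_rs(series, min_block=16):
    """Estimate H as the OLS slope of log(mean R/S) against log(block size n), n = 16, 32, ..."""
    log_n, log_rs = [], []
    n = min_block
    while n <= len(series) // 4:
        blocks = [series[i:i + n] for i in range(0, len(series) - n + 1, n)]
        log_n.append(math.log(n))
        log_rs.append(math.log(sum(rescaled_range(b) for b in blocks) / len(blocks)))
        n *= 2
    x_bar, y_bar = sum(log_n) / len(log_n), sum(log_rs) / len(log_rs)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(log_n, log_rs))
    den = sum((x - x_bar) ** 2 for x in log_n)
    return num / den


rng = random.Random(7)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(2 ** 14)]
random_walk = list(itertools.accumulate(white_noise))  # strongly persistent, for contrast
print(round(hurst_rs(white_noise), 2))  # a bit above 0.5 (upward small-block bias)
print(round(hurst_rs(random_walk), 2))  # close to 1
```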

Continuous-Time Definition

- Hurst Parameter

The process $x(t)$ is self-similar with parameter $H$ if it has the same statistical properties as the process $a^{-H} x(at)$ for any real $a > 0$.
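Standard Brownian motion is the textbook case of this definition with $H = 1/2$: $\mathrm{Var}[a^{-H} x(at)] = a^{-2H} \cdot a t$, which equals $\mathrm{Var}[x(t)] = t$ only when $H = 1/2$. The sketch below checks just this one statistic (the variance) by simulation, so it is a necessary condition, not a full test of self-similarity. The time step, path count and seed are illustrative assumptions.

```python
import math
import random
import statistics

DT = 1.0 / 64  # simulation time step (assumption)


def brownian_value_at(t_end, rng):
    """x(t_end) of a standard Brownian motion built from independent N(0, DT) increments."""
    steps = round(t_end / DT)
    return sum(rng.gauss(0.0, math.sqrt(DT)) for _ in range(steps))


def scaled_variance(a, H, t=1.0, n_paths=2000, seed=0):
    """Sample variance of a**(-H) * x(a*t) over independent simulated paths."""
    rng = random.Random(seed)
    samples = [a ** (-H) * brownian_value_at(a * t, rng) for _ in range(n_paths)]
    return statistics.pvariance(samples)


print(round(scaled_variance(1, 0.5), 2))  # Var[x(1)]: close to 1
print(round(scaled_variance(4, 0.5), 2))  # Var[4**-0.5 * x(4)]: also close to 1
print(round(scaled_variance(4, 0.3), 2))  # wrong H: close to 4**0.4, about 1.74
```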

Discrete-Time Definition

- $X = (X_t : t = 0, 1, 2, \ldots)$ is a random process defined at discrete points in time
- Let $X^{(m)} = (X_k^{(m)})$ denote the new process obtained by averaging the original series $X$ in non-overlapping sub-blocks of size $m$.

E.g. if $X^{(1)} = 4, 12, 34, 2, -6, 18, 21, 35$, then $X^{(2)} = 8, 18, 6, 28$ and $X^{(4)} = 13, 17$.
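The block-averaging step can be written in a few lines. A minimal sketch (the function name `aggregate` is our own), reproducing the worked example:

```python
def aggregate(series, m):
    """Return X^(m): means of consecutive non-overlapping blocks of size m.

    A trailing partial block (len(series) % m leftover values) is dropped.
    """
    if m < 1:
        raise ValueError("block size m must be >= 1")
    n_blocks = len(series) // m
    return [sum(series[k * m:(k + 1) * m]) / m for k in range(n_blocks)]


x1 = [4, 12, 34, 2, -6, 18, 21, 35]
print(aggregate(x1, 2))  # [8.0, 18.0, 6.0, 28.0]
print(aggregate(x1, 4))  # [13.0, 17.0]
```

Aggregating $X^{(2)}$ again with blocks of size 2 gives the same result as $X^{(4)}$, since equal-size block means compose.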

Auto-correlation Definition

- X is exactly self-similar if
  - the aggregated processes have the same autocorrelation structure as X, i.e. $r^{(m)}(k) = r(k)$, $k \ge 0$, for all $m = 1, 2, \ldots$
- X is asymptotically self-similar if the above holds when $r^{(m)}(k) \to r(k)$ as $m \to \infty$
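Fractional Gaussian noise (fGn), the increment process of fractional Brownian motion, is the standard example of an exactly (second-order) self-similar process, with $r(k) = \tfrac{1}{2}\left[(k+1)^{2H} - 2k^{2H} + (k-1)^{2H}\right]$. The sketch below, a minimal illustration assuming that textbook autocorrelation, computes $r^{(m)}(k)$ directly from $r(k)$ (as the covariance of two block sums divided by the block-sum variance) and shows that it does not change with $m$.

```python
from functools import partial


def fgn_acf(k, H):
    """Autocorrelation r(k) of fractional Gaussian noise with Hurst parameter H."""
    k = abs(k)
    return 0.5 * ((k + 1) ** (2 * H) - 2 * k ** (2 * H) + abs(k - 1) ** (2 * H))


def aggregated_acf(acf, k, m):
    """Exact r^(m)(k) of the block-mean process X^(m), computed from the original r(.)."""
    # Covariance between two block sums whose starts are k*m apart ...
    cross = sum(acf(k * m + j - i) for i in range(m) for j in range(m))
    # ... divided by the variance of one block sum (the 1/m factors cancel).
    same = sum(acf(j - i) for i in range(m) for j in range(m))
    return cross / same


r = partial(fgn_acf, H=0.8)
for m in (1, 4, 16):
    # Every row prints the same values r(1), r(2), r(3): exact self-similarity.
    print(m, [round(aggregated_acf(r, k, m), 4) for k in (1, 2, 3)])
```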

Self-Similarity in Traffic Measurement(?)

Network Traffic

Auto-correlation

- Most striking feature of self-similarity: correlation structures of the aggregated process do not degenerate as $m \to \infty$
- This is in contrast to traditional models
  - Correlation structures of their aggregated processes degenerate, i.e. $r^{(m)}(k) \to 0$ as $m \to \infty$, for $k = 1, 2, 3, \ldots$
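The degeneration in traditional models can be seen empirically. As a stand-in for a "traditional" short-range dependent model, this sketch assumes an AR(1) process $X_t = \phi X_{t-1} + e_t$ with $\phi = 0.5$ (model, $\phi$, length and seed are our illustrative choices), aggregates it, and estimates the lag-1 autocorrelation $r^{(m)}(1)$, which collapses toward 0 as $m$ grows.

```python
import random


def ar1_series(n, phi, seed=0):
    """Simulate X_t = phi * X_{t-1} + e_t with standard normal innovations e_t."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


def aggregate(series, m):
    """Block means over non-overlapping blocks of size m."""
    n_blocks = len(series) // m
    return [sum(series[k * m:(k + 1) * m]) / m for k in range(n_blocks)]


def sample_acf(series, k):
    """Sample autocorrelation at lag k (biased estimator, normalised by n)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    var = sum(d * d for d in dev) / n
    cov = sum(dev[t] * dev[t + k] for t in range(n - k)) / n
    return cov / var


x = ar1_series(200_000, phi=0.5)
lag1 = {m: sample_acf(aggregate(x, m), 1) for m in (1, 10, 100)}
print({m: round(v, 3) for m, v in lag1.items()})  # about 0.5 at m=1, near 0 at m=100
```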

Long Range Dependence

- Processes with long-range dependence are characterized by an autocorrelation function that decays hyperbolically as $k$ increases
- Important property: the autocorrelations are not summable, $\sum_k r(k) = \infty$; this is also called non-summability of correlation
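Non-summability can be made concrete by comparing partial sums $\sum_{k=1}^{n} r(k)$. The sketch assumes two illustrative autocorrelation functions: fractional Gaussian noise with $H = 0.8$, whose $r(k)$ decays hyperbolically (like $H(2H-1)k^{2H-2}$), and an AR(1)-style exponential decay $\phi^k$ with $\phi = 0.5$. For fGn the sum telescopes to $\tfrac{1}{2}[(n+1)^{2H} - n^{2H} - 1]$, which grows without bound; the exponential sum converges to $\phi/(1-\phi) = 1$.

```python
def fgn_acf(k, H):
    """Autocorrelation r(k) of fractional Gaussian noise (hyperbolic decay for H > 0.5)."""
    k = abs(k)
    return 0.5 * ((k + 1) ** (2 * H) - 2 * k ** (2 * H) + abs(k - 1) ** (2 * H))


def partial_sum(acf, n):
    """Sum of acf(k) for k = 1..n."""
    return sum(acf(k) for k in range(1, n + 1))


H, phi = 0.8, 0.5
for n in (10, 1_000, 100_000):
    lrd = partial_sum(lambda k: fgn_acf(k, H), n)  # keeps growing with n
    srd = partial_sum(lambda k: phi ** k, n)       # levels off at 1.0
    print(n, round(lrd, 2), round(srd, 4))
```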

Recap

- Self-similarity manifests itself in several equivalent fashions:
  - Non-degenerate autocorrelations
  - Slowly decaying variance
  - Long-range dependence
  - Hurst effect
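"Slowly decaying variance" leads directly to the variance-time estimator: for a self-similar process $\mathrm{Var}(X^{(m)}) \sim m^{-\beta}$ with $0 < \beta < 1$ and $H = 1 - \beta/2$, whereas i.i.d. data gives $\beta = 1$, i.e. $H = 0.5$. A minimal sketch, assuming dyadic block sizes up to 256 and an OLS fit on log-log axes (our choices), checked here on white noise:

```python
import math
import random


def aggregate(series, m):
    """Block means over non-overlapping blocks of size m."""
    n_blocks = len(series) // m
    return [sum(series[k * m:(k + 1) * m]) / m for k in range(n_blocks)]


def variance(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def hurst_variance_time(series, max_exp=8):
    """Fit log Var(X^(m)) = c - beta * log m for m = 1, 2, 4, ..., 2**max_exp; return H = 1 - beta / 2."""
    xs = [math.log(2 ** e) for e in range(max_exp + 1)]
    ys = [math.log(variance(aggregate(series, 2 ** e))) for e in range(max_exp + 1)]
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    beta = -slope
    return 1 - beta / 2


rng = random.Random(3)
noise = [rng.gauss(0.0, 1.0) for _ in range(2 ** 16)]
print(round(hurst_variance_time(noise), 2))  # near 0.5 for uncorrelated data
```

On a self-similar trace the fitted slope is shallower than -1, so the estimate lands between 0.5 and 1.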

The Famous Data

- Leland and Wilson collected hundreds of millions of Ethernet packets without loss and with recorded time-stamps accurate to within 100 µs.
- Data were collected from several Ethernet LANs at the Bellcore Morristown Research and Engineering Center at different times over the course of approximately 4 years.

Plots Showing Self-Similarity (?)

- High Traffic: 5.0-30.7
- Mid Traffic: 3.4-18.4
- Low Traffic: 1.3-10.4

Higher Traffic, Higher H!

Crucial Findings

- Ethernet LAN traffic is statistically self-similar
- H captures the degree of self-similarity
- H is a function of utilization
- H is a measure of burstiness
- Models like Poisson are not able to capture self-similarity
- As the number of Ethernet users increases, the resulting aggregate traffic becomes burstier instead of smoother!!

Discussions

- How can self-similarity be explained?
  - Heavy-tailed file sizes
- How would this impact existing performance?
  - Limited effectiveness of buffering
  - Effectiveness of FEC (forward error correction): error control for data transmission, whereby the sender adds redundant data to its messages, which allows the receiver to detect and correct errors without the need to ask the sender for retransmission
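To illustrate the redundancy idea in the FEC bullet, here is a sketch of about the simplest erasure code: one XOR parity packet per group, which lets the receiver rebuild any single lost packet without asking the sender. This toy scheme and its function names are our own choices for brevity (practical FEC typically uses stronger codes such as Reed-Solomon). It also hints at why bursty traffic matters for FEC: if losses cluster so that two packets of the same group are lost, this code cannot repair them.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets):
    """Sender side: one redundant parity packet = XOR of all equal-length data packets."""
    parity = bytes(len(packets[0]))
    for p in packets:
        parity = xor_bytes(parity, p)
    return parity


def recover(received, parity):
    """Receiver side: rebuild the single missing packet (marked None) without a retransmission."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can repair exactly one lost packet per group")
    rebuilt = parity
    for p in received:
        if p is not None:
            rebuilt = xor_bytes(rebuilt, p)
    out = list(received)
    out[missing[0]] = rebuilt
    return out


data = [b"ABCD", b"EFGH", b"IJKL"]
parity = make_parity(data)
print(recover([b"ABCD", None, b"IJKL"], parity))  # [b'ABCD', b'EFGH', b'IJKL']
```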

Explaining Self-Similarity

- The superposition of many ON/OFF sources whose ON-periods and OFF-periods exhibit the Noah Effect produces aggregate network traffic that features the Joseph Effect

Noah Effect: high variability or infinite variance

Joseph Effect: self-similar or long-range dependent traffic

Also known as packet train models
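The ON/OFF construction is easy to simulate. This is a minimal sketch under several assumptions of our own: time is slotted, each source emits 1 unit per slot while ON, there are 32 sources, and period lengths are either Pareto with shape $\alpha = 1.4$ (Noah Effect: finite mean, infinite variance) or, as a finite-variance baseline, exponential with roughly the same mean, both rounded up to whole slots. For superposed heavy-tailed ON/OFF sources with $1 < \alpha < 2$, the limit result of Willinger et al. gives $H = (3 - \alpha)/2$, i.e. 0.8 here.

```python
import math
import random


def on_off_source(n_slots, period_sampler, rng):
    """One ON/OFF source: 1 per slot while ON, 0 while OFF; period lengths come from period_sampler."""
    out = []
    on = rng.random() < 0.5  # random initial state (assumption)
    while len(out) < n_slots:
        length = min(period_sampler(rng), n_slots - len(out))  # never overshoot the horizon
        out.extend([1 if on else 0] * length)
        on = not on
    return out


def pareto_period(rng, alpha=1.4):
    """Heavy-tailed period length: Pareto(alpha) with minimum 1, rounded up to whole slots."""
    return math.ceil(rng.paretovariate(alpha))


def exponential_period(rng, mean=3.5):
    """Finite-variance baseline with roughly the same mean as Pareto(1.4), i.e. 1.4 / 0.4 = 3.5."""
    return max(1, math.ceil(rng.expovariate(1.0 / mean)))


def aggregate_traffic(n_sources, n_slots, period_sampler, seed=0):
    """Superpose independent ON/OFF sources: slot t carries the number of sources that are ON."""
    rng = random.Random(seed)
    total = [0] * n_slots
    for _ in range(n_sources):
        for t, bit in enumerate(on_off_source(n_slots, period_sampler, rng)):
            total[t] += bit
    return total


heavy = aggregate_traffic(32, 2 ** 14, pareto_period)
light = aggregate_traffic(32, 2 ** 14, exponential_period)
```

Feeding `heavy` and `light` to any estimator of H (R/S or variance-time) is expected to show `light` smoothing out like white noise under aggregation while `heavy` stays bursty, though convergence to the limiting H is slow for finite traces.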

The Noah Effect

- The Noah Effect is the essential point of departure from traditional to self-similar traffic modeling
- Results in highly variable ON-OFF periods: train lengths and inter-train distances can be very large with non-negligible probabilities
- Infinite Variance Syndrome: many naturally occurring phenomena can be well described with infinite-variance distributions
- Heavy-tailed distributions, $\alpha$ parameter

Traditional Models

- Traditional traffic models: finite-variance ON/OFF source models
- Superposition of such sources behaves like white noise, with only short-range correlations

The heavy-tail distribution

- A distribution is said to be heavy-tailed if
  $P[U > u] \sim c\,u^{-\alpha}$ as $u \to \infty$, with $c > 0$ and $0 < \alpha < 2$   (1)
- Property (1) is the infinite variance syndrome or the Noah Effect.
  - $\alpha \le 2$ implies $E(U^2) = \infty$
  - $\alpha > 1$ ensures that $E(U) < \infty$
- The asymptotic shape of the distribution is hyperbolic
- The simplest heavy-tailed distribution is the Pareto distribution
- For example, consider the sizes of files transferred from a web server
  - Heavy tail: a large number of small files is transferred but, crucially, the number of very large files transferred remains significant.

http://statistik.wu-wien.ac.at/cgi-bin/anuran.pl
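The Pareto distribution has exactly hyperbolic tail $P[U > u] = (x_{\min}/u)^{\alpha}$ for $u \ge x_{\min}$, so it can be sampled by inverse transform. The sketch below assumes illustrative "file sizes" with $\alpha = 1.5$ and $x_{\min} = 1$ (finite mean, infinite variance), compares the empirical tail with the hyperbolic formula, and measures how much of the total volume the largest 1% of files carry; for a Pareto law that share is about $0.01^{1 - 1/\alpha} \approx 0.22$, which is the "very large files remain significant" effect.

```python
import random


def pareto_sample(alpha, x_min, rng):
    """Inverse-transform draw from Pareto(alpha, x_min): P[U > u] = (x_min / u) ** alpha for u >= x_min."""
    return x_min * (1.0 - rng.random()) ** (-1.0 / alpha)


def tail_fraction(samples, u):
    """Empirical P[U > u]."""
    return sum(1 for s in samples if s > u) / len(samples)


rng = random.Random(42)
alpha, x_min = 1.5, 1.0  # illustrative: 1 < alpha < 2, so finite mean but infinite variance
sizes = [pareto_sample(alpha, x_min, rng) for _ in range(200_000)]

for u in (10, 100, 1000):
    print(u, tail_fraction(sizes, u), (x_min / u) ** alpha)  # empirical vs. hyperbolic tail

# Share of total volume carried by the largest 1% of files.
top = sorted(sizes, reverse=True)[: len(sizes) // 100]
top_share = sum(top) / sum(sizes)
print(round(top_share, 2))
```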

Important Findings

- Most surprising result: the Noah Effect is extremely widespread, regardless of source machine (fileserver or client machine)
- Explanations
  - Hyperbolic tail behavior for the sizes of files residing in file systems
  - Pareto-like tail behavior for UNIX process run times
  - Human-computer interactions occur over a wide range of timescales
- Although network traffic is intrinsically complex, parsimonious modeling is still possible
  - Estimating a single parameter $\alpha$ (intensity of the Noah Effect) is enough
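The slides say a single parameter $\alpha$ suffices but not how to estimate it. One standard option, swapped in here and not necessarily the authors' method, is the Hill estimator, which uses only the $k$ largest observations: $\hat\alpha = k / \sum_{i=1}^{k} \ln(x_{(i)}/x_{(k+1)})$. The sketch assumes synthetic Pareto data with $\alpha = 1.2$ and $k = 2000$ (illustrative choices; in practice $k$ must be tuned). Plugging the estimate into $H = (3 - \alpha)/2$ from the ON/OFF superposition result gives about 0.9 here.

```python
import math
import random


def hill_alpha(samples, k):
    """Hill estimate of the tail index alpha from the k largest observations."""
    xs = sorted(samples, reverse=True)
    threshold = xs[k]  # the (k+1)-th largest value
    return k / sum(math.log(x / threshold) for x in xs[:k])


rng = random.Random(5)
true_alpha = 1.2
# Pareto(alpha=1.2, x_min=1) samples by inverse transform.
data = [(1.0 - rng.random()) ** (-1.0 / true_alpha) for _ in range(100_000)]

alpha_hat = hill_alpha(data, k=2_000)
print(round(alpha_hat, 2))              # near 1.2
print(round((3 - alpha_hat) / 2, 2))    # implied H, near 0.9
```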

An Example: File Size Distribution on a Win2000 Machine

Impact of Self-Similarity

Conclusion

- The presence of the Noah Effect in measured Ethernet LAN traffic is confirmed
- The superposition of many ON/OFF models with the Noah Effect results in aggregate packet streams that are consistent with measured network traffic and exhibit self-similar or fractal properties
- Self-similarity in packetised data networks is caused by the distribution of file sizes, human interactions and/or Ethernet dynamics

Spawned Research Across the Networking Community

Self-Similarity and Long-Range Dependence in Networks

- Vern Paxson and Sally Floyd, "Wide-Area Traffic: The Failure of Poisson Modeling"
- Mark E. Crovella and Azer Bestavros, "Self-Similarity in World Wide Web Traffic: Evidence and Possible Causes"
  - Shows that self-similarity in Web traffic can be explained by the underlying distribution of transferred document sizes, the effects of caching and user preference in file transfer, the effect of user "think time", and the superimposition of many such transfers in a local area network.
- A. Feldmann, A. C. Gilbert, W. Willinger, and T. G. Kurtz, "The Changing Nature of Network Traffic: Scaling Phenomena"
- Mark Garrett and Walter Willinger, "Analysis, Modeling and Generation of Self-Similar VBR Video Traffic"
  - The paper shows that the marginal bandwidth distribution can be described as heavy-tailed and that the video sequence itself is long-range dependent and can be modeled using a self-similar process.
  - The paper presents a new source model for VBR video traffic and describes how it may be used to generate VBR traffic synthetically.

Heavy-Tailed Distributions in Network Traffic

- Gordon Irlam, "Unix File Size Survey"
- Will Leland and Teun Ott, "Load-balancing Heuristics and Process Behavior"
- Mor Harchol-Balter and Allen Downey, "Exploiting Process Lifetime Distributions for Dynamic Load Balancing"
- Carlos Cunha, Azer Bestavros and Mark Crovella, "Characteristics of WWW Client-based Traces"
  - This paper presents some of the first Web client measurements ever made. It characterizes traces taken using an instrumented version of Mosaic in a university computer lab and shows that a number of Web properties can be modeled using heavy-tailed distributions.
  - These properties include document size, user requests for a document, and document popularity.