Video Processing - PowerPoint PPT Presentation

Slides: 26
Provided by: cwy60



  • Video Processing
  • Wen-Hung Liao
  • 6/2/2005

  • Basics of Video
  • Video Processing
  • Video coding/compression/conversion
  • Digital video production
  • Video special effects
  • Video content analysis
  • Summary

Basics of Video
  • Component video
  • Composite video
  • Digital video

Component Video
  • Higher-end video systems make use of three
    separate video signals for the red, green, and
    blue image planes. Each color channel is sent as
    a separate video signal.
  • Most computer systems use Component Video, with
    separate signals for R, G, and B.
  • For any color separation scheme, Component Video
    gives the best color reproduction since there is
    no crosstalk between the three channels.
  • This is not the case for S-Video or Composite
    Video, discussed next.
  • Component video, however, requires more bandwidth
    and good synchronization of the three components.

Composite Video
  • Color (chrominance) and intensity (luminance)
    signals are mixed into a single carrier wave.
  • Chrominance is a composition of two color
    components (I and Q, or U and V).
  • In NTSC TV, e.g., I and Q are combined into a
    chroma signal, and a color subcarrier is then
    employed to put the chroma signal at the
    high-frequency end of the signal shared with the
    luminance signal.
  • The chrominance and luminance components can be
    separated at the receiver end and then the two
    color components can be further recovered.
  • When connecting to TVs or VCRs, Composite Video
    uses only one wire and video color signals are
    mixed, not sent separately. The audio and sync
    signals are additions to this one signal.
  • Since color and intensity are wrapped into the
    same signal, some interference between the
    luminance and chrominance signals is inevitable.
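
How two color components can share a single subcarrier and still be separated at the receiver can be illustrated with quadrature modulation. The following is only a minimal sketch under idealized assumptions: I and Q are held constant over the window, the subcarrier completes a whole number of cycles, and the function names and sample counts are hypothetical. Real NTSC encoding also involves band-limiting, a color burst, and sync signals, none of which are modeled here.

```python
import math

def modulate_chroma(i_val, q_val, n_samples=64, cycles=4):
    """Combine I and Q onto one subcarrier using two carriers 90 degrees apart."""
    samples = []
    for n in range(n_samples):
        phase = 2 * math.pi * cycles * n / n_samples
        samples.append(i_val * math.cos(phase) + q_val * math.sin(phase))
    return samples

def demodulate_chroma(samples, cycles=4):
    """Recover I and Q by multiplying with each carrier phase and averaging."""
    n_samples = len(samples)
    i_sum = 0.0
    q_sum = 0.0
    for n, s in enumerate(samples):
        phase = 2 * math.pi * cycles * n / n_samples
        i_sum += s * math.cos(phase)
        q_sum += s * math.sin(phase)
    # The factor 2 compensates for cos^2 and sin^2 averaging to 1/2.
    return 2 * i_sum / n_samples, 2 * q_sum / n_samples

i_rec, q_rec = demodulate_chroma(modulate_chroma(0.3, -0.2))
print(round(i_rec, 6), round(q_rec, 6))
```

In this idealized setting the two components come back cleanly. Once luminance shares the same signal, the separation is no longer perfect, which is the interference described above.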

S-Video
  • As a compromise, S-Video (Separated video, or
    Super-video, e.g., in S-VHS) uses two wires, one
    for luminance and another for a composite
    chrominance signal.
  • As a result, there is less crosstalk between the
    color information and the crucial gray-scale
    information.
  • The reason for placing luminance into its own
    part of the signal is that black-and-white
    information is most crucial for visual
    perception.
  • In fact, humans are able to differentiate spatial
    resolution in gray-scale images with a much
    higher acuity than for the color part of color
    images.
  • As a result, we can send less accurate color
    information than must be sent for intensity
    information: we can only see fairly large blobs
    of color, so it makes sense to send less color
    detail.
Digital Video
  • The advantages of digital representation for
    video are many.
  • For example:
  • Video can be stored on digital devices or in
    memory, ready to be processed (noise removal, cut
    and paste, etc.), and integrated into various
    multimedia applications.
  • Direct access is possible, which makes nonlinear
    video editing achievable as a simple, rather than
    a complex, task.
  • Repeated recording does not degrade image
    quality.
  • Ease of encryption and better tolerance to
    channel noise.

Chroma Subsampling
  • Since humans see color with much less spatial
    resolution than they see black and white, it
    makes sense to decimate the chrominance signal.
  • Interesting (but not necessarily informative!)
    names have arisen to label the different schemes.
  • To begin with, numbers are given stating how many
    pixel values, per four original pixels, are
    actually sent.
  • The chroma subsampling scheme 4:4:4 indicates
    that no chroma subsampling is used: each pixel's
    Y, Cb and Cr values are transmitted, 4 for each
    of Y, Cb, Cr.

Chroma Subsampling (2)
  • The scheme 4:2:2 indicates horizontal subsampling
    of the Cb, Cr signals by a factor of 2. That is,
    of four pixels horizontally labeled as 0 to 3,
    all four Ys are sent, and every two Cb's and two
    Cr's are sent, as (Cb0, Y0)(Cr0, Y1)(Cb2, Y2)(Cr2,
    Y3)(Cb4, Y4), and so on (or averaging is used).
  • The scheme 4:1:1 subsamples horizontally by a
    factor of 4.
  • The scheme 4:2:0 subsamples in both the
    horizontal and vertical dimensions by a factor of
    2.
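
The schemes can be made concrete with a small sketch. It assumes a chroma plane stored as a list of rows and uses plain decimation (keeping every k-th sample) rather than averaging; the names `SCHEMES`, `subsample_chroma`, and `samples_per_pixel` are hypothetical. Chroma sample siting in real codecs is more involved and is not modeled here.

```python
# Horizontal and vertical decimation steps for each scheme (assumed mapping).
SCHEMES = {
    "4:4:4": (1, 1),
    "4:2:2": (2, 1),
    "4:1:1": (4, 1),
    "4:2:0": (2, 2),
}

def subsample_chroma(plane, scheme):
    """Decimate one chroma plane (Cb or Cr) given as a list of rows."""
    h_step, v_step = SCHEMES[scheme]
    return [row[::h_step] for row in plane[::v_step]]

def samples_per_pixel(scheme):
    """Average number of values sent per pixel: one Y plus the kept Cb and Cr."""
    h_step, v_step = SCHEMES[scheme]
    return 1 + 2 / (h_step * v_step)

cb = [[4 * r + c for c in range(4)] for r in range(4)]  # toy 4x4 chroma plane
for name in SCHEMES:
    print(name, subsample_chroma(cb, name), samples_per_pixel(name))
```

Under these assumptions 4:1:1 and 4:2:0 send the same amount of data (1.5 values per pixel, against 3 for 4:4:4); they differ only in how the loss is split between the horizontal and vertical directions.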

Chroma Subsampling (3)

RGB/YUV Conversion
  • http//
  • RGB to YUV Conversion
  • Y = (0.257 R) + (0.504 G) + (0.098 B) + 16
  • Cr = V = (0.439 R) - (0.368 G) - (0.071 B) + 128
  • Cb = U = -(0.148 R) - (0.291 G) + (0.439 B) + 128
  • YUV to RGB Conversion
  • B = 1.164(Y - 16) + 2.018(U - 128)
  • G = 1.164(Y - 16) - 0.813(V - 128) - 0.391(U - 128)
  • R = 1.164(Y - 16) + 1.596(V - 128)
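
The two conversions can be checked against each other with a short sketch. It assumes 8-bit R, G, B inputs in 0..255 and a + 128 offset on U (Cb) and V (Cr), as implied by the (U - 128) and (V - 128) terms in the inverse; the function names are hypothetical. Because the coefficients are rounded to three decimals, the round trip is only approximate.

```python
def rgb_to_yuv(r, g, b):
    """Forward conversion using the slide's coefficients (8-bit inputs)."""
    y = 0.257 * r + 0.504 * g + 0.098 * b + 16
    v = 0.439 * r - 0.368 * g - 0.071 * b + 128   # Cr
    u = -0.148 * r - 0.291 * g + 0.439 * b + 128  # Cb
    return y, u, v

def yuv_to_rgb(y, u, v):
    """Inverse conversion using the slide's coefficients."""
    b = 1.164 * (y - 16) + 2.018 * (u - 128)
    g = 1.164 * (y - 16) - 0.813 * (v - 128) - 0.391 * (u - 128)
    r = 1.164 * (y - 16) + 1.596 * (v - 128)
    return r, g, b

for rgb in [(255, 255, 255), (255, 0, 0), (0, 255, 0), (0, 0, 255)]:
    back = yuv_to_rgb(*rgb_to_yuv(*rgb))
    print(rgb, [round(c, 2) for c in back])
```

A practical implementation would also round and clip the results to the valid 8-bit range, which this sketch leaves out.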

Video Coding Standards
  • MPEG Standards (1, 2, 4, 7, 21)
  • MPEG-1: VCD
  • MPEG-2: DVD
  • MPEG-4: video objects
  • MPEG-7: multimedia database
  • MPEG-21: framework
  • H.26x series (H.261, H.263, H.264): video

Digital Video Production
  • Tools: Adobe Premiere, After Effects, ...
  • Resources: http//
  • Examples: http//

Video Special Effects
  • Examples
  • EffectTV: http//
  • FreeFrame: http//

Types of Special Effects
  • Applying to the whole image frame
  • Applying to part of the image (edges, moving
    objects)
  • Applying to a collection of frames
  • Applying to detected areas
  • Overlaying virtual objects:
  • at pre-determined locations
  • in response to the user's position
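
Two of these categories are easy to sketch: an effect applied to the whole frame and an effect applied to a collection of frames. The example assumes frames are nested lists of 8-bit gray values, and the function names are hypothetical; it is an illustration, not how tools such as EffectTV implement their effects.

```python
def invert_frame(frame):
    """Whole-frame effect: photographic negative of an 8-bit gray frame."""
    return [[255 - p for p in row] for row in frame]

def average_frames(frames):
    """Multi-frame effect: per-pixel temporal average (a simple ghosting blur)."""
    n = len(frames)
    height = len(frames[0])
    width = len(frames[0][0])
    return [
        [sum(f[y][x] for f in frames) // n for x in range(width)]
        for y in range(height)
    ]

frame_a = [[0, 255], [100, 50]]
frame_b = [[10, 245], [120, 70]]
print(invert_frame(frame_a))
print(average_frames([frame_a, frame_b]))
```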

Video Content Analysis
  • Event detection
  • For indexing/searching
  • To obtain a high-level semantic description of the
    video

Image Databases
  • Problem: accessing and searching large databases
    of images, videos and music
  • Traditional solutions: file IDs, keywords,
    associated text.
  • Problems:
  • can't query based on visual or musical properties
  • depends on the particular vocabulary used
  • doesn't provide queries by example
  • time consuming
  • Solution: content-based retrieval using automatic
    analysis tools (see http//

Retrieval of images by similarity
  • Components
  • Extraction of features or image signatures and
    efficient representation and storage
  • A set of similarity measures
  • A user interface for efficient and ordered
    representation of retrieved images and to
    support relevance feedback
  • Considerations
  • Many definitions of similarity are possible
  • User interface plays a crucial role
  • Visual content-based retrieval is best utilized
    when combined with traditional search

Image features for similarity definition
  • Color similarity
  • Similarity: e.g., distance between color
    histograms
  • Should use perceptually meaningful color spaces
    (HSV, Lab...)
  • Should be relatively independent of illumination
    (color constancy)
  • Locality: "find a red object such as this one"
  • Texture similarity
  • Texture feature extraction (statistical models)
  • Texture qualities: directionality, roughness, ...
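
Color similarity can be sketched as a comparison of color distributions. The example below is one possible choice, not necessarily the one intended by the slides: it assumes pixels are given as (R, G, B) tuples in 0..255, builds a normalized histogram of hue in HSV space using the standard-library `colorsys` module, and compares two histograms by histogram intersection. The function names and the bin count are hypothetical.

```python
import colorsys

def hue_histogram(pixels, bins=8):
    """Normalized hue histogram of a non-empty list of (R, G, B) pixels."""
    hist = [0.0] * bins
    for r, g, b in pixels:
        h, _s, _v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        hist[min(int(h * bins), bins - 1)] += 1
    total = len(pixels)
    return [count / total for count in hist]

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: 1 for identical distributions, 0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))

reds = [(255, 0, 0)] * 4
blues = [(0, 0, 255)] * 4
mixed = [(255, 0, 0), (255, 0, 0), (0, 0, 255), (0, 0, 255)]
print(histogram_intersection(hue_histogram(reds), hue_histogram(mixed)))
```

Using hue alone ignores saturation and brightness, which gives some tolerance to illumination changes but treats gray pixels poorly, since their hue is undefined and `colorsys` reports it as 0.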

Shape Similarity
  • Must distinguish between similarity of the
    actual 2-D geometric shapes in the image and
    similarity of the underlying 3-D shape
  • Shape features: circularity, eccentricity,
    principal axis orientation...
  • Spatial similarity
  • Assumes images have been (automatically or
    manually) segmented into meaningful objects
    (symbolic image)
  • Considers the spatial layout of the objects in
    the scene
  • Object presence analysis
  • Is this particular object in the image?
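
As an illustration of a shape feature, circularity is often defined as 4πA/P², where A is the area and P the perimeter; it equals 1 for a circle and is smaller for less compact shapes. The sketch below assumes this definition and a shape given as a simple polygon (a list of (x, y) vertices); the function names are hypothetical.

```python
import math

def polygon_area_perimeter(points):
    """Area (shoelace formula) and perimeter of a simple polygon."""
    twice_area = 0.0
    perimeter = 0.0
    n = len(points)
    for k in range(n):
        x0, y0 = points[k]
        x1, y1 = points[(k + 1) % n]
        twice_area += x0 * y1 - x1 * y0
        perimeter += math.hypot(x1 - x0, y1 - y0)
    return abs(twice_area) / 2, perimeter

def circularity(points):
    """4*pi*A / P^2: 1.0 for a circle, lower for elongated or ragged shapes."""
    area, perimeter = polygon_area_perimeter(points)
    return 4 * math.pi * area / perimeter ** 2

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
strip = [(0, 0), (10, 0), (10, 1), (0, 1)]
print(circularity(square), circularity(strip))
```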

Main components of retrieval system
  • Database population: images and videos are
    processed to extract features (color, texture,
    shape, camera and object motion).
  • Database query: the user composes a query via a
    graphical user interface. Features are generated
    from the graphical query and input to the
    matching engine.
  • Relevance feedback: automatically adjusts the
    existing query using information fed back by the
    user about the relevance of previously retrieved
    objects.
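
The slides do not say how the query is adjusted. One classical option, shown here purely as an illustration, is a Rocchio-style update: the query feature vector is moved toward the mean of the results the user marked relevant and away from the mean of those marked non-relevant. The function name and the weights `alpha`, `beta`, and `gamma` are assumptions.

```python
def rocchio_update(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.25):
    """Return an adjusted query vector from user-labeled feature vectors."""
    def mean(vectors):
        # An empty feedback set contributes a zero vector.
        if not vectors:
            return [0.0] * len(query)
        return [sum(column) / len(vectors) for column in zip(*vectors)]

    rel_mean = mean(relevant)
    non_mean = mean(non_relevant)
    return [
        alpha * q + beta * r - gamma * n
        for q, r, n in zip(query, rel_mean, non_mean)
    ]

new_query = rocchio_update([1.0, 0.0], [[1.0, 1.0]], [[0.0, 1.0]])
print(new_query)
```

The adjusted vector would then be passed to the matching engine in place of the original query.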

Video parsing and representation
  • Interaction with video using conventional
    VCR-like manipulation is difficult, so structural
    video analysis needs to be introduced.
  • Video parsing:
  • Temporal segmentation into elemental units
  • Compact representation of each elemental unit

Temporal segmentation
  • Fundamental unit of video manipulation: the video
    shot
  • Types of transition between shots:
  • Abrupt shot change
  • Fades: slow change in brightness
  • Dissolve
  • Wipe: pixels from the second shot replace those
    of the previous shot in regular patterns
  • Other factors of image change:
  • Motion, including camera motion and object motion
  • Luminosity changes and noise
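
Abrupt shot changes can be detected, in the simplest case, by comparing gray-level histograms of consecutive frames. The sketch below assumes each frame is a flat list of 8-bit gray values of equal length; the function names, bin count, and threshold are hypothetical. It handles abrupt cuts only: gradual transitions such as fades, dissolves, and wipes spread the change over many frames, and strong motion or luminosity changes can trigger false detections.

```python
def gray_histogram(frame, bins=16):
    """Histogram of 8-bit gray values with equal-width bins."""
    hist = [0] * bins
    for p in frame:
        hist[p * bins // 256] += 1
    return hist

def detect_cuts(frames, threshold=0.5):
    """Indices of frames that start a new shot, judged by histogram difference."""
    cuts = []
    prev = gray_histogram(frames[0])
    for idx in range(1, len(frames)):
        cur = gray_histogram(frames[idx])
        # Normalized L1 distance: 0 for identical histograms, 1 for disjoint ones.
        diff = sum(abs(a - b) for a, b in zip(prev, cur)) / (2 * len(frames[idx]))
        if diff > threshold:
            cuts.append(idx)
        prev = cur
    return cuts

dark = [10] * 100
bright = [200] * 100
print(detect_cuts([dark, dark, bright, bright]))
```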

Representation of Video
  • Video database population has three major
    components:
  • Shot detection
  • Representative frame creation for each shot
  • Derivation of layered representation of
    coherently moving structures/objects
  • A representative frame (R-frame) is used for:
  • population: the R-frame is treated as a still
    image for representation
  • query: R-frames are the basic units initially
    returned in a video query
  • Choice of R-frame:
  • first, middle, or last frame in the video shot
  • sprite built by seamlessly mosaicing all frames
    in a shot
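
Choosing the first, middle, or last frame as the R-frame can be sketched as follows, assuming shot boundaries are already known as a list of frame indices where new shots begin. The function names and the half-open (start, end) range convention are assumptions; building a mosaic sprite is considerably more involved and is not shown.

```python
def shot_ranges(n_frames, cuts):
    """Turn cut indices into half-open (start, end) frame ranges, one per shot."""
    starts = [0] + list(cuts)
    ends = list(cuts) + [n_frames]
    return list(zip(starts, ends))

def r_frame_index(start, end, choice="middle"):
    """Index of the representative frame for the shot [start, end)."""
    if choice == "first":
        return start
    if choice == "last":
        return end - 1
    if choice == "middle":
        return (start + end - 1) // 2
    raise ValueError("choice must be 'first', 'middle' or 'last'")

for start, end in shot_ranges(10, [4]):
    print((start, end), r_frame_index(start, end))
```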

Video soundtrack analysis
  • Image/sound relationships are critical to the
    perception and understanding of video content.
  • Speech, music and Foley sound: detection and ...
  • Speaker (locutor) identification and retrieval
  • Word spotting and labeling (speech recognition)
  • A possible query could be: "find the next time
    this speaker is again present in this soundtrack"
  • Video scene analysis
  • 500-1000 shots per hour in typical movies
  • One level above the shot: sequence or scene (a
    series of consecutive shots constituting a unit
    from the narrative point of view)