
Handwritten Character Recognition using Hidden Markov Models
  • Quantifying the marginal benefit of exploiting
    correlations between adjacent characters and words

Optical Character Recognition
  • Rich field of research with many applicable
    domains
  • Off-line vs. On-line (on-line includes
    time-sequence information)
  • Handwritten vs. Typed
  • Cursive vs. Hand-printed
  • Cooperative vs. Random Writers
  • Language-specific differences of grammar and
    dictionary size
  • We focus on an off-line, mixed-modal English
    data set with mostly hand-printed and some
    cursive data
  • Each observation is a monochrome bitmap of a
    single letter, with the segmentation problem
    already solved for us (though imperfectly)
  • Pre-processing of the dataset for noise
    filtering and scale normalization is also
    assumed done

Common Approaches to OCR
  • Statistical Grammar Rules and Dictionaries
  • Feature Extraction of observations
  • Global features: moments and invariants of the
    image (e.g., percentage of pixels in a certain
    region, measured curvature); a sketch follows
    this list
  • Local features: grouped windows around image
    pixels
  • Hidden Markov Models
  • Used mostly in the cursive domain for easy
    training and to avoid segmentation issues
  • Most HMMs use very large models with words as
    states, combined with the above approaches;
    this is more applicable to domains with small
    dictionaries and other restrictions
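To make the global-feature bullet concrete, here is a
minimal sketch in Python (the zone_densities helper and
the quadrant split are illustrative assumptions, not
from the slides) that measures the percentage of ink
pixels in each region of a 16x8 character bitmap:

```python
import numpy as np

def zone_densities(bitmap):
    """Global features: fraction of ink pixels in each
    quadrant of a 16x8 monochrome character bitmap
    (hypothetical helper, not from the slides)."""
    h, w = bitmap.shape                  # expected 16 x 8
    zones = [
        bitmap[:h // 2, :w // 2],        # top-left
        bitmap[:h // 2, w // 2:],        # top-right
        bitmap[h // 2:, :w // 2],        # bottom-left
        bitmap[h // 2:, w // 2:],        # bottom-right
    ]
    return np.array([z.mean() for z in zones])

# Example: a random 16x8 binary "character"
rng = np.random.default_rng(0)
char = rng.integers(0, 2, size=(16, 8))
print(zone_densities(char))              # four densities in [0, 1]
```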

Visualizing the Dataset
  • Data collected from 159 subjects with varying
    styles, both printed and cursive
  • The first letter of each word is dropped to
    avoid dealing with capital letters
  • Each character is represented by a 16x8 array
    of bits
  • Character meta-data includes the correct label
    and end-of-word boundaries (a possible record
    layout is sketched below)
  • Pre-processed into 10 cross-validation folds
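A minimal sketch of how one record of this dataset
could be represented in Python (the field names are our
assumptions; the slides only describe what each record
contains):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class CharSample:
    """One dataset record, per the slide above (field
    names are assumptions, not the original format)."""
    bitmap: np.ndarray   # 16x8 array of bits (the observation)
    label: str           # correct letter label, 'a'..'z'
    word_end: bool       # True if this letter ends a word
    fold: int            # cross-validation fold index, 0..9
```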

Our Approach: HMMs
  • Primary Goal: quantify the impact of
    correlations between adjacent letters and words
  • Secondary Goal: learn an accurate classifier
    for our data set
  • Our Approach: use an HMM and compare it to the
    other algorithms
  • The 26 states of the HMM each represent one
    letter of the alphabet
  • Supervised learning of model with labeled data
  • Prior probabilities and transition matrix learned
    by frequency of letters in training
  • Learning algorithm for emission probabilities
    uses Naive Bayes assumption (i.e., pixels
    conditionally independent given the letter)
  • The Viterbi algorithm predicts the most
    probable sequence of states given the observed
    character pixel maps (a sketch follows this
    list)
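A minimal sketch of the decoding step above, assuming
log-space NumPy arrays for the learned parameters
(array names and shapes are our assumptions; the slides
only state the model structure):

```python
import numpy as np

def viterbi(bitmaps, log_prior, log_trans, log_pixel_on):
    """Most probable letter sequence for one word.

    bitmaps      : (T, 128) 0/1 rows, flattened 16x8 observations
    log_prior    : (26,)    log P(first letter)
    log_trans    : (26, 26) log P(letter_t | letter_{t-1})
    log_pixel_on : (26, 128) log P(pixel = 1 | letter), Naive Bayes
    """
    log_pixel_off = np.log1p(-np.exp(log_pixel_on))  # log P(pixel = 0 | letter)
    # Emission log-likelihoods: under the NB assumption the
    # pixels are conditionally independent given the letter.
    emit = bitmaps @ log_pixel_on.T + (1 - bitmaps) @ log_pixel_off.T
    T = len(bitmaps)
    delta = np.zeros((T, 26))                # best log-score per state
    back = np.zeros((T, 26), dtype=int)      # backpointers
    delta[0] = log_prior + emit[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans   # (prev, cur)
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + emit[t]
    # Backtrack the best path of state indices 0..25.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```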

Algorithms and Optimizations
  • Learning algorithms implemented and tested
  • Baseline Algorithm: Naïve Bayes classifier (no
    use of inter-character correlations)
  • Algorithm 2: NB with maximum-probability
    classification over a set of shifted
    observations
  • Motivation was to compensate for correlations
    between adjacent pixels not included in Naïve
    Bayes assumption
  • Algorithm 3: HMM with NB assumption
  • Fix for incomplete data: examples hallucinated
    prior to training (a smoothing sketch follows
    this list)
  • Algorithm 4: Optimized HMM with NB assumption
  • Ignores the effects of inter-word transitions
    when learning the HMM
  • Algorithm 5: Dictionary creation and lookup
    with NB assumption (no HMM)
  • Geared toward a specific data set with a small
    dictionary, but less generalizable to less
    constrained data sets with larger dictionaries
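The "hallucinated examples" fix in Algorithm 3 amounts
to additive smoothing of the pixel counts, so a pixel
never observed "on" for some letter does not get zero
probability. A minimal sketch, assuming integer class
labels and one pseudo-example per outcome (the
smoothing constant is our assumption):

```python
import numpy as np

def nb_pixel_probs(bitmaps, labels, n_letters=26, pseudo=1.0):
    """Estimate P(pixel = 1 | letter) with hallucinated
    examples: `pseudo` fake on/off observations per pixel
    keep every probability strictly between 0 and 1."""
    counts = np.zeros((n_letters, bitmaps.shape[1]))
    totals = np.zeros(n_letters)
    for x, y in zip(bitmaps, labels):    # y is 0..25
        counts[y] += x
        totals[y] += 1
    return (counts + pseudo) / (totals[:, None] + 2 * pseudo)
```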

Alternative Algorithms and Experimental Setup
  • Other variants considered but not implemented
  • Joint Bayes parameter estimation (too many
    probabilities to learn: a joint model over all
    128 pixels needs on the order of 2^128
    parameters per letter, vs. 26 x 128 = 3,328
    under Naïve Bayes)
  • HMM with 2nd-order Markov assumption (exponential
    in number of Viterbi paths)
  • Training Naïve Bayes over a set of shifted and
    overlaid observations (pre-processing to create
    a thicker boundary)
  • All experiments run with 10-fold
    cross-validation (a sketch follows this list)
  • Results given as averages with standard
    deviations
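A minimal sketch of this evaluation protocol, with
train_fn and eval_fn as hypothetical stand-ins for any
of the five algorithms above:

```python
import numpy as np

def cross_validate(folds, train_fn, eval_fn):
    """10-fold protocol: train on 9 folds, test on the
    held-out one, report mean and standard deviation."""
    accs = []
    for i, test in enumerate(folds):
        train = [s for j, f in enumerate(folds) if j != i for s in f]
        model = train_fn(train)
        accs.append(eval_fn(model, test))  # accuracy on held-out fold
    accs = np.array(accs)
    return accs.mean(), accs.std()
```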

Experimental Results
  • The Naïve Bayes classifier performed well on
    its own (62.7% accuracy, roughly 15x better
    than random guessing)
  • Classification on shifted data did worse,
    since we lost data at the edges!
  • The small dictionary size of the dataset
    affected the results
  • The optimized HMM with NB achieves 71% accuracy
  • The optimizations are only marginally
    significant because of the dataset
  • A simpler, more flexible approach that could
    achieve impressive results on other datasets
  • The dictionary approach is almost perfect at
    99.3% accuracy (a possible scoring rule is
    sketched below)
  • Demonstrates the additional benefit of
    exploiting domain constraints, grammatical or
    syntactic
  • Not always feasible: the dictionary may be
    unknown or too large, or the data may not be
    predictable
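One plausible reading of the dictionary approach
(Algorithm 5): score every dictionary word of the right
length by the Naïve Bayes log-likelihood of its letters
and return the best match. The slides do not give the
exact scoring rule, so this sketch is an assumption:

```python
import numpy as np

LETTERS = "abcdefghijklmnopqrstuvwxyz"

def best_word(bitmaps, dictionary, log_pixel_on):
    """Pick the dictionary word whose letters best explain
    the observed bitmaps under the NB emission model."""
    log_pixel_off = np.log1p(-np.exp(log_pixel_on))
    emit = bitmaps @ log_pixel_on.T + (1 - bitmaps) @ log_pixel_off.T
    candidates = [w for w in dictionary if len(w) == len(bitmaps)]
    if not candidates:
        return None                      # no word of matching length
    def score(word):
        return sum(emit[t][LETTERS.index(c)] for t, c in enumerate(word))
    return max(candidates, key=score)
```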