The Use of Optical Character Recognition Technology In National Statistical Offices

Provided by: snx0


What is Optical Character Recognition?
  • It is a technology that recognises and captures
    alphanumeric characters on a computer at high
    speed.
  • It provides a complete form processing and
    document capture solution.
  • It is sometimes called Optical Character Reader
    (OCR) or Intelligent Character Reader (ICR).

Why do National Statistical Offices require OCR?
  • Most NSOs are moving away from the traditional
    way of doing things by adopting Optical Character
    Recognition technology. Its use may offer the
    following benefits:
  • It allows the NSO to process information more
    quickly, more accurately and more efficiently
    thus allowing them to release and disseminate
    data timeously to support the evidence-based
    decision making process.

Why require OCR? contd
  • It reduces the data entry time and increases its
    accuracy when compared to the use of manual data
    entry operators.
  • It allows validation rules to be incorporated in
    the system so as to validate and correct the
    data.
  • Errors can be identified using different colours
    that facilitate review and correction.
  • Scanned forms are stored digitally, which
    eliminates the need for physical storage of
    questionnaires: these can be destroyed after
    the initial scanning, recognition and repair.
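
The validation rules mentioned above could be sketched roughly as follows. This is a minimal illustration only: the field names (age, sex, household_size) and their limits are assumptions made for the example, not taken from any actual questionnaire or OCR product.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A captured record: field name -> text recognised by the OCR system.
# All field names and limits below are hypothetical.
Record = Dict[str, str]


@dataclass
class Rule:
    name: str
    check: Callable[[Record], bool]  # returns True when the record passes


def is_int_in_range(value: str, low: int, high: int) -> bool:
    """True if value is a whole number between low and high inclusive."""
    return value.isascii() and value.isdigit() and low <= int(value) <= high


RULES: List[Rule] = [
    Rule("age in 0-120", lambda r: is_int_in_range(r.get("age", ""), 0, 120)),
    Rule("sex code is 1 or 2", lambda r: r.get("sex", "") in {"1", "2"}),
    Rule(
        "household size in 1-50",
        lambda r: is_int_in_range(r.get("household_size", ""), 1, 50),
    ),
]


def validate(record: Record) -> List[str]:
    """Return the names of the rules that the record fails."""
    return [rule.name for rule in RULES if not rule.check(record)]
```

A record that fails one or more rules would be routed to an operator for review rather than corrected automatically; how failed fields are highlighted depends on the system in use.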

Why require OCR? contd
  • The system stores data in a database thus
    facilitating data analysis.
  • It reduces the number of data entry personnel.

What are the disadvantages of OCR?
  • The speed of gathering data in the field by
    enumerators is severely reduced, because filling
    in OCR/ICR forms requires more care to write
    within the specified boxes.
  • OCR has a severe limitation when it comes to
    human handwriting. Variation in enumerator
    handwriting can cause problems in form processing
    and may thus decrease the character recognition
    rate.
  • Errors in filling in questionnaires decrease the
    rate of recognition.
  • Printing quality can cause problems if it is too
    dark or too light. This may reduce the
    recognition rate of characters.

Factors to consider when implementing OCR.
  • Although OCR has advantages in speeding up data
    processing, analysis and ultimately the release
    of data, adoption of this technology is an
    organisational consideration.
  • The following considerations come to mind:
  • Does the organisation have the capacity to use
    the technology? If not, is it possible to
    outsource the skills, can the outsourcing be
    funded, and is there a possibility of creating
    capacity in the immediate future?
  • How comparable is the quality of data obtained
    through the use of OCR/ICR to that obtained
    through the use of human labour, particularly at
    data entry?

Factors to consider contd
  • Differences in the error rate between OCR/ICR and
    the traditional use of data entry personnel.
  • Cost implications of the technology as compared
    to the use of human labour. In the South African
    case, the planned use of OCR technology in
    Census 2001 was expected to reduce cost compared
    with the 1996 Census by between 30 and 40
    percent.
  • The above factors are essentially asking
    whether Optical Character Recognition is an
    appropriate technology in National Statistical
    Offices.

Factors to consider contd
  • The need to clearly define the roles or
    responsibilities of the District Office,
    Provincial Office and Head Office. This entails
    deciding where manual editing of questionnaires,
    data entry and final analysis and production of
    statistical data or information will be done.
  • Pilot testing questionnaires to evaluate
    enumerator training, data entry by enumerators
    and the use of OCR technology, e.g. character
    recognition. This activity requires funding and
    the question to ask is: do National Statistical
    Offices have the funds to carry out these
    activities?

How to obtain good results from scanning?
  • There are three requirements:
  • quality of the form.
  • appropriate preparation of field staff and their
    supplies.
  • appropriate design of the quality control
    process.

Quality of the form
  • The quality of the form may be increased in one
    of the following ways:
  • Select adequate paper quality.
  • Use paper heavier than 80 grams per square metre
    to avoid paper jams and to keep the scanner from
    reading through to the other side of the page.
  • Source a reliable print press.
  • Select an appropriate drop-out colour, usually
    red, to allow the system to pick up only the
    meaningful information from an OCR form.
  • It is advisable to use marks or ticks as much as
    possible.
  • Avoid using open-ended questions.

Preparation of field staff and their supplies
  • Emphasis should be placed on the following
    aspects:
  • Careful handling and filing of materials or
    documents. This means that enumerators should
    have appropriate supplies such as a document
    bag, several black pencils, correctors or erasers
    among other supplies.
  • Training of field staff should pay attention to
    how to write numeric or alphabetic characters so
    as to achieve maximum character recognition.
    Spend time emphasising handwriting suited to
    scanning.

Field staff and their supplies contd
  • Adequate instructions should state that each box
    must contain only one character; characters
    must not extend outside the designated boxes;
    unnecessary marks such as points or strokes are
    prohibited; strokes must not end with
    extensions; all lines must be connected without
    breaks; and all lines and dots must be written
    with the same pressure.
  • Ensure that all answers in the questionnaire are
    numeric codes.
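
As a rough illustration of the "one character per box, numeric codes only" instructions, the check below flags boxes that break them after recognition. It is a sketch under assumptions: the form is represented simply as a list of strings, one per box, and empty boxes are treated as allowed.

```python
from typing import List


def check_boxes(boxes: List[str]) -> List[int]:
    """Return the 0-based positions of boxes that break the instructions.

    Each element of boxes is the text read from one box on the form.
    A box passes if it is empty or holds exactly one digit from 0 to 9.
    """
    bad = []
    for position, content in enumerate(boxes):
        if content == "":
            continue  # an unused box is allowed in this sketch
        if len(content) != 1 or content not in "0123456789":
            bad.append(position)
    return bad
```

For example, check_boxes(["12", "A", "3"]) reports positions 0 and 1: the first box holds two characters and the second holds a letter.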

Field staff and their supplies contd
  • Instructions should explain the causes of OCR
    reading errors, e.g. forms in bad condition
    because they are dirty, folded or crumpled, or
    forms that are incompletely filled in.

Quality control process
  • A number of quality control processes have to be
    put in place to ensure the following:
  • that all questionnaires have been scanned
    completely, with no omissions or duplications.
  • Quality assurance tests are done on the quality
    of recognition to ensure that acceptable
    recognition rates are maintained.
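
The two checks above could be sketched as follows. This is only an illustration: it assumes each questionnaire carries a unique identifier, and that a sample of recognised values has been keyed in manually so it can be compared with the OCR output. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def check_completeness(
    expected_ids: Iterable[str], scanned_ids: Iterable[str]
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Compare the questionnaires issued with those scanned.

    Returns three sets of identifiers: missing (never scanned),
    duplicated (scanned more than once) and unexpected (scanned
    but not on the expected list).
    """
    expected = set(expected_ids)
    counts = Counter(scanned_ids)
    missing = expected - set(counts)
    duplicated = {qid for qid, n in counts.items() if n > 1}
    unexpected = set(counts) - expected
    return missing, duplicated, unexpected


def recognition_rate(pairs: Iterable[Tuple[str, str]]) -> float:
    """Share of characters recognised correctly in a verified sample.

    Each pair is (OCR value, manually verified value). Characters are
    compared position by position; extra or missing characters count
    as errors.
    """
    total = 0
    correct = 0
    for ocr_value, verified_value in pairs:
        total += max(len(ocr_value), len(verified_value))
        correct += sum(a == b for a, b in zip(ocr_value, verified_value))
    if total == 0:
        raise ValueError("the sample contains no characters")
    return correct / total
```

What counts as an acceptable recognition rate is a decision for the office concerned; the rate computed on each sample would be compared against that agreed threshold.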

