Title: The Use of Optical Character Recognition Technology In National Statistical Offices
2 What is Optical Character Recognition?
- It is a technology that recognises alphanumeric characters on scanned documents and captures them on a computer at high speed.
- It provides a complete form-processing and document-capture solution.
- It is sometimes called Optical Character Reader (OCR) or, for handwritten text, Intelligent Character Recognition (ICR).
3 Why do National Statistical Offices require OCR?
- Most NSOs are moving away from traditional manual data entry by adopting Optical Character Recognition technology. Its use may offer the following benefits:
- It allows the NSO to process information more quickly, more accurately and more efficiently, so that data can be released and disseminated timeously to support evidence-based decision making.
4 Why require OCR? (contd)
- It reduces data entry time and increases accuracy compared with the use of manual data entry operators.
- It allows validation rules to be built into the system to validate and correct the data.
- Errors can be highlighted in different colours, which facilitates the review and correction process.
- Scanned forms are stored digitally, which eliminates the need for physical storage of questionnaires: these can be destroyed after the initial scanning, recognition and repair.
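The idea of building validation rules into the capture system and flagging errors for review can be sketched roughly as follows. This is only an illustration: the field names (`age`, `sex`, `marital_status`), the code values and the rules themselves are hypothetical and do not come from any particular census questionnaire or OCR product.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


def as_int(text: str) -> Optional[int]:
    """Return the integer value of a captured field, or None if it is not a plain number."""
    return int(text) if text.isascii() and text.isdigit() else None


@dataclass
class Rule:
    field: str
    description: str
    check: Callable[[Dict[str, str]], bool]  # True when the record passes


# Hypothetical rules for an illustrative household questionnaire.
RULES = [
    Rule("age", "age must be a number between 0 and 120",
         lambda r: as_int(r["age"]) is not None and 0 <= as_int(r["age"]) <= 120),
    Rule("sex", "sex code must be 1 or 2",
         lambda r: r["sex"] in {"1", "2"}),
    Rule("marital_status", "persons under 12 must be coded 1 (never married)",
         lambda r: as_int(r["age"]) is None
         or as_int(r["age"]) >= 12
         or r["marital_status"] == "1"),
]


def validate(record: Dict[str, str]) -> List[str]:
    """Return a description of every rule the captured record fails."""
    return [f"{rule.field}: {rule.description}"
            for rule in RULES if not rule.check(record)]


def review_queue(records: List[Dict[str, str]]) -> Dict[int, List[str]]:
    """Map record position to failed rules, for records that need manual repair."""
    flagged = {}
    for position, record in enumerate(records):
        failures = validate(record)
        if failures:
            flagged[position] = failures
    return flagged
```

A record captured as `{"age": "1O", "sex": "1", "marital_status": "1"}` (letter O misread for zero) would land in the review queue with the age rule listed, which is the kind of item an operator would then correct against the scanned image.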
5 Why require OCR? (contd)
- The system stores data in a database, which facilitates data analysis.
- It reduces the number of data entry personnel required.
6 What are the disadvantages of OCR?
- The speed of gathering data in the field is severely reduced, because enumerators need more care to write inside the specified boxes when filling in OCR/ICR forms.
- OCR has a severe limitation when it comes to human handwriting. Variation in enumerator handwriting can cause problems in form processing and may thus decrease the character recognition rate.
- Errors in filling in questionnaires decrease the rate of recognition.
- Printing quality can cause problems if it is too dark or too light. This may reduce the recognition rate of characters.
7 Factors to consider when implementing OCR
- Although OCR has advantages in speeding up data processing, analysis and ultimately the release of data, adoption of this technology is an organisational consideration.
- The following considerations come to mind:
- Does the organisation have the capacity to use the technology? If not, is it possible to outsource the skills, can the outsourcing be funded, and are there possibilities of creating capacity in the immediate future?
- How comparable is the quality of data obtained through the use of OCR/ICR to that obtained through the use of human labour, particularly at data entry?
8 Factors to consider (contd)
- Differences in the error rate between OCR/ICR and the traditional use of data entry personnel.
- Cost implications of the technology compared with the use of human labour. In the South African case, the planned use of OCR technology in Census 2001 was expected to reduce costs by between 30 and 40 percent compared with the 1996 Census.
- The above factors are essentially asking whether Optical Character Recognition is an appropriate technology for National Statistical Offices.
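The error-rate and cost comparisons above come down to simple arithmetic, which might be set up along the following lines. This is a sketch under stated assumptions: the function names are hypothetical, the error rate is measured per field against a verified reference sample, and the baseline cost passed in is whatever figure the office chooses (none is given in this presentation). Only the 30 to 40 percent range is taken from the South African case.

```python
from typing import Dict, List, Tuple


def field_error_rate(captured: List[Dict[str, str]],
                     reference: List[Dict[str, str]]) -> float:
    """Share of fields whose captured value differs from the verified reference value."""
    if len(captured) != len(reference):
        raise ValueError("captured and reference samples must have the same length")
    total = 0
    errors = 0
    for captured_record, reference_record in zip(captured, reference):
        for name, true_value in reference_record.items():
            total += 1
            if captured_record.get(name) != true_value:
                errors += 1
    return errors / total if total else 0.0


def cost_range_after_reduction(baseline_cost: float,
                               low_pct: int = 30,
                               high_pct: int = 40) -> Tuple[float, float]:
    """Return (lowest, highest) expected cost after a reduction of low_pct to high_pct percent."""
    lowest = baseline_cost * (100 - high_pct) / 100
    highest = baseline_cost * (100 - low_pct) / 100
    return lowest, highest
```

Running `field_error_rate` once on OCR/ICR output and once on manually keyed data, both against the same verified sample, gives two directly comparable figures. `cost_range_after_reduction(1000)` returns `(600.0, 700.0)`, i.e. a 30 to 40 percent saving on an illustrative baseline of 1000 units.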
9 Factors to consider (contd)
- The need to clearly define the roles and responsibilities of the District Office, Provincial Office and Head Office. This entails deciding where manual editing of questionnaires, data entry, and final analysis and production of statistical data or information will be done.
- Pilot testing of questionnaires to evaluate enumerator training, data entry by enumerators and the use of OCR technology, e.g. character recognition. This activity requires funding, and the question to ask is: do National Statistical Offices have the funds to carry out these activities?
10 How to obtain good results from scanning?
- There are three requirements:
- Quality of the form.
- Appropriate preparation of field staff and their supplies.
- Appropriate design of the quality control activities.
11 Quality of the form
- The quality of the form may be increased in one of the following ways:
- Select adequate paper quality.
- Use paper heavier than 80 grams per square metre to avoid paper jams and to prevent the scanner from reading through to the other side of the page.
- Source a reliable printing press.
- Select an appropriate drop-out colour, usually red, so that the system picks up only the meaningful information from an OCR form.
- It is advisable to use marks or ticks as much as possible.
- Avoid using open-ended questions.
12 Preparation of field staff and their supplies
- Emphasis should be placed on the following aspects:
- Careful handling and filing of materials or documents. This means that enumerators should have appropriate supplies such as a document bag, several black pencils, and correctors or erasers, among other supplies.
- Training of field staff should pay attention to how numeric and alphabetic characters are written, so as to achieve maximum character recognition. Spend time emphasising handwriting that scans well.
13 Field staff and their supplies (contd)
- Give adequate instructions stating that each box should contain only one character; characters should not extend outside the designated boxes; unnecessary marks such as points and strokes are prohibited; strokes should not end with extensions; all lines should be connected without breaks; and all lines and dots should be written with the same pressure.
- Ensure that all answers in the questionnaire are numeric codes.
14 Field staff and their supplies (contd)
- Instructions should explain the causes of reading errors by OCR, e.g. forms in bad condition (dirty, folded or crumpled) or forms that are incompletely filled in.
15 Quality control process
- A number of quality control processes have to be put in place to ensure the following:
- That all questionnaires have been scanned completely, with no omissions or duplications.
- That quality assurance tests are done on the quality of recognition, to ensure that acceptable recognition rates are maintained.
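The two quality control checks above could be automated roughly as shown below. This is a minimal sketch, not a description of any specific scanning system: the questionnaire identifiers, the function names and the 0.98 acceptance threshold are all assumptions made for illustration, and the recognition rate is computed character by character against a manually verified sample.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def scan_completeness(expected_ids: Iterable[str],
                      scanned_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Compare the register of issued questionnaires with the scanned batch."""
    counts = Counter(scanned_ids)
    expected = set(expected_ids)
    scanned = set(counts)
    return {
        "missing": sorted(expected - scanned),        # omissions
        "duplicated": sorted(q for q, n in counts.items() if n > 1),
        "unexpected": sorted(scanned - expected),     # not in the register
    }


def recognition_rate(pairs: Iterable[Tuple[str, str]]) -> float:
    """Share of verified characters that the OCR engine recognised correctly.

    Each pair is (recognised_text, verified_text) for one field in the QA sample.
    Characters missing from the recognised text count as errors.
    """
    total = 0
    correct = 0
    for recognised, verified in pairs:
        total += len(verified)
        correct += sum(r == v for r, v in zip(recognised, verified))
    return correct / total if total else 0.0


def is_acceptable(rate: float, minimum: float = 0.98) -> bool:
    """Check the rate against a threshold; 0.98 is a hypothetical placeholder."""
    return rate >= minimum
```

In practice the acceptable recognition rate would be set by the office itself, and a batch that falls below it would be sent back for rescanning or manual keying.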
16 Sources
- http://www.afdb.org/pls/portal/docs/PAGE/ADB_ADMIN_PG/DOCUMENTS/STATISTICS/JOURNALVOL1FULL.PDF
- http://intranet.unescap.org/stat/pop-it/pop-guide/capture_ch06.pdf
- National Sample Census of Agriculture 2002/2003, Volume 1: Technical and Operation Report, September 2006.