Optical Character Recognition - OCR

From IThelp @ UiB

Optical Character Recognition (OCR) is a technology that converts the text in scanned documents into machine-readable text. At UiB, you can use it by emailing a scanned document as an attachment to ocr@uib.no.

Special software can recognise letters, numbers and other characters in documents. This often makes a scanned document much more useful than one saved only as an image, because the text can be searched, copied and pasted, and processed further. UiB has an OCR service, available to all students and employees, which saves the result as a pdf. This document format can be used on Mac and Linux as well as Windows. However, pdf is not ideal for editing, so text is best edited in another format and word processor (e.g. by copying and pasting into Word or OpenOffice). You may also copy and paste directly into an email. Remember that UiB has strict rules on the use of text and references.

How do I use OCR?

UiB students and employees may simply send an email to ocr@uib.no with a scanned document attached. A reply with the OCR-processed pdf document attached will be sent to you automatically. If the source text is on paper, you may scan it on a PullPrint machine. If the document is already scanned, you may just attach it to an email.

Note: Italics, special characters, low contrast and small fonts may reduce the recognition quality. It is a good idea to check the result and proofread the text. This applies to both PullPrint scans and prescanned documents.


OCR from PullPrint

In short, you scan a document and send it as an email attachment to ocr@uib.no.

  1. Place the document in the top left corner of the scanner glass. Make sure it is parallel to the edges, since the OCR software looks for horizontal text lines. Close the lid carefully.
  2. Log in to PullPrint using your student or employee card.
  3. Press E-mail on the display.
  4. Press the button for recipient and write ocr@uib.no (you do not need to add your own email address).
  5. Write a suitable subject for the message.
  6. Press Preferences and choose:
    • Black & white or color
    • Text, images or both
    • Resolution: A higher number gives more detail and normally more precise character recognition. However, if you are scanning many pages, a resolution below the maximum may save time and space if it is sufficient.
  7. Press Send, and the scanning should start.
  8. Soon your inbox should have a new message titled [OCR] + the subject you wrote on the scanner.
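The reply can be picked out of a busy inbox by its subject. The following is only a minimal sketch: it assumes the reply subject is the literal prefix [OCR] followed by the subject you entered, possibly separated by whitespace. The function names `is_ocr_reply` and `find_ocr_replies` are hypothetical helpers for illustration, not part of the UiB service.

```python
OCR_PREFIX = "[OCR]"  # prefix described on this page; exact spacing after it is an assumption


def is_ocr_reply(reply_subject: str, sent_subject: str) -> bool:
    """Return True if reply_subject looks like the OCR reply to sent_subject."""
    reply = reply_subject.strip()
    if not reply.startswith(OCR_PREFIX):
        return False
    # Compare the text after the prefix with the subject that was sent,
    # ignoring surrounding whitespace.
    return reply[len(OCR_PREFIX):].strip() == sent_subject.strip()


def find_ocr_replies(inbox_subjects, sent_subject):
    """Return the inbox subjects that match the expected OCR reply."""
    return [s for s in inbox_subjects if is_ocr_reply(s, sent_subject)]
```

For example, `find_ocr_replies(inbox_subjects, "Chapter 3")` keeps only the subjects that start with [OCR] and end with the subject you wrote on the scanner.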

If you are not satisfied with the result and have access to a scanner that supports a higher resolution, you may get a better result by scanning with it and sending the file as an attachment from your email.

OCR from an already scanned document

  1. Start a new email message.
  2. Attach the prescanned document (preferably .pdf).
  3. Write a subject.
  4. Send the email to ocr@uib.no (you do not need to add your own email address).
  5. The OCR-processed document should be in your inbox not long after.

Graphics formats like jpg or png may also be sent to ocr@uib.no, but pdf seems to give better results.
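If you prefer to send prescanned files from a script rather than from a mail program, the steps above can be sketched in Python with the standard library. This is only an illustration under stated assumptions: the sender address, SMTP host, port and login details are placeholders you must replace with your own mail settings, and the function names `build_ocr_request` and `send_via_smtp` are hypothetical. Only the recipient address ocr@uib.no comes from this page.

```python
import mimetypes
import smtplib
from email.message import EmailMessage
from pathlib import Path

OCR_ADDRESS = "ocr@uib.no"  # service address given on this page


def build_ocr_request(sender: str, subject: str, scan_path: str) -> EmailMessage:
    """Build an email to the OCR service with one scanned file attached."""
    path = Path(scan_path)
    # Guess the MIME type from the file extension (.pdf, .jpg, .png, ...).
    mime_type, _ = mimetypes.guess_type(path.name)
    if mime_type is None:
        mime_type = "application/octet-stream"
    maintype, subtype = mime_type.split("/", 1)

    msg = EmailMessage()
    msg["From"] = sender  # the reply is sent to the sender automatically
    msg["To"] = OCR_ADDRESS
    msg["Subject"] = subject
    msg.set_content("Scanned document attached for OCR.")
    msg.add_attachment(
        path.read_bytes(),
        maintype=maintype,
        subtype=subtype,
        filename=path.name,
    )
    return msg


def send_via_smtp(msg: EmailMessage, host: str, port: int,
                  username: str, password: str) -> None:
    """Send the message through an SMTP server (all settings are assumptions)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls()  # assumes the server supports STARTTLS
        smtp.login(username, password)
        smtp.send_message(msg)
```

A call such as `build_ocr_request("you@example.org", "Lecture notes", "scan.pdf")` returns a message that can be inspected before it is passed to `send_via_smtp`. Building and sending are kept separate so that the attachment and subject can be checked without contacting a mail server.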