DTWAIN_GetOCRText

The DTWAIN_GetOCRText function retrieves the text data after a selected OCR engine has processed an image file.

 

HANDLE DTWAIN_GetOCRText (
    DTWAIN_OCRENGINE  Engine,
    LONG              pageNumber,
    LPTSTR            textBuffer,
    LONG              bufferSize,
    LPLONG            actualBufferSize,
    LONG              Flags );

 

Parameters:

Engine

A selected OCR engine.

 

pageNumber

Page number of the text to return.

 

textBuffer

Specifies the pointer to the buffer that will be filled with the text, or NULL (0).

 

bufferSize

Size of the buffer, in number of characters.

 

actualBufferSize

On return, size of the actual text buffer that was generated after the OCR processing has completed.  NULL can also be specified to ignore this argument.

 

Flags

Flags to determine how to return the OCR text back to the caller (by handle or using a buffer).  Currently only the buffer method is supported.

 

Returns:

1 if successful, NULL otherwise.  Call DTWAIN_GetLastError for further information on the failure.

 

Character specific version

ANSI version:

DTWAIN_GetOCRTextA

Unicode version:

DTWAIN_GetOCRTextW

 

Comments:

DTWAIN_GetOCRText retrieves the text produced by the OCR engine.  Call it after DTWAIN_ExecuteOCR has processed the image file.

 

The Engine argument is the selected OCR engine.

 

The pageNumber argument is the page number of the returned text.  It ranges from 0 to n-1, where n is the total number of text pages returned from DTWAIN_ExecuteOCR.

 

If DTWAIN_ExecuteOCR was called using a single page image file, pageNumber must be 0.  If DTWAIN_ExecuteOCR was called using a multipage image file, and more than one page of the image file was processed, pageNumber ranges from 0 to one less than the number of returned OCR-ed text pages.  For example, if the multipage image file consists of 10 pages, and DTWAIN_ExecuteOCR was applied to pages 5 and 6, the OCR result consists of 2 pages of text, which DTWAIN_GetOCRText returns as page 0 and page 1.
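
The page numbering described above can be sketched as a small helper.  The function name ocr_text_page_index is hypothetical and is not part of DTWAIN; the sketch assumes that DTWAIN_ExecuteOCR processed one contiguous range of image pages.

```c
/* Hypothetical helper, not part of DTWAIN: maps an image page number to the
   zero-based text page index passed to DTWAIN_GetOCRText as pageNumber.
   Assumes DTWAIN_ExecuteOCR processed the contiguous range
   firstPage..lastPage.  Returns -1 if imagePage is outside that range. */
long ocr_text_page_index(long firstPage, long lastPage, long imagePage)
{
    if (firstPage > lastPage || imagePage < firstPage || imagePage > lastPage)
        return -1;

    /* The first processed image page always becomes text page 0. */
    return imagePage - firstPage;
}
```

With the range 5 to 6 from the example above, image page 5 maps to text page 0 and image page 6 maps to text page 1.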

 

The textBuffer argument points to the start of a character buffer that receives the text.  If this value is NULL, no text is returned.  A NULL value is usually passed when the caller only wants to know the number of characters that would be copied to textBuffer (see the actualBufferSize argument below).

 

The bufferSize argument is the capacity, in characters, of the buffer that textBuffer points to.  This argument is ignored if textBuffer is NULL.  The number of characters copied from the OCR buffer to textBuffer never exceeds bufferSize, but can be less than bufferSize.

 

The actualBufferSize argument is filled on return with the actual size of the text that the OCR engine placed (or will place) in textBuffer.  If this argument is NULL, actualBufferSize is ignored.
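
The buffer rules above can be illustrated with a small model.  This is not DTWAIN code: model_get_text and its source and sourceLen parameters are hypothetical stand-ins for the engine's internal text.  The model assumes that actualBufferSize always reports the full text size, and it makes no claim about null termination.

```c
#include <stddef.h>
#include <string.h>

/* Model of the documented buffer contract only; this is NOT DTWAIN code.
   source and sourceLen stand in for the text held by the OCR engine.
   Returns the number of characters copied to textBuffer. */
long model_get_text(const char *source, long sourceLen,
                    char *textBuffer, long bufferSize,
                    long *actualBufferSize)
{
    long toCopy = 0;

    /* Assumption: the full text size is reported whenever it is requested. */
    if (actualBufferSize != NULL)
        *actualBufferSize = sourceLen;

    /* bufferSize is ignored when textBuffer is NULL: nothing is copied. */
    if (textBuffer != NULL && bufferSize > 0) {
        /* Never copy more than bufferSize characters. */
        toCopy = (sourceLen < bufferSize) ? sourceLen : bufferSize;
        memcpy(textBuffer, source, (size_t)toCopy);
    }
    return toCopy;
}
```

In this model, a 5 character text requested with a 4 character buffer copies only the first 4 characters, while a NULL buffer copies nothing and only reports the size.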

 

The Flags argument is a set of flags that determines how the text is returned to the caller.  Currently, the only mode is to return the text in the buffer provided by textBuffer (DTWAINOCR_COPYDATA).  The available flags are listed below.

 

Flags Value            Definition

DTWAINOCR_COPYDATA     Copies text data to the buffer provided by the caller.

 

If your application needs to set the size of the buffer equal to the actual size of the text, call DTWAIN_GetOCRText with the textBuffer set to NULL, and query the actualBufferSize argument on return.  Then call DTWAIN_GetOCRText again with the textBuffer sized appropriately using the value of actualBufferSize.

 

 

Example (Using C language):

 

#include <stdlib.h>   /* malloc, free */
#include <tchar.h>    /* TCHAR, _T */

/* ... */

/* Initiate the OCR processing */
if ( DTWAIN_ExecuteOCR( SomeEngine, _T("imagefile.bmp"), 0, 0 ) )
{
    LPTSTR pBuf;
    LONG bufSize = 0;

    /* First call: pass a NULL buffer to get the number of characters of the OCR-ed text */
    DTWAIN_GetOCRText( SomeEngine, 0, NULL, 0, &bufSize, DTWAINOCR_COPYDATA );

    /* Now create a character buffer of that size and call DTWAIN_GetOCRText again.
       One extra character is allocated for a terminating null, in case the
       reported size does not include one. */
    pBuf = (LPTSTR) malloc( (bufSize + 1) * sizeof(TCHAR) );
    if ( pBuf )
    {
        DTWAIN_GetOCRText( SomeEngine, 0, pBuf, bufSize, NULL, DTWAINOCR_COPYDATA );
        pBuf[bufSize] = 0;

        /* Now text from OCR has been retrieved in pBuf */
        /* ... */

        free( pBuf );
    }
}

 

Note the use of TCHAR as the type in the call to malloc().  This ensures that this code will work for both ANSI/MBCS and Unicode versions of DTWAIN.

 

Prerequisite Function Call(s)

DTWAIN_SysInitialize

DTWAIN_InitOCRInterface

DTWAIN_SelectOCREngine, or DTWAIN_SelectOCREngineByName, or DTWAIN_SelectDefaultOCREngine

DTWAIN_ExecuteOCR