OCR KTP
OCR (Optical Character Recognition) KTP (Kartu Tanda Penduduk, the Indonesian national identity card) is a machine learning-based solution that extracts text information from a KTP image.
1. Image Requirement Check: The system checks image requirements such as file size, resolution, and image quality. The file size check ensures the image is no larger than 2 MB. The resolution check ensures the KTP object is larger than 300 x 400 px so that its text is clear and recognizable.
2. KTP Alignment: The system detects the position of the KTP object and aligns the KTP to a frontal view to improve text recognition.
3. Normalization and Template Matching: The system corrects the recognized text by normalizing it and matching it against keyword templates in our database.
The submitted image must meet the requirements below:
Image Setting | Requirement |
---|---|
Minimum camera resolution | Above 2 MP |
Image file size | Minimum 100 KB, maximum 2 MB |
Image compression recommendation | Bicubic, with minimum JPEG quality 80% |
Image dimension | Minimum 300 x 400 px, maximum below 4000 x 3000 px |
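The file size and dimension limits in the table can be checked on the client before an image is submitted. The following is a minimal sketch, not the service's own validation: the function name `check_image_requirements` is hypothetical, 1 KB is assumed to mean 1024 bytes, and the dimension limits are assumed to apply regardless of orientation (the shorter side is compared against 300 and 3000, the longer side against 400 and 4000).

```python
KB = 1024  # assumption: binary kilobytes
MIN_FILE_BYTES = 100 * KB
MAX_FILE_BYTES = 2 * KB * KB
MIN_SHORT_SIDE, MIN_LONG_SIDE = 300, 400
MAX_SHORT_SIDE, MAX_LONG_SIDE = 3000, 4000


def check_image_requirements(size_bytes: int, width: int, height: int) -> list[str]:
    """Return a list of violated requirements (empty if the image passes)."""
    problems = []
    if size_bytes < MIN_FILE_BYTES:
        problems.append("file smaller than 100 KB")
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file larger than 2 MB")

    # Assumption: orientation does not matter, so compare sorted sides.
    short_side, long_side = sorted((width, height))
    if short_side < MIN_SHORT_SIDE or long_side < MIN_LONG_SIDE:
        problems.append("dimension below 300 x 400 px")
    # The table says the maximum is *below* 4000 x 3000, so equality fails too.
    if short_side >= MAX_SHORT_SIDE or long_side >= MAX_LONG_SIDE:
        problems.append("dimension not below 4000 x 3000 px")
    return problems
```

An empty list means the image satisfies the size and dimension rows of the table; the camera resolution and compression rows are not covered by this sketch.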
The segmentation system groups the parts of an image that belong to the same object. In the Nodeflux OCR KTP case, the segmentation model looks for pixels that belong to the KTP to know where OCR should be executed. The OCR performs more efficiently by restricting its scope to only the KTP region and ignoring the unnecessary background.
We use a deep learning segmentation model trained on a variety of KTP data. Adding the segmentation step improves the accuracy and robustness of the OCR system.
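To illustrate how a segmentation result restricts the OCR scope, the sketch below crops an image to the bounding box of a binary mask. It assumes the mask has already been produced by some segmentation model and is given as a 2D list with truthy values for KTP pixels; the helper names `mask_bounding_box` and `crop_to_mask` are hypothetical and do not describe the Nodeflux implementation.

```python
def mask_bounding_box(mask):
    """Return (top, left, bottom, right) of the truthy pixels, or None if empty.

    bottom and right are exclusive, so they can be used directly in slices.
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1


def crop_to_mask(image, mask):
    """Keep only the region of `image` covered by the mask's bounding box."""
    box = mask_bounding_box(mask)
    if box is None:
        return []  # no KTP pixels found, nothing to pass to OCR
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]
```

Passing only the cropped region to OCR is what lets the recognizer ignore the background.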
Image Quality Assessment (IQA) evaluates the quality of an image on several attributes: sharpness, brightness, and specularity. It works by applying filters that quantify each attribute.
The resulting value indicates whether the image is of acceptable quality for that attribute. IQA returns true if the image fulfills the parametric condition and false if it does not.
Here are the details of each IQA attribute:
- Specularity: Indicates whether the algorithm finds spotlights or glare. The value is false if a spotlight or glare is present in the image.
- Brightness: Indicates the lighting condition of the image. The value is true if the image is in ideal lighting conditions and false if the image is too dark or too bright.
- Sharpness: Describes the clarity of detail in the image. The value is false if the image is blurry.
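The three attributes above can be approximated with simple filters. The sketch below is only an illustration under stated assumptions and is not the filters or thresholds used by the service: brightness is taken as the mean grayscale intensity, specularity as the fraction of near-saturated pixels, and sharpness as the variance of a 4-neighbour Laplacian. All function names and threshold values are hypothetical, and the input is assumed to be a grayscale image given as a 2D list of 0 to 255 values.

```python
def brightness_ok(gray, low=60, high=200):
    """True if the mean intensity lies in an assumed acceptable range."""
    pixels = [p for row in gray for p in row]
    return low <= sum(pixels) / len(pixels) <= high


def specularity_ok(gray, glare_level=250, max_fraction=0.02):
    """True if at most `max_fraction` of pixels are near-saturated (no glare)."""
    pixels = [p for row in gray for p in row]
    glare = sum(1 for p in pixels if p >= glare_level)
    return glare / len(pixels) <= max_fraction


def sharpness_ok(gray, min_variance=100.0):
    """True if the variance of the Laplacian response exceeds a threshold."""
    responses = []
    for r in range(1, len(gray) - 1):
        for c in range(1, len(gray[0]) - 1):
            responses.append(
                gray[r - 1][c] + gray[r + 1][c]
                + gray[r][c - 1] + gray[r][c + 1]
                - 4 * gray[r][c]
            )
    if not responses:
        return False  # image too small to evaluate
    mean = sum(responses) / len(responses)
    variance = sum((x - mean) ** 2 for x in responses) / len(responses)
    return variance >= min_variance


def assess_quality(gray):
    """Return the three boolean attributes, true meaning acceptable."""
    return {
        "sharpness": sharpness_ok(gray),
        "brightness": brightness_ok(gray),
        "specularity": specularity_ok(gray),
    }
```

As in the attribute list, true means the image passes the check, so a glare-free image yields `specularity: true`.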
The image quality information is included in the API response. See the response structure below:
{
  "job": {
    "id": "<job_id>",
    "result": {
      "status": "success",
      "analytic_type": "OCR_KTP",
      "result": [
        {
          "nik": "104671030308920003",
          "nama": "AGIAR PUTRI DIANA",
          "agama": "ISLAM",
          "rt_rw": "005/003",
          "alamat": "BATUCEPER TIMUR",
          "provinsi": "BANTEN",
          "kecamatan": "BATU CEPER",
          "pekerjaan": "KARYAWAN SWASTA",
          "tempat_lahir": "LEBAK",
          "jenis_kelamin": "PEREMPUAN",
          "tanggal_lahir": "07-06-1994",
          "berlaku_hingga": "SEUMUR HIDUP",
          "golongan_darah": "B",
          "kabupaten_kota": "KOTA TANGERANG",
          "kelurahan_desa": "BATUCEPER",
          "kewarganegaraan": "WNI",
          "status_perkawinan": "BELUM KAWIN",
          "image_quality": {
            "sharpness": true,
            "brightness": true,
            "specularity": true
          }
        }
      ]
    }
  },
  "message": "OCR_KTP Service Success",
  "ok": true
}
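A client can read the `image_quality` object from this response to decide whether to ask the user for a new photo. The sketch below assumes the response has already been decoded from JSON into a Python dict with the structure shown above; the function name `failed_quality_checks` and the trimmed `sample_response` are illustrative only.

```python
def failed_quality_checks(response: dict) -> list[str]:
    """Return the names of IQA attributes that are false in the response."""
    failures = []
    for record in response["job"]["result"]["result"]:
        quality = record.get("image_quality", {})
        failures.extend(name for name, passed in quality.items() if not passed)
    return failures


# Trimmed example following the response structure above, with one failing check.
sample_response = {
    "job": {
        "id": "<job_id>",
        "result": {
            "status": "success",
            "analytic_type": "OCR_KTP",
            "result": [
                {
                    "nama": "AGIAR PUTRI DIANA",
                    "image_quality": {
                        "sharpness": False,
                        "brightness": True,
                        "specularity": True,
                    },
                }
            ],
        },
    },
    "message": "OCR_KTP Service Success",
    "ok": True,
}

print(failed_quality_checks(sample_response))
```

For the sample above, only `sharpness` is reported, which per the attribute list means the image is blurry.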
Template matching compares the text recognized by OCR with a set of templates. It is used to handle typos and characters misrecognized by OCR. The process calculates a similarity score between the recognized text and every template, then selects the template with the highest similarity as the correction for the text. This step improves the accuracy of the OCR system. The table below shows which KTP fields are mapped to templates:
KTP data | Mapping |
---|---|
NIK | - |
Name | - |
Place, Date of birth | - |
Gender | √ (partially mapped to templates) |
Address | - |
Religion | √ (partially mapped to templates) |
Marital status | √ (partially mapped to templates) |
Occupation | √ (partially mapped to templates) |
Nationality | √ (partially mapped to templates) |
Validity Period | - |
City / Regency | - |
Province | √ (mapped to templates) |
RT / RW | - |
Kelurahan / Village | - |
District | - |
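The template matching step described above can be sketched with Python's standard `difflib.SequenceMatcher` as the similarity measure. This is a stand-in chosen for illustration; the source does not state which similarity formula the service uses. The function name `best_template`, the `min_similarity` threshold, and the two-entry gender template list are assumptions.

```python
from difflib import SequenceMatcher

# Illustrative templates for the gender field (assumed, not the service's database).
GENDER_TEMPLATES = ["LAKI-LAKI", "PEREMPUAN"]


def best_template(text: str, templates: list[str], min_similarity: float = 0.6) -> str:
    """Return the most similar template, or the normalized text if none is close."""
    normalized = text.strip().upper()
    best, best_score = None, 0.0
    for template in templates:
        score = SequenceMatcher(None, normalized, template).ratio()
        if score > best_score:
            best, best_score = template, score
    # Keep the OCR text when no template is similar enough to be a safe correction.
    return best if best_score >= min_similarity else normalized


print(best_template("PEREMPU4N", GENDER_TEMPLATES))
```

The threshold guards against forcing unrelated text onto a template, which matters for fields that are only partially mapped.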