A Beginner’s Guide to
Image Preprocessing
Techniques

Intelligent Signal Processing and Data Analysis
SERIES EDITOR
Nilanjan Dey
Department of Information Technology, Techno India College of Technology,
Kolkata, India

Proposals for the series should be sent directly to one of the series editors
above, or submitted to:
Chapman & Hall/CRC
Taylor and Francis Group
3 Park Square, Milton Park
Abingdon, OX14 4RN, UK
Bio-Inspired Algorithms in PID Controller Optimization
Jagatheesan Kaliannan, Anand Baskaran, Nilanjan Dey and Amira S. Ashour
A Beginner’s Guide to Image Preprocessing Techniques
Jyotismita Chaki and Nilanjan Dey
https://www.crcpress.com/Intelligent-Signal-Processing-and-DataAnalysis/book-series/INSPDA

A Beginner’s Guide to
Image Preprocessing
Techniques

Jyotismita Chaki
Nilanjan Dey

MATLAB® is a trademark of The MathWorks, Inc. and is used with permission. The MathWorks
does not warrant the accuracy of the text or exercises in this book. This book’s use or discussion
of MATLAB® software or related products does not constitute endorsement or sponsorship by The
MathWorks of a particular pedagogical approach or particular use of the MATLAB® software.
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2019 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business
No claim to original U.S. Government works
Printed on acid-free paper
International Standard Book Number-13: 978-1-138-33931-6 (Hardback)
This book contains information obtained from authentic and highly regarded sources. Reasonable
efforts have been made to publish reliable data and information, but the author and publisher cannot
assume responsibility for the validity of all materials or the consequences of their use. The authors
and publishers have attempted to trace the copyright holders of all material reproduced in this
publication and apologize to copyright holders if permission to publish in this form has not been
obtained. If any copyright material has not been acknowledged please write and let us know so we
may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information
storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC),
222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that
provides licenses and registration for a variety of users. For organizations that have been granted a
photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and
are used only for identification and explanation without intent to infringe.
Library of Congress Cataloging-in-Publication Data
Names: Chaki, Jyotismita, author. | Dey, Nilanjan, 1984- author.
Title: A beginner’s guide to image preprocessing techniques / Jyotismita
Chaki and Nilanjan Dey.
Description: Boca Raton : Taylor & Francis, a CRC title, part of the Taylor &
Francis imprint, a member of the Taylor & Francis Group, the academic
division of T&F Informa, plc, 2019. | Series: Intelligent signal
processing and data analysis | Includes bibliographical references and index.
Identifiers: LCCN 2018029684| ISBN 9781138339316 (hardback : alk. paper) |
ISBN 9780429441134 (ebook)
Subjects: LCSH: Image processing--Digital techniques.
Classification: LCC TA1637 .C7745 2019 | DDC 006.6--dc23
LC record available at https://lccn.loc.gov/2018029684
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com

Contents
Preface.......................................................................................................................ix
Authors.................................................................................................................. xiii
1. Perspective of Image Preprocessing on Image Processing..................... 1
1.1 Introduction to Image Preprocessing.................................................. 1
1.2	Complications to Resolve Using Image Preprocessing....................1
1.2.1 Image Correction...................................................................... 2
1.2.2 Image Enhancement.................................................................4
1.2.3 Image Restoration.....................................................................6
1.2.4 Image Compression.................................................................. 7
1.3 Effect of Image Preprocessing on Image Recognition...................... 9
1.4 Summary............................................................................................... 10
References........................................................................................................ 11
2. Pixel Brightness Transformation Techniques......................................... 13
2.1 Position-Dependent Brightness Correction...................................... 13
2.2 Grayscale Transformations................................................................. 14
2.2.1 Linear Transformation........................................................... 14
2.2.2 Logarithmic Transformation................................................. 17
2.2.3 Power-Law Transformation................................................... 19
2.3 Summary............................................................................................... 23
References........................................................................................................ 23
3. Geometric Transformation Techniques.................................................... 25
3.1 Pixel Coordinate Transformation or Spatial Transformation........ 25
3.1.1 Simple Mapping Techniques................................................. 26
3.1.2 Affine Mapping....................................................................... 29
3.1.3 Nonlinear Mapping................................................................ 29
3.2 Brightness Interpolation...................................................................... 31
3.2.1 Nearest Neighbor Interpolation........................................... 32
3.2.2 Bilinear Interpolation.............................................................34
3.2.3 Bicubic Interpolation.............................................................. 35
3.3 Summary............................................................................................... 36
References........................................................................................................ 37
4. Filtering Techniques..................................................................................... 39
4.1 Spatial Filter.......................................................................................... 39
4.1.1 Linear Filter (Convolution).................................................... 39
4.1.2 Nonlinear Filter....................................................................... 40
4.1.3 Smoothing Filter..................................................................... 40
4.1.4 Sharpening Filter....................................................................42
4.2 Frequency Filter....................................................................43
4.2.1 Low-Pass Filter........................................................................44
4.2.1.1 Ideal Low-Pass Filter (ILP).....................................44
4.2.1.2 Butterworth Low-Pass Filter (BLP)....................... 45
4.2.1.3 Gaussian Low-Pass Filter (GLP)............................ 46
4.2.2 High Pass Filter....................................................................... 47
4.2.2.1 Ideal High-Pass Filter (IHP)................................... 48
4.2.2.2 Butterworth High-Pass Filter (BHP)..................... 49
4.2.2.3 Gaussian High-Pass Filter (GHP).......................... 49
4.2.3 Band Pass Filter....................................................................... 50
4.2.3.1 Ideal Band Pass Filter (IBP).................................... 50
4.2.3.2 Butterworth Band Pass Filter (BBP)...................... 51
4.2.3.3 Gaussian Band Pass Filter (GBP)........................... 51
4.2.4 Band Reject Filter.................................................................... 52
4.2.4.1 Ideal Band Reject Filter (IBR)................................. 52
4.2.4.2 Butterworth Band Reject Filter (BBR)................... 52
4.2.4.3 Gaussian Band Reject Filter (GBR)........................ 53
4.3 Summary............................................................................................... 53
References........................................................................................................ 53

5. Segmentation Techniques............................................................................ 57
5.1 Thresholding........................................................................................ 57
5.1.1 Histogram Shape-Based Thresholding............................... 57
5.1.2 Clustering-Based Thresholding............................................ 59
5.1.3 Entropy-Based Thresholding................................................ 62
5.2 Edge-Based Segmentation..................................................................63
5.2.1 Roberts Edge Detector............................................................63
5.2.2 Sobel Edge Detector................................................................64
5.2.3 Prewitt Edge Detector............................................................64
5.2.4 Kirsch Edge Detector.............................................................64
5.2.5 Robinson Edge Detector........................................................65
5.2.6 Canny Edge Detector............................................................. 66
5.2.7 Laplacian of Gaussian (LoG) Edge Detector....................... 67
5.2.8 Marr-Hildreth Edge Detection............................................. 68
5.3 Region-Based Segmentation............................................................... 69
5.3.1 Region Growing or Region Merging................................... 69
5.3.2 Region Splitting....................................................................... 69
5.4 Summary............................................................................................... 69
References........................................................................................................ 70
6. Mathematical Morphology Techniques.................................................... 73
6.1 Binary Morphology............................................................................. 73
6.1.1 Erosion...................................................................................... 73
6.1.2 Dilation..................................................................................... 75
6.1.3 Opening.................................................................................... 76


6.1.4 Closing...................................................................................... 76
6.1.5 Hit and Miss............................................................................77
6.1.6 Thinning..................................................................................77
6.1.7 Thickening............................................................................... 78
6.2 Grayscale Morphology........................................................................ 78
6.2.1 Erosion...................................................................................... 79
6.2.2 Dilation..................................................................................... 79
6.2.3 Opening.................................................................................... 79
6.2.4 Closing......................................................................................80
6.3 Summary...............................................................................................80
References........................................................................................................ 81
7. Other Applications of Image Preprocessing............................................83
7.1 Preprocessing of Color Images...........................................................83
7.2 Image Preprocessing for Neural Networks and
Deep Learning......................................................................................90
7.3 Summary............................................................................................... 94
References........................................................................................................ 95
Index..................................................................................................................... 99

Preface
Digital image processing is a widespread subject and is progressing
continuously. The development of digital image processing has been driven
by technological improvements in computer processors, digital imaging, and
mass storage devices. Digital image processing is used to extract valuable
information from images. In this procedure, it additionally deals with
(1) enhancement of the quality of an image, (2) image representation, (3)
restoration of the original image from its corrupted form, and (4) compression
of the bulk amounts of data in the images to increase the efficiency of image
retrieval. Digital image processing can be categorized into three different
categories. The first category involves the algorithm directly dealing with the
raw pixel values like edge detection, image denoising, and so on. The second
category involves the algorithm that employs results obtained from the first
category for further processing such as edge linking, segmentation, and so
forth. The third and last category involves the algorithm that tries to extract
semantic information from those delivered by the lower levels such as face
recognition, handwriting recognition, and so on. This book covers different
image preprocessing techniques, which are essential for the enhancement of
image data in order to suppress unwanted distortions or to improve certain
image features vital for additional processing and image retrieval. This book
presents the different techniques of image transformation, enhancement,
segmentation, morphological techniques, filtering, preprocessing of color
images, and preprocessing for Deep Learning in detail. The aim of this book
is not only to present different perceptions of digital image preprocessing to
undergraduate and postgraduate students, but also to serve as a handbook
for practicing engineers. Simulation is an important tool in any engineering
field. In this book, the image preprocessing algorithms are simulated using
MATLAB®. It has been the attempt of the authors to present detailed examples
to demonstrate the various digital image preprocessing techniques.
This book is organized as follows:
• Chapter 1 gives an overview of image preprocessing. The different
fundamentals of image preprocessing methods like image correction,
image enhancement, image restoration, image compression, and the
effect of image preprocessing on image recognition are covered in this
chapter. Preprocessing techniques, used to correct the radiometric or
geometric aberrations, are introduced in this chapter. The examples
related to image correction, image enhancement, image restoration,
image compression, and the effect of image preprocessing on image
recognition are illustrated through MATLAB examples.


• Chapter 2 deals with pixel brightness transformation techniques.
Position-dependent brightness correction is introduced in this chapter.
This chapter also gives an overview of different techniques used for
grayscale transformation like linear, logarithmic, and power-law or
gamma correction. Different types of linear transformations such as
identity transformation and negative transformation, different types
of logarithmic transformation like log transformations, and inverse
log transformations are also included in this chapter. Different image
enhancement techniques such as contrast stretching, histogram
equalization, and histogram specification are also discussed in this
chapter. The examples related to pixel brightness transformation
techniques are illustrated through MATLAB examples.
• Chapter 3 is devoted to geometric transformation techniques. Two
basic steps in geometric transformations like pixel coordinate
transformation or spatial transformation and brightness interpolation
are discussed in this chapter. Different simple mapping techniques
like translation, scaling, rotation, and shearing are included in this
chapter. Also, the affine mapping and different nonlinear mapping
techniques such as twirl, ripple, and spherical transformation are
discussed step by step. Various brightness interpolation methods like
nearest neighbor interpolation, bilinear interpolation, and bicubic
interpolation are included in this chapter. The examples related
to geometric transformation techniques are illustrated through
MATLAB examples.
• Chapter 4 discusses different spatial and frequency filtering
techniques. We explain in this chapter different spatial filtering
methods such as the linear filter, nonlinear filter, sharpening filter, and smoothing
filter, which includes smoothing linear filters and order-statistics filters. Various
frequency filters like the low-pass filter, high-pass filter, bandpass filter, and
band-reject filter are also included. In
each category of frequency filter, three types of filters are explained:
Ideal, Butterworth, and Gaussian. The examples related to different
spatial and frequency-filtering techniques are illustrated through
MATLAB examples.
• The focus of Chapter 5 is on image segmentation. Different
segmentation techniques such as thresholding-based segmentation,
edge-based segmentation, and region-based segmentation are
explained in this chapter. Different methods to select the threshold
value like the histogram shape-based method, entropy-based
method, and clustering-based method—which includes k-means
and Otsu—are discussed in this chapter. Various edge-based
segmentations like Sobel, Canny, Prewitt, Robinson, Roberts, Kirsch,
LoG, and Marr-Hildreth are also explained step by step. Region
growing or merging, and region splitting methods are included


in region-based segmentation. The examples related to image
segmentation techniques are illustrated through MATLAB examples.
• Chapter 6 provides an overview of mathematical morphology
techniques. Different methods of binary morphology and grayscale
morphology are discussed in this chapter. Binary morphology
techniques including erosion, dilation, opening, closing, hit-and-miss,
thinning, and thickening, as well as grayscale morphology
techniques including erosion, dilation, opening, and closing are
explained. The examples related to mathematical morphology
techniques are illustrated through MATLAB examples.
• Chapter 7 deals with preprocessing of color images and preprocessing
for neural networks and Deep Learning. Preprocessing of color
images includes pseudo color processing, true color processing,
different color models, intensity modification, color complement, color
slicing, and tone correction. Other types of color image preprocessing
involve histogram equalization, segmentation of color image, and so
on. Preprocessing for neural networks and Deep Learning includes
unvarying aspect ratio, scaling of images, normalization of image
inputs, reduction of data dimension, and augmentation of image
data. The examples related of preprocessing techniques of color
images are illustrated through MATLAB examples.
Dr. Jyotismita Chaki
Jadavpur University
Dr. Nilanjan Dey
Techno India College of Technology
MATLAB® is a registered trademark of The MathWorks, Inc. For product
information, please contact:
The MathWorks, Inc.
3 Apple Hill Drive
Natick, MA 01760-2098 USA
Tel: 508 647 7000
Fax: 508-647-7001
E-mail: info@mathworks.com
Web: www.mathworks.com

Authors
Jyotismita Chaki, PhD, was appointed
as an Assistant Professor in the School of
Computer Engineering at Kalinga Institute
of Industrial Technology (KIIT), Deemed to be
University, India. From 2012 to 2017, she was
a Research Fellow at Jadavpur University. Her
research interests include image processing,
pattern recognition, computer vision and
machine learning. Dr. Chaki is a reviewer
of Journal of Visual Communication and
Image Representation, Elsevier; Biosystems
Engineering, Elsevier; Signal, Image and Video
Processing, Springer; Pattern Recognition
Letters, Elsevier; Applied Soft Computing, Elsevier and Computers and
Electronics in Agriculture, Elsevier.
Nilanjan Dey, PhD, is currently associated
with the Department of Information Technology,
Techno India College of Technology, Kolkata,
W.B., India. He holds an honorary position
of visiting scientist at Global Biomedical
Technologies Inc., California, and is a
research scientist at the Laboratory of Applied
Mathematical Modeling in Human Physiology,
Territorial Organization of Scientific and
Engineering Unions, Bulgaria. He is also an
associate researcher of Laboratoire RIADI,
University of Manouba, Tunisia. He is an
associated member of the Wearable Computing
Research Lab, University of Reading,
United Kingdom.
His research topics are medical imaging, soft computing, data mining,
machine learning, rough sets, computer aided diagnosis, and atherosclerosis.
He has authored 35 books, 170 journal articles, and 100 international conference
papers. He is the editor-in-chief of the International Journal of Ambient
Computing and Intelligence, US (Scopus, ESCI, ACM dl and DBLP listed), the
International Journal of Rough Sets and Data Analysis, and is the U.S. co-editor-in-chief of the International Journal of Synthetic Emotions and the International
Journal of Natural Computing Research. He also is the U.S. series editor of
Advances in Geospatial Technologies (AGT) book series, and the U.S. series editor

of Advances in Ubiquitous Sensing Applications for Healthcare (AUSAH). He is
also the executive editor of the International Journal of Image Mining (IJIM) and
an associate editor of the IEEE Access journal and the International Journal of
Service Science, Management, Engineering and Technology. He is a life member of
IE, UACEE, and ISOC. He has chaired many international conferences such as
ITITS 2017 (China), WS4 2017 (London), and INDIA 2017 (Vietnam).

1
Perspective of Image Preprocessing on Image Processing

1.1 Introduction to Image Preprocessing
Preprocessing is a common name for procedures whose input and
output are both intensity images. These images are of the same kind as the original
data captured by the sensors. Basically, image preprocessing is a method to
transform raw image data into clean image data, as most raw image
data contain noise as well as missing or incomplete values,
inconsistent values, and false values [1]. Missing information means the lack
of certain attributes of interest or of attribute values. Inconsistent
information means there are discrepancies in the image. A false
value means an error in an image value. The purpose of preprocessing is an
enhancement of the image data to suppress unwanted distortions or to improve
some image features vital for additional processing [2]. Some will contend that
image preprocessing is not a smart idea, as it alters or modifies the true nature
of the raw data. Nevertheless, smart application of image preprocessing can
offer benefits and take care of issues that finally produce improved global and
local feature detection [3]. Image preprocessing may have beneficial effects
on the quality of feature extraction and the outcomes of image analysis
[4]. Image preprocessing is similar to the standardization of a data
set, which is a common step in many feature descriptor techniques. Image
preprocessing is also used to correct the degradation of an image. In that case, some
prior knowledge is important, such as information about the nature of
the degradation, information about the features of the image capturing device,
and the conditions under which the image was obtained. Figure 1.1 shows the
steps of image preprocessing during digital image processing.

1.2 Complications to Resolve Using Image Preprocessing
The following complications can be resolved by using image preprocessing
techniques.

FIGURE 1.1
Image preprocessing step in digital image processing.

1.2.1 Image Correction
Image corrections are generally grouped into radiometric and geometric
corrections. Some standard correction methods might be completed prior to the
data being delivered to the user [5]. These techniques incorporate a radiometric
correction to correct for the irregular sensor reaction over the entire image, and
a geometric correction to correct for the geometric misrepresentation owing to
different imaging settings such as oblique viewing [6]. Radiometric correction
means correcting the radiometric error caused by noise in the
brightness values of the image. Some common radiometric errors are random
bad pixels or shot noise, line start/stop problems, line or column dropouts,
and line or column striping [7]. Radiometric correction is a preprocessing
method to rebuild physically aligned values by altering the spectral faults and
falsifications caused by the sensors themselves when the individual detectors
do not function properly, or are not properly calibrated for the sun’s direction
and/or the landscape [8]. For example, shot noise, which is generated when random
pixels are not recorded for one or more bands, can be corrected by identifying
missing pixels. Missing pixel values can be regenerated by taking the average
of the neighboring pixels and filling in the value of the missing pixel. Figure
1.2 shows the preprocessed output.
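As one illustration of this neighborhood-averaging idea, the minimal MATLAB sketch below (not a listing from this book) flags suspected shot-noise pixels and replaces them with the local median; the file name, the 3 × 3 window, and the rule that missing pixels were recorded as 0 are assumptions that would need to be adapted to the actual data, and an Image Processing Toolbox installation is assumed.

I = im2double(imread('band3.tif'));   % hypothetical single-band image with shot noise
mask = (I == 0);                      % assume missing pixels were recorded as 0
M = medfilt2(I, [3 3]);               % median of the 3-by-3 neighborhood around every pixel
J = I;
J(mask) = M(mask);                    % replace only the flagged pixels with neighborhood values
imshowpair(I, J, 'montage');          % compare the raw and corrected images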
Line start/stop problems occur when scanning detectors fail to start or are
out of sequence with other detectors, which results in displaced rows with the
pixel data at inappropriate locations along the scan line [9]. This can be solved
by determining the rows affected and applying a standard
offset for the affected rows. Figure 1.3 shows the preprocessing output.
Line or column dropout error occurs when an entire line does not contain
any information and results in blank lines or lines of same gray level value.

FIGURE 1.2
(A) Image with shot noise, (B) Preprocessed output.



FIGURE 1.3
(A) Image with line start/stop problem, (B) Preprocessed image.

This error can be corrected by averaging the pixel values above and below the
missing line to fill in each missing pixel, or by filling in values from another image. Figure 1.4
shows the preprocessing output.
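A minimal MATLAB sketch of the first option, averaging the rows above and below a dropped line, follows; the file name is hypothetical, and the rule used to detect dropped rows (rows whose mean value is zero) is only an assumption.

I = im2double(imread('band3.tif'));              % hypothetical image with line dropouts
badRows = find(mean(I, 2) == 0);                 % flag rows that carry no information
for r = badRows'
    if r > 1 && r < size(I, 1)
        I(r, :) = (I(r - 1, :) + I(r + 1, :)) / 2;   % average of the rows above and below
    end
end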
Line or column striping occurs when there are some stripes throughout the
entire image. This can be resolved by identifying the rows impacted through
analysis of a latitudinal geographic profile for the affected band. Figure 1.5
shows the preprocessing output.


FIGURE 1.4
(A) Image with line or column dropout, (B) Preprocessed output.

FIGURE 1.5
(A) Image with line striping, (B) Preprocessed output.


Geometric corrections include correcting geometric distortions
caused by sensor–Earth geometry differences, platform motion, Earth
curvature, and so on, and translating the data to real-world latitude and
longitude on the Earth’s surface [10]. Geometric correction means putting the
pixels in their proper location. This type of correction is generally needed
to coregister images for change detection, make accurate distance and area
measurements, and correct the imagery distortion. Sometimes the scale of
the same object varies due to some change in capturing the image. Geometric
error also involves perspective distortion. Fixing these types of distortion
involves resampling of the image data [11]. This can be done by determining
the correct geometric model, which tells how to transform images, in order
to compute the geometric transformations and basically how to analyze the
geometric error and resample to produce new output image data. Image
row/column coordinates are transformed to real-world coordinates using
polynomials [12]. One must choose the proper order of the polynomial. The
higher the transformation order, the greater the number of variables in the
polynomials and the more the warping stretches and twists in the dataset.
The higher order polynomial can provide misleading RMS errors. First order
polynomials, called affine transforms, are the linear conversion used to shift
the origin of the image, as well as rescale and rotate it. Figure 1.6 shows the
outputs of affine transformation.
A second-order polynomial is a nonlinear conversion used to correct camera
lens distortion and to correct for the Earth’s curvature. Figure 1.7 shows
the outputs of nonlinear transformation.
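The affine operations illustrated in Figure 1.6 (shift of origin, rescaling, and rotation) can be sketched with MATLAB Image Processing Toolbox calls as shown below; the test image, shift vector, scale factor, and angle are arbitrary assumptions, and a full geometric correction would instead fit the polynomial model to ground control points.

I = imread('cameraman.tif');                     % any grayscale test image
shifted  = imtranslate(I, [25, 15]);             % shift of the image origin
rescaled = imresize(I, 0.5);                     % rescaling
rotated  = imrotate(I, 30, 'bilinear', 'crop');  % rotation (a first-order, affine change)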
1.2.2 Image Enhancement
Image enhancement mostly involves refining the visibility of information in
images to provide improved input for automated image processing methods [13].
The primary goal of image enhancement is to adjust image attributes to make
them more appropriate for a given task. Through this method, one or more
attributes of the image are revised. Image enhancement is used to highlight
interesting details in images, remove noise from images, make images
more visually appealing, enhance otherwise hidden information, filter

FIGURE 1.6
(1A) Original position of the image, (1B) Shift of origin of the image, (2A) Original scaling of
the image, (2B) Rescaling output of the image, (3A) Original orientation of the image, and (3B)
Change of orientation of the image.


FIGURE 1.7
(1A) Ripple effect, (1B) Nonlinear correction output, (2A) Spherical effect, and (2B) Nonlinear
correction output.

important image features, and discard unimportant image features [14]. The
enhancement approaches are generally divided into the following two types:
spatial domain methods and frequency domain methods. In spatial domain
methods, image pixels are enhanced directly. The pixel values are altered to
obtain the desired enhancements. The spatial domain image enhancement
operation is expressed by using Equation 1.1:
S(x, y) = T[I(x, y)],  (1.1)

where I(x, y) is the input image, S(x, y) is the processed image, and T is an
operator on I defined over some neighborhood of (x, y).
Some spatial domain image enhancement includes point processing, mask
processing, and so on. In point processing a neighborhood of 1 × 1 pixel is
considered. This is generally used to convert a color image to grayscale or
binary image and so forth. In mask processing, the neighborhood is larger
than a 1 × 1 pixel area. This is generally used in image sharpening, image
smoothing, and so on. Some of the spatial domain enhancements are shown
in Figure 1.8.
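The difference between point processing and mask processing can be sketched in MATLAB as follows; the demo image and the 3 × 3 averaging mask are assumptions, and only standard Image Processing Toolbox calls are used.

rgb  = imread('peppers.png');              % true color test image
gray = rgb2gray(rgb);                      % point processing: each output pixel depends on one input pixel
neg  = imcomplement(gray);                 % point processing: negative transformation
h    = fspecial('average', [3 3]);         % mask processing: 3-by-3 neighborhood
smoothed = imfilter(gray, h, 'replicate'); % image smoothing
sharp    = imsharpen(gray);                % mask-based image sharpening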
With frequency domain methods, first the image is transformed into the
frequency domain. The enhancement procedures are performed on the
frequency domain representation of the image, which is then transformed back to the spatial
domain. Figure 1.9 illustrates the procedure.
The frequency domain image enhancement operation is expressed by using
Equation 1.2:
F(u, v) = H(u, v) I(u, v),  (1.2)

where I(u, v) is the input image in the frequency domain, H(u, v) is the transfer
function, and F(u, v) is the enhanced image. These enhancement processes are
done in order to enhance some frequency parts of the image. Some frequency
domain image enhancements are shown in Figure 1.10.
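A minimal MATLAB sketch of Equation 1.2 is given below: the image is transformed with the FFT, multiplied by a Gaussian low-pass transfer function H(u, v), and transformed back to the spatial domain. The test image and the cut-off frequency D0 = 30 are assumptions.

I = im2double(imread('cameraman.tif'));
[M, N] = size(I);
[u, v] = meshgrid(-floor(N/2):ceil(N/2)-1, -floor(M/2):ceil(M/2)-1);
D  = sqrt(u.^2 + v.^2);               % distance from the center of the frequency plane
D0 = 30;                              % assumed cut-off frequency
H  = exp(-(D.^2) / (2 * D0^2));       % Gaussian low-pass transfer function H(u, v)
F  = fftshift(fft2(I));               % transform the image into the frequency domain
G  = real(ifft2(ifftshift(H .* F)));  % apply Equation 1.2 and transform back
imshow(G, []);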
Through image enhancement, the pixel intensity values of the input image
are altered according to the enhancement function applied to the input
values.



FIGURE 1.8
Enhancement outputs in the spatial domain. (I) Conversion of true color image to grayscale
image: (A) True color image, (B) Grayscale Image; (II) Negative Transformation of a grayscale
image: (A) Original image, (B) Negative transformed image; (III) Contrast Enhancement: (A)
Original image, (B) Contrast enhanced image; (IV) Sharpening an image: (A) Original Image,
(B) Sharpen Image; (V) Smoothing an image: (A) Original image, (B) Smoothed image.

1.2.3 Image Restoration
The goal of image restoration methods is to reduce the noise or
corruption in the image and to recover lost resolution. Image restoration
methods are applied both in the image domain and the frequency domain.
Corruption may arise in many ways such as noise, motion blur, camera
misfocus, and so on [15]. Image restoration is not the same as image
enhancement, as the latter one is used to highlight features of the image used

FIGURE 1.9
Steps of enhancement of images in the frequency domain.



FIGURE 1.10
Enhancement outputs in the frequency domain. (I) The output of low pass filter: (A) Original
image, (B) Filtered image; (II) The output of high pass filter: (A) Original image, (B) Filtered
image; (III) The output of bandpass filter: (A) Original image, (B) Filtered image.

to make it more attractive to the viewer, but it is not essential in obtaining
representative data from a scientific point of view. With image enhancement
noise can be efficiently suppressed by losing some resolution, but this is not
satisfactory in many applications. Image restoration is useful in these cases.
Distorted pixels can be restored by the average value of the neighboring
pixels. Some outputs of the restored images are shown in Figure 1.11.
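As one hedged illustration of restoring a blurred image (panel I of Figure 1.11), the MATLAB sketch below simulates motion blur with a known point-spread function and removes it with Wiener deconvolution; the image, blur length, and blur angle are assumptions, and in real applications the point-spread function has to be estimated rather than being known exactly.

I   = im2double(imread('cameraman.tif'));
PSF = fspecial('motion', 21, 11);              % assumed motion-blur point-spread function
blurred  = imfilter(I, PSF, 'conv', 'circular');
restored = deconvwnr(blurred, PSF);            % Wiener deconvolution using the same PSF
imshowpair(blurred, restored, 'montage');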
1.2.4 Image Compression
Image compression can be described as the procedure of encoding data using
a method that decreases the overall size of the image [16]. This reduction of

FIGURE 1.11
Outputs of restored images. (I) Output of restoration of blurred image: (A) Blurred image,
(B) Restored image; (II) Noise reduction: (A) Noisy image, (B) Image after noise removal.


data can be done when the original dataset holds some type of redundancy.
Image compression is used to reduce the total number of bits needed to
characterize an image. This can be accomplished by removing different
types of redundancy that occur in the pixel values. Generally, three basic
redundancies occur in digital images: (1) psycho-visual redundancy, which
corresponds to intensity differences in the image signal that the human eye
barely perceives, so removing some of these less important intensities may not
be noticed; (2) interpixel redundancy, which corresponds to statistical
dependence among pixels, particularly between neighboring pixels; and (3)
coding redundancy, which occurs when every pixel of the image is coded
with a fixed-length code. There are many methods to deal with these aforementioned
redundancies. Compression methods can be classified into two categories:
lossy compression and lossless compression. Lossy compression can attain
high compression ratios such as 60:1 or higher as it permits some tolerable
degradation, whereas lossless compression attains only modest compression ratios,
around 2:1, because it must completely recover the original data. In applications
where image quality is the ultimate requirement, lossless compression
is used—such as in medical applications in which no degradation of the
original image data is permitted owing to the accuracy requirements for
diagnosis. Figure 1.12 shows the block diagram of lossy compression.
Lossy compression is basically a three-stage compression technique
to remove the three types of redundancies discussed above. First, a
transformation is applied to remove the interpixel redundancy to pack
information effectively. Then quantization is applied to eliminate psycho-visual redundancy to characterize the packed information with the fewest
bits. The quantized bits are then proficiently encoded to get much more
compression from the coding redundancy. Lossy decompression is a perfect
inverse technique of lossy compression.
Figure 1.13 shows the block diagram of lossless compression.
Lossless compression is usually a two-step compression technique. First,
transformation is applied to the original image to convert it to some other
format to reduce the interpixel redundancy. Then an entropy encoder is used

FIGURE 1.12
Block diagram of lossy compression.


FIGURE 1.13
Block diagram of lossless compression.

to eliminate the coding redundancy. Lossless decompression is a perfect
inverse technique of lossless compression.
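The practical difference between the two categories can be sketched in MATLAB by writing the same image once losslessly and once with strong lossy quantization; the file names and the JPEG quality setting are assumptions.

I = imread('peppers.png');
imwrite(I, 'lossless.png');               % lossless: the original data can be fully recovered
imwrite(I, 'lossy.jpg', 'Quality', 10);   % lossy: heavy quantization, much smaller file
png = dir('lossless.png');
jpg = dir('lossy.jpg');
fprintf('PNG: %d bytes, JPEG: %d bytes\n', png.bytes, jpg.bytes);
err = immse(imread('lossy.jpg'), I);      % nonzero error: some information was discarded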

1.3 Effect of Image Preprocessing on Image Recognition
Image preprocessing is used to enhance the image data so that useful features
can be extracted for image recognition. Image cropping is used to crop the
irrelevant parts from the image so that the region of interest of the image
is focused. Image morphological operations can be applied in some cases.
Image filtering is used to create new intensity values in the output image.
Smoothing methods are used to remove noise or other small irrelevant data
in the image [17]. Filters are also used to highlight the edges of an image.
Brightness and contrast of the image can also be adjusted to enhance the
useful features of the image [18]. The unwanted areas can be removed from
the binary image by using a polygonal mask. Images can also be transformed
to different color modes for extraction of different types of features. If the
whole scene is rotated, or the image is taken from the wrong perspective,
it is required to correct the geometry prior to feature extraction, as many
features are dependent on geometric variation [19]. Figure 1.14 shows that


FIGURE 1.14
(1A) Raw image, (1B) Extracted edge information from raw image, (2A) Preprocessed image,
and (2B) Extracted edge information from preprocessed image.



FIGURE 1.15
(A) Original gray image, (B) Binarized image, and (C) Separation of leaflets using morphological
erosion operation.


FIGURE 1.16
(A) The original image, (B) Binarized image, (C) Corrected orientation, (D) Corrected translation
factor, and (E) Preprocessed image after correcting the orientation and translation.

more edge information can be obtained from a preprocessed image than
from a raw image.
Suppose we have to count the number of leaflets from a compound leaf
image [20]. In this particular example, the preprocessing steps involve
binarization and some morphological operations. Figure 1.15 illustrates this.
To correct the orientation and translation factor, preprocessing can be applied
as shown in Figure 1.16.
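A minimal MATLAB sketch of the leaflet-counting example is given below; the file name, the assumption that leaflets are darker than the background, and the size of the structuring element are hypothetical and would have to be tuned to the actual leaf images.

rgb  = imread('compound_leaf.png');      % hypothetical compound-leaf image
gray = rgb2gray(rgb);
bw   = imcomplement(imbinarize(gray));   % binarization; leaflets assumed darker than background
se   = strel('disk', 7);                 % assumed structuring element size
sep  = imerode(bw, se);                  % erosion separates touching leaflets
cc   = bwconncomp(sep);
numLeaflets = cc.NumObjects;             % count the separated leaflets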

1.4 Summary
Image preprocessing is an enhancement of the image data to suppress unwanted
distortions or improve some image features vital for additional processing.
Preprocessing is generally used to correct the radiometric or geometric
errors, enhance the image, restore the image, and compress the image data.
Radiometric correction is used to correct for the irregular sensor reaction
over the entire image, and geometric correction is used to compensate for
the geometric misrepresentation due to different imaging settings such
as oblique viewing. Image enhancement is mainly used to adjust image
attributes to make them more appropriate for a given task. The goal of image
restoration methods is to decrease the noise effect or corruption from the


image and to recover lost resolution. Image compression is used to decrease
the overall size of the image. Image preprocessing is used to enhance the
image data so that useful features can be extracted for image recognition.

References
1. Chatterjee, S., Ghosh, S., Dawn, S., Hore, S., & Dey, N. 2016. Forest type
classification: A hybrid NN-GA model based approach. In Information Systems
Design and Intelligent Applications (pp. 227–236). Springer, New Delhi.
2. Santosh, K. C., & Nattee, C. 2009. A comprehensive survey on on-line
handwriting recognition technology and its real application to the Nepalese
natural handwriting. Kathmandu University Journal of Science, Engineering, and
Technology, 5(I), 31–55.
3. Hore, S., Chakroborty, S., Ashour, A. S., Dey, N., Ashour, A. S., Sifaki-Pistolla,
D., Bhattacharya, T., & Chaudhuri, S. R. 2015. Finding contours of hippocampus
brain cell using microscopic image analysis. Journal of Advanced Microscopy
Research, 10(2), 93–103.
4. Dey, N., Roy, A. B., Das, P., Das, A., & Chaudhuri, S. S. 2012, November. Detection
and measurement of arc of lumen calcification from intravascular ultrasound
using Harris corner detection. In Computing and Communication Systems (NCCCS),
2012 National Conference on (pp. 1–6). IEEE.
5. Santosh, K. C., Lamiroy, B., & Wendling, L. 2012. Symbol recognition using
spatial relations. Pattern Recognition Letters, 33(3), 331–341.
6. Dey, N., Ahmed, S. S., Chakraborty, S., Maji, P., Das, A., & Chaudhuri, S. S.
2017. Effect of trigonometric functions-based watermarking on blood vessel
extraction: An application in ophthalmology imaging. International Journal of
Embedded Systems, 9(1), 90–100.
7. Saha, M., Chaki, J., & Parekh, R. 2013. Fingerprint recognition using texture
features. International Journal of Science and Research, 2, 12.
8. Chakraborty, S., Mukherjee, A., Chatterjee, D., Maji, P., Acharjee, S., & Dey, N.
2014, December. A semi-automated system for optic nerve head segmentation
in digital retinal images. In Information Technology (ICIT), 2014 International
Conference on (pp. 112–117). IEEE.
9. Hossain, K., Chaki, J., & Parekh, R. 2014. Translation and retrieval of image
information to and from sound. International Journal of Computer Applications,
97(21), 24–29.
10. Dey, N., Roy, A. B., & Das, A. 2012, August. Detection and measurement of
bimalleolar fractures using Harris corner. In Proceedings of the International
Conference on Advances in Computing, Communications and Informatics (pp. 45–51).
ACM, Chennai, India.
11. Belaïd, A., Santosh, K. C., & d’Andecy, V. P. 2013. Handwritten and printed text
separation in real document. arXiv preprint arXiv:1303.4614.
12. Dey, N., Nandi, P., Barman, N., Das, D., & Chakraborty, S. 2012. A comparative
study between Moravec and Harris corner detection of noisy images using
adaptive wavelet thresholding technique. arXiv preprint arXiv:1209.1558.


13. Russ, J. C. 2016. The Image Processing handbook. CRC Press, Boca Raton, FL.
14. Araki, T., Ikeda, N., Dey, N., Acharjee, S., Molinari, F., Saba, L., Godia, E.,
Nicolaides, A., & Suri, J. S. 2015. Shape-based approach for coronary calcium
lesion volume measurement on intravascular ultrasound imaging and its
association with carotid intima-media thickness. Journal of Ultrasound in
Medicine, 34(3), 469–482.
15. Ashour, A. S., Samanta, S., Dey, N., Kausar, N., Abdessalemkaraa, W. B., &
Hassanien, A. E. 2015. Computed tomography image enhancement using
cuckoo search: A log transform based approach. Journal of Signal and Information
Processing, 6(03), 244.
16. Nandi, D., Ashour, A. S., Samanta, S., Chakraborty, S., Salem, M. A., & Dey, N.
2015. Principal component analysis in medical image processing: A study.
International Journal of Image Mining, 1(1), 65–86.
17. Hangarge, M., Santosh, K. C., Doddamani, S., & Pardeshi, R. 2013. Statistical
texture features based handwritten and printed text classification in South
Indian documents. arXiv preprint arXiv:1303.3087.
18. Chaki, J., Parekh, R., & Bhattacharya, S. 2016. Plant leaf recognition using ridge
filter and curvelet transform with neuro-fuzzy classifier. In Proceedings of 3rd
International Conference on Advanced Computing, Networking and Informatics
(pp. 37–44). Springer, New Delhi.
19. Chaki, J., Parekh, R., & Bhattacharya, S. 2015. Plant leaf recognition using
texture and shape features with neural classifiers. Pattern Recognition Letters,
58, 61–68.
20. Chaki, J., Parekh, R., & Bhattacharya, S. In press. Plant leaf classification using
multiple descriptors: A hierarchical approach. Journal of King Saud University – Computer and Information Sciences, doi:10.1016/j.jksuci.2018.01.007.

2
Pixel Brightness Transformation Techniques
Pixel brightness can be modified by using pixel brightness transformation
techniques. The transformation relies on the characteristics of the pixel itself. There are two
types of pixel brightness transformations: Position-dependent brightness
correction and grayscale transformation [1]. The position-dependent
brightness correction, or simply brightness correction, modifies the pixel
brightness value by considering the original brightness of the pixel and its
position in the image. Grayscale transformation modifies the brightness of
the pixel regardless of the position of the pixel.

2.1 Position-Dependent Brightness Correction
Ideally, the quality of acquisition and digitization of an image should not depend
on the pixel position in the image; however, in many practical cases this assumption
is not valid [2]. There are different reasons for the degradation of the image quality:
first, the uneven sensitivity of the light sensors such as CCD camera elements,
vacuum-tube cameras, and so on; second, the nonhomogeneous property of
the optical system, that is, the lens attenuates light more when it passes farther
from the optical axis; and last, the uneven object illumination. With
brightness correction, the systematic degradation can be suppressed. The
degradation relative to the ideal identity transfer function can be described by a
multiplicative error coefficient E(p, q). Let the original undegraded, or desired,
image be G(p, q) and the image containing degradation F(p, q):

F(p, q) = E(p, q) G(p, q).  (2.1)

If the reference image G(p, q) is captured with a known constant brightness
C, then the error coefficient E(p, q) can be obtained. If the image containing
degradation is Fc(p, q), the systematic brightness errors can be suppressed by
Equation 2.2:
G(p, q) = F(p, q)/E(p, q) = C × F(p, q)/Fc(p, q).  (2.2)

This technique can be adopted if the degradation process of the image is stable.
Periodic calibration is needed for the device to find the error coefficient E(p, q).

This method implicitly assumes linearity of the transformation [3]. But as the
brightness scale is restricted to some interval, this assumption does not hold exactly
in reality, and the corrected values in Equation 2.2 can overflow the brightness range.
This indicates that the best reference image has a brightness that is far from both the
minimum and maximum limits of the brightness scale. If the gray scale has 256 levels
of brightness, then the best reference image has a constant brightness level of 128.
Most TV cameras have automatic gain controllers that allow them to operate under
varying illumination settings. This automatic gain control should be switched off
first if systematic errors are to be suppressed using error coefficients.
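Equation 2.2 translates into a very short MATLAB sketch; the file names are hypothetical, Fc is assumed to be a calibration image captured at the known constant brightness C, and the final clipping step guards against the overflow discussed above.

F  = im2double(imread('degraded.tif'));    % image containing degradation, F(p, q)
Fc = im2double(imread('flatfield.tif'));   % reference image captured at constant brightness C
C  = 0.5;                                  % known constant brightness (mid-range, as recommended)
G  = C .* F ./ max(Fc, eps);               % Equation 2.2; eps avoids division by zero
G  = min(max(G, 0), 1);                    % clip to the valid brightness interval to avoid overflow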

2.2 Grayscale Transformations
This transformation is not dependent on the pixel position in the image
[4]. Here, an input image I is transformed into G by a transformation T.
Let the pixel values of I and G be represented as PI and PG,
respectively. So, the pixel values are related by Equation 2.3:

PG = T(PI).  (2.3)

Using Equation 2.3, the pixel value PI is mapped to PG by the transformation
function T. As we are dealing only with the grayscale transformation, the
output of this transformation is mapped to a grayscale range. The output is
mapped to the range [0, L − 1], where L = 2^m and m is the number of bits used
to represent each pixel. For example, the range of pixel values of an 8-bit image will be [0, 255].
The following are three basic gray level transformations used in image
enhancement:
• Linear transformation
• Logarithmic transformation
• Power-law transformation.
Figure 2.1 shows the plots of different grayscale transformation functions.
Here, L represents the number of gray levels. The identity and negative
transformation function plots are the types of linear transformations; log
and inverse log transformation function plots are the types of logarithmic
transformation plots, and nth root and nth power transformation function
plots are the types of power-law transformations.
2.2.1 Linear Transformation
There are two types of linear transformation: identity transformation and
negative transformation [5]. In identity transformation each pixel value of
the input image is directly mapped to the pixel value of the output image. So


FIGURE 2.1
Different grayscale transformation functions.

the result is the same in the input and output image. Hence, it is called identity
transformation. The graph of identity transformation is shown in Figure 2.2.
This particular graph shows that between the input and output image there
is a straight transition line. This represents that for each input pixel value,
the output pixel value will remain the same. So, here the output image is the
replica of the input image. Linear transformation can be used to convert a
color image into gray scale. Let I(p, q) be the input image and G(p, q) the output
image. Then the linear transformation can be represented by Equation 2.4:
G(p, q) = I(p, q).  (2.4)

FIGURE 2.2
Identity transformation plot.



FIGURE 2.3
(A) Input image, (B) Linear transformed image.

Figure 2.3 shows the input and output of linear transformation.
The second type of linear transformation is negative transformation
[6]. This is basically the inverse of the linear transformation. The negative
transformation of an image with gray levels within the range [0, L − 1] can be
obtained by subtracting each input pixel value from [L − 1] and mapping it
into the output image, which can be expressed by the Equation 2.5:
PG = L − 1 − PI.  (2.5)

This expression indicates the reversing of the gray level intensities of the
input pixels, therefore producing a negative image. The graph of negative
transformation is shown in Figure 2.4.

FIGURE 2.4
Negative transformation plot.



FIGURE 2.5
(A) Input image, (B) Negative transformed image.

This technique is beneficial for improving gray or white details embedded
in the dark regions of an image. Figure 2.5 shows the input and output of
negative transformation.
In the above example, the input image is an 8-bit image, so there are 256
levels of gray. Substituting L = 256 into Equation 2.5 gives Equation 2.6:

PG = 256 − 1 − PI = 255 − PI.  (2.6)

So, by applying the negative transformation, lighter pixels become dark and
darker pixels become light.
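A minimal MATLAB sketch of Equations 2.5 and 2.6 for an 8-bit image is shown below; the test image is an assumption.

I = imread('cameraman.tif');    % 8-bit grayscale image, so L = 256
G = 255 - I;                    % Equation 2.6: PG = L - 1 - PI
% The Image Processing Toolbox call imcomplement(I) gives the same result.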
2.2.2 Logarithmic Transformation
This transformation can be used to brighten the intensity of a pixel in an
image [7]. There are various reasons to work with logarithmic intensities
rather than with the actual pixel intensity: the logged intensity values are
comparatively less dependent on the magnitude of the pixel values, the
skewness of highly skewed values is reduced when taking logs,
and the variance estimation improves when using logarithmic values.
Visual inspection of the data also becomes easier with logged intensities.
The raw data are frequently severely clumped together at low intensities,
followed by a very long tail. Over 75% of the image information may lie in
the lowest 10% of intensity values. The details of such parts are difficult to
recognize. After the logarithmic transformation, the change of intensity
information is spread out more equally making it simpler to analyze. There
are two types of logarithmic transformation: log transformation and inverse
log transformation. The graph for log and inverse log transformation is
shown in Figure 2.6.


FIGURE 2.6
Logarithmic transformation plot.

The log transformation is used to brighten or increase the detail of the
lower intensity values of an image. This can be expressed by the Equation 2.7:
PG = c × log(PI + 1),  (2.7)

where c is a constant that is normally used to scale the range of the log
transformation function to match the input range. For an 8-bit image, c = 255/
log(1 + 255). It can also be used to further increase the contrast—the higher
the c, the brighter the image will appear.
The value 1 is added to every pixel value of the input image because, if a
pixel intensity of 0 is present, log(0) is undefined. Adding 1 ensures that the
minimum value passed to the logarithm is no less than 1.
During log transformation, the range of dark pixels in an image is expanded,
while the higher pixel values are somewhat compressed. This accounts for the
improvement of the image.
Figure 2.7 demonstrates the outcomes of log transformation of the original
image. We can see that when c  =  4, the image is the brightest and the

FIGURE 2.7
Results of log transformation.


outspread lines are visible within the tree. These lines are not visible in the
original image, as there isn’t sufficient contrast in the lower intensities.
Inverse log transformation is the opposite of the log transformation. It expands
bright regions and compresses the darker intensity level values.
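The log transformation of Equation 2.7 can be sketched in MATLAB as follows; the test image is an assumption, and c is chosen here to map the output back onto the full 8-bit range.

I = imread('cameraman.tif');          % 8-bit grayscale image
c = 255 / log(1 + 255);               % scales the output to [0, 255]
G = uint8(c * log(1 + double(I)));    % Equation 2.7: PG = c * log(PI + 1)
% Larger values of c brighten the image further but may saturate the high intensities.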
2.2.3 Power-Law Transformation
This transformation is used to increase the contrast of the image [8]. There
are two types of power-law transformations: n-th power and n-th root
transformation. These transformations can be expressed by Equation 2.8:
PG = C × PI^γ.  (2.8)

The symbol γ is called gamma and this transformation is also called
gamma correction. For different values of γ, various levels of enhancement of
the image can be obtained. The graph of power-law transformation is shown
in Figure 2.8.
Different monitors or display devices have their own gamma, which is why
they display the same image at various intensities. This sort of
transformation is therefore used to adapt images to various kinds of monitors.
For instance, the gamma of typical monitors lies between 1.8 and 2.5, which implies
that the image displayed on the monitor appears dark. The same image with
different γ values is shown in Figure 2.9.
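A minimal MATLAB sketch of the power-law transformation (Equation 2.8) is shown below; the test image and the values C = 1 and γ = 0.5 are assumptions (γ < 1 brightens the image, γ > 1 darkens it).

I = im2double(imread('cameraman.tif'));  % work with intensities in [0, 1]
C = 1;
g = 0.5;                                 % assumed gamma value (the exponent γ)
G = C * I .^ g;                          % Equation 2.8: PG = C * PI^γ
% The toolbox call imadjust(I, [], [], g) applies the same correction.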
Digital images have a finite number of gray levels [9]. Thus, grayscale
transformations can be implemented using look-up tables. Grayscale
transformations are mostly used if the outcome is seen by a human. One
way to improve the contrast of the image is contrast stretching (also known
as normalization) [10]. Contrast stretching is a linear normalization that
expands an arbitrary interval of the intensities of an image and fits this

FIGURE 2.8
Power-law transformation plot.


FIGURE 2.9
Results of Gamma variation where C  =  2.

interval to another arbitrary interval. The initial step is to decide the limits
over which image intensity values will be expanded. These lower and upper
limits will be known as p and q, respectively. For standard 8-bit grayscale
images, these limits are normally 0 and 255. Next, the histogram of the input
or original image is examined to determine the actual value limits (lower = a,
upper = b) in the unmodified image. If the input image already covers the entire
possible set of values, direct contrast stretching will achieve nothing; but
even then, sometimes the majority of the picture information is contained
within a restricted range. This restricted range can be extended linearly, with
original values that lie outside the range being set to the appropriate limit
of the extended output range. Then for every pixel, the original value PI is
mapped to output PG by using Equation 2.9:

PG = (PI − a) × [(q − p)/(b − a)] + p.    (2.9)

Figure 2.10 shows the result after contrast stretching. In contrast stretching,
there exists a one-to-one relationship of the intensity values between the
original or input image and the output image; that is, after contrast stretching
the input image can be restored from the output image.
Another transformation for contrast improvement is usually applied
automatically using histogram equalization, which is a nonlinear
normalization, expanding the intensity ranges where the histogram counts are high
and compressing the ranges where the counts are low [11]. The point is to obtain
an image with equally distributed brightness levels over the whole brightness
scale. Histogram equalization improves contrast for brightness values close


FIGURE 2.10
Contrast stretching results.

to histogram maxima, and decreases contrast near the minima. Figure 2.11
shows the result after histogram equalization. Once histogram equalization
is executed, there is no technique for getting back the original image.
Let the input histogram be denoted by Hp where p0 ≤ p ≤ pt. The intention
is to find a monotonic transform of grayscale q = T(p), for which the output
histogram Gq will remain uniform for the whole input brightness domain,
where q0 ≤ q ≤ qt. This monotonic property of T can be expressed by
Equation 2.10:
∑_{k=0}^{t} Gqk = ∑_{k=0}^{t} Hpk.    (2.10)

The equalized histogram Gqk corresponds to a uniform distribution function
F whose value is constant and can be expressed by Equation 2.11 for a N × N
image,
F = N²/(qt − q0).    (2.11)

In the continuous case, the ideal continuous histogram is available and can
be expressed by Equation 2.12:
∫_{q0}^{q} G(s) ds = ∫_{p0}^{p} H(s) ds.    (2.12)

FIGURE 2.11
Histogram equalization result.


Substituting Equation 2.11 in Equation 2.12 we get
N² ∫_{q0}^{q} 1/(qt − q0) ds = ∫_{p0}^{p} H(s) ds

N²(q − q0)/(qt − q0) = ∫_{p0}^{p} H(s) ds    (2.13)

q = T(p) = [(qt − q0)/N²] ∫_{p0}^{p} H(s) ds + q0.

For discrete case, this is called cumulative histogram, which is approximated
by the sum in the digital images and can be expressed by Equation 2.14:
q = T(p) = [(qt − q0)/N²] ∑_{k=p0}^{p} H(k) + q0.    (2.14)

Histogram specification, or histogram matching, can also be used to enhance
the contrast of an image [12]. Histogram specification, or histogram matching,
is a method that changes the histogram of one image into the histogram of
another image. The change can be done easily by noting that, instead of
using an equally distributed ideal histogram (as in histogram equalization),
the target histogram is specified explicitly. By this method, it is possible to impose
the histogram of one image onto another. First, choose the template
histogram. This can be done by determining a specific histogram shape, or
by calculating the histogram of a target image. Then, the histogram of the
image to be transformed is calculated. Afterwards, calculate the cumulative
aggregate of the template histogram. Then, calculate the cumulative aggregate
of the histogram of the image to be changed. Finally, map pixels from one
bin to another bin, as per the guidelines of histogram equalization. The
essential rule is that the actual cumulative aggregate cannot be less than the
cumulative aggregate of the template image. Figure 2.12 shows the result of
histogram specification.

FIGURE 2.12
Result of histogram specification.


2.3 Summary
In image preprocessing, image information captured by sensors on a satellite
contains faults associated with geometry and brightness information of the
pixels. These errors are improved using suitable techniques. Image enhancement
is the adjustment of an image by altering the pixel brightness values to enhance
its visual effect. Image enhancement includes a collection of methods used to
improve the visual presence of an image, or to alter the image to a form better
matched for human or machine understanding. This chapter describes the image
enhancement methods by using pixel brightness transformation techniques.
Two types of pixel brightness transformation techniques are discussed in this
chapter: position dependent and independent, or grayscale transformation. The
position-dependent brightness correction modifies the pixel brightness value by
considering the original brightness of the pixel and its position in the image. But,
grayscale transformation alters the brightness of the pixel regardless of the position
of the pixel. There are different variations in gray level transformation techniques:
linear, logarithmic, and power-law. The identity transformation, which is a type
of linear transformation, is mainly used to convert the color image into gray
scale. The second type of linear transformation, that is, negative transformation
can be used to enhance the gray or white details embedded into the dark region
of the image. By using this transformation lighter pixels become dark and
darker pixels become light. The logarithmic transformation is used to brighten
the intensity of a pixel in an image. The log transformation, which is a type of
logarithmic transformation, is used to brighten or increase the detail of the lower
intensity values of an image. The second type of logarithmic transformation,
that is, inverse log transformation, is the opposite of the log transformation. Power-law transformation, also known as gamma correction transformation, is used to
increase the contrast of an image. For different values of gamma, various levels
of enhancement of the image can be obtained. This sort of transformation is
used for improving images for various kinds of monitors. To enhance the image
contrast, different types of methods can be adopted like contrast stretching,
histogram equalization, and histogram specification. Contrast stretching is a
linear transformation, and the original image can be retrieved from the contrast-stretched image. Histogram equalization is a nonlinear transformation and
doesn’t allow for the retrieval of the original image from the histogram-equalized
image. In case of histogram specification, the histogram of a template image can
be applied to the input image to enhance the contrast of the input image.

References
1. Umbaugh, S. E. 2016. Digital Image Processing and Analysis: Human and Computer
Vision Applications with CVIPtools. CRC Press, Boca Raton, FL.
2. Russ, J. C. 2016. The Image Processing Handbook. CRC Press, Boca Raton, FL.


3. Saba, L., Dey, N., Ashour, A. S., Samanta, S., Nath, S. S., Chakraborty, S.,
Sanches, J., Kumar, D., Marinho, R., & Suri, J. S. 2016. Automated stratification of
liver disease in ultrasound: An online accurate feature classification paradigm.
Computer Methods and Programs in Biomedicine, 130, 118–134.
4. Chaki, J., Parekh, R., & Bhattacharya, S. In press. Plant leaf classification using
multiple descriptors: A hierarchical approach. Journal of King Saud University-Computer and Information Sciences, doi:10.1016/j.jksuci.2018.01.007.
5. Bhattacharya, T., Dey, N., & Chaudhuri, S. R. 2012. A session based multiple
image hiding technique using DWT and DCT. arXiv preprint arXiv:1208.0950.
6. Kotyk, T., Ashour, A. S., Chakraborty, S., Dey, N., & Balas, V. E. 2015. Apoptosis
analysis in classification paradigm: A neural network based approach. In Healthy
World Conference—A Healthy World for a Happy Life (pp. 17–22). Kakinada (AP), India.
7. Ashour, A. S., Samanta, S., Dey, N., Kausar, N., Abdessalemkaraa, W. B., &
Hassanien, A. E. 2015. Computed tomography image enhancement using
cuckoo search: A log transform based approach. Journal of Signal and Information
Processing, 6(03), 244.
8. Francisco, L., & Campos, C. 2017, October. Learning digital image processing
concepts with simple scilab graphical user interfaces. In European Congress on
Computational Methods in Applied Sciences and Engineering (pp. 548–559). Springer,
Cham.
9. Chakraborty, S., Chatterjee, S., Ashour, A. S., Mali, K., & Dey, N. 2018. Intelligent
computing in medical imaging: A study. In Advancements in Applied Metaheuristic
Computing (pp. 143–163). IGI Global, Hershey, Pennsylvania.
10. Negi, S. S., & Bhandari, Y. S. 2014, May. A hybrid approach to image enhancement
using contrast stretching on image sharpening and the analysis of various
cases arising using histogram. In Recent Advances and Innovations in Engineering
(ICRAIE), 2014 (pp. 1–6). IEEE.
11. Dey, N., Roy, A. B., Pal, M., & Das, A. 2012. FCM based blood vessel segmentation
method for retinal images. arXiv preprint arXiv:1209.1181.
12. Wegner, D., & Repasi, E. 2016, May. Image based performance analysis of thermal
imagers. In Infrared Imaging Systems: Design, Analysis, Modeling, and Testing
XXVII (Vol. 9820, p. 982016). International Society for Optics and Photonics.

3
Geometric Transformation Techniques
Geometric transformations allow the removal of geometric distortion that
happens when an image is captured. For example, one may want to match images
of the same location taken one year apart, where the later image was perhaps
not taken from exactly the same position. To assess the changes over the
year, it is first necessary to perform a geometric transformation and
then subtract one image from the other. Geometric transformations are
often required where the digitized image may be misaligned [1].
There are two basic steps in geometric transformations:
• Pixel coordinate transformation or spatial transformation
• Brightness interpolation.

3.1 Pixel Coordinate Transformation or Spatial Transformation
Pixel coordinate transformation or spatial transformation of an image is a
geometric transformation of the image coordinate system, that is, the mapping
of one coordinate system onto another. This is characterized by spatial
transformation methods, which are mapping functions that build up a spatial
correspondence between every point in the input and output images. Each
point in the output adopts the value of its equivalent point in the input image
[2]. The correspondence is established via the spatial transformation mapping
function to assign the output point onto the input image. It is frequently
required to do a spatial transformation to (1) align images captured with
different types of sensors or at different times, (2) correct the image distortion
caused by the lens and camera orientations, and (3) image morphing or other
special effects and so on [3].
An input image comprises known coordinate reference points. The output
image consists of the distorted data. The general mapping function can either
relate the output coordinate system to that of the input, or vice versa. Let
G(x′, y′) denote the input or original image, and I(x, y) be the deformed (or
distorted) image. We can relate corresponding pixels in the two images by
Equation 3.1:
I  G.

(3.1)

FIGURE 3.1
T: Forward mapping; T−1: Inverse mapping.

Two types of mapping can be done here:
• Forward Mapping: Map pixels of input image onto output image,
which can be represented by Equation 3.2:
G(x′, y′) = I(x, y).    (3.2)

• Inverse Mapping: Map pixels of output image onto the input image,
which can be represented by Equation 3.3:
I(x, y) = G(x′, y′).    (3.3)

General mapping example is shown in Figure 3.1.
3.1.1 Simple Mapping Techniques
Translation: Translation means moving the image from one position to another
[4]. Let the translation amount in the x and y-direction be tx and ty respectively.
Translation can be defined by the Equation 3.4:
x′ = x + tx,  y′ = y + ty

or, in matrix form,

[x′  y′]ᵀ = [x  y]ᵀ + [tx  ty]ᵀ.    (3.4)
Translation of a geometric shape, as well as an image, is shown in
Figure 3.2.
Scaling: Scaling means stretching or contracting an image based on some
scaling factors [5,6,7]. Let, sx and sy be the scaling factor in the x and y-direction.
Scaling can be defined by Equation 3.5:


(A)

(B)

(C)

(D)

FIGURE 3.2
(A) Original position of the rectangle, (B) Final position of the rectangle after translation,
(C) Original position of an image, and (D) Final position of an image after translation.

x′ = x · sx,  y′ = y · sy

or, in matrix form,

[x′  y′]ᵀ = [[sx, 0], [0, sy]] · [x  y]ᵀ,    (3.5)

sx > 1 represents stretching, sx < 1 represents contracting or shrinking, and
sx = 1 means that the size will remain the same.
Scaling of a geometric shape, as well as an image, is shown in Figure 3.3.
Rotation: Rotation means [5,6,7] to change the orientation of an image by an
angle of θ, which is defined by Equation 3.6:
x′ = x · cos(θ) − y · sin(θ),  y′ = x · sin(θ) + y · cos(θ)

or, in matrix form,

[x′  y′]ᵀ = [[cos(θ), −sin(θ)], [sin(θ), cos(θ)]] · [x  y]ᵀ.    (3.6)

Rotation of a geometric shape as well as of an image is shown in Figure 3.4.

(A)

(B)

(C)

(D)

FIGURE 3.3
(A) The original size of the rectangle, (B) Modified size of the rectangle, (C) Original size of the
image, and (D) Modified size of the image.


(A)

(B)

(C)

(D)

FIGURE 3.4
(A) Original orientation of rectangle, (B) Modified orientation of rectangle, (C) Original
orientation of image, and (D) Modified orientation of image.

Shearing: Images can be sheared along horizontal and vertical direction
[8]. For horizontal shears, pixels are relocated horizontally by a distance
increasing linearly with the (vertical) distance from the horizontal line,
moving to the right above the line and to the left below the line for positive
angles. Likewise, for vertical shear, pixels are relocated vertically by a distance
that increases linearly with the (horizontal) distance from the vertical line,
moving downward to the right of the line, and upward to the left of the
line for positive angles. Let, Shx and Shy be the shear amount in the x and
y-direction. Shear can be represented by Equation 3.7:
x′ = x + Shx · y,  y′ = y + Shy · x

or, in matrix form,

[x′  y′]ᵀ = [[1, Shx], [Shy, 1]] · [x  y]ᵀ.    (3.7)
Shear of a geometric shape, as well as an image, is shown in Figure 3.5.

(A)

(B)

(C)

(D)

FIGURE 3.5
(A) Original rectangle, (B) Horizontal sheared rectangle, (C) Original image, and (D) Horizontal
sheared image.


3.1.2 Affine Mapping
All possible simple mapping or transformations are special cases of affine
mapping [9,10]. The affine transformation is the combination of simple
transformations. Affine mapping is a linear mapping method, which
conserves straight lines, planes, and points. Sets of parallel lines stay parallel
after an affine transformation.
The overall affine transformation is normally written in homogeneous
coordinates, as shown in Equation 3.8:

[x′  y′]ᵀ = P × [x  y]ᵀ + Q.    (3.8)

By defining only the Q matrix, this transformation turns to a pure translation
transformation, as shown in Equation 3.9:
1
P = 
0

t x 
0
 , Q =   .
ty 
1

(3.9)

By defining only the P matrix, this transformation turns into a pure rotation
transformation (for positive or clockwise rotation), as shown in Equation 3.10:
P = [[cos(θ), −sin(θ)], [sin(θ), cos(θ)]],   Q = [0  0]ᵀ.    (3.10)

Similarly, pure scaling can be defined by Equation 3.11:

P = [[sx, 0], [0, sy]],   Q = [0  0]ᵀ.    (3.11)

Since the general affine transformation is characterized by six constants, it
is conceivable to express this transformation by determining the new output
image locations (x′, y′) of any three input image coordinate (x, y) pairs. In
general, several points are estimated and a least squares technique is used to
find the best-fitting transform.
3.1.3 Nonlinear Mapping
Twirl: In case of twirl, rather than using image color at (x′, y′), use image
colors at twirled (x, y) position [11]. Rotate or turn the image by an angle
θ at the anchor point or center (xc, yc). Progressively, turn the image as the
spiral distance S from the center increases up to Smax. The image remains


FIGURE 3.6
Twirl effect of an image.

unaffected outside of the radial distance Smax. Twirl can be defined by
Equation 3.12:
Dx = x′ − xc,  Dy = y′ − yc
S = √(Dx² + Dy²)
α = arctan(Dy, Dx) + θ · (Smax − S)/Smax    (3.12)
x = xc + S · cos(α) if S ≤ Smax,  x = x′ if S > Smax
y = yc + S · sin(α) if S ≤ Smax,  y = y′ if S > Smax.

The twirl effect of an image is shown in Figure 3.6.
Ripple: Ripple effects are like wave patterns, which are introduced in the
image along both the x and y-directions [12]. Let the amplitude of the wave
pattern in the x and y-direction is defined as Ax and Ay, respectively, and
the frequency of the wave in the x and y-direction is defined as Fx and Fy,
respectively. So, this effect can be expressed by the sinusoidal function, as
shown in Equation 3.13:

x = x′ + Ax · sin(2π · y′/Fx)
y = y′ + Ay · sin(2π · x′/Fy).    (3.13)


(A)

(B)

FIGURE 3.7
(A) Original image, (B) Ripple effect.

The ripple effect of an image is shown in Figure 3.7.
Spherical Transformation: This transformation zooms in the center of the
image. Let the center of the lens be (xc, yc), Lmax the lens radius, and τ is the
refraction index [13]. The spherical transformation is defined by Equation 3.14:
Dx = x′ − xc,  Dy = y′ − yc
S = √(Dx² + Dy²)
Z = √(L²max − S²)
αx = (1 − 1/τ) · sin⁻¹(Dx/√(Dx² + Z²))
αy = (1 − 1/τ) · sin⁻¹(Dy/√(Dy² + Z²))    (3.14)
x = x′ − Z · tan(αx) if S ≤ Lmax,  x = x′ if S > Lmax
y = y′ − Z · tan(αy) if S ≤ Lmax,  y = y′ if S > Lmax.

The spherical transformation effect of an image is shown in Figure 3.8.

3.2 Brightness Interpolation
New pixel coordinates were found after the geometric transformation has
been performed [14]. The location of the new coordinate point usually does


(A)

(B)

(C)

(D)

FIGURE 3.8
(A) Original graph, (B) Spherical effect of graph, (C) Original image, and (D) Spherical effect
of image.

not fall exactly on the discrete raster of the output image. Integer grid values are
required. Every pixel value in the output raster image can be obtained by
brightness interpolation of some noninteger neighboring samples. The
brightness interpolation is generally done by defining the brightness of
the original pixel in the input image that resembles the pixel in the output
discrete raster image. Interpolation is used when we need to estimate the
value of an unknown pixel by using some known data.
3.2.1 Nearest Neighbor Interpolation
This is the simplest interpolation approach [15]. This technique basically
determines the nearest neighboring pixel value and adopts its intensity value,
as shown in Figure 3.9.
Consider the following example (Figure 3.10).
Figure 3.10 shows that the 2D input matrix is 3 × 3 and it is interpolated
to 6 × 6. First, we must find the ratio of the input and output matrix size, as
shown in Equation 3.15:
Rrow = 3/6,  Rcol = 3/6.    (3.15)

FIGURE 3.9
Black pixels: Original pixels, Red pixels: Interpolated pixels.


FIGURE 3.10
Nearest neighbor interpolation output.

Then, based on the output matrix size the row-wise and column-wise pixel
positions are normalized.

Row position = [1 2 3 4 5 6] × Rrow = [0.5 1 1.5 2 2.5 3] → [1 1 2 2 3 3]
Col position = [1 2 3 4 5 6] × Rcol = [0.5 1 1.5 2 2.5 3] → [1 1 2 2 3 3].    (3.16)

After that, the row-wise interpolation is performed on all columns. The
output of the first column after interpolation is shown in Figure 3.11.
The row-wise interpolation output is shown below:
5

5

6

6

7


7

8
8
9
9
4
4

10

10
11
11
12

12

FIGURE 3.11
Row-wise interpolation of the first column of the input matrix.


FIGURE 3.12
Nearest neighbor interpolation of an image.

Similarly, the column-wise interpolation for all rows is shown below:
5

6

7


5
6
7

8
9
4

8
9
4

10
11
12

10 

11 
12

The final nearest neighbor interpolated output matrix is shown below:
5

5

6

6

7


 8

5
5
6
6
7
7

8
8
9
9
4
4

8
8
9
9
4
4

10
10
11
11
12
12

10

10
11
11
12

12

The nearest neighbor interpolation output of an image is shown in Figure 3.12.
The position error of the nearest neighborhood interpolation is at most half
a pixel. This error is perceptible on objects with straight-line boundaries,
which may appear step-like after the transformation.
In nearest neighbor interpolation, each nearby pixel has similar
characteristics, hence, it becomes easier to add or remove the pixels as per
requirement. The major drawback of this method is unwanted artifacts, like
the sharpening of edges that may appear in an image while resizing, hence,
it is generally not preferred.
3.2.2 Bilinear Interpolation
This type of interpolation searches four neighboring points of the interpolated
point (x, y), as shown in Figure 3.13, and assumes that the brightness function
is linear in this neighborhood [16].


FIGURE 3.13
Black pixels: Original pixels, Red pixel: Interpolated pixel.

Consider a discrete image function. The black circles represent the
known pixels of the image I, and the red circle lies off the known
samples. This interpolation is not linear but the product of two linear
functions. If the interpolated point lies on one of the edges of the cell
[(I(p, q) → I(p + 1, q)), (I(p + 1, q) → I(p + 1, q + 1)), (I(p + 1, q + 1) → I(p,
q + 1)), (I(p, q + 1) → I(p, q))], the function becomes linear. Otherwise, the
bilinear interpolation function is quadratic. The interpolated value of
(x, y) is considered as a linear combination of four known sample values,
that is, I(p, q), I(p + 1, q), I(p + 1, q + 1), and I(p, q + 1). The influence of each
sample depends on its proximity to the interpolated point in the linear
combination, as shown in Equation 3.17:

I(x, y) = (1 − tx) · (1 − ty) · I(p, q) + tx · (1 − ty) · I(p + 1, q)
        + ty · (1 − tx) · I(p, q + 1) + tx · ty · I(p + 1, q + 1),    (3.17)

where tx = x − p and ty = y − q.
A minor reduction in resolution and blurring can happen while using
bilinear interpolation due to its averaging nature. The problem of step-like
straight boundaries with the nearest neighborhood interpolation is reduced
when using bilinear interpolation. The main advantage of using bilinear
interpolation is that it is fast and simple to implement.
3.2.3 Bicubic Interpolation
Bicubic interpolation improves the model of the brightness function by using
sixteen neighboring points [17]. This interpolation fits a series of cubic polynomials
to the brightness values contained in the 4 × 4 array of pixels surrounding the
calculated address. First, interpolation is done along the x-direction using the
16 grid samples (black), as shown in Figure 3.14. Then, interpolation is done
along the other dimension (blue line) by using the interpolated pixels from
the previous step.


(B)

(A)

(C)

FIGURE 3.14
(A) known pixel (black) and interpolated pixel (red), (B) x-direction interpolation, (C) y-direction
interpolation.

(A)

(B)

(C)

(D)

FIGURE 3.15
(A) Original image, (B) Nearest-neighbor interpolation, (C) Bilinear interpolation, and
(D) Bicubic interpolation.

Bicubic interpolation does not suffer from the step-like boundary problem
of nearest neighborhood interpolation and copes with linear interpolation
blurring as well. Bicubic interpolation is often used in raster displays
that enable zooming with respect to an arbitrary point—if the nearest
neighborhood method were used, areas of the same brightness would
increase. Bicubic interpolation preserves fine details in the image very well.
The comparison between nearest-neighbor, bilinear, and bicubic
interpolation is shown in Figure 3.15.

3.3 Summary
Geometric transformation is actually the rearrangement of pixels of the
image. Coordinates of the input image are transformed into the coordinates
of the output image using some transformation function. The output
pixel intensity of a specified pixel position may not depend on the pixel
intensity of that particular input pixel, but is dependent on the position as
specified in the transformation matrix. There are two types of geometric


transformation: pixel coordinate transformation and brightness interpolation.
Pixel coordinate transformation, or spatial transformation, of an image
is a geometric transformation of the image coordinate system, that is, the
mapping of one coordinate system onto another. Mapping can be forward
(map pixels of an input image onto an output image) or backward (map pixels
of an output image onto an input image). This type of transformation involves
some linear mapping like translation, scaling, rotation, shearing, and affine
transformation. Nonlinear mapping involves twirl, ripple, and spherical
transformation. The brightness interpolation is generally done by defining
the brightness of the original pixel in the input image that resembles the pixel
in the output discrete raster image. Brightness interpolation involves nearest
neighbor interpolation, bilinear interpolation, and bicubic interpolation.

References
1. Candemir, S., Borovikov, E., Santosh, K. C., Antani, S., & Thoma, G. 2015. Rsilc:
Rotation-and scale-invariant, line-based color-aware descriptor. Image and Vision
Computing, 42, 1–12.
2. Chaki, J., Parekh, R., & Bhattacharya, S. 2015. Plant leaf recognition using texture
and shape features with neural classifiers. Pattern Recognition Letters, 58, 61–68.
3. Gonzalez, R. C., Woods, R. E. 2016. Digital Image Processing 3rd edition,
Prentice-Hall, New Jersey. ISBN-9789332570320, 9332570329.
4. Chaki, J., Parekh, R., & Bhattacharya, S. In press. Plant leaf classification using
multiple descriptors: A hierarchical approach. Journal of King Saud University-Computer and Information Sciences, doi:10.1016/j.jksuci.2018.01.007.
5. Li, J., Yu, C., Gupta, B. B., & Ren, X. 2018. Color image watermarking scheme based
on quaternion Hadamard transform and Schur decomposition. Multimedia Tools
and Applications, 77(4), 4545–4561.
6. Chakraborty, S., Chatterjee, S., Ashour, A. S., Mali, K., & Dey, N. 2018. Intelligent
computing in medical imaging: A study. In Advancements in Applied Metaheuristic
Computing (pp. 143–163). IGI Global.
7. Chaki, J., Parekh, R., & Bhattacharya, S. 2016, December. Recognition of plant
leaves with major fragmentation. In Computational Science and Engineering:
Proceedings of the International Conference on Computational Science and
Engineering (Beliaghata, Kolkata, India, October 4–6, 2016) (p. 111). CRC Press,
Boca Raton, FL.
8. Chaki, J., Parekh, R., & Bhattacharya, S. 2015, July. Recognition of whole and
deformed plant leaves using statistical shape features and neuro-fuzzy classifier.
In Recent Trends in Information Systems (ReTIS), 2015 IEEE 2nd International
Conference on (pp. 189–194). IEEE, Kolkata, India.
9. Vučković, V., Arizanović, B., & Le Blond, S. 2018. Ultra-fast basic geometrical
transformations on linear image data structure. Expert Systems with Applications,
91, 322–346.


10. Santosh, K. C., Lamiroy, B., & Wendling, L. 2011, August. DTW for matching
radon features: A pattern recognition and retrieval method. In International
Conference on Advanced Concepts for Intelligent Vision Systems (pp. 249–260).
Springer, Berlin, Heidelberg.
11. Sonka, M., Hlavac, V., & Boyle, R. 2014. Image Processing, Analysis, and Machine
Vision. Cengage Learning, Stamford, USA.
12. Fu, K. S. 2018. Special Computer Architectures for Pattern Processing. CRC Press,
Boca Raton, FL.
13. Gilliam, C., & Blu, T. 2018. Local all-pass geometric deformations. IEEE
Transactions on Image Processing, 27(2), 1010–1025.
14. Chaki, J., Parekh, R., & Bhattacharya, S. 2016. Plant leaf recognition using ridge
filter and curvelet transform with neuro-fuzzy classifier. In Proceedings of 3rd
International Conference on Advanced Computing, Networking and Informatics (pp.
37–44). Springer, New Delhi.
15. Jiang, N., & Wang, L. 2015. Quantum image scaling using nearest neighbor
interpolation. Quantum Information Processing, 14(5), 1559–1571.
16. Wegner, D., & Repasi, E. 2016, May. Image based performance analysis of thermal
imagers. In Infrared Imaging Systems: Design, Analysis, Modeling, and Testing XXVII
(Vol. 9820, p. 982016). International Society for Optics and Photonics, Baltimore,
Maryland, United States.
17. Dong, C., Loy, C. C., He, K., & Tang, X. 2016. Image super-resolution using
deep convolutional networks. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 38(2), 295–307.

4
Filtering Techniques
Filtering is a method for enhancing or altering an image [1]. There are mainly
two types of filtering:
• Spatial Filtering
• Frequency Filtering

4.1 Spatial Filter
In spatial filtering, the processed pixel value for the existing pixel is
dependent on both itself and neighboring pixels [2]. Therefore, spatial
filtering is a neighboring procedure, where the value of any particular pixel
in the output image is calculated by applying some algorithm to the values
of the neighboring pixels of the corresponding input pixel [3]. A pixel’s
neighborhood is defined by a set of surrounding pixels relative to that pixel.
Some types of spatial filtering are discussed below.
4.1.1 Linear Filter (Convolution)
The result of linear filtering [4] is the summation of products of the mask
coefficients with the corresponding pixels directly beneath the mask, as shown in
Figure 4.1.
Linear filtering can be expressed by Equation 4.1:
I ( x , y ) = [ M(−1, 1) * I ( x − 1, y + 1)]
+ [ M(0, 1) * I ( x , y + 1)] + [ M(1, 1) * I ( x + 1, y + 1)]
+ [ M(−1, 0) * I ( x − 1, y )] + [ M(0, 0) * I ( x , y )]
+ [ M(1, 0) * I ( x + 1, y )] + [ M(−1, −1) * I ( x − 1, y − 1)]
+ [ M(0, −1) * I ( x , y − 1)] + [ M(1, −1) * I ( x + 1, y − 1)].

(4.1)

The mask coefficient M(0, 0) overlaps with image pixel value I(x, y),
representing that the mask center is at (x, y) when the calculation of the sum


(A)

(B)

FIGURE 4.1
(A) I: Image pixel positions and M: Mask Coefficients, (B) Mask of image pixels.

of products occurred. For a mask of size p × q, p and q are odd numbers
and represented as p = 2m + 1, q = 2n + 1, where m and n are nonnegative
integers. Linear filtering of an image I with a filter mask of size
p × q is given by Equation 4.2:
LF(x, y) = ∑_{a=−m}^{m} ∑_{b=−n}^{n} M(a, b) · I(x + a, y + b).    (4.2)

4.1.2 Nonlinear Filter
Nonlinear spatial filtering also works on neighborhoods, as discussed in the
case of linear filtering [5]. The only difference is that nonlinear filtering is
based conditionally on the values of the neighboring pixels of the pixel under consideration.
4.1.3 Smoothing Filter
Smoothing filters are mainly used to reduce noise of an image and for blurring
[6,7]. Blurring is used to remove unimportant information from an image
prior to feature extraction, and is used to connect small breaks in curves or
lines. Blurring is also used to reduce noise from an image. A smoothing filter
is also useful for highlighting gross details. Two types of smoothing spatial
filters exist:
• Smoothing Linear Filters
• Order-Statistics Filters
A smoothing linear filter is basically the mean of the neighborhood pixels
of the filter mask. Therefore, this filter is sometimes called “mean filter” or
“averaging filter.” The concept entails substituting the value of every single
pixel in an image with the mean of the neighborhood pixels defined by the
filter mask. Figure 4.2 shows a 3 × 3 standard mean and weighted mean
smoothing linear filter:


(A)

(B)

FIGURE 4.2
(A) Standard mean smoothing linear filter, (B) Weighted mean smoothing linear filter.

Filtering an image I with a weighted averaging filter of size m × n
is given by Equation 4.3:

SF(x, y) = [∑_{a=−m}^{m} ∑_{b=−n}^{n} M(a, b) · I(x + a, y + b)] / [∑_{a=−m}^{m} ∑_{b=−n}^{n} M(a, b)].    (4.3)

The output of a smoothing linear filter is shown in Figure 4.3.
Order-statistics smoothing filters are basically nonlinear spatial filters
[8–10]. The response of this filter is constructed by ordering or ranking the
pixels enclosed in the image area covered by the filter. Then, the value of
the center pixel is replaced with the value calculated by the ordering or
ranking result. This type of filter is also known as “median filter.” The
median filter is used to reduce the salt and pepper type noise from an image
while preserving edges [11–13]. This filter works by moving a window of a
particular size over each and every pixel of the image, and replaces each
pixel value with the median of the neighboring pixel values. To calculate the
median, first the pixel values beneath the window are sorted into numerical
order and then the considered pixel value is replaced with the median pixel
value of the sorted list.
Consider the example shown in Figure 4.4. A 3 × 3 window is used in this
example.
Figure 4.5 shows the output of a median filter when applied to salt and
pepper noise image.

FIGURE 4.3
Smoothing linear filter output.


(1A)

(1B)

(2A)

(2B)

FIGURE 4.4
(1) Keeping the border value unchanged: (1A) Input Image values, (1B) Output after smoothing;
(2) Boundary values are also filtered by extending the border values: (2A) Input Image values,
(2B) Output after smoothing.

FIGURE 4.5
Median filter output.

4.1.4 Sharpening Filter
The primary goal of this filter is to enhance the fine detail in an image or to
highlight the blurred detail [14]. Sharpening can be performed by using spatial
derivatives, which can be applied in areas of flat regions or constant gray level
regions, at the step and end of discontinuities or ramp discontinuities, and
along gray-level discontinuities or ramps. These discontinuities can be lines,
noise points, and edges.
The first order partial spatial derivatives of a digital image I(x, y) can be
expressed by using Equation 4.4:
∂I/∂x = I(x + 1, y) − I(x, y)  and  ∂I/∂y = I(x, y + 1) − I(x, y).    (4.4)

First order partial derivative must be (1) zero in flat regions, (2) nonzero at
the step and gray level ramp discontinuities, and (3) nonzero along ramps.


FIGURE 4.6
Sharpen image output.

The second order partial spatial derivatives of a digital image I(x, y) can be
expressed by using Equation 4.5:
∂²I/∂x² = I(x + 1, y) + I(x − 1, y) − 2I(x, y)
∂²I/∂y² = I(x, y + 1) + I(x, y − 1) − 2I(x, y).    (4.5)

Second order partial derivative must be: (1) zero in flat regions, (2) nonzero at
the step and gray level ramp discontinuities, (3) zero along ramps of constant
slope.
The first order derivative is nonzero along the entire discontinuity or ramp,
but the second order derivative is nonzero only at the step and gray level
ramp discontinuities. A first order derivative is used to make the edge thick,
and a second-order derivative is used to enhance or highlight fine details
such as thin edges and lines, including noise.
Figure 4.6 shows the result of a sharpening filter.

4.2 Frequency Filter
Frequency filters are used to process an image in the frequency domain [15].
The image is converted to frequency domain by using a Fourier transform
function. After frequency domain processing, the image is retransformed into
the spatial domain by the inverse Fourier transform. Reducing the high frequencies
gives a smoother image in the spatial domain, while reducing
low frequencies highlights the edges of the image [16]. All frequency filters


can also be implemented in the spatial domain and, if a simple kernel exists for the
desired effect, it is computationally less costly to perform the filtering in the spatial domain.
Frequency filtering is more suitable if there is no direct kernel that can be
created in the spatial domain, in which case it may also be more efficient.
All spatial domain images have an equivalent frequency representation.
The high frequency corresponds to pixel values that rapidly vary across the
image like leaves, text, texture, and so forth. Low frequency corresponds to
the homogeneous part of the image.
Frequency filtering is founded on the Fourier Transform. The operator
generally takes a filter function and an image in the Fourier domain. This
image is then multiplied in a pixel-by-pixel fashion with the filter function,
and can be expressed by Equation 4.6:
F(u, v) = (1/PQ) ∑_{x=0}^{P−1} ∑_{y=0}^{Q−1} I(x, y) e^(−j2π(ux/P + vy/Q)).    (4.6)

Here I(x, y) is the input image of dimension P × Q and F(u, v) is its Fourier
transform [u = 0, …, P − 1 and v = 0, …, Q − 1]. To convert
the frequency domain image into the spatial domain, F(u, v) is retransformed
by using the inverse Fourier Transform, as shown in Equation 4.7:
I(x, y) = ∑_{u=0}^{P−1} ∑_{v=0}^{Q−1} F(u, v) e^(+j2π(ux/P + vy/Q)).    (4.7)

Since the multiplication in the Fourier space is identical to convolution in
the spatial domain, all frequency filters can be implemented theoretically
as a spatial filter. Different types of frequency filters are discussed in the
following subsections.
4.2.1 Low-Pass Filter
A low-pass filter is a filter that passes or allows low-frequency signals, and
suppresses signals with higher frequencies than the cutoff or threshold
frequency [17]. Based on the specific filter design, the actual amount of
suppression varies for each frequency. A low-pass filter is generally used
to smooth an image. The standard forms of low-pass filters are Ideal,
Butterworth, and Gaussian low-pass filters.
4.2.1.1 Ideal Low-Pass Filter (ILP)
This is the simplest low-pass filter that suppresses all high-frequency
components of the Fourier Transform that are greater than a specified


(A)

(B)

FIGURE 4.7
(A) Filter displayed as an image, (B) Graphical representation of ideal low-pass filter.

cutoff frequency F0. This transfer function of the filter can be defined by
Equation 4.8:
1
P(u, v) = 
0

if F(u, v) ≤ F0
.
if F(u, v) > F0

(4.8)

The image and graphical representation of an ideal low-pass filter are shown
in Figure 4.7.
Because of the structure of the ILP mask, ringing occurs in the image when
an ILP filter is applied to an image. ILP filter yields a blurred image, as shown
in Figure 4.8.
4.2.1.2 Butterworth Low-Pass Filter (BLP)
This filter is used to eliminate high frequency noise with the least loss of
image data in the specified pass band with order d. The transfer function of
order d and with cutoff frequency F0 can be expressed by using Equation 4.9:
P(u, v) = 1/(1 + [F(u, v)/F0]^(2d)).    (4.9)

FIGURE 4.8
ILP filter output with different values of F0.


(A)

(B)

FIGURE 4.9
(A) Filter displayed as an image, (B) Graphical representation of BLP filter.

The image and graphical representation of a BLP filter are shown in Figure 4.9.
The output of the BLP filter is shown in Figure 4.10.
4.2.1.3 Gaussian Low-Pass Filter (GLP)
The transfer function of a GLP filter is expressed in Equation 4.10:
P(u, v) = e^(−F²(u, v)/2σ²).    (4.10)

Here, σ is the standard deviation and a measure of spread of the Gaussian
curve. If σ is replaced with the cutoff radius F0, then the transfer function of
GLP is expressed as in Equation 4.11:
P(u, v) = e^(−F²(u, v)/2F0²).    (4.11)

The image and graphical representation of a GLP filter is shown in
Figure 4.11.
The output of the GLP filter is shown in Figure 4.12.

FIGURE 4.10
Output of BLP filter with various cutoff radii.


(A)

(B)

FIGURE 4.11
(A) Filter displayed as an image, (B) Graphical representation of GLP filter.

FIGURE 4.12
Output of GLP filter at different cutoff radius.

(2A)
(1A)

(2B)

(1B)

FIGURE 4.13
(1) Connecting text input-output: (1A) Input Image, (1B) Output of low-pass filter; (2) Blemishes
reduction input–output: (2A) Input Image, (2B) Output of low-pass filter.

A low-pass filter can be used to connect broken text as well as reduce
blemishes [18], as shown in Figure 4.13.
4.2.2 High Pass Filter
A high-pass filter suppresses frequencies lower than the cutoff frequency, but
allows or passes high frequencies well [19]. A high-pass filter is generally used


to sharpen an image and to highlight the edges and fine details associated
with the image. Different types of high-pass filters are Ideal, Butterworth,
and Gaussian high-pass filter. All high-pass filters (HPF) can be represented
by their relationship to the low-pass filters (LPF), as shown in Equation 4.12:
HPF = 1 − LPF.

(4.12)

4.2.2.1 Ideal High-Pass Filter (IHP)
The transfer function of an IHP filter can be expressed by Equation 4.13,
where F0 is the cutoff frequency or cutoff radius:
0
P(u, v) = 
1

if F(u, v) ≤ F0
.
if F(u, v) > F0

(4.13)

The image and graphical representation of an IHP filter is shown in Figure 4.14.
The output of the IHP filter is shown in Figure 4.15.

(A)

(B)

FIGURE 4.14
(A) Filter displayed as an image, (B) Graphical representation of IHP filter.

FIGURE 4.15
Output of IHP filter.


(A)

(B)

FIGURE 4.16
(A) Image representation of BHP, (B) Graphical representation of BHP.

4.2.2.2 Butterworth High-Pass Filter (BHP)
The transfer function of BHP filter can be defined by Equation 4.14 where p is
the order and F0 is the cutoff frequency or cutoff radius:
P(u, v) = 1/(1 + [F0/F(u, v)]^(2p)).    (4.14)

The image and graphical representation of BHP filter is shown in Figure 4.16.
The output of the BHP filter is shown in Figure 4.17.
4.2.2.3 Gaussian High-Pass Filter (GHP)
The transfer function of the GHP filter is expressed in Equation 4.15, with the
cutoff radius F0:

P(u, v) = 1 − e^(−F²(u, v)/2F0²).    (4.15)

The image and graphical representation of the GHP filter are shown in Figure 4.18.
The output of the GHP filter is shown in Figure 4.19.

FIGURE 4.17
Output of BHP filter.


(A)

(B)

FIGURE 4.18
(A) Image representation of GHP, (B) Graphical representation of GHP.

FIGURE 4.19
Output of GHP filter.

4.2.3 Band Pass Filter
A band pass suppresses very high and very low frequencies, but preserves an
intermediate range band of frequencies [20]. Band pass filtering can be used to
highlight edges (attenuating low frequencies) while decreasing the noise amount
at the same time (suppressing high frequencies). To obtain the band pass filter
function, the low-pass filter function is multiplied with the high-pass filter
function in the frequency domain, where the cutoff frequency of the high pass
is lower than that of the low pass. So, in theory, a band pass filter function can be
developed if the low-pass filter function is available. The different types of band
pass filter are Ideal band pass, Butterworth band pass, and Gaussian band pass.
4.2.3.1 Ideal Band Pass Filter (IBP)
The IBP passes frequencies within the pass band and removes the very
high and very low frequencies. An IBP filter within a frequency range FL, … ,
FH is defined by Equation 4.16:
1
P(u, v) = 
0

if FL ≤ F(u, v) ≤ FH
.
otherwise

(4.16)

Figure 4.20 shows the image and the effect of applying the IBP filter with
different pass bands.


(A)

(B)

(C)

(D)

(E)

FIGURE 4.20
(A) Image of IBP filter, (B) Original image, (C) Output of IBP filter (FL = 30, FH = 100), (D) Output
of IBP filter (FL = 30, FH = 50), and (E) Output of IBP filter (FL = 10, FH = 90).

4.2.3.2 Butterworth Band Pass Filter (BBP)
This filter can be obtained by multiplying the transfer function of a low
and high Butterworth filter. If FL is the low cutoff frequency, FH is the high
cutoff frequency, and p is the order of the filter then the BBP filter can be
defined by Equation 4.17. The range of frequency is dependent on the order
of the filter:
BLP(u, v) = 1/(1 + [F(u, v)/FH]^(2p))
BHP(u, v) = 1 − 1/(1 + [F(u, v)/FL]^(2p))    (4.17)
BBP(u, v) = BLP(u, v) × BHP(u, v).
Figure 4.21 shows the image and the effect of applying the BBP filter with
different pass bands and order = 2.
4.2.3.3 Gaussian Band Pass Filter (GBP)
This filter can be obtained by multiplying the transfer function of a low and
high Gaussian filter. If FL is the low cutoff frequency and FH is the high cutoff
frequency, then the GBP filter can be defined by Equation 4.18:

(A)

(B)

(C)

(D)

(E)

FIGURE 4.21
(A) Image of BBP filter, (B) Original image, (C) Output of BBP filter (FL = 30, FH = 50), (D) Output
of BBP filter (FL = 30, FH = 150), and (E) Output of BBP filter (FL = 70, FH = 200).


(A)

(B)

(C)

(D)

(E)

FIGURE 4.22
(A) Image of GBP filter, (B) Original image, (C) Output of GBP filter (FL = 30, FH = 50), (D) Output
of GBP filter (FL = 10, FH = 90), and (E) Output of GBP filter (FL = 70, FH = 90).

GLP(u, v) = e^(−F²(u, v)/2FH²)
GHP(u, v) = 1 − e^(−F²(u, v)/2FL²)    (4.18)
GBP(u, v) = GLP(u, v) × GHP(u, v),  where FH > FL.

Figure 4.22 shows the image and the effect of applying the GBP filter with
different pass bands.
4.2.4 Band Reject Filter
A band-reject filter (also called a band-stop filter) is just the opposite of the
band pass filter [21]. It attenuates frequencies within a range between a lower and
a higher cutoff frequency. Different types of band reject filters are Ideal band
reject, Butterworth band reject, and Gaussian band reject.
4.2.4.1 Ideal Band Reject Filter (IBR)
In this filter, the frequencies within the specified band are attenuated and the
frequencies outside of the given range are passed without attenuation.
Equation 4.19 defines an IBR filter with a frequency cutoff F0, which is the
center of the frequency band, and where W is the width of the frequency band:

P(u, v) = 0  if F0 − W/2 ≤ F(u, v) ≤ F0 + W/2
P(u, v) = 1  otherwise.    (4.19)

4.2.4.2 Butterworth Band Reject Filter (BBR)
In a BBR filter, frequencies at the center of the band are completely blocked.
Frequencies at the edge of the frequency band are suppressed by a fraction
of maximum value. If F0 is the center of the frequency, W is the width of the


frequency band, and p is the order of the filter, then a BBR filter can be defined
by Equation 4.20:
P(u, v) = 1/(1 + [F(u, v)·W/(F(u, v)² − F0²)]^(2p)).    (4.20)

4.2.4.3 Gaussian Band Reject Filter (GBR)
Here, the transition between the filtered and unfiltered frequency is very
smooth. If F0 is the center of the frequency and W is the width of the frequency
band, then GBR filter can be defined by Equation 4.21:
P(u, v) = e^(−[(F(u, v)² − F0²)/(F(u, v)·W)]²).    (4.21)

4.3 Summary
Filtering is generally used to enhance the image detail. Several types of filters
are discussed in this chapter. Mainly there are two types of filter: spatial
and frequency. Spatial filtering is used to process the pixel value for the
existing pixel, which is dependent on both itself and neighboring pixels.
There are several types of spatial filter like linear, nonlinear, smoothing, and
sharpening. Smoothing filter is used to blur the image and sharpening filter
is used to highlight the blurred detail. Frequency filters are used to process
the image in the frequency domain. Different types of frequency filters are low-pass filters, which are used to blur the image, and high-pass filters, which are
used to highlight edges and sharpen the image. Band pass filtering can be used to
highlight edges (attenuating low frequencies) and decrease the noise amount
at the same time (suppressing high frequencies), while band reject filters are
the opposite of band pass filters.

References
1. Araki, T., Ikeda, N., Dey, N., Acharjee, S., Molinari, F., Saba, L., Godia, E.,
Nicolaides, A., & Suri, J. S. 2015. Shape-based approach for coronary calcium
lesion volume measurement on intravascular ultrasound imaging and its
association with carotid intima-media thickness. Journal of Ultrasound in
Medicine, 34(3), 469–482.


2. Gonzalez, R. C., Woods, R. E. 2016. Digital Image Processing 3rd edition,
Prentice-Hall, New Jersey. ISBN-9789332570320, 9332570329.
3. Chaki, J., Parekh, R., & Bhattacharya, S. 2016. Plant leaf recognition using ridge
filter and curvelet transform with neuro-fuzzy classifier. In Proceedings of 3rd
International Conference on Advanced Computing, Networking and Informatics (pp.
37–44). Springer, New Delhi.
4. Santosh, K. C., Candemir, S., Jaeger, S., Karargyris, A., Antani, S., Thoma, G. R., &
Folio, L. 2015. Automatically detecting rotation in chest radiographs using
principal rib-orientation measure for quality control. International Journal of
Pattern Recognition and Artificial Intelligence, 29(02), 1557001.
5. Ashour, A. S., Beagum, S., Dey, N., Ashour, A. S., Pistolla, D. S., Nguyen, G. N.,
et al. 2018. Light microscopy image de-noising using optimized LPA-ICI filter.
Neural Computing and Applications, 29(12), 1517–1533.
6. Hangarge, M., Santosh, K. C., Doddamani, S., & Pardeshi, R. 2013. Statistical
texture features based handwritten and printed text classification in south
indian documents. arXiv preprint arXiv:1303.3087.
7. Dey, N., Ashour, A. S., Beagum, S., Pistola, D. S., Gospodinov, M., Gospodinova,
Е. P., & Tavares, J. M. R. 2015. Parameter optimization for local polynomial
approximation based intersection confidence interval filter using genetic
algorithm: An application for brain MRI image de-noising. Journal of Imaging,
1(1), 60–84.
8. Santosh, K. C., & Mukherjee, A. 2016, April. On the temporal dynamics of
opinion spamming: Case studies on yelp. In Proceedings of the 25th International
Conference on World Wide Web (pp. 369–379). International World Wide Web
Conferences Steering Committee, Montréal, Québec, Canada.
9. Garg, A., & Khandelwal, V. 2018. Combination of spatial domain filters for
speckle noise reduction in ultrasound medical images. Advances in Electrical
and Electronic Engineering, 15(5), 857–865.
10. Nandi, D., Ashour, A. S., Samanta, S., Chakraborty, S., Salem, M. A., & Dey, N.
2015. Principal component analysis in medical image processing: A study.
International Journal of Image Mining, 1(1), 65–86.
11. Kotyk, T., Ashour, A. S., Chakraborty, S., Dey, N., & Balas, V. E. 2015. Apoptosis
analysis in classification paradigm: A neural network based approach. In
Healthy World Conference—A Healthy World for a Happy Life (pp. 17–22). Kakinada
(AP), India.
12. Santosh, K. C. 2010. Use of dynamic time warping for object shape classification
through signature. Kathmandu University Journal of Science, Engineering and
Technology, 6(1), 33–49.
13. Dhanachandra, N., Manglem, K., & Chanu, Y. J. 2015. Image segmentation using
K-means clustering algorithm and subtractive clustering algorithm. Procedia
Computer Science, 54(2015), 764–771.
14. Chakraborty, S., Chatterjee, S., Ashour, A. S., Mali, K., & Dey, N. 2018. Intelligent
computing in medical imaging: A study. In Advancements in Applied Metaheuristic
Computing (pp. 143–163). IGI Global, doi:10.4018/978-1-5225-4151-6.ch006.
15. Pardeshi, R., Chaudhuri, B. B., Hangarge, M., & Santosh, K. C. 2014, September.
Automatic handwritten Indian scripts identification. In 14th International
Conference on Frontiers in Handwriting Recognition (ICFHR), 2014 (pp. 375–380). IEEE.
16. Santosh, K. C., & Nattee, C. 2007. Template-based nepali natural handwritten
alphanumeric character recognition. Science & Technology Asia, 12(1), 20–30.


17. Najarian, K., & Splinter, R. 2016. Biomedical signal and image processing. CRC Press,
Boca Raton, FL.
18. Low-pass filter example [https://www.slideshare.net/SuhailaAfzana/imagesmoothing-using-frequency-domain-filters (Last access date: June 10, 2018)]
19. Makandar, A., & Halalli, B. 2015. Image enhancement techniques using highpass
and lowpass filters. International Journal of Computer Applications, 109(14).
20. Semmlow, J. L., & Griffel, B. 2014. Biosignal and medical image processing. CRC
Press, Boca Raton, FL.
21. Konstantinides, K., & Rasure, J. R. 1994. The Khoros software development
environment for image and signal processing. IEEE Transactions on Image
Processing, 3(3), 243–252.

5
Segmentation Techniques
Image segmentation is the procedure of separating an image into several
parts [1–3]. This is normally used to find objects or other significant
information in digital images. There are various techniques to accomplish
image segmentation discussed here.

5.1 Thresholding
Thresholding is a procedure of transforming an input grayscale image
into a binarized image, or image with a new range of gray level, by using a
particular threshold value [4,5]. The goal of thresholding is to extract some
pixels from the image while removing others. The purpose of thresholding is
to mark all pixels that belong to the foreground with one intensity and all
background pixels with a different intensity.
The concept of a threshold is not specific to image processing; it has the same
meaning in any field. A threshold is basically a value with two regions on either
side of it, that is, above the threshold and below the
threshold. Any function can have a threshold value [6]. The function has
different expressions for below the threshold value and for above the threshold
value. For an image, if the pixel value of the original image is less than or
below a particular threshold value it will follow a specific transformation or
conversion function, if not, it will follow another. Threshold can be global or
local. Global threshold means the threshold is selected from the whole image.
Local or adaptive threshold is used when the image has uneven illumination,
which makes it difficult to segment using a single threshold. In that case, the
original image is divided into subimages, and for each subimage a particular
threshold is used for segmentation [7]. Figure 5.1 shows the segmentation
output with local and global threshold.
5.1.1 Histogram Shape-Based Thresholding
The histogram method presumes that there is some average value for the
foreground or object pixels and background, but the reality is that the real
pixel values have some deviation around these average values [8,9]. In that case,
selecting an accurate image threshold value is difficult and computationally

(A)

(B)

(C)

(D)

FIGURE 5.1
(A) Input image with uneven illumination, (B) and (C) Global thresholding result, (D) Local
thresholding result.

expensive. One comparatively simple technique is the iterative method to find
a specific image threshold, which is also robust against noise. The steps of the
iterative method are as follows:
Step 1: An initial threshold (T) is selected arbitrarily by any other
desired method.
Step 2: The image I(x, y) is segmented into foreground or object pixels
and background pixels:
Object pixels (OP) ← {I(x, y) : I(x, y) ≥ T}
Background pixels (BP) ← {I(x, y) : I(x, y) < T}.    (5.1)

Step 3: The average of each pixel set is calculated.
AOP ← Average of OP
ABP ← Average of BP
Step 4: A new threshold is formed, which is the average of AOP and ABP:
Tnew ← (AOP + ABP)/2.    (5.2)

Step 5: In step 2, use the new threshold obtained in step 4. Repeat till
the new threshold matches the one before it.
Assume that the gray level image I(x, y) is composed of a light object in a
dark background, in such a way that background and object, or foreground
gray level pixels, can be grouped into two dominant modes. One clear way to
extract the object pixels from the background is to select a threshold T, which
divides these two modes. Then any pixel (x, y) where I(x, y) ≥ T is called an
object pixel; otherwise, the pixel is called a background pixel. For example,
if two dominant modes describe the image histogram, it is called a
bimodal histogram. Here, only one threshold is sufficient for segmenting or
partitioning the image. Figure 5.2 shows the bimodal histogram of an image
and the segmented image.

FIGURE 5.2
The bimodal histogram and the segmented image.

If for instance, an image is composed of two or more types of dark objects in
a light background, three or more dominant modes are used to characterize
the image histogram, which is denoted as a multimodal histogram. Figure 5.3
shows the multimodal histogram of an image and the segmented image.
5.1.2 Clustering-Based Thresholding
K-means Thresholding Method: The steps of the K-means algorithm for selecting the
threshold are as follows [10] (a code sketch follows the steps):
Step 1: Class centers (K) are initialized:
Cj0 = Gmin + ((j − 1/2)(Gmax − Gmin))/k,     (5.3)

where j = 1, 2, …, k; Cj0 is the initial class center of the jth class; Gmin and Gmax
are the minimum and maximum gray values of the sample space.
Step 2: Assign every point of the sample space to its nearest class center
based on Euclidean Distance:
D_j,i = abs(G_i − C_j),     (5.4)

where j = 1,2,…,k; i = 1,2,…,P; Dj,i is the distance from an ith point to
the jth class, and P is the total number of points in the sample space.
Step 3: Compute the K new class centers from the averages of the points
assigned to each class:

C_j,new = (1/P_j) Σ_{G_i ∈ class j} G_i,     (5.5)

where j = 1, 2, …, k and P_j is the total number of points that are assigned
to the jth class in step 2.

FIGURE 5.3
The multimodal histogram and the segmented image.

FIGURE 5.4
The output of image segmentation with different k values.
Step 4: Repeat steps 2 and 3 while the class centers change; otherwise, stop the
iteration.
Step 5: The threshold is calculated as the mean of the kth and (k − 1)th class
centers:
T = (C_k + C_{k−1})/2.     (5.6)

The result of the image segmentation is shown in Figure 5.4.
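
A minimal Python/NumPy sketch of this K-means thresholding procedure is given below; the fixed iteration cap and the convergence check via np.allclose are illustrative choices not specified in the text:

import numpy as np

def kmeans_threshold(img, k=2, iters=50):
    """1-D K-means on gray levels; the threshold is the mean of the two
    largest class centers (Equation 5.6)."""
    g = img.astype(np.float64).ravel()
    gmin, gmax = g.min(), g.max()
    # Step 1: spread initial centers evenly over the gray range
    centers = gmin + (np.arange(1, k + 1) - 0.5) * (gmax - gmin) / k
    for _ in range(iters):
        # Step 2: assign every pixel to its nearest center
        labels = np.abs(g[:, None] - centers[None, :]).argmin(axis=1)
        # Step 3: recompute centers as class averages
        new_centers = np.array([g[labels == j].mean() if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):   # Step 4: stop when stable
            break
        centers = new_centers
    centers = np.sort(centers)
    # Step 5: threshold from the two largest centers
    return (centers[-1] + centers[-2]) / 2.0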
Otsu-Clustering Thresholding Method: This method selects a threshold
value by minimizing the within-class variance of the two clusters [11,12]. The
within-class variance can be expressed by Equation 5.7:
σ_w²(T) = P_b(T) σ_b²(T) + P_f(T) σ_f²(T),     (5.7)

where P_f and P_b are the probabilities of foreground and background class
occurrences; T is the candidate threshold value, and σ_f² and σ_b² are the
variances of the foreground and background clusters.
The probability of foreground and background class occurrences can be
denoted by Equation 5.8:
P_b(T) = Σ_{G=0}^{T} p(G)

P_f(T) = Σ_{G=T+1}^{L−1} p(G),     (5.8)

where G is the gray level values {0,1,…,L − 1} and p(G) is the probability mass
function of G.


The variances of foreground and background clusters are defined by
Equation 5.9:
σ_b²(T) = Σ_{G=0}^{T} (G − M_b(T))² p(G)/P_b(T)

σ_f²(T) = Σ_{G=T+1}^{L−1} (G − M_f(T))² p(G)/P_f(T),     (5.9)

where Mb and Mf are the means of background and foreground clusters
respectively and can be defined by Equation 5.10:
M_b(T) = Σ_{G=0}^{T} G · p(G)/P_b(T)

M_f(T) = Σ_{G=T+1}^{L−1} G · p(G)/P_f(T).     (5.10)

A lot of computation is involved in computing the within-class variance
for each of the two classes for every possible threshold. Thus, the between-class
variance is computed instead, by subtracting the within-class variance from the
total variance:
σ_between²(T) = σ_total²(T) − σ_w²(T)
             = P_b(T)[M_b(T) − M_total]² + P_f(T)[M_f(T) − M_total]².     (5.11)

σ_total² and M_total can be expressed by Equation 5.12:

σ_total² = Σ_{G=0}^{L−1} (G − M_total)² p(G)

M_total = Σ_{G=0}^{L−1} G · p(G).     (5.12)

The main advantage of this method is its simple computation.
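
A compact Python/NumPy sketch of Otsu's threshold selection, sweeping all candidate thresholds and maximizing the between-class variance of Equation 5.11, might look as follows (the 256-level assumption and the function name are illustrative):

import numpy as np

def otsu_threshold(img, levels=256):
    """Pick T that maximizes the between-class variance (Equation 5.11),
    which is equivalent to minimizing the within-class variance."""
    hist, _ = np.histogram(img, bins=levels, range=(0, levels))
    p = hist / hist.sum()                      # probability mass function p(G)
    g = np.arange(levels)
    m_total = (g * p).sum()
    best_t, best_var = 0, -1.0
    for t in range(levels - 1):
        pb = p[:t + 1].sum()                   # background probability P_b(T)
        pf = 1.0 - pb                          # foreground probability P_f(T)
        if pb == 0 or pf == 0:
            continue
        mb = (g[:t + 1] * p[:t + 1]).sum() / pb    # class means
        mf = (g[t + 1:] * p[t + 1:]).sum() / pf
        var_between = pb * (mb - m_total) ** 2 + pf * (mf - m_total) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t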
Figure 5.5 shows the segmented output using a different number of clusters.

FIGURE 5.5
Segmented output using the Otsu clustering thresholding method.


5.1.3 Entropy-Based Thresholding
This method is based on the probability distribution function of the
gray level histogram [13,14]. Two entropies can be calculated: one for black
pixels and the other for white pixels:
Σ_{i=0}^{255} g(i) = 1

E_b(t) = − Σ_{i=0}^{t} [g(i)/Σ_{j=0}^{t} g(j)] · log[g(i)/Σ_{j=0}^{t} g(j)]

E_w(t) = − Σ_{i=t+1}^{255} [g(i)/Σ_{j=t+1}^{255} g(j)] · log[g(i)/Σ_{j=t+1}^{255} g(j)],     (5.13)

where g(i) is the normalized histogram.
The optimal single threshold value is selected by maximizing the sum of the
entropies of the black and white pixels, as depicted by Equation 5.14:

T = arg max_{t=0…255} [E_b(t) + E_w(t)].     (5.14)
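
For the single-threshold case, the entropy criterion of Equations 5.13 and 5.14 can be sketched in Python/NumPy as below; the 256-bin histogram and the handling of empty classes are assumptions made for the example:

import numpy as np

def entropy_threshold(img, levels=256):
    """Single-threshold entropy-based selection: maximize E_b(t) + E_w(t)
    from Equations 5.13 and 5.14."""
    hist, _ = np.histogram(img, bins=levels, range=(0, levels))
    g = hist / hist.sum()                      # normalized histogram g(i)
    best_t, best_e = 0, -np.inf
    for t in range(1, levels - 1):
        pb, pw = g[:t + 1].sum(), g[t + 1:].sum()
        if pb == 0 or pw == 0:
            continue
        qb = g[:t + 1][g[:t + 1] > 0] / pb     # class-conditional distributions
        qw = g[t + 1:][g[t + 1:] > 0] / pw
        e = -(qb * np.log(qb)).sum() - (qw * np.log(qw)).sum()
        if e > best_e:
            best_e, best_t = e, t
    return best_t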

p optimal threshold values can be found by Equation 5.15:
{T_1, …, T_p} = arg max [E(−1, t_1) + E(t_1, t_2) + ⋯ + E(t_p, 255)],  with t_1 < t_2 < ⋯ < t_p.     (5.15)

FIGURE 7.10
Cube of width W.

f_i(x, y) = { 0.5,         if the color at (x, y) lies outside the cube of width W
            { O_i(x, y),   otherwise,          ∀ 1 ≤ i ≤ 3.     (7.7)


FIGURE 7.11
Output of color slicing.

This means all the colors outside the cube of width W will be represented by
some insignificant color, but inside the cube the original colors are retained.
Figure 7.11 shows the output of the color slicing where only the red shades
are kept.
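
A minimal Python/NumPy sketch of this kind of color slicing is shown below; the cube center, the cube width, and the neutral replacement value of 0.5 are illustrative parameters, not values taken from the text:

import numpy as np

def color_slice(rgb, center, width, neutral=0.5):
    """Keep the original colors inside a cube of width `width` centered
    at `center` (in normalized RGB) and replace everything outside the
    cube with a neutral gray."""
    rgb = rgb.astype(np.float64) / 255.0
    center = np.asarray(center, dtype=np.float64)
    outside = np.any(np.abs(rgb - center) > width / 2.0, axis=-1)
    out = rgb.copy()
    out[outside] = neutral
    return (out * 255).astype(np.uint8)

# Example: keep only reddish shades, desaturating the rest
# sliced = color_slice(image, center=(0.8, 0.2, 0.2), width=0.5)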
The next type of preprocessing transformation of the color image is tone
correction [11]. This is again analogous to the intensity enhancement, or
contrast enhancement, of a grayscale image. A color image may have a
flat tone, light tone, or dark tone. These tones represent the distribution of
the different color intensities within the color image or RGB image. The form of
the transformation function used to correct the tone of flat-, light-, and dark-toned
images is shown in Figure 7.12.
In the case of a light-toned image, a wide range of intensities in the input image
is mapped to a narrow range of intensities in the output image, so that the output
image becomes darker. In the case of a dark-toned image, a narrow range of input
intensities is mapped to a wide range of output intensities, so that the output
image becomes lighter.
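
The tone-correction curves are specified only graphically (Figure 7.12); as a stand-in, the following Python/NumPy sketch uses a simple power-law curve per channel, where a gamma greater than 1 darkens a light-toned image and a gamma less than 1 lightens a dark-toned one (the power-law form and the gamma values are assumptions, not the book's exact functions):

import numpy as np

def tone_correct(rgb, gamma):
    """Apply a power-law tone curve to each RGB channel as a simple
    approximation of the curves sketched in Figure 7.12."""
    x = rgb.astype(np.float64) / 255.0
    y = np.power(x, gamma)
    return (y * 255).astype(np.uint8)

# light_fixed = tone_correct(light_image, gamma=1.8)  # darken a light tone
# dark_fixed  = tone_correct(dark_image,  gamma=0.6)  # lighten a dark tone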
Other types of color image preprocessing involve histogram equalization,
segmentation of color images, and so on (Figures 7.13 through 7.16).

7.2 Image Preprocessing for Neural
Networks and Deep Learning
Deep learning has become a major research area in the past few
years [12]. Deep learning uses neural networks, which need a large amount
of training data and are composed of many hidden layers. These models
are used in speech, vision, image, video, language processing, and so forth.
For an image, providing the image pixel values directly into a neural network
may cause numerical overflows [13,14]. Also, some objective and activation

functions are not compatible with all kinds of input.

FIGURE 7.12
Tone correction.

FIGURE 7.13
(A) Original image, (B) Image output after histogram equalization.

FIGURE 7.14
(A) Original image, (B) Segmented output using 5 bins.

FIGURE 7.15
(A) Original image, (B) Image output after filtering or masking. Here only the red shades are kept in color and the rest of the image is desaturated.

FIGURE 7.16
(A) Original image, (B) Otsu thresholding output.

FIGURE 7.17
Cropping of image data.

The wrong arrangement
produces a poor result during the learning phase of a neural network [15–17].
To construct an efficient neural network model, careful attention is required
to the network architecture as well as the input data format. The most
common image data input factors are the number of images, image width,
image height, number of levels per pixel, and number of channels. For an RGB
image, there are three channels of data representing the colors (pixel intensity
values) in Red, Green, and Blue channels, which range between 0 and 255 [18,19].
A number of preprocessing steps are needed prior to using this in any
Deep Learning project. Some of the most common preprocessing steps are
discussed below.
Unvarying Aspect Ratio: Most neural networks presume that the input
image is square. So, it is essential to check whether every image is
square [20] and to crop it properly, as shown in Figure 7.17. While
cropping, usually the center part is kept.
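
A center crop to a square can be sketched in Python/NumPy as follows (the function name is illustrative):

import numpy as np

def center_crop_square(img):
    """Crop an image (H x W or H x W x C) to a centered square, keeping
    the center part as is commonly done before feeding a network."""
    h, w = img.shape[:2]
    side = min(h, w)
    top = (h - side) // 2
    left = (w - side) // 2
    return img[top:top + side, left:left + side]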
Scaling of Images: After making all the images square, each image is
scaled to the required size [21,22]. For example, suppose an image is of size
250 × 250 pixels and we have to obtain an image with a height and width
of 100 pixels. The height and width of the image are then scaled by a
factor of 0.4 (100/250). The same applies for up-scaling.
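
A minimal nearest-neighbor scaling sketch in Python/NumPy is shown below; real projects usually rely on a library resizer with bilinear or bicubic interpolation, so this is only an illustration of the index arithmetic:

import numpy as np

def scale_nearest(img, out_h, out_w):
    """Scale an image to (out_h, out_w) with nearest-neighbor sampling,
    e.g. 250 x 250 -> 100 x 100 corresponds to a scale factor of 0.4."""
    h, w = img.shape[:2]
    rows = (np.arange(out_h) * h / out_h).astype(int)
    cols = (np.arange(out_w) * w / out_w).astype(int)
    return img[rows][:, cols]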
Normalization of Image Inputs: Image data normalization [23,24] is a vital
step, which ensures a similar data distribution for every input. This
helps the network converge faster during training. In image processing,
normalization helps to change the pixel intensity range. There are three types
of normalization: data rescaling, data standardization, and data stretching.
Data rescaling is further divided into linear and nonlinear rescaling.
The linear data scaling can be represented by Equation 7.8:
I_Norm = (I − I_Min) · (I_NewMax − I_NewMin)/(I_Max − I_Min) + I_NewMin,     (7.8)

where IMax and IMin are the maximum and minimum intensities of the original
image, and INewMax and INewMin are the maximum and minimum intensities
of the normalized image. For example, suppose the image has the intensity


range 30–120 and the desired range is 0–255. First, 30 is subtracted from every
pixel intensity. Then each pixel intensity is multiplied by 255/90, making
the range between 0 and 255.
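
The linear rescaling of Equation 7.8, including the 30–120 to 0–255 example above, can be sketched in Python/NumPy as follows:

import numpy as np

def rescale_linear(img, new_min=0.0, new_max=255.0):
    """Linear min-max rescaling (Equation 7.8). For an image with
    intensities in 30-120, this subtracts 30 and multiplies by 255/90,
    mapping the data to the range 0-255."""
    img = img.astype(np.float64)
    i_min, i_max = img.min(), img.max()
    return (img - i_min) * (new_max - new_min) / (i_max - i_min) + new_min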
The nonlinear data scaling is represented by Equation 7.9, which follows a
sigmoid function:
I_Norm = (I_NewMax − I_NewMin) · 1/(1 + e^(−(I − β)/α)) + I_NewMin,     (7.9)

where β denotes the intensity around which the range is centered, and α
denotes the width of the input intensity.
Data standardization is the second way to normalize image data, where the
average of the data is subtracted from the image and divided by its standard
deviation. The resulting data distribution resembles a Gaussian curve with
mean = 0 and standard deviation (std) = 1. Data standardization can be
represented by Equation 7.10:
I_Norm = (I − I_Mean)/I_Std.     (7.10)

Data stretching is the third way to normalize image data, where the data
are clipped to a minimum and maximum value, and can be represented by
Equation 7.11:

I_Norm[I < c] = c
I_Norm[I > d] = d.     (7.11)

Here, image data values greater than d are set to d, and values less than c
are set to c.
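
Data standardization (Equation 7.10) and data stretching (Equation 7.11) can be sketched together in Python/NumPy; the function names are illustrative:

import numpy as np

def standardize(img):
    """Data standardization (Equation 7.10): zero mean, unit std."""
    img = img.astype(np.float64)
    return (img - img.mean()) / img.std()

def stretch_clip(img, c, d):
    """Data stretching (Equation 7.11): values below c are set to c and
    values above d are set to d."""
    return np.clip(img, c, d)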
Reduction in Dimension: Sometimes the three channels of an RGB image [25]
are collapsed into a single grayscale channel. Reducing the dimension
of the image data is often appropriate when the neural network's performance
is allowed to be invariant to that dimension.
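
A minimal Python/NumPy sketch of this channel collapse is shown below; the luminance weights are a common choice, not values prescribed by the text:

import numpy as np

def rgb_to_gray(rgb):
    """Collapse the three RGB channels into one grayscale channel using
    common luminance weights (any fixed weighting can be used)."""
    weights = np.array([0.299, 0.587, 0.114])
    return (rgb.astype(np.float64) @ weights).astype(np.uint8)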
Augmentation of Image Data: The next preprocessing technique [26] involves
augmenting the image data with perturbed versions of the existing images.
Rotation, scaling, and other affine transformations are usually used to
augment image data. This helps prevent the neural network from relying on
unwanted characteristics that are specific to particular versions of the image data.
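
A small Python/NumPy sketch of such augmentation, producing flipped and rotated copies of an image, is shown below; the particular set of transformations is an illustrative choice:

import numpy as np

def augment(img):
    """Generate a few perturbed versions of an image: horizontal flip,
    vertical flip, and 90-degree rotations. Affine warps and random
    crops/scales are common additions in practice."""
    return [img,
            np.fliplr(img),            # horizontal flip
            np.flipud(img),            # vertical flip
            np.rot90(img, 1),          # rotate 90 degrees
            np.rot90(img, 3)]          # rotate 270 degrees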

7.3 Summary
The need for preprocessing color images in the field of Deep Learning is
discussed in this chapter. Color image processing includes pseudo color and
full color or true color processing. The purpose of pseudo color processing is


to color a grayscale image by assigning different colors to different intensity
ranges of a gray level image. In the case of an RGB image, colors are added
to the R, G, and B channels separately, and the combination of R, G, and
B channels allows for the interpretation of a pseudo color image. Through
pseudo color images, we can visualize different intensities of the image region
with a different color, which would be almost flat in the grayscale image.
Thus, using the pseudo color image, intensities of the image are much more
interpretable or distinguishable than for a grayscale image. In the full-color
image, the actual color of the image is considered. In such types of images, the
colors can be specified by using different color models like RGB (Red-Green-Blue), HSI (Hue-Saturation-Intensity), CMY (Cyan-Magenta-Yellow), and so
on. Different preprocessing transformation operations can be performed on
these color models such as intensity modification, color complement, color
slicing, tone correction, histogram equalization, segmentation of the color
image, and so forth.

References
1. Ghosh, A., Sarkar, A., Ashour, A. S., Balas-Timar, D., Dey, N., & Balas, V. E. 2015.
Grid color moment features in glaucoma classification. Int J Adv Comput Sci Appl,
6(9), 1–14.
2. Dey, N., Ashour, A. S., Chakraborty, S., Samanta, S., Sifaki-Pistolla, D., Ashour,
A. S., & Nguyen, G. N. 2016. Healthy and unhealthy rat hippocampus cells
classification: A neural based automated system for Alzheimer disease
classification. Journal of Advanced Microscopy Research, 11(1), 1–10.
3. Chaki, J., Parekh, R., & Bhattacharya, S. 2017, March. An efficient fragmented
plant leaf classification using color edge directivity descriptor. In International
Conference on Computational Intelligence, Communications, and Business Analytics
(pp. 197–211). Springer, Singapore.
4. Li, Z., Shi, K., Dey, N., Ashour, A. S., Wang, D., Balas, V. E., … & Shi, F. 2017.
Rule-based back propagation neural networks for various precision rough set
presented KANSEI knowledge prediction: A case study on shoe product form
features extraction. Neural Computing and Applications, 28(3), 613–630.
5. Bhattacharya, T., Dey, N., & Chaudhuri, S. R. 2012. A session based multiple
image hiding technique using DWT and DCT. arXiv preprint arXiv:1208.0950.
6. Candemir, S., Borovikov, E., Santosh, K. C., Antani, S., & Thoma, G. 2015. Rsilc:
Rotation-and scale-invariant, line-based color-aware descriptor. Image and Vision
Computing, 42, 1–12.
7. Chaki, J., & Parekh, R. 2011. Plant leaf recognition using shape based features
and neural network classifiers. International Journal of Advanced Computer Science
and Applications, 2(10) 41–47.
8. Benavent, X., Dura, E., Vegara, F., & Domingo, J. 2012. Mathematical morphology
for color images: An image-dependent approach. Mathematical Problems in
Engineering, 2012(678326) 1–18.


9. Sonka, M., Hlavac, V., & Boyle, R. 2014. Image Processing, Analysis, and Machine
Vision. Cengage Learning, Stamford, USA.
10. Fu, K. S. 2018. Special Computer Architectures for Pattern Processing. CRC Press,
Boca Raton, FL.
11. Kotyk, T., Ashour, A. S., Chakraborty, S., Dey, N., & Balas, V. E. 2015. Apoptosis
analysis in classification paradigm: A neural network based approach. In Healthy
World Conference—A Healthy World for a Happy Life (pp. 17–22). Kakinada (AP),
India.
12. Dong, C., Loy, C. C., He, K., & Tang, X. 2016. Image super-resolution using
deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 38(2), 295–307.
13. Nimmy, S. F., Sarowar, M. G., Dey, N., Ashour, A. S., & Santosh, K. C. 2018.
Investigation of DNA discontinuity for detecting tuberculosis. Journal of Ambient
Intelligence and Humanized Computing, 1–15.
14. Chaki, J., Parekh, R., & Bhattacharya, S. 2015. Plant leaf recognition using texture
and shape features with neural classifiers. Pattern Recognition Letters, 58, 61–68.
15. Li, Z., Dey, N., Ashour, A. S., Cao, L., Wang, Y., Wang, D., & Shi, F. 2017.
Convolutional neural network based clustering and manifold learning method
for diabetic plantar pressure imaging dataset. Journal of Medical Imaging and
Health Informatics, 7(3), 639–652.
16. Chaki, J., Parekh, R., & Bhattacharya, S. In press. Plant leaf classification using
multiple descriptors: A hierarchical approach. Journal of King Saud University-Computer and Information Sciences, doi:10.1016/j.jksuci.2018.01.007.
17. Halder, C., Obaidullah, S. M., Santosh, K. C., & Roy, K. 2018. Content independent
writer identification on Bangla script: A document level approach. International
Journal of Pattern Recognition and Artificial Intelligence, 32(9), 1856011.
18. Chatterjee, S., Sarkar, S., Hore, S., Dey, N., Ashour, A. S., Shi, F., & Le, D. N. 2017.
Structural failure classification for reinforced concrete buildings using trained
neural network based multi-objective genetic algorithm. Structural Engineering
and Mechanics, 63(4), 429–438.
19. Chaki, J., Parekh, R., & Bhattacharya, S. 2016, January. Plant leaf recognition
using a layered approach. In Microelectronics, Computing and Communications
(MicroCom), 2016 International Conference on (pp. 1–6). IEEE.
20. Chatterjee, S., Hore, S., Dey, N., Chakraborty, S., & Ashour, A. S. 2017. Dengue
fever classification using gene expression data: A PSO based artificial neural
network approach. In Proceedings of the 5th International Conference on Frontiers
in Intelligent Computing: Theory and Applications (pp. 331–341). Springer,
Singapore.
21. Chaki, J., Parekh, R., & Bhattacharya, S. 2015, July. Recognition of whole and
deformed plant leaves using statistical shape features and neuro-fuzzy classifier.
In Recent Trends in Information Systems (ReTIS), 2015 IEEE 2nd International
Conference on (pp. 189–194). IEEE.
22. Samanta, S., Ahmed, S. S., Salem, M. A. M. M., Nath, S. S., Dey, N., & Chowdhury,
S. S. 2015. Haralick features based automated glaucoma classification using back
propagation neural network. In Proceedings of the 3rd International Conference on
Frontiers of Intelligent Computing: Theory and Applications (FICTA) 2014 (pp. 351–
358). Springer, Cham.
23. Santosh, K. C., & Nattee, C. 2007. Template-based nepali natural handwritten
alphanumeric character recognition. Science & Technology Asia, 12(1), 20–30.


24. Hore, S., Chatterjee, S., Sarkar, S., Dey, N., Ashour, A. S., Balas-Timar, D., &
Balas, V. E. 2016. Neural-based prediction of structural failure of multistoried
RC buildings. Structural Engineering and Mechanics, 58(3), 459–473.
25. Maji, P., Chatterjee, S., Chakraborty, S., Kausar, N., Samanta, S., & Dey, N.
2015, March. Effect of Euler number as a feature in gender recognition system
from offline handwritten signature using neural networks. In Computing for
Sustainable Global Development (INDIACom), 2015 2nd International Conference on
(pp. 1869–1873). IEEE.
26. Bhattacherjee, A., Roy, S., Paul, S., Roy, P., Kausar, N., & Dey, N. 2016. Classification
approach for breast cancer detection using back propagation neural network:
A study. In Biomedical Image Analysis and Mining Techniques for Improved Health
Outcomes (pp. 210–221). IGI Global, Hershey, Pennsylvania.

Index
B
Binarized Image, 57
Binary Morphology, 73
closing, 76
dilation, 75
erosion, 73
hit and miss, 77
opening, 76
thickening, 78
thinning, 77
Brightness Interpolation, 31
bicubic, 35
bilinear, 34
nearest neighbor, 32

C
Clustering, 59
k-means, 59
otsu, 60
CMY, 85
Color Image, 83
complement, 87
pseudo color, 83
slicing, 88
tone correction, 90
true color, 85
Compression, 7
lossless, 8
lossy, 8
Contrast Stretching, 19

D
Deep Learning, 90
data augmentation, 94
data normalization, 93

E
Edge Detection, 63
canny edge detector, 66
kirsch edge detector, 64
laplacian of gaussian (LoG) edge detector, 67
marr-hildreth edge detection, 68
prewitt edge detector, 64
roberts edge detector, 63
robinson edge detector, 65
sobel edge detector, 64

F
Filter, 39
frequency, 43
band pass, 50
band reject, 52
high pass, 47
low pass, 44
spatial, 39
linear, 39
non-linear, 40
sharpening, 42
smoothing, 40

G
Gamma Correction, 19
Grayscale Morphology, 78
closing, 80
dilation, 79
erosion, 79
opening, 79

H
Histogram Equalization, 20
Histogram Matching, 22
HSI, 85

I
Image Correction, 2
Image Enhancement, 4
Image Restoration, 6
Intensity Modification, 86

L
Line or Column Dropout Error, 2
Line or Column Striping, 3
Line Start/Stop Problem, 2

M
Mapping, 26
forward, 26
inverse, 26

P
Pixel Brightness, 13
Pixel Coordinate Transformation, 25
Position-Dependent Brightness Correction, 13

R
Radiometric Correction, 2
Region-Based Segmentation, 69
RGB, 84

T
Thresholding, 57
clustering based, 59
entropy-based, 62
histogram shape-based, 57
Transformation, 13, 25
affine, 29
grayscale, 14
linear, 14
logarithmic, 17
power-law, 19
red, green, blue, 84
ripple, 30
rotation, 27
scaling, 26
shearing, 28
spatial, 25
spherical, 31
translation, 26
twirl, 29


