1. Introduction
Malaria is caused by malaria parasites in a patient's blood. The parasites are transmitted through the bite of infected female Anopheles mosquitoes; they multiply in red blood cells (RBCs) and spread to other cells. According to the World Health Organization (WHO), malaria is a public health problem in many countries, particularly tropical ones. In 2017, malaria was endemic in up to 90 countries and caused more than 400 000 deaths. Children under 5 years of age were the most vulnerable group, accounting for 61% of these deaths (Anon 2018).
The malaria rapid diagnostic test is one of the most widely used malaria diagnosis techniques. It has a false positive (FP) rate below 10%, is easy to use, and gives qualitative results in less than 30 min. However, it cannot determine the number or type of parasites (Moody 2002).
Manual light microscopy, the cornerstone of parasite-based malaria diagnosis, is another diagnostic technique. It allows detailed analysis because the examiner can inspect micrometer-scale regions of the slide under magnification. However, the diagnostic result depends on the examiner's expertise and thoroughness. The technique is also laborious: an examiner usually takes 30 min to find and count the parasites in a single thin blood smear (Maqsood et al. 2021).
Hundreds of blood smear films are examined every week in endemic areas, which demands substantial resources and incurs high economic costs (Maqsood et al. 2021). Moreover, previous studies found that manual microscopic assessments produced subjective results because examiners differ in experience (Thimasarn et al. 2002; Mitiku et al. 2003; Bates et al. 2004; Tek et al. 2009). The situation is worse in rural areas, where experienced examiners are scarce. Furthermore, the limited health facilities and resources in rural areas can degrade blood smear film quality, introducing many artifacts and much noise. Because automated malaria detection methods require minimal human intervention, they can be more objective, reliable, and efficient than manual examination.
Many automated malaria parasite detection methods with excellent performance have since been proposed. They generally fall into three groups: traditional methods, modern (artificial intelligence-based) methods, and hybrid methods. Traditional methods rely on human knowledge in the detection process: they mostly use complex image processing techniques to generate object candidates and then extract hand-engineered features such as color, intensity, shape, and texture (Frean 2010; Moon et al. 2013; Poostchi et al. 2018; Nugroho 2019a, 2019b). A traditional method consists of three main steps. First, RBCs are segmented from microscopic thin blood smear images using image segmentation techniques. Second, a set of hand-engineered features is computed for the segmented objects. Finally, these features are fed into a classifier that identifies the infected RBCs.
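The three steps of a traditional method can be illustrated with a minimal pure-Python sketch. It assumes a grayscale image stored as a list of rows in which cells are darker than the background; the fixed threshold, the two features (area and intensity), and the rule in `classify` are hypothetical stand-ins for the segmentation techniques, hand-engineered features, and trained classifiers used in the cited studies.

```python
from collections import deque

def segment(image, threshold):
    """Step 1: foreground mask of pixels darker than `threshold`."""
    return [[1 if px < threshold else 0 for px in row] for row in image]

def label_components(mask):
    """Group foreground pixels into 4-connected objects with a breadth-first search."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    objects = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            objects.append(pixels)
    return objects

def extract_features(image, pixels):
    """Step 2: hand-engineered features of one segmented object."""
    values = [image[y][x] for y, x in pixels]
    return {"area": len(pixels), "mean": sum(values) / len(values), "min": min(values)}

def classify(features, stain_level=100):
    """Step 3 (hypothetical rule): a cell containing a very dark, stained pixel is 'infected'."""
    return "infected" if features["min"] < stain_level else "uninfected"

image = [
    [200, 200, 200, 200, 200, 200],
    [200, 150, 150, 200, 150, 150],
    [200, 150,  60, 200, 150, 150],
    [200, 200, 200, 200, 200, 200],
]
cells = label_components(segment(image, threshold=180))
labels = [classify(extract_features(image, c)) for c in cells]
print(labels)  # ['infected', 'uninfected']
```

In practice, each of the three functions would be replaced by the more elaborate techniques cited above; the sketch only shows how the steps feed into one another.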
Several public datasets have been introduced to support the development of automated malaria detection and segmentation systems.
Quinn et al. (2014) presented a public malaria dataset of 1182 microscopic images of thick blood smears captured with a smartphone camera. It contains 948 malaria-infected images with 7628 parasites. However, all the parasites are Plasmodium falciparum. An early-generation malaria dataset was introduced by Loddo et al. (2018). It consists of 229 microscopic images of thin blood smear films captured with a Leica optical laboratory microscope and contains 483 malaria-infected blood cells among 48 000 blood cells. The dataset includes all malaria parasite species and their life stages. However, the parasite class distribution is highly imbalanced: one class contains 695 parasites, whereas the other four classes each contain fewer than five parasites. Another large malaria dataset was introduced by Sultani et al. (2022), captured at multiple magnifications from 1257 thin blood smears with 3624 parasites. However, this dataset contains only Plasmodium vivax. Recently, Nugroho et al. (2022) presented a new dataset for malaria parasite detection and segmentation in thin blood smears. It includes 559 microscopic images with 691 malaria parasites, captured from hundreds of thin blood smears collected in rural areas of Indonesia. Its advantage is that it covers all malaria parasite species and their life stages, with a more balanced class distribution than that of Loddo et al. (2018).
Maysanjaya et al. (2016) applied Otsu's method to find a global threshold for segmenting P. vivax, one of the malaria parasite species. To suppress noise, they preprocessed the images by combining the red channel of the RGB color space with the saturation channel of the HSV color space. The method achieved an accuracy of 0.93.
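Otsu's method selects the global threshold that maximizes the between-class variance of the intensity histogram. The sketch below assumes a single 8-bit channel given as a flat list of integers; how Maysanjaya et al. (2016) combined the red and saturation channels is not specified here, so that preprocessing is omitted and the toy pixel values are hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return t maximizing the between-class variance; class 0 holds values <= t."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * n for i, n in enumerate(hist))
    w0 = sum0 = 0
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w0 += hist[t]            # pixel count of the darker class
        sum0 += t * hist[t]      # intensity sum of the darker class
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2  # proportional to between-class variance
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

# Toy bimodal channel: dark parasite-like pixels and a bright background.
pixels = [40, 45, 50, 55] * 25 + [190, 200, 210, 220] * 75
t = otsu_threshold(pixels)
mask = [1 if p <= t else 0 for p in pixels]  # 1 = darker (candidate) pixel
```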
Dave and Upla (2017) also applied the original Otsu method to find a global threshold for RBCs and used rule-based methods to segment the parasites. However, this study does not state which malaria parasite types its dataset contains. Moreover, both studies used a dataset containing only 30 microscopy images with well-separated RBCs. To find a global threshold, Memeu et al. (2013) proposed a double Otsu method to segment dark objects, such as parasites, white blood cells (WBCs), and artifacts. The segmented areas were then classified by an artificial neural network to identify the parasites, achieving an accuracy of 0.95. The method was also computationally fast.
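One plausible reading of a double Otsu method is sketched below, as an assumption rather than the exact procedure of Memeu et al. (2013): a first Otsu threshold separates the bright background from the darker cells, and a second Otsu threshold, computed only over pixels at or below the first, separates RBCs from the darker objects. The function names and toy intensity values are hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Otsu's method: threshold t (class 0 = values <= t) maximizing between-class variance."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total, sum_all = len(pixels), sum(i * n for i, n in enumerate(hist))
    w0 = sum0 = 0
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 and w1:
            var = w0 * w1 * (sum0 / w0 - (sum_all - sum0) / w1) ** 2
            if var > best_var:
                best_t, best_var = t, var
    return best_t

def double_otsu(pixels):
    """Assumed reading: T0 splits background from cells, T1 splits RBCs from darker objects."""
    t0 = otsu_threshold(pixels)
    darker = [p for p in pixels if p <= t0]
    t1 = otsu_threshold(darker)
    return t0, t1

# Toy intensities: background (220), RBCs (150), and dark objects such as parasites (60).
pixels = [220] * 300 + [150] * 150 + [60] * 20
t0, t1 = double_otsu(pixels)
dark_objects = [p for p in pixels if p <= t1]
```

With these toy values, the first threshold falls between the RBC and background intensities and the second between the dark objects and the RBCs, so only the 20 dark pixels remain.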
Deep learning (DL) approaches have since overcome many issues of traditional methods, such as hand-crafted feature extraction, unreliability, and complex rule bases. DL models instead extract features automatically through their hidden layers (Nautre et al. 2020). However, DL approaches need an extensive training dataset to perform well, and medical datasets are more difficult to collect and more limited than those in non-medical fields. Therefore, many studies have introduced image augmentation algorithms to overcome small or imbalanced datasets; these algorithms are based on either image processing or DL techniques. Traditional image augmentation methods based on image processing include geometric transformations (Nugroho and Nurfauzi 2021a; Nugroho et al. 2021), affine transformations (Nugroho and Nurfauzi 2021a; Nugroho et al. 2021), Euclidean geometry (Nugroho and Nurfauzi 2021a), blurring (Nugroho and Nurfauzi 2021a), contrast enhancement (Nugroho and Nurfauzi 2021a; Nugroho et al. 2021), and color transformation (Nugroho and Nurfauzi 2021b).
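Several of the listed image-processing augmentations can be sketched in a few lines of pure Python. The functions below are illustrative, not the implementations of the cited studies; they assume a grayscale image stored as a list of rows with 8-bit intensities and cover a horizontal flip and a 90° rotation (geometric), a 3 × 3 mean blur, and min-max contrast stretching.

```python
def hflip(img):
    """Geometric: mirror each row."""
    return [row[::-1] for row in img]

def rotate90(img):
    """Geometric: rotate 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def box_blur(img):
    """Blurring: 3x3 mean filter; border pixels average only neighbours inside the image."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[ny][nx]
                      for ny in range(max(0, y - 1), min(h, y + 2))
                      for nx in range(max(0, x - 1), min(w, x + 2))]
            row.append(round(sum(window) / len(window)))
        out.append(row)
    return out

def contrast_stretch(img, lo=0, hi=255):
    """Contrast enhancement: linearly map [min, max] of the image onto [lo, hi]."""
    flat = [p for row in img for p in row]
    mn, mx = min(flat), max(flat)
    if mx == mn:
        return [row[:] for row in img]
    return [[round(lo + (p - mn) * (hi - lo) / (mx - mn)) for p in row] for row in img]

patch = [[10, 20], [30, 40]]
augmented = [hflip(patch), rotate90(patch), box_blur(patch), contrast_stretch(patch)]
```

Each function returns a new image, so several augmented copies can be generated from one patch and combined with the original training data.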
Faster R-CNN, a well-known two-stage DL object detector (Ren et al. 2017), was applied to detect and classify malaria parasites in thin blood smears (Hung et al. 2018). The method was evaluated on a public dataset containing 1300 microscopic thin blood smear images. However, this dataset contains only one parasite species (P. vivax), in four life stages (ring, trophozoite, schizont, and gametocyte). The method of Hung et al. (2018) yielded low detection performance for the trophozoite and ring classes. Moreover, because malaria parasites and artifacts are similar in shape and intensity, the model produced a high FP rate for both classes.
Recently, YOLO, one of the most popular object detection algorithms, has shown good performance in malaria parasite detection. Chibuta and Acar (2020) applied YOLO v3 to detect malaria parasites in two thick blood smear datasets and found that the YOLO-based algorithm outperformed the previous approach (SW + CNN). In another study, Abdurahman et al. (2021) compared YOLO v3, YOLO v4, SSD, and Faster R-CNN for detecting malaria parasites in thick blood smears. Both YOLO versions performed better than the other two architectures.
Semantic segmentation with DL assigns a class label to every pixel of an image, which allows objects to be delineated precisely. Well-known architectures include FCN (Long et al. 2015), U-Net (Ronneberger et al. 2015), and DeepLab (Chen et al. 2018a).
U-Net, proposed by Ronneberger et al. (2015), has become popular because it performs well in many medical imaging applications, such as breast tumor segmentation (Robin et al. 2021) and brain tissue segmentation in MR images (Woo and Lee 2021).
Nautre et al. (2020) applied U-Net to segment malaria parasites in thin blood smears, combined with a preprocessing technique for malaria blood films called GGB (green–green–blue) normalization. They also compared GGB with other color spaces, such as RGB and HSV, and found GGB normalization to be the best combination, achieving an accuracy of 0.995. However, they reported only accuracy, which does not reflect over- or undersegmentation, particularly when the objects are small relative to the image background.
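A small numerical example shows why accuracy alone can be misleading for small objects. Assuming a hypothetical 20 × 20 binary mask that contains a single 2 × 2 parasite, a prediction that labels every pixel as background still reaches 99% pixel accuracy, whereas overlap metrics such as the Dice coefficient and intersection over union (IoU) drop to zero.

```python
def pixel_accuracy(pred, truth):
    """Fraction of pixels whose predicted label matches the ground truth."""
    pairs = [(p, t) for pr, tr in zip(pred, truth) for p, t in zip(pr, tr)]
    return sum(p == t for p, t in pairs) / len(pairs)

def dice(pred, truth):
    """Dice = 2|P ∩ T| / (|P| + |T|) for binary masks; 1.0 when both are empty."""
    inter = sum(p * t for pr, tr in zip(pred, truth) for p, t in zip(pr, tr))
    size = sum(map(sum, pred)) + sum(map(sum, truth))
    return 2 * inter / size if size else 1.0

def iou(pred, truth):
    """IoU = |P ∩ T| / |P ∪ T| for binary masks; 1.0 when both are empty."""
    inter = sum(p * t for pr, tr in zip(pred, truth) for p, t in zip(pr, tr))
    union = sum(max(p, t) for pr, tr in zip(pred, truth) for p, t in zip(pr, tr))
    return inter / union if union else 1.0

# Hypothetical 20 x 20 ground truth with one 2 x 2 parasite.
truth = [[0] * 20 for _ in range(20)]
for y in (9, 10):
    for x in (9, 10):
        truth[y][x] = 1
empty_pred = [[0] * 20 for _ in range(20)]   # predicts background everywhere
print(pixel_accuracy(empty_pred, truth), dice(empty_pred, truth), iou(empty_pred, truth))
# 0.99 0.0 0.0
```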
Res-UNet is an improved U-Net that combines U-Net with ResNet (He et al. 2016), the residual network that won first place in the ImageNet competition. Its modifications aim to solve the gradient degradation problem of deep networks and consist of the following: (1) designing the convolutional layers, residual units, and pooling layers based on the basic concept of ResNet; (2) designing residual feature extractors in the downsampling and upsampling paths, motivated by ResNet; (3) applying linear interpolation in the deconvolution step; and (4) adjusting the number of output classes according to user needs.
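For reference, the residual unit introduced with ResNet (He et al. 2016), which Res-UNet reuses, learns a residual mapping F and adds the block input through an identity shortcut; a linear projection W_s is used instead when the input and output dimensions differ. Differentiating the identity form gives a simplified view of why the gradient does not vanish through the shortcut; this is a standard argument, not a derivation from this study.

$$
\mathbf{y} = \mathcal{F}(\mathbf{x}, \{W_i\}) + \mathbf{x},
\qquad
\mathbf{y} = \mathcal{F}(\mathbf{x}, \{W_i\}) + W_s\,\mathbf{x},
$$

$$
\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \frac{\partial \mathcal{F}(\mathbf{x}, \{W_i\})}{\partial \mathbf{x}} + I .
$$

Because the identity term I is always present, part of the gradient reaches earlier layers unchanged even when the gradient of F is small.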
ResFCN-18 is an improved FCN architecture (Zhu et al. 2020). Because the input images are large, the feature maps are downsampled three times and then restored with deconvolution layers to obtain small image patches. The modification of this architecture therefore focuses on the model used to extract the feature maps. Zhu et al. (2020) used three feature extractors, one of which was based on the ResNet-18 architecture. A more recent semantic segmentation model is DeepLabV3, proposed by Chen et al. (2018b). This model extracts feature maps using atrous (dilated) convolution, which enlarges the receptive field without further downsampling and thus reduces the loss of spatial information caused by conventional convolution and pooling.
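The effect of atrous convolution can be seen in one dimension. In the sketch below (pure Python, "valid" mode without padding, and cross-correlation as used in DL frameworks), a 3-tap kernel with rate r covers (3 - 1)r + 1 input samples, so increasing the rate enlarges the receptive field without pooling or extra weights. The function names are illustrative.

```python
def atrous_conv1d(signal, kernel, rate=1):
    """'Valid' 1-D atrous convolution: out[i] = sum_k kernel[k] * signal[i + k * rate]."""
    span = receptive_field(len(kernel), rate)
    return [sum(w * signal[i + k * rate] for k, w in enumerate(kernel))
            for i in range(len(signal) - span + 1)]

def receptive_field(kernel_size, rate):
    """Number of input samples covered by one output of a single atrous layer."""
    return (kernel_size - 1) * rate + 1

signal = list(range(10))
kernel = [1, 1, 1]          # three weights in every case
for rate in (1, 2, 4):
    print(rate, receptive_field(len(kernel), rate), atrous_conv1d(signal, kernel, rate))
# 1 3 [3, 6, 9, 12, 15, 18, 21, 24]
# 2 5 [6, 9, 12, 15, 18, 21]
# 4 9 [12, 15]
```

With suitable zero padding, the output would keep the input length, which is how atrous layers preserve spatial resolution in segmentation networks.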
Based on the above review, this study addresses the following issues.
1. Some DL models for object detection and semantic segmentation have shown promising results in many applications, but their application to malaria parasites remains rarely explored. Therefore, this study explores several well-known DL architectures for object detection and semantic segmentation on our challenging dataset.
2. We propose a new hybrid method that combines an optimized threshold with DL-based techniques to detect and segment malaria parasites in thin blood smear images. We also compare the proposed method with well-known DL architectures for object detection and semantic segmentation.
3. Results and discussion
This study proposes a hybrid scheme to detect and segment malaria parasites by combining double Otsu optimization, machine learning, and DL techniques. We introduce a method for optimizing the double Otsu approach to find the global threshold value for parasites, which is used to generate patch candidates. This strategy aims to overcome the limitations of our dataset, namely its small size and the many artifacts it contains. The proposed scheme consists of four steps: extraction of parasite patch candidates, class balancing, classification of the candidates (parasite detection), and parasite segmentation. This section presents the results of each step.
To extract parasite patch candidates, we optimized a previously proposed global thresholding approach to improve accuracy. This approach was originally proposed by Memeu et al. (2013) to segment objects whose intensity is similar to that of parasites. Memeu et al. (2013) applied Otsu's method once to obtain the first global threshold value (T0), which determines the RBC intensity, and a second time to obtain the second global threshold value (T1), which determines the parasite intensity. These rules are unsuitable for our dataset, which has a wide range of RBC intensities. As a result, applying the method of Memeu et al. (2013) to our dataset yielded 43 678 FPs, as shown in Table 1. Fig. 8a shows an example of the objects segmented with T1, indicated by green contours. The segmented objects include not only parasites but also some clumped RBCs, because T1 still lies within the RBC intensity range, as shown in Fig. 8b. Consequently, a massive number of patches is generated, as shown in Fig. 8c.
With our optimized method, the sensitivity increased from 98% to 99.6%; in other words, only three of the 691 parasites were missed. Our method also reduced the number of FPs from 43 678 to 3555, although this is still higher than the number obtained by Nugroho et al. (2017). The objects are segmented more accurately (Fig. 8d) because the optimized second threshold value (T2) lies at the boundary between the parasite and RBC intensities, as shown in Fig. 8e. As a result, the generated patches capture the parasites more accurately and contain significantly fewer FP parasite patches.
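The reported figures can be checked with simple arithmetic: sensitivity is TP / (TP + FN), so missing 3 of 691 parasites gives 688/691, and the relative FP reduction is 1 - 3555/43 678.

```python
def sensitivity(tp, fn):
    """Sensitivity (recall) = TP / (TP + FN)."""
    return tp / (tp + fn)

total_parasites, missed = 691, 3
sens = sensitivity(total_parasites - missed, missed)
fp_before, fp_after = 43678, 3555
fp_reduction = 1 - fp_after / fp_before
print(f"sensitivity: {sens:.1%}")           # sensitivity: 99.6%
print(f"FP reduction: {fp_reduction:.1%}")  # FP reduction: 91.9%
```

In other words, the optimized threshold removes roughly nine out of every ten FP candidates while keeping almost all parasites.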
The generated patches still contained many FPs, which we eliminated using a DL image classification technique. Because the FP patches far outnumbered the parasite patches, we balanced the two classes to prevent the trained model from being biased toward the majority class. We used SMOTE (Chawla et al. 2011) to oversample the minority (parasite) class up to the number of patches in the FP class, 3555. Samples of the augmented patches are shown in Fig. 9.
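A simplified version of SMOTE is sketched below. Each synthetic sample is placed at a random point on the line segment between a randomly chosen minority sample and one of its k nearest minority-class neighbours. The sketch assumes that each patch has been flattened into a numeric feature vector; how the patches were represented when SMOTE was applied in this study, the value of k, and the helper `balance` are assumptions for illustration.

```python
import random

def smote(minority, n_synthetic, k=5, seed=0):
    """Simplified SMOTE: interpolate between a random minority sample and one of
    its k nearest minority-class neighbours (requires at least two samples)."""
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [s for j, s in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda s: sq_dist(base, s))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

def balance(minority, majority, k=5, seed=0):
    """Hypothetical helper: oversample the minority class to the majority class size."""
    extra = max(0, len(majority) - len(minority))
    return minority + smote(minority, extra, k, seed)

parasites = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # toy 2-D feature vectors
false_positives = [[5.0, 5.0]] * 10
balanced = balance(parasites, false_positives, k=2)
```

Unlike simple duplication, the interpolated samples are new points inside the region spanned by the minority class, which reduces the risk of the classifier memorizing repeated examples.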
After balancing the two classes, we classified the patches using DL image classification models. We explored six models: GoogLeNet (Szegedy et al. 2015), DenseNet121 (Huang et al. 2017), MobileNet V2 (Sandler et al. 2018), MnasNet (Tan et al. 2019), ShuffleNet v2 (Ma et al. 2018), and ResNet50 (He et al. 2016). The aim of this step, which is the final step of parasite detection in thin blood smear films, was to find the best DL model for FP reduction. The comparison results are shown in Fig. 10. We used the F1-score as the main criterion for model selection because it is the harmonic mean of precision and recall (sensitivity). On this basis, we consider DenseNet121 to have the best performance in distinguishing parasites from artifacts.
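The F1-score is F1 = 2PR / (P + R), where P is precision and R is recall. The sketch below computes the three metrics from hypothetical counts of true positives, false positives, and false negatives and shows that, unlike the arithmetic mean, the harmonic mean stays low when either component is low.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall (sensitivity), and F1 = harmonic mean of the two."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Hypothetical counts: a balanced classifier and one that flags many artifacts as parasites.
balanced_clf = precision_recall_f1(tp=90, fp=10, fn=10)   # P = R = F1 = 0.9
noisy_clf = precision_recall_f1(tp=90, fp=910, fn=10)     # P = 0.09, R = 0.9, F1 ≈ 0.164
```

For the noisy classifier, the arithmetic mean of precision and recall would be 0.495, which hides the large number of FPs; the F1-score does not.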
The proposed hybrid scheme combines double Otsu optimization, machine learning, and DL techniques to improve parasite detection and segmentation performance. We compared it with two well-known DL object detection models: the light/small version of YOLO v5 (YOLO v5s) (van Rijthoven et al. 2018) and Faster R-CNN with an FPN ResNet50 backbone (Ren et al. 2017). We limited the evaluation to three metrics: sensitivity (recall), precision, and F1-score. The results on the training and testing data are shown in Figs. 11a and 11b, respectively. As Fig. 11b shows, on the testing data the proposed method achieves significantly better results than Faster R-CNN (Ren et al. 2017) and moderately better results than YOLO v5s (van Rijthoven et al. 2018). On the training data, the proposed method is significantly better than both. These results indicate that the proposed method detects malaria parasites better on our dataset, which has a limited number of images containing many artifacts. A key challenge is that parasites are difficult to distinguish from artifacts when only a small area is observed, because certain tiny parasites have characteristics similar to those of artifacts. Hence, when a patch is smaller than an RBC, the proposed method enlarges it to RBC size.
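The patch enlargement step can be sketched as follows. Several details are assumptions not stated in the text: boxes are half-open pixel coordinates (x0, y0, x1, y1), a small patch is grown symmetrically around its centre, it is shifted back inside the image when it would cross the border, and `min_w`/`min_h` stand for a hypothetical estimated RBC size in pixels.

```python
def enlarge_to_min_size(box, min_w, min_h, img_w, img_h):
    """Grow a half-open box (x0, y0, x1, y1) around its centre to at least
    min_w x min_h pixels, shifting it to stay inside the image."""
    def grow(a0, a1, min_len, limit):
        if a1 - a0 >= min_len:
            return a0, a1                      # already large enough: keep as is
        min_len = min(min_len, limit)          # cannot exceed the image size
        start = round((a0 + a1) / 2 - min_len / 2)
        start = max(0, min(start, limit - min_len))  # clamp inside [0, limit]
        return start, start + min_len

    x0, y0, x1, y1 = box
    nx0, nx1 = grow(x0, x1, min_w, img_w)
    ny0, ny1 = grow(y0, y1, min_h, img_h)
    return nx0, ny0, nx1, ny1

RBC_SIZE = 40  # hypothetical RBC diameter in pixels
print(enlarge_to_min_size((50, 50, 58, 58), RBC_SIZE, RBC_SIZE, 200, 200))  # (34, 34, 74, 74)
print(enlarge_to_min_size((2, 2, 10, 10), RBC_SIZE, RBC_SIZE, 200, 200))    # (0, 0, 40, 40)
```

Enlarging only the small patches gives the classifier the surrounding cell context it needs, while leaving larger candidates untouched.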
The next step was parasite segmentation, which aims to capture parasite morphology in more detail to support advanced malaria parasite studies. After locating the parasites, we segmented them using DL models for semantic segmentation. We explored five contemporary architectures for segmenting parasites within the patches (ROIs) and compared the results with segmentation performed without ROIs. The architectures were UNet (Ronneberger et al. 2015), Res-UNet (He et al. 2016), ResFCN (Zhu et al. 2020), DeepLabV3 (Chen et al. 2018a), and DeepLabV3+ (Chen et al. 2018a). The comparison results are depicted in Fig. 12. The proposed scheme outperforms segmentation without ROIs on all metrics. This suggests that locating ROIs through object detection before segmentation improves the performance of all the algorithms. Analyzing an image pixel by pixel is more difficult than analyzing it patch by patch, because certain parasites resemble artifacts when observed at the pixel level. The comparison results support this hypothesis.
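The ROI-first strategy can be sketched as a wrapper that runs a patch-level segmenter only inside detected boxes and pastes the results into a full-size mask, so pixels outside every ROI can never be labelled as parasite. `segment_patch` is a hypothetical placeholder for a trained model such as U-Net; the example uses a simple intensity threshold instead, and the box format (half-open x0, y0, x1, y1) is an assumption.

```python
def segment_in_rois(image, boxes, segment_patch):
    """Segment only inside detected ROIs and paste the patch masks into a full-size mask."""
    h, w = len(image), len(image[0])
    mask = [[0] * w for _ in range(h)]
    for x0, y0, x1, y1 in boxes:                      # half-open pixel boxes
        patch = [row[x0:x1] for row in image[y0:y1]]
        for dy, mask_row in enumerate(segment_patch(patch)):
            for dx, value in enumerate(mask_row):
                if value:
                    mask[y0 + dy][x0 + dx] = 1
    return mask

def threshold_segmenter(patch, level=50):
    """Hypothetical stand-in for a trained model: dark pixels are labelled parasite."""
    return [[1 if v < level else 0 for v in row] for row in patch]

image = [[200] * 5 for _ in range(5)]
image[1][1] = 10   # parasite inside the detected ROI
image[4][4] = 10   # dark artifact outside every ROI
mask = segment_in_rois(image, [(0, 0, 3, 3)], threshold_segmenter)
```

The dark artifact at the bottom-right corner has the same intensity as the parasite, yet it stays unlabelled because no ROI covers it, which mirrors the argument for detecting ROIs before segmentation.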