In the previous post, we reviewed the intrinsic challenges of recognizing insects, regardless of technology. Now we will review the challenges that arise when a specific technology is chosen.
As a technological paradigm, we will assume that smartphone cameras are used to photograph insects and that an artificial neural network automatically recognizes the insects in these images. The first decision is desirable because there are 3.5 billion smartphone users in the world, a population growing at roughly 8% per year (source: Statista). The second decision is reasonable because Deep Learning has revolutionized the recognition of images, sound and text over the last decade, giving computers amazing capabilities such as creating paintings, beating the world champion at Go, and driving cars in cities. Obviously, if we change the technology, the following extrinsic challenges could change as well.
Temporal and active perception. Humans recognize insects using a temporal sequence of visual impressions, and we use our bodies to adjust our point of view, the insect's posture, the lighting and the zoom; that is, we employ temporal and active perception mechanisms. Unfortunately, today's technology is limited to timeless and passive perception. From a single image, the system is asked to recognize an insect without being able to control the characteristics of that image. This strategic subtask seriously influences the performance of the recognition system, yet it is left to the user.
Insect motion. This is a problem because a fast-moving insect is harder to capture in a sharp, well-framed image.
No-bug images and unknown bugs. In the specialized literature, there is a very marked tendency to assume that the image to be analyzed always contains an insect. Furthermore, it is also assumed that the insects in the images always belong to the catalog of insects of the recognition system. In other words, the possibility that an image contains no insect, or an insect unknown to the system, is not considered. In practice, these situations occur frequently and must be handled by the system.
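One simple way to handle such cases is to reject low-confidence predictions instead of always returning a catalog label. The sketch below illustrates this idea with a confidence threshold over softmax probabilities; the class catalog, the logit values and the threshold of 0.7 are illustrative assumptions, not part of any real system described here.

```python
import math

# Hypothetical class catalog and threshold; both are illustrative assumptions.
CATALOG = ["bedbug", "ant", "cockroach"]
REJECT_THRESHOLD = 0.7  # below this confidence we answer "unknown"

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify_with_rejection(logits):
    """Return a catalog label, or 'unknown' when confidence is too low.

    Rejecting low-confidence predictions is one basic way to handle
    images with no insect or with insects outside the catalog.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    if probs[best] < REJECT_THRESHOLD:
        return "unknown"
    return CATALOG[best]
```

For example, a confident score vector such as `[4.0, 0.5, 0.2]` maps to "bedbug", while a nearly uniform one such as `[1.0, 0.9, 0.8]` is rejected as "unknown". A fixed threshold is only a first approximation; in practice it would have to be tuned on validation data.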
Single insect, multiple insects or insect nest. For example, if we want to recognize bedbugs, these insects can present themselves in three visually different ways in the images, as shown in Fig. 1: an individual bedbug, a few individuals in the same image, or a nest of bedbugs. The recognition system must consider all three situations, which increases the complexity of the problem.
Image variability. The same insect can look very different in different images due to changes in illumination, pose, viewpoint, background and occlusion. This is a very well-known problem in the pattern recognition community. It has been mitigated in the last decade by deep learning models with many layers, large numbers of training images, and GPUs to carry out the training.
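A standard complement to large datasets is data augmentation: training on randomly perturbed copies of each image so the model learns to tolerate variation. The minimal sketch below, assuming grayscale images stored as lists of pixel rows, shows two toy perturbations (mirroring and brightness jitter); the function names and the jitter range are illustrative choices, not a prescribed recipe.

```python
import random

def hflip(image):
    """Mirror a grayscale image (list of pixel rows) left-to-right."""
    return [row[::-1] for row in image]

def jitter_brightness(image, delta):
    """Shift every pixel by delta, clamped to the 0-255 range."""
    return [[min(255, max(0, p + delta)) for p in row] for row in image]

def augment(image, rng=random):
    """Produce one randomly perturbed copy of the image.

    Training on such variants encourages the model to tolerate
    changes in illumination and viewpoint.
    """
    out = image
    if rng.random() < 0.5:          # mirror half the time
        out = hflip(out)
    out = jitter_brightness(out, rng.randint(-30, 30))
    return out
```

Real pipelines typically add rotations, crops, color shifts and more, but the principle is the same: each epoch sees a slightly different version of every training image.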
Images in real conditions. Even though many images of common insects are accessible on the internet, there are still not many images taken in real conditions, that is, taken by ordinary people using smartphones: somewhat blurred, somewhat far from the insect, in other words, with technical flaws in the photography. Scientific papers still abound that use "scientific" images of insects under laboratory conditions, where lighting, background, point of view, pose and occlusion are controlled (see Fig. 2). On the other hand, large image databases such as ImageNet, Google Open Images Dataset and iNat Challenge 2021 that include images of insects use many "artistic" images; they are taken outside the laboratory without manipulating the insect, but in many of them the insect still appears well focused and at a good size (see Fig. 2). Pictures taken by ordinary people in real conditions are required, and these images pose the following challenges:
Small insects in images. The insect must be located in the image even when it appears very small relative to the image size (see Fig. 2). Most insect recognition systems reported in the specialized literature assume that the insect appears at a reasonable size and is well centered.
Few images for training. Images taken in real conditions are still scarce, yet deep learning models require large training datasets.
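The localization challenge above can be sketched as an exhaustive window scan over the image: score every crop and keep the best one, so a small insect anywhere in the frame can still be found. Both `best_window` and the toy `darkness` score below are illustrative assumptions, a crude stand-in for a real detector, not the method of any system discussed here.

```python
def best_window(image, win, score_fn):
    """Scan every win x win crop and return (row, col) of the best one.

    score_fn rates how 'insect-like' a crop is; sliding the window
    over the full image lets a small insect be located even when it
    occupies a tiny fraction of the frame.
    """
    rows, cols = len(image), len(image[0])
    best, best_pos = float("-inf"), (0, 0)
    for r in range(rows - win + 1):
        for c in range(cols - win + 1):
            crop = [row[c:c + win] for row in image[r:r + win]]
            s = score_fn(crop)
            if s > best:
                best, best_pos = s, (r, c)
    return best_pos

def darkness(crop):
    """Toy score: dark pixels stand out against a bright background."""
    return -sum(sum(row) for row in crop)
```

Exhaustive scanning is quadratic in image size and a learned score is needed in practice; modern detectors amortize this scan inside a convolutional network, but the underlying idea is the same.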