Deep learning-based small object detection: A survey

Qihan Feng; Xinzheng Xu; Zhixiao Wang

doi:10.3934/mbe.2023282

Mathematical Biosciences and Engineering

2023, Volume 20, Issue 4: 6551-6590. doi: 10.3934/mbe.2023282

Survey

1. College of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
2. Mine Digitization Engineering Research Center of the Ministry of Education, Xuzhou 221116, China

Academic Editor: Vladimir Mityushev

Received: 18 October 2022 Revised: 21 December 2022 Accepted: 26 December 2022 Published: 02 February 2023

Small object detection (SOD) is significant for many real-world applications, including criminal investigation, autonomous driving and remote sensing images. SOD has been one of the most challenging tasks in computer vision due to its low resolution and noise representation. With the development of deep learning, it has been introduced to boost the performance of SOD. In this paper, focusing on the difficulties of SOD, we analyze the deep learning-based SOD research papers from four perspectives, including boosting the resolution of input features, scale-aware training, incorporating contextual information and data augmentation. We also review the literature on crucial SOD tasks, including small face detection, small pedestrian detection and aerial image object detection. In addition, we conduct a thorough performance evaluation of generic SOD algorithms and methods for crucial SOD tasks on four well-known small object datasets. Our experimental results show that network configuring to boost the resolution of input features can enable significant performance gains on WIDER FACE and Tiny Person. Finally, several potential directions for future research in the area of SOD are provided.

Citation: Qihan Feng, Xinzheng Xu, Zhixiao Wang. Deep learning-based small object detection: A survey[J]. Mathematical Biosciences and Engineering, 2023, 20(4): 6551-6590. doi: 10.3934/mbe.2023282

1. Introduction

Society is facing sustainability challenges that require systemic changes in the economic, environmental and social systems. Firms address these challenges through the creation of new ideas, services or solutions to social and environmental problems (Lin and Darnall, 2014). A focal firm willing to tackle these challenges needs to decide whether to make, buy or collaborate with other actors to develop sustainability solutions (Husted et al., 2010; Husted, 2003; Husted and de Sousa-Filho, 2017). Since firms lack the resources, competences or legitimacy to act alone, they address these challenges collaboratively, which leads to the development of two distinct types of partnership: inter-firm and cross-sector (Selsky and Parker, 2016; Clarke and Crane, 2018; Delmas et al., 2011; Lin, 2012a, 2012b, 2016).

Inter-firm alliances are formed with their suppliers, customers and other businesses including their competitors; while cross-sector partnerships are formed with organisations from other sectors (public, private, third) including NGOs, governments, local authorities, inter-governmental organisations and research institutions (Kolk, 2014). With a few exceptions (Gutiérrez et al., 2015; Schmutzler et al., 2013; Wassmer et al., 2017; Kolk et al., 2008; van Tulder and Da Rosa, 2012), research thus far has focused either on inter-firm or on cross-sector partnerships at the dyad level. However, overall, a focal firm develops and manages an extensive portfolio of various types of alliance and partnership to tackle different sustainability issues (Wassmer et al., 2012). Research suggests that we still know little about such portfolios (Wassmer et al., 2012; Gutiérrez et al., 2015; Schmutzler et al., 2013).

This paper draws on the alliance portfolio literature within the realms of corporate strategy and innovation to develop the notion of "sustainable alliance portfolios (SAPs)". First and foremost, this paper highlights the need to study SAPs, since research on SAPs is necessary to enhance our understanding of inter-organisational sustainability action. Second, SAPs show what a focal firm does about different sustainability issues; therefore, the study of SAPs gives an overview of the sustainability actions of a focal firm. Third, SAPs also demonstrate a possibility to create synergies within a portfolio as opposed to dyadic or triadic relations (Wassmer, 2008). If SAPs are managed well, then a focal firm can apply what it learns from individual partnerships across its portfolio (Wassmer et al., 2017; Dzhengiz and Malik, 2020; Dzhengiz, 2020). Finally, an SAP holds great potential to demonstrate how complex a focal firm's cognition is when it comes to sustainability. From the logic of "tell me who your friends are, and I will tell you who you are", the characteristics of the SAP are likely to be the outcome of a firm's cognition about sustainability, and hence its organisational value frames (OVFs).

OVFs impact the priorities of a focal firm in creating economic, environmental and social value; in other words, the extent to which sustainability is integrated into the business (Le Ber and Branzei, 2011; Laasch, 2018). OVFs also explain with whom a focal firm chooses to partner and why (Nooteboom et al., 2007). In the context of corporate sustainability, we can assume two frames that dominate the sustainability cognition: business case, and paradoxical (Hahn et al., 2015). The business case frame motivates a firm to address the creation of social and environmental value to the extent that it creates economic value (Carroll and Shabana, 2010; Hockerts, 2015). A paradoxical frame, on the other hand, motivates a firm to juxtapose different values and accommodate the tensions between them without prioritising one over the others (Hahn et al., 2015).

This paper applies the literature on commercial alliance portfolios to the context of sustainability and bridges the gap between OVFs and SAPs. It theorises how configuration, management and development of SAPs vary at firms with the business case and paradoxical frames. Development of this theoretical link enables us to explain the origins and limits of cognitive diversity in SAPs. It also allows us to explain how the management of SAPs is different from that of a single partnership. Finally, this theoretical link shows us the co-evolution of OVFs and SAPs, as a focal firm transitions towards sustainability. Overall, this paper generates propositions that provide new theoretical insights and creates an alternative space to discuss the origins of sustainability-oriented partnerships; therefore, it contributes to theory on sustainability-oriented partnerships, organisational cognition in corporate sustainability context and alliance portfolios.

The remainder of this paper is organised as follows. The theory section introduces three areas of literature: alliance portfolios (configuration, management and development), organisational cognition and OVFs in the corporate sustainability context, and the mechanism of cognitive homophily. In the following section, the impact of the business case and paradoxical frames on the configuration, management and development of SAPs is theorised. Finally, the conclusion section summarises the contributions of this paper and offers insights for future research.

2. Theory

2.1. Alliance portfolios

Strategic alliances are "voluntary arrangements between firms involving exchange, sharing, or co-development of products, technologies, or services" (Gulati, 1998). They are formed to acquire knowledge, develop new skills or capabilities (Kogut, 1988; Hamel, 1991; Inkpen and Crossan, 1995; Lin and Darnall, 2014, 2010; Baranova and Meadows, 2016) or gain legitimacy (Dacin, 2007; Herlin, 2013; Weidner et al., 2016). Research shows that a focal firm enters many alliances with various partners, developing an alliance portfolio, which is the aggregate of all its strategic agreements (Wassmer, 2008; George et al., 2001; Gulati, 1998). A single alliance is formed either to exploit existing technologies or to explore new ones. An alliance portfolio, on the other hand, is a space where firms address the tensions between exploitation and exploration and balance them to achieve ambidexterity at the firm level (Lavie, 2007; Lavie et al., 2011; Wassmer et al., 2017). Therefore, alliance portfolios hold strategic importance for a firm's competitive strategy. Wassmer (2008) reviewed the literature on alliance portfolios and highlighted three research areas: configuration, management, and development.

2.1.1. Alliance portfolio configuration

An alliance portfolio configuration refers to structural characteristics such as portfolio size or portfolio heterogeneity (or diversity) and relational characteristics such as strength or quality of relationship (Wassmer, 2008; Hoffmann, 2007; Lucena and Roper, 2016; Jiang et al., 2010; Oerlemans et al., 2013; Cui and O'Connor 2012; Cobena et al., 2017). Studies have often focused on the impact of these structural and relational characteristics on performance. For instance, some found an inverted U-shape relationship between alliance portfolio size and innovation performance, moderated by the level of centralisation and formalisation (related to portfolio management) of the alliance portfolio (Faems et al., 2012).

Perhaps the most discussed structural characteristic of an alliance portfolio is homogeneity or heterogeneity (Cui and O'Connor, 2012; Lucena and Roper, 2016; Wuyts and Dutta, 2012). We can understand portfolio heterogeneity based on the type of diversity it contains in terms of resources, organisational functions, different forms of governance, represented industries, geographical or national background of partners, partner types (organisational forms of partners) or organisational cognition (Jiang et al., 2010; Oerlemans et al., 2013).

This paper focuses on the heterogeneity of organisational cognition; in other words, cognitive diversity (Nooteboom et al., 2007). Cognitive diversity refers to the variance of belief systems in an alliance portfolio (Nooteboom et al., 2007). Studies found that cognitive distance had an inverted U-shaped relationship with innovation performance since too much distance made inter-organisational learning difficult and yet too little distance did not stimulate partners to learn from each other as their knowledge pools were too similar (Nooteboom, 2007, 2009; Wuyts et al., 2004; Nooteboom et al., 2007; Penney, 2018).
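One stylised way to see why an inverted U can arise is sketched below. It assumes, purely for illustration, that the novelty value $N$ of a partner's knowledge rises linearly with cognitive distance $d$ while the capacity $A$ to absorb that knowledge falls linearly; the parameters $n_0, n_1, a_0, a_1 > 0$ are hypothetical and are not estimates from the cited studies.

```latex
\begin{align}
  N(d) &= n_0 + n_1 d, \qquad A(d) = a_0 - a_1 d, \\
  P(d) &= A(d)\,N(d) = a_0 n_0 + (a_0 n_1 - a_1 n_0)\,d - a_1 n_1\,d^2, \\
  \frac{\mathrm{d}P}{\mathrm{d}d} &= (a_0 n_1 - a_1 n_0) - 2 a_1 n_1 d = 0
  \;\Longrightarrow\;
  d^{*} = \frac{a_0 n_1 - a_1 n_0}{2 a_1 n_1}.
\end{align}
```

Under these assumptions, performance $P$ is a downward-opening parabola in $d$, with an interior optimum $d^{*} > 0$ whenever $a_0 n_1 > a_1 n_0$: too little distance offers little novelty, and too much distance leaves little capacity to absorb it.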

The relational configuration of an alliance portfolio is about the qualitative nature of the relationships between the partners. There are three types of partners that a focal firm can engage: friends, acquaintances, and strangers (Li et al., 2008). Friends are "potential alliance partners with whom a firm has developed strong-form of trust through multiple previous interactions" or those that are perceived "trustworthy, independently of whether or not exchange vulnerabilities or governance mechanisms exist" (Li et al., 2008). When firms experience market uncertainty, they try to reduce this uncertainty by adding new relationships into their portfolio with their already existing partners, meaning their friends (Beckman et al., 2004). However, this does not mean that firms are locked into their existing relationships forever. Firms "tend to explore and broaden their alliance networks when they are experiencing very high degrees of firm-specific uncertainty and low levels of market uncertainty" (Beckman et al., 2004). In these circumstances, they engage with acquaintances and even strangers. Acquaintances are partners that the focal firm knows about but with whom it may not yet have developed greater trust (Li et al., 2008). Finally, strangers are "potential alliance partners that are unknown to each other" where, among other things, trust is at its weakest due to each party's lack of knowledge about the other (Li et al., 2008). In an alliance portfolio, a focal firm would have different quantities of friends, acquaintances and strangers. This composition would evolve as the firm evolves in line with changes in its external and internal environment.
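The three partner types can be sketched as a simple classification rule. This is a minimal illustration only: the `Partner` fields, the threshold `FRIEND_MIN_PRIOR_ALLIANCES` and the example partners are hypothetical, and counting prior alliances is a crude stand-in for the trust-based definitions quoted above.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical cut-off: "multiple previous interactions" is read here as >= 2.
FRIEND_MIN_PRIOR_ALLIANCES = 2


@dataclass(frozen=True)
class Partner:
    name: str
    prior_alliances: int   # number of past alliances with the focal firm
    known_to_firm: bool    # whether the focal firm knows of the partner at all


def classify(partner: Partner) -> str:
    """Label a partner as friend, acquaintance or stranger."""
    if partner.prior_alliances >= FRIEND_MIN_PRIOR_ALLIANCES:
        return "friend"
    if partner.known_to_firm or partner.prior_alliances > 0:
        return "acquaintance"
    return "stranger"


def composition(portfolio: list[Partner]) -> Counter:
    """Count how many partners of each type a portfolio contains."""
    return Counter(classify(p) for p in portfolio)


# Illustrative portfolio with made-up partners.
portfolio = [
    Partner("Supplier A", prior_alliances=4, known_to_firm=True),
    Partner("Competitor B", prior_alliances=2, known_to_firm=True),
    Partner("Large NGO C", prior_alliances=0, known_to_firm=True),
    Partner("Advocacy group D", prior_alliances=0, known_to_firm=False),
]
counts = composition(portfolio)
```

Tracking `counts` over time would give a rough picture of how the mix of friends, acquaintances and strangers shifts as the firm's environment changes.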

2.1.2. Alliance portfolio management

Firms pursue multiple goals through simultaneous alliances, and their portfolios enable them to "create a more substantial experience base to accelerate their learning on how to design and manage strategic alliances" (Wassmer, 2008). The management of alliance portfolios involves the development of tools and systems to manage different alliances simultaneously and routines and practices to capture, codify and share alliance-related knowledge (Kale et al., 2002; Kale and Singh, 2007; Schreiner et al., 2009; Hoffmann, 2007). Therefore, an essential aspect of alliance portfolio management is where alliance portfolios are coordinated, as it impacts how the firm builds capabilities to manage alliances in its portfolio.

A dedicated alliance function is a departmental unit that is responsible for managing all of a firm's interactions (Kale and Singh, 2007, 2009; Findikoglu and Lavie, 2018). Having a dedicated alliance function may help a firm by providing a "focal point for capturing and storing alliance management lessons and best practices", enhancing "visibility and awareness of the firm's alliances among external stakeholders" and giving legitimacy to demand for the "internal resources necessary for alliance success" (Kale and Singh, 2009, 2007). While some studies found that a dedicated alliance function positively impacts alliance success due to its enhanced approach to initiating, modifying and coordinating alliances, others highlighted that a dedicated function might limit a firm's flexibility due to extensive formalisation (Kale et al., 2002; Wassmer, 2008; Findikoglu and Lavie, 2018). A recent study found that "the dedicated function enhances the ability to leverage firm-specific routines but limits the ability to successfully employ partner-specific routines" (Findikoglu and Lavie, 2018). In sum, while a dedicated unit may help to manage alliance portfolios more efficiently, paradoxically, it may reduce the possibility of partner-specific organisational learning (Findikoglu and Lavie, 2018).

2.1.3. Alliance portfolio development

Alliance portfolios develop as a result of formations and terminations of individual alliances, also referred to as alliance portfolio reconfigurations (Wassmer, 2008). Studies showed that the development of an alliance portfolio may be driven by a firm's responses to changes in its external environment (Lavie and Singh, 2011), such as technological discontinuities (Asgari et al., 2017), changes in the market and competition (Ozcan, 2018) or a firm's internal competence or resource needs (Hoffmann, 2007; Wassmer et al., 2017; Chiambaretto and Wassmer, 2019). In sum, alliance portfolios co-evolve with firm-level strategy (Lavie and Singh, 2011; Chiambaretto and Fernandez, 2016; Chiambaretto and Wassmer, 2019). Here, co-evolution refers to the intertwined relationship between the change in a firm's strategy and its alliance portfolio. Two elements drive this co-evolution: internal needs and changes in the external environment (Lorenzoni and Lipparini, 1999; Chiambaretto and Fernandez, 2016; Chiambaretto and Wassmer, 2019). A firm's internal needs determine what resources it must acquire and utilise, which in turn influences firm strategy and leads to changes in the portfolio (Chiambaretto and Wassmer, 2019); changes in the external environment, such as changes in the dynamics of competition or industry norms, also impact firm strategy, which also influences portfolio reconfigurations (Lorenzoni and Lipparini, 1999).

The co-evolution of the portfolio with firm strategy is not independent of those who are managing it; on the contrary, their aspirations about the firm's strategy and their bounded rationality and cognitive limitations impact this co-evolution (Ozcan and Eisenhardt, 2009; Kavusan and Frankort, 2019). A recent study proposed that alliance portfolio reconfigurations may be driven by the dominant logic which can be understood as the "common way for managers to view their business and allocate resources, manifested in the expectations and assumptions managers hold about a particular business context" (Penney, 2018). This study proposed that those "managers [that] match alliance portfolio diversity to the firm's dominant logic(s), will experience greater operational synergies, learning opportunities, and performance", and if they fail to do so, "performance suffers, and the firm will need to either add a new dominant logic or change the existing logic" (Penney, 2018).

2.2. Organisational value frames in corporate sustainability

Organisational cognition, also referred to as organisational schemata (Rerup and Feldman, 2011), shared interpretation system (Weick, 1995) or organisational focus (Nooteboom et al., 2007), is about how the organisation thinks and interprets situations or external events. This interpretation depends on frames that are defined as "a set of shared assumptions, values, and frames of references that give meaning to everyday activities and guide how organisational members think and act" (Rerup and Feldman, 2011). Indeed, these frames would vary depending on the issue or area of the business.

Organisational value frames (OVFs) are specific frames about "interpretations of value which comprise the organising principles of what is valued and valuable" (Kaplan and Murray, 2008). OVFs matter in studying alliance portfolios, as they motivate firms' agency to build collaborations, guide organisational action towards partners and justify beliefs regarding this action (Le Ber and Branzei, 2011; Watson et al., 2018).

OVFs also matter in the corporate sustainability context because they impact whether a firm prioritises creating or capturing economic, environmental or social value (Laasch, 2018; Dzhengiz and Hockerts, 2019). While managerial and organisational cognition in corporate sustainability is still in its development phase (Grewatsch and Kleindienst, 2018; Le Ber and Branzei, 2011; Gröschl et al., 2017; Hahn and Aragón-Correa, 2015; Joseph et al., 2019; Maon et al., 2008), a seminal paper proposed two frames: business case and paradoxical frames (Hahn et al., 2015). Frames other than business case and paradoxical exist (Dzhengiz and Hockerts, 2019). For instance, business frames represent the unsustainability of decision-makers, as they are characterised by a very narrow focus on economic outcomes (Sharma and Jaiswal, 2017), while instrumental environmental and social frames demonstrate the environmental or social value creation objectives of ecopreneurs or social enterprises (Dzhengiz, 2018). However, since this paper focuses specifically on for-profits in the context of corporate sustainability, only business case and paradoxical frames will be discussed here. Hahn et al. (2015) originally developed these frames at the individual level. By assuming interpretive dominance (Prahalad and Bettis, 1986) and taking a similar approach to that of Grewatsch and Kleindienst (2018), business case and paradoxical frames are applied here to the organisation-level phenomenon. What follows describes the content and structure of these frames, based on Hahn et al. (2015).

2.2.1. Business case frames

Business case frames are based on an alignment logic which motivates the organisation to eliminate tensions between economic, environmental and social value objectives (Hahn et al., 2015). When faced with tensions in aligning non-financial values with their financial ones, firms with this frame will dismiss the non-financial ones (Hahn et al., 2015; Joseph et al., 2019). This frame has a low degree of differentiation and integration of different frames (environmental and social) that guide sustainability interpretations. Therefore, these organisations pursue sustainability initiatives to create environmental and social value while reducing cost through eco-efficiency or socio-efficiency and enhancing income by generating market returns and accessing new markets (Dyllick and Hockerts, 2002; Hockerts, 2015). That is, they pursue only those sustainability initiatives that would also contribute to their economic bottom line.

2.2.2. Paradoxical frames

Paradoxical frames are based on an integration logic that acknowledges "the tension between opposing task elements, yet understand[s] that combining opposing task elements tempers the undesirable side effects of each element alone and leads to new solutions that integrate both elements" (Miron-Spektor et al., 2011). Paradoxical frames have a complex structure due to a high degree of differentiation because they recognise all economic, environmental and social bottom lines distinctively (Hahn et al., 2015). At the same time, they have a high degree of integration since they take into account the interconnectedness between these bottom lines (Hahn et al., 2015). Therefore, with their highly complex frames, these organisations are expected to create environmental, social, and economic value simultaneously without prioritising one over the other (Gao and Bansal, 2012; Joseph et al., 2019; Hahn et al., 2015).

2.3. Cognitive homophily and value homophily

Cognitive homophily refers to the mechanism that drives "cooperation between similar strangers to maximise the chance of successful cooperative interactions because similar individuals [are] more likely to share relevant behavioural tendencies" (Haun and Over, 2015). While evolutionary psychology utilises the concept of cognitive homophily to explain the cooperation between similar individuals, cognitive homophily also applies to the same phenomenon in groups and organisations (Haun and Over, 2015).

Cognitive homophily is about the similarity between partners in terms of resources, network positions, status and values in the partnership context (Chung et al., 2000; Mitsuhashi and Greve, 2009; Knoben et al., 2019; Ahuja et al., 2009). For instance, a recent study showed that some INGOs were "more likely to collaborate when they [had] similar (closer) founding dates, when they [were] headquartered in the same geographic region, when they were headquartered in the same global hemisphere (north/south), when they [had] the same status, and when they [had] common funding partners" (Atouba and Shumate, 2015).

In this study, value homophily is considered critical in sustainability collaborations. Value homophily is "the idea that it is more rewarding to interact with others who hold similar values" since "others who see things as we do are more likely than dissimilar others to be empathetic and to provide us with positive feedback" (Ingram and Morris, 2007). In the context of corporate sustainability, value homophily means a focal firm selecting partners that have similar OVFs. While it is expected to see that value homophily yields positive relational outcomes, it is, at the same time, the self-confirmation bias of "actors that seek relationships with others sharing their beliefs about [sustainability] issues" (Henry and Dietz, 2012).

3. Theorising sustainable alliance portfolios

SAPs are the collection of both inter-firm alliances and cross-sector partnerships that a focal firm forms to generate product, process or organisational innovations to address environmental and social sustainability challenges (Schmutzler et al., 2013; Gutiérrez et al., 2015; Dzhengiz and Malik, 2020). That is, in addition to the inter-firm alliances described in the alliance portfolio section, SAPs also include cross-sector partnerships (Dzhengiz, 2018). Cross-sector partnerships are long-term interactions between organisations from at least two sectors that address social or environmental problems; they link businesses with government (Stadtler, 2014; Lin, 2016) or with civil society members such as NGOs (den Hond et al., 2012), and sometimes take a tripartite form that combines all sectors (Clarke and Crane, 2018; Kolk, 2014). While they are of strategic importance to firms, only a few alliance portfolio studies have addressed cross-sector partnerships within a broader alliance portfolio (Gutiérrez et al., 2015; Schmutzler et al., 2013; Kale and Singh, 2009; Kolk et al., 2008; van Tulder and Da Rosa, 2012).

Inter-firm partnerships are formed to provide economic value that may also bring environmental and social value, often without contradicting the business case. By contrast, cross-sector partners may "enact contradictory value creation logics" (Le Ber and Branzei, 2011), since conflicts may "arise when the operationalisations of two values are seen as incompatible" (Garst et al., 2019). This value-based distance between partners may cause problems in the initial engagement phase, when trust is built; in the implementation phase, when it makes communication and operation difficult; and finally in the impact assessment phase, when parties evaluate the outcomes of these partnerships differently (Reficco and Márquez, 2009; van Tulder et al., 2015; Athanasopoulou and Selsky, 2012).

3.1. Sustainable alliance portfolio configuration

Value homophily would impact different outcomes in terms of the structural and relational configuration of the SAP.

In this paper, a homogeneous portfolio is one with a low value-based distance between partners. Firms with business case frames have a low degree of complexity due to low differentiation and integration of sustainability bottom lines. Their narrower perception would mean that across all spaces, the firm is focused on creating a business case. Therefore, for firms with this OVF, homophily applies by motivating them to engage with organisations sympathetic to the creation of a business case. For instance, a study finds that managers in such firms "divide the NGO collective between those they think they can work with and those who, no matter what position companies adopt, will always oppose them" (Lucea, 2010). Therefore, it is likely that these firms will choose partners with whom they have a low value-based distance and create more homogeneous portfolios than firms with paradoxical frames.

A heterogeneous portfolio contains a high value-based cognitive distance between partners. When the complexity of frames increases, firms are able to incorporate environmental and social frames in different organisational spaces. In spaces where firms can maintain the existence of environmental and social frames, they can connect with others that carry instrumental environmental and social frames as well (Hahn et al., 2017; Hahn and Aragón-Correa, 2015). Therefore, for firms with paradoxical frames, value homophily applies by motivating them to engage with various organisations with different value frames. For instance, these firms would not experience a great value distance with environmental or social NGOs since they also associate themselves with such frames. Therefore, it is likely that they create more heterogeneous portfolios than those with business case frames.

Dzhengiz (2018), in the context of electricity utilities, finds that utilities with business case frames partnered with other businesses to a much greater degree, whereas those with paradoxical frames demonstrated a portfolio configuration in which cross-sector partners such as environmental NGOs had a more critical role. To further illustrate this, let us consider the transition of a black energy company to a green one (Dzhengiz and Malik, 2020). Dzhengiz and Malik (2020) find that as a company integrates sustainability into its core business, and hence its organisational value frames move from business case towards more paradoxical and complex ones, the diversity of partnerships in the portfolio increases.

Proposition 1a: Due to the impact of value homophily and the content and structure of their frames, the SAPs of firms with paradoxical frames are more heterogeneous than the SAPs of firms with business case frames.
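Proposition 1a could be made operational in several ways; the sketch below shows one hypothetical option, not a measure proposed in the literature cited here. It assumes that each partner's OVF can be coded as weights on economic, environmental and social value (the coding scheme and all numbers are invented for illustration) and measures portfolio heterogeneity as the mean pairwise Euclidean distance between those weight vectors.

```python
from itertools import combinations
from math import dist

# Hypothetical coding of a partner's value frame as
# (economic, environmental, social) weights.
Frame = tuple[float, float, float]


def portfolio_heterogeneity(frames: dict[str, Frame]) -> float:
    """Mean pairwise Euclidean distance between partners' value frames.

    Returns 0.0 for portfolios with fewer than two partners.
    """
    pairs = list(combinations(frames.values(), 2))
    if not pairs:
        return 0.0
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


# Illustrative portfolio of a firm with a business case frame:
# partners are mostly businesses with similar, economically weighted frames.
business_case_sap = {
    "Supplier": (0.8, 0.1, 0.1),
    "Industry peer": (0.7, 0.2, 0.1),
    "Corporate-friendly NGO": (0.6, 0.3, 0.1),
}

# Illustrative portfolio of a firm with a paradoxical frame:
# partners include NGOs with strongly environmental or social frames.
paradoxical_sap = {
    "Supplier": (0.8, 0.1, 0.1),
    "Environmental NGO": (0.1, 0.8, 0.1),
    "Social enterprise": (0.2, 0.1, 0.7),
}

h_business_case = portfolio_heterogeneity(business_case_sap)
h_paradoxical = portfolio_heterogeneity(paradoxical_sap)
```

With these made-up inputs, `h_paradoxical` exceeds `h_business_case`, which is the pattern Proposition 1a predicts; an empirical test would of course require a defensible way of coding partners' frames.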

Firms with business case and paradoxical frames also differ with regard to their SAPs' relational configuration. Firms with business case frames are less interested in engaging with stakeholders that have no financial impact on their business (Watson et al., 2018). They collect less information about sustainability issues, with limited breadth and detail of scanning, and aim to respond to sustainability challenges with pragmatic solutions (Hahn et al., 2015). In addition to value homophily, such a firm would mostly engage with friends to enhance efficiency, reduce transaction costs and the time spent finding new partners, and avoid the ambiguity of partnering with the unknown (Beckman et al., 2004; Li et al., 2008). Friends, in this context, consist of industry peers and some environmental or social NGOs that they have already worked with to offset their environmental impact or improve their reputation through corporate philanthropy. These firms may also have some acquaintances. These acquaintances would be organisations that they have not yet worked with formally in an alliance setting. If they did, however, the partners would see the potential for improving their alliance and social legitimacy (Weidner et al., 2016). For instance, such a firm may partner with WWF or OXFAM, which are known for their corporate partnerships with global MNCs (Šimunović et al., 2018).

Firms with paradoxical frames, on the other hand, would engage with a diverse set of stakeholders, scan environmental and social issues with high breadth and detail and aim to respond with prudent solutions (Hahn et al., 2015). Such a firm would go beyond partnering with friends and acquaintances and may add strangers to its SAP. In this context, strangers may be radical NGOs that rarely partner with businesses and often take an advocacy role, such as the World Rainforest Movement (Šimunović et al., 2018). While these organisations may be unknown to the focal firm and perceived as radical by others, firms with paradoxical frames may not necessarily perceive such partners as "strange", owing to the complexity of their own OVFs.

Proposition 1b: Due to the impact of value homophily and the content and structure of their frames, firms with paradoxical frames would have more strangers and acquaintances in their SAP, when compared to firms with business case frames.

3.2. Sustainable alliance portfolio management

SAPs can also be viewed as spaces in which the tensions between different OVFs are either eliminated or juxtaposed (Wassmer, 2008; Wassmer et al., 2017).

Firms with business case frames would have SAPs that consist of friends with similar OVFs and a few acquaintances with less similar OVFs. Because their alignment logic aims to eliminate tensions, separating friends from acquaintances would provide operational efficiencies and help create synergies across similar partnerships in these firms. Moreover, since they prioritise efficiency and cost-saving when coordinating, they would generally favour mechanistic coordination (Schneider et al., 2014). Following this logic, their inter-firm alliances would be coordinated from a dedicated alliance portfolio function, whereas their cross-sector partnerships would be coordinated from a dedicated corporate sustainability function.

Proposition 2a: Firms with business case frames would prefer mechanistic coordination of alliance portfolios and sustainability management; managing inter-firm partners in a dedicated alliance unit, while managing cross-sector partners in a dedicated corporate sustainability or stakeholder engagement unit.

For firms with paradoxical frames, SAPs consist of seemingly contradictory and inconsistent OVFs due to the mix of friends, acquaintances, and strangers. Due to their logic of integration and juxtaposition, these firms would enjoy organic coordination mechanisms that provide greater flexibility, knowledge exchange and cross-functional integration (Schneider et al., 2014). While these organisations would still benefit from a dedicated corporate sustainability function which may serve as "an internal change-management consultancy, a repository of how-to knowledge and relationship managers" (Grayson and Arevalo, 2011), they would manage inter-firm and cross-sector alliances across all functions. For instance, the purchasing function may work through the facilitation of a corporate sustainability function to address sustainable supply chain issues (Wilhelm et al., 2016), while engaging with customers in the marketing and sales functions for sustainable consumption initiatives (Mariadoss et al., 2011). Overall, these companies would demonstrate collaborative business models and embed alliance management into different functions (Rohrbeck et al., 2013; Pedersen et al., 2018).

Proposition 2b: Firms with paradoxical frames would prefer organic coordination of alliance and sustainability management; managing alliances and partnerships across all functions in a collaborative business model.

3.3. Sustainable alliance portfolio development

In this paper, it is assumed that firms would follow an evolutionary path from narrower to broader frames. The evolution of OVFs at a focal firm would originate from business frames, that is, the rejection of corporate sustainability, which are even narrower than business case frames (Sharma and Jaiswal, 2017; Branzei et al., 2000; Dzhengiz and Hockerts, 2019). It is assumed that firms' cognition would evolve from these business frames to business case frames and finally to paradoxical frames (Dzhengiz and Hockerts, 2019). This evolution would be intertwined with the evolution of SAPs, triggered at first by external and then by internal events. The structural and relational configuration of SAPs would co-evolve with OVFs, as can be gathered from propositions 1a and 1b and as updated in propositions 3a and 3b. In this section, the co-evolution of OVFs with relationships specific to cross-sector partnerships is discussed.

External events often induce a change in OVFs, as firms sense changes around them and adjust themselves to maintain their legitimacy (Dacin et al., 2007; Le Ber and Branzei, 2011). The first shift (from business to business case frames) is largely induced by the increased legitimacy of sustainability in the external environment (Hoffman, 2001). For instance, Hoffman (2001) showed that many firms in the chemical industry shifted from business to business case frames when they perceived a significant mismatch between their frames and the environment and realised that not shifting would reduce their legitimacy (Dacin et al., 2007; Vurro et al., 2011). If a chemical company is criticised because a certain pesticide it uses damages the environment, it may simply reject the criticism and keep its position, attempting to remain legitimate through framing activities that justify the use of the pesticide (Scherer et al., 2013). However, if it perceives that public demand requires more proactive action, it may shift its frame and announce the phase-out of the pesticide (Scherer et al., 2013), hence a shift from business to business case frames.

In a similar vein, Valente (2012) called this an extraneous phase in which the focal firm would attribute "issues to external factors beyond their locus of control". At most, to maintain legitimacy, such firms may engage in philanthropic activities with an NGO, such as charitable donations (Austin, 2000). When they realise the reputational benefits of this philanthropic engagement, they will start shifting towards business case frames. As a result, they become more willing to add other philanthropic alliances to their SAP while terminating partnerships with some reactive coalitions and unsustainable partners (Penney, 2018), which in turn would reconfigure their SAP (Wassmer, 2008). As their SAP co-evolves with their OVFs, some philanthropic cross-sector partnerships may gradually shift to transactional relationships (Austin, 2000).

Transactional relationships entail resource exchanges, such as cause-related marketing, event sponsorship and other service arrangements (Austin, 2000; Selsky and Parker, 2010). These relationships require more frequent interactions than philanthropic ones, hence allowing firms with business case frames to spend more time with their partners (Austin, 2000; Selsky and Parker, 2010). At the alliance level, partners compare each other's frames and negotiate for a shared frame that emerges through interactions; their frames would fuse, or a new frame would emerge (Seidl and Werle, 2017; Le Ber and Branzei, 2011). This negotiation process exposes firms to different OVFs (Klitsie et al., 2018). Thanks to boundary spanners, different OVFs permeate within the firms (Sharma and Good, 2013), get combined with the dominant frames and lead to a cultural transformation (Le Ber and Branzei, 2011).

Such transformation, however, is often coupled with internal events (Valente, 2012), such as hiring new senior managers from different backgrounds (Guerci and Carollo, 2015; Joseph et al., 2019). For instance, at Puma, such a shift was coupled with the change in the cognitive frames of their previous CEO who induced a shift in OVFs due to the complexity of their cognitive frames (Gröschl et al., 2017). The second shift (from business case to paradoxical) would also impact some transactional relationships moving them towards transformational relationships that allow firms to create a merged identity with different partners (Austin, 2000). In these relationships, the boundaries between organisations can become blurry (Selsky and Parker, 2010), which enhances the maintenance of paradoxical frames.

Such observations have already been made in the previous literature: scholars have proposed that alliance relationships evolve over the lifetime of a partnership (Austin, 2000; Austin and Seitanidi, 2012a, 2012b). Similarly, others have explained how not only partnership relations but also frames may shift over time at firms (Gröschl et al., 2017) and within partnerships (Le Ber and Branzei, 2011). Recently, some scholars have proposed that as firms' sustainability orientation changes, so do their partnership motivations, and that there are evolutionary dynamics even in the partnership motivations of firms (Riandita, 2020). Herein, these observations are synthesised. Hence, the following propositions are offered as an extension of the existing literature:

Proposition 3a: The structural configuration of SAPs would evolve from more homogeneous to more heterogeneous as firms evolve from business case to paradoxical frames.

Proposition 3b: The relational configuration of SAPs would evolve from an SAP dominated by friends and some acquaintances to a more diverse SAP with friends, acquaintances and strangers as firms evolve from business case to paradoxical frames.

Proposition 3c: The relationship with cross-sector partners would evolve from mostly philanthropic and transactional to a diverse set of relationships including philanthropic, transactional and transformational relations as firms evolve from business case to paradoxical frames.

Proposition 3d: Firms' organisational value frames co-evolve with their SAPs.

4. Conclusion

This paper provided a current review of the alliance portfolio and OVF literature in the corporate sustainability context, developed the notion of sustainable alliance portfolios (SAPs) and demonstrated how organisational value frames (OVFs) might impact the configuration, management and development of these portfolios. Overall, three sets of propositions were developed describing the impact of two frames, the business case and paradoxical frames, on SAPs, as summarised in Table 1.

Table 1. Organisational Value Frames and Sustainable Alliance Portfolios: A Summary of the Propositions.

	Business case Frame	Paradoxical Frame
SAP Structural Configuration (Homogeneity/Heterogeneity)	Low heterogeneity	High heterogeneity
SAP Relational Configuration	Friends & acquaintances	Friends, acquaintances & strangers
Relationship with Cross-Sector Partners	Philanthropic & transactional	Philanthropic, transactional & transformational
SAP Management (functional units)	Dedicated alliance function; dedicated corporate sustainability function	Integrated alliance management (collaborative business model)
SAP Development: Co-evolution between the SAP & OVFs

This paper contributes to the scholarly conversation in three research areas: sustainability-oriented partnerships (Lin and Darnall, 2014), OVFs (Le Ber and Branzei, 2011; Laasch, 2018) and alliance portfolios (Gutiérrez et al., 2015; Schmutzler et al., 2013). For instance, previous studies showed the role of partner diversity in the development of proactive environmental strategies (Lin, 2012a); however, they lacked an explanation of the origins and the extent of partner diversity, which this study brings forward. While previous studies showed how partners' value frames shift, fuse, or co-evolve at the partnership level (Le Ber and Branzei, 2011; Klitsie et al., 2018), this study demonstrated how the value frames of a focal firm co-evolve with its SAP. Previous studies proposed that successfully managing SAPs helps the firm address tensions between economic, environmental and social value creation and enhance sustainability performance (Wassmer et al., 2017). This study went further by showing that the successful management of SAPs is contingent on the focal firm's value frames. A few studies highlighted the importance of studying SAPs and provided evidence from specific empirical contexts such as the bottom of the pyramid or environmental partnerships (Gutiérrez et al., 2015; Schmutzler et al., 2013; Wassmer et al., 2017). Finally, the organisational cognition perspective is missing from the broader alliance portfolio literature. Therefore, this paper also invites business and management scholars to engage with the cognitive roots of alliance portfolio formation.

This study, however, is not without limitations. Due to its conceptual nature, the propositions built are only a starting point for future research. First, future research should test the relationship between SAP management and OVFs, demonstrating which departmental units are involved in managing SAPs and to what extent OVFs impact this involvement. Such studies could be developed through cross-sectional questionnaires. Future research should also show whether the management locations of SAPs impact different performance outcomes. The management of SAPs depends not only on the location but also on the extent of knowledge-sharing between the individuals who manage alliances within the SAP, and on the codification of this shared knowledge (Kauppila, 2015; Duysters et al., 2012; Koza and Lewin, 2000). Hence, learning and knowledge management perspectives can be used to shed light on how these aspects are managed within the SAP (Dzhengiz, 2020).

Second, future research should demonstrate how OVFs impact the structural and relational configuration and re-configuration of SAPs. Only a few studies have shed light on this area of SAP development and change (Schmutzler et al., 2013; Gutiérrez et al., 2015; Dzhengiz, 2018; Dzhengiz and Malik, 2020; Riandita, 2020). Such studies can help explain the inter-relationships between OVFs, firm strategy and SAPs. In particular, longitudinal studies can help reveal the co-evolutionary nature of SAPs and OVFs.

Third, although out of the scope of this paper, the other area of research on alliance portfolios is portfolio outcomes (Wassmer, 2008). Studies in this vein are increasing as the scholarly community tries to understand the role of SAPs in performance (Albino et al., 2012). Future studies should demonstrate how OVFs mediate the impact of SAPs on sustainability performance.

Fourth, this study proposed that the diversity in SAPs depends on OVFs. Moreover, it suggested that firms with business case frames, in comparison to those with paradoxical frames, would have less heterogeneous SAPs. Future research should test the relationship between SAP configuration and OVFs quantitatively.

Fifth, the development of partnership portfolios for sustainability appears to be path-dependent. As other studies have identified, firms are indeed embedded in their existing ties with various organisations (Burchell and Cook, 2013). In addition to this relational embeddedness, however, there is also cognitive embeddedness: cognitive biases and value homophily would affect the partnership formation process. Existing studies have often highlighted the heterogeneity and diversity of portfolios; however, the degree of heterogeneity has rarely been problematised. Future research should take a critical approach to such portfolios and problematise both the sustainability and the diversity assumptions made about them.

Acknowledgments

The author is thankful to the audience and organisers of the Corporate Responsibility Research Conference 2019 and the Cross-sector Social Interactions (CSSI) Conference 2020 for their valuable feedback on earlier versions of this manuscript. The author is also thankful to Khaleel Malik, Mike Hodson and Kai Hockerts for their friendly feedback on earlier versions of this manuscript.

Conflicts of interest

The author declares no conflict of interest.

References

[1]	S. Agarwal, J. O. D. Terrail, F. Jurie, Recent advances in object detection in the age of deep convolutional neural networks, preprint, arXiv: 1809.03193.
[2]	R. Girshick, J. Donahue, T. Darrell, J. Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, in 2014 IEEE Conference on Computer Vision and Pattern Recognition, (2014), 580–587. https://doi.org/10.1109/CVPR.2014.81
[3]	R. Girshick, Fast R-CNN, in 2015 IEEE International Conference on Computer Vision (ICCV), (2015), 1440–1448. https://doi.org/10.1109/ICCV.2015.169
[4]	S. Ren, K. He, R. Girshick, J. Sun, Faster R-CNN: towards real-time object detection with region proposal networks, IEEE Trans. Pattern Anal. Mach. Intell., 39 (2016), 1137–1149. https://doi.org/10.1109/TPAMI.2016.2577031 doi: 10.1109/TPAMI.2016.2577031
[5]	K. He, G. Gkioxari, P. Dollár, R. Girshick, Mask R-CNN, IEEE Trans. Pattern Anal. Mach. Intell., 42 (2020), 386–397. https://doi.org/10.1109/TPAMI.2018.2844175 doi: 10.1109/TPAMI.2018.2844175
[6]	J. Redmon, S. Divvala, R. Girshick, A. Farhadi, You Only Look Once: Unified, real-time object detection, in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2016), 779–88. https://doi.org/10.1109/CVPR.2016.91
[7]	J. Redmon, A. Farhadi, YOLOv3: An incremental improvement, preprint, arXiv: 1804.02767.
[8]	J. C. Y. Wang, A. Bochkovskiy, H. Y. M. Liao, YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors, preprint, arXiv: 2207.02696.
[9]	K. Kang, H. Li, J. Yan, X. Zeng, B. Yang, T. Xiao, et al., T-CNN: tubelets with convolutional neural networks for object detection from videos, IEEE Trans. Circuits Syst. Video Technol., (2017), 2896–2907. https://doi.org/10.1109/TCSVT.2017.2736553 doi: 10.1109/TCSVT.2017.2736553
[10]	T. Yin, X. Zhou, P. Krahenbuhl, Center-based 3d object detection and tracking, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2021), 11784–11793. https://doi.org/10.1109/CVPR46437.2021.01161
[11]	J. Dai, K. He, J. Sun, Instance-aware semantic segmentation via multi-task network cascades, in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2016), 3150–3158. https://doi.org/10.1109/CVPR.2016.343
[12]	B. Hariharan, P. Arbeláez, R. Girshick, J. Malik, Hypercolumns for object segmentation and fine-grained localization, in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2015), 447–456. https://doi.org/10.1109/CVPR.2015.7298642
[13]	B. Hariharan, P. Arbeláez, R. Girshick, J. Malik, Simultaneous detection and segmentation, in Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part VII 13, (2014), 297–312. https://doi.org/10.1007/978-3-319-10584-0_20
[14]	C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, et al., Going deeper with convolutions, in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2015), 1–9. https://doi.org/10.1109/CVPR.2015.7298594
[15]	H. Wang, F. He, Z. Peng, T. Shao, Y. L. Yang, K. Zhou, et al., Understanding the robustness of skeleton-based action recognition under adversarial attack, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2021), 14656–14665. https://doi.org/10.1109/CVPR46437.2021.01442
[16]	L. Wang, Z. Tong, B. Ji, G. Wu, TDN: Temporal difference networks for efficient action recognition, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2021), 1895–1904. https://doi.org/10.48550/arXiv.2012.10071
[17]	D. Li, Z. Qiu, Y. Pan, T. Yao, H. Li, T. Mei, Representing videos as discriminative sub-graphs for action recognition, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2021), 3310–3319. https://doi.org/10.48550/arXiv.2201.04027
[18]	C. F. R. Chen, R. Panda, K. Ramakrishnan, R. Feris, J. Cohn, A. Oliva, et al., Deep analysis of cnn-based spatio-temporal representations for action recognition, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2021), 6165–6175. https://doi.org/10.1109/CVPR46437.2021.00610
[19]	S. Jha, C. Seo, E. Yang, G. P. Joshi, Real time object detection and trackingsystem for video surveillance system, Multimed. Tools Appl., 80 (2021), 3981–3996. https://doi.org/10.1007/s11042-020-09749-x doi: 10.1007/s11042-020-09749-x
[20]	M. A. Farooq, A. A. Khan, A. Ahmad, R. H. Raza, Effectiveness of state-of-the-art super resolution algorithms in surveillance environment, in Conference on Multimedia, Interaction, Design and Innovation, 1376 (2021), 79–88. https://doi.org/10.48550/arXiv.2107.04133
[21]	X. Zheng, X. Li, K. Xu, X. Jiang, T. Sun, Gait identification under surveillance environment based on human skeleton, preprint, arXiv: 2111.11720.
[22]	F. Wu, Q. Wang, J. Bian, H. Xiong, N. Ding, F. Lu, et al., A survey on video action recognition in sports: datasets, methods and applications, preprint, arXiv: 2206.01038.
[23]	C. J. Roros, A. C. Kak, maskGRU: Tracking small objects in the presence of large background motions, preprint, arXiv: 2201.00467.
[24]	Y. B. Can, A. Liniger, D. P. Paudel, L. Van Gool, Structured bird's-eye-view traffic scene understanding from onboard images, in 2021 IEEE/CVF International Conference on Computer Vision (ICCV), (2021), 15641–15650. https://doi.org/10.1109/ICCV48922.2021.01537
[25]	S. Hampali, S. Stekovic, S. D. Sarkar, C. S. Kumar, F. Fraundorfer, V. Lepetit, Monte carlo scene search for 3d scene understanding, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2021), 13804–13813. https://doi.org/10.1109/CVPR46437.2021.01359
[26]	J. Hou, B. Graham, M. Niessner, S. Xie, Exploring data-efficient 3d scene understanding with contrastive scene contexts, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2021), 15587–15597. https://doi.org/10.1109/CVPR46437.2021.01533
[27]	Y. Liu, R. Wang, S. Shan, X. Chen, Structure inference net: object detection using scene-level context and instance-level relationships, in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2018), 6985–6994. https://doi.org/10.1109/CVPR.2018.00730
[28]	M. Schön, M. Buchholz, K. Dietmayer, MGNet: monocular geometric scene understanding for autonomous driving, in 2021 IEEE/CVF International Conference on Computer Vision (ICCV), (2021), 15784–15795. https://doi.org/10.1109/ICCV48922.2021.01551
[29]	K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2016), 770–778. https://doi.org/10.1109/CVPR.2016.90
[30]	S. H. Gao, M. M. Cheng, K. Zhao, X. Y. Zhang, M. H. Yang, P. Torr, Res2Net: a new multi-scale backbone architecture, in IEEE Trans. Pattern Anal. Mach. Intell., 43 (2021), 652–662. https://doi.org/10.1109/TPAMI.2019.2938758
[31]	K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, preprint, arXiv: 1409.1556.
[32]	A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, et al., MobileNets: efficient convolutional neural networks for mobile vision applications, preprint, arXiv: 1704.04861.
[33]	M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, L. C. Chen, MobileNetV2: inverted residuals and linear bottlenecks, in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2018), 4510–4520. https://doi.org/10.48550/arXiv.1801.04381
[34]	K. He, X. Zhang, S. Ren, J. Sun, Spatial pyramid pooling in deep convolutional networks for visual recognition, IEEE Trans. Pattern Anal. Mach. Intell., 37 (2015), 1904–1916. https://doi.org/10.1109/TPAMI.2015.2389824 doi: 10.1109/TPAMI.2015.2389824
[35]	T. Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, S. Belongie, Feature pyramid networks for object detection, in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2017), 936–944. https://doi.org/10.1109/CVPR.2017.106
[36]	W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Y. Fu, et al., SSD: single shot multibox detector, in European Conference on Computer Vision, (2016), 21–37. https://doi.org/10.1007/978-3-319-46448-0_2
[37]	C. Zhu, Y. He, M. Savvides, Feature selective anchor-free module for single-shot object detection, in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2019), 840–849.
[38]	H. Law, J. Deng, CornerNet: Detecting objects as paired keypoints, in European Conference on Computer Vision, (2018), 765–781. https://doi.org/10.1007/978-3-030-01264-9_45
[39]	Z. Tian, C. Shen, H. Chen, T. He, FCOS: fully convolutional one-stage object detection, in 2019 IEEE/CVF International Conference on Computer Vision (ICCV), (2019), 9626–9635. https://doi.org/10.1109/ICCV.2019.00972
[40]	X. Zhou, D. Wang, P. Krähenbühl, Objects as points, preprint, arXiv: 1904.07850.
[41]	C. Eggert, S. Brehm, A. Winschel, D. Zecha, R. Lienhart, A closer look: small object detection in faster R-CNN, in 2017 IEEE International Conference on Multimedia and Expo (ICME), (2017), 421–426. https://doi.org/10.1109/ICME.2017.8019550
[42]	C. Chen, M. Y. Liu, O. Tuzel, J. Xiao, R-CNN for small object detection, in Asian Conference on Computer Vision, 10115 (2017), 214–230. https://doi.org/10.1007/978-3-319-54193-8_14
[43]	T. Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, et al., Microsoft COCO: common objects in context, in European Conference on Computer Vision, (2014), 740–755. https://doi.org/10.48550/arXiv.1405.0312
[44]	J. Deng, W. Dong, R. Socher, L. J. Li, K. Li, F. Li, ImageNet: a large-scale hierarchical image database, in 2009 IEEE Conference on Computer Vision and Pattern Recognition, (2009), 248–255. https://doi.org/10.1109/CVPR.2009.5206848
[45]	M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, A. Zisserman, The pascal visual object classes (voc) challenge, Int. J. Comput. Vis., 88 (2010), 303–338. https://doi.org/10.1007/s11263-009-0275-4 doi: 10.1007/s11263-009-0275-4
[46]	Z. Zong, G. Song, Y. Liu, DETRs with collaborative hybrid assignments training, preprint, arXiv: 2211.12860.
[47]	S. Yang, P. Luo, C. C. Loy, X. Tang, WIDER FACE: a face detection benchmark, in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2016), 5525–5533. https://doi.org/10.1109/CVPR.2016.596
[48]	A. B. Chan, Z. S. J. Liang, N. Vasconcelos, Privacy preserving crowd monitoring: counting people without people models or tracking, in 2008 IEEE Conference on Computer Vision and Pattern Recognition, (2008), 1–7. https://doi.org/10.1109/CVPR.2008.4587569
[49]	L. Wang, J. Shi, G. Song, Object detection combining recognition and segmentation, in Asian Conference on Computer Vision, 4843 (2007), 189.
[50]	E. Bondi, R. Jain, P. Aggrawal, S. Anand, R. Hannaford, A. Kapoor, et al., BIRDSAI: a dataset for detection and tracking in aerial thermal infrared videos, in 2020 IEEE Winter Conference on Applications of Computer Vision (WACV), (2020), 1736–1745. https://doi.org/10.1109/WACV45572.2020.9093284
[51]	L. Neumann, M. Karg, S. Zhang, C. Scharfenberger, E. Piegert, S. Mistr, et al., NightOwls: a pedestrians at night dataset, in Asian Conference on Computer Vision, (2019), 691–705. https://doi.org/10.1007/978-3-030-20887-5_43
[52]	K. Behrendt, L. Novak, R. Botros, A deep learning approach to traffic lights: Detection, tracking, and classification, in 2017 IEEE International Conference on Robotics and Automation (ICRA), (2017), 1370–1377. https://doi.org/10.1109/ICRA.2017.7989163
[53]	C. Ertler, J. Mislej, T. Ollmann, L. Porzi, G. Neuhold, Y. Kuang, The Mapillary Traffic sign dataset for detection and classification on a global scale, in European Conference on Computer Vision, (2020), 68–84. https://doi.org/10.48550/arXiv.1909.04422
[54]	J. Zhang, M. Huang, X. Jin, X. Li, A real-time chinese traffic sign detection algorithm based on modified yolov2, Algorithms, 10 (2017), 127. https://doi.org/10.3390/a10040127 doi: 10.3390/a10040127
[55]	D. Tabernik, D. Skočaj, Deep learning for large-scale traffic-sign detection and recognition, preprint, arXiv: 1904.00649.
[56]	Z. Zhu, D. Liang, S. Zhang, X. Huang, B. Li, S. Hu, Traffic-sign detection and classification in the wild, in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2016), 2110–2118. https://doi.org/10.1109/CVPR.2016.232
[57]	Z. Zhao, P. Zheng, S. T. Xu, X. Wu, Object detection with deep learning: a review, IEEE Trans. Neural Networks Learn. Syst., 30 (2019), 3212–3232. https://doi.org/10.1109/TNNLS.2018.2876865 doi: 10.1109/TNNLS.2018.2876865
[58]	K. Li, G. Wan, G. Cheng, L. Meng, J. Han, Object detection in optical remote sensing images: A survey and a new benchmark, ISPRS J. Photogramm. Remote Sens., 159 (2020), 296–307. https://doi.org/10.1016/j.isprsjprs.2019.11.023 doi: 10.1016/j.isprsjprs.2019.11.023
[59]	K. Oksuz, B. C. Cam, S. Kalkan, E. Akbas, Imbalance problems in object detection: a review, preprint, arXiv: 1909.00169.
[60]	A. G. Menezes, G. de Moura, C. Alves, A. C. P. L. F. de Carvalho, Continual object detection: a review of definitions, strategies, and challenges, preprint, arXiv: 2205.15445.
[61]	L. Jiao, R. Zhang, F. Liu, S. Yang, B. Hou, L. Li, et al., New generation deep learning for video object detection: a survey, IEEE Trans. Neural Networks Learn. Syst., 33 (2022), 3195–3215. https://doi.org/10.1109/TNNLS.2021.3053249 doi: 10.1109/TNNLS.2021.3053249
[62]	L. Jiao, F. Zhang, F. Liu, S. Yang, L. Li, Z. Feng, et al., A survey of deep learning-based object detection, IEEE Access, 7 (2019), 128837–128868. https://doi.org/10.1109/ACCESS.2019.2939201 doi: 10.1109/ACCESS.2019.2939201
[63]	G. Chen, H. Wang, K. Chen, Z. Li, Z. Song, Y. Liu, et al., A survey of the four pillars for small object detection: multiscale representation, contextual information, super-resolution, and region proposal, IEEE Trans. Syst. Man Cybern, Syst., 52 (2022), 936–953. https://doi.org/10.1109/TSMC.2020.3005231 doi: 10.1109/TSMC.2020.3005231
[64]	K. Chen, J. Wang, J. Pang, Y. Cao, Y. Xiong, X. Li, et al., MMDetection: open mmlab detection toolbox and benchmark, preprint, arXiv: 1906.07155.
[65]	K. Tong, Y. Wu, F. Zhou, Recent advances in small object detection based on deep learning: A review, Image Vis. Comput., 97 (2020), 103910. https://doi.org/10.1016/j.imavis.2020.103910 doi: 10.1016/j.imavis.2020.103910
[66]	Y. Liu, P. Sun, N. Wergeles, Y. Shang, A survey and performance evaluation of deep learning methods for small object detection, Expert Syst. Appl., 172 (2021), 114602. https://doi.org/10.1016/j.eswa.2021.114602 doi: 10.1016/j.eswa.2021.114602
[67]	K. Tong, Y. Wu, Deep learning-based detection from the perspective of small or tiny objects: A survey, Image Vis. Comput., 123 (2022), 104471. https://doi.org/10.1016/j.imavis.2022.104471 doi: 10.1016/j.imavis.2022.104471
[68]	A. M. Rekavandi, L. Xu, F. Boussaid, A. K. Seghouane, S. Hoefs, M. Bennamoun, A guide to image and video based small object detection using deep learning: case study of maritime surveillance, preprint, arXiv: 2207.12926.
[69]	G. Cheng, X. Yuan, X. Yao, K. Yan, Q. Zeng, J. Han, Towards large-scale small object detection: survey and benchmarks, preprint, arXiv: 2207.14096.
[70]	S. Liu, L. Qi, H. Qin, J. Shi, J. Jia, Path aggregation network for instance segmentation, in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2018), 8759–8768. https://doi.org/10.1109/CVPR.2018.00913
[71]	M. Tan, R. Pang, Q. V. Le, EfficientDet: scalable and efficient object detection, preprint, arXiv: 1911.09070.
[72]	S. Liu, D. Huang, Y. Wang, Learning spatial fusion for single-shot object detection, preprint, arXiv: 1911.09516.
[73]	G. Ghiasi, T. Y. Lin, R. Pang, Q. V. Le, NAS-FPN: learning scalable feature pyramid architecture for object detection, preprint, arXiv: 1904.07392.
[74]	T. Y. Lin, P. Goyal, R. Girshick, K. He, P. Dollár, Focal loss for dense object detection, in 2017 IEEE International Conference on Computer Vision (ICCV), (2017), 2999–3007. https://doi.org/10.1109/ICCV.2017.324
[75]	Z. Li, F. Zhou, FSSD: feature fusion single shot multibox detector, preprint, arXiv: 1712.00960.
[76]	L. Cui, R. Ma, P. Lv, X. Jiang, Z. Gao, B. Zhou, et al., MDSSD: multi-scale deconvolutional single shot detector for small objects, preprint, arXiv: 1805.07009.
[77]	Y. Gong, X. Yu, Y. Ding, X. Peng, J. Zhao, Z. Han, Effective fusion factor in fpn for tiny object detection, preprint, arXiv: 2011.02298.
[78]	Z. Liu, G. Gao, L. Sun, Z. Fang, HRDNet: High-resolution detection network for small objects, preprint, arXiv: 2006.07607.
[79]	Z. Liu, G. Gao, L. Sun, L. Fang, IPG-Net: image pyramid guidance network for small object detection, in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), (2020), 4422–4430. https://doi.org/10.1109/CVPRW50498.2020.00521
[80]	P. Y. Chen, J. W. Hsieh, C. Y. Wang, H. Y. M. Liao, Recursive hybrid fusion pyramid network for real-time small object detection on embedded devices, in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), (2020), 1612–1621. https://doi.org/10.1109/CVPRW50498.2020.00209
[81]	C. Yang, Z. Huang, N. Wang, QueryDet: cascaded sparse query for accelerating high-resolution small object detection, preprint, arXiv: 2103.09136.
[82]	C. Deng, M. Wang, L. Liu, Y. Liu, Y. Jiang, Extended feature pyramid network for small object detection, IEEE Trans. Multimedia, 24 (2022), 1968–1979. https://doi.org/10.1109/TMM.2021.3074273
[83]	J. Li, X. Liang, Y. Wei, T. Xu, J. Feng, S. Yan, Perceptual generative adversarial networks for small object detection, in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2017), 1951–1959. https://doi.org/10.1109/CVPR.2017.211
[84]	Y. Bai, Y. Zhang, M. Ding, B. Ghanem, SOD-MTGAN: small object detection via multi-task generative adversarial network, in European Conference on Computer Vision, 11217 (2018), 210–226. https://doi.org/10.1007/978-3-030-01261-8_13
[85]	J. Noh, W. Bae, W. Lee, J. Seo, G. Kim, Better to follow, follow to be better: towards precise supervision of feature super-resolution for small object detection, in 2019 IEEE/CVF International Conference on Computer Vision (ICCV), (2019), 9724–9733. https://doi.org/10.1109/ICCV.2019.00982
[86]	F. Zhang, L. Jiao, L. Li, F. Liu, X. Liu, MultiResolution attention extractor for small object detection, preprint, arXiv: 2006.05941.
[87]	J. Rabbi, N. Ray, M. Schubert, S. Chowdhury, D. Chao, Small-object detection in remote sensing images with end-to-end edge-enhanced gan and object detector network, preprint, arXiv: 2003.09085.
[88]	K. Jiang, Z. Wang, P. Yi, G. Wang, T. Lu, J. Jiang, Edge-enhanced GAN for remote sensing image super-resolution, IEEE Trans. Geosci. Remote Sens., 57 (2019), 5799–5812. https://doi.org/10.1109/TGRS.2019.2902431
[89]	X. Wang, K. Yu, S. Wu, J. Gu, Y. Liu, C. Dong, et al., ESRGAN: enhanced super-resolution generative adversarial networks, in Proceedings of the European conference on computer vision (ECCV), (2018). https://doi.org/10.1007/978-3-030-11021-5_5
[90]	A. Jolicoeur-Martineau, The relativistic discriminator: a key element missing from standard gan, preprint, arXiv: 1807.00734.
[91]	I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, et al., Generative adversarial nets, Adv. Neural Inf. Process Syst., 27 (2014). https://doi.org/10.48550/arXiv.1406.2661
[92]	J. Cao, Y. Pang, S. Zhao, X. Li, High-level semantic networks for multi-scale object detection, IEEE Trans. Circuits Syst. Video Technol., 30 (2020), 3372–3386. https://doi.org/10.1109/TCSVT.2019.2950526
[93]	K. Zhang, Z. Zhang, Z. Li, Y. Qiao, Joint face detection and alignment using multitask cascaded convolutional networks, IEEE Signal Process. Lett., 23 (2016), 1499–1503. https://doi.org/10.1109/LSP.2016.2603342
[94]	Z. Hao, Y. Liu, H. Qin, J. Yan, X. Li, X. Hu, Scale-aware face detection, in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2017), 1913–1922. https://doi.org/10.1109/CVPR.2017.207
[95]	B. Singh, L. S. Davis, An analysis of scale invariance in object detection - snip, in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2018), 3578–3587. https://doi.org/10.1109/CVPR.2018.00377
[96]	B. Singh, M. Najibi, L. S. Davis, SNIPER: efficient multi-scale training, Adv. Neural Inf. Process Syst., 31 (2018). https://doi.org/10.48550/arXiv.1805.09300
[97]	Y. Kim, B. N. Kang, D. Kim, SAN: learning relationship between convolutional features for multi-scale object detection, in European Conference on Computer Vision, 11209 (2018), 328–343. https://doi.org/10.1007/978-3-030-01228-1_20
[98]	Y. Li, Y. Chen, N. Wang, Z. Zhang, Scale-aware trident networks for object detection, preprint, arXiv: 1901.01892.
[99]	J. Peng, M. Sun, Z. X. Zhang, T. Tan, J. Yan, POD: practical object detection with scale-sensitive network, in 2019 IEEE/CVF International Conference on Computer Vision (ICCV), (2019), 9606–9615. https://doi.org/10.1109/ICCV.2019.00970
[100]	A. Oliva, A. Torralba, The role of context in object recognition, Trends Cogn. Sci., 11 (2007), 520–527. https://doi.org/10.1016/j.tics.2007.09.009
[101]	S. Bell, C. L. Zitnick, K. Bala, R. Girshick, Inside-outside net: detecting objects in context with skip pooling and recurrent neural networks, in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2016), 2874–2883. https://doi.org/10.1109/CVPR.2016.314
[102]	C. Y. Fu, W. Liu, A. Ranga, A. Tyagi, A. C. Berg, DSSD: deconvolutional single shot detector, preprint, arXiv: 1701.06659.
[103]	W. Xiang, D. Q. Zhang, H. Yu, V. Athitsos, Context-aware single-shot detector, in 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), (2018), 1784–1793. https://doi.org/10.1109/WACV.2018.00198
[104]	X. Chen, A. Gupta, Spatial memory for context reasoning in object detection, in 2017 IEEE International Conference on Computer Vision (ICCV), (2017), 4106–4116. https://doi.org/10.1109/ICCV.2017.440
[105]	K. Fu, J. Li, L. Ma, K. Mu, Y. Tian, Intrinsic relationship reasoning for small object detection, preprint, arXiv: 2009.00833.
[106]	J. S. Lim, M. Astrid, H. J. Yoon, S. I. Lee, Small object detection using context and attention, in 2021 International Conference on Artificial Intelligence in Information and Communication (ICAIIC), (2021), 181–186. https://doi.org/10.1109/ICAIIC51459.2021.9415217
[107]	A. Bochkovskiy, C. Y. Wang, H. Y. M. Liao, YOLOv4: optimal speed and accuracy of object detection, preprint, arXiv: 2004.10934.
[108]	H. Zhang, M. Cisse, Y. N. Dauphin, D. Lopez-Paz, Mixup: beyond empirical risk minimization, preprint, arXiv: 1710.09412.
[109]	S. Yun, D. Han, S. J. Oh, S. Chun, J. Choe, Y. Yoo, CutMix: regularization strategy to train strong classifiers with localizable features, in Proceedings of the IEEE International Conference on Computer Vision, (2019), 6023–6032. https://doi.org/10.1109/ICCV.2019.00612
[110]	M. Kisantal, Z. Wojna, J. Murawski, J. Naruniec, K. Cho, Augmentation for small object detection, preprint, arXiv: 1902.07296.
[111]	C. Chen, Y. Zhang, Q. Lv, S. Wei, X. Wang, X. Sun, et al., RRNet: a hybrid detector for object detection in drone-captured images, in 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), (2019), 100–108. https://doi.org/10.1109/ICCVW.2019.00018
[112]	F. O. Unel, B. O. Ozkalayci, C. Cigla, The power of tiling for small object detection, in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), (2019), 582–591. https://doi.org/10.1109/CVPRW.2019.00084
[113]	Y. Chen, P. Zhang, Z. Li, Y. Li, X. Zhang, L. Qi, et al., Dynamic scale training for object detection, preprint, arXiv: 2004.12432.
[114]	B. Zoph, E. D. Cubuk, G. Ghiasi, T. Y. Lin, J. Shlens, Q. V. Le, Learning data augmentation strategies for object detection, in European Conference on Computer Vision, (2020), 566–583. https://doi.org/10.1007/978-3-030-58583-9_34
[115]	E. D. Cubuk, B. Zoph, D. Mane, V. Vasudevan, Q. V. Le, AutoAugment: learning augmentation policies from data, preprint, arXiv: 1805.09501.
[116]	Y. Chen, Y. Li, T. Kong, L. Qi, R. Chu, L. Li, et al., Scale-aware automatic augmentation for object detection, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2021), 9563–9572. https://doi.org/10.1109/CVPR46437.2021.00944
[117]	N. Samet, S. Hicsonmez, E. Akbas, Reducing label noise in anchor-free object detection, preprint, arXiv: 2008.01167.
[118]	K. Duan, S. Bai, L. Xie, H. Qi, Q. Huang, Q. Tian, CenterNet++ for object detection, preprint, arXiv: 2204.08394.
[119]	J. Wang, C. Xu, W. Yang, L. Yu, A normalized gaussian wasserstein distance for tiny object detection, preprint, arXiv: 2110.13389.
[120]	C. Xu, J. Wang, W. Yang, H. Yu, L. Yu, G. Xia, RFLA: Gaussian receptive field based label assignment for tiny object detection, in Proceedings of the European conference on computer vision (ECCV), (2022). https://doi.org/10.1007/978-3-031-20077-9_31
[121]	C. Lee, S. Park, H. Song, J. Ryu, S. Kim, H. Kim, et al., Interactive multi-class tiny-object detection, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2022), 14136–14145. https://doi.org/10.1109/CVPR52688.2022.01374
[122]	F. C. Akyon, S. Altinuc, A. Temizel, Slicing aided hyper inference and fine-tuning for small object detection, preprint, arXiv: 2202.06934.
[123]	P. Hu, D. Ramanan, Finding tiny faces, in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2017), 1522–1530. https://doi.org/10.1109/CVPR.2017.166
[124]	S. Zhang, X. Zhu, Z. Lei, H. Shi, X. Wang, S. Z. Li, S³FD: single shot scale-invariant face detector, in 2017 IEEE International Conference on Computer Vision (ICCV), (2017), 192–201. https://doi.org/10.1109/ICCV.2017.30
[125]	Y. Bai, Y. Zhang, M. Ding, B. Ghanem, Finding tiny faces in the wild with generative adversarial network, in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2018), 21–30. https://doi.org/10.1109/CVPR.2018.00010
[126]	P. Samangouei, M. Najibi, L. Davis, R. Chellappa, Face-magnet: magnifying feature maps to detect small faces, preprint, arXiv: 1803.05258.
[127]	C. Zhu, R. Tao, K. Luu, M. Savvides, Seeing small faces from robust anchor's perspective, preprint, arXiv: 1802.09058.
[128]	Y. Zhu, H. Cai, S. Zhang, C. Wang, Y. Xiong, TinaFace: strong but simple baseline for face detection, preprint, arXiv: 2011.13183.
[129]	J. Dai, H. Qi, Y. Xiong, Y. Li, G. Zhang, H. Hu, et al., Deformable convolutional networks, in 2017 IEEE International Conference on Computer Vision (ICCV), (2017), 764–773. https://doi.org/10.1109/ICCV.2017.89
[130]	Z. Zheng, P. Wang, W. Liu, J. Li, R. Ye, D. Ren, Distance-IoU loss: faster and better learning for bounding box regression, in Proceedings of the AAAI conference on artificial intelligence, 34 (2019), 12993–13000. https://doi.org/10.1609/aaai.v34i07.6999
[131]	A. Shrivastava, A. Gupta, R. Girshick, Training region-based object detectors with online hard example mining, in Proceedings of the IEEE conference on computer vision and pattern recognition, (2016), 761–769. https://doi.org/10.1109/CVPR.2016.89
[132]	Z. Zhang, W. Shen, S. Qiao, Y. Wang, B. Wang, A. Yuille, Robust face detection via learning small faces on hard images, in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, (2020), 1361–1370. https://doi.org/10.48550/arXiv.1811.11662
[133]	T. Song, L. Sun, D. Xie, H. Sun, S. Pu, Small-scale pedestrian detection based on somatic topology localization and temporal feature aggregation, preprint, arXiv: 1807.01438.
[134]	S. Das, P. S. Mukherjee, U. Bhattacharya, Seek and you will find: a new optimized framework for efficient detection of pedestrian, preprint, arXiv: 1912.10241.
[135]	W. Liu, S. Liao, W. Ren, W. Hu, Y. Yu, High-level semantic feature detection: a new perspective for pedestrian detection, in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2019), 5182–5191. https://doi.org/10.1109/CVPR.2019.00533
[136]	X. Yu, Y. Gong, N. Jiang, Q. Ye, Z. Han, Scale match for tiny person detection, in 2020 IEEE Winter Conference on Applications of Computer Vision (WACV), (2020), 1246–1254. https://doi.org/10.1109/WACV45572.2020.9093394
[137]	D. Božić-Štulić, Ž. Marušić, S. Gotovac, Deep learning approach in aerial imagery for supporting land search and rescue missions, Int. J. Comput. Vis., 127 (2019), 1256–1278. https://doi.org/10.1007/s11263-019-01177-1
[138]	G. Adaimi, S. Kreiss, A. Alahi, Perceiving traffic from aerial images, preprint, arXiv: 2009.07611.
[139]	C. Gheorghe, N. Filip, Road traffic analysis using unmanned aerial vehicle and image processing algorithms, in 2022 IEEE International Conference on Automation, Quality and Testing, Robotics (AQTR), (2022), 1–5. https://doi.org/10.1109/AQTR55203.2022.9802058
[140]	J. Han, J. Ding, J. Li, G. S. Xia, Align deep features for oriented object detection, IEEE Trans. Geosci. Remote Sens., 60 (2022), 5602511. https://doi.org/10.1109/TGRS.2021.3062048
[141]	X. Yang, J. Yang, J. Yan, Y. Zhang, T. Zhang, Z. Guo, et al., SCRDet: towards more robust detection for small, cluttered and rotated objects, in 2019 IEEE/CVF International Conference on Computer Vision (ICCV), (2019), 8231–8240. https://doi.org/10.1109/ICCV.2019.00832
[142]	X. Xie, G. Cheng, J. Wang, X. Yao, J. Han, Oriented r-cnn for object detection, in 2021 IEEE/CVF International Conference on Computer Vision (ICCV), (2021), 3500–3509. https://doi.org/10.1109/ICCV48922.2021.00350
[143]	R. Qin, Q. Liu, G. Gao, D. Huang, Y. Wang, MRDet: a multi-head network for accurate oriented object detection in aerial images, preprint, arXiv: 2012.13135.
[144]	X. Zhang, E. Izquierdo, K. Chandramouli, Dense and small object detection in uav vision based on cascade network, in 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), (2019), 118–126. https://doi.org/10.1109/ICCVW.2019.00020
[145]	J. Yi, P. Wu, B. Liu, Q. Huang, H. Qu, D. Metaxas, Oriented object detection in aerial images with box boundary-aware vectors, in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, (2021), 2150–2159. https://doi.org/10.1109/WACV48630.2021.00220
[146]	O. Ronneberger, P. Fischer, T. Brox, U-Net: convolutional networks for biomedical image segmentation, in Medical Image Computing and Computer-Assisted Intervention, (2015), 234–241. https://doi.org/10.1007/978-3-319-24574-4_28
[147]	J. Han, J. Ding, N. Xue, G. S. Xia, ReDet: a rotation-equivariant detector for aerial object detection, preprint, arXiv: 2103.07733.
[148]	J. Ding, N. Xue, Y. Long, G. S. Xia, Q. Lu, Learning ROI transformer for oriented object detection in aerial images, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2019), 2849–2858. https://doi.org/10.1109/CVPR.2019.00296
[149]	M. Zand, A. Etemad, M. Greenspan, Oriented bounding boxes for small and freely rotated objects, IEEE Trans. Geosci. Remote Sens., 60 (2022), 1–15. https://doi.org/10.1109/TGRS.2021.3076050
[150]	Z. Yang, S. Liu, H. Hu, L. Wang, S. Lin, RepPoints: point set representation for object detection, in Proceedings of the IEEE/CVF International Conference on Computer Vision, (2019), 9657–9666. https://doi.org/10.1109/ICCV.2019.00975
[151]	W. Li, Y. Chen, K. Hu, J. Zhu, Oriented reppoints for aerial object detection, preprint, arXiv: 2105.11111.
[152]	C. Xu, J. Wang, W. Yang, L. Yu, Dot distance for tiny object detection in aerial images, in IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), (2021), 1192–1201. https://doi.org/10.1109/CVPRW53098.2021.00130
[153]	X. Fang, F. Hu, M. Yang, T. Zhu, R. Bi, Z. Zhang, Z. Gao, Small object detection in remote sensing images based on super-resolution, Pattern Recognit. Lett., 153 (2022), 107–112. https://doi.org/10.1016/j.patrec.2021.11.027.5
[154]	Y. Li, Q. Huang, X. Pei, Y. Chen, L. Jiao, R. Shang, Cross-layer attention network for small object detection in remote sensing imagery, IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens., 14 (2021), 2148–2161. https://doi.org/10.1109/JSTARS.2020.3046482
[155]	O. C. Koyun, R. K. Keser, İ. B. Akkaya, B. U. Töreyin, Focus-and-detect: a small object detection framework for aerial images, Signal Process. Image Commun., 104 (2022), 116675. https://doi.org/10.1016/j.image.2022.116675
[156]	B. F. Klare, B. Klein, E. Taborsky, A. Blanton, J. Cheney, K. Allen, et al., Pushing the frontiers of unconstrained face detection and recognition: IARPA Janus Benchmark A, in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2015), 1931–1939. https://doi.org/10.1109/CVPR.2015.7298803
[157]	Y. Yuan, W. Yang, W. Ren, J. Liu, W. J. Scheirer, Z. Wang, UG²⁺: a collective benchmark effort for evaluating and advancing image understanding in poor visibility environments, preprint, arXiv: 1904.04474.
[158]	H. Nada, V. A. Sindagi, H. Zhang, V. M. Patel, Pushing the limits of unconstrained face detection: a challenge dataset and baseline results, in 2018 IEEE 9th International Conference on Biometrics Theory, Applications and Systems (BTAS), (2018), 1–10. https://doi.org/10.1109/BTAS.2018.8698561
[159]	M. K. Yucel, Y. C. Bilge, O. Oguz, N. Ikizler-Cinbis, P. Duygulu, R. G. Cinbis, Wildest faces: face detection and recognition in violent settings, preprint, arXiv: 1805.07566.
[160]	S. Zhang, Y. Xie, J. Wan, H. Xia, S. Z. Li, G. Guo, WiderPerson: A diverse dataset for dense pedestrian detection in the wild, IEEE Trans. Multimedia, 22 (2020), 380–393. https://doi.org/10.1109/TMM.2019.2929005
[161]	M. Braun, S. Krebs, F. Flohr, D. M. Gavrila, The eurocity persons dataset: a novel benchmark for object detection, preprint, arXiv: 1805.07193.
[162]	S. Zhang, R. Benenson, B. Schiele, CityPersons: a diverse dataset for pedestrian detection, in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2017), 4457–4465. https://doi.org/10.1109/CVPR.2017.474
[163]	P. Dollar, C. Wojek, B. Schiele, P. Perona, Pedestrian detection: a benchmark, in 2009 IEEE Conference on Computer Vision and Pattern Recognition, (2009), 304–311. https://doi.org/10.1109/CVPR.2009.5206631
[164]	P. Zhu, L. Wen, D. Du, X. Bian, H. Fan, Q. Hu, et al., Detection and tracking meet drones challenge, IEEE Trans. Pattern Anal. Mach. Intell., 44 (2022), 7380–7399. https://doi.org/10.1109/TPAMI.2021.3119563
[165]	D. Du, Y. Qi, H. Yu, Y. Yang, K. Duan, G. Li, et al., The unmanned aerial vehicle benchmark: object detection and tracking, in Proceedings of the European Conference on Computer Vision (ECCV), (2018), 370–386. https://doi.org/10.1007/s11263-019-01266-1
[166]	G. S. Xia, X. Bai, J. Ding, Z. Zhu, S. Belongie, J. Luo, et al., DOTA: a large-scale dataset for object detection in aerial images, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2018), 3974–3983. https://doi.org/10.1109/CVPR.2018.00418
[167]	G. Cheng, J. Han, P. Zhou, L. Guo, Multi-class geospatial object detection and geographic image classification based on collection of part detectors, ISPRS J. Photogramm. Remote Sens., 98 (2014), 119–132. https://doi.org/10.1016/j.isprsjprs.2014.10.002
[168]	H. Zhu, X. Chen, W. Dai, K. Fu, Q. Ye, J. Jiao, Orientation robust object detection in aerial images using deep convolutional neural network, in 2015 IEEE International Conference on Image Processing (ICIP), (2015), 3735–3739. https://doi.org/10.1109/ICIP.2015.7351502
[169]	L. Tuggener, I. Elezi, J. Schmidhuber, M. Pelillo, T. Stadelmann, DeepScores - a dataset for segmentation, detection and classification of tiny objects, in 2018 24th International Conference on Pattern Recognition (ICPR), (2018), 3704–3709. https://doi.org/10.1109/ICPR.2018.8545307
[170]	A. Geiger, P. Lenz, R. Urtasun, Are we ready for autonomous driving? The KITTI vision benchmark suite, in 2012 IEEE Conference on Computer Vision and Pattern Recognition, (2012), 3354–3361. https://doi.org/10.1109/CVPR.2012.6248074
[171]	S. Song, S. P. Lichtenberg, J. Xiao, SUN RGB-D: a rgb-d scene understanding benchmark suite, in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2015), 567–576. https://doi.org/10.1109/CVPR.2015.7298655
[172]	S. Zhang, L. Wen, X. Bian, Z. Lei, S. Z. Li, Single-shot refinement neural network for object detection, in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2018), 4203–4212. https://doi.org/10.1109/CVPR.2018.00442
[173]	J. Cao, H. Cholakkal, R. M. Anwer, F. S. Khan, Y. Pang, L. Shao, D2Det: towards high quality object detection and instance segmentation, in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2020), 11482–11491.
[174]	Y. Chen, J. Li, H. Xiao, X. Jin, S. Yan, J. Feng, Dual path networks, Adv. Neural Inf. Process Syst., 30 (2017). https://doi.org/10.48550/arXiv.1707.01629
[175]	Y. Zhu, C. Zhao, J. Wang, X. Zhao, Y. Wu, H. Lu, CoupleNet: coupling global structure with local parts for object detection, in 2017 IEEE International Conference on Computer Vision (ICCV), (2017), 4146–4154. https://doi.org/10.1109/ICCV.2017.444
[176]	H. Hu, J. Gu, Z. Zhang, J. Dai, Y. Wei, Relation networks for object detection, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2018), 3588–3597. https://doi.org/10.1109/CVPR.2018.00378
[177]	L. Tychsen-Smith, L. Petersson, Improving object localization with fitness nms and bounded iou loss, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2018), 6877–6885. https://doi.org/10.1109/CVPR.2018.00719
[178]	S. Xu, X. Wang, W. Lv, Q. Chang, C. Cui, K. Deng, et al., PP-YOLOE: an evolved version of YOLO, preprint, arXiv: 2203.16250.
[179]	J. Leng, Y. Ren, W. Jiang, X. Sun, Y. Wang, Realize your surroundings: exploiting context information for small object detection, Neurocomputing, 433 (2021). https://doi.org/10.1016/j.neucom.2020.12.093
[180]	C. L. Zitnick, P. Dollár, Edge Boxes: locating object proposals from edges, in European Conference on Computer Vision, (2014), 391–405. https://doi.org/10.1007/978-3-319-10602-1_26
[181]	A. Howard, M. Sandler, G. Chu, L. C. Chen, B. Chen, M. Tan, et al., Searching for MobileNetV3, in Proceedings of the IEEE/CVF International Conference on Computer Vision, (2019), 1314–1324. https://doi.org/10.1109/ICCV.2019.00140
[182]	X. Tang, D. K. Du, Z. He, J. Liu, PyramidBox: a context-assisted single shot face detector, in Proceedings of the European Conference on Computer Vision (ECCV), (2018), 797–813. https://doi.org/10.1007/978-3-030-01240-3_49
[183]	J. Deng, J. Guo, Y. Zhou, J. Yu, I. Kotsia, S. Zafeiriou, RetinaFace: single-stage dense face localisation in the wild, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2019), 5203–5212. https://doi.org/10.1109/CVPR42600.2020.00525
[184]	Z. Liu, J. Du, F. Tian, J. Wen, MR-CNN: a multi-scale region-based convolutional neural network for small traffic sign recognition, IEEE Access, 7 (2019), 57120–57128. https://doi.org/10.1109/ACCESS.2019.2913882
[185]	X. Lu, B. Li, Y. Yue, Q. Li, J. Yan, Grid R-CNN, in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2019), 7355–7364. https://doi.org/10.1109/CVPR.2019.00754
[186]	J. Li, Y. Wang, C. Wang, Y. Tai, J. Qian, J. Yang, et al., DSFD: dual shot face detector, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2019), 5060–5069. https://doi.org/10.1109/CVPR.2019.00520
[187]	X. Zhang, F. Wan, C. Liu, R. Ji, Q. Ye, FreeAnchor: learning to match anchors for visual object detection, IEEE Trans. Pattern Anal. Mach. Intell., 44 (2022), 3096–3109. https://doi.org/10.48550/arXiv.1909.02466
[188]	J. Pang, K. Chen, J. Shi, H. Feng, W. Ouyang, D. Lin, Libra R-CNN: towards balanced learning for object detection, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2019), 821–830. https://doi.org/10.1109/CVPR.2019.00091
[189]	G. Zhang, S. Lu, W. Zhang, CAD-Net: a context-aware detection network for objects in remote sensing imagery, IEEE Trans. Geosci. Remote Sens., 57 (2019), 10015–10024. https://doi.org/10.1109/TGRS.2019.2930982
[190]	N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, S. Zagoruyko, End-to-end object detection with transformers, in European Conference on Computer Vision, 12346 (2020), 213–229. https://doi.org/10.1007/978-3-030-58452-8_13
[191]	S. Li, F. Liu, L. Jiao, X. Liu, P. Chen, Learning salient feature for salient object detection without labels, IEEE Trans. Cybern., 53 (2022), 1012–1025. https://doi.org/10.1109/TCYB.2022.3209978
[192]	F. Liu, X. Qian, L. Jiao, X. Zhang, L. Li, Y. Cui, Contrastive learning-based dual dynamic gcn for sar image scene classification, IEEE Trans. Neural Networks Learn. Syst., (2022), 1–15. https://doi.org/10.1109/TNNLS.2022.3174873
[193]	Y. Du, F. Liu, L. Jiao, Z. Hao, S. Li, X. Liu, et al., Augmentative contrastive learning for one-shot object detection, Neurocomputing, 513 (2022), 13–24. https://doi.org/10.1016/j.neucom.2022.09.125

© 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)