



    This paper is placed in the area of mathematical linguistics and presents a model for calculating the complexity of natural languages.

    Calculating the complexity of languages is very important for understanding their structures, use and evolution. The measurement of language complexity offers valuable insights into theoretical linguistics, language learning and teaching and the development of language technologies. Moreover, by calculating linguistic complexity, we gain a deeper and more comprehensive understanding of the richness and diversity of languages worldwide [1,2,3,4,5,6,7].

    Despite the growing interest in linguistic complexity and the numerous research works dedicated to the study of complexity, we still lack a clear answer regarding the inherent differences in complexity among languages [8,9]. Understanding the complexity of languages remains a challenging task due to the diverse types of complexity, the lack of standardized measures and the varying definitions used in the field [10,11,12]. As a result, establishing a universally applicable method for calculating linguistic complexity, and thereby understanding the differences in complexity among languages, remains a challenge.

    This paper aims to provide a method for calculating the complexity of natural languages by establishing a relationship between language complexity [10] and universality [13]. Our objective is to determine the complexity of a natural language by considering its level of universality. The fundamental idea underlying our proposal is that languages sharing more universal features will exhibit a lower level of difficulty in learning one from another and vice versa. This approach eliminates the need to compare languages pairwise and to consider all their rules in order to calculate their complexity. Instead, we determine the degree of universality exhibited by a language and utilize this information to calculate its complexity. This perspective offers an efficient and objective means of assessing linguistic complexity.

    In the mathematical model proposed in this paper, both complexity and universality are considered as non-discrete concepts: while traditional linguistics has treated complexity and universality as discrete categories, we approach them as gradual or fuzzy categories. To accomplish this, we utilize a fuzzy model to handle evaluative expressions, viewing complexity and universality as "evaluations" of languages and their rules.

    We consider it important to approach the concepts of complexity and universality from a fuzzy perspective for two reasons. On one hand, a linguistic feature cannot simply be classified as universal or non-universal; rather, it can exhibit various degrees of universality. This notion has significant implications when calculating the universality of languages, as it involves not only considering the number of universals a language possesses but also taking into account the degree or level of universality associated with those features/rules. For instance, a language with many high-level universals will have a higher level of universality than a language with numerous lower-level universals. On the other hand, a language cannot be simply categorized as complex or non-complex, but it can present different levels of complexity. These complexity levels are determined by the levels of universality found in the languages. We establish a relationship between the two concepts as follows: A high value of universality signifies that the language shares many characteristics with other languages, resulting in a relatively lower level of complexity (making it less challenging to learn). Conversely, a language with low levels of universality will possess many exclusive and non-shared features, leading to a higher level of complexity (making it more difficult to learn).

    To show the effectiveness of our proposed calculation model, we present a proof of concept in which we use a mathematical theory of evaluative expressions from fuzzy natural logic (FNL) to assess the complexity of 143 languages based on their level of universality. The level of universality for each language is determined by measuring the degree of universality of eight Greenberg universals [14]. These eight out of 45 universals are selected because only they meet three essential criteria for our work: They can be formalized with the Universal Dependencies (UD) annotation scheme, they can be formalized using the Grew tool and they take into account morpho-syntactic linguistic features, which is the domain that we want to explore. We analyze the presence of the eight universals in the languages within our corpus.

    The results show that our mathematical model allows for the establishment of a trichotomous scale, representing the varying values of universality and complexity and revealing an inversely proportional relationship between the degree of universality and the observed level of complexity in the languages.

    In summary, this paper introduces a model that explores the complexity of natural languages by linking it with the notion of universality and considers both complexity and universality as fuzzy concepts. Moreover, it presents a proof of concept that not only validates the effectiveness of our proposed model for calculating language complexity based on universality, but also shows the potential of using a mathematical model of fuzzy evaluative expressions in linguistic analysis.

    The paper is organized as follows. Initially, we provide a concise presentation of the linguistic and formal background to contextualize the contributions presented in this article. In the linguistic part, we briefly introduce linguistic universals and discuss diverse approaches to tackling linguistic complexity. The subsequent section outlines the formal tools employed in this study: Universal Dependencies, Grew-Match and evaluative expressions. Moving forward, we elaborate on the methodology and present the obtained results. Finally, we engage in a comprehensive discussion of the findings and the conclusions derived from this analysis.

    Linguistic complexity can be defined as a multifaceted concept that can be analyzed from different perspectives [15]. From a structural perspective, complexity is understood as a formal property of linguistic systems related to the number of elements. From a cognitive perspective, complexity is defined as the processing cost of linguistic structures. From a developmental point of view, complexity is determined by the order in which linguistic structures emerge or are learned in the processes of acquisition and learning of first and second languages.

    The study of linguistic complexity has witnessed significant changes in recent years. From denying the possibility of calculating complexity during the 20th century, linguistics has now shifted its focus to a growing interest in understanding and measuring linguistic complexity. The resurgence in interest began around 2001 with an article by McWhorter in Linguistic Typology's special issue [16]. The once prevailing dogma of equicomplexity, which claimed that all languages are equal in total complexity, has been questioned by researchers in the 21st century [17,18]. The increasing number of works on complexity in theoretical and applied linguistics highlights the interest in finding a method to measure linguistic complexity [19,20,21,22,23,24,25,26,27,28].

    Despite the growing interest in linguistic complexity studies in recent years and the general acknowledgment of varying levels of complexity among languages, accurately quantifying these differences remains a challenge. This difficulty may arise from the diverse interpretations of complexity within the field of natural language study.

    Among the possible definitions of the term, one of the most recurrent dichotomies in the literature is the one that distinguishes absolute complexity and relative complexity [29]. Additional dichotomies in the literature include global complexity versus local complexity [29] or the distinction between system complexity and structural complexity [30].

    The difference between absolute complexity and relative complexity is established as follows:

    Absolute complexity is defined as an objective property of the system and is calculated in terms of the number of parts of the system, the number of interrelationships between the parts or the length of the description of a phenomenon. This approach is common in typology studies [16,30].

    Relative complexity takes into account language users and is identified with the difficulty or cost of processing, learning or acquisition. It is common in sociolinguistic and psycholinguistic studies [31].

    Researchers have proposed diverse measures to capture these two types of linguistic complexity, leading to a wide range of approaches. These measures encompass a vast range of formalisms, which can be categorized into two main types:

    ● Measures of absolute complexity, such as the count of categories or rules, description length, ambiguity, redundancy etc. [29].

    ● Measures of relative complexity, which grapple with the challenge of determining the type of task (learning, acquisition, processing) and the type of agent (speaker, listener, child, adult) to consider. For instance, complexity measures related to second language learning (L2) in adults [31,32] or processing complexity [33] are examples of assessments based on difficulty/cost considerations.

    Moreover, researchers have explored other disciplines to find tools for calculating language complexity. Information theory, employing formalisms like Shannon entropy or Kolmogorov complexity [29,30,34], complex systems theory [35] or computational linguistics [36] are some instances of disciplines that have offered quantitative measures for evaluating linguistic complexity.
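    As a concrete illustration of the information-theoretic measures just mentioned, Shannon entropy can be computed over a text's character distribution. The sketch below is a minimal, illustrative example; complexity studies typically work over words or morphemes and much larger corpora.

```python
from collections import Counter
from math import log2

def shannon_entropy(text):
    """Shannon entropy (bits per symbol) of the character distribution of a text."""
    counts = Counter(text)
    n = len(text)
    # Equivalent to -sum(p * log2(p)); written with log2(n/c) to keep the result non-negative.
    return sum((c / n) * log2(n / c) for c in counts.values())

# A uniform distribution over four symbols carries exactly 2 bits per symbol;
# a text repeating a single symbol carries 0 bits.
print(shannon_entropy("abcd"))  # 2.0
print(shannon_entropy("aaaa"))  # 0.0
```

Measures such as Kolmogorov complexity follow the same spirit but are approximated in practice, e.g. via compression ratios.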

    Most studies on linguistic complexity have primarily focused on absolute complexity [2,20,27,29], while relative complexity [3,23,28], though conceptually consistent, has not been thoroughly explored. Approaching complexity analysis from a relative perspective poses challenges in determining the specific task (learning, acquisition, processing) and the type of agent (speaker, listener, child, adult) to consider. Some authors argue that relative complexity should be examined within the context of adult (user) second language learning (L2) [31,32]. However, many studies that explore complexity in L2 processes primarily measure the complexity of the target language, neglecting the potential influence of the learner's mother tongue on relative complexity [37,38,39,40,41,42]. Observational and experimental methods used to calculate complexity in L2 processes may encounter difficulties related to the impact of extralinguistic factors, which could influence the process and affect complexity measurements. As a result, the objectivity of such analyses for cross-linguistic comparison may be compromised, as the perceived difficulty might depend on the specific speakers considered in the experiments. Moreover, the lack of standardized definitions and measures has led to inconsistent and noncomparable results in this area [40].

    In this paper, we argue that a comprehensive assessment of linguistic complexity requires considering both absolute and relative complexity together. We want to emphasize the impact of the speakers' mother tongue in studies focusing on relative complexity, particularly in the context of L2. The mother tongue plays a significant role in either facilitating or complicating the learning process of the target language, thus influencing the determination of relative linguistic complexity. Additionally, we would like to show the essential methodological advantages offered by mathematical tools in objectively calculating the relative complexity of natural languages.

    A language universal can be defined as "a grammatical characteristic that can be reasonably hypothesized to be present in all or most human languages" [13]. The study of universals in a cross-linguistic way is within the discipline known as linguistic typology, which is "the systematic cross-linguistic comparison that aims to discover the underlying universal properties of human language" [43]. This discipline with its roots in the 19th century underwent a revolution in 1963 thanks to the paradigm shift proposed by Greenberg [14].

    The study presented by Greenberg [14] formulates 45 different universals of language on the basis of a comparison of grammatical features extracted from the grammars of 30 varied languages representative of the languages of the world.

    In the years following Greenberg, the search for such universals continued. The main difference from this pioneering work was the inclusion of many more languages in the selection, in order to formulate more reliable universals. The search for new typological conditions leading to new universals was also attempted, and a new research methodology was explored as an alternative to grammars (second-hand data): Questionnaires [44]. The results were not very different from those achieved by Greenberg years earlier, and interest in the subject gradually diminished. However, in recent years, we have observed a new boom in typological studies working with universals. The main trigger of this change in trend may be the new possibilities opened up by cross-disciplinary collaboration with natural language processing [45,46]. This collaboration has provided new data and new methodologies to work with (linguistic corpora and a quantitative approach with real texts), as well as tools that allow effective processing of amounts of data previously unattainable, giving rise to new metrics [47] and to tools offering visualizations of previously unknown data, such as Typometrics [48].

    In the literature, universals are usually classified taking into account two criteria: Frequency and extension [49]. Considering their extension, we distinguish two types of universals:

    Unrestricted universals are understood as descriptive generalizations of the languages of the world. They are typical universals formulated in the field of generativism, a perspective not considered in this paper, with structures such as: "In all languages, Y" [13].

    Implicational universals are a parameter globally favored under certain structural conditions. In this case, we find rules in a conditional structure, as in: "In all languages, if there is X, then there is Y".

    Considering their frequency, universals can be [49]:

    Absolute universals are formulations applicable to all the languages of the world. Absolute universals are formulated as: "All languages have Y".

    Statistical universals are formulations that exhibit a high frequency of adoption in the languages of the world without being absolute. Statistical universals usually are a formulation similar to: "Almost all languages have Y".

    The most frequent and fruitful universals in Greenberg's proposal tend to be implicational and statistical universals. The universals proposed by Greenberg could be divided into three main groups:

    (1) Syntactic universals about word order. For example, universal 1: "In declarative sentences with nominal subject and object, the dominant order is almost always one in which the subject precedes the object."

    (2) Morphological universals about word inflection and derivation. For example, universal 29: "If a language has inflection, it always has derivation".

    (3) Morphological universals about word features. For example, universal 36: "If a language has the category of gender, it always has the category of number".

    In this paper, we work with Greenberg universals, specifically with the above third group of morphological universals about word features. This means that we will only consider the universals that refer to the linguistic domain of this group: The morphological characteristics present (or not) in the different languages and their correlation.

    Universal Dependencies (UD) [50] is an open repository of homogeneously annotated multilingual corpora. This means that in this resource (version 2.11), we find 241 large collections of different real texts corresponding to 143 different languages. The main differentiating aspect of this resource compared to others is the homogeneous annotation. This means that the labeling of the texts in the different languages has been done using the same methodology and the same labels, which facilitates comparison and the drawing of conclusions across multiple languages. For this purpose, the labels and methodology proposed by Google [51] in relation to parts of speech are used. On the other hand, for the syntactic analysis of the different sentences, the guidelines and terminology of the Stanford Dependencies [52] are used.

    Multiple researchers from all over the world are updating the database with more texts or more languages. Although most languages are Indo-European and it is still difficult to avoid such bias, there is a remarkable effort to try to include languages with a marginal or nonexistent representation in the linguistic tradition. This can provide very interesting data, especially in studies such as the one presented here.

    Moreover, the computational annotation of most of the morphological and syntactic data of the analyzed texts allows a quantitative and more objective approach to linguistic phenomena. The claims that can be formulated have a mathematical backing and are more fine-grained. In addition, it also allows new metrics to be obtained in an automatic and efficient way [53].
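    To illustrate how this annotation supports quantitative queries, the sketch below parses a hypothetical two-token fragment in the CoNLL-U format used by UD (ten tab-separated columns, with morphological features in the FEATS column). The example tokens are invented for illustration, not taken from a UD corpus.

```python
def parse_feats(feats_field):
    """Parse a CoNLL-U FEATS field such as 'Gender=Fem|Number=Sing' into a dict."""
    if feats_field == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats_field.split("|"))

# A hypothetical two-token fragment ("casa blanca") in CoNLL-U's ten columns:
# ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC.
conllu_lines = [
    "1\tcasa\tcasa\tNOUN\t_\tGender=Fem|Number=Sing\t0\troot\t_\t_",
    "2\tblanca\tblanco\tADJ\t_\tGender=Fem|Number=Sing\t1\tamod\t_\t_",
]

tokens = []
for line in conllu_lines:
    cols = line.split("\t")
    tokens.append({"form": cols[1], "upos": cols[3], "feats": parse_feats(cols[5])})

# For example, count the nouns annotated for both gender and number.
nouns_with_gender_and_number = sum(
    1 for t in tokens
    if t["upos"] == "NOUN" and "Gender" in t["feats"] and "Number" in t["feats"]
)
print(nouns_with_gender_and_number)  # 1
```

Counts of this kind, aggregated over whole corpora, are the raw quantitative data behind the metrics discussed above.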

    The morphological and syntactic data in the texts of the different UD languages are annotated as raw text, which is neither processed nor normalized. In order to determine the invariance of occurrences, all of this data must be automatically cross-checked. Grew-Match is therefore ideal, since it is able to manage non-normalized text from UD efficiently [54].

    This tool has both an online interface and a Python implementation. It allows both the query of linguistic occurrences in a given corpus and a comparison of the results in multiple languages. Both quantitative results and qualitative examples of occurrences can be accessed at the same time. In order to carry out the queries, one must know the tool's formal language, which will be the syntax that supports the labels of the UD annotation system.

    However, the tool also has an alternative labeling system to the original UD, named Surface-Syntactic Universal Dependencies (SUD), which is the one we use. It is an updated and improved version of the syntactic annotation, offering a representation with a higher weight of syntactic criteria and a lower semantic weight when deciding which word acts as head and which word acts as dependent.

    The queries that can be performed are unrestricted and can contain different complex structures within themselves. The result of the query makes it possible to obtain quantitative data on the occurrences of specific linguistic structures in real texts, which allows comparison and the formulation or revision of universals. In addition, the mere formalization to be carried out is already of great interest since it will enable us to offer a linguistic formalization.

    We propose to compute the concepts of linguistic universality and complexity on a continuum, expressing the results with natural language words.

    Fuzzy natural logic (FNL) is based on six fundamental concepts, which are the following: The concept of fuzzy set, Lakoff's universal meaning hypothesis, the evaluative expressions, the concept of possible world and the concepts of intension and extension. The most remarkable aspect of this work is the theory of evaluative linguistic expressions.

    An evaluative linguistic expression is defined as an expression used by speakers when they want to refer to the characteristics of objects or their parts [55,56,57,58,59,60,61], such as length, age, depth, thickness, beauty, kindness, among others. We will consider "universality" and "complexity" as evaluative expressions.

    FNL assumes evaluative linguistic expressions with the general form:

    ⟨intensifier⟩⟨TE-head⟩. (2.1)

    The TE-head (the head of a trichotomous evaluative linguistic expression) can be grouped to form a fundamental evaluative trichotomy consisting of two antonyms and a middle term; for example, ⟨good, normal, bad⟩. We will consider the trichotomy ⟨low, medium, high⟩.
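    As a rough sketch of how such a trichotomy can be modelled (the shapes and breakpoints below are illustrative assumptions, not the exact membership functions of FNL), each term can be given an overlapping trapezoidal membership function on the degree scale [0, 1]:

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: rises on [a, b], plateaus on [b, c], falls on [c, d]."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

# Illustrative trichotomy over the degree scale [0, 1]: the middle term
# overlaps with each antonym, modelling the transition between the sets.
def low(x):    return trapezoid(x, -0.01, 0.0, 0.3, 0.5)
def medium(x): return trapezoid(x, 0.3, 0.45, 0.55, 0.7)
def high(x):   return trapezoid(x, 0.5, 0.7, 1.0, 1.01)

# A degree of 0.62 belongs partly to "medium" and partly to "high".
print({name: round(f(0.62), 2) for name, f in
       [("low", low), ("medium", medium), ("high", high)]})
```

The overlap between adjacent terms is what makes the categories gradual rather than discrete.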

    FNL has been applied in linguistics in the work of Torrens-Urrutia et al. [62,63,64]. In [65], the study of linguistic universals and complexity through the use of fuzzy evaluative expressions presents the membership scale of universality in linguistic rules, recognizing:

    High Satisfied Universal. Linguistic rules that trigger a high truth value of satisfaction in a set of languages and are therefore found satisfied in quasi-all the objects of the set.

    Medium Satisfied Universal. Linguistic rules that trigger a medium truth value of satisfaction in a set of languages.

    Low Satisfied Universal. Linguistic rules that trigger a low truth value of satisfaction in a set of languages.

    The value of complexity is usually computed as the negation of the value of universality, which defines their correlation. We characterize fuzzy IF-THEN rules for complexity as follows:

    ● IF a rule is a high universal THEN the value of complexity is low.

    ● IF a rule is a medium universal THEN the value of complexity is medium.

    ● IF a rule is a low universal THEN the value of complexity is high.

    Similarly, we can express:

    ● IF the value of complexity is high THEN the rule is a low universal.

    ● IF the value of complexity is medium THEN the rule is a medium universal.

    ● IF the value of complexity is low THEN the rule is a high universal.
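    A minimal sketch of this correlation, assuming the standard fuzzy negation C(u) = 1 - u and illustrative thresholds of 0.4 and 0.6 for the trichotomy (the paper's actual computation with evaluative expressions is richer):

```python
def complexity(universality):
    """Complexity as the fuzzy negation of universality."""
    return 1.0 - universality

def label(degree):
    """Map a degree in [0, 1] to a term of the trichotomy (thresholds are illustrative)."""
    if degree <= 0.4:
        return "low"
    if degree <= 0.6:
        return "medium"
    return "high"

# The IF-THEN rules above: a high universal yields low complexity, and so on.
for u in (0.9, 0.5, 0.2):
    print(f"universality {u} ({label(u)}) -> complexity {complexity(u):.1f} ({label(complexity(u))})")
```

Note that the negation makes the two rule sets above mirror images of each other, which is why the mapping can be stated in either direction.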

    The membership scale of complexity in linguistic rules is [66]:

    Low Complexity. Linguistic rules that have a high truth value in terms of weight in a set of linguistic rules.

    Medium Complexity. Linguistic rules that have a medium truth value in terms of weight in a set of linguistic rules.

    High Complexity. Linguistic rules that have a low truth value in terms of weight in a set of linguistic rules.

    A possible world is defined as a specific context in which a linguistic expression is used. In the case of evaluative expressions, it is characterized by a triple w = ⟨vL, vS, vR⟩. Without loss of generality, it can be defined by three real numbers vL, vS, vR ∈ ℝ, where vL < vS < vR.

    Intension and extension: Our intension will simply be the membership degree in [0, 1], while our extension will depend on the number of languages we consider in a representative set for evaluating universality and complexity.

    Figure 1 shows the representation on which we will base our work in this paper for interpreting values with evaluative expressions.

    Figure 1.  Linguistic universality as an evaluative expression.

    We have established a theoretical partition of the possible world:

    ● Since it is impossible to find a real-number scale for this context (unlike what happens when evaluating, e.g., temperature or speed), we establish an abstract context with degrees from 0 to 1, understanding the y-axis as the membership degree of universality and the x-axis as the number of possible languages of an evaluative set.

    ● We respect the structure of an evaluative expression: two antonyms (one in each contrary set) and one middle term; the middle term shares space with each antonym, representing a transition between sets.

    ● Our strict theoretical tripartition can be defined as:

    – small: 0-0.4.

    – medium: 0.41-0.6.

    – big: 0.61-1.

    Figure 2 represents the process of our research regarding its materials and methods.

    Figure 2.  Diagram of materials and methods to compute degree of theoretical and absolute universality and complexity of Greenberg's universals and natural languages.

    Figure 2 has to be interpreted in two main parts:

    ● The materials:

    – Greenberg's universals (in blue), Grew Tool (in orange) and Universal Dependencies (in green).

    ● The methods to calculate:

    – Theoretical universality and complexity of Greenberg's universals and languages (in pink).

    – Relative universality and complexity of Greenberg's universals and languages (in yellow).

    – Application of the theory of fuzzy evaluative expressions for expressing, in words (computing with words), the results of universality and complexity of natural languages.

    Regarding the materials, we distinguish three steps:

    - We have made a selection of Greenberg's universals.

    - We have formalized Greenberg's universals with the Grew Tool.

    - We have prepared a dataset of 146 languages annotated with Universal Dependencies, in which we have searched for the satisfaction, violation or non-applicability of Greenberg's universals formalized with the Grew Tool.

    Regarding the method, we distinguish three main parts:

    1) Method for computing theoretical weight of universality and complexity of Greenberg's universals.

    2) Method for computing relative weight of universality and complexity of Greenberg's universals.

    3) Method for computing theoretical and relative weight of universality and complexity of languages, expressing such results in words with the theory of fuzzy evaluative expressions.

    Regarding these three parts, we will obtain the following results:

    1) Evaluation of the weight of theoretical universality and complexity of Greenberg's universals.

    2) Evaluation of the weight of relative universality and complexity of Greenberg's universals.

    3) Taking into account the results of point one and two, we are able to:

    – Evaluate theoretical and relative universality of languages, language families and languages grouped by basic word order dominance through Greenberg's universals.

    – Evaluate theoretical and relative complexity of languages, language families and languages grouped by basic word order dominance through Greenberg's universals.

    We explain each of these parts with more detail in the following Subsections 3.1 and 3.2.

    To find a balance between Greenberg's universals and the available data, we have to disregard those of Greenberg's universals that are hardly evaluable under the combination of the Grew Tool and the Universal Dependencies (UD) corpus.

    The UD corpus is one of the most reliable, well-annotated, vast and accessible datasets we can work with for linguistic studies. However, not all 45 of Greenberg's universals can be computationally analyzed in UD. Therefore, the data and its formalism condition which of Greenberg's universals can be evaluated and used in our research.

    We disregard:

    ● Greenberg's universals that cannot be formalized with UD annotation schemes since this information is not labeled in this repository. These universals correspond mainly to the morphological universal groups regarding rules of inflection and derivation of words.

    ● Those universals related to intonation and similar aspects not covered in UD or in any corpus that gathers written language as linguistic data.

    ● Those universals from the group of syntactic universals evaluating word order [73,74]. Additionally, the influence of word order constraints on the interplay between complexity and universality has already been shown in [65]. Therefore, we focus on those universals that fall under the morpho-syntactic domain, since they are the ones that have been studied less carefully when evaluating the complexity-universality tandem through their features.

    Therefore, we are left with the group of morphological universals concerning word features, since they fit our materials and we have found no previous research in which they are evaluated and used to compute linguistic universality and complexity. We thus work with the following universals:

    Universal 30. If the verb has categories of person-number or if it has categories of gender, it always has tense-mode categories.

    Universal 31. If either the subject or object noun agrees with the verb in gender, then the adjective always agrees with the noun in gender.

    Universal 32. Whenever the verb agrees with a nominal subject or nominal object in gender, it also agrees in number.

    Universal 34. No language has a trial number unless it has a dual. No language has a dual unless it has a plural.

    Universal 36. If a language has the category of gender, it always has the category of number.

    Universal 40. When the adjective follows the noun, the adjective expresses all the inflectional categories of the noun.

    Universal 42. All languages have pronominal categories involving at least three persons and two numbers.

    Universal 43. If a language has gender categories in the noun, it has gender categories in the pronoun.

    To obtain the results of each universal for each language in the UD corpora, we must convert Greenberg's natural-language formulations into a more abstract formalization compatible with the terminology used in the systems mentioned above. This is possible thanks to the Grew tool 2.2.2.

    Due to a lack of space, we provide two examples of the formalization process of the universals; in the remaining cases, the procedure is the same. We carry out the query in Grew-Match (freely available), which allows us to obtain the occurrences of each language in relation to the analyzed feature. Once these data are obtained, it can be determined whether the universal is fulfilled in the different languages. In the case of Universal 30:

    – If the verb has categories of person-number or if it has categories of gender, it always has tense-mode categories.

    For the universal to hold in a given language, these conditions must be met:

    (1) The number of occurrences of person-number must be less than or equal to the number of occurrences of tense-mode.

    (2) The number of occurrences of gender must be less than or equal to the number of occurrences of tense-mode.

    Therefore, Universal 30 states that we cannot (or shouldn't) find verbs with person-number or gender that do not have tense-mode. Thus, the world languages' verb forms may possess both features (having tense-mode and gender or person-number), one of the features (having tense-mode) or none of those features. In formal terms, we could propose this universal as:

    U30 = (A ∨ B) → C, (3.1)

    where A is understood as "person-number", B is understood as "gender" and C is understood as "tense-mode."
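    As a minimal sketch, conditions (1) and (2) can be turned into a small decision procedure; the occurrence counts used below are hypothetical stand-ins for the totals a Grew-Match query would return:

```python
# Sketch: deciding Universal 30 for one language. The counts are hypothetical
# placeholders for the occurrence totals returned by Grew-Match queries.

def check_u30(person_number: int, gender: int, tense_mode: int) -> str:
    """Return the state of Universal 30 given feature occurrence counts."""
    if person_number == 0 and gender == 0:
        return "non-applicable"  # the premise (A or B) never occurs
    if person_number <= tense_mode and gender <= tense_mode:
        return "satisfied"       # conditions (1) and (2) both hold
    return "violated"

print(check_u30(person_number=120, gender=40, tense_mode=150))  # satisfied
```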

    First of all, we have to formalize the presence of person-number in the verbs of the world languages, for which we propose the formalization (3.2):

    % 30-V-Person-number
    pattern { V [upos=VERB, Person, Number] }. (3.2)

    Equation (3.2) reads as follows. The first line, headed by the symbol %, contains the title of the formalization, that is, the information to identify it. In this case, we have named it 30-V-Person-number, which refers to the number of the universal that such a structure concerns, the part of speech affected (verb) and the features that such an object must have (person-number). Subsequently, we open the call to the occurrence filter through the word pattern, which restricts the search to the structure enunciated within "{" and "}". Within these symbols, we declare an element that we name V for simplicity, whose characteristics are given within the symbols "[" and "]". In these brackets, we indicate the part of speech that we want to restrict (upos = VERB) and, subsequently, separated by commas, the characteristics that this verb must have. In this case, we ask for any value of the features Person and Number to be active.

    To formalize the presence of gender in verbs, we use the pattern (3.3), and for the formalization of tense-mode, we employ the pattern (3.4):

    % 30-V-gender
    pattern { V [upos=VERB, Gender] } (3.3)
    % 30-V-Tense-Mood
    pattern { V [upos=VERB, Tense, Mood] }. (3.4)

    Equations (3.3) and (3.4) are read with the same structure as Eq (3.2); only the features requested for the verbs are changed to the desired ones.

    Having exemplified the first universal, we apply the same type of queries, using Grew's syntax, to search for the characteristics we want. For example, in the case of Universal 36:

    – If a language has the category of gender, it always has the category of number.

    We must formalize both categories independently. That is, what we show in (3.5) and (3.6):

    % 36-Gender
    pattern { Gender [Gender] } (3.5)
    % 36-Number
    pattern { Number [Number] }. (3.6)

    Once we obtain the occurrences in the different languages, we will know whether such categories apply to each analyzed language. We understand this universal as a simple implication:

    U36 = A → B. (3.7)

    If we find A (Gender) in a language, we will always find B (Number). This implication tells us that we will not find languages with gender that do not also have number (something that is possible the other way around).

    Through these two examples of formalization, the rest of the universals we work with can be understood and retrieved. If the premise of the universal's implication is double, its structure follows that of Universal 30; if the premise is simple, it follows that of Universal 36. The only universal without the implicational structures shown above is Universal 42, which we can formalize as:

    U42 = A ∧ B. (3.8)

    This can be understood as meaning that every language contains both A and B, where A stands for three or more features of pronominal person and B for two or more features of pronominal number.
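    The three shapes used in this work, the double-premise implication of (3.1), the simple implication of (3.7) and the conjunction of (3.8), can be sketched as follows; the occurrence counts are again hypothetical stand-ins for Grew-Match totals:

```python
# Sketch of the three universal shapes; counts are hypothetical placeholders
# for the occurrence totals obtained with Grew-Match over the UD corpora.

def simple_implication(premise: int, conclusion: int) -> str:
    """A -> B, as in Universal 36 (Gender -> Number)."""
    if premise == 0:
        return "non-applicable"
    return "satisfied" if premise <= conclusion else "violated"

def double_implication(a: int, b: int, c: int) -> str:
    """(A or B) -> C, as in Universal 30."""
    if a == 0 and b == 0:
        return "non-applicable"
    return "satisfied" if a <= c and b <= c else "violated"

def conjunction(persons: int, numbers: int) -> str:
    """Universal 42: at least three pronominal persons and two numbers."""
    return "satisfied" if persons >= 3 and numbers >= 2 else "violated"

print(simple_implication(50, 80), conjunction(3, 1))  # satisfied violated
```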

    A selection of languages is mandatory, as it is impossible to analyze the totality of the world's languages, for two main reasons. First, we still do not know exactly how many languages there are in the world, as languages are constantly being born and dying without our knowledge [67]. Second, there is still no agreement on the distinction between dialects and languages [68]. Therefore, we do not know the totality of the world's languages [69]. Additionally, for many non-Indo-European languages, even though we have a label for them, we do not have reliable and normalized scientific data [70].

    Therefore, studies of linguistic universality and complexity have to make a representative selection of the world's languages that will allow us to extrapolate the data obtained. Depending on the type of study carried out, this representation may have different characteristics [71]:

    Convenience Sampling. If data availability is inadequate, a perfect balance cannot be guaranteed. However, the results may be indicative of a clear universal trend.

    Variety Sampling. There is a wide availability of language data, yet the phenomenon needs to be better studied. Representative languages of different linguistic types and genetic backgrounds and areas are selected, also including languages that are characteristically untypical examples in the language set.

    Probability Sampling. If the availability of data from different languages is reliable, normalized and correct and we want to know the representativeness of a phenomenon, we must balance the selection to maintain an equilibrium of linguistic type, linguistic family and area.

    In our case, we have created a convenience-sampling dataset. We have worked with 241 corpora corresponding to 143 different languages of UD 2.2.1. We analyze the totality of the available corpora for two reasons:

    (1) First, given the pioneering nature of the study, it is interesting to extend the results to the maximum possible number of languages to gain information on those not represented.

    (2) Second, recent studies guarantee that with a varied number of languages, it is not necessary to establish any corrective measure for sampling, as the results are the same [72].

    The complete, detailed list of the analyzed languages can be checked in [50]. 46% of the languages are from the Indo-European family, a common bias in typology studies. However, all the macro-areas of the world (except Australia) are present: Papunesia, Eurasia, North America, South America and Africa. We also find some dead languages (Latin, Sanskrit, Ancient Greek, etc.) and several language isolates (Basque, Japanese, etc.). Another of the most interesting aspects of the set of languages analyzed is the presence of unusual varieties such as Creoles, code-switching languages and sign languages.

    To guarantee the neutrality of the selection and the falsifiability of the data used, we have analyzed the first 1,000 sentences of each of the 241 corpora.

    We perform three main tasks:

    1) Computing theoretical universality and complexity of Greenberg's universals.

    2) Computing relative universality and complexity of Greenberg's universals.

    3) Computing theoretical and relative universality and complexity of natural languages.

    We explain each of these tasks in the following subsections:

    The weight of theoretical universality (GUT) of a Greenberg's universal is computed as the number of languages in which the universal is satisfied (AllLs), divided by the number of languages to which the universal applies (AllLapp) (Eq (3.9)):

    GUT = AllLs / AllLapp. (3.9)

    When we refer to the fact that a universal does not apply to a language, this can be due to multiple reasons. In short, this means that in an analyzed language L, the elements cited in the universal are not present and, therefore, it is not testable. If it does apply, on the other hand, these elements are present and we check whether Greenberg's proposal is satisfied or violated. The theoretical complexity of a Greenberg's universal (GCT) is computed as a negation of the weight of theoretical universality (GUT) in Eq (3.10):

    GCT = 1 - GUT. (3.10)

    Therefore, we establish a correlation in which the more universal a language is, the less complex it is. Thus, a language sharing more rules with all the other languages is theoretically less complex than a language that shares fewer of the universals with the rest of the set of languages.
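    A minimal sketch of Eqs (3.9) and (3.10), using the u30 counts reported in Table 2 (85 applicable languages, 37 satisfied); note that exact division gives about 0.435, which Table 2 prints as 0.43:

```python
# Sketch of GUT = AllLs / AllLapp and its complexity counterpart.
# The counts for u30 are taken from Table 2 of this paper.

def theoretical_universality(satisfied: int, applicable: int) -> float:
    return satisfied / applicable          # Eq (3.9)

def theoretical_complexity(gut: float) -> float:
    return 1 - gut                         # Eq (3.10), negation of GUT

gut_u30 = theoretical_universality(37, 85)
gct_u30 = theoretical_complexity(gut_u30)
print(gut_u30, gct_u30)
```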

    To estimate the degree of universality and complexity, we can apply fuzzy/linguistic IF-THEN rules, which let us replace evaluation with numbers by evaluation with words:

    ● IF a Greenberg's universal is highly satisfied THEN the degree of universality is high.

    ● IF a Greenberg's universal is quite satisfied THEN the degree of universality is medium.

    ● IF a Greenberg's universal is barely satisfied THEN the degree of universality is low.

    Similarly, we can express:

    ● IF the degree of universality is high THEN the degree of complexity is low.

    ● IF the degree of universality is medium THEN the degree of complexity is medium.

    ● IF the degree of universality is low THEN the degree of complexity is high.
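    These rules can be sketched with crisp thresholds; the cut-offs 0.35 and 0.7 are illustrative assumptions, not the actual membership functions of the theory of fuzzy evaluative expressions used here:

```python
# Sketch of the fuzzy/linguistic IF-THEN rules with crisp, assumed cut-offs.

def universality_expression(degree: float) -> str:
    if degree >= 0.7:
        return "high"
    if degree >= 0.35:
        return "medium"
    return "low"

def complexity_expression(universality: str) -> str:
    # IF universality is high THEN complexity is low, and so on.
    return {"high": "low", "medium": "medium", "low": "high"}[universality]

print(universality_expression(0.91))                        # high
print(complexity_expression(universality_expression(0.2)))  # high
```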

    We compute the weight of relative universality and complexity of Greenberg's universals by considering in how many languages each universal is satisfied, violated or non-applicable.

    The relative universality of a Greenberg's universal (GUR) is computed by checking how each universal behaves across our whole set of languages. As a result, each universal has a relative weight for each state: a weight of satisfaction, of violation and of non-applicability.

    The value of relative universality satisfaction of a universal (GURS) is computed by considering all the languages in which the universal is satisfied (AllLs), divided by our full set of languages (AllL) (Eq (3.11)):

    GURS = AllLs / AllL. (3.11)

    The value of relative universality violation of a universal (GURV) is computed by considering all the languages in which the universal is violated (AllLv), divided by our full set of languages (AllL) (Eq (3.12)):

    GURV = AllLv / AllL. (3.12)

    The value of relative universality non-applicability of a universal (GURnapp) is computed by considering all the languages in which the universal is non-applicable (AllLnapp), divided by our full set of languages (AllL) (Eq (3.13)):

    GURnapp = AllLnapp / AllL. (3.13)

    The relative complexity of a Greenberg's universal (GCR) is computed as a negation of the corresponding weight of relative universality for each of the three behaviors of a Greenberg's universal: GURS as in (3.14), GURV as in (3.15) and GURnapp as in (3.16).

    GCR = 1 - GURS (3.14)
    GCR = 1 - GURV (3.15)
    GCR = 1 - GURnapp. (3.16)

    Therefore, we again establish a correlation between linguistic universality and complexity: The more universal a language is, the less complex it is, since it shares more rules with all the other languages. We can express these results in words with the fuzzy evaluative expressions mentioned above.
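    A minimal sketch of Eqs (3.11)-(3.16), using the u30 counts from Table 3 (37 satisfied, 48 violated and 41 non-applicable out of a set of 126 languages); rounding here may differ slightly from the published table:

```python
# Sketch of the relative weights and their complexity negations.
# Counts for u30 come from Table 3 of this paper.

def relative_weight(count: int, all_languages: int) -> float:
    return count / all_languages           # Eqs (3.11)-(3.13)

def relative_complexity(weight: float) -> float:
    return 1 - weight                      # Eqs (3.14)-(3.16)

gurs = relative_weight(37, 126)            # weight of satisfaction
gurv = relative_weight(48, 126)            # weight of violation
gurnapp = relative_weight(41, 126)         # weight of non-applicability
print(round(gurs, 2), round(gurv, 2), round(gurnapp, 2))
```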

    We have based this step on the results from calculating the theoretical and relative universality and complexity of Greenberg's universals. Therefore:

    ● If a Greenberg's universal applies to and is satisfied in a language, it adds a value of one to the language's weight.

    ● On the contrary, if the universal is violated in a language or does not apply to it, it does not add any value to the language's weight. In that case, the universal's weight is zero with respect to the language.

    Consequently, we compute the theoretical value of universality of a language (LUT) by taking into account all the satisfied universals (AllGUS) in it, divided by all the Greenberg's universals of our set (AllGU).

    LUT = AllGUS / AllGU. (3.17)

    On the other hand, we compute the theoretical complexity of a language (LCT) again as the negation of its universality, as in Eq (3.18):

    LCT = 1 - LUT. (3.18)

    Table 1 is an example of a calculation of theoretical universality and complexity.

    Table 1.  Example of computing theoretical universality and complexity of languages.
    Language  | u30 u31 u32 u34 u36 u40 u42 u43 | Theoretical Universality | Theoretical Complexity | Fuzzy Evaluative Expressions
    Slovenian |  1   1   0   1   1   1   1   1  | 0.875                    | 0.125                  | High-Low
    Wolof     |  1   0   1   0   1   0   1   0  | 0.5                      | 0.5                    | Medium
    Guarani   |  0   0   1   0   0   0   0   0  | 0.125                    | 0.875                  | Low-High

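    The rows of Table 1 can be reproduced with a direct implementation of Eqs (3.17) and (3.18), one flag per universal (1 if the universal applies and is satisfied, 0 otherwise):

```python
# Sketch reproducing Table 1: LUT = AllGUS / AllGU and LCT = 1 - LUT.

def language_theoretical(flags):
    lut = sum(flags) / len(flags)   # Eq (3.17)
    return lut, 1 - lut             # Eq (3.18)

slovenian = [1, 1, 0, 1, 1, 1, 1, 1]  # u30..u43, as in Table 1
wolof     = [1, 0, 1, 0, 1, 0, 1, 0]
guarani   = [0, 0, 1, 0, 0, 0, 0, 0]

print(language_theoretical(slovenian))  # (0.875, 0.125)
```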

    We have taken the value of relative universality (GUR) that corresponds to each universal in each language.

    Following the example of Table 1, U30 has a relative weight of 0.56 if satisfied, 0.07 if violated and 0.37 if non-applicable. Naija computes U30 as satisfied; therefore, it adds a value of 0.56. In Japanese, U30 is non-applicable, adding a value of 0.37. In Arabic, U30 is violated, so it adds 0.07. The same applies to the rest of the universals. Therefore, the final relative universality of a language (RUL) is computed as the sum of the relative values of all Greenberg's universals in the language (AllGURinL), divided by the whole set of Greenberg's universals (AllGU), as shown in Eq (3.19):

    RUL = AllGURinL / AllGU. (3.19)

    On the other hand, the relative complexity of a language is computed as a negation of the relative universality of that same language, as in Eq (3.20):

    RCL = 1 - RUL. (3.20)
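    A minimal sketch of Eqs (3.19) and (3.20); U30's weights follow the example in the text (0.56 satisfied, 0.07 violated, 0.37 non-applicable), while the other three per-universal weights in the sample are purely illustrative:

```python
# Sketch of RUL = AllGURinL / AllGU and RCL = 1 - RUL.
# Only U30's weight table follows the text; the rest is illustrative.

U30_WEIGHTS = {"satisfied": 0.56, "violated": 0.07, "non-applicable": 0.37}

def relative_universality_of_language(weights_in_language):
    rul = sum(weights_in_language) / len(weights_in_language)  # Eq (3.19)
    return rul, 1 - rul                                        # Eq (3.20)

# Hypothetical language: U30 satisfied, plus three illustrative weights.
values = [U30_WEIGHTS["satisfied"], 0.66, 0.76, 0.08]
rul, rcl = relative_universality_of_language(values)
print(round(rul, 3), round(rcl, 3))  # 0.515 0.485
```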

    To compute the values of theoretical and relative universality and complexity of language families and word order dominance groups, we have applied the calculations by grouping the values of all the languages in each family, or word order dominance group, and dividing by the number of languages in the group.
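    The grouping step can be sketched as a per-group average; the family labels are real, the per-language values follow Tables 1 and 4, but this four-language sample is only illustrative:

```python
# Sketch: averaging language-level universality values per language family.
# The same procedure applies to basic word order dominance groups.

from collections import defaultdict
from statistics import mean

languages = {
    "Slovenian": ("Indo-European", 0.875),
    "French":    ("Indo-European", 0.625),
    "Wolof":     ("Niger-Congo",   0.5),
    "Guarani":   ("Tupian",        0.125),
}

by_family = defaultdict(list)
for name, (family, value) in languages.items():
    by_family[family].append(value)

family_universality = {f: mean(vs) for f, vs in by_family.items()}
print(family_universality["Indo-European"])  # 0.75
```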

    Table 2 shows the theoretical universality and complexity values for each of Greenberg's universals. We confirm that universals U31, U32, U34, U36 and U43 are highly satisfied. Only U42 displays a tendency toward a medium value of satisfaction, while still belonging to the set of highly universal rules. In contrast, U30 and U40 belong to the set of medium universality; therefore, U30 and U40 could be questioned as linguistic universals. The rest of the universals fall between 0.7 and 1; thus they are in the set of high universality and, consequently, are definitely universal rules from the theoretical point of view.

    Table 2.  Degree of theoretical universality and complexity of Greenberg's universals.
    Greenberg's Universal | Set of Languages | Languages in which it applies | Satisfied | Violated | Theoretical Universality | Theoretical Complexity
    u30 | 143 |  85 |  37 | 48 | 0.43 | 0.57
    u31 | 143 |  94 |  94 |  0 | 1    | 0
    u32 | 143 | 119 | 108 | 11 | 0.91 | 0.09
    u34 | 143 |  10 |  10 |  0 | 1    | 0
    u36 | 143 |  76 |  76 |  0 | 1    | 0
    u40 | 143 |  38 |  16 | 22 | 0.42 | 0.58
    u42 | 143 |  92 |  64 | 28 | 0.7  | 0.3
    u43 | 143 |  63 |  60 |  3 | 0.95 | 0.05


    Figure 3 shows the same information as Table 2, but in the shape of a fuzzy evaluative expression graph. The y-axis is the degree of membership of a Greenberg's universal according to the share of applicable languages in which the universal is satisfied. The x-axis displays the number of applicable languages for each universal, normalized to 100. The left set defines the spectrum of low universality, the middle set represents medium universality and the right set represents high universality.

    Figure 3.  Degree of theoretical universality and complexity of Greenberg's universals in the form of an evaluative expression graph.

    Table 3 presents the relative weight of satisfaction, violation and non-applicability of each of Greenberg's universals. We mark the highest value in green and the lowest in red. We observe that most languages satisfy Greenberg's universals. On the other hand, those universals that are non-applicable in many languages have their highest weight on the non-applicable label. With the exception of U30, no universal has violation as its highest weight, showing that Greenberg was not entirely wrong in any of those universals.

    Table 3.  Degree of relative universality and complexity of Greenberg's universals.
    Greenberg's Universal | Set of languages | YES | NO | NOTAPP | Relative Universality (Satisfaction / Violation / Non-applicability) | Relative Complexity (Satisfaction / Violation / Non-applicability)
    u30 | 126 |  37 | 48 |  41 | 0.29 / 0.38 / 0.37 | 0.71 / 0.62 / 0.66
    u31 | 143 |  94 |  0 |  49 | 0.66 / 0.00 / 0.34 | 0.34 / 1.00 / 0.66
    u32 | 143 | 108 | 11 |  24 | 0.76 / 0.08 / 0.17 | 0.24 / 0.92 / 0.83
    u34 | 123 |  10 |  0 | 113 | 0.08 / 0.00 / 0.92 | 0.92 / 1.00 / 0.08
    u36 | 143 |  76 |  0 |  67 | 0.53 / 0.00 / 0.47 | 0.47 / 1.00 / 0.53
    u40 | 143 |  16 | 22 | 105 | 0.11 / 0.15 / 0.73 | 0.89 / 0.85 / 0.27
    u42 | 143 |  64 | 28 |  51 | 0.45 / 0.20 / 0.36 | 0.55 / 0.80 / 0.64
    u43 | 143 |  60 |  3 |  80 | 0.42 / 0.02 / 0.56 | 0.58 / 0.98 / 0.44


    Regarding both Tables 2 and 3, we propose two interpretations for those universals whose highest weight is non-applicability. On the one hand, universals in which non-applicability outweighs satisfaction, such as U34, U40 and U43, may reflect a mistake by Greenberg in proposing universals that are too specific. On the other hand, taking into account the general trend of Table 2, we predict that, if those universals were ever applicable in all languages, the tendency would be to fall on a high satisfaction weight.

    On the other hand, only in U30 does the weight of violation exceed both satisfaction and non-applicability. In more than half of the languages of our set to which this universal applies, verbs bearing either person-number or gender do not necessarily have tense-mode categories. Therefore, Greenberg proposed a universal that can be questioned.

    Table 4 classifies our set of 143 languages by their theoretical degree of universality and complexity, described with a fuzzy evaluative expression in the last row. For example, Slovenian and Turkish German fall on the spectrum of being highly universal and not very complex from an absolute theoretical point of view, while Bengali and Guarani have a low value of universality and are, therefore, highly complex. This result expresses how many of Greenberg's universals are satisfied or not in each language. Thus, universality and complexity are always interpreted strictly from Greenberg's perspective in these results.

    Table 4.  Degree of theoretical universality and complexity of languages.
    Universality-Complexity value 0.875-0.125:
    Slovenian

    Universality-Complexity value 0.75-0.25:
    Turkish_German, English, Danish, Welsh, Umbrian, Old_Church_Slavonic, Irish, Upper_Sorbian, Ukrainian, Ancienthebrew, Latvian

    Universality-Complexity value 0.625-0.375:
    Naija, Manx, Kazakh, Hittite, North_Sami, Erzya, Moksha, German, Lowsaxon, Scottish_Gaelic, Faroese, Norwegian, Greek, Latin, Italian, Czech, Breton, Belarussian, French, Kurmanji, Spanish, Russian, Icelandic, Croatian, Serbian, Pomak, Gothic, Ligurian, Lithuanian, Ancientgreek, Slovak, Arabic, Bulgarian

    Universality-Complexity value 0.5-0.5:
    Uyghur, Estonian, Karelian, Western_Sierra_Puebla_Nahuatl, Xibe, Turkish, Komizyrian, Amharic, Buryat, Persian, Coptic, Wolof, Swedish, Gheg, Yakut, Galician, Sinhala, Tamil, Marathi, Hindi, Sanskrit, Hebrew, Albanian, Catalan, Polish, Armenian, Portuguese, Western_Armenian, Oldeastslavic, Romanian

    Universality-Complexity value 0.375-0.625:
    Tatar, Finnish, Karo, Livvi, Yoruba, Zaar, Komipermayah, Skolt_Sami, Korean, Mbyaguarani, Bambara, Apurina, Cebuano, Malayalam, Nheengatu, Beja, Indonesian, Kiche, Afrikaans, Hungarian, Basque, Assyrian, Urdu, Bhojpuri

    Universality-Complexity value 0.25-0.75:
    Tupinamba, Sud_Chinesepud, Bengali, Sud_Chinesegsd, Sud_Chinesegsdsimp, Tagalog, Warlpiri, Khunsari, Nayini, Thai, Cantonese(Hk), Sud_Chinesecfl, Sud_Chinesehk, Sud_Chinesepatentchar, Sud_Chukchihse, Frisiandutch, Hindi_English, Kangri, Maltese, South_Levantine_Arabic, Swedish_Sign_Language, Swiss_German, Telugu, Vietnamese, Abaza, Yupik, Old_French, Dutch, Javanese, Akkadian, Madi

    Universality-Complexity value 0.125-0.875:
    Teko, Akuntsu, Guarani, Kaapor, Soi, Xavante, Makurap, Munduruku, Japanese, Neapolitan, Old_Turkish, Guajajara, Sud_Classical_Chinesekyoto

    Evaluative Expressions (from highest to lowest universality): High Universality, Low Complexity; Medium Universality and Complexity; Low Universality, High Complexity


    Figure 4 shows a classification of theoretical universality (y-axis) and complexity (x-axis) per language family. By far, the Creole and Indo-European language families are the most universal and least complex. Creoles are usually heavily influenced by Indo-European languages; therefore, it is logical that they fall on a similar spectrum. The least universal and most complex are Sino-Tibetan, Tupian and Japanese.

    Figure 4.  Degree of theoretical universality and complexity of language families.

    There are two possible interpretations of these graphs: Indo-European languages are the largest group and, therefore, the most likely to appear universal, or Greenberg's universals, like our data, are biased toward Indo-European languages.

    Figure 5 shows a classification of theoretical universality (y-axis) and complexity (x-axis) per basic word order dominance. The groups are distributed throughout the whole gradience. OVS (Object-Verb-Subject) and VSO (Verb-Subject-Object) languages satisfy almost all of Greenberg's universals, while NDO (non-dominant order) and SOV (Subject-Object-Verb) are the groups that satisfy them the least.

    Figure 5.  Degree of theoretical universality and complexity of basic word order dominance.

    Figure 6 displays a radar chart with the degree of relative universality and complexity per language, colored by evaluative expression: green for high universality/low complexity, orange for medium and red for low universality/high complexity. The angular axis displays the value of complexity, while the radial axis displays the value of universality. Most languages have a medium weight of relative complexity and universality, while almost none have a high complexity value (in red). This distribution coincides with Table 4. However, the languages are distributed very differently; for instance, Slovenian, the most universal language from a theoretical point of view, does not appear as universal from the viewpoint of relative universality. In Table 4, we evaluate the theoretical universality of a language according to Greenberg's universals only, whereas in Figure 6 we evaluate universality and complexity with respect to the relative weights. Slovenian is very solid on a more discrete counting (only ones and zeros), while it is not that similar to the other languages on a fuzzier counting that weights each universal according to the behavior of the rest of the languages.

    Figure 6.  Degree of relative universality and complexity of languages.

    Figure 7 presents a classification of relative universality (y-axis) and complexity (x-axis) per language family. No language family falls below a medium value; the lowest is Eskimo-Aleut, with a relative universality of 0.54. Therefore, all the language families have a high or medium value of relative universality and complexity. In contrast with Figure 4, the Creole and Uto-Aztecan languages are the most universal, while the Indo-European languages display a different behavior, being the third least universal family group.

    Figure 7.  Degree of relative universality and complexity of language families.

    Figure 8 shows a classification of relative universality (y-axis) and complexity (x-axis) per basic word order dominance. The groups are distributed throughout the whole gradience. OSV satisfies Greenberg's universals the most, while NDO and VSO are the groups that satisfy them the least. These results disagree with the data from the evaluation of theoretical universality and complexity in Figure 5.

    Figure 8.  Degree of relative universality and complexity of basic word order dominance.

    A possible interpretation of these data is that basic word order does not provide much relevant information about language universality and complexity; therefore, to obtain a proper gradient classification, we need to examine larger groupings, such as groups of languages or families.

    The contributions of this study extend across various domains. Noteworthy are its contributions to the field of linguistic complexity, which constitutes the central focus of this paper, as well as its relevance to the realm of linguistic universals and the mathematical theory of evaluative expressions.

    In linguistic complexity, we present a mathematical-formal model and employ computational tools to compute relative complexity across real data corpora. This approach enables us to avoid the challenges encountered by studies that determine relative complexity through psycholinguistic experiments (subject to the influence of extralinguistic factors or individual variability) or those grappling with calculating absolute complexity based on grammars (which lacks grounding in authentic data). In fact, as highlighted by Kortmann [24], one issue in the literature on linguistic complexity is that often these works rely on rather unsystematic and intuition-based evidence. When grounded in actual data, they tend to be confined to reference grammars in conjunction with certain typological sampling techniques. By working with real data on 143 languages, we aim to provide a possible solution to the limitations encountered in much of the literature on linguistic complexity.

    Another significant contribution of this research lies in our approach to complexity assessment. Unlike much of the existing literature, which focuses predominantly on either absolute or relative complexity and presents them as disconnected entities, our work bridges the gap between these two forms of complexity. Specifically, we establish a connection between absolute and relative measures of complexity. In this study, based on calculating the degree of universality of a language, we determine its complexity for acquisition by adults who already possess proficiency in a first language. Consequently, we employ absolute measures that allow for assessing complexity in relative terms. This perspective aligns with viewpoints like that of Sinnemäki [75], who highlights the need for further research exploring the interplay between complexity and difficulty through psycholinguistic experiments, asserting that a comprehensive understanding of intricate phenomena necessitates multifaceted investigation.

    We establish a correlation between the concepts of linguistic complexity and linguistic universals. Despite the extensive body of work conducted in recent years within the realm of typological studies on complexity [4,16,29,30,31,32,33,75,76], these concepts have rarely been jointly analyzed in the way they are interconnected within this work. On the one hand, we understand the concept of complexity in terms of the difficulty of learning one language from another (second language acquisition); on the other hand, we interpret universals as structures/categories present in all languages. From this standpoint, we establish an inversely proportional relationship between the two concepts: The greater the degree of shared characteristics between two languages, the less challenging it will be to learn one from the other. In essence, the higher the universality of a language, the lower its complexity level when learned as a second language.

    The relationship between complexity and universality aligns with the concept of the connection between rarity and complexity [75]. Scholars like Newmeyer [77] and Harris [78] have linked cross-linguistic rarity to linguistic complexity. Miestamo [29] suggests that while a direct correlation between rarity and absolute complexity might not always exist, some level of association between rarity and difficulty, namely, relative complexity, can be anticipated. Hawkins [33] sheds light on this relationship by noting that structures that are easy and efficient in performance tend to grammaticalize more frequently in languages, while those that are complex and inefficient tend to grammaticalize less often. Additionally, Sinnemäki [75] points out that low probability has been tied to complexity [79], and typological rarities (opposite of universals) may consequently demonstrate higher grammatical complexity [78].

    The relationship between the level of universality and the degree of language complexity was established in a previous work [80]. Although Greenberg's universals were not the focus of that paper, the same philosophical approach was employed. It is worth noting the advantages of the method presented here compared to the previous work. The prior study examined only nine languages, whereas this work analyzes 143 languages. In that earlier approach, it was necessary to generate a grammar for each of the analyzed languages, in addition to a universal grammar with 42 billion syntactic constraints. A universality weight was assigned to each rule using each of the generated grammars, and constraints were classified as having low, medium or high universality. This method enabled the determination of language complexity: The most complex languages were those with fewer universal constraints. This analysis required computing a correlation between every pair of languages to determine the complexity levels regarding shared syntactic constraints.
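A minimal sketch of that earlier procedure: each constraint's universality weight is the share of language grammars containing it, constraints are banded into low/medium/high universality, and a language's complexity grows with its non-universal constraints. The grammars and the band thresholds are invented assumptions, not the actual 42-billion-constraint grammar of [80].

```python
from collections import Counter

# Invented toy grammars: each language is a set of syntactic constraints.
grammars = {
    "lang1": {"c1", "c2", "c3"},
    "lang2": {"c1", "c2"},
    "lang3": {"c1", "c4"},
}

# Universality weight of a constraint = share of grammars that contain it.
counts = Counter(c for rules in grammars.values() for c in rules)
n = len(grammars)
weight = {c: counts[c] / n for c in counts}

def band(w: float, low: float = 0.34, high: float = 0.67) -> str:
    """Classify a weight into low/medium/high universality (assumed cut-offs)."""
    return "high" if w >= high else "medium" if w >= low else "low"

# A language is more complex the more of its constraints are non-universal.
complexity = {
    lang: sum(1 for c in rules if band(weight[c]) != "high")
    for lang, rules in grammars.items()
}
print(weight, complexity)
```

Here `c1` appears in all three grammars (weight 1.0, high universality), so only the rarer constraints contribute to each language's complexity score.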

    The analysis presented in this current work employs a more abstract and generic concept, that of linguistic universals. Through this concept, we are able to examine more specific characteristics, thereby enabling the grouping of languages by type (three distinct types, varying according to the universal). We work with fewer rules and, therefore, cannot construct a complete grammar, as was done in [80]. However, given our detailed understanding of each language's behavior concerning each universal (the smaller scope makes this more manageable), we can gain better insights into the resulting language groups. This approach, in turn, allows us to calculate linguistic complexity more effectively.

    We introduce a fuzzy approach to both the complexity and universality concepts. This innovative framework enhances their description and classification, providing a transparency and coherence consistent with their non-discrete (fuzzy) nature. Concerning complexity, establishing a fuzzy definition and presenting a formal model for calculating its levels is a challenging endeavor. Regarding universals, this fuzzy approach effectively addresses classical terminological challenges in linguistic typology. While authors like Tomlin [81] and Dryer [82] advocate for universals with exceptions and present compelling reasons to engage with them, they often fall short of offering a system capable of classifying and comprehending them as non-discrete entities.
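One way to make the fuzzy reading concrete is to assign each language a membership degree equal to the proportion of corpus occurrences conforming to a universal, rather than a binary satisfied/violated verdict. The counts and the band cut-offs below are assumptions for illustration, not figures from the study.

```python
# Fuzzy membership sketch: degree in [0, 1] = conforming / total occurrences.

def fuzzy_degree(conforming: int, total: int) -> float:
    return conforming / total if total else 0.0

def label(degree: float) -> str:
    """Readable bands over the fuzzy degree (assumed cut-offs)."""
    if degree >= 0.9:
        return "prototypical"
    if degree >= 0.5:
        return "partial"
    return "exception"

# Invented occurrence counts (conforming, total) for three toy languages.
counts = {"langA": (97, 100), "langB": (60, 100), "langC": (12, 100)}
for lang, (yes, n) in counts.items():
    d = fuzzy_degree(yes, n)
    print(lang, d, label(d))
```

This keeps languages with occasional exceptions (like `langA`) distinguishable from genuine counterexamples (like `langC`) instead of collapsing both into "violates the universal".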

    In the context of universals, with a specific focus on Greenberg's universals, we present a formalized approach. Our proposal differs from typological studies, where formal models are rare and where the prevailing norm is to describe these linguistic regularities with nonformal and occasionally ambiguous formulations [14,83].

    Another contribution of this study is the validation of the universals formulated by Greenberg. We assess the validity of these universals using a quantitative, objective and verifiable methodology. The universals investigated in this article have not previously undergone an in-depth analysis in the existing literature. While there have been isolated analyses of certain universals [84], a systematic analysis like the one presented in this study, grounded in real-text data, had not previously been conducted.

    While computational validations of Greenberg's universals are becoming increasingly common, the analysis presented in this paper offers a distinct set of characteristics. First and foremost, computational analyses often tend to be isolated and focus on specific universals, in contrast to our approach in this article. Moreover, the universals typically scrutinized are often associated with Word Order [73,74], whereas this study delves into a different category of Greenberg's universals: The morphological ones. Furthermore, most approaches to universals still do not employ quantitative methodologies based on occurrences within a corpus of real texts; instead, they often lean toward grammar-based analyses [83].

    Finally, we provide fine-grained results for the various linguistic types within the different universals, an approach that remains compatible with the categorical validation/refutation of Greenberg's universals (which we also employ). This more fine-grained analysis enhances precision, showcasing non-prototypical or less canonical cases. This level of granularity is less common in more traditional approaches, such as [14,85].

    Regarding the limitations of our results, the fact that we deal with only eight of the 45 universals, together with the question of how well-distributed the universals are across languages (i.e., Indo-European vs. other languages), might seem detrimental to the validity of our study. In this respect, analyzing eight of the 45 universals is significant since, to our knowledge, no other experiments consider these universals across as many languages as our corpus of 146 selected languages. Moreover, existing studies usually cover only two or three universals; reporting results for eight universals in a single paper is also unusual, cf. [48,73,74,86,87].

    On the other hand, the distribution of universals depends on the premise used:

    ● If the universals are analyzed to find their weight, i.e., which universals are most respected across the languages, then all universals are highly respected except for U30, U40 and U42.

    ● On the other hand, if the languages are analyzed individually to find the universality weight of each language, taking into account how many of Greenberg's universals each language respects, then we find that practically no language has a high level of universality. Additionally, we conclude that the universality weight of Greenberg's rules derives from a homogeneous distribution across the languages rather than from a large group of languages satisfying the universals.

    ● In our corpus, there are more Indo-European languages than languages from other families, since UD is biased toward them. However, we have provided a very varied distribution, which includes 25 different language families. By providing a distribution of universality with language families as a criterion, we show that Indo-European languages have a lower degree of universality than most other language families. Indo-European languages are very diverse and display very different features among themselves. By contrast, smaller language families, such as Creoles or Korean, satisfy more universals, since smaller families tend to be more homogeneous.
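The two premises above can be sketched as two views over a single boolean matrix of which languages respect which universals: per-universal weights (premise one) and per-language universality weights (premise two). The languages, universal labels, and truth values below are invented placeholders, not results from our corpus.

```python
# Invented boolean matrix: rows are languages, columns are universals.
respects = {
    "lang1": {"U27": True,  "U30": False, "U40": False},
    "lang2": {"U27": True,  "U30": False, "U40": True},
    "lang3": {"U27": True,  "U30": True,  "U40": False},
}

universals = sorted(next(iter(respects.values())))

# View 1: weight of each universal = share of languages respecting it.
u_weight = {u: sum(r[u] for r in respects.values()) / len(respects)
            for u in universals}

# View 2: universality weight of each language = share of universals
# it respects.
l_weight = {lang: sum(r.values()) / len(r) for lang, r in respects.items()}

print(u_weight)  # one universal widely respected, the others less so
print(l_weight)  # no language scores highly on all universals
```

The same matrix can thus show a universal being "highly respected" overall while no individual language attains a high universality weight, which is exactly the asymmetry described in the two bullet points.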

    Alternatively, to clarify the issue regarding the distribution of languages, we can describe it as a "bibliographical bias". In other words, our study is influenced by the available sources. However, we must emphasize that our options are limited in this regard. It might seem that the proportion of Indo-European languages is much lower than 46% in WALS (World Atlas of Language Structures)-type databases. Still, it is crucial to consider that while there are entries from many languages, a large share of these entries contain only limited or nonspecific information and thus do not provide the substantive data needed on the aspects covered by these universals. Given these constraints, our approach is based on the data available within the UD framework. As UD continues to expand, the study method we propose can be further refined by incorporating data from other languages in the future.

    This paper contributed to the studies on linguistic complexity by demonstrating differences in complexity among languages and by providing an objective and valid method to calculate the relative complexity of natural languages, particularly in the context of second language learning (L2) in adults.

    To calculate linguistic complexity, the paper introduced the use of computational tools and mathematical models that can provide objective and reliable measures for quantitatively evaluating linguistic complexity.

    Overall, the work presented in this paper seeks to advance the understanding of linguistic complexity and offer valuable insights into the nature of language complexity. The use of computational and mathematical tools can help challenge the long-held assumption of equicomplexity among languages and may have significant implications for various areas, including theoretical linguistics, comparative linguistics, language acquisition, second language teaching and language technologies. It suggests that acknowledging differences in linguistic complexity can lead to improved language teaching methods and better-designed language technologies.

    Furthermore, this study conducted a cross-validation on Greenberg's universals by utilizing data from 143 languages and formalizing them in a structured manner. Our approach was grounded in the analysis of several languages and relied on a source that is distinct from Greenberg's, involving real texts and actual utterances. We extracted simplified frequencies within a particular grammatical category using authentic texts produced by speakers.

    Our objective was to verify the consistency of Greenberg's universals by examining a significantly larger dataset, encompassing 113 additional languages compared to Greenberg's original work. Additionally, by using texts generated by native speakers, we aimed to align our findings with Greenberg's, further validating his proposed universals. Our approach and results serve to reinforce the credibility of Greenberg's proposals, given the alignment of results across this expanded and diversified dataset.

    The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.

    This paper was supported by the project PID2020-120158GB-I00 funded by MCIN/AEI/10.13039/501100011033.

    The authors have no conflicts of interest.



  • © 2025 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)