Citation: Roberto Castro-Muñoz. Pressure-driven membrane processes involved in waste management in agro-food industries: A viewpoint[J]. AIMS Energy, 2018, 6(6): 1025-1031. doi: 10.3934/energy.2018.6.1025
Multi-nominal data are common in scientific and engineering research, for example in biomedical research, customer behavior analysis, network analysis, search engine marketing optimization, and web mining. When the response variable has more than two levels, the principle of mode-based or distribution-based proportional prediction can be used to construct nonparametric nominal association measures. For example, Goodman and Kruskal [3,4] and others proposed local-to-global association measures aimed at optimal prediction. Both Monte Carlo and discrete Markov chain methods are conceptually based on proportional association. The association matrix, association vector and association measure proposed in [9] follow the same idea. If the categories of the response variable have no ordering, or the ordering is not of interest, they are treated as nominal in the proportional prediction model and in the other association statistics.
In reality, however, different categories of the same response variable often carry different values, sometimes very different ones. When selecting a model or explanatory variables, we want those that increase the total revenue, not just the accuracy rate. Similarly, when the explanatory variables come with a cost weight vector, the costs should be considered in the model too. The association measure in [9],
To implement the previous adjustments, we need the following assumptions:
It should be noted that the second assumption does not always hold. The law of large numbers suggests that the larger the sample size, the closer the sample average is to the expected value of the distribution. This subject has been studied for hundreds of years, including the question of how large a sample must be to approximate the real distribution, but it is not the main subject of this article. The assumption serves only to simplify a more complicated discussion.
The article is organized as follows. Section 2 discusses the adjustment to the association measure when the response variable has a revenue weight. Section 3 considers the case where both the explanatory and the response variables have weights. Section 4 presents how the adjusted measure changes the existing feature selection framework. Conclusions and future work are briefly discussed in the last section.
Let's first recall the association matrix
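$$
\begin{aligned}
\gamma_{s,t}(Y|X) &= \frac{E\big(p(Y=s|X)\,p(Y=t|X)\big)}{p(Y=s)} = \sum_{i=1}^{\alpha} p(X=i|Y=s)\,p(Y=t|X=i); \quad s,t = 1,2,\dots,\beta \\
\tau_{Y|X} &= \frac{\omega_{Y|X} - E p(Y)}{1 - E p(Y)} \\
\omega_{Y|X} &= E_X\big(E_Y(p(Y|X))\big) = \sum_{s=1}^{\beta}\sum_{i=1}^{\alpha} p(Y=s|X=i)^2\,p(X=i) = \sum_{s=1}^{\beta} \gamma_{ss}\,p(Y=s)
\end{aligned}
\tag{1}
$$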
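The quantities in (1) can be computed directly from a contingency table. The following is a minimal sketch, not the authors' implementation: the function name `association_stats`, the table layout `counts[i][s]` and the 2 x 2 example table are hypothetical, and the term E p(Y) is read here as the sum of squared marginal probabilities of Y, which is an assumption about the notation.

```python
def association_stats(counts):
    """Return (gamma, omega, tau) for a contingency table.

    counts[i][s] is the number of observations with X = i and Y = s.
    Assumes every row and every column has a positive total.
    """
    alpha, beta = len(counts), len(counts[0])
    n = sum(sum(row) for row in counts)
    row_tot = [sum(row) for row in counts]  # n * p(X=i)
    col_tot = [sum(counts[i][s] for i in range(alpha)) for s in range(beta)]  # n * p(Y=s)
    p_y = [c / n for c in col_tot]

    # gamma[s][t] = sum_i p(X=i | Y=s) * p(Y=t | X=i)
    gamma = [
        [
            sum((counts[i][s] / col_tot[s]) * (counts[i][t] / row_tot[i])
                for i in range(alpha))
            for t in range(beta)
        ]
        for s in range(beta)
    ]

    # omega = sum_s sum_i p(Y=s | X=i)^2 * p(X=i)
    omega = sum(
        (counts[i][s] / row_tot[i]) ** 2 * (row_tot[i] / n)
        for i in range(alpha)
        for s in range(beta)
    )

    # Assumption: E p(Y) is read as sum_s p(Y=s)^2.
    e_p_y = sum(p * p for p in p_y)
    tau = (omega - e_p_y) / (1 - e_p_y)
    return gamma, omega, tau


# Hypothetical 2 x 2 table, not taken from the article.
gamma, omega, tau = association_stats([[30, 10], [10, 50]])
```

Each row of `gamma` sums to one, and `omega` equals the sum of `gamma[s][s] * p(Y=s)`, which mirrors the last equality in (1).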
Our discussion begins with only one response variable with revenue weight and one explanatory variable without cost weight. Let
Definition 2.1.
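$$
\hat{\omega}_{Y|X} = \sum_{s=1}^{\beta}\sum_{i=1}^{\alpha} p(Y=s|X=i)^2\, r_s\, p(X=i) = \sum_{s=1}^{\beta} \gamma_{ss}\, p(Y=s)\, r_s, \qquad r_s > 0,\; s = 1,2,3,\dots,\beta
\tag{2}
$$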
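As a rough illustration of (2), the revenue-weighted measure can be evaluated from a contingency table as follows. This is a sketch under stated assumptions: the function name `revenue_weighted_omega`, the table and the revenue vector are hypothetical and are not the article's data.

```python
def revenue_weighted_omega(counts, revenue):
    """Evaluate sum_s sum_i p(Y=s | X=i)^2 * r_s * p(X=i).

    counts[i][s] is the number of observations with X = i and Y = s;
    revenue[s] is the (positive) revenue weight r_s of response class s.
    """
    n = sum(sum(row) for row in counts)
    total = 0.0
    for row in counts:
        n_i = sum(row)
        if n_i == 0:
            continue  # an empty X level contributes nothing
        for s, c in enumerate(row):
            total += (c / n_i) ** 2 * revenue[s] * (n_i / n)
    return total


# Hypothetical table and revenue weights, for illustration only.
table = [[30, 10], [10, 50]]
unweighted = revenue_weighted_omega(table, [1, 1])  # reduces to omega in (1)
weighted = revenue_weighted_omega(table, [2, 1])
```

With all revenue weights equal to one the function reduces to the unweighted measure in (1), which gives a quick sanity check.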
Please note that
It is easy to see that
Example. Consider a simulated data set motivated by a real situation. Suppose that variable
1000 | 100 | 500 | 400 | 500 | 300 | 200 | 1500 | |||
200 | 1500 | 500 | 300 | 500 | 400 | 400 | 50 | |||
400 | 50 | 500 | 500 | 500 | 500 | 300 | 700 | |||
300 | 700 | 500 | 400 | 500 | 400 | 1000 | 100 | |||
200 | 500 | 400 | 200 | 200 | 400 | 500 | 200 |
Let us first consider the association matrix
0.34 | 0.18 | 0.27 | 0.22 | 0.26 | 0.22 | 0.27 | 0.25 | |||
0.13 | 0.48 | 0.24 | 0.15 | 0.25 | 0.24 | 0.29 | 0.23 | |||
0.24 | 0.28 | 0.27 | 0.21 | 0.25 | 0.24 | 0.36 | 0.15 | |||
0.25 | 0.25 | 0.28 | 0.22 | 0.22 | 0.18 | 0.14 | 0.46 |
Please note that
The correct prediction contingency tables of
471 | 6 | 121 | 83 | 98 | 34 | 19 | 926 | |||
101 | 746 | 159 | 107 | 177 | 114 | 113 | 1 | |||
130 | 1 | 167 | 157 | 114 | 124 | 42 | 256 | |||
44 | 243 | 145 | 85 | 109 | 81 | 489 | 6 | |||
21 | 210 | 114 | 32 | 36 | 119 | 206 | 28 |
The total number of the correct predictions by
total revenue | average revenue | |||
0.3406 | 0.456 | 4313 | 0.4714 | |
0.3391 | 0.564 | 5178 | 0.5659 |
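The link between proportional prediction, the correct-prediction counts and the revenue figures can be checked by simulation. The sketch below is illustrative only: the function name, the table and the revenue weights are hypothetical, and it assumes that the prediction for an observation with X = i is drawn from the conditional distribution p(Y | X = i) and that revenue r_s is earned only when a class-s observation is predicted correctly.

```python
import random


def simulate_proportional_prediction(counts, revenue, seed=0):
    """Simulate proportional prediction on a contingency table.

    Returns (correct, accuracy, total_revenue, average_revenue), where
    correct[i][s] counts the correctly predicted observations in cell (i, s).
    """
    rng = random.Random(seed)
    beta = len(counts[0])
    classes = list(range(beta))
    correct = [[0] * beta for _ in counts]
    for i, row in enumerate(counts):
        for s, c in enumerate(row):
            for _ in range(c):
                # Draw the predicted class with probability p(Y = t | X = i).
                guess = rng.choices(classes, weights=row)[0]
                if guess == s:
                    correct[i][s] += 1
    n = sum(sum(row) for row in counts)
    n_correct = sum(sum(row) for row in correct)
    total_revenue = sum(
        revenue[s] * correct[i][s]
        for i in range(len(counts))
        for s in range(beta)
    )
    return correct, n_correct / n, total_revenue, total_revenue / n


# Hypothetical table and revenue weights, not the article's data.
correct, accuracy, total_rev, avg_rev = simulate_proportional_prediction(
    [[3000, 1000], [1000, 5000]], revenue=[2, 1]
)
```

Under these assumptions the simulated accuracy should fluctuate around the unweighted measure in (1) and the average revenue around the revenue-weighted measure in (2).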
Given that
In summary, it is possible for an explanatory variable
Let us further discuss the case where the predictors carry a cost weight vector in addition to the revenue weight vector of the dependent variable. The goal is to find the predictor with the greater total profit. We hence define the new association measure as in (3).
Definition 3.1.
$$
\bar{\omega}_{Y|X} = \sum_{i=1}^{\alpha}\sum_{s=1}^{\beta} p(Y=s|X=i)^2\, \frac{r_s}{c_i}\, p(X=i)
\tag{3}
$$
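A small sketch of (3) follows. It assumes that the weight in (3) is the ratio of the revenue r_s to the cost c_i of the X level, which is a reading of the flattened formula rather than something the text states outright; the function name, table and weight vectors are hypothetical.

```python
def profit_weighted_omega(counts, revenue, cost):
    """Evaluate sum_i sum_s p(Y=s | X=i)^2 * (r_s / c_i) * p(X=i).

    counts[i][s]: observations with X = i and Y = s.
    revenue[s]:   revenue weight r_s of response class s.
    cost[i]:      cost weight c_i of level i of X (assumed positive).
    """
    n = sum(sum(row) for row in counts)
    total = 0.0
    for i, row in enumerate(counts):
        n_i = sum(row)
        if n_i == 0:
            continue  # an empty X level contributes nothing
        for s, c in enumerate(row):
            total += (c / n_i) ** 2 * (revenue[s] / cost[i]) * (n_i / n)
    return total


# Hypothetical inputs: the same table scored under two cost vectors.
table = [[30, 10], [10, 50]]
cheap_first = profit_weighted_omega(table, revenue=[2, 1], cost=[1, 1])
costly_first = profit_weighted_omega(table, revenue=[2, 1], cost=[2, 1])
```

Raising the cost of one X level lowers the measure only through the cells of that level, so two predictors with similar accuracy can swap ranks when their cost vectors change.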
Example. We first continue the example in the previous section with new cost weight vectors for
total profit | average profit | ||||
0.3406 | 0.3406 | 1.3057 | 12016.17 | 1.3132 | |
0.3391 | 0.3391 | 1.8546 | 17072.17 | 1.8658 |
By
We then investigate how a change in the cost weights affects the result. Suppose the new weight vectors are:
total profit | average profit | ||||
0.3406 | 0.3406 | 1.7420 | 15938.17 | 1.7419 | |
0.3391 | 0.3391 | 1.3424 | 12268.17 | 1.3408 |
Hence
Using the updated association measure defined in the previous section, we present in this section the feature selection result for a given data set
At first, consider a synthetic data set simulating the factors that contribute to the sales of a certain commodity. In general, many factors could contribute differently to commodity sales: age, career, time, income, personal preference, credit, etc. Each factor could have a different cost vector, and each class within a variable could have a different cost as well. For example, collecting income information might be more difficult than learning the customer's career; determining a dinner waitress's purchase preference is easier than determining that of a high-income lawyer. Therefore we just assume that there are four potential predictors,
total profit | average profit | ||||
7 | 0.3906 | 3.5381 | 35390 | 3.5390 | |
4 | 0.3882 | 3.8433 | 38771 | 3.8771 | |
4 | 0.3250 | 4.8986 | 48678 | 4.8678 | |
8 | 0.3274 | 3.7050 | 36889 | 3.6889 |
The first variable to be selected is
total profit | average profit | ||||
28 | 0.4367 | 1.8682 | 18971 | 1.8971 | |
28 | 0.4025 | 2.1106 | 20746 | 2.0746 | |
56 | 0.4055 | 1.8055 | 17915 | 1.7915 | |
16 | 0.4055 | 2.3585 | 24404 | 2.4404 | |
32 | 0.3385 | 2.0145 | 19903 | 1.9903 |
As we can see, all
In summary, the updated association measure with cost and revenue vectors not only changes the feature selection result according to different profit expectations, it also reflects a practical reality: collecting information for more variables costs more and thus reduces the overall profit, so more variables are not necessarily better on a return-on-investment basis.
We propose a new metric,
The presented framework can also be applied to high-dimensional cases, such as national surveys, and to misclassification costs, the association matrix and the association vector [9]. It should help assess the quality of predictors across various response variables.
Given the distinct character of these new statistics, we believe they open more opportunities for further study of better decision making for categorical data. We are currently investigating the asymptotic properties of the proposed measures, which can also be extended to the symmetric situation. Of course, the synthetic nature of the experiments in this article raises the question of how the measures behave on a real data set or application. It is also arguable that the improvements introduced by the new measures come from randomness. Thus we can use
[1] | Castro-Muñoz R, Fíla V, Durán-Páramo E (2017) A Review of the Primary By-product (Nejayote) of the Nixtamalization During Maize Processing: Potential Reuses. Waste Biomass Valori, 1–10. Available from: https://doi.org/10.1007/s12649-017-0029-4. |
[2] | Cheryan M (1998) Ultrafiltration and Microfiltration Handbook, CRC Press. |
[3] | Van Der Bruggen B, Lejon L, Vandecasteele C (2003) Reuse, treatment, and discharge of the concentrate of pressure-driven membrane processes. Environ Sci Technol 37: 3733–3738. doi: 10.1021/es0201754 |
[4] | Cassano A, Conidi C, Ruby-Figueroa R, et al. (2018) Nanofiltration and Tight Ultrafiltration Membranes for the Recovery of Polyphenols from Agro-Food By-Products. Int J Mol Sci 19: 351. doi: 10.3390/ijms19020351 |
[5] | Galanakis CM, Castro-Muñoz R, Cassano A, et al. (2016) Recovery of high-added-value compounds from food waste by membrane technology, In: Alberto Figoli, Alfredo Cassano, Angelo Basile, Author, Membrane Technologies for Biorefining, Woodhead Publishing, 189–215. |
[6] | Castro-Muñoz R, Barragán-Huerta BE, Yáñez-Fernández J (2016) The Use of Nixtamalization Waste Waters Clarified by Ultrafiltration for Production of a Fraction Rich in Phenolic Compounds. Waste Biomass Valori 7: 1167–1176. doi: 10.1007/s12649-016-9512-6 |
[7] | Galanakis CM (2015) The universal recovery strategy, In: C.M. Galanakis (Ed.), Food Waste Recover., 1 Ed., Elsevier Ltd, UK, 59–81. |
[8] | Galanakis CM (2013) Emerging technologies for the production of nutraceuticals from agricultural by-products: A viewpoint of opportunities and challenges. Food Bioprod Process 91: 575–579. doi: 10.1016/j.fbp.2013.01.004 |
[9] | Castro-Muñoz R, Yáñez-Fernández J, Fíla V (2016) Phenolic compounds recovered from agro-food by-products using membrane technologies: An overview. Food Chem 213: 753–762. doi: 10.1016/j.foodchem.2016.07.030 |
[10] | Castro-Muñoz V, Rodríguez-Romero R, Yáñez-Fernández V, et al. (2017) Water production from food processing wastewaters by integrated membrane systems: Sustainable approach. Water Technol Sci 8: 129–136. |
[11] | Castro-Muñoz R, Yañez-Fernandez J (2015) Valorization of Nixtamalization wastewaters (Nejayote) by integrated membrane process. Food Bioprod Process 95: 7–18. doi: 10.1016/j.fbp.2015.03.006 |
[12] | Cassano A, Conidi C, Giorno L, et al. (2013) Fractionation of olive mill wastewaters by membrane separation techniques. J Hazard Mater 248: 185–193. |
[13] | Martín J, Díaz-Montaña EJ, Asuero AG (2018) Recovery of Anthocyanins Using Membrane Technologies: A Review. Crit Rev Anal Chem 48: 143–175. doi: 10.1080/10408347.2017.1411249 |
[14] | Castro-Muñoz R, Barragán-Huerta BE, Fíla V, et al. (2018) Current Role of Membrane Technology: From the Treatment of Agro-Industrial by-Products up to the Valorization of Valuable Compounds. Waste Biomass Valori 9: 513–529. doi: 10.1007/s12649-017-0003-1 |
[15] | Cassano A, Conidi C, Galanakis CM, et al. (2016) Recovery of polyphenols from olive mill wastewaters by membrane operations, In Membrane Technologies for Biorefining, 163–187. |
[16] | Galanakis CM (2015) Separation of functional macromolecules and micromolecules: From ultrafiltration to the border of nanofiltration. Trends Food Sci Technol 42: 44–63. doi: 10.1016/j.tifs.2014.11.005 |
[17] | Castro-Muñoz R (2018) Separation, Fractionation and Concentration of High-Added-Value Compounds From Agro-Food By-Products Through Membrane-Based Technologies, In: G. Smithers (Ed.), Ref. Modul. Food Sci., Elsevier Inc. |
[18] | Castro-Muñoz R, Conidi C, Cassano A (2018) Membrane-based technologies for meeting the recovery of biologically active compounds from foods and their by-products. Crit Rev Food Sci Nutr, 1–22. |
[19] | Cassano A, De Luca G, Conidi C, et al. (2017) Effect of polyphenols-membrane interactions on the performance of membrane-based processes. A review. Coord Chem Rev 351: 45–75. doi: 10.1016/j.ccr.2017.06.013 |
[20] | Crespo JG, Brazinha C (2010) Membrane processing: Natural antioxidants from winemaking by-products. Filtr Sep 47: 32–35. |
[21] | Galanakis CM (2017) Membrane Technologies for the Separation of Compounds Recovered From Grape Processing By-Products, In Handbook of Grape Processing By-Products, 137–154. |
[22] | Galanakis CM (2012) Recovery of high added-value components from food wastes: Conventional, emerging technologies and commercialized applications. Trends Food Sci Technol 26: 68–87. doi: 10.1016/j.tifs.2012.03.003 |