Long-Term Quantitative Assessment of Women Survivability from Cancer: A Unique Descriptive Analysis

Statistical Process Control (SPC) methodologies are a set of statistical methods and techniques that were initially designed for industrial processes but can be adopted for non-industrial applications. The current study aimed to provide a unique quantitative investigation of an epidemiological disease using an SPC software platform. The selected case was a long-term monitoring record of yearly cancer mortality rates in women worldwide. The dataset was segregated into subgroups along several dimensions to visualize the clustering pattern by nation (42 countries, shown as boxplots), by time, and by a Gaussian Mixture Model (two overlapping bell-shaped distributions). The trend of death rates versus elapsed years demonstrated a moderately negative correlation with time, following the theory of splines. A control chart constructed on the basis of the fitted Weibull distribution showed a gradual, steady improvement in survivability from malignancy. The greatest variations in the mortality ratios existed within the European countries.


Introduction
Malignancy is one of the most devastating diseases affecting humanity in the modern era. It threatens quality of life, productivity, and the financial and economic status of societies and countries, in addition to causing morbidity, disability and mortality [1]. Notably, women's health and well-being are severely affected by the rising challenge of cancer in the community, which hinders them from fulfilling their crucial role in both developing and developed nations. The burden of cancer among women is high in both High-Income Countries (HICs) and Low- and Middle-Income Countries (LMICs), although the distribution of the most common cancers differs. This burden is predicted to grow as populations grow and age and as the prevalence of cancer risk factors increases in some countries, especially in LMICs. The costs of cancer are considerable and even catastrophic in HICs and LMICs alike. However, this burden of disease, loss of life, and economic hardship is not inevitable [2]. All of the most common cancers among women worldwide, including cancers of the lung, breast, cervix, liver, and colorectum, have known means of prevention and/or early detection that may be used to reduce incidence and mortality. Furthermore, carcinoma and cervical cancer, two of the four leading cancers in women worldwide, have several proven prevention measures [3]. These two cancers combined represent about 20% of all cancer deaths among women [4].
Many of those deaths may be prevented through effective tobacco control, vaccination, and screening activities. A variety of effective cancer control measures are available to countries of all resource levels [4]. Many of those measures are extremely cost-effective given the lives saved for the cost of the intervention, especially in the case of vaccination. To prevent cancer in the future, countries must prioritize policies that reduce known cancer risk factors and make prevention accessible to all. For those who have cancer today, effective treatments and palliative care are also needed. In addition to those needs, cancer surveillance and research on prevention and treatment are indispensable for setting cancer control priorities and for determining the most effective interventions and treatments in a given context [4]. For LMICs, all of those activities may require support and commitment from the global community.
While exhaustive and comprehensive databases on malignant diseases have been compiled from extensive surveys and data-gathering centers of national and international organizations, it is the interpretation of the results that yields useful conclusions and insight into the patterns in the records [5,6]. The application of Statistical Process Control (SPC) methodologies started at the end of the first quarter of the 20th century in the industrial field, where Walter A. Shewhart at Bell Laboratories used them for the monitoring, control and improvement of manufacturing processes and product quality [7,8]. In addition to their wide industrial applications, SPC techniques have also been applied to the improvement of service fields.
Moreover, SPC techniques were found to be useful in other non-industrial fields for the assessment, evaluation and investigation of specific inspection properties or events [9,10]. Some interesting fields of implementation include, but are not limited to, environmental monitoring (EM), microbiological water quality, surgical site infection, and epidemiological diseases and outbreaks [11][12][13][14][15]. The previous studies employed control charts, Pareto diagrams, box-and-whisker plots, distribution identification, correlation analysis and fitted line curves, in addition to other conventional analysis tools such as descriptive statistics. Because epidemiological diseases constitute a continuous threat to humanity, it would be useful to develop a quantitative descriptive tool that describes the progression and behavior of these diseases.
One of the important indicators that could provide insight into cancer progression is the mortality ratio of patients, which can be found in the databases of national and international health organizations. Statistical tools and process control methodologies were applied here as a unique quantitative means of evaluating and comparing epidemiological status in support of decision-making and of cancer management and control. Accordingly, the present analysis might open new horizons for researchers by offering a simple, fast and useful way to study other diseases, outbreaks and pandemics.

SPC Perspective of Long-Term Global Trend of Cancer in Women
Records of cancer mortality rates for different countries were obtained as a downloadable database file from the websites https://www.cancer.gov/ and https://www.who.int/ [16,17]. Data for women were extracted from the Excel file by filtering and were arranged chronologically (Table S1). Three computer programs were combined as the statistical software platform for data processing, viz. GraphPad Prism v6.01, Minitab v17.1.0 and the Excel add-in XLSTAT v2014.05.03 [18][19][20][21]. The commercial scientific software GraphPad Prism v6.01 was used for the preliminary data description, followed by estimation of the significance of the change in yearly mortality rates with time. XLSTAT v2014.05.03, a statistical suite add-in for Excel, was used to draw the Gaussian Mixture Model (GMM), a probabilistic model for identifying normally distributed subpopulations within an overall population. Mixture models in general do not require prior knowledge of which subpopulation a data point belongs to, which allows the model to learn the subpopulations automatically [22]. The GMM is therefore useful in this case for viewing the overall data-clustering pattern of the record from 1960 to 2017.
On the other hand, the Minitab statistical package was used for fitted-line modeling, box plot creation, histogram drawing and data trending using a process-behavior chart [23]. Two-dimensional data segregation was performed on the basis of the GMM and of the individual countries in the record to identify the clustering pattern of the data. The mean (X-bar) and standard deviation (S) trending chart was constructed according to the distribution identification scheme in Minitab. The Shewhart charts show the mean and the standard deviation of the death rate along with the control window, i.e. the Upper Control Limit (UCL) and the Lower Control Limit (LCL).
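As an illustration of how the limits of an X-bar and S chart can be derived, the following Python sketch computes the center lines and three-sigma limits from the usual normal-theory constants. It is not the Minitab procedure used in this study: it assumes equal subgroup sizes and approximately normal data, whereas the chart reported here was built on a fitted Weibull distribution. The function names and the yearly subgroups are hypothetical.

```python
import math
import statistics


def c4(n: int) -> float:
    """Bias-correction constant for the sample standard deviation (normal theory)."""
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)


def xbar_s_limits(subgroups):
    """Return (LCL, center line, UCL) for the Xbar chart and for the S chart.

    Assumption: every subgroup has the same size n >= 2.
    """
    n = len(subgroups[0])
    if n < 2 or any(len(g) != n for g in subgroups):
        raise ValueError("all subgroups must share one size n >= 2")
    means = [statistics.fmean(g) for g in subgroups]
    sds = [statistics.stdev(g) for g in subgroups]
    grand_mean = statistics.fmean(means)   # center line of the Xbar chart
    s_bar = statistics.fmean(sds)          # center line of the S chart
    c = c4(n)
    a3 = 3.0 / (c * math.sqrt(n))          # Xbar chart factor
    spread = 3.0 * math.sqrt(1.0 - c * c) / c
    return {
        "xbar": (grand_mean - a3 * s_bar, grand_mean, grand_mean + a3 * s_bar),
        "s": (max(0.0, 1.0 - spread) * s_bar, s_bar, (1.0 + spread) * s_bar),
    }


# Hypothetical yearly subgroups (deaths per 100000), one list per year.
years = [
    [250.0, 231.0, 262.0, 240.0],
    [245.0, 228.0, 255.0, 238.0],
    [236.0, 221.0, 249.0, 230.0],
]
limits = xbar_s_limits(years)
print(limits["xbar"], limits["s"])
```

A yearly mean falling outside the `xbar` limits, or a yearly standard deviation outside the `s` limits, would be flagged as a signal in such a chart.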

Descriptive Statistics and Correlation Output
A preliminary examination of the database yielded the basic statistical results in Table 1, which shows the countries with their corresponding annual values. The data are mildly positively (right) skewed [24]. Because the data failed to comply with the normal distribution, the record was screened for the closest-fitting distribution. Based on the Anderson-Darling (AD) test, the Weibull distribution could not be rejected as a suitable fit for the raw dataset. No aberrant values could be detected in the overall record. The mortality rates showed apparent continuity in their values, with a range of 193.1 deaths/100000 cases; half and four-fifths of the records lay within ranges of 46.3 and 86.5 deaths/100000 cases, respectively, suggesting extensive tailing. The mean and the median almost coincided, and the geometric mean was very close to the arithmetic mean, confirming that no extreme values lie far from the group as a whole [25]. However, the arithmetic mean is slightly greater than the geometric mean, as expected from the inequality between arithmetic and geometric means [26]. This gap is reflected in the Confidence Intervals (CIs), which show a corresponding shift. The relative standard deviation (RSD) showed considerable dispersion of the data around the mean, of about 18%. In addition, the kurtosis value suggests a broadened distribution with shoulders, a departure from the bell shape expected under a normal distribution and a sign of possibly mixed, overlapping patterns [24]. This is probably the result of annual and regional variations reflected in the total record of cancer mortality in women. Table 2 demonstrates the moderately negative correlation between the mortality rates of women and time in years [27,28].
Accordingly, the survivability of women improved over time, which might be partially attributed to the development of effective detection and control measures for the containment of cancer [29].
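The summary quantities discussed above (arithmetic and geometric means, median, RSD and the correlation of rates with time) can be reproduced with a few lines of Python. The sketch below uses a small hypothetical series rather than the study data, and the function names are illustrative only; it is not the output of the software packages used here.

```python
import math
import statistics


def describe(rates):
    """Basic descriptive statistics for a list of positive rates."""
    mean = statistics.fmean(rates)
    sd = statistics.stdev(rates)
    return {
        "mean": mean,
        "geometric_mean": statistics.geometric_mean(rates),
        "median": statistics.median(rates),
        "rsd_percent": 100.0 * sd / mean,  # relative standard deviation
    }


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical yearly average mortality rates (deaths per 100000).
years = [1960, 1970, 1980, 1990, 2000, 2010]
rates = [240.0, 245.0, 238.0, 230.0, 215.0, 205.0]

summary = describe(rates)
r = pearson_r(years, rates)
print(summary, r)
```

For any set of positive values the geometric mean cannot exceed the arithmetic mean, so the small gap between the two is a property of the means themselves and not of this particular dataset.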

Dataset Clustering Approaches: A Multidimensional Analysis
Data stratification using GMM analysis estimated two overlapping bell-shaped distributions, indicating a global overlapping distribution of mortality rates among the studied nations (Figure 1). The Cumulative Distribution Function (CDF) and Q-Q plots supported the suitability of this assumption. The first and second distributions were named Φ (Phi) and Ψ (Psi), with contribution proportions of 0.57 and 0.43, respectively (Figure 1). However, no clustering tendency indicating heterogeneity was found according to the Normalized Entropy Criterion (NEC), which was greater than one when the Integrated Completed Likelihood (ICL) criterion was selected with the standard Expectation Maximization (EM) algorithm [30]. For malignancy mortality rates in women, the means of Φ and Ψ were 222.7 and 241.0 and the corresponding variances were 1892.4 and 909.9, respectively. The highest mortality rates from cancer occurred primarily in the EUR region, dominated by economically rich nations, suggesting that the overall survivability of women from cancer depends on other factors that need to be controlled in addition to improving healthcare quality. This assumption might be partially in agreement with previous analyses that linked the prevalence of some types of cancer in high-income countries with adverse lifestyle factors such as smoking, obesity and alcohol consumption [31,32]. Nevertheless, it should be noted that the earlier adoption of rigorous control, awareness and action on cancer detection and diagnosis, together with controlled monitoring, might be one of the important reasons why the data records of high-income and developed nations are more comprehensive than those of relatively lower-income countries. This may have contributed to an underestimation of the actual mortality rates in developing and low-income countries.
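To make the mixture decomposition concrete, the following Python sketch fits a two-component, one-dimensional Gaussian mixture with a plain EM loop. It is a minimal illustration under stated assumptions and not the XLSTAT implementation used in this study: the data are simulated from two hypothetical, well-separated components (not the Φ and Ψ estimates reported above), and no model selection criterion such as NEC or ICL is computed.

```python
import math
import random


def normal_pdf(x, mu, var):
    """Density of a normal distribution with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_gaussians(data, iterations=100):
    """EM for a two-component 1-D Gaussian mixture.

    Returns (weights, means, variances), each a list of length two.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    # Initialise the means from the lower and upper halves of the sorted data.
    means = [sum(xs[:half]) / half, sum(xs[half:]) / (n - half)]
    overall = sum(xs) / n
    var0 = sum((x - overall) ** 2 for x in xs) / n
    variances = [var0, var0]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in xs:
            p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = p[0] + p[1]
            resp.append((p[0] / total, p[1] / total))
        # M-step: update weights, means and variances from the responsibilities.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[k] = max(var_k, 1e-6)  # guard against a collapsing component
    return weights, means, variances


# Simulated rates from two hypothetical components (60% and 40% of the sample).
rng = random.Random(42)
sample = [rng.gauss(200.0, 10.0) for _ in range(240)]
sample += [rng.gauss(260.0, 10.0) for _ in range(160)]

weights, means, variances = fit_two_gaussians(sample)
print(weights, means, variances)
```

When the two components overlap heavily, as the entropy criterion suggests for the present record, the estimates from such a loop become much more sensitive to the starting values than in this well-separated example.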
The time-based data segregation shows that the WHO European Region (EUR) was the most variable, with divergent mortality rates between nations, as it is rich in country data, in contrast to the African Region, which includes only South Africa. This can also be seen in the extent of the outliers (denoted by asterisks) above and below the box plots in Figure 3. The South African record started only in 1993, 33 years later than those of the other three regions, and reached its peak in 1999 (Figure 3). The annual spread of death rates for the Americas (AMR) and the Western Pacific (WPR) was smaller than that of EUR. However, the distribution of WPR values was generally more compact than that of AMR, where greater dispersion could be observed between annual records. The general profile can be seen in Figure 3: a steady and almost regular yearly decline in death rates started in AMR in the late 1980s and in EUR and WPR around 1995.

Fitted Line of Yearly Average Death Rates from Cancer for the Affected Women Populations versus Time
The dataset showed a steady decline in global mortality rates from cancer after an initial high plateau, which could be described by a quadratic relation with significant correlation (Figure 4). The regression equation was obtained through polynomial regression analysis of the average mortality rates from cancer for women versus time (in years).
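A quadratic fitted line of this kind can be sketched in Python as an ordinary least-squares fit solved through the normal equations. The example below is illustrative only: the coefficients and the series are hypothetical (the fitted equation of Figure 4 is not reproduced here), and time is expressed in decades since 1960 purely to keep the small linear system well conditioned.

```python
def fit_quadratic(x, y):
    """Least-squares fit of y = b0 + b1*x + b2*x**2 via the normal equations."""
    s = [sum(xi ** p for xi in x) for p in range(5)]            # sums of x^p
    t = [sum(yi * xi ** p for xi, yi in zip(x, y)) for p in range(3)]
    a = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting on the 3x4 augmented matrix.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]


def r_squared(x, y, coef):
    """Coefficient of determination of the fitted quadratic."""
    mean_y = sum(y) / len(y)
    pred = [coef[0] + coef[1] * xi + coef[2] * xi * xi for xi in x]
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical series: time in decades since 1960, rates from an assumed curve.
decades = [(year - 1960) / 10 for year in range(1960, 2018)]
rates = [230.0 + 12.0 * d - 4.0 * d * d for d in decades]

coef = fit_quadratic(decades, rates)
r2 = r_squared(decades, rates, coef)
print(coef, r2)
```

With a negative quadratic coefficient the curve rises slightly, flattens into a plateau and then declines, which is the general shape described for the global record.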

Trending of Global Women Survivability from Malignancy Using Process-Behavior Chart
In Figure 5, the histogram showed a dual-hump pattern that could be explained by the GMM. The boxplot indicates the pattern for the studied WHO regions (Figure 5). AFR demonstrated the lowest mean and spread because only ZAF was included from the region, and for a relatively short period, in contrast to EUR, which showed upper and lower aberrant records from some countries in the European region, namely Iceland (1960) and Turkey (2009, 2010 and 2011). The survivability rate in WPR is better than that in AMR. Analysis of the distribution fitting showed the suitability of the Weibull distribution based on examination of the probability plot, as can be observed in Figure 5. The Xbar-S chart plots the process mean (Xbar chart) and the process standard deviation (S chart) over time for variables data in subgroups (Figure 5). This dual type of control chart is extensively used to monitor the stability of processes in many fields. The Xbar and S charts are displayed together because both must be interpreted to decide whether the trend is stable [32]. The process variation in the S chart was almost stable within the control limits, with a tendency toward lower annual variation since 1979. The Xbar chart in Figure 5 initially showed intermittent excursions in the first few years, followed by a successive shift in the annual mortality means that led to a declining curve, with survivability improving over time until the curve crossed the lower control limit, which is a desirable outcome [33]. The last record, for 2017, was incomplete at the time of data registration, which may explain the apparently sudden rise in the mortality ratio: only a few countries, with high values, were included for that year in the process-behavior chart of Figure 5.
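The idea behind a Weibull probability plot can be sketched in Python as follows. On the transformed axes ln(x) versus ln(-ln(1 - F)), Weibull data fall on a straight line whose slope is the shape parameter. The sketch assumes median-rank plotting positions and a simple least-squares line, which may differ from the estimation method used by Minitab, and it does not compute the Anderson-Darling statistic. The shape and scale values are hypothetical.

```python
import math


def weibull_probability_fit(values):
    """Estimate Weibull (shape, scale) by least squares on the probability-plot scale."""
    xs = sorted(values)
    n = len(xs)
    # Median-rank approximation for the plotting positions F_i.
    f = [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]
    u = [math.log(x) for x in xs]                      # horizontal axis: ln(x)
    v = [math.log(-math.log(1.0 - fi)) for fi in f]    # vertical axis
    mu, mv = sum(u) / n, sum(v) / n
    slope = sum((a - mu) * (b - mv) for a, b in zip(u, v)) / sum((a - mu) ** 2 for a in u)
    intercept = mv - slope * mu
    # ln(-ln(1 - F)) = shape * ln(x) - shape * ln(scale)
    return slope, math.exp(-intercept / slope)


# Hypothetical data placed exactly on Weibull quantiles (shape 6, scale 240).
true_shape, true_scale = 6.0, 240.0
n_points = 50
positions = [(i - 0.3) / (n_points + 0.4) for i in range(1, n_points + 1)]
rates = [true_scale * (-math.log(1.0 - p)) ** (1.0 / true_shape) for p in positions]

shape, scale = weibull_probability_fit(rates)
print(shape, scale)
```

Systematic curvature of the points away from the fitted line would indicate that the Weibull model does not describe the data well.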

Conclusion
SPC methodologies could provide a unique insight into the characteristics and patterns of epidemiological diseases. Quantitative assessment of mortality and/or morbidity levels may be a useful tool in management and future decision-making. Further studies are crucial to assess the absolute number of deaths, since the ratio record alone may not reflect the real progression of the disease. For instance, the annual increase in the number of cancer patients may counterbalance the efforts to control deaths, and the actual yearly number of deaths could pose a rising challenge to the cancer healthcare sector as the epidemiological situation worsens. In addition, the accuracy of the investigation and analysis depends largely on the impartiality and comprehensiveness of the database. Nevertheless, cancer researchers could adopt similar techniques to track different types of malignancies in their countries over time in order to assess changes in the epidemiological pattern and the effectiveness of control measures. SPC methodologies are also encouraged for future studies of other diseases, such as global cases of Coronavirus disease (COVID-19).

Supplementary Files
The Supplementary Material for this article can be found online at: https://doi.org/10.36462/H.BioSci.20208
Supplementary Table S1: List of countries whose data have been used in this study and their corresponding abbreviations.