Update: 2025-07-17
Hossein Bevrani
Faculty of Science / Department of Statistics
M.Sc. Theses
-
Statistical Inference of Generalized Poisson Regression Hurdle Model
2025
Count data are among the most widely used data types, arising in fields such as medicine, pharmacy, management, industry, and economics. A count regression model such as Poisson regression is generally used to analyze this type of data, but when the counts contain excess zeros, the results are inefficient. Since excess zeros cause over-dispersion, zero-inflated Poisson (ZIP) regression models are used in the Poisson setting. Hurdle regression (HR) models are another modified class of count regression: HR handles zero-inflated data effectively and, combined with the generalized Poisson distribution, can accommodate over- or under-dispersion in addition to excess zeros. In this dissertation, we study the generalized Poisson hurdle regression (GPHR) model, which accommodates both excess zeros and inequality of mean and variance. We use different approaches to estimate the parameters of this model, simulate the proposed methods in R, compare the results using evaluation criteria, and finally apply the selected method to real data.
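A minimal sketch of how such a model can be fitted in R by direct maximum likelihood, assuming one covariate in both the hurdle and count parts; the abstract does not give the thesis's exact parameterization, so this is illustrative only (vectors y and x are assumed to be loaded).

```r
## Generalized Poisson pmf (Consul form):
## P(Y=y) = theta*(theta+lambda*y)^(y-1) * exp(-(theta+lambda*y)) / y!
ldgpois <- function(y, theta, lambda) {
  log(theta) + (y - 1) * log(theta + lambda * y) -
    (theta + lambda * y) - lfactorial(y)
}

## Negative log-likelihood of a GP hurdle model:
## logit P(Y > 0) = g0 + g1*x ; log theta = b0 + b1*x ; lambda = dispersion
nll <- function(par, y, x) {
  p1    <- plogis(par[1] + par[2] * x)           # P(Y > 0)
  theta <- exp(par[3] + par[4] * x)
  lam   <- par[5]                                 # lam != 0 allows over/under-dispersion
  if (abs(lam) >= 1 || any(theta + lam * y <= 0)) return(1e10)  # keep parameters valid
  pos <- y > 0
  ll0 <- log(1 - p1[!pos])                        # zeros come only from the hurdle
  llp <- log(p1[pos]) + ldgpois(y[pos], theta[pos], lam) -
         log(1 - exp(-theta[pos]))                # zero-truncated GP for positives
  -(sum(ll0) + sum(llp))
}

fit <- optim(c(0, 0, 0, 0, 0.1), nll, y = y, x = x, hessian = TRUE)
fit$par                                           # MLEs; SEs from solve(fit$hessian)
```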
-
Data mining and statistical analysis of liability insurance information for insurers of Iran Insurance Company
2025
Data mining, as one of the modern methods of data analysis, plays a significant role in enhancing decision-making in liability insurance. Through data mining, insurance companies can examine and analyze large-scale customer data, identify behavioral patterns, and provide optimized services; the technique supports more accurate analysis and better resource management. This study investigates data mining in the field of liability insurance and conducts statistical analysis of the liability insurance customers of Iran Insurance Company. The first chapter introduces key concepts such as data mining, liability insurance, clustering, the log-linear model, and analysis of variance. The second chapter describes and analyzes the data on the company's liability insurance customers from 1393 to 1402 (Iranian calendar). The third chapter performs statistical inference and data mining on the policyholders' data using techniques such as clustering, analysis of variance, and the log-linear model.
-
Signature-based reliability and maintenance planning for a load- sharing coherent system operating in a random environment
2024
In various industries, especially fields involving complex and sensitive systems such as aerospace, energy, and transportation, a key approach is the focus on reliability and preventive maintenance. This dissertation adopts that approach with the aim of enhancing the efficiency of coherent systems and reducing environmental and repair costs. Emphasizing reliability and preventive maintenance enables organizations to predict and prevent failures, maintaining optimal system performance while avoiding unnecessary costs. Using the signature technique and a generalized Farlie-Gumbel-Morgenstern (FGM) copula, the dissertation presents a generic mean residual lifetime (MRL) model for the reliability analysis of a load-sharing coherent system. The approach differs from earlier models in that, in addition to the load-sharing phenomenon, it simultaneously considers the effect of operating conditions on the system. Further, using the developed model and a renewal-reward argument, an age replacement policy is investigated. The proposed MRL model and the behavior of the optimal solution as the model parameters change are illustrated through numerical examples.
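For readers unfamiliar with the building blocks named here, the standard identities behind signature-based MRL analysis are (notation ours, not necessarily the thesis's):

```latex
% For a coherent system with signature s = (s_1,\dots,s_n) and i.i.d.
% component lifetimes, the system survival function and MRL are
\bar F_T(t) = \sum_{i=1}^{n} s_i\, \bar F_{i:n}(t),
\qquad
m(t) = \mathbb{E}[T - t \mid T > t]
     = \frac{\int_t^{\infty} \bar F_T(u)\, du}{\bar F_T(t)},
% while dependence between component lifetimes and the random environment
% can be modeled by an FGM-type copula
C_\gamma(u, v) = uv\,\bigl[1 + \gamma (1-u)(1-v)\bigr], \quad \gamma \in [-1, 1].
```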
-
Benefits and Downsides of P-value and Bayes Factor in the Statistical Tests
2024
In statistical testing, p-values and Bayes factors are both commonly used to evaluate and analyze test results. Understanding the differences, advantages, and disadvantages of the two approaches sharpens our grasp of the rules of statistical inference. In this thesis, we examine the benefits and downsides of p-values and Bayes factors in statistical tests, assessing the strengths and weaknesses of each. This investigation helps determine which approach is best suited to a given testing problem and improves the accuracy and power of statistical analyses.
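A toy illustration of the two quantities being contrasted, on simulated data; the Bayes factor here uses the common BIC approximation, which is only one of several definitions the thesis may consider.

```r
set.seed(1)
x <- rnorm(50); y <- 0.3 * x + rnorm(50)

m0 <- lm(y ~ 1)          # null model: no effect of x
m1 <- lm(y ~ x)          # alternative: linear effect of x

p_value <- summary(m1)$coefficients["x", "Pr(>|t|)"]
BF10    <- exp((BIC(m0) - BIC(m1)) / 2)   # approximate evidence for m1 over m0

c(p = p_value, BF10 = BF10)   # a small p may coexist with modest BF evidence
```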
-
Analyzing the traffic accident frequency in Iraq using log-linear models
2024
It is crucial to focus on urban and interurban traffic accidents, as fatalities and injuries from road accidents increase each year. This thesis analyzes the factors involved in traffic accidents by examining multi-dimensional contingency tables of the frequencies of deaths and injuries from 2020 to 2021. We extract association diagrams to illustrate the relationships between the variables and, using log-linear models for categorical data, estimate the model parameters and interpret the results with the help of these diagrams.
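A hedged sketch of a log-linear fit on a hypothetical three-way accident table (Year x Severity x Location, with made-up frequencies); the thesis's actual variables and data differ.

```r
counts <- data.frame(
  year     = rep(c("2020", "2021"), each = 4),
  severity = rep(c("death", "injury"), times = 4),
  location = rep(c("urban", "rural"), each = 2, times = 2),
  freq     = c(120, 340, 80, 210, 150, 390, 95, 260)   # made-up frequencies
)
## Poisson log-linear model with all two-way interactions
fit <- glm(freq ~ (year + severity + location)^2, family = poisson, data = counts)
summary(fit)   # significant interaction terms indicate associated factors
```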
-
Artificial Neural Network Analysis Using Statistical Approaches
2024
Neural networks, often viewed as black boxes due to their complex composition of functions and parameters, pose significant challenges for interpretability. This study addresses these challenges by exploring various methods for interpreting neural networks, covering both theoretical and practical aspects. First, we demonstrate that the neural network estimator $f_n$ can be interpreted as a nonparametric regression model constructed as a sieved M-estimator. This ensures the weak convergence of $f_n$ within the metric space $(\Theta, d)$, providing a solid theoretical foundation for understanding neural networks. Building on these theoretical insights, the study introduces statistical tests designed to assess the importance of input variables, offering a clearer understanding of their contributions to the model. Dimensionality reduction algorithms are also explored, highlighting their role in simplifying the model and enhancing both interpretability and accuracy. Furthermore, we show that statistical confidence intervals enhance model reliability by providing more robust estimates. Statistical tests are also employed to evaluate and interpret the performance of individual neurons, identifying their contribution to classification tasks and providing insight into the network's functioning. To validate these theoretical findings, simulations were conducted and applied to the IDC and Iris datasets. These experiments illustrate the practical utility of the proposed methods and affirm the effectiveness of the neural network estimator in real-world applications. This study contributes to the emerging field of Explainable Artificial Intelligence by presenting methodologies for interpreting deep artificial neural networks through statistical frameworks, thereby facilitating a better understanding of the relationship between inputs and outputs and of the performance of individual network components.
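A sketch of one generic input-importance diagnostic on the Iris data mentioned in the abstract, using permutation importance with the standard nnet package; this is a stand-in illustration, not necessarily the thesis's exact test statistic.

```r
library(nnet)
set.seed(2)
fit <- nnet(Species ~ ., data = iris, size = 5, trace = FALSE, maxit = 500)
base_acc <- mean(predict(fit, iris, type = "class") == iris$Species)

perm_importance <- sapply(names(iris)[1:4], function(v) {
  drop <- replicate(100, {
    shuffled      <- iris
    shuffled[[v]] <- sample(shuffled[[v]])    # break the variable's link to the response
    base_acc - mean(predict(fit, shuffled, type = "class") == iris$Species)
  })
  mean(drop)                                  # average accuracy loss when v is permuted
})
sort(perm_importance, decreasing = TRUE)      # larger loss = more important input
```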
-
Exploring the characteristics and trends of crude oil and its refined products in an Iraq refinery using statistical analysis
2024
Crude oil is one of Iraq's main sources of income; most is exported directly, and part is converted into other needed products in refineries. In this thesis, we examine data from one of Iraq's refineries. While describing the refinery's production using the techniques of descriptive statistics, we attempt to build appropriate models for predicting product output. Using model selection criteria, we arrive at a final model for each product and forecast the next six periods with the selected model.
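A minimal sketch of the model-selection-then-forecast step in base R, assuming a hypothetical monthly production series in a data frame `refinery` with a column `diesel` (names are placeholders, not the thesis's data).

```r
prod <- ts(refinery$diesel, frequency = 12)    # assumed column name

## Compare a few candidate ARIMA orders by AIC and forecast 6 periods ahead
cands <- list(c(1, 1, 0), c(0, 1, 1), c(1, 1, 1), c(2, 1, 1))
fits  <- lapply(cands, function(o) arima(prod, order = o))
best  <- fits[[which.min(sapply(fits, AIC))]]
predict(best, n.ahead = 6)                     # point forecasts and standard errors
```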
-
Data Mining and Statistical Analysis of Economic Indicators of Iran and Russia
2024
Comparing the economic indicators of Iran and Russia is important because of the pivotal role these two countries play in the industry, trade, and economy of the region and the world. In this thesis, we carry out a comparative analysis of their economic indicators. The purpose of this research is the data mining and statistical analysis of selected, important economic indicators of Iran and Russia, in order to reach a deeper and more realistic understanding of their economic situation. The selected indicators cover the years 1993 to 2021 and were extracted from the World Bank website. After cleaning and structuring the data and selecting important indicators, univariate and multivariate description is complemented by statistical and data mining techniques such as multiple regression, factor analysis, correlation analysis, and time series.
-
Statistical inference for lifetime models under joint censoring data
2024
In many lifetime studies, researchers need to collect and analyze partial information about experimental units to save time and resources. Such data, known as censored data, arise when certain observations are not fully recorded. One common scenario is comparing products from multiple production units simultaneously, under the same conditions, in a lifetime test; in such cases a joint censoring scheme is recommended. For instance, imagine a factory with two television production lines. Using a joint censoring scheme, the factory manager can compare the quality and useful lifespan of TVs produced by both lines and make informed process-improvement decisions based on the test results. This approach allows simultaneous statistical inference about the distribution parameters of both populations. Noteworthy lifetime distributions in this context include the Burr-XII and Poisson-exponential distributions. The Burr-XII distribution outperforms alternatives (such as the Weibull, Gamma, Rice, and extreme value distributions) thanks to its probability model and flexibility in survival and reliability analysis, while the Poisson-exponential distribution suits scenarios where the cause of failure is unknown or the hazard rate function is increasing. This thesis focuses on estimating the parameters of two Burr-XII distributions and two Poisson-exponential distributions under a joint type-II censoring scheme. First, maximum likelihood estimators are computed using the expectation-maximization algorithm. Bayesian estimators are then obtained by importance sampling under informative and non-informative prior distributions, with various loss functions such as squared error, linear-exponential, and generalized entropy. To enhance efficiency, the thesis presents linear shrinkage and shrinkage pretest estimators. Approximate confidence intervals are calculated using the missing information matrix, bootstrap intervals, and credible and highest posterior density intervals. The thesis also addresses point and interval prediction for censored units using classical and Bayesian methods under the joint type-II censoring scheme, for both the Burr-XII and Poisson-exponential distributions. The performance of the estimators is evaluated through Monte Carlo simulations, assessing bias, mean squared error, efficiency, and interval length. After validating the simulation results and estimator efficiency, the proposed methods are applied to a set of real data to estimate the unknown parameters under joint censoring.
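A deliberately simplified illustration of the likelihood machinery involved: Burr-XII maximum likelihood for a complete sample via optim (the thesis handles the much harder joint type-II censored case with EM).

```r
## Burr-XII density: f(x) = c*k*x^(c-1) * (1 + x^c)^(-(k+1)), x > 0, c, k > 0
nll_burr12 <- function(logpar, x) {
  c_ <- exp(logpar[1]); k <- exp(logpar[2])    # log-parameterization keeps c, k > 0
  -sum(log(c_) + log(k) + (c_ - 1) * log(x) - (k + 1) * log1p(x^c_))
}

set.seed(3)
u <- runif(200)
x <- ((1 - u)^(-1 / 2) - 1)^(1 / 1.5)          # inverse-CDF draw, true c = 1.5, k = 2
exp(optim(c(0, 0), nll_burr12, x = x)$par)     # MLEs of (c, k)
```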
-
Estimation of logistic regression parameters in the presence of multicollinearity with application to medical data
2023
Logistic regression, a widely used regression model for binary response variables, relies on the maximum likelihood method for parameter estimation. When multicollinearity exists among the independent variables, however, the estimators become ineffective due to variance inflation. Various methods, including ridge regression, have been proposed to address this issue. A crucial step in ridge regression is estimating the ridge parameter, for which several formulas have been suggested. This thesis introduces and compares a comprehensive set of ridge parameter formulas for logistic regression using efficiency criteria. To this end, Monte Carlo simulations are conducted by varying the correlation, the number of predictor variables, and the sample size. The performance of selected ridge estimators is compared, and the most suitable ones are identified and recommended. The introduced ridge estimators, with different parameters, are then applied to real examples from the medical sciences. The research is implemented in R, and the code is presented in a dedicated section for practicality and accessibility.
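A sketch of one ridge-adjusted logistic estimator of the Schaefer type, with one of the many proposed ridge-parameter formulas; the design matrix X (without intercept) and binary y are assumed, and the specific k formula is illustrative rather than the thesis's recommended one.

```r
fit  <- glm(y ~ X, family = binomial)
beta <- coef(fit)
Xd   <- model.matrix(fit)                       # includes the intercept column
p    <- fitted(fit)
W    <- diag(p * (1 - p))                       # IRLS weights at the ML fit
XWX  <- t(Xd) %*% W %*% Xd

k      <- length(beta) / sum(beta^2)            # a Hoerl-Kennard-type choice (one of many)
beta_R <- solve(XWX + k * diag(ncol(Xd)), XWX %*% beta)
cbind(ML = beta, ridge = drop(beta_R))          # ridge shrinks inflated ML coefficients
```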
-
Bayesian inference in regression models with the restricted parameter space
2023
Statisticians have long recognized the efficiency gains from incorporating prior information about parameters, or restrictions on parameter values, into inferential problems. Such restrictions often arise naturally in various sciences and can be expressed as linear equality and inequality constraints. In linear and generalized linear regression models, linear equality restrictions are determined from the researcher's information or from the selection of significant variables, while linear inequality restrictions can be imposed based on the phenomenology of the study or prior research information, and are essential for guaranteeing the validity of scientific theories. It is therefore crucial to use the valuable information these restrictions provide in parameter estimation, since ignoring them can reduce the quality of the estimator. Considerable work has been done in classical inference with linear equality and inequality restrictions; in Bayesian inference, particularly for generalized linear models, incorporating these restrictions remains attractive because of the advantages of the Bayesian approach. This thesis focuses on Bayesian inference in regression models, specifically on estimating the parameters of the linear regression model under linear inequality constraints. The available algorithms for this task are investigated, and efforts are made to improve their efficiency. The next step involves Bayesian inference in generalized linear models with linear inequality constraints: an algorithm is proposed to improve the performance of the restricted Bayesian estimator relative to both the unrestricted Bayesian estimator and the maximum likelihood estimator. We apply the proposed inference method to gamma and beta regression models. Since multicollinearity among independent variables is a common problem in regression models, we also address this issue and compare the efficiency of the proposed estimators with the ridge estimator. Finally, we investigate the application of the proposed restricted Bayesian estimators to a real dataset.
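A minimal sketch of the simplest algorithm in this family: Bayesian linear regression with the inequality restriction beta >= 0 imposed by rejection sampling from the unconstrained posterior (flat prior, conditional on the plug-in error variance). The thesis studies improved algorithms; this only shows the baseline idea.

```r
library(MASS)                                   # for mvrnorm
set.seed(4)
n <- 80; X <- cbind(1, rnorm(n)); y <- X %*% c(1, 0.4) + rnorm(n)

XtX  <- crossprod(X)
bhat <- solve(XtX, crossprod(X, y))
s2   <- sum((y - X %*% bhat)^2) / (n - ncol(X)) # plug-in error variance

draws <- mvrnorm(20000, mu = drop(bhat), Sigma = s2 * solve(XtX))
keep  <- draws[apply(draws >= 0, 1, all), ]     # retain draws satisfying beta >= 0
colMeans(keep)                                  # restricted Bayes estimate
```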
-
Improving the Efficiency of Machine Learning Algorithms by Penalized Regression Methods
2023
Nowadays, in addition to increasing the accuracy of existing algorithms, reducing computational time is a challenging issue that has attracted much attention in prediction problems. Since the existing base algorithms lack sufficient efficiency and accuracy in this setting, we combine machine learning algorithms with statistical methods. In this thesis, three combined approaches are proposed to improve the efficiency of machine learning algorithms. In the first approach, Random Forest is combined with penalized regression methods to reduce the number of trees: the penalized methods automatically prune trees and aggregate the remaining ones, reducing the computational load. The second approach improves accuracy by clustering the input data, identifying homogeneous subsets, assigning them to similar groups, and reducing the number of Random Forest trees within each cluster; within each cluster, a Random Forest serves as the predictor. By reducing the number of trees within each cluster and the total across clusters, the model error and computational load decrease and performance improves. A third approach uses state-of-the-art algorithms and structures for deeper, more accurate prediction, combining deep learning, penalized regression, and ensemble learning: deep regressions extract relationships between features and act as learners, penalized regression reduces the number of predictors, and ensemble methods aggregate the remaining learners. Finally, the proposed approaches are evaluated against other base models in a simulation study and on three real datasets. The results show that the proposed approaches perform better and more efficiently than existing methods.
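A hedged sketch of the first approach's core idea: treat each forest tree's predictions as a feature and let the lasso select a small, nonnegatively weighted subset of trees. The packages and pipeline here are a common way to realize this; the thesis's exact implementation may differ.

```r
library(randomForest); library(glmnet)
set.seed(5)
n <- 300; X <- matrix(rnorm(n * 10), n); y <- X[, 1] + 0.5 * X[, 2]^2 + rnorm(n)

rf    <- randomForest(X, y, ntree = 500)
P     <- predict(rf, X, predict.all = TRUE)$individual   # n x ntree per-tree predictions
lasso <- cv.glmnet(P, y, alpha = 1, lower.limits = 0)    # nonnegative tree weights

sum(as.vector(coef(lasso, s = "lambda.min"))[-1] != 0)   # number of trees retained
```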
-
Customer Shopping Cart Analysis Using Data mining Methods
2023
Market basket analysis studies the composition of the basket of products customers buy in a single purchase. The idea is that the portfolio of purchased products reflects dependencies between products, or between purchases across different product categories, and identifying these dependencies can underpin marketing and sales decisions. This thesis analyzes the shopping carts of customers of the Florence grocery store in Tabriz, examining the transactions recorded during one month using Microsoft Excel and Power BI to find and assess shopping patterns.
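For reference, the same dependency-mining step is often done in R with association rules; a sketch with the arules package, assuming a hypothetical CSV of basket-format transactions (one purchase per line, items comma-separated) — the thesis itself uses Excel and Power BI.

```r
library(arules)
trans <- read.transactions("transactions.csv", format = "basket", sep = ",")

rules <- apriori(trans,
                 parameter = list(supp = 0.01, conf = 0.5, minlen = 2))
inspect(head(sort(rules, by = "lift"), 10))   # strongest product dependencies
```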
-
Comparison of statistical methods and artificial intelligence in forecasting the stock index of Apple, Google and Amazon
2023
Stock market prediction models are an important research area because correct decisions about stock prices yield large profits and wrong decisions cause large losses. Stagnation and data volatility have made stock market prediction a major challenge for investors. Mathematical methods and learning tools are both used for such predictions. In this thesis, we compare classical statistical methods, machine learning, and artificial intelligence for predicting stock prices. As an application, we model the stock prices of well-known companies, namely Apple, Google, and Amazon, with the aforementioned methods and select the optimal model for predicting each company's stock price.
-
Influence Diagnostics and Analysis for High-Dimensional Regression
2023
High-dimensional data, in which the number of variables is considerably larger than the number of observations, are now commonplace in many scientific fields, in particular genomics and molecular biology. Analysis of high-dimensional data often assumes that the number of variables truly related to the response of interest is small; the search for a small number of important variables has emphasized the importance of model selection methods in high-dimensional settings, and penalized likelihood methods have become widely used. However, when the number of observations is small relative to the number of covariates, each observation can have tremendous influence on model selection and inference, so identifying influential observations is important in penalized methods. In this thesis, influence measures for lasso regression are introduced for gauging the influence of an observation on the model selection component of a fitted regression. These measures are also studied under the elastic net, which combines the lasso's feature elimination with the ridge model's coefficient shrinkage to improve predictions, for identifying influential observations in high-dimensional data. Through simulations and real datasets, it is illustrated that the introduced influence measures effectively identify influential observations and can help reveal otherwise hidden relationships in the data.
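A sketch of a generic case-deletion diagnostic for the model selection component of the lasso — refit without each observation and record how the selected variable set changes; this is an illustrative measure, not necessarily the thesis's exact statistic.

```r
library(glmnet)
set.seed(6)
n <- 60; p <- 200
X <- matrix(rnorm(n * p), n); y <- X[, 1:3] %*% c(2, -2, 1.5) + rnorm(n)

full <- glmnet(X, y, lambda = 0.3)
S    <- which(as.numeric(coef(full))[-1] != 0)    # selected set on the full data

infl <- sapply(seq_len(n), function(i) {
  f_i <- glmnet(X[-i, ], y[-i], lambda = 0.3)
  S_i <- which(as.numeric(coef(f_i))[-1] != 0)
  length(union(S, S_i)) - length(intersect(S, S_i))  # symmetric difference of sets
})
which(infl > quantile(infl, 0.95))                # flagged influential cases
```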
-
Statistical Inference and Simulation for Spatial Regression
2023
Exploratory spatial data analysis is often an initial step toward formal modeling approaches that seek to establish relationships between a dependent variable and independent variables in the presence of spatial dependence. The focus of this thesis is on spatial regression models in a simple cross-sectional setting, including the spatial lag, spatial error, and spatial Durbin models. We examine these models and how their parameters are estimated, and illustrate fitting the various types of spatial regression with R. Finally, we demonstrate the investigated models on real data and select the appropriate model.
-
Classical and Bayesian parameters estimation of lifetime distributions based on censored data
2022
In this thesis, classical and Bayesian estimators are discussed for the two-parameter exponential-logarithmic distribution based on type-I hybrid and progressive type-II censored samples. The maximum likelihood estimators (MLEs) lack closed-form expressions in the classical approach, so we suggest computing them with the EM and SEM techniques. Asymptotic confidence intervals are built using the observed Fisher information matrix and the missing information principle. We develop Bayes estimators under various symmetric and asymmetric loss functions, employing the Tierney-Kadane and importance sampling approaches. For illustration, a Monte Carlo simulation and a few real data examples are provided.
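A deliberately simplified illustration of the EM idea for censored lifetimes, using type-I censored exponential data where the E-step has a closed form by memorylessness (censored lifetime imputed as censoring time plus the current mean). The thesis applies the same machinery to the harder exponential-logarithmic model.

```r
set.seed(7)
t_true <- rexp(100, rate = 1 / 5); cens <- 6
x <- pmin(t_true, cens); d <- as.numeric(t_true <= cens)   # d = 1 if fully observed

theta <- mean(x)                           # initial value for the mean lifetime
for (iter in 1:50) {
  xhat  <- ifelse(d == 1, x, cens + theta) # E-step: conditional expectation of censored times
  theta <- mean(xhat)                      # M-step: complete-data MLE
}
theta                                      # converges to the censored-data MLE sum(x)/sum(d)
```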
-
Fitting the truncated regression model to count data
2022
Regression is used to predict a count dependent variable from several independent variables. Because the dependent variable is a count, simple linear regression is rarely appropriate; count regressions are used instead, most commonly Poisson regression and negative binomial regression, which belong to the generalized linear models. In this thesis, after reviewing these two models, we turn to their truncated versions and conduct a simulation study to compare the performance of the proposed truncated regression models against the standard models. Finally, a practical example with real data illustrates the application of truncated models.
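A minimal sketch of the zero-truncated Poisson regression case by direct maximum likelihood on simulated data (packages such as VGAM offer the same model), contrasted with the naive untruncated fit.

```r
set.seed(8)
n <- 500; x <- rnorm(n); lam <- exp(0.5 + 0.8 * x)
y <- rpois(n, lam); keep <- y > 0; y <- y[keep]; x <- x[keep]   # observe only y > 0

nll <- function(b) {
  lam <- exp(b[1] + b[2] * x)
  -sum(dpois(y, lam, log = TRUE) - log(1 - exp(-lam)))  # zero-truncated log-likelihood
}
optim(c(0, 0), nll)$par              # close to the true (0.5, 0.8)
coef(glm(y ~ x, family = poisson))   # biased when truncation is ignored
```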
-
Estimation of gamma regression parameters in the presence of multicollinearity
2022
The gamma regression model has found many uses, particularly in sciences such as engineering, medicine, insurance, and the humanities. This model is used when the response variable is restricted to positive real numbers. The maximum likelihood approach is normally used when the covariates are unrelated to each other. However, just as in linear regression models, we may encounter situations with correlation or a linear relationship between the covariates; in such a scenario, inference based on maximum likelihood is unreliable because of the inflated estimates it produces. In this thesis, we investigate how to estimate the parameters of a gamma regression model under multicollinearity between the covariates. We begin by presenting the ridge estimator for the gamma regression model, then adapt the various techniques suggested for estimating the ridge parameter in other regression models to the gamma setting. We use Monte Carlo simulation to determine which ridge estimator provides the most accurate results, and finally apply the selected estimator to a practical example.
-
Statistical inference for finite mixture model of linear regression
2022
Linear and nonlinear regression are among the most useful techniques in multivariate analysis. Regression involves a dependent variable, which classical methods assume to be normal, an assumption that is often problematic in practice. For this reason, in recent decades regression has been extended to dependent variables with Poisson, gamma, exponential, and other distributions, and the use of finite mixture distributions has further enriched regression modeling. In this dissertation, in addition to reviewing linear regression with finite mixture distributions, we adopt a Bayesian approach to this problem.
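A base-R sketch of the EM algorithm for a two-component mixture of linear regressions (the classical counterpart of the model discussed; the flexmix package automates this). Data are simulated; the Bayesian treatment in the thesis replaces the M-step with posterior sampling.

```r
set.seed(9)
n <- 400; x <- runif(n, -2, 2)
z <- rbinom(n, 1, 0.4)
y <- ifelse(z == 1, 1 + 2 * x, -1 - x) + rnorm(n, sd = 0.5)

pi1 <- 0.5; b1 <- c(0, 1); b2 <- c(0, -1); s <- 1
X <- cbind(1, x)
for (iter in 1:100) {
  d1 <- pi1 * dnorm(y, drop(X %*% b1), s)
  d2 <- (1 - pi1) * dnorm(y, drop(X %*% b2), s)
  w  <- d1 / (d1 + d2)                        # E-step: component responsibilities
  b1 <- coef(lm(y ~ x, weights = w))          # M-step: weighted least squares per component
  b2 <- coef(lm(y ~ x, weights = 1 - w))
  s  <- sqrt(sum(w * (y - drop(X %*% b1))^2 +
                 (1 - w) * (y - drop(X %*% b2))^2) / n)
  pi1 <- mean(w)
}
list(pi1 = pi1, beta1 = b1, beta2 = b2, sigma = s)
```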
-
Statistical Inference in High-Dimensional Generalized Linear Models
2021
The increasing advancement of various sciences, including medicine, health, and management, has led to the production of large volumes of data. In such settings, variable selection is a powerful exploratory tool that has attracted much attention over the years. Variable selection by penalized regression methods retains only strongly predictive variables; failing to identify variables with weak predictive effects therefore reduces the efficiency of inferences obtained from these methods. In this thesis, Stein-type shrinkage and positive-part Stein estimators are studied with the aim of improving forecasting performance in high-dimensional generalized linear models. The introduced estimators are linear combinations of weighted ridge estimators and lasso-type estimators. Under some regularity conditions, the asymptotic behavior of the proposed estimators is examined. Their performance is also evaluated in terms of mean squared error through a simulation study and a real example.
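For orientation, the generic form of a Stein-type shrinkage estimator that linearly combines two base estimators is (notation ours; the thesis's weights and test statistic may differ):

```latex
% Stein-type and positive-part Stein shrinkage between a lasso-type and a
% weighted ridge estimator, with T_n a distance (test) statistic:
\hat{\beta}^{S} = \hat{\beta}^{\mathrm{lasso}}
  + \Bigl(1 - \frac{c}{T_n}\Bigr)\bigl(\hat{\beta}^{\mathrm{ridge}} - \hat{\beta}^{\mathrm{lasso}}\bigr),
\qquad
\hat{\beta}^{S+} = \hat{\beta}^{\mathrm{lasso}}
  + \Bigl(1 - \frac{c}{T_n}\Bigr)^{\!+}\bigl(\hat{\beta}^{\mathrm{ridge}} - \hat{\beta}^{\mathrm{lasso}}\bigr),
% where (u)^+ = \max(u, 0) gives the positive-part version.
```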
-
Statistical Inference of Zero-Inflated Count Models
2021
Poisson regression and negative binomial models are count models in which the response variable takes nonnegative integer values. Sometimes a large number of observations are zero, exceeding the zeros produced by an ordinary count model. To analyze such data, zero-inflated count models are used, in which the data are assumed to be generated from a mixture of an ordinary count distribution and a degenerate distribution at zero. If the counts have equal mean and variance, the zero-inflated Poisson model is used; when they are overdispersed, the zero-inflated negative binomial model is appropriate for modeling the relationship between the response and predictor variables and for estimating model parameters. When there is multicollinearity between predictor variables in a regression model, ridge and Liu-type estimation are used instead of maximum likelihood to estimate the model parameters more efficiently. In addition, the performance of parameter estimation can be improved by using prior information about parameters that do not significantly affect the response variable and combining it with the information from the random sample. This prior information, which appears as restrictions on the model parameters, is tested before being used in the model; if valid, it improves estimation performance. The resulting estimators are called shrinkage estimators and include linear shrinkage, pretest, shrinkage pretest, Stein-type, and positive Stein-type estimators. In this thesis, shrinkage estimators are proposed for the zero-inflated negative binomial model, the negative binomial mixed model, and the zero-inflated negative binomial mixed model, along with Liu-type shrinkage estimators for the zero-inflated negative binomial model. To compare the estimators, their relative efficiency in terms of mean squared error is examined using Monte Carlo simulation, and the asymptotic bias and risk of the estimators are derived and proved theoretically. The efficiency of all proposed methods is computed on a real dataset.
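A sketch of the baseline model the proposed shrinkage estimators improve on, using the pscl package; the data frame `df` and the covariate split between the count and zero-inflation parts are hypothetical.

```r
library(pscl)
## Count part before "|", zero-inflation (logit) part after
fit <- zeroinfl(y ~ x1 + x2 | x1, data = df, dist = "negbin")
summary(fit)   # ML estimates that shrinkage methods take as their starting point
```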
-
Bayesian beta regression models
2021
In many applications we are interested in the impact of different factors on a variable expressed as a ratio or percentage, such as a disease rate, the percentage of a disease in a particular area, the unemployment ratio, or the inflation rate. An appropriate regression model is chosen according to the type of response variable: when the response is a percentage or ratio, or is defined on an interval such as (a, b), the beta regression model is proposed. The beta regression model belongs to the family of generalized linear models. In this thesis, the beta regression model is first introduced under different specifications of the precision parameter, and the model parameters are estimated in the classical framework. We then consider Bayesian estimation of the parameters, demonstrate its performance in Monte Carlo simulations, and finally apply the Bayesian beta regression model to a real dataset.
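A minimal random-walk Metropolis sketch for Bayesian beta regression with a logit mean link and constant precision, under flat priors and simulated data; the thesis's priors and precision model may be richer.

```r
set.seed(10)
n <- 200; x <- rnorm(n)
mu <- plogis(-0.5 + x); phi <- 20
y <- rbeta(n, mu * phi, (1 - mu) * phi)

logpost <- function(p) {
  m <- plogis(p[1] + p[2] * x); ph <- exp(p[3])
  sum(dbeta(y, m * ph, (1 - m) * ph, log = TRUE))  # flat priors on (b0, b1, log phi)
}

cur <- c(0, 0, log(10)); out <- matrix(NA, 5000, 3); lp <- logpost(cur)
for (i in 1:5000) {
  prop <- cur + rnorm(3, sd = 0.08)                # random-walk proposal
  lp_p <- logpost(prop)
  if (log(runif(1)) < lp_p - lp) { cur <- prop; lp <- lp_p }
  out[i, ] <- cur
}
colMeans(out[-(1:1000), ])                         # posterior means after burn-in
```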
-
Ground water quality monitoring network design using entropy method
2021
Designing groundwater quality monitoring networks is important because of financial constraints and changes in monitoring, and because a sufficient number of monitoring stations and their locations must be determined. One of the most promising approaches to network design uses entropy methods, building on earlier studies that apply the principle of entropy maximization and information theory. The entropy quantities involved include marginal entropy, joint entropy, conditional entropy, transinformation, and total correlation, which are commonly used in entropy-based monitoring network design. In the study area, artificial neural network (ANN) and K-nearest neighbor (KNN) methods are used to estimate information at blind spots where data are unavailable, and a differential evolution (DE) optimization algorithm then maximizes the net information obtained by the stations. The number and locations of optimal monitoring stations are selected from the active wells in the study area. The spatial distribution of the selected wells covers the entire area while providing the maximum useful information about groundwater quality, and redundant stations are avoided in view of operating and time costs. The proposed entropy method is compared with error minimization and K-means clustering methods for optimizing the groundwater quality monitoring network.
-
Beta Regression Model and Its Applications
2021
The linear regression model is a good method for predicting one variable (the response) from other variables (auxiliary or independent variables). The model's fundamental assumption is that the response variable ranges over the real numbers. In practice, however, we frequently encounter responses restricted to specific ranges, such as proportions and percentages confined to (0, 1). Ratio and percentage data are also frequently skewed, and inference based on an assumption of symmetry can be misleading. The beta regression model, defined on the (0, 1) interval, is suitable for this type of data; the response is assumed to follow the beta distribution. In this thesis, we present the beta regression model and methods for estimating its parameters when the independent variables are orthogonal and when they are not. In the first case we use maximum likelihood estimators; in the second we use common remedies for multicollinearity in regression, namely the ridge and Liu estimators. Using Monte Carlo simulation, the mean squared error and efficiency of the estimators are calculated. Finally, the application of these estimators to a real-life dataset is examined.
-
Statistical Model Selection for Gross Domestic Product
2021
GDP is one of the main and most practical economic indicators, so its prediction has always interested practitioners and researchers in economics and related fields, including statistics. The main purpose of this research is to find the most appropriate statistical model and optimal pattern for predicting GDP. Time series models are among the best methods for forecasting GDP; the emphasis here is on seasonal time series models (the classical decomposition and Holt-Winters methods) and on techniques related to ARIMA models and the SARMA and SARIMA seasonal series. Goodness-of-fit indicators for time series models are then reviewed, and the differences between linear and nonlinear models, together with the efficiency and error rate of each, are analyzed. Finally, by comparing the results, the best model is selected.
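A base-R sketch of the two seasonal approaches named above on a hypothetical quarterly GDP vector `gdp` (placeholder data; the thesis's series and chosen orders differ).

```r
g  <- ts(gdp, frequency = 4)
hw <- HoltWinters(g)                           # level, trend, and seasonal components
sar <- arima(g, order = c(1, 1, 0),
             seasonal = list(order = c(0, 1, 1), period = 4))   # a SARIMA candidate

predict(hw, n.ahead = 8)                       # Holt-Winters forecasts
predict(sar, n.ahead = 8)                      # SARIMA forecasts with SEs
AIC(sar)                                       # compare candidates by fit criteria
```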
-
Investigating the impact of renewable and non-renewable energy on economic growth in different countries
2021
This thesis investigates the impact of energy consumption on economic growth for 65 countries over the period 1999 to 2018 using panel regression analysis, distinguishing between renewable and non-renewable energy. For Iran, the results show that non-renewable energy consumption has a negative, significant impact on economic growth: as consumption of this type of energy increases, growth decreases. This can be attributed to its extraction costs relative to fossil fuels, which impose significant expenses, reduce the efficiency of consumption of this type of energy, and ultimately reduce growth. Comparing Asian and European countries and the full 65-country sample, the results show that for Asian countries both non-renewable and renewable energy have positive, significant impacts on growth, with renewable energy increasing growth more than non-renewable energy. For European countries, by contrast, non-renewable energy has a significant negative impact on growth, while renewable energy has a positive, significant impact. For the 65 selected countries as a whole, the results show that consumption of both renewable and non-renewable energy has a positive, significant effect on growth, with the impact of renewable energy somewhat greater than that of non-renewable energy.
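A sketch of the panel estimation step with the plm package, assuming a hypothetical panel data frame `pd` indexed by country and year with placeholder column names (not the thesis's actual variables).

```r
library(plm)
fe <- plm(growth ~ renew_energy + nonrenew_energy + capital + labor,
          data = pd, index = c("country", "year"), model = "within")
summary(fe)   # fixed-effects (within) estimates of the energy-growth relationship
```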
-
Statistical analysis of mother and child information in Miandoab Health Center
2020
The aim of this study was to statistically analyze mother and child information at the Miandoab Health Center. The study was applied in purpose and descriptive-correlational in data collection method. The statistical population included 400 pregnant women whose care forms were completed during pregnancy; after pregnancy, their condition and that of their children were examined and the relevant information recorded. The data collection tool was a researcher-made questionnaire, and SPSS 22 was used for the analysis, with the chi-square test applied to the data. The results showed that some chronic maternal diseases affect the anthropometric indices (height, weight, and head circumference) of children.
-
Stock Price Prediction Using Deep Learning
2020
Stock price forecasting is challenging due to the influence of numerous internal and external factors (political, economic, and social). This thesis explores the use of deep learning models to predict stock prices in four sectors: Asia Insurance, New Economic Bank, Iran Tractor Sazi, and Tabriz Oil Refining Company, based on source [41].
-
The T-X Distribution Family and Its Bayesian Inference
2020
Statistical distributions have many applications in a variety of contexts and are often used to describe real-world phenomena, so their properties have been extensively studied, and new distributions continue to be developed alongside well-known ones such as the gamma, exponential, and normal. Interest in generalizing flexible statistical distributions remains strong. This study introduces three distributions, the Lomax-exponential, Weibull-Rayleigh, and Gamma-Rayleigh distributions, studies many of their statistical properties, and shows their good performance compared with previous results. We also extend the T-X family of distributions to bivariate models, introducing new bivariate distributions obtained by this method, and illustrate its usefulness with practical examples.
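The univariate construction underlying all three named distributions is the standard T-X scheme (Alzaatreh-style; notation ours):

```latex
% For a "transformer" T with pdf r and support [a, b], and a baseline cdf F,
G(x) = \int_{a}^{W\!\bigl(F(x)\bigr)} r(t)\, dt,
\qquad
g(x) = r\!\bigl(W(F(x))\bigr)\, \frac{d}{dx} W\!\bigl(F(x)\bigr),
% a common choice being W(F(x)) = -\log\bigl(1 - F(x)\bigr), which maps F
% into (0, \infty) and yields, e.g., Weibull-X and Gamma-X families.
```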
-
On the analysis of longitudinal data with missing responses
2019
A longitudinal study is an observational research method in which data are gathered from the same subjects repeatedly over a period of time. We encounter missing data in many longitudinal studies, since all subjects may not be available at every repeated measurement. When missing data are non-ignorable, analysis with common methods yields invalid estimators, so it is important to incorporate the missing data mechanism into the observed data likelihood function. The classical maximum likelihood method for analyzing longitudinal missing data has been extensively studied in the literature. However, ordinary ML estimators are well known to be sensitive to extreme observations or outliers, and missing values often occur together with outliers in real data. In this thesis we therefore propose and explore a robust method developed in the framework of the ML method. Finally, we study the properties of the robust estimators in a small simulation and illustrate the robust method using longitudinal data on HIV-infected patients.
-
Prediction of order statistics
2017
In this thesis, the estimation and prediction of observations under type-II censoring, hybrid censoring, and type-I progressive hybrid censoring are investigated for the Poisson, lognormal, and Burr type-III distributions. First, maximum likelihood estimators of the unknown parameters are obtained using the EM and SEM methods, and interval estimates are derived using the Fisher information matrix. Then, Bayesian estimates under the squared error, entropy, and LINEX loss functions are presented using informative and non-informative prior densities. To obtain the Bayesian estimates of the parameters, the Tierney-Kadane approximation, the Lindley approximation, and the importance sampling method are used. We also address the prediction of censored observations and prediction intervals by different methods. Finally, real data and simulation are used to evaluate the theoretical findings.
-
A binomial noised model for cluster validation
2017
Today, huge volumes of data are produced in the world, and data mining is one of the most strategic and important sciences in the data world. Clustering, one of the most commonly used data mining techniques, is an unsupervised, exploratory function that seeks natural groupings within data. Clustering algorithms usually require parameters such as the number of clusters, and the optimal number of clusters is used to evaluate the algorithms. There are three approaches to estimating the number of clusters: hypothesis testing, external criteria, and internal criteria, each with several indices. In this work, a binomial noised model is used to estimate the number of clusters. The thesis is organized into four chapters: the first describes the concepts and definitions, the second explains clustering and a variety of algorithms in R, and the last analyzes the binomial noised model with a numerical example.
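For context, the classical within-cluster sum-of-squares profile is the simplest internal criterion for choosing the number of clusters; a base-R sketch on simulated data (the thesis's binomial noised model is a different, more refined criterion).

```r
set.seed(11)
X <- rbind(matrix(rnorm(100, 0), ncol = 2),    # two well-separated groups
           matrix(rnorm(100, 4), ncol = 2))
wss <- sapply(1:8, function(k) kmeans(X, k, nstart = 20)$tot.withinss)
plot(1:8, wss, type = "b", xlab = "k", ylab = "within-cluster SS")  # elbow near k = 2
```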
-
Data mining for credit card fraud
2016
Billions of dollars are lost annually to credit card fraud. The 10th annual online fraud report by CyberSource shows that although losses have held steady at 1.4% of online payment revenues for the last three years (2006 to 2008), the actual amount has risen with the growth in online sales; the estimated loss due to online fraud is $4 billion for 2008, an increase of 11% on the 2007 loss. With the growth of credit card transactions as a share of the payment system, credit card fraud has also increased, and 70% of U.S. consumers report significant concern about identity fraud. Credit card fraud also has broader ramifications, as it helps fund organized crime, international narcotics trafficking, and even terrorist financing.
-
Stock index prediction using Markov chains
2016
A large part of the preoccupation of investors, brokers, and, in general, anyone involved with stock indices is the lack of a correct understanding of the rising or falling trend of the index. Methods have been proposed to achieve this understanding, the most important of which use stock index information from the past to the present to correctly anticipate the index's future trend. The purpose of this thesis is to analyze the behavior of stock indices, including the OTC, stock exchange, and industry indices, using Markov chains and the EM algorithm.
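A minimal sketch of the Markov chain step: estimating a two-state (up/down) transition matrix from a daily index series by transition counts (the series `index_close` is a hypothetical placeholder).

```r
state <- ifelse(diff(index_close) >= 0, "up", "down")
trans <- table(head(state, -1), tail(state, -1))   # counts of (today, tomorrow) pairs
P     <- prop.table(trans, margin = 1)             # row-stochastic transition matrix
P %*% P                                            # two-step-ahead transition probabilities
```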
-
Estimation based on the stochastic EM algorithm
2016
Progressive type-II censoring, introduced by Kemps [3] in 1999, is widely applied in durability and reliability testing. Given the importance of this censoring scheme and of the two-parameter Burr type-III distribution (used in quality control, reliability studies, and failure/survival data modeling), and the lack of explicit solutions for the maximum likelihood estimates (MLEs) of the Burr type-III parameters under progressive type-II censoring, this thesis proposes a simple method using the expectation-maximization (EM) and stochastic EM algorithms to obtain these MLEs. Interval estimates are calculated using the Fisher information matrix. Bayesian estimates are then derived via Lindley's approximation and importance sampling under the squared error, entropy, and LINEX loss functions. The prediction problem is also addressed, yielding prediction estimates and intervals for censored observations by various methods. Finally, a real dataset is analyzed, and simulation studies in R compare the proposed estimators and assess their efficiency.
-
Data Mining of Technical and Vocational University Information
2016
Recent advances in data collection and storage have resulted in a surge of information across many scientific disciplines. Researchers in fields like engineering, economics, astronomy, and biology are increasingly managing vast datasets. This abundance of data is also prevalent in educational and research institutions, including the Technical and Vocational University, a nationwide network of over 170 educational centers. This thesis aims to analyze the data from these centers and faculties, specifically focusing on student numbers, employee counts, and operating costs. Furthermore, it explores the application of clustering and classification techniques using various algorithms to this data.
-
Penalized Maximum Likelihood Estimation
2015
In this study, we examine parameter estimation for the two-parameter exponential distribution. The maximum likelihood (ML), penalized maximum likelihood (PML), and Bayes estimators are obtained assuming both the location and scale parameters to be unknown. The results show that the PMLE coincides with the uniformly minimum variance unbiased estimator (UMVUE). The mean squared errors of the proposed estimators are computed both analytically and in a Monte Carlo simulation study for different types of censoring schemes. The simulation results reveal that the Bayes estimators outperform the PMLEs, and that the PMLEs are in turn superior to the MLEs.
-
Data mining With Clustering
2015
Nowadays, modern technologies, scientific experiments, and applied research in various fields, and even simple activities such as phone calls and daily credit card purchases, generate massive amounts of data, whose volume can reach petabytes. From these data a large variety of patterns can be extracted, leading to important results in many areas. The discovery of useful knowledge in databases (KDD) from massive raw data is called data mining. This study comprises three chapters. The first reviews the concept of data mining and its various types; the second discusses clustering algorithms and their different kinds; the third presents the data mining of a gas company, where the data are first prepared and a suitable clustering algorithm is then applied to extract useful information for future planning.
-
Statistical Inference Based on Joint Records
2015
In this thesis, statistical inference based on joint records is studied in both parametric and nonparametric settings. We introduce the joint records scheme for two continuous sequences and then consider statistical inference on the parameters of two exponential populations based on lower joint records and their corresponding inter-record times, deriving their exact statistical properties. We specifically derive the conditional maximum likelihood and Bayes estimators and their exact distributions, use them to construct exact confidence intervals, and compare their coverage probabilities with those of Bayes, bootstrap, and approximate confidence intervals through a simulation study. The nonparametric setting includes extended distribution-free confidence intervals for quantiles, outer and inner confidence intervals for quantile intervals, and upper and lower confidence limits for quantile differences from two independent and identically distributed sequences.
-
Nonparametric Estimation of the Average Availability function
2014
Using a nonparametric approach, we present approximate confidence interval formulas for the point availability and the limiting availability of repairable systems.
-
The Discrete Normal Distribution
2014
The normal distribution plays a key role in stochastic modeling in the continuous setting, but its distribution function has no analytical form, and the distribution of a complex multicomponent system built from normal variates occasionally poses derivational difficulties. It is therefore worth developing a discrete version of the normal distribution so that the same can be used for modeling discrete data. With this requirement in mind, we propose a discrete version of the continuous normal distribution. The increasing failure rate property is ensured in the discrete setup, and characterization results establish a direct link between the discrete normal distribution and its continuous counterpart. A corresponding discrete approximation to the normal deviate is suggested. An application of the discrete normal distribution to evaluating the reliability of complex systems is elaborated as an alternative to simulation methods.
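A sketch of one standard discretization of this kind, in which the mass at integer y is the normal probability of the interval (y, y + 1]; the thesis's exact construction may differ in detail.

```r
ddnorm <- function(y, mu = 0, sigma = 1)
  pnorm(y + 1, mu, sigma) - pnorm(y, mu, sigma)   # pmf by differencing the normal cdf

y <- -5:5
p <- ddnorm(y, mu = 0, sigma = 1.5)
sum(p)                    # close to 1 over a wide enough integer range
rbind(y, round(p, 4))     # bell-shaped probabilities on the integers
```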
-
A method for generating families of continuous distributions
2014
Statistical distributions and models are widely used in applied areas such as economics, engineering, the social and health sciences, and biology to describe real-world phenomena, and interest in developing more flexible statistical distributions remains strong in the statistics profession. Many generalized classes of distributions have been developed and applied to describe various phenomena; a common feature of these generalized distributions is that they have more parameters. Chapter 1 introduces a number of distributions used in the other chapters. Chapter 2 presents a method for generating families of continuous distributions, the T-X family. Chapter 3 introduces the Beta-Weibull-X family, which has the T-X-Y structure. Chapter 4 discusses a Weibull-X family, the Weibull-Rayleigh distribution.
-
Stock Price Forecasting Using Bayesian Networks
2013
Stock price forecasting is crucial for business and the economy. Traditional time series methods like autoregressive models rely on stationarity and linearity, assumptions often invalid for stock prices, which exhibit nonlinear and chaotic behavior. Consequently, these methods can perform poorly. Nonlinear Bayesian networks offer a potential alternative for time series prediction. This thesis introduces Bayesian networks for time series forecasting, comparing them to classical algorithms. We will specifically evaluate the Bayesian network algorithm presented by Yizhou and Kita (2012) against classical approaches using two numerical examples.
-
Bayesian inference for categorical data analysis
2013
This thesis explores Bayesian methods for categorical data analysis. It covers Bayesian parameter estimation for the binomial and multinomial distributions, examines cross tables, and details the parameterization of log-linear models. The study then focuses on Bayesian estimation of the parameters of two-way log-linear models, discussing various model specifications, and finally extends the estimation to three-way log-linear models and associated cases.
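A sketch of the conjugate building block used throughout such analyses: a Dirichlet prior on multinomial cell probabilities gives a Dirichlet posterior, sampled here via independent gammas (toy counts, uniform prior).

```r
counts <- c(32, 14, 54); a <- c(1, 1, 1)        # observed cells; Dirichlet(1,1,1) prior
rdirichlet <- function(n, alpha) {
  g <- matrix(rgamma(n * length(alpha), alpha), n, byrow = TRUE)
  g / rowSums(g)                                # normalized gammas are Dirichlet draws
}
post <- rdirichlet(10000, a + counts)           # posterior is Dirichlet(a + counts)
colMeans(post)                                  # posterior means of cell probabilities
apply(post, 2, quantile, c(0.025, 0.975))       # 95% credible intervals
```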
-
Probability function with continuous distributions
2013
Statistical distributions are commonly applied to describe real-world phenomena. Owing to their usefulness, their theory is widely studied and new distributions are continually developed. Many research papers have been published on the study and application of discrete distributions, and various techniques have been proposed to generate families of discrete distributions, some of which are described in chapter two; that chapter also presents the discrete Laplace and discrete Burr distributions and investigates their properties. Chapter three presents methods for generating continuous and discrete families of distributions, notably the well-known T-X family, which is examined in its different forms. Chapter four is devoted to Bayesian estimation of the parameters of the distributions produced in chapter two.
-
Exponential Weibull Distribution
2013
This thesis focuses on estimating the reliability R = P(Y < X) of a component in the stress-strength model, where strength (X) and stress (Y) are random variables. X and Y are assumed to follow the exponential Weibull distribution, a flexible family with two shape parameters and one scale parameter that encompasses the Weibull and exponential distributions. We introduce the exponential Weibull distribution, outline its key characteristics, and then, for the first time, address the problem of estimating R within the context of this distribution and the stress-strength model.
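A quick Monte Carlo check of the quantity being estimated, generating from the exponentiated-Weibull cdf F(x) = (1 - exp(-(x/sigma)^theta))^alpha by inverse transform; parameter values are illustrative, not the thesis's estimates.

```r
rexpweib <- function(n, alpha, theta, sigma)
  sigma * (-log(1 - runif(n)^(1 / alpha)))^(1 / theta)   # inverse-CDF sampler

set.seed(12)
X <- rexpweib(1e5, alpha = 2.0, theta = 1.5, sigma = 1)  # strength
Y <- rexpweib(1e5, alpha = 1.2, theta = 1.5, sigma = 1)  # stress
mean(Y < X)                                              # Monte Carlo estimate of R
```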
-
Data Mining with Logistic Regression
2013
Nowadays, data mining is a scientific tool and method for analyzing large databases. Using data mining techniques, large amounts of data can be transformed into useful, applicable information. Data mining offers a variety of techniques and algorithms for data analysis, the most famous being neural networks, decision trees, logistic regression, association rule mining, and cluster analysis. Logistic regression is among the most widely used: when the response variable is binary, it classifies or predicts the response, and compared with other methods it is powerful and precise and requires none of the usual preconditions. This thesis investigates the logistic regression model and its use; we then forecast two prominent financial variables with logistic regression on real data, and discuss building a suitable model and choosing the best among several candidates.
-
Data Mining Using Decision Trees
2012
Data mining, a discipline born from the proliferation of large databases, is a knowledge discovery process transforming raw data into actionable knowledge. Its methods, including decision trees, association rules, and cluster analysis, address real-world problems. Decision trees, a frequently used algorithm, excel at classifying and predicting variables. Their key advantage lies in generating accurate and interpretable models, providing users with rapid access to valuable insights. Algorithms like CHAID, C5, C4.5, ID3, and CART facilitate decision tree construction. This thesis explores data mining concepts and algorithms, focusing on ID3 and CHAID decision trees, ultimately applying the CHAID algorithm to real bank data.
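Base R has no CHAID implementation, so as a stand-in illustration of the tree-growing step, a sketch with rpart (CART) on a hypothetical bank data frame `bank` with a binary default indicator (column names are placeholders).

```r
library(rpart)
tree <- rpart(default ~ age + income + balance, data = bank, method = "class")
printcp(tree)                 # complexity table used for pruning decisions
plot(tree); text(tree)        # the interpretable decision rules
```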
-
Online inference for change point problems
2012
Change point models address temporal heterogeneity in data, enhancing the flexibility of statistical models. This thesis focuses on Bayesian inference for change point models using particle filters. We present an efficient online algorithm for a class of multiple change point problems with conditional independence that accurately simulates the number and locations of change points from the true joint posterior. While the algorithm's computational cost grows quadratically with the number of observations, we demonstrate that particle filter resampling techniques can reduce this cost to linear, and we further propose two optimal resampling algorithms for these problems.
-
Ranking the performance of selected commercial banks using TOPSIS and fuzzy analytic hierarchy process
2012
The main goal of today's banks is to achieve sustainable competitive advantage, and continuous performance evaluation is a prerequisite for doing so. This study evaluates the performance of five large commercial banks (Melli, Mellat, Tejarat, Refah, and Sepah) using two models: the fuzzy analytic hierarchy process (FAHP) and TOPSIS. The research was conducted in Tabriz in 1389 (Iranian calendar). The financial and non-financial performance of the central branches of these banks was studied and ranked with the two models. The results showed that the effects of the various evaluation criteria were not equal: criteria with a smaller impact on financing costs, or a larger impact on revenue growth, received more attention. It was also found that the two models produce identical results. Finally, according to the findings, the ranking of the banks in both financial and non-financial performance is: Mellat, Melli, Sepah, Tejarat, and Refah.
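A base-R sketch of the standard TOPSIS computation on a toy decision matrix (rows = banks, columns = benefit-type criteria, weights `w` assumed; the values and weights are placeholders, not the study's data).

```r
D <- rbind(Mellat = c(8, 7, 9), Melli = c(7, 8, 8), Sepah = c(6, 7, 7))
w <- c(0.5, 0.3, 0.2)                          # criterion weights (e.g., from FAHP)

R  <- sweep(D, 2, sqrt(colSums(D^2)), "/")     # vector normalization
V  <- sweep(R, 2, w, "*")                      # weighted normalized matrix
Ap <- apply(V, 2, max); Am <- apply(V, 2, min) # ideal and anti-ideal solutions
dp <- sqrt(rowSums(sweep(V, 2, Ap)^2))         # distance to the ideal
dm <- sqrt(rowSums(sweep(V, 2, Am)^2))         # distance to the anti-ideal
sort(dm / (dp + dm), decreasing = TRUE)        # closeness coefficients = ranking
```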
-
Investigating the relationship between risk and capital and bank profitability
2012
The purpose of this study is to investigate the effect of bank-specific factors, namely the capital adequacy ratio (CAR) and credit risk (CRisk), on the profitability of Iranian state-owned and private banks, as measured by the net interest margin (NIM). The relationships between the capital adequacy ratio, credit risk, and net interest margin are examined using panel data with a fixed effects model (FEM), based on a sample of 49 annual financial statements from three state-owned and four private banks over the seven-year period 1383-1389 (Iranian calendar). We conclude that the profitability of state-owned banks is negatively and significantly affected by the capital adequacy ratio and positively and significantly affected by credit risk (measured by the ratio of loan-loss reserves to total loans). The estimates also show that the profitability of private banks is positively and significantly affected by both the capital adequacy ratio and credit risk. These results imply that the capital adequacy ratio affects profitability differently in the two banking systems, while credit risk positively and significantly affects profitability in both. The findings indicate no systematic relationship between the capital adequacy ratio and profitability, but they support the hypothesis of a positive relationship between credit risk and profitability.
-
Minimum chi-square distance probability density function given prior density function and moments
2011The estimation of probability density functions is an important application area in statistics. One such problem is estimating a probability density function from information about its moments, for which methods such as maximum entropy, minimum discrimination information, and the minimum chi-square divergence principle can be used. In this thesis, we study the minimum chi-square divergence principle due to Kumar and Taneja. Chi-square divergence and the minimum chi-square divergence principle are introduced, and the principle is then applied to estimate a density function given an observed (prior) density function and information about moments. We determine the minimum chi-square divergence density function for particular prior densities, such as the Gamma and Weibull distributions, given information on the geometric mean, the arithmetic mean, and/or the variance of the random variable. We then generalize Kumar and Taneja's method to the estimation of joint probability density functions: chi-square divergence and the minimum chi-square divergence principle are introduced in the bivariate setting and an estimation method is presented. The principle is applied when the observed probability distribution is Dirichlet or a product of two Exponential distributions and information about a product moment is available. Concluding remarks are provided at the end.
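As a toy illustration of the principle (a sketch following the standard Lagrangian derivation, not the thesis's own derivations): minimizing the chi-square divergence from a prior density q subject to a prescribed mean yields a density of the form p(x) = q(x)(a + bx), with (a, b) fixed by the normalization and mean constraints. The sketch below solves for (a, b) for a standard normal prior q; the target mean m = 0.3 is an arbitrary assumption, and p can dip below zero when m sits far from the prior's mean, a known limitation of this toy version.

# Minimum chi-square divergence density sketch: prior q = N(0,1),
# constraint E_p[X] = m.  The minimizer has the form q(x) * (a + b*x);
# the two moment conditions give a linear system for (a, b).
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

m = 0.3                                  # prescribed mean (assumption)
mu_q, ex2_q = 0.0, 1.0                   # E_q[X], E_q[X^2] for N(0,1)
A = np.array([[1.0, mu_q], [mu_q, ex2_q]])
a, b = np.linalg.solve(A, [1.0, m])      # normalization and mean constraints

def p(x):                                # estimated density (may dip < 0)
    return norm.pdf(x) * (a + b * x)

print(quad(p, -np.inf, np.inf)[0])                   # ~1.0 (normalized)
print(quad(lambda x: x * p(x), -np.inf, np.inf)[0])  # ~m (mean constraint)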
-
The Pearson System and the Product Pearson-type VII Density Function
2011The Pearson family of distributions is a family of twelve continuous probability density functions with different skewness and kurtosis; it includes many famous and important distributions. The first chapter of this thesis introduces the Pearson family of density functions, or Pearson system. We explain how these density functions are constructed, how their parameters are estimated, and the criterion used to distinguish between the types. Members of this family can be used to fit a continuous probability density function to data with unknown distribution; we describe the fitting and estimation procedure and illustrate it with a numerical example. In the second chapter a new probability density function, the product Pearson-type VII density function, is introduced as the product of two Pearson type VII probability density functions. We mention some applications of this distribution and compute its structural properties, including the cdf, moments, maximum likelihood estimates, Fisher information matrix, mean deviation about the mean and the median, entropy, and the asymptotic distribution of the extreme order statistics. Finally, we introduce two new product density functions based on the Pearson family and compute their kth-order moments. The third chapter presents applications of some members of the Pearson system. First, the Pearson-type VII density function is used as the posterior distribution of a random variable in a Bayesian test for the mean of a Normal distribution. Next, a continuous approximation to the Binomial distribution with n=50 and p=0.3 is obtained with a Pearson type I density function and compared with other approximations. The last application uses the Pearson type III density function to estimate the maximum flow of a river for different return periods at a hydrometric station.
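To illustrate the type-selection step, the following Python sketch computes the moment ratios beta_1 = skewness^2 and beta_2 = kurtosis from a simulated sample and evaluates the classical Pearson criterion kappa, whose sign and magnitude indicate the main types; the Beta(2, 5) sample is an arbitrary choice, and the thesis's own separation criterion may be stated differently.

# Pearson-criterion sketch: select a main Pearson type from sample moments.
# kappa < 0 -> Type I region; 0 < kappa < 1 -> Type IV; kappa > 1 -> Type VI;
# boundary values (kappa = 0, 1, or infinite) mark the transition types.
import numpy as np
from scipy import stats

x = stats.beta.rvs(2, 5, size=5000, random_state=0)   # simulated sample
b1 = stats.skew(x) ** 2                               # beta_1 = skewness^2
b2 = stats.kurtosis(x, fisher=False)                  # beta_2 (not excess)
kappa = b1 * (b2 + 3) ** 2 / (4 * (4 * b2 - 3 * b1) * (2 * b2 - 3 * b1 - 6))

if kappa < 0:
    print("Type I region, kappa =", kappa)
elif 0 < kappa < 1:
    print("Type IV region, kappa =", kappa)
else:
    print("Type V/VI region, kappa =", kappa)

A Beta sample falls in the Type I region (kappa < 0), consistent with the Beta distribution being Pearson Type I.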
-
Ranking with Multiple Criteria Decision Making Methods
2011This thesis describes the basics and concepts of multiple criteria decision making methods, including several criteria-weighting methods, types of normalization, distance measures to the ideal solution, and several ranking methods, and illustrates their results with an example. In addition, the preferences of more than one decision maker are aggregated internally and externally within the TOPSIS procedure. In the final part, the thirty provinces are ranked on a price index under various normalization methods and distance measures to the ideal solution, e.g. Euclidean and Manhattan distance, and the results are compared.
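As a small illustration of how the normalization and distance choices interact, the sketch below ranks the alternatives of an invented decision matrix (not the provincial price-index data) under two normalizations and two distances to the ideal point.

# Ranking sketch: two normalizations x two distances to the ideal point.
import numpy as np

X = np.array([[7., 9., 8.],
              [8., 7., 9.],
              [6., 8., 7.]])            # rows: alternatives; benefit criteria

norms = {
    "vector":  X / np.linalg.norm(X, axis=0),
    "min-max": (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0)),
}
for name, N in norms.items():
    ideal = N.max(axis=0)                               # ideal point
    d_euclid = np.linalg.norm(N - ideal, axis=1)        # Euclidean distance
    d_manhattan = np.abs(N - ideal).sum(axis=1)         # Manhattan distance
    print(name, "Euclidean rank:", np.argsort(d_euclid) + 1,
          "Manhattan rank:", np.argsort(d_manhattan) + 1)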
-
Estimation of the parameter n in the Binomial distribution
2010The Binomial distribution is one of the most prominent distributions, and estimation of its parameter n is one of the most important problems associated with it. Classical treatments rest on the method of moments and the maximum likelihood method, while other estimation techniques have received less attention; however, as in sample size determination, a Bayesian approach can be adopted, which allows researchers to use prior information from earlier experiments to estimate parameters. In this thesis, the primary goal is to find an admissible estimator for the parameter n under Bayesian loss functions. We therefore estimate n assuming the parameter p is known, assigning various well-known discrete distributions as priors for n and using loss functions such as the squared error loss and the scaled squared error loss.
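As a toy version of this setting (a hypothetical Poisson prior for n, known p, squared error loss; the values of p, the prior mean, and the observed count are arbitrary), the sketch below computes the posterior over n given an observation x ~ Binomial(n, p) and reports the Bayes estimate, i.e., the posterior mean.

# Bayes estimate of the Binomial n: known p, Poisson(lam) prior on n,
# squared error loss => the Bayes estimator is the posterior mean.
import numpy as np
from scipy import stats

p, lam, x = 0.3, 20, 7                      # known p, prior mean, observed count
n_grid = np.arange(x, 200)                  # support: n >= x (likelihood 0 below)
post = stats.binom.pmf(x, n_grid, p) * stats.poisson.pmf(n_grid, lam)
post /= post.sum()                          # normalize the posterior over n
print("Bayes estimate of n:", (n_grid * post).sum())

Under a scaled squared error loss the Bayes estimator is no longer the plain posterior mean, which is where the thesis's comparison of loss functions comes in.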
-
Nonparametric confidence intervals for quantile intervals and quantile differences based on record statistics
2010It is shown how various exact nonparametric inferential procedures can be developed based on record statistics. These include confidence intervals for quantiles, tolerance intervals, outer and inner confidence intervals for quantile intervals, and upper and lower confidence limits for quantile differences. These intervals are all exact and distribution-free. Universal upper bounds for the expected length of the confidence intervals are also derived. We provide tables that help one choose the appropriate record values and present a numerical example. The results may be of interest in situations where only record values are stored. Finally, a data set of monthly temperature records (Farvardin 1330 to Esfand 1386) at Tabriz Center is used to illustrate the proposed inferential procedures.
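To see why such intervals are distribution-free: for upper records from a continuous F, -log(1 - F(R_n)) follows a Gamma(n, 1) law, so the coverage of (R_i, R_j) for the p-th quantile depends only on p. The sketch below computes this coverage in Python, under the convention that R_1 is the first observation (the thesis's indexing and choice of i, j may differ).

# Coverage of the record-based interval (R_i, R_j) for the p-th quantile:
# P(R_i <= xi_p <= R_j) = GammaCDF(i, u) - GammaCDF(j, u), u = -log(1 - p),
# which is free of the underlying distribution F.
import numpy as np
from scipy.special import gammainc          # regularized lower incomplete gamma

def record_coverage(i, j, p):
    u = -np.log1p(-p)                       # u = -log(1 - p)
    return gammainc(i, u) - gammainc(j, u)  # P(R_i <= xi_p) - P(R_j <= xi_p)

print(record_coverage(i=2, j=6, p=0.75))    # coverage for the third quartile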
-
Bayesian Sample Size Determination via Loss Function
2010Determining the sample size is one of the main stages of any statistical plan. In most such plans, classical methods like the Cochran formula, Chebyshev's theorem, and confidence intervals are used, and unfortunately less consideration is given to Bayesian methods, even though the Bayesian approach to sample size determination allows the researcher to exploit information from previous examinations when estimating unknown parameters. In this thesis, we determine the optimal sample size with a Bayesian approach using decision theory. We calculate the Bayesian sample size by three methods: (1) point estimation of the parameter of interest from the posterior distribution by means of a loss function; (2) regions of p-tolerance with lowest posterior loss; (3) hypothesis testing with a loss function, i.e., determining the sample size so as to control the consequences of false decisions about the hypothesis under examination. We use the squared error loss function in all three methods, and then compute the resulting sample sizes for the Normal and Binomial distributions.
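As a toy version of method (1) (a normal mean with known variance, a conjugate normal prior, squared error loss, and a hypothetical per-observation cost c; all numerical values are assumptions), the Bayes risk of the posterior-mean estimator equals the posterior variance sigma^2 * tau^2 / (n * tau^2 + sigma^2), so the optimal n minimizes c*n plus this risk. The sketch below finds it by direct search.

# Bayesian sample size sketch: X_i ~ N(theta, sigma^2), theta ~ N(mu, tau^2).
# Under squared error loss the Bayes risk is the posterior variance, so we
# pick n minimizing total_cost(n) = c * n + posterior_variance(n).
import numpy as np

sigma2, tau2, c = 4.0, 1.0, 0.001           # known variance, prior variance, cost
n = np.arange(1, 501)                       # candidate sample sizes
risk = sigma2 * tau2 / (n * tau2 + sigma2)  # posterior variance after n obs
total = c * n + risk                        # sampling cost plus Bayes risk
print("optimal n:", n[np.argmin(total)])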