Meteorology has become an important application area for systems employing machine learning methods. A tropical cyclone (TC) is a meteorological phenomenon; every year several storms in the Atlantic basin reach hurricane strength, bringing strong winds and heavy rain. Timely and accurate predictions of a TC's behavior can save lives and reduce damage to property and infrastructure. The intensity of a hurricane is measured by its maximum sustained wind speed. Current TC track prediction models perform much better than intensity prediction models, partly because of rapid intensification (RI) events. Kaplan and DeMaria (2013) define an RI event as an increase in the maximum sustained wind speed of 30 knots or more within 24 hours. Forecasting RI events is important enough that the National Hurricane Center (NHC) has placed it on its list of top forecast priorities. Published research applying the wide range of available statistical and machine learning methods to RI prediction is currently very limited. The Statistical Hurricane Intensity Prediction Scheme (SHIPS) is an official intensity prediction model used by the NHC and performs well in real time. The related Rapid Intensification Index (RII) is an operational model that predicts RI events using discriminant analysis. It produces a high false alarm ratio (FAR) and a low probability of detection (POD), so other machine learning techniques may be able to improve prediction quality.
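A minimal sketch of the RI definition and the two verification scores mentioned above, assuming a single storm's maximum sustained wind speeds (in knots) sampled every 6 hours, so that a 24-hour window spans four records. The function names `label_ri` and `pod_far` are hypothetical, and the RI label uses the change from each record to the record 24 hours later; this is an illustration, not the SHIPS or RII code.

```python
from typing import List, Optional, Sequence, Tuple

RI_THRESHOLD_KT = 30.0  # RI threshold: an increase of 30 kt or more in max sustained wind
STEPS_PER_24H = 4       # assumption: one record every 6 hours, so 24 h = 4 records


def label_ri(vmax_kt: Sequence[float],
             threshold: float = RI_THRESHOLD_KT,
             steps: int = STEPS_PER_24H) -> List[Optional[bool]]:
    """Label each record True/False for RI, or None if the 24-h-later value is missing."""
    labels: List[Optional[bool]] = []
    for t, v in enumerate(vmax_kt):
        if t + steps < len(vmax_kt):
            # 24-hour intensity change measured from this record
            labels.append(vmax_kt[t + steps] - v >= threshold)
        else:
            labels.append(None)  # not enough future data to decide
    return labels


def pod_far(predicted: Sequence[bool], observed: Sequence[bool]) -> Tuple[float, float]:
    """POD = hits / (hits + misses); FAR = false alarms / (hits + false alarms)."""
    hits = sum(1 for p, o in zip(predicted, observed) if p and o)
    misses = sum(1 for p, o in zip(predicted, observed) if not p and o)
    false_alarms = sum(1 for p, o in zip(predicted, observed) if p and not o)
    pod = hits / (hits + misses) if hits + misses else float("nan")
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else float("nan")
    return pod, far


vmax = [40, 45, 55, 65, 75, 80, 80]
print(label_ri(vmax))  # [True, True, False, None, None, None, None]
print(pod_far([True, False, True, False], [True, True, False, False]))  # (0.5, 0.5)
```

A high FAR means many predicted RI events did not occur, and a low POD means many observed RI events were missed; an improved model should move both scores in the favorable direction.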
The goal of this thesis is to improve intensity prediction by incorporating models for RI prediction. Several definitions of RI have been proposed, and they need to be compared. In this thesis we compare several popular machine learning methods and also propose a new definition of RI. For the comparison, we use a dataset obtained from SHIPS (version 2010) that includes Atlantic basin storms from 1982 to 2011. The life cycle of each storm is recorded at 6-hour intervals, and each record includes predictors describing large-scale weather and climate conditions. The evaluated RI prediction models include support vector machines, logistic regression, naïve Bayes, k-nearest neighbors, neural network classifiers, and classification and regression trees. A wide range of ensemble methods and a newly developed Extensible Markov Model clustering technique are also evaluated. We also consider dimensionality reduction and feature selection, and we address class imbalance using the Synthetic Minority Over-sampling Technique (SMOTE). We compare our RI prediction results with those of the operational Rapid Intensification Index (RII) model. The evaluation shows that some of the investigated models have greater predictive power than the RII model. Finally, we propose and evaluate a two-stage intensity prediction process. In the first stage we predict RI events. In the second stage, based on the predicted probability of an RI event, we predict the TC's change in intensity by combining RI and non-RI (NRI) intensity-change forecasts.
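SMOTE balances classes by synthesizing new minority-class examples along line segments between a minority example and one of its nearest minority-class neighbors. The sketch below is a simplified, standard-library-only variant in which base samples are drawn at random rather than cycling through every minority example; the function name `smote` and its parameters are illustrative assumptions, not the thesis code or a library API. In the RI setting, the minority class would be the records labeled as RI.

```python
import math
import random
from typing import List, Sequence


def smote(minority: Sequence[Sequence[float]], n_new: int,
          k: int = 5, seed: int = 0) -> List[List[float]]:
    """Create n_new synthetic minority samples by interpolating toward nearest neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbors than other samples exist
    synthetic: List[List[float]] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbors of x (excluding x itself), by Euclidean distance
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: math.dist(x, minority[j]))
        neighbor = minority[rng.choice(others[:k])]
        gap = rng.random()  # uniform in [0, 1): position along the segment x -> neighbor
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, neighbor)])
    return synthetic


minority_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority_points, n_new=5, k=2, seed=42)
print(len(new_points))  # 5
```

Because every synthetic point is a convex combination of two real minority points, oversampling stays inside the region the minority class already occupies; oversampling should be applied only to training folds so that synthetic points do not leak into evaluation data.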
The proposed two-stage methodology shows a significant improvement in intensity-change prediction, particularly for RI events but also for non-RI events. When combined with the two-stage model, the newly proposed extended definition of RI performs better than the standard definition.
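The abstract does not specify exactly how the second stage combines the RI and NRI forecasts. One plausible scheme, shown here purely as an assumption, weights each intensity-change forecast by the stage-one RI probability. The class `TwoStageIntensityModel` and its three callables are hypothetical placeholders for trained models.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Features = Sequence[float]


@dataclass
class TwoStageIntensityModel:
    """Hypothetical two-stage intensity-change predictor."""
    ri_probability: Callable[[Features], float]  # stage 1: estimated P(RI | x)
    ri_delta_v: Callable[[Features], float]      # intensity change (kt) from a model fit on RI cases
    nri_delta_v: Callable[[Features], float]     # intensity change (kt) from a model fit on NRI cases

    def predict_delta_v(self, x: Features) -> float:
        p = self.ri_probability(x)
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"RI probability must lie in [0, 1], got {p}")
        # Assumed combination rule: probability-weighted blend of the two forecasts.
        return p * self.ri_delta_v(x) + (1.0 - p) * self.nri_delta_v(x)


# Usage with toy stand-ins for trained models:
model = TwoStageIntensityModel(
    ri_probability=lambda x: 0.25,
    ri_delta_v=lambda x: 40.0,
    nri_delta_v=lambda x: 8.0,
)
print(model.predict_delta_v([0.0]))  # 16.0
```

A hard switch (use the RI forecast only when the probability exceeds a chosen threshold) is an equally simple alternative; which rule works better is an empirical question.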
This work also contributes a preprocessed, easy-to-use data set to the research community; we hope that this data set will spark further research within the machine learning community.
School: Southern Methodist University
Subjects: Meteorology; Computer science