Talk title: An effective algorithm for big streaming data
Abstract: Rapid development in IT and revolutionary monitoring and measuring technologies have made it possible to collect large-scale sequential data in many practical fields. Analyzing such data efficiently to discover hidden patterns, associations and correlations, and trends of change remains a challenge.
In machine learning and data mining, cost-sensitive learning and sparse online learning are two important research areas. Many algorithms have been proposed for each area separately. In general, an algorithm that performs well in one area may perform less well in the other. Very little work has been published that combines the two.
To tackle high-dimensional, highly skewed data streams, we propose a framework for cost-sensitive sparse online learning, which greatly extends the influential Truncated Gradient (TG) method. By formulating a new convex optimization problem, the framework aims to balance misclassification cost and sparsity, two mutually constraining factors. We will present a theoretical analysis of the bounds on the regret of actions and on cost, and compare them with those of existing methods. Evaluated on eight real-life streaming, high-dimensional, severely skewed datasets, the proposed method outperforms traditional methods.
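For readers unfamiliar with the Truncated Gradient idea mentioned in the abstract, the following is a minimal sketch, not the speaker's formulation. It pairs the standard TG truncation step with a class-weighted hinge loss as one plausible way to make the update cost-sensitive. The function names (`truncate`, `cost_sensitive_tg`), the weighting scheme, and all parameter values are assumptions chosen for illustration.

```python
def truncate(w, alpha, theta):
    """Truncation operator: coordinates with |w_j| <= theta are pulled
    toward zero by alpha (and clipped at zero); larger ones are untouched."""
    out = []
    for v in w:
        if 0.0 <= v <= theta:
            out.append(max(0.0, v - alpha))
        elif -theta <= v < 0.0:
            out.append(min(0.0, v + alpha))
        else:
            out.append(v)
    return out


def cost_sensitive_tg(stream, dim, eta=0.1, gravity=0.01, theta=1.0, K=1,
                      cost_pos=5.0, cost_neg=1.0):
    """One pass over (x, y) pairs with y in {+1, -1}.

    Assumption: misclassification cost enters as a per-class weight on the
    hinge-loss subgradient (cost_pos for y=+1, cost_neg for y=-1).
    """
    w = [0.0] * dim
    for t, (x, y) in enumerate(stream, start=1):
        margin = y * sum(wj * xj for wj, xj in zip(w, x))
        if margin < 1.0:
            # Weighted hinge-loss subgradient step.
            c = cost_pos if y > 0 else cost_neg
            w = [wj + eta * c * y * xj for wj, xj in zip(w, x)]
        if t % K == 0:
            # Every K steps, shrink small weights to encourage sparsity.
            w = truncate(w, eta * gravity * K, theta)
    return w
```

In this sketch a larger `gravity` tends to give a sparser weight vector, while the ratio `cost_pos / cost_neg` controls how strongly errors on the positive (assumed minority) class are penalized, which is the trade-off the abstract describes.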
Speaker biography: Zhide Fang, PhD, is Professor and Director of Biostatistics in the School of Public Health and a Professor in the Department of Genetics, School of Medicine, at Louisiana State University Health Sciences Center in New Orleans. He is also a Statistician at the NIH-funded Louisiana Clinical & Translational Science Center.
Dr. Fang’s research interests encompass statistical theory and applications in a range of areas. He has contributed to the theory of design of experiments for heteroscedastic linear models, dose-response toxicity models, and wavelet regression models. His contributions to the theory of system reliability in engineering include algorithms for evaluating the system state distribution of certain consecutive-k-out-of-n systems. In bioinformatics, Dr. Fang has developed pipelines and statistical methodologies for the analysis of high-throughput genomics and metagenomics data generated by microarrays or next-generation sequencing technologies. His newest research interest is developing algorithms for high-dimensional, highly skewed big streaming data.
Time: Thursday, December 28, 2017, 19:00-20:00
Venue: Room 702, South Building, Science and Technology Building