Efficient Fused Learning for Distributed Imbalanced Data
Publication in refereed journal
Officially accepted for publication

Other information
Abstract: Any data set exhibiting an unequal or highly skewed distribution between its classes/categories can be regarded as imbalanced data. Due to privacy concerns and other technical limitations, imbalanced data distributed across locations/machines cannot simply be combined and stored in a single central location. The commonly used naive averaging estimate may be unstable for imbalanced data. In this paper, we propose a fused estimation method for logistic regression for analyzing distributed imbalanced data; it combines all the cases available on all machines and is stable and efficient. The consistency and asymptotic normality of the proposed estimator are established under regularity conditions. Its asymptotic efficiency relative to the oracle estimator based on the entire imbalanced data is also studied. Extensive simulation studies show that the proposed estimator is as efficient as the oracle estimator in various situations. An application to a credit card data set on default payment is presented.
Publisher acceptance date: 29.01.2020
Authors: Jie Zhou, Guohao Shen, Xuan Chen, Yuanyuan Lin
Journal name: Communications in Statistics - Theory and Methods
Detailed description: The article was accepted on Jan 29, 2020.
Publication year: 2020
ISSN: 0361-0926
eISSN: 1532-415X
Language: American English
Keywords: Case-control studies, Distributed imbalanced data, Logistic regression, Oracle estimator

Last updated: 2020-11-26 at 00:07