Title: Communication-Efficient Pilot Estimation for Non-Randomly Distributed Data in Diverging Dimensions
Speaker: Dr. Yue Chao (晁越)
Time: Friday, June 7, 2024, 14:00-15:00
Venue: Room 30425, Teaching Building 3, Xipu Campus
Abstract: Distributed learning has become an indispensable tool for dealing with massive or distributed datasets. As an important and popular distributed learning method, the communication-efficient surrogate likelihood framework (CSL; Jordan et al., 2019, Journal of the American Statistical Association) has received much attention from the statistics community. Most works based on the CSL framework share two common treatments: (i) choosing the first machine as the central machine and solving an optimization problem using the data on that machine; and (ii) assuming that the dimension is fixed when deriving statistical properties. However, treatment (i) may not be appropriate when the data are stored in a non-random manner or are heterogeneously distributed across machines, which might be common in practice; and treatment (ii) largely limits the application of CSL to diverging- or high-dimensional datasets, especially when the purpose is to infer some parameters of interest. To address the challenges posed by (i) and (ii), we extend the CSL framework and develop a communication-efficient pilot (CEP) estimation strategy. Specifically, we first perform pilot sampling on each machine to obtain a pilot sample dataset, and then use a new pilot-sample-based surrogate loss function to approximate the global loss; its minimizer is called the CEP estimator. Second, we rigorously investigate the theoretical properties of the CEP estimator, including its convergence rate. Finally, extensive experiments on synthetic and real datasets illustrate the advantages of the proposed approach.
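The pilot-sampling idea in the abstract can be sketched for ordinary least squares. This is only an illustration under stated assumptions, not the speaker's method: the abstract does not specify the sampling scheme or the loss, so uniform subsampling, squared-error loss, and the function name `cep_least_squares` are all hypothetical choices. The surrogate loss follows the standard CSL form, with the first machine's local loss replaced by the loss on the pooled pilot sample.

```python
import numpy as np


def cep_least_squares(machines, pilot_size, rng):
    """Illustrative one-round pilot-sample surrogate estimate for least squares.

    machines   : list of (X, y) pairs, one per machine (hypothetical layout)
    pilot_size : number of rows each machine contributes to the pilot sample
    rng        : numpy.random.Generator used for the subsampling
    """
    # Step 1: each machine draws a uniform pilot subsample (an assumption;
    # the abstract does not state the sampling scheme) and sends it to the center.
    pilot_X, pilot_y = [], []
    for X, y in machines:
        idx = rng.choice(len(y), size=min(pilot_size, len(y)), replace=False)
        pilot_X.append(X[idx])
        pilot_y.append(y[idx])
    Xp, yp = np.vstack(pilot_X), np.concatenate(pilot_y)

    # Step 2: initial estimate computed from the pooled pilot sample only.
    theta_bar, *_ = np.linalg.lstsq(Xp, yp, rcond=None)

    # Step 3: each machine sends its local gradient sum at theta_bar;
    # the center averages them to obtain the global gradient.
    n_total = sum(len(y) for _, y in machines)
    global_grad = sum(X.T @ (X @ theta_bar - y) for X, y in machines) / n_total

    # Step 4: minimise the CSL-style surrogate loss
    #   L_pilot(theta) - <grad L_pilot(theta_bar) - grad L_N(theta_bar), theta>.
    # For squared loss this has a closed form: a Newton-type step that uses
    # the pilot-sample Hessian together with the global gradient.
    pilot_hessian = Xp.T @ Xp / len(yp)
    return theta_bar - np.linalg.solve(pilot_hessian, global_grad)
```

Because the pilot sample draws from every machine, the surrogate does not depend on the first machine being representative, which is the concern raised about treatment (i). Only the pilot rows and one gradient vector per machine are communicated.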
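About the speaker: Yue Chao (晁越) is a doctoral student in statistics (2021 cohort) at the School of Mathematical Sciences, Soochow University, and is due to receive a PhD in statistics in June 2024 under the supervision of Professor Guojing Wang (王过京) and Associate Professor Xuejun Ma (马学俊). Research interests include high-dimensional data analysis, massive data analysis, and distributed learning. Published work has appeared in journals including Information Sciences, Journal of Statistical Planning and Inference, Metrika, Journal of Statistical Computation and Simulation, and Statistics in Medicine.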