What happens (in practice and theoretically) when enabling
gradient_based subsampling for federated learning?
From my understanding, gradient-based subsampling is based on Minimal Variance Sampling (MVS). My questions are:
- Is the subsampling driven by each client's local gradient information, or by globally aggregated gradient information?
- Can it be used to adapt to each client's local data distribution?
- Can convergence be proven when using MVS in a federated setting?
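For context, here is a minimal sketch of how I understand the MVS sampling probabilities: each instance is scored by its regularized absolute gradient, sqrt(g_i^2 + lambda * h_i^2), and sampled with probability proportional to that score. Note that the function name, the simple rescale-and-cap step, and the parameter values below are my own illustration, not an actual implementation from any library:

```python
import numpy as np

def mvs_sampling_probs(grad, hess, lam=1.0, sample_rate=0.5):
    """Sketch of MVS-style sampling probabilities.

    Scores each instance by sqrt(g^2 + lam * h^2) and rescales so the
    expected sample size is roughly sample_rate * n, capping probabilities
    at 1. (The exact threshold search from the MVS paper is simplified
    here to a single rescaling step.)
    """
    scores = np.sqrt(grad ** 2 + lam * hess ** 2)
    n = len(scores)
    # Expected number of kept instances ~= sample_rate * n (before capping).
    probs = np.minimum(1.0, sample_rate * n * scores / scores.sum())
    return probs

# Toy example: instances with larger gradients get larger keep-probabilities.
grad = np.array([0.1, 2.0, 0.5, 1.5])
hess = np.ones(4)
probs = mvs_sampling_probs(grad, hess)
```

In a federated setting, my open question above is essentially whether each client would compute these probabilities from its own `grad`/`hess` arrays or from some globally aggregated statistic.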
Hope to start a great discussion!