We received some interesting comments about this paper, so we decided to write an FAQ. Please feel free to send us more comments.
See the next question.
See the previous question.
We don't have a definitive answer yet; both points may be correct. In many cases, subsampling the training data does not degrade the prediction accuracy much, so there is no need to employ large-scale training techniques. This usually happens when the data quality is good. However, we also see situations where Internet companies collect huge amounts of web logs and train models on distributed learning systems.
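To illustrate the first point, here is a minimal sketch (our own illustration, not an experiment from the paper) that checks whether subsampling hurts accuracy on a given task; the synthetic data set, sizes, and scikit-learn's LogisticRegression are all our assumptions:

    # Sketch: does subsampling the training data hurt test accuracy?
    # Assumptions (not from the paper): scikit-learn, synthetic data.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Toy data standing in for a real application.
    X, y = make_classification(n_samples=100000, n_features=50, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

    rng = np.random.default_rng(0)
    for frac in (0.01, 0.1, 1.0):
        n = int(frac * len(X_tr))
        idx = rng.choice(len(X_tr), size=n, replace=False)  # random subsample
        clf = LogisticRegression(max_iter=1000).fit(X_tr[idx], y_tr[idx])
        print("train on %d points: test accuracy %.4f" % (n, clf.score(X_te, y_te)))

If the accuracy flattens well before the full set is used, subsampling suffices for that application; if it keeps improving, large-scale techniques may pay off.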
Currently, some people question the need for large data, while others always think more is better. We believe large-scale training is application dependent: the properties of the target application determine how many data points are needed. This is still an important research issue for machine learning practice.
Yes. We hope to bridge the two very different viewpoints mentioned above. So far we have not encountered many data sets larger than the memory capacity. If you need to train such large sets, we would be very interested in hearing about your applications.
Many papers have proposed methods, though they may not provide tools. One package that has been designed for such situations is VW (Vowpal Wabbit) at Yahoo!. It is an online algorithm, so it is slightly different from our off-line setting. If you know of other tools, please let us know.
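To make the online/off-line distinction concrete, below is a generic sketch of an online learner (our own illustration, not VW's implementation or a method from our paper): the model is updated one example at a time with stochastic gradient steps, so the full data set never needs to be held in memory, whereas our off-line setting optimizes over the full data.

    # Sketch of a generic online learner: logistic regression via SGD.
    # Our own illustration, not VW's actual algorithm.
    import numpy as np

    def online_logistic_sgd(stream, n_features, lr=0.1):
        w = np.zeros(n_features)
        for x, y in stream:                 # y in {-1, +1}, one example at a time
            margin = y * w.dot(x)
            grad = -y * x / (1.0 + np.exp(margin))  # gradient of logistic loss
            w -= lr * grad                  # update now; the example can be discarded
        return w

    # Usage: any iterator over (x, y) pairs works, e.g. one read from disk.
    rng = np.random.default_rng(0)
    stream = ((rng.normal(size=10), rng.choice([-1, 1])) for _ in range(1000))
    w = online_logistic_sgd(stream, n_features=10)

Because each example is processed once and then discarded, online methods handle data beyond memory capacity naturally.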
We think so. In the past we did not worry about system and file issues at all, but we need to consider them for large-scale systems.
It is on our to-do list. If you have any potential applications needing incremental/decremental settings, please contact us; we would love to learn more.
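For readers who want to experiment in the meantime, the incremental half can be approximated with a warm-started stochastic solver. The sketch below is our assumption (scikit-learn's SGDClassifier with partial_fit on toy random data), not a method from our paper; decremental updates (removing examples) are harder and generally need specialized techniques or retraining.

    # Sketch: incremental updates via partial_fit (an approximation,
    # not a method from our paper). Data here is random toy data.
    import numpy as np
    from sklearn.linear_model import SGDClassifier

    rng = np.random.default_rng(0)
    clf = SGDClassifier()

    # Initial batch: partial_fit needs the full label set up front.
    X0, y0 = rng.normal(size=(1000, 20)), rng.integers(0, 2, size=1000)
    clf.partial_fit(X0, y0, classes=np.array([0, 1]))

    # New data arrives later: update the existing model in place
    # instead of retraining from scratch.
    X1, y1 = rng.normal(size=(200, 20)), rng.integers(0, 2, size=200)
    clf.partial_fit(X1, y1)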
Yes, we don't really know yet. But this is why research is interesting.
We really did receive such a comment at the KDD conference. We always try to be very honest in describing our work.