I would appreciate some clarification on the handling of missing values in Spark.
- There are two pages with contradictory guidelines.
Is the second link still valid?
Is the first link applicable to 0.90, or is it valid only for 1.0?
- In Option 2 of the first link:
a) Has anyone been able to set the missing value in a way that does not affect model accuracy?
b) Is this sentence correct: "an irregular value that is not 0, NaN, or Null and set the “missing” parameter to 0"? Or should it be "an irregular value that is not 0, NaN, or Null and set the “missing” parameter to the irregular value"?
- If I have zero values in my dataset and no NaN or Null values, what is the best approach? I tried replacing the zeros with a very small number, 1E-15 (the minimum value in my dataset is 1E-3). However, it still affects the accuracy.
- Can anyone provide sample code for Option 1 showing how to convert to a dense vector?
- Why is version 1.0 not on Maven yet?
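
To make the dense-vector question concrete, here is a minimal sketch in plain Python (not Spark code) of what I understand the conversion to mean. The helper `to_dense` and the (size, indices, values) layout are assumptions made for illustration only. The point is that a sparse vector leaves zeros implicit, while the dense form writes them out as explicit 0.0 entries.

```python
from typing import List, Sequence


def to_dense(size: int, indices: Sequence[int], values: Sequence[float]) -> List[float]:
    """Expand a sparse (size, indices, values) triple into a dense list.

    Hypothetical helper for illustration. Positions not listed in `indices`
    are implicit zeros in the sparse form; the dense form stores them as 0.0.
    """
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")
    dense = [0.0] * size
    for i, v in zip(indices, values):
        if not 0 <= i < size:
            raise IndexError(f"index {i} out of range for size {size}")
        dense[i] = v
    return dense


# Sparse row with 5 features; only positions 1 and 3 are stored.
row = to_dense(5, [1, 3], [0.002, 1.5])
print(row)  # [0.0, 0.002, 0.0, 1.5, 0.0]
```

As far as I know, Spark's `Vector.toDense` performs the equivalent expansion. What I am unsure about is how to apply it to the assembled feature column so that Option 1 is satisfied.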