Hi,
I would appreciate some clarification on how missing values are handled in XGBoost4J-Spark.
- There are two pages with contradictory guidelines:
https://xgboost.readthedocs.io/en/latest/jvm/xgboost4j_spark_tutorial.html#dealing-with-missing-values
https://xgboost.readthedocs.io/en/release_0.90/jvm/xgboost4j_spark_tutorial.html
Is the second link still valid?
Is the first link applicable to 0.90, or is it valid only for 1.0?
- Regarding Option 2 in the first link:
a) Has anyone successfully set the missing value in a way that does not affect model accuracy?
b) Is this sentence correct: "an irregular value that is not 0, NaN, or Null and set the 'missing' parameter to 0"? Or should it be "an irregular value that is not 0, NaN, or Null and set the 'missing' parameter to the irregular value"?
- If my dataset contains zero values but no NaN or Null, what is the best approach? I tried replacing the zeros with a very small number (1E-15), while the minimum value in my dataset is 1E-3. However, it still affects the accuracy.
- Can anyone provide sample code for Option 1 showing how to convert the features to a dense vector?
- Why is version 1.0 not in Maven yet?
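The zero-replacement experiment described above can be modeled with a small plain-Python sketch. No Spark is involved: `assemble_sparse` is a hypothetical stand-in for an assembler that always drops exact zeros (Spark's VectorAssembler may emit either a sparse or a dense vector, so this is a simplification), and `replace_zeros` substitutes a small constant such as 1E-15 before assembly. The sketch only shows which positions end up absent from the sparse encoding, which the first linked page says XGBoost treats as missing; it makes no claim about accuracy.

```python
from typing import List, Sequence, Tuple

# (size, indices, values): an assumed layout mirroring a sparse vector.
SparseRow = Tuple[int, List[int], List[float]]


def assemble_sparse(row: Sequence[float]) -> SparseRow:
    """Encode a row as (size, indices, values), dropping exact zeros.

    Hypothetical simplification: a real assembler may pick a dense
    encoding instead of a sparse one.
    """
    indices = [i for i, v in enumerate(row) if v != 0.0]
    values = [row[i] for i in indices]
    return len(row), indices, values


def replace_zeros(row: Sequence[float], small: float = 1e-15) -> List[float]:
    """Replace exact zeros with a small non-zero constant."""
    return [small if v == 0.0 else v for v in row]


def absent_positions(encoded: SparseRow) -> List[int]:
    """Return the positions that are not stored in the sparse encoding."""
    size, indices, _ = encoded
    present = set(indices)
    return [i for i in range(size) if i not in present]


if __name__ == "__main__":
    row = [0.0, 0.005, 0.0, 0.12]
    print(absent_positions(assemble_sparse(row)))                 # [0, 2]
    print(absent_positions(assemble_sparse(replace_zeros(row))))  # []
```

After the replacement no positions are absent, so every former zero is stored as an explicit value instead of being left to the "missing" handling.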
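To make the Option 1 question concrete, here is a minimal plain-Python model of what a sparse-to-dense conversion does. It does not use Spark: the `(size, indices, values)` triple is an assumption that mirrors how a sparse vector stores its data, and `to_dense` is a hypothetical helper. Entries absent from the sparse form come back as explicit 0.0 values in the dense list.

```python
from typing import List, Sequence


def to_dense(size: int, indices: Sequence[int], values: Sequence[float]) -> List[float]:
    """Expand a (size, indices, values) sparse triple into a dense list.

    Hypothetical helper for illustration: positions not listed in
    `indices` become explicit 0.0 entries instead of being absent.
    """
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")
    dense = [0.0] * size
    for i, v in zip(indices, values):
        if not 0 <= i < size:
            raise IndexError(f"index {i} out of range for size {size}")
        dense[i] = v
    return dense


if __name__ == "__main__":
    # A 5-feature row where only features 1 and 3 are non-zero.
    print(to_dense(5, [1, 3], [2.0, 4.0]))  # [0.0, 2.0, 0.0, 4.0, 0.0]
```

In Spark itself, the closest operation is likely `toDense` on `org.apache.spark.ml.linalg.Vector`, applied to the assembled features column (for example inside a UDF); that mapping is an assumption here, not something confirmed by the linked pages.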