ValueError: Input contains NaN, infinity or a value too large for dtype('float32')

Hello

I'm trying to run the XGBoost regressor with scikit-learn, but when I run the code I get this error:

ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

Thanks

As the error message suggests, check your input data.

Thanks for the answer, but my data doesn't contain any NaN values. I tried some methods to filter out NaN, but they didn't work either.

I don't know if you have any other solutions, because I want to apply the XGBoost regressor with cross-validation.

The error may be coming from scikit-learn, not XGBoost. See https://datascience.stackexchange.com/questions/11928/valueerror-input-contains-nan-infinity-or-a-value-too-large-for-dtypefloat32
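To narrow it down, here is a quick diagnostic sketch, assuming X and y are your feature matrix and target as NumPy arrays (call .to_numpy() first if they are pandas objects):

import numpy as np

print(np.isnan(X).any())   # any NaN?
print(np.isinf(X).any())   # any +/- infinity?
print((np.abs(X) > np.finfo(np.float32).max).any())  # too large for float32?
print(np.isnan(y).any(), np.isinf(y).any())          # same checks on the target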

Please replace the NaN values with the mean:

df.fillna(df.mean(), inplace=True)

where df is your data. If there are large outliers in your data, use the median instead:

# Replace NaN values with the median
df.fillna(df.median(), inplace=True)

Note that with XGBoost this error usually surfaces during the cross-validation (data-splitting) phase, but the fix belongs earlier: go back to where you import the data and do the pandas replacement there.
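Since you want cross-validation, here is a sketch of one way to combine the two, assuming X and y stand for your own features and target: fold the imputation into a scikit-learn pipeline so it is re-fit inside each fold.

import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from xgboost import XGBRegressor

model = make_pipeline(
    SimpleImputer(strategy="median"),  # replaces NaN per training fold
    XGBRegressor(),
)
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean())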

I got the same error message when using sklearn with pandas. My solution is to reset the index of my DataFrame df before running any sklearn code:

df = df.reset_index()
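For illustration only (the frames below are made up), this is why a stale index can manufacture NaN: pandas aligns on the index, so combining objects whose indices no longer match fills the gaps with NaN. Also note that reset_index() keeps the old index as a new column; pass drop=True to discard it.

import pandas as pd

X = pd.DataFrame({"a": [1.0, 2.0, 3.0]})            # index 0, 1, 2
y = pd.Series([10.0, 20.0, 30.0], index=[5, 6, 7])  # e.g. left over after filtering rows

print(X.assign(target=y))        # indices don't match -> target is all NaN

y = y.reset_index(drop=True)     # realign to 0..n-1, matching X
print(X.assign(target=y))        # NaN gone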

XGBoost should be able to handle NaN:

>>> import xgboost as xgb
>>> import pandas as pd
>>> import numpy as np
>>> X = np.arange(100).reshape(10, 10)
>>> X = X.astype(np.float32)
>>> X[0, :] = np.nan
>>> df = pd.DataFrame(X)
>>> df
      0     1     2     3     4     5     6     7     8     9
0   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN
1  10.0  11.0  12.0  13.0  14.0  15.0  16.0  17.0  18.0  19.0
2  20.0  21.0  22.0  23.0  24.0  25.0  26.0  27.0  28.0  29.0
3  30.0  31.0  32.0  33.0  34.0  35.0  36.0  37.0  38.0  39.0
4  40.0  41.0  42.0  43.0  44.0  45.0  46.0  47.0  48.0  49.0
5  50.0  51.0  52.0  53.0  54.0  55.0  56.0  57.0  58.0  59.0
6  60.0  61.0  62.0  63.0  64.0  65.0  66.0  67.0  68.0  69.0
7  70.0  71.0  72.0  73.0  74.0  75.0  76.0  77.0  78.0  79.0
8  80.0  81.0  82.0  83.0  84.0  85.0  86.0  87.0  88.0  89.0
9  90.0  91.0  92.0  93.0  94.0  95.0  96.0  97.0  98.0  99.0
>>> y = np.arange(10)
>>> xgb.XGBRegressor().fit(df, y)
XGBRegressor(base_score=0.5, booster='gbtree', callbacks=None,
             colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1,
             early_stopping_rounds=None, enable_categorical=False,
             eval_metric=None, gamma=0, gpu_id=-1, importance_type=None,
             interaction_constraints='', learning_rate=0.300000012,
             max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
             monotone_constraints='()', n_estimators=100, n_jobs=16,
             num_parallel_tree=1, predictor='auto', random_state=0, reg_alpha=0,
             reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact',
             validate_parameters=1, ...)

I faced the same issue recently. You probably have NaN values in your target.
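A quick way to confirm, assuming y is your target as a pandas Series (the name is just a placeholder):

import numpy as np

print(y.isna().sum())     # how many NaN in the target
print(np.isinf(y).sum())  # how many infinities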