Hello
I'm trying to run the XGBoost regressor with scikit-learn, but when I run the code I get this error:
ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
Thanks
As the error message suggests, check your input data.
Thanks for the answer, but my data doesn't have any NaN values. I tried a few methods to filter out NaNs, but they didn't work either.
I don't know if you have any solutions, because I want to apply the XGBoost regressor with cross-validation.
The error may be coming from scikit-learn, not XGBoost. See https://datascience.stackexchange.com/questions/11928/valueerror-input-contains-nan-infinity-or-a-value-too-large-for-dtypefloat32
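Before reaching for a fix, it can help to confirm where the bad values actually are. A minimal sketch (the column names are made up for illustration) that checks a DataFrame for both NaN and infinity, since `fillna()` only handles the former:

```python
import numpy as np
import pandas as pd

# Toy frame containing both a NaN and an infinity
df = pd.DataFrame({"a": [1.0, np.nan, np.inf], "b": [4.0, 5.0, 6.0]})

# Count NaNs per column
print(df.isna().sum())

# Infinities are not NaN, so check for them separately
print(np.isinf(df.to_numpy()).any())
```

If the infinity check comes back `True`, you can convert those values to NaN first with `df.replace([np.inf, -np.inf], np.nan)` and then handle everything in one place.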
Please replace the NaN values with the mean:
df.fillna(df.mean(), inplace=True)
where df is your data. If there are large outliers in your data, replace the NaN values with the median instead:
df.fillna(df.median(), inplace=True)
Note that for XGBoost this error usually comes up during the cross-validation (data-splitting) phase, but you have to go back to where you import the data and do the pandas replacement there.
I got the same error message when using sklearn with pandas. My solution was to reset the index of my dataframe df
before running any sklearn code:
df = df.reset_index()
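For anyone wondering why resetting the index can matter: pandas aligns on the index, so combining frames whose indices don't match can silently introduce NaNs. A small sketch of that failure mode:

```python
import pandas as pd

a = pd.DataFrame({"x": [1, 2, 3]}, index=[0, 1, 2])
b = pd.DataFrame({"y": [4, 5, 6]}, index=[5, 6, 7])

# Concatenating along columns aligns on the index, so the
# non-overlapping labels produce NaN cells
bad = pd.concat([a, b], axis=1)
print(bad.isna().any().any())   # True

# Resetting both indices first keeps the rows aligned positionally
good = pd.concat([a.reset_index(drop=True),
                  b.reset_index(drop=True)], axis=1)
print(good.isna().any().any())  # False
```

Note `drop=True` discards the old index instead of adding it as a new column, which is usually what you want before feeding the frame to sklearn.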
XGBoost should be able to handle NaN:
>>> import xgboost as xgb
>>> import pandas as pd
>>> import numpy as np
>>> X = np.arange(100).reshape(10, 10)
>>> X = X.astype(np.float32)
>>> X[0, :] = np.NaN
>>> df = pd.DataFrame(X)
>>> df
0 1 2 3 4 5 6 7 8 9
0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 10.0 11.0 12.0 13.0 14.0 15.0 16.0 17.0 18.0 19.0
2 20.0 21.0 22.0 23.0 24.0 25.0 26.0 27.0 28.0 29.0
3 30.0 31.0 32.0 33.0 34.0 35.0 36.0 37.0 38.0 39.0
4 40.0 41.0 42.0 43.0 44.0 45.0 46.0 47.0 48.0 49.0
5 50.0 51.0 52.0 53.0 54.0 55.0 56.0 57.0 58.0 59.0
6 60.0 61.0 62.0 63.0 64.0 65.0 66.0 67.0 68.0 69.0
7 70.0 71.0 72.0 73.0 74.0 75.0 76.0 77.0 78.0 79.0
8 80.0 81.0 82.0 83.0 84.0 85.0 86.0 87.0 88.0 89.0
9 90.0 91.0 92.0 93.0 94.0 95.0 96.0 97.0 98.0 99.0
>>> y = np.arange(10)
>>> xgb.XGBRegressor().fit(df, y)
XGBRegressor(base_score=0.5, booster='gbtree', callbacks=None,
colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1,
early_stopping_rounds=None, enable_categorical=False,
eval_metric=None, gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=16,
num_parallel_tree=1, predictor='auto', random_state=0, reg_alpha=0,
reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact',
validate_parameters=1, ...)
I faced the same issue recently. You probably have NaN values in your target.
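This would explain why the demo above succeeds: XGBoost tolerates NaN in the features, but scikit-learn's validation also checks `y`. A quick sketch (toy data) for checking the target and dropping the affected rows while keeping X and y aligned:

```python
import numpy as np
import pandas as pd

X = pd.DataFrame({"f": [10.0, 20.0, 30.0]})
y = pd.Series([1.0, np.nan, 3.0])

# A NaN in the target triggers the same ValueError as a NaN in X
print(y.isna().sum())   # 1

# Drop rows whose target is missing, using one mask for both X and y
mask = y.notna()
X_clean, y_clean = X[mask], y[mask]
print(len(y_clean))     # 2
```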