Cannot make XGBoost library work on IBM AIX

Hello, experts! I could not find binaries for IBM AIX, but I also found no evidence that XGBoost does not work there. I successfully compiled XGBoost 0.9 with gcc 8.3 on IBM AIX 7.1, with minor code changes. Unfortunately, the resulting binaries do not work properly.

Can you please share a successful compilation experience or advise on how to fix this?

Scenario:

  1. Tests from demo/binary_classification/ fail:
:~/xgboost/demo/binary_classification
$ python mapfeat.py
:~/xgboost/demo/binary_classification
$ python mknfold.py agaricus.txt 1
:~/xgboost/demo/binary_classification
$ ../../xgboost mushroom.conf
[09:37:32] 6513x126 matrix with 143286 entries loaded from agaricus.txt.train
[09:37:32] 1611x126 matrix with 35442 entries loaded from agaricus.txt.test
[09:37:32] [0]  test-error:0.016139     train-error:0.014433
[09:37:32] [1]  test-error:0.000000     train-error:0.001228
:~/xgboost/demo/binary_classification
$ ../../xgboost mushroom.conf task=pred model_in=0002.model

Result:

[09:40:28] 1611x126 matrix with 35442 entries loaded from agaricus.txt.test
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
IOT/Abort trap (core dumped)

  2. Importing XGBoost in Python also fails:
>>> import xgboost
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/freeware/lib/python3.7/site-packages/xgboost-0.83.dev0-py3.7.egg/xgboost/__init__.py", line 11, in <module>
    from .core import DMatrix, Booster
  File "/opt/freeware/lib/python3.7/site-packages/xgboost-0.83.dev0-py3.7.egg/xgboost/core.py", line 161, in <module>
    _LIB = _load_lib()
  File "/opt/freeware/lib/python3.7/site-packages/xgboost-0.83.dev0-py3.7.egg/xgboost/core.py", line 152, in _load_lib
    'Error message(s): {}\n'.format(os_error_list))
xgboost.core.XGBoostError: XGBoost Library (libxgboost.so) could not be loaded.
Likely causes:
  * OpenMP runtime is not installed (vcomp140.dll or libgomp-1.dll for Windows, libgomp.so for UNIX-like OSes)
  * You are running 32-bit Python on a 64-bit OS
Error message(s): ['Could not load module /opt/freeware/lib/python3.7/site-packages/xgboost-0.83.dev0-py3.7.egg/xgboost/libxgboost.so.\nSystem error: Exec format error', 'Could not load module /opt/freeware/lib/python3.7/site-packages/xgboost-0.83.dev0-py3.7.egg/xgboost/./lib/libxgboost.so.\nSystem error: Exec format error']
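
Since "Exec format error" usually means the loader rejected the file format or its word size, one way to narrow this down is to compare the bitness of the running Python with the format of the built library. The following is only a minimal sketch: the helper names `interpreter_bits` and `describe_binary` are hypothetical, and the magic numbers are the commonly documented ones for ELF, XCOFF objects, and AIX archives, not something taken from the XGBoost sources.

```python
import struct
import sys

# Leading bytes of common binary formats (assumed from the ELF and AIX XCOFF formats).
MAGICS = [
    (b"\x7fELF", "ELF object (Linux/BSD style)"),
    (b"\x01\xdf", "XCOFF 32-bit object (AIX)"),
    (b"\x01\xf7", "XCOFF 64-bit object (AIX)"),
    (b"<bigaf>\n", "AIX big-format archive"),
    (b"<aiaff>\n", "AIX small-format archive"),
]


def interpreter_bits():
    """Return the pointer width of the running Python interpreter, in bits."""
    return struct.calcsize("P") * 8


def describe_binary(path):
    """Classify a file by its leading magic bytes (hypothetical helper)."""
    with open(path, "rb") as f:
        head = f.read(8)
    for magic, label in MAGICS:
        if head.startswith(magic):
            return label
    return "unknown format"


if __name__ == "__main__":
    print("Python is", interpreter_bits(), "bit")
    for p in sys.argv[1:]:
        print(p, "->", describe_binary(p))
```

Running it with the path to libxgboost.so as an argument would show whether, for example, a 32-bit XCOFF object is being loaded into a 64-bit interpreter or the other way round.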

Can you double-check that you are using the right GCC? It looks like you may be using a cross-compiler GCC, which would produce incompatible binaries.

I also suggest that you use CMake to build XGBoost, so that platform-specific details will be better handled.

Hello, hcho3! Thank you for your feedback.

  1. We took gcc-8.3.0-1 built for aix7.1 from http://www.bullfreeware.com/affichage.php?id=4944. It does not look like a cross-compiler, and it passed the self-compilation test during installation.
  2. cmake-3.14.3-1 (same source) did not detect libgomp-8.3.0-1 (same source). We could not fix that and fell back to make.
  3. It looks like the code is not quite ready for the AIX platform: we had to make the changes listed below to get it to compile. I guess they are not enough, since the compiled binary fails with a memory allocation error. This is also not a shortage of memory: 37 GB of RAM was free before the test started.

=== BACKUP: DIFF listing of the minimal code changes we had to apply before successful compilation ===

diff -bur ./src_orig/dmlc-core/include/dmlc/endian.h ./src_patch/dmlc-core/include/dmlc/endian.h
--- ./src_orig/dmlc-core/include/dmlc/endian.h	2019-05-18 17:23:44.814216600 +0300
+++ ./src_patch/dmlc-core/include/dmlc/endian.h	2019-05-31 12:57:18.000000000 +0300
@@ -29,6 +29,8 @@
     #else
       #define DMLC_LITTLE_ENDIAN 0
     #endif
+  #elif defined(_AIX)
+    #define DMLC_LITTLE_ENDIAN 0
   #else
     #error "Unable to determine endianness of your machine; use CMake to compile"
   #endif
diff -bur ./src_orig/dmlc-core/include/dmlc/serializer.h ./src_patch/dmlc-core/include/dmlc/serializer.h
--- ./src_orig/dmlc-core/include/dmlc/serializer.h	2019-05-18 17:23:44.836272200 +0300
+++ ./src_patch/dmlc-core/include/dmlc/serializer.h	2019-05-31 12:57:18.000000000 +0300
@@ -118,6 +118,11 @@
  */
 template<typename T>
 struct UndefinedSerializerFor {
+  inline static void Write(Stream *strm, const T &data) {
+  }
+  inline static bool Read(Stream *strm, T *data) {
+    return false;
+  }
 };
 
 /*!
diff -bur ./src_orig/dmlc-core/Makefile ./src_patch/dmlc-core/Makefile
--- ./src_orig/dmlc-core/Makefile	2019-05-18 17:23:44.763042400 +0300
+++ ./src_patch/dmlc-core/Makefile	2019-05-31 13:21:44.000000000 +0300
@@ -22,13 +22,7 @@
 LDFLAGS+= $(DMLC_LDFLAGS) $(ADD_LDFLAGS)
 CFLAGS+= $(DMLC_CFLAGS) $(ADD_CFLAGS)
 
-ifndef USE_SSE
-	USE_SSE = 1
-endif
-
-ifeq ($(USE_SSE), 1)
-	CFLAGS += -msse2
-endif
+USE_SSE = 0
 
 ifdef DEPS_PATH
 CFLAGS+= -I$(DEPS_PATH)/include
diff -bur ./src_orig/Makefile ./src_patch/Makefile
--- ./src_orig/Makefile	2019-05-18 17:23:27.444739100 +0300
+++ ./src_patch/Makefile	2019-05-31 13:03:46.000000000 +0300
@@ -77,6 +77,8 @@
 	CFLAGS += -g -O0 -fprofile-arcs -ftest-coverage
 else
 	CFLAGS += -O3 -funroll-loops
+
+USE_SSE = 0
 ifeq ($(USE_SSE), 1)
 	CFLAGS += -msse2
 endif
diff -bur ./src_orig/python-package/xgboost/libpath.py ./src_patch/python-package/xgboost/libpath.py
--- ./src_orig/python-package/xgboost/libpath.py	2019-05-18 17:23:28.069662400 +0300
+++ ./src_patch/python-package/xgboost/libpath.py	2019-05-31 13:04:34.000000000 +0300
@@ -33,7 +33,7 @@
             # hack for pip installation when copy all parent source directory here
             dll_path.append(os.path.join(curr_path, './windows/Release/'))
         dll_path = [os.path.join(p, 'xgboost.dll') for p in dll_path]
-    elif sys.platform.startswith('linux') or sys.platform.startswith('freebsd'):
+    elif sys.platform.startswith('linux') or sys.platform.startswith('freebsd') or sys.platform.startswith('aix'):
         dll_path = [os.path.join(p, 'libxgboost.so') for p in dll_path]
     elif sys.platform == 'darwin':
         dll_path = [os.path.join(p, 'libxgboost.dylib') for p in dll_path]
diff -bur ./src_orig/rabit/Makefile ./src_patch/rabit/Makefile
--- ./src_orig/rabit/Makefile	2019-05-18 17:23:46.783429700 +0300
+++ ./src_patch/rabit/Makefile	2019-05-31 13:22:54.000000000 +0300
@@ -9,7 +9,7 @@
 endif
 
 export WARNFLAGS= -Wall -Wextra -Wno-unused-parameter -Wno-unknown-pragmas -std=c++11
-export CFLAGS = -O3 $(WARNFLAGS) -I $(DMLC)/include -I include/
+export CFLAGS = -O3 $(WARNFLAGS) -I $(DMLC)/include -I include/ -pthread
 export LDFLAGS =-Llib
 
 #download mpi
@@ -42,23 +42,7 @@
     endif
 endif
 
-#----------------------------
-# Settings for power and arm arch
-#----------------------------
-ARCH := $(shell uname -a)
-ifneq (,$(filter $(ARCH), powerpc64le ppc64le ))
-	USE_SSE=0
-else
-	USE_SSE=1
-endif
-
-ifndef USE_SSE
-	USE_SSE = 1
-endif
-
-ifeq ($(USE_SSE), 1)
-	CFLAGS += -msse2
-endif
+USE_SSE=0
 
 ifndef WITH_FPIC
 	WITH_FPIC = 1
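
The libpath.py hunk above can be tried in isolation. This is only a sketch of the same platform test, with a hypothetical helper name `library_filename`; it assumes the Windows branch is selected by `sys.platform == 'win32'`. Using `startswith('aix')` matters because older Python versions report values such as `aix7`, while newer ones report plain `aix`.

```python
import sys


def library_filename(platform=None):
    """Map a sys.platform value to the shared library name (hypothetical helper)."""
    platform = sys.platform if platform is None else platform
    if platform == "win32":
        return "xgboost.dll"
    # A tuple lets startswith() test several prefixes at once.
    if platform.startswith(("linux", "freebsd", "aix")):
        return "libxgboost.so"
    if platform == "darwin":
        return "libxgboost.dylib"
    raise RuntimeError("unsupported platform: " + platform)


if __name__ == "__main__":
    print(sys.platform, "->", library_filename())
```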

Yes, XGBoost has not been tested with IBM AIX. Can you install Docker and run a Linux container instead?

Hello, hcho3! Thank you for your reply.
Do I understand the architecture you propose correctly? As I understand it, it looks like

either -

(1) [Power PC LPAR [AIX OS environment [x86_64 Linux virtual machine or container [XGBoost x86_64 Linux binary ] ] ] ]

- or -

(2) [Power PC LPAR [AIX OS environment [PPC Linux virtual machine or container [XGBoost PPC Linux binary ] ] ] ]

Regarding these architectures, two questions arise:

  • Is there a way to run a Linux container or virtual machine (either x86 or PPC) on an AIX (always PPC) LPAR? I mean a production-ready solution that is capable of learning and scoring on large data sets and uses a multi-core environment efficiently (meaning no software emulation of processor instructions). Frankly, I am not aware of such a solution.
  • If we consider option (2), is there a PPC build of XGBoost? If I’m not mistaken, I see only attempts to make custom Linux-on-PPC builds of XGBoost, not an official package.

In my honest opinion, containers and virtualization would not help noticeably here. Are there any specific plans for AIX support?

Hmm, I didn’t notice that you were using the Power architecture. It seems possible to use the Linux+Power combination to compile and run XGBoost, according to this thread: https://github.com/dmlc/xgboost/issues/3495. So you may have some success with Linux (Power) containers.

We do not have a specific plan for supporting AIX at this time.

@AIXBird I just came across a report that the current endian detection logic in XGBoost is not quite air-tight:

I’m preparing a pull request to make the endian logic more portable.

Hello, hcho3! We noticed this peculiarity during compilation and tried to work around it as shown in the patch listed above. It helps the code compile, but it is still not clear why the compiled binary fails.
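
One possible but unconfirmed way a byte-order problem could end in std::bad_alloc: if a length field in the model file is written in one byte order and read back in the other, a small count turns into an enormous one, and the allocation for it fails no matter how much RAM is free. The sketch below only illustrates that arithmetic with Python's `struct` module; the helper `reinterpret_u64` is hypothetical, and whether XGBoost's model loading actually hits such a mismatch on AIX is an assumption, not something established in this thread.

```python
import struct
import sys


def reinterpret_u64(value, written_as, read_as):
    """Pack a 64-bit unsigned value in one byte order and unpack it in another.

    written_as / read_as are struct byte-order prefixes: "<" (little) or ">" (big).
    """
    raw = struct.pack(written_as + "Q", value)
    return struct.unpack(read_as + "Q", raw)[0]


if __name__ == "__main__":
    print("host byte order:", sys.byteorder)  # a big-endian host reports "big"
    columns = 126  # the column count reported in the demo log above
    misread = reinterpret_u64(columns, "<", ">")
    # The single non-zero byte moves to the most significant position: 126 << 56.
    print("written:", columns, "misread:", misread)
```

If something like this is happening, the patch that only defines DMLC_LITTLE_ENDIAN as 0 would let the code compile while still leaving some read or write path that assumes the other byte order.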