Getting Started with PyDAAL on Linux

PyDAAL installation

Data mining and machine learning are just two of the in-demand tasks that PyDAAL makes easier to code. Read this blog by David Bolton to learn how to get started.

PyDAAL is the Python interface to Intel’s Data Analytics Acceleration Library (DAAL). DAAL provides building blocks for data management, data preparation, data mining and machine learning. These blocks run across a wide range of Intel processors and can be integrated into big data analytics workflows, including out-of-core computations on data sets that are too big to fit into memory.

I believe you can use PyDAAL with non-Intel Python, but it makes more sense to install it with Intel’s Python. That’s because it’s free and includes Intel MKL and Intel TBB, which have been integrated with NumPy, SciPy and other modules to make them run much faster than with standard Python.
There are a couple of ways to install it: either download the specific install package from Intel or install Anaconda. Of the two, I’ve found the Anaconda way to be better, as it can set up a virtual environment for you and, much more importantly, doesn’t install Python in a root-only area.
The default path for Intel Python is /opt/intel/intelpython35 (or /opt/intel/intelpython27 for Python 2.7). You can’t easily install other packages there as it is a root-only area, so you then have to get virtualenv installed. I found that Anaconda worked very nicely instead.
Note that if you want PyDAAL and have the original release of Intel Python installed, you must install the update, as that is what adds PyDAAL.

Here is what to do.

Download the Python 3.5 Anaconda installer for Linux. It’s a very large .sh file, so run it with the bash command. The file I downloaded was Anaconda3-4.2.0-Linux-x86_64.sh, so the bash command is bash Anaconda3-4.2.0-Linux-x86_64.sh.

Then add the Intel channel with this command. Do not forget this or you won’t get the Intel optimized versions.

Finally, create a virtual environment for it, but where it says venvname, put your own environment name (I used ip). Note the intelpython3_core parameter. This is vital, or you’ll end up with standard Python rather than Intel’s.

If you want the full set of Intel Python packages, use intelpython3_full instead of intelpython3_core when creating the environment. If you use intelpython3_core, you’ll have to install PyDAAL manually as well; a simple one-liner does that.
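The steps above can be sketched as a small Python helper that prints each command and only runs it when asked to. The installer file name and the intelpython3_core / intelpython3_full names come from this post; the exact conda arguments (`config --add channels intel`, `create -n`, `python=3`, and the `pydaal` package name) are my assumptions about what the commands looked like, so check them against Intel's current instructions before running anything.

```python
import shlex
import subprocess

INSTALLER = "Anaconda3-4.2.0-Linux-x86_64.sh"  # file name quoted in the post


def install_commands(env_name="ip", package_set="intelpython3_core"):
    """Return the assumed install sequence as argument lists."""
    commands = [
        ["bash", INSTALLER],
        # Assumption: this is how the Intel channel is added.
        ["conda", "config", "--add", "channels", "intel"],
        # Assumption: "python=3" pins the environment to Python 3.
        ["conda", "create", "-n", env_name, package_set, "python=3"],
    ]
    if package_set == "intelpython3_core":
        # The core set lacks PyDAAL; assumed package name is "pydaal".
        commands.append(["conda", "install", "-n", env_name, "pydaal"])
    return commands


def run(commands, dry_run=True):
    """Print every command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(shlex.quote(part) for part in cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(install_commands())  # dry run: prints the commands, runs nothing
```

Keeping the dry run as the default means you can read the generated commands before committing to a very large install.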

To switch to your virtual environment, use the standard source activate command below and then source deactivate to switch back when you no longer need it.

You can see all the virtual environments that have been set up with this command:

This outputs something like this:

The * marks the currently active environment.

Now type python and you’ll get this:

Make sure it says Intel on the tin!
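The same check can be made from inside a script. This is a minimal sketch that assumes the Intel build identifies itself with the text "Intel Corporation" in sys.version, the same string the interactive banner is built from; the helper name is mine.

```python
import sys


def is_intel_build(version_string=None):
    """Return True if the version banner mentions Intel Corporation.

    Assumption: Intel's distribution puts "Intel Corporation" in sys.version.
    """
    if version_string is None:
        version_string = sys.version
    return "Intel Corporation" in version_string


if __name__ == "__main__":
    print(sys.version)
    print("Intel build:", is_intel_build())
```

If this prints False inside your environment, you are most likely running the standard Anaconda interpreter instead.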

Testing PyDAAL

I used one of the Intel examples that come with Intel Python to test it. The file cos_dist_dense_batch.py calculates the cosine distances from the example file distance.csv. That data file isn’t provided with Intel Python, but it is included with Intel Parallel Studio 2017 (look in IntelSWTools\compilers_and_libraries\windows\daal\examples\data\batch) and can also be downloaded from GitHub. I had to modify the script because I hadn’t set the DAALROOT environment variable, and it was just as easy to put the data file in the same location as the Python script.
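One way to avoid editing the script by hand is to look for the data file under DAALROOT first and fall back to a local directory. This is a sketch, not what the Intel example does: the helper name is hypothetical, and the examples/data/batch layout under DAALROOT is an assumption based on the Parallel Studio path mentioned above.

```python
import os


def find_data_file(name, script_dir="."):
    """Return the first existing path for `name`, or raise FileNotFoundError."""
    candidates = []
    daal_root = os.environ.get("DAALROOT")
    if daal_root:
        # Assumed layout: $DAALROOT/examples/data/batch/<name>
        candidates.append(
            os.path.join(daal_root, "examples", "data", "batch", name)
        )
    # Fallback: the directory you pass in, e.g. where the script lives.
    candidates.append(os.path.join(script_dir, name))
    for path in candidates:
        if os.path.isfile(path):
            return path
    raise FileNotFoundError("could not find {} in {}".format(name, candidates))
```

Calling find_data_file("distance.csv") then works whether or not DAALROOT is set, as long as the file is in one of the two places.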
The script imports a function from a utils module, and tracking that module down took a bit of time, even though it’s a very short file at just 16 lines. In the end I found a version on GitHub, but if you use that one you have to swap the 2nd and 3rd parameters of its utils.printNT() function, which I renamed to printNumericTable() to match the function name used in the example.

The output when run is a multi-dimensional array of distances, each between 0 and 1.
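If you want something to sanity-check those numbers against, cosine distance is easy to compute in plain NumPy as one minus the cosine similarity of each pair of rows. This is an independent sketch, not the DAAL algorithm, and it makes no claim about how the DAAL example lays out its result. Distances fall between 0 and 1 when all feature values are non-negative; with mixed signs they can reach 2.

```python
import numpy as np


def cosine_distance_matrix(data):
    """Pairwise cosine distance (1 - cosine similarity) between rows."""
    data = np.asarray(data, dtype=float)
    norms = np.linalg.norm(data, axis=1)
    similarity = np.dot(data, data.T) / np.outer(norms, norms)
    # Guard against rounding pushing values slightly outside [-1, 1].
    similarity = np.clip(similarity, -1.0, 1.0)
    return 1.0 - similarity


if __name__ == "__main__":
    # Tiny made-up sample, not the contents of distance.csv.
    sample = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(cosine_distance_matrix(sample))
```

Rows that are all zeros would cause a division by zero here, so filter them out first if your data can contain them.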

Conclusion

There are several ways to install Intel Python and PyDAAL. I’m sure I read somewhere that for PyDAAL you should only use Python 3.5, not 2.7, but I may be wrong.

I got sidetracked by following an outdated set of install instructions. Do not use these or you’ll have Anaconda Python, not Intel’s!

Posted on January 27, 2017 by David Bolton, Slashdot Media Contributing Editor