*This is a guest post by Chris McClanahan from ArrayFire (formerly AccelerEyes).*

ArrayFire is a fast and easy-to-use GPU matrix library developed by ArrayFire. ArrayFire wraps GPU memory into a simple “array” object, enabling developers to process vectors, matrices, and volumes on the GPU using high-level routines, without having to get involved with device kernel code.

# ArrayFire Feature Highlights

- ArrayFire provides a high-level array notation and an extensive set of functions for easily manipulating N-dimensional GPU data.
- ArrayFire provides all basic arithmetic operations (element-wise arithmetic, trigonometric, logical operations, etc.), higher-level primitives (reductions, matrix multiply, set operations, sorting, etc.), and even domain-specific functions (image and signal processing, linear algebra, etc.).
- ArrayFire can be used as a self-contained library, or integrated into and supplement existing CUDA code. The array object can wrap data from CUDA device pointers and existing CPU memory.
- ArrayFire contains built-in graphics functions for data visualization. The graphics library in ArrayFire provides easy rendering of 2D and 3D data, and leverages CUDA OpenGL interoperation, so visualization is fast and efficient. Various visualization algorithms make easy to explore complex data.
- ArrayFire offers a unique “gfor” construct that can drastically speed up conventional “for” loops over data. The gfor loop essentially auto-vectorizes the code inside, and executes all iterations of the loop simultaneously.
- ArrayFire supports C, C++, and Fortran on top of the CUDA platform.
- ArrayFire is built on top of a custom just-in-time (JIT) compiler for efficient GPU memory usage. The JIT back-end in ArrayFire automatically combines many operations behind the scenes, and executes them in batches to minimize GPU kernel launches.
- ArrayFire strives to include only the best performing code in the ArrayFire library. This means that the ArrayFire library uses existing implementations of functions when they are faster—such as Thrust for sorting, CULA for linear algebra, and CUFFT for fft. Continue reading