# Installation ## 1. Pure JAX Version (Easy / Recommended) The easiest way to get started with `jaxDecomp` is via PyPI using the pure JAX backend—**no MPI or GPU-specific setup required**. ### ➤ Step-by-step 1. **Install the appropriate JAX wheel**: - **GPU**: ```bash pip install --upgrade "jax[cuda]" ``` - **CPU**: ```bash pip install --upgrade "jax[cpu]" ``` 2. **Install `jaxDecomp`**: ```bash pip install jaxdecomp ``` This setup uses the JAX backend by default and is ideal for experimentation, development, and most common research workflows. --- ## 2. cuDecomp Backend (Advanced / HPC) If you're working on an HPC cluster and need **MPI-based communication** for large-scale GPU or CPU FFTs, you can build from source with cuDecomp enabled. ### ➤ Install with cuDecomp Make sure your environment provides a **CUDA-aware MPI toolchain**, such as the [NVIDIA HPC SDK](https://developer.nvidia.com/hpc-sdk). ```bash pip install -U pip pip install git+https://github.com/DifferentiableUniverseInitiative/jaxDecomp -Ccmake.define.JD_CUDECOMP_BACKEND=ON ``` If CMake cannot find the NVHPC toolchain, set: ```bash export CMAKE_PREFIX_PATH=$CMAKE_PREFIX_PATH:$NVHPC_ROOT/cmake ``` Then re-run the installation. ### Troubleshooting If JAX complains about incompatibility with cuSparse or any other library, the easiest solution is to install JAX locally using the `cuda-local` option: ```bash pip install --upgrade "jax[cuda-local]" ``` Then proceed with installing `jaxDecomp` with cuDecomp support. > ℹ️ You can read more about cuDecomp setup and tuning at the official [cuDecomp GitHub repo](https://github.com/NVIDIA/cuDecomp). --- ## Machine-Specific Installation Notes ### IDRIS [Jean Zay](http://www.idris.fr/eng/jean-zay/cpu/jean-zay-cpu-hw-eng.html) HPE SGI 8600 supercomputer As of February 2026, loading modules **in this exact order** works: ```bash module load nvidia-compilers/25.1 cuda/12.6.3 openmpi/4.1.6-cuda nccl/2.26.2-1-cuda cudnn cmake # Install JAX pip install --upgrade "jax[cuda-local]" # Install jaxDecomp with cuDecomp export CMAKE_PREFIX_PATH=$NVHPC_ROOT/cmake # sometimes needed pip install git+https://github.com/DifferentiableUniverseInitiative/jaxDecomp -Ccmake.define.JD_CUDECOMP_BACKEND=ON ``` **Note**: If using only the pure-JAX backend, you do not need NVHPC. > **Important for JeanZay users** > Make sure to load the correct architecture module before loading the `nvidia-compilers` module. > For example for A100 you need to load `module load arch/a100` first. > You also need to set the CXXFLAGS to `export CXXFLAGS="-tp=zen2 -noswitcherror"` if you are using the H100 or A100 partition or if you are using AMD CPUs in general. > More info in [Jean Zay documentation](http://www.idris.fr/eng/jean-zay/gpu/jean-zay-gpu-exec_partition_slurm-eng.html#a100_partition_gpu_p5). --- ## Backend Selection at Runtime Most functions in `jaxDecomp` support dynamic backend selection via a `backend` keyword argument. For example: ```python from jaxdecomp.fft import pfft3d # Use the default (pure JAX) k_array = pfft3d(x) # Use cuDecomp (if compiled and available) k_array = pfft3d(x, backend="cudecomp") ``` This applies to: * `jaxdecomp.fft.pfft3d` * `jaxdecomp.fft.pifft3d` * `jaxdecomp.halo_exchange` * (and other `jaxdecomp.fft.*` and transposition routines) --- ## cuDecomp Transpose Communication Backends If you're using the cuDecomp backend, you can also **manually choose the transpose communication strategy**, which may significantly affect performance depending on your cluster hardware and MPI configuration. Available options: ```python from jaxdecomp import ( TRANSPOSE_COMM_NCCL, TRANSPOSE_COMM_MPI_A2A, TRANSPOSE_COMM_MPI_P2P, ) # Set transpose communication backend (default is NCCL) jaxdecomp.config.update('transpose_comm_backend', TRANSPOSE_COMM_NCCL) jaxdecomp.config.update('transpose_comm_backend', TRANSPOSE_COMM_MPI_P2P) jaxdecomp.config.update('transpose_comm_backend', TRANSPOSE_COMM_MPI_A2A) ``` > ℹ️ These options are described in more detail in the [cuDecomp GitHub documentation](https://github.com/NVIDIA/cuDecomp#transpose-communication-backends). --- ## Notes on Performance Backend performance varies widely depending on your cluster setup (e.g., interconnect type, topology, NCCL version, MPI implementation). We recommend benchmarking both backends on your target workload to determine the best configuration.