Quansight engineers have implemented support for tracing through NumPy code via
torch.compile in PyTorch 2.1. This feature leverages PyTorch’s compiler to
generate efficient fused vectorized code without having to modify your original
NumPy code. Even more, it also allows for executing NumPy code on CUDA
just by running it through torch.compile under torch.device(