A high-performance, zero-overhead, extensible Python compiler with built-in NumPy support
Updated Dec 12, 2025 - Python
Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python with AI-powered optimization proposals
Computations and statistics on manifolds with geometric structures.
Machine Learning library for the emerging Mojo/Python ecosystem
Implementation of a Transformer, but completely in Triton
Fast deterministic all-Python Lennard-Jones particle simulator that utilizes Numba for GPU-accelerated computation.
Python implementation of the Apriori and Eclat algorithms, two of the best-known basic algorithms for mining frequent item sets in a set of transactions.
Boilerplate for GPU-accelerated TensorFlow and PyTorch code on an M1 MacBook
🌟 Vertex Centric approach for building GNN/TGNNs
PyCUDA implementation of forward propagation for Convolutional Neural Networks
Fundamentals of heterogeneous parallel programming with CUDA C/C++ at the beginner level.
Companion code for the bilibili video series "Introduction to CUDA 12.x Parallel Programming (Python Edition)"
VGG16 inference implementation using TensorFlow, NumPy, and PyCUDA
A package to run commands when GPU resources are available
A helper package to easily time Numba CUDA GPU events ⌛
High-performance Triton-based GPU kernels for accelerating core deep learning operations, from matrix multiplication to convolutions and activation functions.
Real-time object detection app using YOLOv5/YOLOv8 with custom UI built from scratch using Pyglet & OpenGL. UI animations made in Adobe After Effects, rendered as GIFs, and integrated via uxElements.py. Multi-core processing enables live capture, detection, and display with low latency. Uses Open Images v7 dataset. Train mode is WIP.
An opinionated, end‑to‑end tutorial project for learning Reinforcement Learning (RL) from first principles to deployment. No notebooks. Everything is an explicit, inspectable Python script you can diff, profile, containerize, and ship.
A Bifrost plug-in for the Tensor-Core Correlator.
CUDA accelerated raytracer using PyCUDA in Python