
About
I'm studying Computer Science at New York University, where I spend way too much time trying to understand how to maximize CUDA kernel FLOPs.
I've worked on low-latency C++ systems at AppLovin and built CUDA primitives for the Llama 3 architecture, including attention and communication kernels.
These days I'm exploring how future GPU architectures might shift the balance between compute and communication: half research, half educated guessing, and occasionally just me staring at PTX until it makes sense. I think I've always been a performance engineer; nothing is ever too early to optimize, and that's also what drew me to software, since its leverage makes any efficiency gain maximally impactful. This site is where I write about what I learn from working on GPU performance and inference engines, and where I try to understand, predict, and hopefully be a part of creating architectural shifts.
You can reach me at pwc2029@nyu.edu or find me on GitHub.
