/now
Currently in the lab.
- Researching: Reasoning-model failure modes (7B to 70B)
- Stack: PyTorch · Slurm · 4× GH200
- Side bench: GARCH(1,1) on SPY, walk-forward
- Based in: Dayton, Ohio
- Email: abhijeetguptaphd@gmail.com

Currently exploring
Two disciplines, one practice
LLM evaluation, reinforcement learning robustness, and computer vision pipelines on multi-GPU HPC infrastructure.
Apr 2026
KSE 2024, Published
Options pricing, GARCH volatility modeling, and market-neutral statistical arbitrage.
More →
Currently reading and considering
Manufacturing Automation
Integrating reproducible machine learning and statistical process control into legacy production lines, where the operational bottleneck is rarely the model itself.
AGI
Examining what 'general' substantively means once it is operationalized for measurement. Currently reading critiques of scaling laws and recent work on evaluation design.
LLM Interpretability
Investigating reasoning-model failure modes, family-level differences between Llama and Qwen, and the conditions under which forced re-entry helps or harms performance.
Most recent publication: “Enhancing Sustainability and Construction Safety Research in the Era of Artificial Intelligence,” ASME Journal of Engineering for Sustainable Buildings and Cities, 2026.
View all →
Explore the work
Every project I have shipped, tagged by the technologies it employs. Select a skill to see where it has been applied.
Project explorer · interactive
20 skills · 8 projects · click to filter
Writing
Occasional writing on research, tooling, and learning in public.
All writing →
Jun 12, 2026
A practical guide to running, calibrating, and reporting LLM-as-a-Judge results, covering judge selection, position bias, pairwise vs. scoring setups, and the statistics that actually belong in the paper.
Jun 10, 2026
An annotated bibliography of foundational and recent work in LLM evaluation and reinforcement learning, with notes on why each paper matters in practice.
Apr 20, 2026
Observations from disentangling reasoning length effects from forced re-entry across Llama and Qwen distilled models.
Nov 2, 2025
Small operational habits that yield significant returns when several collaborators share the same GPU resources.
Get in touch