CS + AI/ML student

Devaansh Pathak

AI/ML and systems builder interested in reliable software, thoughtful evaluation, and practical research tools.

I work across LLM agents, reinforcement learning environments, evaluation systems, AI infrastructure, and applied engineering. This site collects my projects, writing, publications, and research notes as they develop.

Profile

Research-minded engineering

I like problems where models, tools, data, and systems meet, especially when behavior has to be measured carefully, not just demoed.

I use this space as a working record of what I am building and learning: research prototypes, software projects, implementation notes, and longer-form writeups. The common thread is a preference for systems that can be inspected, tested, and improved over time.

Interests

Technical interests

A few areas I keep returning to while building projects and reading research.

Reliable LLM systems
Reinforcement learning environments
Evaluation pipelines and benchmarks
AI infrastructure and tooling
Full-stack product engineering
Failure analysis and debugging

Current research thread

SRE-Zero

An environment-grounded benchmark for evaluating reliable tool-using agents in simulated incident-response workflows. The project focuses on sequential decisions, safe tool use, partial evidence, remediation quality, and operational reliability metrics.

LLM Agents
RL Environments
Evaluation
AI Systems
Project page

Writing

Latest blog posts

Research diary entries, project notes, and implementation writeups.

All posts