Clara — Latest Exploration

After a fulfilling summer working in industry, I came back to school wanting to stay connected to the AI field and keep exploring my interest in it. Getting involved in undergraduate research felt like the perfect next step. I wasn't sure exactly how to do this, so I started by cold emailing professors whose work I found interesting. One of them was Professor Rafatirad, who works on reinforcement learning algorithms.

About a week after my original email, with still no response, I remembered that in a previous quarter I had met a student from the same lab during a study group session. I reached out to him and asked how he had gotten his position. His advice was to email the graduate student leading the lab directly. I did, got a response almost immediately, and interviewed that same day. A few hours later I was offered a position in the lab and put on the team working on RL algorithms.

It was a whirlwind of information at first, since I was joining just as the lab was starting its next research focus: the GRPO algorithm. I had to read a lot of papers to get up to speed on the current state of RL algorithms and benchmarks, but after a very long weekend I had caught up with the rest of the lab and we began formulating our plan.

Lately I've been working with two other undergrads in the lab to gather new data that hasn't been used in any math benchmark. The master's students had the grand idea to scrape the Gaokao and JEE exams. The past few weeks have been filled with hunting for datasets, figuring out how best to scrape PDFs, cleaning and formatting the data, and confirming its accuracy. The next step is to run other LLMs on this new dataset and see how it compares with existing benchmarks.

It's been a really eye-opening experience working alongside these highly motivated people to figure out how this all works and how we might improve things, from the data-gathering steps to how we implement the GRPO reward algorithm.