Posts

New top story on Hacker News: Show HN: RULER – Easily apply RL to any agent

Show HN: RULER – Easily apply RL to any agent 10 points by kcorbitt | 0 comments on Hacker News. Hey HN, Kyle here, one of the co-founders of OpenPipe. Reinforcement learning is one of the best techniques for making agents more reliable, and has been widely adopted by frontier labs. However, adoption in the outside community has been slow because it's so hard to implement. One of the biggest challenges when adapting RL to a new task is the need for a task-specific "reward function" (a way of measuring success). This is often difficult to define, and generating one requires high-quality labeled data, significant domain expertise, or both. RULER is a drop-in reward function that works across different tasks without any of that complexity. It works by showing N trajectories to an LLM judge and asking it to rank them relative to each other. This sidesteps the calibration issues that plague most LLM-as-judge approaches. Combined with GRPO (which only cares about relative scores wit...
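The post's key claim is that GRPO only uses scores relative to the other trajectories in the same group, so a judge that ranks N trajectories against each other does not need calibrated absolute scores. A minimal sketch of that idea, assuming the judge returns one score per trajectory: each score is normalized within its group (mean subtracted, divided by the standard deviation). The judge is stubbed out with made-up scores, and the function name `group_relative_advantages` is hypothetical, not part of RULER's or OpenPipe's API. This uses the population standard deviation; GRPO implementations differ on that detail.

```python
import statistics
from typing import Sequence


def group_relative_advantages(scores: Sequence[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: z-score each trajectory's reward within its group."""
    if len(scores) < 2:
        raise ValueError("need at least two trajectories per group")
    mean = statistics.fmean(scores)
    std = statistics.pstdev(scores)  # population std; eps guards against all-equal scores
    return [(s - mean) / (std + eps) for s in scores]


# Hypothetical judge scores for N = 4 trajectories of the same task.
judge_scores = [0.9, 0.4, 0.7, 0.2]
advs = group_relative_advantages(judge_scores)

# A "harsher" judge that shifts and compresses every score the same way
# (a miscalibrated but consistent judge) yields nearly identical advantages,
# because only the within-group differences survive normalization.
harsher = [0.5 * s - 0.1 for s in judge_scores]
advs_harsher = group_relative_advantages(harsher)

for a, b in zip(advs, advs_harsher):
    print(f"{a:+.4f}  {b:+.4f}")
```

The two columns differ only by the tiny effect of `eps`, which illustrates why relative ranking sidesteps the calibration problem: any positive rescaling or constant offset applied to the whole group cancels out.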

New top story on Hacker News: Bret Victor on why current trend of AIs is at odds with his work

Bret Victor on why current trend of AIs is at odds with his work 48 points by prathyvsh | 3 comments on Hacker News.

New top story on Hacker News: Google fails to dismiss wiretapping claims on SJ, settles with app users

Google fails to dismiss wiretapping claims on SJ, settles with app users 15 points by 1vuio0pswjnm7 | 1 comment on Hacker News. https://ift.tt/gpqoHzb

New top story on Hacker News: Desktop Publishing Tools That Didn't Make It

Desktop Publishing Tools That Didn't Make It 13 points by rbanffy | 1 comment on Hacker News.

New top story on Hacker News: Bootstrapping a side project into a profitable seven-figure business

Bootstrapping a side project into a profitable seven-figure business 25 points by jonkuipers | 5 comments on Hacker News.

New top story on Hacker News: Brainwash '72 [video]

Brainwash '72 [video] 7 points by petethomas | 2 comments on Hacker News.