Submissions from github.com/policy-gradient

		Implementing DeepSeek R1's GRPO algorithm from scratch (github.com/policy-gradient)
		192 points by xcodevn 5 months ago | 3 comments