Neural network development platforms are even more bloated and broken than the previous record-setters, FPGA development platforms and mobile phone development platforms.
For tiny models, the SFT data mixture is unbelievably critical to usability: they are barely able to generalize across formats. If you don't include multi-turn conversations, they will not be able to handle multi-turn conversations. If your multi-turn conversations are only casual chat and your math examples are all single-turn, they will be unable to do math in a multi-turn setting. This is much less true for bigger models.
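As a concrete illustration (a minimal sketch, not any particular training pipeline), the snippet below shows one way to patch such a mixture: wrap single-turn math exchanges inside multi-turn chat conversations so a tiny model actually sees math appearing mid-dialogue. The datasets (`chat_conversations`, `math_single_turn`) and helpers (`embed_math_in_chat`, `build_mixture`) are hypothetical stand-ins, not names from any real library.

```python
import random

# Hypothetical stand-ins for real SFT datasets: each example is a list of
# {"role": ..., "content": ...} messages.
chat_conversations = [
    [
        {"role": "user", "content": "Hi! Any plans this weekend?"},
        {"role": "assistant", "content": "No plans of my own, but I can help you make some."},
        {"role": "user", "content": "Suggest a short day hike near Seattle."},
        {"role": "assistant", "content": "Rattlesnake Ledge is a popular short hike with a great view."},
    ],
]
math_single_turn = [
    [
        {"role": "user", "content": "What is 17 * 24?"},
        {"role": "assistant", "content": "17 * 24 = 408."},
    ],
]


def embed_math_in_chat(chat, math, rng):
    """Splice a single-turn math exchange into a chat conversation so the
    model sees math asked mid-dialogue rather than only in isolation."""
    # Conversations alternate user/assistant starting with user, so even
    # indices are user-turn boundaries; insert the math exchange at one.
    cut = rng.randrange(0, len(chat) // 2 + 1) * 2
    return chat[:cut] + math + chat[cut:]


def build_mixture(chats, maths, n_samples, mixed_fraction=0.5, seed=0):
    """Assemble an SFT mixture of plain chats, plain single-turn math, and
    math exchanges embedded inside multi-turn chats."""
    rng = random.Random(seed)
    mixture = []
    for _ in range(n_samples):
        if rng.random() < mixed_fraction:
            mixture.append(embed_math_in_chat(rng.choice(chats), rng.choice(maths), rng))
        else:
            mixture.append(rng.choice(chats if rng.random() < 0.5 else maths))
    return mixture


if __name__ == "__main__":
    for conv in build_mixture(chat_conversations, math_single_turn, n_samples=3):
        print([m["role"] for m in conv])
```

The specific fractions and splicing rule are arbitrary; the point is only that every capability you want in a multi-turn setting should appear in a multi-turn example somewhere in the mixture.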