Thank you for this interesting work, which enables a 7B-sized model to achieve such strong results on BIRD as well.
The evaluation criteria of the BIRD leaderboard are quite strict. For example, floating-point results must be returned at full precision, and the returned columns must match what the question asks for in both number and order. These constraints, combined with the limited controllability of LLM outputs, caused a noticeable drop in the results of my own work. Yet OmniSQL-7B achieves an impressive score of 69.04 on the BIRD dev set. May I ask whether your model has addressed these issues? Or, if an external auxiliary step that aligns and corrects the output format were added, would your model improve further (perhaps by a surprising margin)? Please forgive me for not yet being able to run your model fully and verify this myself.
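To make the strictness concrete, here is a minimal sketch assuming an execution-accuracy check that compares predicted and gold result rows as unordered sets of tuples (to my understanding, roughly what the official BIRD evaluation script does). The helper `execution_match`, the `scores` table, and all queries are hypothetical and only illustrate which kinds of differences cause a mismatch.

```python
import sqlite3


def execution_match(predicted_rows, gold_rows):
    """Hypothetical execution-accuracy check (assumed set-of-tuples comparison):
    row order is ignored, but column count, column order, and exact cell
    values (including float precision) must all agree."""
    return set(predicted_rows) == set(gold_rows)


# Toy in-memory database (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (team TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("red", 1), ("red", 2), ("red", 2), ("blue", 4)],
)

# Gold query: the average for "red" is 5/3, a non-terminating float.
gold = conn.execute(
    "SELECT team, AVG(points) FROM scores GROUP BY team ORDER BY team"
).fetchall()

candidates = {
    # Same columns and values, different row order.
    "rows_reordered": "SELECT team, AVG(points) FROM scores GROUP BY team ORDER BY team DESC",
    # Same information, but the two columns are swapped.
    "columns_swapped": "SELECT AVG(points), team FROM scores GROUP BY team",
    # Average rounded to two decimals instead of full precision.
    "rounded_float": "SELECT team, ROUND(AVG(points), 2) FROM scores GROUP BY team",
    # An extra column the question did not ask for.
    "extra_column": "SELECT team, AVG(points), COUNT(*) FROM scores GROUP BY team",
}

results = {
    name: execution_match(conn.execute(sql).fetchall(), gold)
    for name, sql in candidates.items()
}
for name, ok in results.items():
    print(f"{name}: {'match' if ok else 'mismatch'}")
```

Under this assumed check, only `rows_reordered` matches; the other three arguably answer the question but still fail, which is the kind of format-level loss I am asking about.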
I would be very grateful for a reply. Best regards.