Roles & Responsibilities

Lead core technology development for the post-training phase of large language models, building and optimizing a high-quality reward system. Continuously improve the model's complex instruction following, logical reasoning, and value alignment through reward modeling (RM) and reinforcement learning (RL) algorithms.
Research and optimize post-training algorithms such as RLHF to improve training stability and final model performance.
Manage and synthesize post-training data: design an efficient data feedback loop, use techniques such as SFT and Self-Instruct to generate high-quality training data, and build a closed-loop signal modeling system that carries user feedback into model iteration.
Evaluate and analyze post-trained models comprehensively and develop rigorous evaluation metrics. Keep up with cutting-edge research and quickly translate the latest results into business value.

Knowledge & Competencies

Master's degree or higher in Computer Science, Software Engineering, Artificial Intelligence, or related fields.
Deep understanding of the Transformer architecture and the principles of large language model training, with substantial research and practical experience in at least one post-training area such as LLM alignment, RLHF, or reward modeling.
Solid foundation in algorithms and strong engineering implementation skills; proficient in Python and familiar with deep learning frameworks such as PyTorch or TensorFlow.
Practical experience in distributed training and familiarity with large-scale training and inference frameworks such as Megatron-LM, DeepSpeed, and vLLM. Experience training or tuning models with billions or hundreds of billions of parameters is preferred.
Excellent research skills; a record of high-quality publications (NeurIPS, ICLR, ICML, ACL, EMNLP, etc.) or contributions to high-impact open-source projects (e.g., Hugging Face) is preferred.
Strong technical enthusiasm and self-motivation; adept at analyzing and solving complex problems, with good teamwork and communication skills.

Large Model Algorithm Researcher