WECHAT INTERNATIONAL PTE. LTD.

Large Model Algorithm Researcher

Executive · Permanent · 0+ years of experience

Monthly salary

$14,000 – $22,000

Posted

April 6, 2026

Closes April 20, 2026

Skills

TensorFlow, Logical Reasoning, ACL, Artificial Intelligence, Software Engineering, Open Source Development, PyTorch, Research Skills, Feedback Loops, Training, Algorithms, Total Rewards Strategies

Job description

Roles & Responsibilities

  1. Develop core technology for the post-training phase of large language models, building and optimizing a high-quality reward system. Continuously improve the model's complex instruction following, logical reasoning, and value alignment through Reward Modeling (RM) and Reinforcement Learning (RL) algorithms.
  2. Research and optimize post-training algorithms such as RLHF to improve training stability and final model quality.
  3. Manage and synthesize post-training data, design an efficient data feedback loop, use techniques such as SFT and Self-Instruct to generate high-quality training data, and build a closed-loop signal modeling system that runs from user feedback to model iteration.
  4. Evaluate and analyze post-trained models comprehensively, develop rigorous evaluation metrics, follow cutting-edge research, and quickly translate the latest results into business value.

Knowledge & Competencies

  1. Master's degree or higher in Computer Science, Software Engineering, Artificial Intelligence, or related fields.
  2. Deep understanding of the Transformer architecture and the principles of large language model training, with substantial research and practical experience in at least one post-training area such as LLM alignment, RLHF, or reward modeling.
  3. Solid foundation in algorithms and strong engineering implementation skills; proficient in Python and familiar with deep learning frameworks such as PyTorch or TensorFlow.
  4. Practical experience in distributed training and familiarity with large-scale training and inference frameworks such as Megatron-LM, DeepSpeed, and vLLM. Experience training or tuning models with billions to hundreds of billions of parameters is preferred.
  5. Excellent research skills. A record of high-quality publications (NeurIPS, ICLR, ICML, ACL, EMNLP, etc.) or contributions to high-impact open-source projects (e.g., HuggingFace) is preferred.

Strong technical enthusiasm and self-motivation, adept at analyzing and solving complex problems, with good teamwork and communication skills.