Principal Research Software Engineer

Are you interested in developing and optimizing deep learning systems? Are you interested in designing novel technology to accelerate training and serving for cutting-edge models and applications? Do you want to scale large Artificial Intelligence models to their limits on massive supercomputers? Are you interested in being part of an exciting open-source library for deep learning systems? The DeepSpeed team is hiring!

Microsoft’s DeepSpeed is an open-source library built on the PyTorch (machine learning framework) ecosystem that combines numerous research innovations and technology advancements to make deep learning efficient and easier to use. DeepSpeed can parallelize across thousands of GPUs and train models with trillions of parameters. Our OSS (Open Source Software) has powered many advanced models like MT-530B and BLOOM, and it supports unprecedented scale and speed for both training and inference.

The DeepSpeed team is also part of the larger Microsoft AI at Scale initiative, which is pioneering the next-generation AI capabilities that are scaled across the company’s products and AI platforms.

The DeepSpeed team is looking for a Principal Research Software Engineer: a tech lead with a passion for innovation and for building high-quality systems that will make a significant impact inside and outside of Microsoft. Our team is highly collaborative, innovative, and end-user obsessed. We are looking for candidates who have systems skills and are passionate about driving innovations that improve the efficiency and effectiveness of deep learning systems. We value creativity, agility, accountability, and a desire to learn new technologies.

Qualifications:
Required Qualifications:

  • Bachelor’s Degree in Computer Science or a related technical discipline AND 8+ years of technical engineering experience coding in languages including, but not limited to, C++, CUDA, and Python
    • OR equivalent experience.
  • 8+ years of experience in designing and/or building high-performance computing systems.
  • 5+ years of experience with distributed systems.

Preferred Qualifications:

  • Experience in technology leadership in machine learning systems or large-scale distributed systems
  • Experience with deep learning (DL) and familiarity with an existing DL framework (e.g., PyTorch, TensorFlow)
  • Experience with performance analysis and optimization for CPUs and GPUs
  • Experience with different hardware, such as both NVIDIA and AMD GPUs, is a plus
  • History of open source contributions and working with open source communities
  • Ph.D. in Computer Science or related field is desirable
  • Passionate about delivering high-quality software
  • Ability to effectively communicate highly technical concepts and insights to a non-technical audience.

Software Engineering IC6 – The typical base pay range for this role across the U.S. is USD $158,500 – $276,600 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $202,800 – $304,200 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: https://careers.microsoft.com/us/en/us-corporate-pay

Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request via the Accommodation request form.

Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.

Responsibilities:

  • Drive and lead ground-breaking innovations to advance deep learning systems.
  • Drive cutting-edge research prototypes and assist in preparing for production deployments.
  • Discover/solve impactful technical problems, advance the state of the art, and turn ideas into production.
  • Develop and maintain a cutting-edge open-source project to advance massive-scale deep learning.
  • Develop and write concise, robust, and clean code.

Job Category: Software Engineering
Job Type: Full Time/Permanent
Salary: USD 304,200.00 per year
Country: United States
City: Redmond
Career Level: unspecified
Company: Microsoft
Job Source: https://jobs.careers.microsoft.com/global/en/job/1623240/