Would you like to work with hyperscale services to deliver to some of Microsoft’s most critical customers? We’re looking for a Senior Site Reliability Engineering Manager with the right mix of software development, online service experience and drive for quality to envision, design, and deliver Office 365 government cloud service offerings.
Office 365 is at the center of Microsoft’s cloud-first, devices-first strategy as it brings together cloud versions of our most trusted communication and collaboration products like Exchange, SharePoint, and Teams with our cross-platform desktop suites and mobile apps. The Office 365 Enterprise Cloud team works with Microsoft’s largest enterprise and government customers to deliver features that meet their specific needs and enable cloud adoption. As you would expect, our customers have the highest expectations for feature quality, security, reliability, availability, and performance.
The Site Reliability Engineering (SRE) team provides leadership, direction and accountability for application architecture, system design, and end-to-end implementation. As a Senior Site Reliability Engineering Manager, you will identify and deliver software improvements using your expertise in software development, complexity analysis, and scalable system design. Strong collaboration skills will be required to work closely with other engineering teams to ensure services/systems are highly stable and performant, meeting the expectations of our government customers and users.
At Microsoft, we can offer you a strong team, exciting challenges, and a fun place to work. The work environment empowers you to have a positive impact on millions of end users.
Responsibilities:
- Manages a team of Site Reliability Engineers (SREs) using performance and resource monitoring tools to analyze telemetry and identify whether there is a need to optimize system, platform, and/or product code – or if changes to compute resources are required; provides guidance on the use of modeling and analysis tools to verify the efficacy of changes at scale. Facilitates collaboration between SREs and relevant engineering teams to propose solutions that are aligned with customer/business needs.
- Shares insights and best practices that can be applied to improve development and operations across related sets of systems, platforms, and/or products. Continues to develop their understanding of insights and best practices through interactions with more experienced SREs and members of product engineering teams. Mentors and coaches less experienced engineers to help them identify and propose relevant solutions.
- Oversees a team of Site Reliability Engineers (SREs) using existing tools and/or models to identify contributing factors and points of failure affecting availability, reliability, performance, and/or efficiency of systems, platforms, and/or products; provides guidance, recommendations, and feedback to SREs to help them troubleshoot problems and to identify and test scalable solutions that can prevent similar issues from occurring in related products within their organization.
- Demonstrates end-to-end expertise in distributed systems design, interactions between cloud technology layers and components, functions of physical network devices, and dependencies at scale. Drives efforts within an organization to identify and recommend optimal configurations of cloud technology solutions and develops or modifies the code base that defines infrastructures to improve the reliability and operability of supported products.
Qualifications:
Required/Minimum Qualifications
- 6+ years technical experience in software engineering, network engineering, or systems administration
- OR Bachelor’s Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration
- OR Master’s Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration.
Security Clearance Requirements: Candidates must be able to meet Microsoft, customer and/or government security screening requirements that are required for this role. These requirements include, but are not limited to, the following specialized security screenings:
- The successful candidate must have an active U.S. Government Top Secret Clearance with access to Sensitive Compartmented Information (SCI) based on a Single Scope Background Investigation (SSBI) with or without a polygraph, and be willing to upgrade to and maintain a U.S. Government TS/SCI with polygraph. The ability to meet Microsoft, customer and/or government security screening requirements is required for this role. Failure to obtain or maintain the appropriate U.S. Government clearance and/or to meet customer screening requirements may result in employment action up to and including termination.
- Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Clearance Verification: This position requires successful verification of the stated security clearance to meet federal government customer requirements. You will be asked to provide clearance verification information prior to an offer of employment.
Additional or Preferred Qualifications
- 7+ years technical experience in software engineering, network engineering, or systems administration
- OR Bachelor’s Degree in Computer Science, Information Technology, or related field AND 4+ years technical experience in software engineering, network engineering, or systems administration
- OR Master’s Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration
- OR Doctorate Degree in Computer Science, Information Technology, or related field.
- 3+ years technical experience working with large-scale cloud or distributed systems.
- 3+ years people management experience.
- Experience working with large-scale cloud or distributed systems.
- Experience managing a team of engineers.
Site Reliability Engineering M4 – The typical base pay range for this role across the U.S. is USD $112,000 – $218,400 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $145,800 – $238,600 per year.
Site Reliability Engineering M5 – The typical base pay range for this role across the U.S. is USD $133,600 – $256,800 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $173,200 – $282,200 per year.
Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: https://careers.microsoft.com/us/en/us-corporate-pay
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request via the Accommodation request form.
Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.