Senior Cloud Operations and Site Reliability Engineering (SRE) Azure Cloud, Enterprise Cloud Pl[...]
Company: Bank of America
Location: Jersey City
Posted on: November 25, 2022
Job Description: Senior Cloud Operations and Site Reliability
Engineering (SRE) Azure Cloud, Enterprise Cloud Platforms
About Bank of America - Global Technology:
Global Technology delivers
technology services globally across the bank's eight lines of
business that serve individuals, companies, and institutions. The
team also focuses on digital banking, payments, infrastructure,
data management and technology that enhances cyber security, and
risk and capital management. Innovation is at the heart of all
Global Technology does.
Enterprise Cloud Platforms Team:
The Enterprise Cloud Platforms team in the CTO organization offers Private
and Public Cloud platforms that help Bank of America's developers achieve
faster time-to-market, innovate with private and public cloud
capabilities, and reduce complexity with built-in integrations. We
believe in a high-quality engineering culture: we engineer our
platforms with a customer and platform mindset, design for large
enterprise scale and resilience, and accelerate market innovation
into the technical platforms we deliver.
As part of this team, you
will have a large impact on the evolution of next generation Cloud
services for Bank of America and explore an extensive list of new
technologies that will drive innovation across our company.
We are seeking an experienced Senior Cloud Site Reliability Engineer (SRE)
to support and administer our Public Cloud (Azure/AWS/Google) and
Containers (OpenShift) platforms.
Our Cloud Service
Reliability Engineers (cSREs) ensure that our Cloud services meet
the reliability and uptime requirements of our demanding enterprise
customers. This is achieved through the best engineering practices,
resilient design, and a well-defined and effective
global on-call rotation that runs 24x7.
The role provides the opportunity to work with a wide range of
technologies and offers a unique perspective on how various services
(on-prem/external) interact
with each other. You will work with colleagues that are as smart,
hardworking, and driven as you. You will get an opportunity to work
in a team that keeps growing, innovating, and giving you room to be
proactive and creative.
Are you ready for the next step in your career? Then we'd love to hear
from you!
Position Summary:
- Responsible for the reliability and support of the Cloud Platform,
including Public Cloud (Azure/AWS/Google) services.
- Monitor and troubleshoot Azure/AWS/Google environment
performance, connectivity, security, and other issues.
- Perform deep dives into systemic and latent reliability issues,
and handle incident management and problem management.
- Identify, analyze, and resolve infrastructure
vulnerabilities and application deployment issues.
- Perform blameless RCA and partner with engineering and operations
teams across the organization to roll out fixes.
- Identify and drive opportunities to improve automation for the
cloud services; scope and create automation for deployment,
management, and visibility of our services.
- Evaluate and automate scaling and capacity requirements
within Azure environments.
- Engage with engineering teams throughout the full lifecycle,
from design and engineering through deployment and operations.
- Partner with risk and compliance teams to bring visibility and
implement the right controls and policies in the Cloud Platform.
- Ensure resiliency during implementation and identify/fix
resiliency problems by collaborating with engineering teams.
- Be a key stakeholder in the design of cloud services and work
with architecture, engineering, and product teams.
- Participate in 24x7 follow-the-sun on-call coverage.
- BS/MS degree in Computer Science or a related technical field
involving systems, or equivalent practical experience.
- 8+ years of hands-on experience maintaining cloud
platforms on a major cloud service provider.
- Experience working on Azure operations and administration.
- Azure/Terraform/AWS/Google certifications are a plus.
- Strong experience in implementing, monitoring, and maintaining
Microsoft Azure solutions, including major services related to
Compute, Storage, Network and Security
- Experience with monitoring tools such as Prometheus or
Dynatrace, as well as cloud native tools like Azure Monitor and Log
- Understanding of cost management, inventory management, FinOps
- Strong understanding of, and background working with, complex
IAM infrastructure, including Active Directory, Azure AD Connect,
Azure AD, and PingIdentity, Okta, or other SSO solutions.
- Advanced knowledge of DNS, DHCP, Kerberos and Windows
- Experience with IaC with Terraform
- Python, Ansible and shell scripting
- Experience with CI/CD tools such as Git and Jenkins, familiarity
with using a GitOps model
- Excellent understanding of Linux/Windows operating systems
- Systematic problem-solving approach, sense of ownership and
- Excellent interpersonal, organizational and communication
(written, verbal, and presentation) skills are a must.
Desired Job
- Experience in Terraform, Ansible
- Experience working in a highly available multi-datacenter
- Proven ability to work independently with minimal supervision
and as part of a team with direct responsibilities.
- Ability to juggle competing priorities and adapt to changes in
project scope.
Shift: 1st shift (United States of America)
Hours Per