8 Reliability Engineer jobs in Australia
Site Reliability Engineer, Spanner
Posted 2 days ago
Job Description
**Minimum qualifications:**
+ Bachelor's degree in Computer Science, a related field, or equivalent practical experience.
+ 1 year of experience in coding in one or more of the following programming languages: C, C++, Java, Python, Go.
+ Experience in optimizing code for stability, functionality and scalability (e.g., crawling, search, troubleshooting).
**Preferred qualifications:**
+ Experience in one or more of the following: C++, TypeScript, or Go.
+ Experience in analyzing and troubleshooting large-scale distributed systems.
+ Ability to manage periodic on-call duty as well as out-of-band requests.
Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Google Cloud's services, both our internally critical and our externally visible systems, have reliability and uptime appropriate to customers' needs and a fast rate of improvement. Additionally, SREs keep an ever-watchful eye on our systems' capacity and performance.
Much of our software development focuses on optimizing existing systems, building infrastructure and eliminating work through automation. On the SRE team, you'll have the opportunity to manage the complex challenges of scale which are unique to Google Cloud, while using your expertise in coding, algorithms, complexity analysis and large-scale system design. SRE's culture of intellectual curiosity, problem solving and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate, think big and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while we also strive to create an environment that provides the support and mentorship needed to learn and grow.
**Responsibilities:**
+ Manage Spanner SRE and deliver critical projects.
+ Enable Spanner customers to help themselves with debugging and mitigation.
+ Expand Spanner to serve customers in new ways under new conditions and restrictions.
+ Improve the overall Spanner observability.
Google is proud to be an equal opportunity workplace and is an affirmative action employer. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or Veteran status. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you have a need that requires accommodation, please let us know by completing our Accommodations for Applicants form.
Site Reliability Engineer - SPP

Posted 10 days ago
Job Description
**Do you**
+ know Linux at various levels of diagnostics and troubleshooting?
+ write code to automate repetitive work whenever you face it?
+ smile when you solve an issue in Frankfurt from your laptop in Sydney?
If you answered 'yes' to these questions, we would like to hear from you. Go ahead, hit the Apply button and let's have a chat about your skills and experiences.
**Want to know more about us?**
Now that we have set the pace, keep reading if you want to understand more about the role and the SRE team. We hope it will be helpful.
**Let's start with the role**
**As a Site Reliability Engineer, you will**
+ Provide relief and sustainable resolution to issues within our infrastructure.
+ Use your experience in software development, systems engineering and networking to proactively prevent repeatable issues.
+ Drive initiatives with partner teams to improve the reliability and performance of the infrastructure through improved system design.
+ Drive a culture of intolerance to manual work, resulting in a highly automated environment that delivers scalable solutions.
**_Note:_** _This is a full-time position with a four-day workweek. Working hours are from 11:00 PM to 9:00 AM. Weekend shifts are fixed and will be discussed in detail during the interview process._
**This is what we require. Take note, because these are must-haves:**
+ Knowledge of Linux systems.
+ Coding experience; we normally prefer Python or JavaScript.
+ Networking skills: IP addressing and routing protocols.
+ Monitoring of systems, applications and networks.
+ Uncompromising attention to detail.
**We also have pluses!**
These are not a 'must', but please highlight them on your resume if you have:
+ Experience in cloud architecture or web applications engineering.
+ Experience in database performance, replication, and high availability.
+ A bachelor's or master's degree in a technical area.
**_Note: Australian Citizenship and the capability to obtain a baseline security clearance are requirements for this role._**
**Now a bit about the SRE team**
The SRE team is a group of highly technical engineers who are tasked with maintaining and developing the reliability, scalability and performance of the ServiceNow infrastructure. The SRE is empowered to drive technical resolutions across the technology stack, from hardware through to application and all stops in between. They are also tasked with improving the operability of the platform to reduce both the number of incidents and MTTR.
To accomplish this the team combines software development, networking and systems engineering expertise with a strong desire to be challenged by problems of scale and complexity and to make services better for our customers.
**Work Personas**
We approach our distributed world of work with flexibility and trust. Work personas (flexible, remote, or required in office) are categories that are assigned to ServiceNow employees depending on the nature of their work and their assigned work location. To determine eligibility for a work persona, ServiceNow may confirm the distance between your primary residence and the closest ServiceNow office using a third-party service.
**Equal Opportunity Employer**
ServiceNow is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, creed, religion, sex, sexual orientation, national origin or nationality, ancestry, age, disability, gender identity or expression, marital status, veteran status, or any other category protected by law. In addition, all qualified applicants with arrest or conviction records will be considered for employment in accordance with legal requirements.
**Accommodations**
We strive to create an accessible and inclusive experience for all candidates. If you require a reasonable accommodation to complete any part of the application process, or are unable to use this online application and need an alternative method to apply, please contact for assistance.
**Export Control Regulations**
For positions requiring access to controlled technology subject to export control regulations, including the U.S. Export Administration Regulations (EAR), ServiceNow may be required to obtain export control approval from government authorities for certain individuals. All employment is contingent upon ServiceNow obtaining any export license or other approval that may be required by relevant export control authorities.
Senior Site Reliability Engineer

Posted 10 days ago
Job Description
25WD88723
**Position Overview**
Do you want the opportunity to be part of a startup environment working on a new product seeking to become a world-leading integration platform? Are you looking to be at the forefront of innovative new technology that will ultimately help people imagine, design, and make a better world? If so, come join the Tandem Connect team at Autodesk! Working with the Tandem team, our mission is to create integration technology and solutions that will transform how buildings are designed, built, and operated.
We are seeking a creative Senior Site Reliability Engineer who has experience building and maintaining scalable, reliable and modern cloud services to join our team today.
**Responsibilities**
+ Maintain a secure, scalable and resilient platform that our customers can trust. This includes the implementation of Autodesk and industry best practices and standards
+ Manage and optimise the security, performance, reliability, and scalability of Kubernetes clusters on Amazon EKS
+ Administer and troubleshoot MongoDB Atlas, AWS MemoryDB (Redis), RabbitMQ on Amazon MQ, and Kafka on Amazon MSK.
+ Design, implement and maintain effective monitoring of the platform and associated components
+ Support other teams with the implementation of their infrastructure requirements
+ Contribute to the design and implementation of resilient and scalable architectures, including high availability and disaster recovery strategies
+ Provision and manage infrastructure using Terraform, ensuring meticulous configuration management and documentation
+ Set up and maintain monitoring and logging systems, such as Prometheus, Dynatrace, Amazon CloudWatch and other tools
+ Collaborate with cross-functional teams to resolve complex issues and mentor junior engineers
+ Share your knowledge and learnings with the infrastructure guild
+ Partner closely with the product development, architecture teams and other stakeholders to identify and implement improvements to the product infrastructure and operations
+ Contribute to improvements in processes, tools, and technical methodologies that increase the effectiveness and efficiency of the team in responding to customer and business needs, with an emphasis on having an efficient CI/CD process
+ Provide technical guidance and constructive feedback to team members and stakeholders, which includes writing, reading, and reviewing plans, designs and scripts, and participating in the various technical feedback loops happening within the organisation
+ Contribute to technical product roadmaps
+ On-call support as part of a rostered escalation process
**Minimum Qualifications**
+ BS or MS in computer science, related technology field, or equivalent experience
+ You have at least 7 years of hands-on experience with operating and managing virtual software (with the majority managing containerised workloads) and high-traffic, customer-facing enterprise solutions in production environments
+ Expertise in defining and managing Kubernetes-based workloads that scale
+ Ability to configure and customize Linux-based operating environments based on application needs
+ Strong understanding of TCP/IP and virtual networking technologies, including Kubernetes Network Policies and AWS CloudFront
+ Ability to perform automated testing using Cypress
+ Experience with performing live database upgrades
+ Adept at writing and managing Helm and Terraform scripts using GitOps principles
+ Knowledge of integrating password management systems with Infrastructure as Code
+ Proficient in using bash and Python to integrate with network services
+ Extensive experience with creating customized Docker images
+ Extensive experience with DevOps and DevSecOps-based SDLC practices
+ Good understanding of security principles at the network, server, and container levels
+ In-depth understanding of the software development lifecycle (SDLC)
+ Working experience with MongoDB, Redis, Kafka, RabbitMQ, Vault, Consul and equivalent AWS services, including live data migration with minimal downtime
+ Experience with CI/CD and building deployment pipelines using Jenkins and Rundeck.
+ Experience with running load tests and benchmarking tools
+ Strong written and oral communication skills in English
+ Ability to operate effectively and independently in a dynamic, fluid environment
+ Detail-oriented approach to building secure, stable software
+ Experience with Agile development practices such as Scrum or Kanban
**Preferred Qualifications**
+ Amazon Web Services (AWS) experience.
+ Experience with integration-Platform-as-a-Service (iPaaS) offerings.
+ Ability to read and write Node.js code
+ Experience supporting Kubernetes-based MQTT brokers using the Aedes MQTT software
**Learn More**
**About Autodesk**
Welcome to Autodesk! Amazing things are created every day with our software - from the greenest buildings and cleanest cars to the smartest factories and biggest hit movies. We help innovators turn their ideas into reality, transforming not only how things are made, but what can be made.
We take great pride in our culture here at Autodesk - our Culture Code is at the core of everything we do. Our values and ways of working help our people thrive and realize their potential, which leads to even better outcomes for our customers.
When you're an Autodesker, you can be your whole, authentic self and do meaningful work that helps build a better future for all. Ready to shape the world and your future? Join us!
**Salary transparency**
Salary is one part of Autodesk's competitive compensation package. Offers are based on the candidate's experience and geographic location. In addition to base salaries, we also have a significant emphasis on discretionary annual cash bonuses, commissions for sales roles, stock or long-term incentive cash grants, and a comprehensive benefits package.
**Diversity & Belonging**
We take pride in cultivating a culture of belonging and an equitable workplace where everyone can thrive.
**Are you an existing contractor or consultant with Autodesk?**
Please search for open jobs and apply internally (not on this external site).
Site Reliability Engineer - IBM Cloud Databases
Posted 3 days ago
Job Description
A career in IBM Software means you'll be part of a team that transforms our customers' challenges into solutions.
Seeking new possibilities and always staying curious, we are a team dedicated to creating the world's leading AI-powered, cloud-native software solutions for our customers. Our renowned legacy creates endless global opportunities for our IBMers, so the door is always open for those who want to grow their career.
IBM's product and technology landscape includes Research, Software, and Infrastructure. Entering this domain positions you at the heart of IBM, where growth and innovation thrive.
**Your role and responsibilities**
As a Site Reliability Engineer, you will work in an agile, collaborative environment to build, deploy, configure, and maintain systems for the IBM client business. In this role, you will lead the problem resolution process for our clients, from analysis and troubleshooting, to deploying the latest software updates & fixes.
Your primary responsibilities include:
- 24x7 Observability: Be part of a worldwide team that monitors the health of production systems and services around the clock, ensuring continuous reliability and optimal customer experience.
- Cross-Functional Troubleshooting: Collaborate with engineering teams to provide initial assessments and possible workarounds for production issues. Troubleshoot and resolve production issues effectively.
- Deployment and Configuration: Leverage Continuous Integration and Continuous Delivery (CI/CD) tools to deploy services and configuration changes at enterprise scale.
- Security and Compliance Implementation: Implement security measures that meet or exceed industry standards for regulations such as GDPR, SOC2, ISO 27001, PCI, HIPAA, and FBA.
- Maintenance and Support: Apply Couchbase security patches and upgrades, support Cassandra and MongoDB as part of the on-call (pager duty) rotation, and collaborate with Couchbase Product support on issue resolution.
**Required technical and professional expertise**
- System Monitoring and Troubleshooting: Strong skills in monitoring/observability, issue response, and troubleshooting for optimal system performance.
- Automation Proficiency: Proficiency in automating production environment changes, streamlining processes for efficiency, and reducing toil.
- Operation and Support Experience: Demonstrated experience handling day-to-day operations, alert management, incident support, migration tasks, and break-fix support.
**Preferred technical and professional experience**
- Kubernetes/OpenShift: Strongly preferred experience working with production Kubernetes/OpenShift environments.
- Automation/Scripting: In-depth experience with Ansible, Python, Terraform, and CI/CD tools such as Jenkins, IBM Continuous Delivery, and ArgoCD.
- Monitoring/Observability: Hands-on experience crafting alerts and dashboards using tools such as Instana, New Relic, and Grafana/Prometheus.
IBM is committed to creating a diverse environment and is proud to be an equal-opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, gender, gender identity or expression, sexual orientation, national origin, caste, genetics, pregnancy, disability, neurodivergence, age, veteran status, or other characteristics. IBM is also committed to compliance with all fair employment practices regarding citizenship and immigration status.
Site Reliability Engineer, Geo Surfaces SRE
Posted 3 days ago
Job Description
**Minimum qualifications:**
+ Bachelor's degree or equivalent practical experience.
+ 1 year of experience coding in one or more of the following: C, C++, Java, Go or Python.
**Preferred qualifications:**
+ Bachelor's degree in Computer Science or a related technical field.
+ Experience with algorithms, data structures, complexity analysis, and software design.
+ Interest in designing, analyzing and troubleshooting large-scale distributed systems.
+ Ability to debug and optimize code and automate routine tasks.
+ Ability to create an environment where everyone can succeed.
+ Excellent problem-solving approach, coupled with strong communication skills and a sense of ownership.
Geo Surfaces SRE is dedicated to ensuring Google Maps is reliable, performant, and efficient across iOS, Android, and the Web. We embrace the challenge of safeguarding end-to-end user experiences, from the moment a user opens Maps, through the interplay of client and server Google technologies. What you work on will impact over a billion users globally, shaping the reliability of their daily journeys. We delve into client-side monitoring, influence application architecture, and build automation to proactively ensure a seamless experience on one of Google's most iconic products. If you're excited by complex distributed systems, passionate about user experience, and eager to work on software at an immense scale, come join us!
**Responsibilities:**
+ Collaborate with other engineers to build reliable systems that meet customer needs.
+ Manage end-to-end availability and performance by measuring the entire system and developing automated solutions to improve it.
+ Be involved in the whole lifecycle of services, from inception and design through deployment, operation, and refinement.
+ Work together to deliver the team's Objectives and Key Results (OKRs) and promote reuse and best practices across teams when selecting from different design approaches.
+ Participate in a sustainable on-call incident response team and practice blameless postmortems.
Google is proud to be an equal opportunity workplace and is an affirmative action employer. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or Veteran status. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you have a need that requires accommodation, please let us know by completing our Accommodations for Applicants form.
Site Reliability Engineer, Enterprise Cloud Platforms, Global Technology, Australia

Posted 10 days ago
Job Description
Sydney, Australia
**To proceed with your application, you must be at least 18 years of age.**
**Job Description:**
At Bank of America, we are guided by a common purpose to help make financial lives better through the power of every connection. We do this by driving Responsible Growth and delivering for our clients, teammates, communities and shareholders every day.
Being a Great Place to Work is core to how we drive Responsible Growth. This includes our commitment to being a diverse and inclusive workplace, attracting and developing exceptional talent, supporting our teammates' physical, emotional, and financial wellness, recognizing and rewarding performance, and how we make an impact in the communities we serve.
At Bank of America, you can build a successful career with opportunities to learn, grow, and make an impact. Join us!
**Enterprise Cloud Platforms Team:**
Our team designs, builds, and maintains Public Cloud platforms for Bank of America. We provide our customers an innovative platform with built-in integrations that allow for a faster time-to-market with reduced complexity. We believe in a high-quality engineering culture, a customer-focused mindset, and building for scale and resiliency. As part of this team, you will have a large impact on the evolution of next-generation Cloud services for Bank of America and explore an extensive list of new technologies that will drive innovation across our company.
We are seeking Site Reliability Engineers (SREs) to design, build, and maintain our next-gen platforms. The role provides the opportunity to work with a wide range of technologies and build a unique perspective that comes with integrating disparate services (both on-prem and off-prem) which must interact seamlessly with each other. You will work with colleagues who are fun, smart, hardworking, and driven. You will be part of a global team that is growing, giving you room to innovate and be creative.
**Position Summary**
+ Collaborates with a diverse set of engineers, architects, and teams to design, develop, test, and implement secure, robust, highly available and scalable solutions for BofA's External Cloud Platform
+ Collaborates with other software engineers and teams to design and implement deployment approaches using highly scalable, automated, continuous integration and continuous delivery pipelines.
+ Responsible for all aspects of reliability; collaborates with technical experts, key stakeholders, and team members to resolve complex problems, owning each issue until you are sure it will not recur.
+ Deep understanding of SRE practices, service level indicators, and service level objectives; proactively utilize them to resolve issues before they impact customers.
+ Gather, analyze, synthesize, and develop visualizations and reporting from large, diverse data sets in service of continuous improvement of the platform.
+ Implement infrastructure, configuration, and network as code for the applications and platforms in your remit.
+ Identify opportunities to eliminate toil and automate the triage of issues to improve overall operational stability.
+ Collaborate with a global team to identify, analyze, and resolve platform vulnerabilities.
+ Proactively promotes the adoption of site reliability engineering best practices within the team and organization.
+ Participates in 24x7 on-call coverage in a follow-the-sun model and performs blameless postmortems (RCAs) as needed.
**Required Skills:**
+ 7 years of combined experience in either SRE, software development, or infrastructure engineering (4 years with an advanced degree in Computer Science or related technical field).
+ 3+ years of hands-on experience building and maintaining cloud platforms on a major cloud service provider.
+ Strong experience in implementing, monitoring, and maintaining a highly scalable and resilient Data Services platform on major CSPs such as AWS, Azure, or GCP.
+ Strong experience with monitoring tools such as Grafana, Prometheus, Splunk, or Dynatrace, as well as cloud native tools like CloudWatch & CloudTrail, Azure Monitor and Log Analytics
+ Proficiency in implementing, monitoring, and maintaining a Databricks, RDS, or OpenAI platform.
+ Proficient in at least one programming language such as Python, Java/Spring Boot, or .NET; 5+ years of applied experience in Python/Java.
+ Proficiency in implementing CI/CD pipelines with tools such as git and Jenkins, familiarity with using a GitOps model.
+ Advanced knowledge of networking (firewalls, DNS, Load Balancing, Proxies, etc.)
+ Advanced understanding of Linux & Windows operating systems including shell scripting
+ Excellent interpersonal, organizational and communication (written, verbal, and presentation) skills are a must.
+ Proven ability to work independently with minimal supervision and as part of a global team with direct responsibilities and an ability to juggle competing priorities and adapt to changes in project scope.
**Desired Skills**
+ Strong experience working with a complex IAM infrastructure, including Active Directory, Azure AD Connect, Azure AD, and PingIdentity, Okta, or other SSO solutions.
+ Proficiency in creating automation using Python, Terraform, or Ansible
+ Proficiency in implementing, monitoring, and maintaining a Databricks, CosmosDB, or OpenAI platform.
+ Experience in implementing, monitoring, and maintaining a highly scalable and resilient enterprise platform on Microsoft Azure using native services related to compute, storage, networking, security, and observability.
+ Experience with containerization technologies such as EC2, EKS, Fargate, Openshift, or Kubernetes.
+ Understanding of cost management, inventory management, and the FinOps model.
Bank of America and its affiliates consider for employment and hire qualified candidates without regard to race, religious creed, religion, color, sex, sexual orientation, genetic information, gender, gender identity, gender expression, age, national origin, ancestry, citizenship, protected veteran or disability status or any factor prohibited by law, and as such affirms in policy and practice to support and promote the concept of equal employment opportunity, in accordance with all applicable federal, state, provincial and municipal laws. The company also prohibits discrimination on other bases such as medical condition, marital status or any other factor that is irrelevant to the performance of our teammates.
To view the "Know your Rights" poster, CLICK HERE.
View the LA County Fair Chance Ordinance.
Bank of America aims to create a workplace free from the dangers and resulting consequences of illegal and illicit drug use and alcohol abuse. Our Drug-Free Workplace and Alcohol Policy ("Policy") establishes requirements to prevent the presence or use of illegal or illicit drugs or unauthorized alcohol on Bank of America premises and to provide a safe work environment.
To view Bank of America's Drug-free Workplace and Alcohol Policy, CLICK HERE .
Bank of America is committed to an in-office culture with specific requirements for office-based attendance and which allows for an appropriate level of flexibility for our teammates and businesses based on role-specific considerations. Should you be offered a role with Bank of America, your hiring manager will provide you with information on the in-office expectations associated with your role. These expectations are subject to change at any time and at the sole discretion of the Company. To the extent you have a disability or sincerely held religious belief for which you believe you need a reasonable accommodation from this requirement, you must seek an accommodation through the Bank's required accommodation request process before your first day of work.
This communication provides information about certain Bank of America benefits. Receipt of this document does not automatically entitle you to benefits offered by Bank of America. Every effort has been made to ensure the accuracy of this communication. However, if there are discrepancies between this communication and the official plan documents, the plan documents will always govern. Bank of America retains the discretion to interpret the terms or language used in any of its communications according to the provisions contained in the plan documents. Bank of America also reserves the right to amend or terminate any benefit plan in its sole discretion at any time for any reason.
Senior Site Reliability Engineer, Enterprise Cloud Platforms, Global Technology, Australia
Posted 2 days ago
Job Description
Sydney, Australia
**To proceed with your application, you must be at least 18 years of age.**
**Job Description:**
At Bank of America, we are guided by a common purpose to help make financial lives better through the power of every connection. We do this by driving Responsible Growth and delivering for our clients, teammates, communities and shareholders every day.
Being a Great Place to Work is core to how we drive Responsible Growth. This includes our commitment to being a diverse and inclusive workplace, attracting and developing exceptional talent, supporting our teammates' physical, emotional, and financial wellness, recognizing and rewarding performance, and how we make an impact in the communities we serve.
At Bank of America, you can build a successful career with opportunities to learn, grow, and make an impact. Join us!
**Enterprise Cloud Platforms Team:**
Our team designs, builds, and maintains Public Cloud platforms for Bank of America. We provide our customers an innovative platform with built-in integrations that allow for a faster time-to-market with reduced complexity. We believe in a high-quality engineering culture, a customer-focused mindset, and building for scale and resiliency. As part of this team, you will have a large impact on the evolution of next-generation Cloud services for Bank of America and explore an extensive list of new technologies that will drive innovation across our company.
We are seeking Senior Site Reliability Engineers (SREs) to design, build, and maintain our next-gen platforms. The role provides the opportunity to work with a wide range of technologies and build a unique perspective that comes with integrating disparate services (both on-prem and off-prem) which must interact seamlessly with each other. You will work with colleagues who are fun, smart, hardworking, and driven. You will be part of a global team that is growing, giving you room to innovate and be creative.
**Position Summary**
+ Collaborates with a diverse set of engineers, architects, and teams to design, develop, test, and implement secure, robust, highly available and scalable solutions for BofA's External Cloud Platform
+ Collaborates with other software engineers and teams to design and implement deployment approaches using highly scalable, automated, continuous integration and continuous delivery pipelines.
+ Responsible for all aspects of reliability; collaborates with technical experts, key stakeholders, and team members to resolve complex problems, owning each issue until you are sure it will not recur.
+ Deep understanding of SRE practices, service level indicators, and service level objectives; proactively utilize them to resolve issues before they impact customers.
+ Gather, analyze, synthesize, and develop visualizations and reporting from large, diverse data sets in service of continuous improvement of the platform.
+ Implement infrastructure, configuration, and network as code for the applications and platforms in your remit.
+ Identify opportunities to eliminate toil and automate the triage of issues to improve overall operational stability.
+ Collaborate with a global team to identify, analyze, and resolve platform vulnerabilities.
+ Proactively promotes the adoption of site reliability engineering best practices within the team and organization.
+ Participate in 24x7 on-call coverage under a follow-the-sun model and perform blameless postmortems (RCAs) as needed.
**Required Skills:**
+ 15 years of combined experience in SRE, software development, or infrastructure engineering (10 years with an advanced degree in Computer Science or a related technical field).
+ 7+ years of hands-on experience building and maintaining cloud platforms on a major cloud service provider.
+ Strong experience in implementing, monitoring, and maintaining a highly scalable and resilient Data Services platform on major CSPs such as AWS, Azure, or GCP.
+ Strong experience with monitoring tools such as Grafana, Prometheus, Splunk, or Dynatrace, as well as cloud-native tools such as CloudWatch and CloudTrail, Azure Monitor, and Log Analytics.
+ Proficiency in implementing, monitoring, and maintaining a Databricks, RDS, or OpenAI platform.
+ Proficient in at least one programming language such as Python, Java/Spring Boot, or .NET; 5+ years of applied experience in Python/Java.
+ Proficiency in implementing CI/CD pipelines with tools such as Git and Jenkins, and familiarity with a GitOps model.
+ Advanced knowledge of networking (firewalls, DNS, Load Balancing, Proxies, etc.)
+ Advanced understanding of Linux & Windows operating systems including shell scripting
+ Excellent interpersonal, organizational and communication (written, verbal, and presentation) skills are a must.
+ Proven ability to work independently with minimal supervision and as part of a global team with direct responsibilities and an ability to juggle competing priorities and adapt to changes in project scope.
**Desired Skills**
+ Strong experience working with a complex IAM infrastructure, including Active Directory, Azure AD Connect, and Azure AD, as well as PingIdentity, Okta, or other SSO solutions.
+ Proficiency in creating automation using Python, Terraform, or Ansible
+ Proficiency in implementing, monitoring, and maintaining a Databricks, CosmosDB, or OpenAI platform.
+ Experience in implementing, monitoring, and maintaining a highly scalable and resilient enterprise platform on Microsoft Azure using native services related to compute, storage, networking, security, and observability.
+ Experience with containerization technologies such as ECS, EKS, Fargate, OpenShift, or Kubernetes.
+ Understanding of cost management, inventory management, and the FinOps model.
Bank of America and its affiliates consider for employment and hire qualified candidates without regard to race, religious creed, religion, color, sex, sexual orientation, genetic information, gender, gender identity, gender expression, age, national origin, ancestry, citizenship, protected veteran or disability status or any factor prohibited by law, and as such affirms in policy and practice to support and promote the concept of equal employment opportunity, in accordance with all applicable federal, state, provincial and municipal laws. The company also prohibits discrimination on other bases such as medical condition, marital status or any other factor that is irrelevant to the performance of our teammates.
To view the "Know your Rights" poster, CLICK HERE.
View the LA County Fair Chance Ordinance.
Bank of America aims to create a workplace free from the dangers and resulting consequences of illegal and illicit drug use and alcohol abuse. Our Drug-Free Workplace and Alcohol Policy ("Policy") establishes requirements to prevent the presence or use of illegal or illicit drugs or unauthorized alcohol on Bank of America premises and to provide a safe work environment.
To view Bank of America's Drug-free Workplace and Alcohol Policy, CLICK HERE.
Bank of America is committed to an in-office culture with specific requirements for office-based attendance and which allows for an appropriate level of flexibility for our teammates and businesses based on role-specific considerations. Should you be offered a role with Bank of America, your hiring manager will provide you with information on the in-office expectations associated with your role. These expectations are subject to change at any time and at the sole discretion of the Company. To the extent you have a disability or sincerely held religious belief for which you believe you need a reasonable accommodation from this requirement, you must seek an accommodation through the Bank's required accommodation request process before your first day of work.
This communication provides information about certain Bank of America benefits. Receipt of this document does not automatically entitle you to benefits offered by Bank of America. Every effort has been made to ensure the accuracy of this communication. However, if there are discrepancies between this communication and the official plan documents, the plan documents will always govern. Bank of America retains the discretion to interpret the terms or language used in any of its communications according to the provisions contained in the plan documents. Bank of America also reserves the right to amend or terminate any benefit plan in its sole discretion at any time for any reason.
Senior Site Reliability Engineer, Enterprise Cloud Platforms, Global Technology, Australia

Posted 10 days ago
Job Description
Sydney, Australia
**To proceed with your application, you must be at least 18 years of age.**
**Job Description:**
At Bank of America, we are guided by a common purpose to help make financial lives better through the power of every connection. We do this by driving Responsible Growth and delivering for our clients, teammates, communities and shareholders every day.
Being a Great Place to Work is core to how we drive Responsible Growth. This includes our commitment to being a diverse and inclusive workplace, attracting and developing exceptional talent, supporting our teammates' physical, emotional, and financial wellness, recognizing and rewarding performance, and how we make an impact in the communities we serve.
At Bank of America, you can build a successful career with opportunities to learn, grow, and make an impact. Join us!
**Enterprise Cloud Platforms Team:**
Our team designs, builds, and maintains Public Cloud platforms for Bank of America. We provide our customers an innovative platform with built-in integrations that allow for a faster time-to-market with reduced complexity. We believe in a high-quality engineering culture, a customer-focused mindset, and building for scale and resiliency. As part of this team, you will have a large impact on the evolution of next-generation Cloud services for Bank of America and explore an extensive list of new technologies that will drive innovation across our company.
We are seeking Senior Site Reliability Engineers (SREs) to design, build, and maintain our next-gen platforms. The role provides the opportunity to work with a wide range of technologies and to build the unique perspective that comes from integrating disparate services (both on-prem and off-prem) that must interact seamlessly with each other. You will work with colleagues who are fun, smart, hardworking, and driven. You will be part of a global team that is growing, giving you room to innovate and be creative.
**Position Summary**
+ Collaborates with a diverse set of engineers, architects, and teams to design, develop, test, and implement secure, robust, highly available and scalable solutions for BofA's External Cloud Platform
+ Collaborates with other software engineers and teams to design and implement deployment approaches using highly scalable, automated continuous integration and continuous delivery (CI/CD) pipelines.
+ Responsible for all aspects of reliability; collaborates with technical experts, key stakeholders, and team members to resolve complex problems, owning each issue until you are sure it will not recur.
+ Deep understanding of SRE practices, service level indicators, and service level objectives; proactively utilize them to resolve issues before they impact customers.
+ Gather, analyze, synthesize, and develop visualizations and reporting from large, diverse data sets in service of continuous improvement of the platform.
+ Implement infrastructure, configuration, and network as code for the applications and platforms in your remit.
+ Identify opportunities to eliminate toil and automate the triage of issues to improve overall operational stability.
+ Collaborate with a global team to identify, analyze, and resolve platform vulnerabilities.
+ Proactively promotes the adoption of site reliability engineering best practices within the team and organization.
+ Participate in 24x7 on-call coverage under a follow-the-sun model and perform blameless postmortems (RCAs) as needed.
**Required Skills:**
+ 15 years of combined experience in SRE, software development, or infrastructure engineering (10 years with an advanced degree in Computer Science or a related technical field).
+ 7+ years of hands-on experience building and maintaining cloud platforms on a major cloud service provider.
+ Strong experience in implementing, monitoring, and maintaining a highly scalable and resilient Data Services platform on major CSPs such as AWS, Azure, or GCP.
+ Strong experience with monitoring tools such as Grafana, Prometheus, Splunk, or Dynatrace, as well as cloud-native tools such as CloudWatch and CloudTrail, Azure Monitor, and Log Analytics.
+ Proficiency in implementing, monitoring, and maintaining a Databricks, RDS, or OpenAI platform.
+ Proficient in at least one programming language such as Python, Java/Spring Boot, or .NET; 5+ years of applied experience in Python/Java.
+ Proficiency in implementing CI/CD pipelines with tools such as Git and Jenkins, and familiarity with a GitOps model.
+ Advanced knowledge of networking (firewalls, DNS, Load Balancing, Proxies, etc.)
+ Advanced understanding of Linux & Windows operating systems including shell scripting
+ Excellent interpersonal, organizational and communication (written, verbal, and presentation) skills are a must.
+ Proven ability to work independently with minimal supervision and as part of a global team with direct responsibilities and an ability to juggle competing priorities and adapt to changes in project scope.
**Desired Skills**
+ Strong experience working with a complex IAM infrastructure, including Active Directory, Azure AD Connect, and Azure AD, as well as PingIdentity, Okta, or other SSO solutions.
+ Proficiency in creating automation using Python, Terraform, or Ansible
+ Proficiency in implementing, monitoring, and maintaining a Databricks, CosmosDB, or OpenAI platform.
+ Experience in implementing, monitoring, and maintaining a highly scalable and resilient enterprise platform on Microsoft Azure using native services related to compute, storage, networking, security, and observability.
+ Experience with containerization technologies such as ECS, EKS, Fargate, OpenShift, or Kubernetes.
+ Understanding of cost management, inventory management, and the FinOps model.