11 Reliability Engineer jobs in Australia
Reliability Engineer - Mechanical

Posted 1 day ago
Job Description
We work together to transform essential resources into critical ingredients for mobility, energy, connectivity and health. Join our values-led organization committed to building a more resilient world with people and planet in mind. Our core values are the foundation that makes us successful for ourselves, our customers and the planet.
Albemarle is hiring for a Mechanical Reliability Engineer. This position is located on site at the Kemerton Processing Plant.
**What You Will Do**
+ Act as primary lead in developing, reviewing and improving maintenance and spares strategies that drive the safe, reliable operation of production equipment, helping to deliver required plant performance while reducing overall costs per tonne.
+ Participate in maintenance planning and, drawing on plant knowledge, help predict major maintenance requirements; develop KPIs and report on them.
+ Assist in the management and implementation of the CMMS architecture in line with site standards and in light of best practice.
+ Provide mechanical engineering support of refinery static and rotating equipment in the form of equipment/material selection, design calculations, fitness for service reviews, repair methods, mechanical design approval, and equipment specification with the aid of and adherence to relevant codes and standards including OSHA 1984, AS 3788, API 579, ASME PCC-2, etc.
+ Maintain and foster collaboration with all departments to ensure success in the role.
+ Ensure relevant equipment is operated and maintained according to applicable standards and that statutory compliance is adhered to.
+ Lead activities/projects that focus on reliability and cost improvement of poor performing assets.
+ Assist with project implementation to ensure the reliability and maintainability of new and modified installations is in line with plant needs.
+ Participate in the development of design and installation specifications.
+ Support, utilise, and follow the site management of change process when required, to ensure changes are adequately risk assessed and required updates/changes are correctly implemented.
+ Provide input to Risk Management processes, including management of change, which helps to anticipate reliability-related and non-reliability-related risks that could adversely impact plant operation.
**What You Bring**
**Required:**
+ Tertiary level qualification in Mechanical Engineering or other relevant discipline coupled with a minimum of five years of post-graduate reliability and/or maintenance engineering experience.
+ Working understanding of asset management practices and experience in the development of effective maintenance strategies.
+ Excellent problem solving, investigative, and data analysis skills.
+ A sound understanding of available NDT and condition monitoring tools and equipment, and how they can be utilised to support reliability improvement and plant management.
**Preferred:**
+ Postgraduate study in best-practice RCM techniques and relevant equipment.
**Benefits of Joining Albemarle**
+ Competitive compensation
+ Comprehensive benefits package
+ A diverse array of resources to support you professionally and personally.
We are partners to one another in pioneering new ways to be better for ourselves, our teams, and our communities. When you join Albemarle, you become our most essential element and you can anticipate competitive compensation, a comprehensive benefits package, and resources that foster your well-being and fuel your personal growth. Help us shape the future, build with purpose and grow together.
Associate Site Reliability Engineer

Posted 1 day ago
Job Description
As an Associate Site Reliability Engineer, you will support the reliability, scalability, and performance of our applications in SEAu. This entry-level role is ideal for candidates with foundational experience in software engineering and/or system operations who are eager to grow in a high-impact, collaborative DevSecOps environment.
**What you'll be doing**
Key Responsibilities:
+ Assist in monitoring and maintaining production systems and services.
+ Support incident response efforts and contribute to root cause analysis.
+ Participate in automating deployment, monitoring, and operational tasks.
+ Collaborate with development and QA teams to support new feature rollouts.
+ Contribute to documentation of operational procedures and runbooks.
+ Learn and apply best practices in system reliability, observability, and performance tuning.
**What you bring**
To succeed in the role, you will have:
+ Relevant years of experience in software engineering, DevOps, or IT operations (internships or academic projects acceptable).
+ Familiarity with basic shell scripting.
+ Exposure to cloud platforms (e.g., Azure, AWS)
+ Basic knowledge of programming languages such as Python, Go, or Java.
+ Understanding of CI/CD pipelines and version control systems (e.g., Git).
+ Strong problem-solving and communication skills.
+ Bachelor's degree in Computer Science, Engineering, or a related field.
Desirable:
+ Exposure to monitoring tools (e.g., Dynatrace, Elastic, Grafana).
+ Understanding of networking fundamentals and distributed systems.
+ Interest in automation, self-healing, infrastructure as code, and SRE principles.
**What we offer**
You bring your skills and experience to Shell and in return you work with talented, committed people on one of the most important challenges facing our planet. You'll have the opportunity to develop the skills you need to grow in an environment where we value honesty, integrity, and respect for one another. You'll be able to balance your priorities as you become the best version of yourself.
+ Progress as a person as we work on the energy transition together.
+ Continuously grow the transferable skills you need to get ahead.
+ Work at the forefront of technology, trends, and practices.
+ Collaborate with experienced colleagues with unique expertise.
+ Achieve your balance in a values-led culture that encourages you to be the best version of yourself.
+ Benefit from flexible working hours, and the possibility of remote/mobile working.
+ Perform at your best with a competitive starting salary and annual performance-related salary increase; our pay and benefits packages are considered to be among the best in the world.
+ Take advantage of paid parental leave, including for non-birthing parents.
+ Join an organisation working to become one of the most diverse and inclusive in the world. We strongly encourage applicants of all genders, ages, ethnicities, cultures, abilities, sexual orientation, and life experiences to apply.
+ Grow as you progress through diverse career opportunities in national and international teams.
+ Gain access to a wide range of training and development programmes.
Note: We are keen to support flexible working arrangements, subject to local regulations and legislative frameworks. If this is of interest to you, please describe in your application the type of flexible working arrangements for which you would like to be considered (e.g., part-time, job share).
We'd like you to know that Shell has a bold goal: to become one of the world's most diverse and inclusive companies. To learn more about how we're working towards that goal, click here.
We are committed to attracting a broader and more diverse pool of candidates. If this position doesn't feel like the perfect fit for your qualifications right now, we'd still love to hear from you. Consider creating a profile in our Talent Community so we can keep you in mind for future opportunities that may align with your skills.
**Shell in Australia**
Shell has operated in Australia since 1901. Its history ranges from operating Australia's first oil refinery, which was central to meeting Australia's fuel needs, to fuelling the first Qantas commercial flight in the 1920s and playing a foundation role in building some of Australia's largest and most innovative natural resource developments.
Throughout this 124-year relationship, the needs of our customers and the nation have changed, and we have continued transforming our portfolio to meet these needs. Today, we are a leading natural gas producer and are playing our part in the transition to a low-carbon future by investing in the power sector, renewable energy sources and carbon abatement activities.
Shell has a significant Liquefied Natural Gas (LNG) business in Australia that makes a valuable contribution to today's energy supply. This integrated gas portfolio includes our two Shell-operated gas production and liquefaction businesses, Shell QGC in Queensland and Prelude Floating LNG offshore in Western Australia, and our joint venture interests in Gorgon and North West Shelf in Western Australia and Arrow Energy in Queensland.
Today, Shell's portfolio in Australia also includes zero- and low-carbon energy businesses such as commercial and industrial retailer Shell Energy; carbon farming specialist Select Carbon; the 120MW Gangarri solar development; residential energy retailer Powershop Australia; a 49% stake in WestWind Australia; a 50% share of Kondinin Energy; and several grid-scale Battery Energy Storage Solutions projects. High-quality Shell-branded fuels and lubricants are available right across Australia through an exclusive brand licence arrangement with Viva Energy.
Site Reliability Engineer - SPP

Posted 1 day ago
Job Description
**Do you**
+ know Linux in various levels of diagnostics and troubleshooting?
+ write code to automate repetitive tasks every time you face repetitive work?
+ smile when you solve an issue in Frankfurt from your laptop in Sydney?
Answer 'yes' to these questions and we would like to hear from you. Go ahead, hit the Apply button and let's have a chat about your skills and experiences.
**Want to know more about us?**
Now that we have set the pace, keep reading if you want to understand more about the role and the SRE team. We hope it will be helpful.
**Let's start with the role**
**As a Site Reliability Engineer, you will**
+ Provide relief and sustainable resolution to issues within our infrastructure.
+ Use your experience in software development, systems engineering and networking to proactively prevent repeatable issues.
+ Drive initiatives with partner teams to improve the reliability and performance of the infrastructure through improved system design.
+ Drive a culture of intolerance to manual activity which results in a highly automated environment delivering scalable solutions.
**_Note:_** _This is a full-time position with a four-day workweek. Working hours are from 11:00 PM to 9:00 AM. Weekend shifts are fixed and will be discussed in detail during the interview process._
**This is what we require. Take note, because these are must-haves:**
+ Knowledge of Linux systems.
+ Coding experience (we normally prefer Python or JavaScript).
+ Networking skills: IP addressing and routing protocols.
+ Monitoring of systems, applications and networks.
+ Uncompromising attention to detail.
**We also have pluses!**
These are not a 'must', but please highlight them on your resume if you have:
+ Experience in cloud architecture or web applications engineering.
+ Experience in database performance, replication and high availability.
+ A bachelor's or master's degree in a technical area.
**_Note: Australian citizenship and the capability to obtain a baseline security clearance are requirements for this role._**
**Now a bit about the SRE team**
The SRE team is a group of highly technical engineers who are tasked with maintaining and developing the reliability, scalability and performance of the ServiceNow infrastructure. The SRE is empowered to drive technical resolutions across the technology stack from hardware through to application and all stops in between. They are also tasked with driving forward the operability of the platform to drive down the number of incidents and to reduce MTTR.
To accomplish this the team combines software development, networking and systems engineering expertise with a strong desire to be challenged by problems of scale and complexity and to make services better for our customers.
**Work Personas**
We approach our distributed world of work with flexibility and trust. Work personas (flexible, remote, or required in office) are categories that are assigned to ServiceNow employees depending on the nature of their work and their assigned work location. Learn more here. To determine eligibility for a work persona, ServiceNow may confirm the distance between your primary residence and the closest ServiceNow office using a third-party service.
**Equal Opportunity Employer**
ServiceNow is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, creed, religion, sex, sexual orientation, national origin or nationality, ancestry, age, disability, gender identity or expression, marital status, veteran status, or any other category protected by law. In addition, all qualified applicants with arrest or conviction records will be considered for employment in accordance with legal requirements.
**Accommodations**
We strive to create an accessible and inclusive experience for all candidates. If you require a reasonable accommodation to complete any part of the application process, or are unable to use this online application and need an alternative method to apply, please contact for assistance.
**Export Control Regulations**
For positions requiring access to controlled technology subject to export control regulations, including the U.S. Export Administration Regulations (EAR), ServiceNow may be required to obtain export control approval from government authorities for certain individuals. All employment is contingent upon ServiceNow obtaining any export license or other approval that may be required by relevant export control authorities.
From Fortune. ©2025 Fortune Media IP Limited. All rights reserved. Used under license.
Site Reliability Engineer, Spanner

Posted 1 day ago
Job Description
Minimum qualifications:
+ Bachelor's degree in Computer Science, a related field, or equivalent practical experience.
+ 1 year of experience in coding in one or more of the following programming languages: C, C++, Java, Python, Go.
+ Experience in optimizing code for stability, functionality and scalability (e.g., crawling, search, troubleshooting).
Preferred qualifications:
+ Experience in one or more of the following: C++, TypeScript, or Go.
+ Experience in analyzing and troubleshooting large-scale distributed systems.
+ Ability to manage periodic on-call duty as well as out-of-band requests.
Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Google Cloud's services, both our internally critical and our externally visible systems, have reliability and uptime appropriate to customers' needs and a fast rate of improvement. Additionally, SREs keep an ever-watchful eye on our systems' capacity and performance.
Much of our software development focuses on optimizing existing systems, building infrastructure and eliminating work through automation. On the SRE team, you'll have the opportunity to manage the complex challenges of scale which are unique to Google Cloud, while using your expertise in coding, algorithms, complexity analysis and large-scale system design. SRE's culture of intellectual curiosity, problem solving and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate, think big and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while we also strive to create an environment that provides the support and mentorship needed to learn and grow.
+ Manage Spanner SRE and deliver critical projects.
+ Oversee efforts to help Spanner customers help themselves with debugging and mitigation.
+ Expand Spanner to serve customers in new ways under new conditions and restrictions.
+ Improve the overall Spanner observability.
Google is proud to be an equal opportunity workplace and is an affirmative action employer. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or Veteran status. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you have a need that requires accommodation, please let us know by completing our Accommodations for Applicants form.
Senior Site Reliability Engineer

Posted 1 day ago
Job Description
25WD88723
**Position Overview**
Do you want the opportunity to be part of a startup environment working on a new product seeking to become a world-leading integration platform? Are you looking to be at the forefront of innovative new technology that will ultimately help people imagine, design, and make a better world? If so, come join the Tandem Connect team at Autodesk! Working with the Tandem team, our mission is to create integration technology and solutions that will transform how buildings are designed, built, and operated.
We are seeking a creative Senior Site Reliability Engineer who has experience building and maintaining scalable, reliable and modern cloud services to join our team today.
**Responsibilities**
+ Maintain a secure, scalable and resilient platform that our customers can trust. This includes the implementation of Autodesk and industry best practices and standards
+ Manage and optimise the security, performance, reliability, and scalability of Kubernetes clusters on Amazon EKS
+ Administer and troubleshoot MongoDB Atlas, AWS MemoryDB (Redis), RabbitMQ on Amazon MQ, and Kafka on Amazon MSK.
+ Design, implement and maintain effective monitoring of the platform and associated components
+ Support other teams with the implementation of their infrastructure requirements
+ Contribute to the design and implementation of resilient and scalable architectures, including high availability and disaster recovery strategies
+ Provision and manage infrastructure using Terraform, ensuring meticulous configuration management and documentation
+ Set up and maintain monitoring and logging systems, such as Prometheus, Dynatrace, Amazon CloudWatch and other tools
+ Collaborate with cross-functional teams to resolve complex issues and mentor junior engineers
+ Share your knowledge and learnings with the infrastructure guild
+ Partner closely with the product development, architecture teams and other stakeholders to identify and implement improvements to the product infrastructure and operations
+ Contribute to improvements in processes, tools, and technical methodologies that increase the effectiveness and efficiency of the team in responding to customer and business needs, with an emphasis on having an efficient CI/CD process
+ Provide technical guidance and constructive feedback to team members and stakeholders, which includes writing, reading, and reviewing plans, designs and scripts, and participating in the various technical feedback loops happening within the organisation
+ Contribute to technical product roadmaps
+ On-call support as part of a rostered escalation process
**Minimum Qualifications**
+ BS or MS in computer science, related technology field, or equivalent experience
+ You have at least 7 years of hands-on experience with operating and managing virtual software (with the majority managing containerised workloads) and high traffic customer-facing enterprise solutions in production environments
+ Expertise in defining and managing Kubernetes-based workloads that scale
+ Ability to configure and customize Linux-based operating environments based on application needs
+ Strong understanding of TCP/IP and virtual networking technologies, including Kubernetes Network Policies and Amazon CloudFront
+ Ability to perform automated testing using Cypress
+ Experience with performing live database upgrades
+ Adept at writing and managing Helm and Terraform scripts using GitOps principles
+ Knowledge of integrating password management systems with Infrastructure as Code
+ Proficient in using bash and Python to integrate with network services
+ Extensive experience with creating customized Docker images
+ Extensive experience with DevOps and DevSecOps-based SDLC practices
+ Good understanding of security principles at the network, server, and container levels
+ In-depth understanding of the software development lifecycle (SDLC)
+ Working experience with MongoDB, Redis, Kafka, RabbitMQ, Vault, Consul and equivalent AWS services, including live data migration with minimal downtime
+ Experience with CI/CD and building deployment pipelines using Jenkins and Rundeck.
+ Experience with running load tests and benchmarking tools
+ Strong written and oral communication skills in English
+ Ability to operate effectively and independently in a dynamic, fluid environment
+ Detail-oriented approach to building secure, stable software
+ Experience with Agile development practices such as Scrum or Kanban
**Preferred Qualifications**
+ Amazon Web Services (AWS) experience.
+ Experience with integration-Platform-as-a-Service (iPaaS) offerings.
+ Ability to read and write Node.js code
+ Experience supporting Kubernetes-based MQTT brokers using the Aedes MQTT software
**Learn More**
**About Autodesk**
Welcome to Autodesk! Amazing things are created every day with our software - from the greenest buildings and cleanest cars to the smartest factories and biggest hit movies. We help innovators turn their ideas into reality, transforming not only how things are made, but what can be made.
We take great pride in our culture here at Autodesk - our Culture Code is at the core of everything we do. Our values and ways of working help our people thrive and realize their potential, which leads to even better outcomes for our customers.
When you're an Autodesker, you can be your whole, authentic self and do meaningful work that helps build a better future for all. Ready to shape the world and your future? Join us!
**Salary transparency**
Salary is one part of Autodesk's competitive compensation package. Offers are based on the candidate's experience and geographic location. In addition to base salaries, we also have a significant emphasis on discretionary annual cash bonuses, commissions for sales roles, stock or long-term incentive cash grants, and a comprehensive benefits package.
**Diversity & Belonging**
We take pride in cultivating a culture of belonging and an equitable workplace where everyone can thrive. Learn more here.
**Are you an existing contractor or consultant with Autodesk?**
Please search for open jobs and apply internally (not on this external site).
Site Reliability Engineer - SPP
Posted today
Job Description
== ServiceNow ==
Role Seniority - mid level
More about the Site Reliability Engineer - SPP role at ServiceNow
Company Description
It all started in sunny San Diego, California in 2004 when a visionary engineer, Fred Luddy, saw the potential to transform how we work. Fast forward to today — ServiceNow stands as a global market leader, bringing innovative AI-enhanced technology to over 8,100 customers, including 85% of the Fortune 500®. Our intelligent cloud-based platform seamlessly connects people, systems, and processes to empower organizations to find smarter, faster, and better ways to work. But this is just the beginning of our journey. Join us as we pursue our purpose to make the world work better for everyone.
Job Description
Do you
know Linux in various levels of diagnostics and troubleshooting?
write code to automate repetitive tasks every time you face repetitive work?
smile when you solve an issue in Frankfurt from your laptop in Sydney?
Answer 'yes' to these questions and we would like to hear from you. Go ahead, hit the Apply button and let's have a chat about your skills and experiences.
Want to know more about us?
Now that we have set the pace, keep reading if you want to understand more about the role and the SRE team. We hope it will be helpful.
Let’s start with the role
As a Site Reliability Engineer, you will
Provide relief and sustainable resolution to issues within our infrastructure.
Use your experience in software development, systems engineering and networking to proactively prevent repeatable issues.
Drive initiatives with partner teams to improve the reliability and performance of the infrastructure through improved system design.
Drive a culture of intolerance to manual activity which results in a highly automated environment delivering scalable solutions.
Note: This is a full-time position with a four-day workweek. Working hours are from 11:00 PM to 9:00 AM. Weekend shifts are fixed and will be discussed in detail during the interview process.
Qualifications
This is what we require. Take note, because these are must-haves:
Knowledge of Linux systems.
Coding experience (we normally prefer Python or JavaScript).
Networking skills: IP addressing and routing protocols.
Monitoring of systems, applications and networks.
Uncompromising attention to detail.
We also have pluses!
These are not a 'must', but please highlight them on your resume if you have:
Experience in cloud architecture or web applications engineering.
Experience in database performance, replication and high availability.
A bachelor's or master's degree in a technical area.
Note: Australian citizenship and the capability to obtain a baseline security clearance are requirements for this role.
Now a bit about the SRE team
The SRE team is a group of highly technical engineers who are tasked with maintaining and developing the reliability, scalability and performance of the ServiceNow infrastructure. The SRE is empowered to drive technical resolutions across the technology stack from hardware through to application and all stops in between. They are also tasked with driving forward the operability of the platform to drive down the number of incidents and to reduce MTTR.
To accomplish this the team combines software development, networking and systems engineering expertise with a strong desire to be challenged by problems of scale and complexity and to make services better for our customers.
Additional Information
Work Personas
We approach our distributed world of work with flexibility and trust. Work personas (flexible, remote, or required in office) are categories that are assigned to ServiceNow employees depending on the nature of their work and their assigned work location. Learn more here. To determine eligibility for a work persona, ServiceNow may confirm the distance between your primary residence and the closest ServiceNow office using a third-party service.
Equal Opportunity Employer
ServiceNow is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, creed, religion, sex, sexual orientation, national origin or nationality, ancestry, age, disability, gender identity or expression, marital status, veteran status, or any other category protected by law. In addition, all qualified applicants with arrest or conviction records will be considered for employment in accordance with legal requirements.
Accommodations
We strive to create an accessible and inclusive experience for all candidates. If you require a reasonable accommodation to complete any part of the application process, or are unable to use this online application and need an alternative method to apply, please contact for assistance.
Export Control Regulations
For positions requiring access to controlled technology subject to export control regulations, including the U.S. Export Administration Regulations (EAR), ServiceNow may be required to obtain export control approval from government authorities for certain individuals. All employment is contingent upon ServiceNow obtaining any export license or other approval that may be required by relevant export control authorities.
From Fortune. ©2025 Fortune Media IP Limited. All rights reserved. Used under license.
Before we jump into the responsibilities of the role: no matter what you come in knowing, you’ll be learning new things all the time, and the ServiceNow team will be there to support your growth.
Please consider applying even if you don't meet 100% of what’s outlined.
Key Responsibilities
- Providing sustainable resolutions
- Proactively preventing issues
- Driving initiatives
Key Strengths
- Linux systems
- Coding experience
- Networking skills
- Cloud architecture
- Database performance
- Technical degree
Why is ServiceNow partnering with Hatch on this role? Hatch exists to level the playing field for people as they discover a career that’s right for them. So when you apply, you have the chance to show more than just your resume.
A Final Note: This is a role with ServiceNow not with Hatch.
Site Reliability Engineer, Google Play

Posted 1 day ago
Job Description
Minimum qualifications:
+ Bachelor's degree in Computer Science, a related field, or equivalent practical experience.
+ 5 years of experience in Unix/Linux systems, Internet Protocol networking, performance and application issues.
+ 5 years of experience programming in one or more of the following languages: C, C++, Java, Python, Go, Perl, or Ruby.
+ 5 years of experience in distributed systems or infrastructure designing.
+ 5 years of experience in troubleshooting and debugging distributed systems.
Preferred qualifications:
+ Excellent communication skills.
Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Google's services, both our internally critical and our externally visible systems, have reliability and uptime appropriate to users' needs and a fast rate of improvement. Additionally, SREs keep an ever-watchful eye on our systems' capacity and performance.
Much of our software development focuses on optimizing existing systems, building infrastructure and eliminating work through automation. On the SRE team, you'll have the opportunity to manage the complex challenges of scale which are unique to Google, while using your expertise in coding, algorithms, complexity analysis and large-scale system design.
SRE's culture of intellectual curiosity, problem solving and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate, think big and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while we also strive to create an environment that provides the support and mentorship needed to learn and grow.
To learn more, check out our books on Site Reliability Engineering or read a career profile about why a Software Engineer chose to join SRE.
Behind everything our users see online is the architecture built by the Technical Infrastructure team to keep it running. From developing and maintaining our data centers to building the next generation of Google platforms, we make Google's product portfolio possible. We're proud to be our engineers' engineers and love voiding warranties by taking things apart so we can rebuild them. We keep our networks up and running, ensuring our users have the best and fastest experience possible.
+ Own availability and performance for some of Google Play's key products, and be responsible for ensuring an excellent user experience for global users while supporting change.
+ Oversee production support for Google Play games related services.
+ Design solutions to make the Google Play games related services more resistant to failure.
+ Grow our support to handle the new and evolving product features.
+ Provide tools/training/consultation to development teams taking on new production responsibilities.
Google is proud to be an equal opportunity workplace and is an affirmative action employer. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or Veteran status. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you have a need that requires accommodation, please let us know by completing our Accommodations for Applicants form.
Site Reliability Engineer, Enterprise Cloud Platforms, Global Technology, Australia

Posted 1 day ago
Job Description
Sydney, Australia
**To proceed with your application, you must be at least 18 years of age.**
**Job Description:**
At Bank of America, we are guided by a common purpose to help make financial lives better through the power of every connection. We do this by driving Responsible Growth and delivering for our clients, teammates, communities and shareholders every day.
Being a Great Place to Work is core to how we drive Responsible Growth. This includes our commitment to being a diverse and inclusive workplace, attracting and developing exceptional talent, supporting our teammates' physical, emotional, and financial wellness, recognizing and rewarding performance, and how we make an impact in the communities we serve.
At Bank of America, you can build a successful career with opportunities to learn, grow, and make an impact. Join us!
**Enterprise Cloud Platforms Team:**
Our team designs, builds, and maintains Public Cloud platforms for Bank of America. We provide our customers an innovative platform with built-in integrations that allow for a faster time-to-market with reduced complexity. We believe in a high-quality engineering culture, a customer-focused mindset, and building for scale and resiliency. As part of this team, you will have a large impact on the evolution of next-generation Cloud services for Bank of America and explore an extensive list of new technologies that will drive innovation across our company.
We are seeking Site Reliability Engineers (SREs) to design, build, and maintain our next-gen platforms. The role provides the opportunity to work with a wide range of technologies and build a unique perspective that comes with integrating disparate services (both on-prem and off-prem) which must interact seamlessly with each other. You will work with colleagues who are fun, smart, hardworking, and driven. You will be part of a global team that is growing, giving you room to innovate and be creative.
**Position Summary**
+ Collaborates with a diverse set of engineers, architects, and teams to design, develop, test, and implement secure, robust, highly available and scalable solutions for BofA's External Cloud Platform
+ Collaborates with other software engineers and teams to design and implement deployment approaches using highly scalable, automated, continuous integration and continuous delivery pipelines.
+ Responsible for all aspects of reliability; collaborates with technical experts, key stakeholders, and team members to resolve complex problems, owning the issue until you are sure it will not recur.
+ Deep understanding of SRE practices, service level indicators (SLIs), and service level objectives (SLOs); proactively uses them to resolve issues before they impact customers.
+ Gather, analyze, synthesize, and develop visualizations and reporting from large, diverse data sets in service of continuous improvement of the platform.
+ Implement infrastructure, configuration, and network as code for the applications and platforms in your remit.
+ Identify opportunities to eliminate toil and automate the triage of issues to improve overall operational stability.
+ Collaborate with a global team to identify, analyze, and resolve platform vulnerabilities.
+ Proactively promotes the adoption of site reliability engineering best practices within the team and organization.
+ Participates in 24x7 on-call coverage under a follow-the-sun model and performs blameless postmortems (RCAs) as needed.
**Required Skills:**
+ 7 years of combined experience in either SRE, software development, or infrastructure engineering (4 years with an advanced degree in Computer Science or related technical field).
+ 3+ years of hands-on experience building and maintaining cloud platforms on a major cloud service provider.
+ Strong experience in implementing, monitoring, and maintaining a highly scalable and resilient Data Services platform on major CSPs such as AWS, Azure or GCP.
+ Strong experience with monitoring tools such as Grafana, Prometheus, Splunk, or Dynatrace, as well as cloud native tools like CloudWatch & CloudTrail, Azure Monitor and Log Analytics
+ Proficiency in implementing, monitoring, and maintaining a Databricks, RDS, or OpenAI platform.
+ Proficient in at least one programming language such as Python, Java/Spring Boot, or .NET; 5+ years of applied experience in Python/Java.
+ Proficiency in implementing CI/CD pipelines with tools such as git and Jenkins, familiarity with using a GitOps model.
+ Advanced knowledge of networking (firewalls, DNS, Load Balancing, Proxies, etc.)
+ Advanced understanding of Linux & Windows operating systems including shell scripting
+ Excellent interpersonal, organizational and communication (written, verbal, and presentation) skills are a must.
+ Proven ability to work independently with minimal supervision and as part of a global team with direct responsibilities and an ability to juggle competing priorities and adapt to changes in project scope.
**Desired Skills**
+ Strong experience working with a complex IAM infrastructure, including Active Directory, Azure AD Connect, Azure AD, and PingIdentity, Okta, or other SSO solutions.
+ Proficiency in creating automation using Python, Terraform, or Ansible
+ Proficiency in implementing, monitoring, and maintaining a Databricks, CosmosDB, or OpenAI platform.
+ Experience in implementing, monitoring, and maintaining a highly scalable and resilient enterprise platform on Microsoft Azure using native services related to compute, storage, networking, security, and observability.
+ Experience with compute and containerization technologies such as EC2, EKS, Fargate, OpenShift, or Kubernetes.
+ Understanding of cost management, inventory management, and the FinOps model.
Bank of America and its affiliates consider for employment and hire qualified candidates without regard to race, religious creed, religion, color, sex, sexual orientation, genetic information, gender, gender identity, gender expression, age, national origin, ancestry, citizenship, protected veteran or disability status or any factor prohibited by law, and as such affirms in policy and practice to support and promote the concept of equal employment opportunity, in accordance with all applicable federal, state, provincial and municipal laws. The company also prohibits discrimination on other bases such as medical condition, marital status or any other factor that is irrelevant to the performance of our teammates.
To view the "Know Your Rights" poster, CLICK HERE.
View the LA County Fair Chance Ordinance.
Bank of America aims to create a workplace free from the dangers and resulting consequences of illegal and illicit drug use and alcohol abuse. Our Drug-Free Workplace and Alcohol Policy ("Policy") establishes requirements to prevent the presence or use of illegal or illicit drugs or unauthorized alcohol on Bank of America premises and to provide a safe work environment.
To view Bank of America's Drug-free Workplace and Alcohol Policy, CLICK HERE .
Bank of America is committed to an in-office culture with specific requirements for office-based attendance and which allows for an appropriate level of flexibility for our teammates and businesses based on role-specific considerations. Should you be offered a role with Bank of America, your hiring manager will provide you with information on the in-office expectations associated with your role. These expectations are subject to change at any time and at the sole discretion of the Company. To the extent you have a disability or sincerely held religious belief for which you believe you need a reasonable accommodation from this requirement, you must seek an accommodation through the Bank's required accommodation request process before your first day of work.
This communication provides information about certain Bank of America benefits. Receipt of this document does not automatically entitle you to benefits offered by Bank of America. Every effort has been made to ensure the accuracy of this communication. However, if there are discrepancies between this communication and the official plan documents, the plan documents will always govern. Bank of America retains the discretion to interpret the terms or language used in any of its communications according to the provisions contained in the plan documents. Bank of America also reserves the right to amend or terminate any benefit plan in its sole discretion at any time for any reason.
Senior Site Reliability Engineer, Enterprise Cloud Platforms, Global Technology, Australia

Posted 1 day ago
Job Description
Sydney, Australia
**To proceed with your application, you must be at least 18 years of age.**
**Job Description:**
At Bank of America, we are guided by a common purpose to help make financial lives better through the power of every connection. We do this by driving Responsible Growth and delivering for our clients, teammates, communities and shareholders every day.
Being a Great Place to Work is core to how we drive Responsible Growth. This includes our commitment to being a diverse and inclusive workplace, attracting and developing exceptional talent, supporting our teammates' physical, emotional, and financial wellness, recognizing and rewarding performance, and how we make an impact in the communities we serve.
At Bank of America, you can build a successful career with opportunities to learn, grow, and make an impact. Join us!
**Enterprise Cloud Platforms Team:**
Our team designs, builds, and maintains Public Cloud platforms for Bank of America. We provide our customers an innovative platform with built-in integrations that allow for a faster time-to-market with reduced complexity. We believe in a high-quality engineering culture, a customer-focused mindset, and building for scale and resiliency. As part of this team, you will have a large impact on the evolution of next-generation Cloud services for Bank of America and explore an extensive list of new technologies that will drive innovation across our company.
We are seeking Senior Site Reliability Engineers (SREs) to design, build, and maintain our next-gen platforms. The role provides the opportunity to work with a wide range of technologies and build a unique perspective that comes with integrating disparate services (both on-prem and off-prem) which must interact seamlessly with each other. You will work with colleagues who are fun, smart, hardworking, and driven. You will be part of a global team that is growing, giving you room to innovate and be creative.
**Position Summary**
+ Collaborates with a diverse set of engineers, architects, and teams to design, develop, test, and implement secure, robust, highly available and scalable solutions for BofA's External Cloud Platform
+ Collaborates with other software engineers and teams to design and implement deployment approaches using highly scalable, automated, continuous integration and continuous delivery pipelines.
+ Responsible for all aspects of reliability; collaborates with technical experts, key stakeholders, and team members to resolve complex problems, owning the issue until you are sure it will not recur.
+ Deep understanding of SRE practices, service level indicators (SLIs), and service level objectives (SLOs); proactively uses them to resolve issues before they impact customers.
+ Gather, analyze, synthesize, and develop visualizations and reporting from large, diverse data sets in service of continuous improvement of the platform.
+ Implement infrastructure, configuration, and network as code for the applications and platforms in your remit.
+ Identify opportunities to eliminate toil and automate the triage of issues to improve overall operational stability.
+ Collaborate with a global team to identify, analyze, and resolve platform vulnerabilities.
+ Proactively promotes the adoption of site reliability engineering best practices within the team and organization.
+ Participates in 24x7 on-call coverage under a follow-the-sun model and performs blameless postmortems (RCAs) as needed.
**Required Skills:**
+ 15 years of combined experience in either SRE, software development, or infrastructure engineering (10 years with an advanced degree in Computer Science or related technical field).
+ 7+ years of hands-on experience building and maintaining cloud platforms on a major cloud service provider.
+ Strong experience in implementing, monitoring, and maintaining a highly scalable and resilient Data Services platform on major CSPs such as AWS, Azure or GCP.
+ Strong experience with monitoring tools such as Grafana, Prometheus, Splunk, or Dynatrace, as well as cloud native tools like CloudWatch & CloudTrail, Azure Monitor and Log Analytics
+ Proficiency in implementing, monitoring, and maintaining a Databricks, RDS, or OpenAI platform.
+ Proficient in at least one programming language such as Python, Java/Spring Boot, or .NET; 5+ years of applied experience in Python/Java.
+ Proficiency in implementing CI/CD pipelines with tools such as git and Jenkins, familiarity with using a GitOps model.
+ Advanced knowledge of networking (firewalls, DNS, Load Balancing, Proxies, etc.)
+ Advanced understanding of Linux & Windows operating systems including shell scripting
+ Excellent interpersonal, organizational and communication (written, verbal, and presentation) skills are a must.
+ Proven ability to work independently with minimal supervision and as part of a global team with direct responsibilities and an ability to juggle competing priorities and adapt to changes in project scope.
**Desired Skills**
+ Strong experience working with a complex IAM infrastructure, including Active Directory, Azure AD Connect, Azure AD, and PingIdentity, Okta, or other SSO solutions.
+ Proficiency in creating automation using Python, Terraform, or Ansible
+ Proficiency in implementing, monitoring, and maintaining a Databricks, CosmosDB, or OpenAI platform.
+ Experience in implementing, monitoring, and maintaining a highly scalable and resilient enterprise platform on Microsoft Azure using native services related to compute, storage, networking, security, and observability.
+ Experience with compute and containerization technologies such as EC2, EKS, Fargate, OpenShift, or Kubernetes.
+ Understanding of cost management, inventory management, and the FinOps model.
Senior Software Reliability Engineer (Observability) - open to remote across ANZ
Posted today
Job Description
== Canva ==
Role Seniority - senior
More about the Senior Software Reliability Engineer (Observability) - open to remote across ANZ role at Canva
Job Description
Join the team redefining how the world experiences design.
Hey, g'day, mabuhay, kia ora, 你好, hallo, vítejte!
Thanks for stopping by. We know job hunting can be a little time consuming and you're probably keen to find out what's on offer, so we'll get straight to the point.
Where and how you can work
Our flagship campus is in Sydney. We also have a campus in Melbourne and co-working spaces in Brisbane, Perth and Adelaide. But you have a choice in where and how you work; we trust our Canvanauts to choose the balance that empowers them and their team to achieve their goals.
What you’d be doing in this role
As Canva scales, change continues to be part of our DNA. But we like to think that's all part of the fun. This will give you a flavour of the type of things you'll be working on when you start, but it will likely evolve.
At the moment, this role is focused on:
Being responsible for building and improving our observability platform and tooling, which is used by all Canva engineers.
Providing technical leadership and expertise to drive pragmatic solutions and make impactful design decisions.
Brainstorming, researching and prototyping to optimize our tracing and exceptions platforms, improve our operational effectiveness and increase reliability.
Being proactive in improving the tracing user experience and advocating for best practices.
Finding ways to improve the use of traces and exceptions, providing better insights to our engineers.
Enhancing our exception workflow to help engineers seamlessly capture errors, gain actionable insights through clear visualizations, and set up high-signal, low-noise alerts.
Participating in team ceremonies, knowledge sharing and brainstorming sessions.
Becoming an observability champion, evangelising best practices and guiding other Canvanauts in the observability space.
You're probably a match if
You are proficient and happy to code in Python, Java or Golang.
You have deep knowledge and understanding of Computer Engineering fundamentals and first principles.
You have a solid knowledge of AWS (EC2, EKS, Lambda, SQS, Kinesis, S3) or equivalent.
You have experience deploying and running containerized workloads on a platform like Kubernetes.
You have experience with observability tooling, with competency in tools like Elasticsearch, Grafana, Sentry, Jaeger tracing or similar.
You have experience running highly available and reliable distributed systems with highly scalable data stores.
You are proficient with infrastructure-as-code - we’re a Terraform shop, but strong experience with other IaC tools will do the trick.
Not essential, but helpful experience!
You have experience with OpenTelemetry because it underpins a lot of the infrastructure and tooling that the team owns.
You have experience writing application code in Java or frontend code in TypeScript, since we also maintain the tracing libraries.
You have experience building and running monitoring infrastructure at scale, for example petabyte-scale Elasticsearch clusters or similar databases.
You have experience with data handling at scale.
You have experience with ClickHouse.
You have experience with data security, data obfuscation and PII detection.
About the team
You’ll join the Observability Traces & Exceptions team, responsible for operational insights inside Canva. Our goal is to provide our development team with world-class tools to view how their services are performing in production. We achieve this by combining industry-leading third-party solutions with our own solutions developed in-house.
We work across the entire stack maintaining our TypeScript and Java tracing libraries, our tracing infrastructure, error reporting libraries and error handling guidelines to name just a few. As we scale all of these areas, we require more sophisticated solutions to ensure that Canva developers can continue to grow without compromising on reliability or availability.
What's in it for you?
Achieving our crazy big goals motivates us to work hard - and we do - but you'll experience lots of moments of magic, connectivity and fun woven throughout life at Canva, too. We also offer a range of benefits to set you up for every success in and outside of work.
Here's a taste of what's on offer:
Equity packages - we want our success to be yours too
Inclusive parental leave policy that supports all parents & carers
An annual Vibe & Thrive allowance to support your wellbeing, social connection, office setup & more
Flexible leave options that empower you to be a force for good, take time to recharge and support you personally
Check out lifeatcanva.com for more info.
Other stuff to know
We see AI as a powerful amplifier of creativity and technology at Canva. We’re evolving how we assess AI skills in our Technology hiring experience - you’ll tackle interactive, real-time challenges that reflect the kind of work we do. In some interviews, you may also be asked to solve a problem using an AI tool to show how you approach challenges with tech by your side. Your recruitment partner will walk you through what to expect. We make hiring decisions based on your experience, skills and passion, as well as how you can enhance Canva and our culture.
When you apply, please tell us the pronouns you use and any reasonable adjustments you may need during the interview process. We celebrate all types of skills and backgrounds at Canva, so even if you don’t feel like your skills quite match what’s listed above - we still want to hear from you!
Please note that interviews are conducted virtually.
Before we jump into the responsibilities of the role: no matter what you come in knowing, you’ll be learning new things all the time and the Canva team will be there to support your growth.
Please consider applying even if you don't meet 100% of what’s outlined
Key Responsibilities
- Building and improving the observability platform
- Providing technical leadership
- Optimizing tracing and exceptions platforms
Key Strengths
- Programming in Python, Java, or Golang
- Knowledge of AWS
- Experience with Observability Tooling
- Experience with OpenTelemetry
- Experience with data handling at scale
- Experience with ClickHouse
Why is Canva partnering with Hatch on this role? Hatch exists to level the playing field for people as they discover a career that’s right for them. So when you apply, you have the chance to show more than just your resume.
A Final Note: This is a role with Canva not with Hatch.