39 Reliability Engineer jobs in Australia

Reliability Engineer

Woolloomooloo, New South Wales KBR

Posted 2 days ago

Job Description

Title:
Reliability Engineer
At KBR - We do things that matter.
We deliver science, technology and engineering solutions to governments and companies around the world. KBR employs approximately 38,000 people worldwide with customers in more than 80 countries and operations in over 29 countries.
KBR is proud to work with its customers across the globe to provide technology, value-added services, and long-term operations and maintenance services to ensure consistent delivery with predictable results. At KBR, We Deliver.
Think.KBR.com
KBR in Australia
With over 65 years working on some of Australia's largest and most complex projects, KBR has unmatched experience supporting the nation's critical infrastructure, energy transition and national security priorities. KBR has around 2,000 employees throughout Australia, who are focused on delivering innovative technology and engineering solutions for a safer, more secure and sustainable future.
Learn more about KBR in Australia
Belong, Connect and Grow at KBR
At KBR, we are passionate about our people and our Zero Harm culture. These inform all that we do and are at the heart of our commitment to, and ongoing journey toward, being a People First company. That commitment is central to our team of teams philosophy and fosters an environment where everyone can Belong, Connect and Grow. We Deliver - Together.
Capability Life Cycle Manager (CLCM)
The CLCM is integrated into the System Program Office (SPO) and contracted to deliver through-life asset management of Products Being Supported (PBS). The CLCM performs asset, sustainment and engineering functions to optimise availability and Total Cost of Ownership (TCO) and minimise risk through the life of the PBS. The CLCM works collaboratively via the SPO with Regional Maintenance Centres (RMC) to plan maintenance activities and installation of capability upgrades. The CLCM works collaboratively via the SPO with the Design Services Contract (DSC) to plan and execute engineering changes and services. The CLCM delivers functions against the contract scope across the Asset Class Enterprise to support Defence Maritime Capability.
The Opportunity:
KBR is seeking a dedicated and experienced Reliability Engineer to join our CLCM team to support the delivery of a sovereign sustainment capability to the Royal Australian Navy's Amphibious Combat and Support Ship fleet.
As a Reliability Engineer, you will support the Reliability and MRD Program Lead to deliver asset management services in accordance with the CLCM contract specifically relating to the Reliability and Maintenance Determination Program and Maintenance Effectiveness Reviews.
The Reliability Engineer focusses primarily on identifying and solving problems related to the reliability of systems and equipment to increase the overall uptime and availability, and to reduce the number of failures and downtime.
You will work with and guide all stakeholders on technical matters including LSA, Maintenance Requirements Determination (MRD), and Reliability Centred Maintenance (RCM) on the products being supported.
This role is based out of Garden Island, Sydney.
Responsibilities:
The key responsibilities of the role include, but are not limited to:
+ Analysing historical system and equipment failure data to identify patterns and trends.
+ Using statistical analysis to predict system reliability and risk of failure.
+ Conducting failure investigation and root cause analysis (FTA, FBD, RBD, 5 Whys analysis, Fishbone, FMECA, FRACAS, RCM analysis etc.) to determine the underlying cause of problems and prevent recurring failures.
+ Conducting standard activity and maintenance task analysis to identify and minimize possible gaps in recommended maintenance.
+ Analysing performance metrics and benchmarking performance across the platform against the RAM metrics.
+ Documenting and communicating reliability analysis and testing results.
+ Developing and implementing new maintenance strategies and routine maintenance checks to reduce risk and improve system performance.
+ Finding new technology and processes that can improve equipment performance and reliability.
+ Collaborating with other departments to ensure that reliability is integrated into all aspects of the organization.
+ Developing and implementing training programs for employees.
As the ideal candidate you will bring:
Essential:
+ Bachelor's degree in Engineering, or related field (or equivalent experience coupled with relevant qualifications).
+ Minimum of 8 years of experience in a Defence or related technical industry such as Maritime, Rail, Manufacturing, Water, Mining or Oil and Gas.
+ Strong knowledge of reliability engineering and statistical analysis.
+ Experience with reliability-centered maintenance (RCM) tools and techniques.
+ Strong knowledge and thorough understanding of systems, equipment and tools, and of their functionality, safety, application, fault diagnosis, and operating and maintenance requirements.
+ Excellent understanding of maintenance strategies, plans and policies, and their implementation.
+ Proficiency in software tools including MS Office and reliability software (OPUS, MADe, eLESPro etc.) to support LSAR.
+ Training in LSA or MRD or equivalent vocational training.
Desirable:
+ Experience in the application of RCM2 and RCM3 and MIL-STDs for RCM.
+ Experience in the application of DEF(AUST) 5691 and 5692.
+ Experience in use of DOORS.
+ Experience in ship's systems preferred.
+ Experience with Department of Defence design delivery.
All candidates will be required to hold and maintain an active Baseline Defence Security Clearance. Only candidates holding a Baseline Clearance or above should apply.
What we will offer you:
· A workplace culture certified as a Great Place To Work
· Flexible working
· Competitive salary (including annual reviews)
· Paid parental leave
· Income protection
· Corporate rewards
· Salary packaging/Novated leasing
· Employee stock purchase plans
· Flu shots, skin checks and discounted private health insurance
· Career development: Online learning, mentorship and career pathways
If you're ready to shape tomorrow, let's get started. Apply Now!
KBR acknowledges the Traditional Custodians of Country throughout Australia and their continuing connections to land, sea, community and culture. We pay our respects to Elders past and present.
As KBR is a Major Service Provider of the Australian Defence Force, an AGSVA security clearance and compliance with the International Traffic in Arms Regulations (ITAR) are required. As such, our hiring decisions are based on the key requirements of each role and candidates are selected based on their unique strengths and experiences.

Systems Reliability Engineer

Sydney, New South Wales Nutanix

Posted 1 day ago

Job Description

**Hungry, Humble, Honest, with Heart.**
**The Opportunity**
Nutanix is a global leader in cloud software and a pioneer in hyperconverged infrastructure solutions, making computing invisible anywhere. We are excited to bring a new wave of talented Technical Support Engineers into our company! If you are someone who loves technology and innovation, has a customer-first mindset, is eager to learn, and wants to enhance your skill set in our leading Hyperconverged Infrastructure (HCI) technology and Hybrid-Cloud solutions, come be part of the family and help us provide our customers with an exceptional support experience.
**About the Team**
Here at Nutanix we drive the success of our customers through passion and teamwork, ensuring quick response times and unparalleled customer satisfaction; our 90/100 NPS score is a testament to that objective. Our diverse, multicultural team of top-notch engineers acts as our customers' champions, working closely with Engineering, Field, and Sales teams. Our teams love to have constant learning opportunities and aim to become experts in Virtualization, Networking and Storage.
You will report to the Worldwide Support Manager who is dedicated to fostering a positive work environment and providing the necessary guidance and resources for your success.
**Your Role**
+ Troubleshoot, debug, and diagnose customer issues encountered in the field.
+ Improve the serviceability of the product by testing new features and developing tools to scale our field deployment and auto-support infrastructure.
+ Provide analysis of our existing customer base to avoid and minimize risks in the field.
+ Define and lead changes to our product with our development engineering team based on feedback from customers and field implementations.
+ Work with technology partners (e.g. VMware, Citrix, Microsoft) to resolve issues and push improvements in our ecosystem.
+ Develop and work on internal and external knowledge bases.
+ Provide support on weekdays and also off hours on an as-needed and scheduled rotational basis.
+ Be a champion for our customers. Go above and beyond to support their business and use of the Nutanix stack.
**What You'll Bring**
+ Proven work experience in troubleshooting at least two of the following: Virtualization (preferably VMware ESXi), Networking (preferably layer 2/3), Linux Systems (preferably CLI administration), Storage Analysis
+ Experience supporting external customers and a customer first mindset.
+ Multilingual ability (Korean / Mandarin) will be highly regarded.
+ Passion and ability to learn new things.
**Work Arrangement**
Hybrid: This role operates in a hybrid capacity, blending the benefits of remote work with the advantages of in-person collaboration. In locations where our workplace policy applies (i.e. San Jose, Durham, Mexico City, Bangalore, Pune, Hoofddorp, Belgrade, Barcelona, Singapore, Sydney and Tokyo), employees are expected to work onsite a minimum of 3 days per week to foster collaboration, team alignment, and access to in-office resources. Workplace type may vary based on location and team requirements. Please speak with your recruiter for details. Additional team-specific guidance and norms will be provided by your manager.
We're an Equal Opportunity Employer
Nutanix is an Equal Employment Opportunity and (in the U.S.) an Affirmative Action employer. Qualified applicants are considered for employment opportunities without regard to race, color, religion, sex, sexual orientation, gender identity or expression, national origin, age, marital status, protected veteran status, disability status or any other category protected by applicable law. We hire and promote individuals solely on the basis of qualifications for the job to be filled. We strive to foster an inclusive working environment that enables all our Nutants to be themselves and to do great work in a safe and welcoming environment, free of unlawful discrimination, intimidation or harassment. As part of this commitment, we will ensure that persons with disabilities are provided reasonable accommodations. If you need a reasonable accommodation, please let us know by contacting

Site Reliability Engineer

New South Wales, New South Wales Dynatrace

Posted 2 days ago

Job Description

**Your role at Dynatrace**
We are strengthening our Site Reliability Engineering team based in Sydney and looking for an SRE to join our innovative team. Your detailed responsibilities in this new team will be:
+ **Automate Manual Tasks:** Leverage your production expertise to translate manual processes into automated solutions, driving operational efficiency.
+ **Optimize Capacity Planning:** Ensure cost-effective resource utilization while maintaining scalability and performance.
+ **Product Release Management:** Oversee and coordinate product release processes, refining workflows through continuous improvement initiatives.
+ **Implement Monitoring & Alerting:** Design and deploy automated monitoring and alerting systems to boost the efficiency, reliability, and scalability of cloud infrastructure.
+ **Incident Resolution:** Support production stability by promptly investigating and resolving production incidents.
+ **Monitoring Configuration:** Configure and maintain robust monitoring solutions to ensure efficient, scalable, and seamless production operations.
+ **On-Call Support:** Participate in On-Call rotations to provide critical support and maintain system stability and uptime.
**What will help you succeed**
+ 3+ years of experience in scripting and/or programming languages such as Go, C, Shell, Python, or Java.
+ Proficiency in at least one Hyperscaler (AWS, Azure, or GCP).
+ Enthusiastic, go-for-it attitude with hands-on experience in Kubernetes (preferred).
+ Strong communication, problem-solving, and critical-thinking skills.
+ Curiosity and interest in working on highly scalable systems.
**Why you will love being a Dynatracer**
+ Dynatrace is a leader in unified observability and security.
+ We provide a culture of excellence with competitive compensation packages designed to recognize and reward performance.
+ Our employees work with the largest cloud providers, including AWS, Microsoft, and Google Cloud, and other leading partners worldwide to create strategic alliances.
+ The Dynatrace platform uses cutting-edge technologies, including our own Davis hypermodal AI, to help our customers modernize and automate cloud operations, deliver software faster and more securely, and enable flawless digital experiences.
+ Over 50% of the Fortune 100 companies are current customers of Dynatrace.
Dynatrace is an Equal Opportunity/Affirmative Action employer. All qualified applicants will receive consideration for employment without regard to race, sex, color, gender identity, religion, national origin, ancestry, citizenship, physical abilities, age, sexual orientation, creed, disability status, veteran status, pregnancy, genetic status, or any other characteristic protected by law.

Site Reliability Engineer, Home

Google

Posted 1 day ago

Job Description

Site Reliability Engineer, Home
Google, Sydney NSW, Australia
**Early**
Experience completing work as directed, and collaborating with teammates; developing knowledge of relevant concepts and processes.
At Google, we have a vision of empowerment and equitable opportunity for all Aboriginal and Torres Strait Islander peoples and commit to building reconciliation through Google's technology, platforms and people and we welcome Indigenous applicants. Please see our Reconciliation Action Plan for more information.
**Minimum qualifications:**
+ Bachelor's degree in Computer Science, a related field, or equivalent practical experience.
+ 1 year of experience with software development in one or more programming languages during coursework/projects, research, internships, or practical experience in school, work, or Open Source projects.
+ 1 year of experience with data structures or algorithms.
**Preferred qualifications:**
+ Master's degree in Computer Science or Engineering.
+ Ability to debug, optimize code, and automate routine tasks.
+ Excellent communication skills, and leadership in a distributed team structure.
**About the job**
Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Google Cloud's services, both our internally critical and our externally visible systems, have reliability and uptime appropriate to customers' needs and a fast rate of improvement. Additionally, SREs keep an ever-watchful eye on our systems' capacity and performance.
Much of our software development focuses on optimizing existing systems, building infrastructure and eliminating work through automation. On the SRE team, you'll have the opportunity to manage the complex challenges of scale which are unique to Google Cloud, while using your expertise in coding, algorithms, complexity analysis and large-scale system design. SRE's culture of intellectual curiosity, problem solving and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate, think big and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while we also strive to create an environment that provides the support and mentorship needed to learn and grow.
With your technical expertise, you will manage project priorities, deadlines, and deliverables. You will design, develop, test, deploy, maintain, and enhance software solutions.
**Responsibilities**
+ Design, launch, and enhance the reliability of Home products and Assistant services utilizing Google's advanced production infrastructure.
+ Engage in software engineering to build and maintain services, while providing expert troubleshooting to resolve complex production issues.
+ Identify opportunities and implement solutions to continuously improve the reliability, performance, and development velocity of Home products.
+ Provide technical leadership and mentorship to team members, guiding engineering decisions and upholding best practices in Site Reliability Engineering (SRE).
+ Lead the initiative to accelerate the migration of legacy services from platforms like Google Cloud Platform (GCP) to Google's internal systems, driving the phase-out of outdated architecture.
Information collected and processed as part of your Google Careers profile, and any job applications you choose to submit, is subject to Google's Applicant and Candidate Privacy Policy (./privacy-policy).
Google is proud to be an equal opportunity and affirmative action employer. We are committed to building a workforce that is representative of the users we serve, creating a culture of belonging, and providing an equal employment opportunity regardless of race, creed, color, religion, gender, sexual orientation, gender identity/expression, national origin, disability, age, genetic information, veteran status, marital status, pregnancy or related condition (including breastfeeding), expecting or parents-to-be, criminal histories consistent with legal requirements, or any other basis protected by law. See also Google's EEO Policy, Know your rights: workplace discrimination is illegal, Belonging at Google, and How we hire.
If you have a need that requires accommodation, please let us know by completing our Accommodations for Applicants form.
Google is a global company and, in order to facilitate efficient collaboration and communication globally, English proficiency is a requirement for all roles unless stated otherwise in the job posting.
To all recruitment agencies: Google does not accept agency resumes. Please do not forward resumes to our jobs alias, Google employees, or any other organization location. Google is not responsible for any fees related to unsolicited resumes.
Google is proud to be an equal opportunity workplace and is an affirmative action employer. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or Veteran status. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you have a need that requires accommodation, please let us know by completing our Accommodations for Applicants form.

Site Reliability Engineer (Kafka)

Canberra, Australian Capital Territory NetApp

Posted 2 days ago

Job Description

**Job Summary**
NetApp is looking for a Senior Techops Engineer to join our growing Instaclustr team in Australia. NetApp's Instaclustr offering provides open source as a service, delivering reliability at scale. We manage cutting-edge open-source technologies (Cassandra, Kafka, PostgreSQL, Redis/Valkey, OpenSearch, ClickHouse and Cadence) for our customers around the world.
NetApp Instaclustr makes it easy for our customers to run powerful open-source applications at the highest levels of scale. We have developed a platform that takes care of the whole lifecycle: provisioning infrastructure, installing applications and, most importantly, keeping the applications running reliably in production. Since being founded in 2013, Instaclustr has grown strongly, with over 300 customers worldwide, and over 19,000 nodes under management.
Our Technical Operations Engineers are the frontline team keeping our large fleet of cloud-hosted open-source clusters up and running. Your work will ensure the security, reliability and performance of world-class systems and databases. You will collaborate with our customers' technical teams, from globally recognised companies in the gaming, banking and logistics sectors, ranging from big multinationals to emerging start-ups.
**The Role**
If you have excellent operational knowledge in managing Kafka clusters, look no further!
As a Site Reliability Engineer (Kafka), you are in the frontline team keeping our large fleet of cloud-hosted Kafka clusters up and running. Every day, you will diagnose and solve interesting technical problems, providing Kafka as a Managed Service in a highly automated environment. Our service is relied on by some of the leading global names in Banking and Financial Services, Telecom, IoT and Tech companies that interact with millions of end users.
**Skills & Experience**
We're looking for smart engineers with exceptional communication skills, a positive attitude, and a passion for IT and learning new things. We expect you to be, or quickly become, proficient in a range of the technologies we use. Successful candidates for this role will:
+ Have strong experience in Kafka, and a desire to learn more and develop to a true expert level.
+ Ideally already have experience diagnosing various operational issues through the analysis of logs and graphs.
+ Preferably have past experience with upgrades and migrations of the above-mentioned technologies.
+ Have good experience working on one public cloud provider such as AWS, Azure or GCP.
+ Preferably have past IT customer service or support experience.
+ Have good fundamental computer science and software engineering skills and knowledge, particularly in operating system internals, memory management, and networking.
+ Have strong knowledge of and experience with Linux, and be comfortable working from the command line (essential).
+ Have an exceptional ability to communicate clearly and professionally in written and verbal English (essential).
+ Work as part of a team and use your initiative to get things done.
+ Be able to follow required processes and procedures.
+ Be able to investigate and research issues by reviewing the source code.
+ Ideally have programming skills in Python or Java, and experience with source code control using Git.
**I'm interested. What else will I be doing?**
+ Provide expert operational support to our nodes running in the cloud (AWS, Azure and GCP), using technologies such as Linux (Debian), Docker, and languages including Java, Python and bash.
+ Liaise with our customers' engineers in resolving interesting issues related to Kafka usage and other supported technologies.
+ Participate in the on-call Level 2 roster.
+ Undertake complex cluster operations such as migrations, upgrades and maintenance on our fleet.
+ Develop and continually improve our suite of internal automation tools, applications, and processes.

At NetApp, we embrace a hybrid working environment designed to strengthen connection, collaboration, and culture for all employees. This means that most roles will have some level of in-office and/or in-person expectations, which will be shared during the recruitment process.
**Equal Opportunity Employer:**
NetApp is firmly committed to Equal Employment Opportunity (EEO) and to compliance with all laws that prohibit employment discrimination based on age, race, color, gender, sexual orientation, gender identity, national origin, religion, disability or genetic information, pregnancy, and any protected classification.
**Why NetApp?**
We are all about helping customers turn challenges into business opportunity. It starts with bringing new thinking to age-old problems, like how to use data most effectively to run better - but also to innovate. We tailor our approach to the customer's unique needs with a combination of fresh thinking and proven approaches.
We enable a healthy work-life balance. Our volunteer time off program is best in class, offering employees 40 hours of paid time off each year to volunteer with their favourite organizations. We provide comprehensive benefits, including health care, life and accident plans, emotional support resources for you and your family, legal services, and financial savings programs to help you plan for your future. We support professional and personal growth through educational assistance and provide access to various discounts and perks to enhance your overall quality of life.
If you want to help us build knowledge and solve big problems, let's talk.

Senior Site Reliability Engineer, Google Play

Google

Posted 23 days ago

Job Description

Senior Site Reliability Engineer, Google Play
Google, Sydney NSW, Australia
**Mid**
Experience driving progress, solving problems, and mentoring more junior team members; deeper expertise and applied knowledge within relevant area.
At Google, we have a vision of empowerment and equitable opportunity for all Aboriginal and Torres Strait Islander peoples and commit to building reconciliation through Google's technology, platforms and people and we welcome Indigenous applicants. Please see our Reconciliation Action Plan for more information.
**Minimum qualifications:**
+ Bachelor's degree in Computer Science, a related field, or equivalent practical experience.
+ 5 years of experience with software development in one or more programming languages.
+ 3 years of experience in designing, analyzing, and troubleshooting distributed systems.
+ 2 years of experience leading projects and providing technical leadership.
**Preferred qualifications:**
+ Experience in problem-solving and analyzing large-scale distributed systems.
+ Experience in mobile development and large-scale application deployment.
+ Proficiency in algorithms, data structures, software design, or expertise in Unix/Linux systems and IP networking.
+ Ability to set strategy, provide technical guidance, and motivate the engineering team to execute effectively.
+ Ability to debug complex code and automate routine tasks to improve efficiency.
+ Excellent problem-solving approach, coupled with strong communication skills.
**About the job**
Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Google Cloud's services, both our internally critical and our externally visible systems, have reliability and uptime appropriate to customers' needs and a fast rate of improvement. Additionally, SREs keep an ever-watchful eye on our systems' capacity and performance.
Much of our software development focuses on optimizing existing systems, building infrastructure and eliminating work through automation. On the SRE team, you'll have the opportunity to manage the complex challenges of scale which are unique to Google Cloud, while using your expertise in coding, algorithms, complexity analysis and large-scale system design. SRE's culture of intellectual curiosity, problem solving and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate, think big and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while we also strive to create an environment that provides the support and mentorship needed to learn and grow.
Google Play offers music, movies, books, apps and games for devices, powered by the cloud. It syncs across devices and on the web. As part of the Android and Mobile team, Googlers working on Google Play do everything from engineering our backend systems, to shaping product strategy, to forming great content partnerships. They make it possible for people to do things like buy an ebook or song on their Android phone, then have it instantly available on their laptop. The Google Play team enhances the Android ecosystem by giving developers and partners a premium store where they can reach millions of users.
**Responsibilities**
+ Drive product excellence by owning and improving the availability and performance of key products, ensuring a great experience for our global users.
+ Envision the reliability roadmap and influence stakeholders to enhance the availability, latency, scalability, and efficiency of core platform services.
+ Develop and maintain monitoring systems, including instrumentation and dashboards, to accurately measure reliability and the end-user experience.
+ Build insightful software tools and influence the design, architecture, and operating standards for large-scale distributed systems.
+ Engage in strategic planning, including service capacity planning, demand forecasting, performance analysis, and system tuning.
Information collected and processed as part of your Google Careers profile, and any job applications you choose to submit, is subject to Google's Applicant and Candidate Privacy Policy (./privacy-policy).
Google is proud to be an equal opportunity and affirmative action employer. We are committed to building a workforce that is representative of the users we serve, creating a culture of belonging, and providing an equal employment opportunity regardless of race, creed, color, religion, gender, sexual orientation, gender identity/expression, national origin, disability, age, genetic information, veteran status, marital status, pregnancy or related condition (including breastfeeding), expecting or parents-to-be, criminal histories consistent with legal requirements, or any other basis protected by law. See also Google's EEO Policy, Know your rights: workplace discrimination is illegal, Belonging at Google, and How we hire.
If you have a need that requires accommodation, please let us know by completing our Accommodations for Applicants form.
Google is a global company and, in order to facilitate efficient collaboration and communication globally, English proficiency is a requirement for all roles unless stated otherwise in the job posting.
To all recruitment agencies: Google does not accept agency resumes. Please do not forward resumes to our jobs alias, Google employees, or any other organization location. Google is not responsible for any fees related to unsolicited resumes.
Google is proud to be an equal opportunity workplace and is an affirmative action employer. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or Veteran status. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you have a need that requires accommodation, please let us know by completing our Accommodations for Applicants form.

This advertiser has chosen not to accept applicants from your region.

Site Reliability Engineer, Google One and Photos

Google

Posted 1 day ago

Job Description

Site Reliability Engineer, Google One and Photos
Google, Sydney NSW, Australia
**Early**
Experience completing work as directed, and collaborating with teammates; developing knowledge of relevant concepts and processes.
At Google, we have a vision of empowerment and equitable opportunity for all Aboriginal and Torres Strait Islander peoples and commit to building reconciliation through Google's technology, platforms and people. We welcome Indigenous applicants. Please see our Reconciliation Action Plan for more information.
**Minimum qualifications:**
+ Bachelor's degree in Computer Science, a related field, or equivalent practical experience.
+ 1 year of experience with software development in one or more programming languages during coursework/projects, research, internships, or practical experience in school, work, or Open Source projects.
**Preferred qualifications:**
+ 1 year of experience with data structures or algorithms.
+ Experience in troubleshooting and debugging of distributed systems.
+ Experience in one or more languages such as C++, Java, or Go.
+ Understanding of distributed computing systems.
+ Excellent communication skills.
**About the job**
Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Google Cloud's services (both our internally critical and our externally visible systems) have reliability and uptime appropriate to customers' needs, and a fast rate of improvement. Additionally, SREs keep an ever-watchful eye on our systems' capacity and performance.
Much of our software development focuses on optimizing existing systems, building infrastructure and eliminating work through automation. On the SRE team, you'll have the opportunity to manage the complex challenges of scale which are unique to Google Cloud, while using your expertise in coding, algorithms, complexity analysis and large-scale system design. SRE's culture of intellectual curiosity, problem solving and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate, think big and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while we also strive to create an environment that provides the support and mentorship needed to learn and grow.
With your technical expertise, you will manage project priorities, deadlines, and deliverables. You will design, develop, test, deploy, maintain, and enhance software solutions.
**Responsibilities**
+ Collaborate with other engineers to build reliable systems that meet customer needs and deliver the team's Objectives and Key Results (OKR).
+ Manage availability and performance by measuring the entire system and developing automated solutions for improvement.
+ Drive involvement in the entire service life-cycle, from inception and design through implementation, deployment, operation and refinement.
+ Promote reuse and best practices across teams when selecting from different design approaches.
+ Participate in a sustainable on-call incident response team and conduct blameless postmortems.

This advertiser has chosen not to accept applicants from your region.

Site Reliability Engineer, Enterprise Cloud Platforms, Global Technology, Australia

Sydney, New South Wales Bank of America

Posted 10 days ago

Job Description

Site Reliability Engineer, Enterprise Cloud Platforms, Global Technology, Australia
Sydney, Australia
**To proceed with your application, you must be at least 18 years of age.**
**Description:**
At Bank of America, we are guided by a common purpose to help make financial lives better through the power of every connection. We do this by driving Responsible Growth and delivering for our clients, teammates, communities and shareholders every day.
Being a Great Place to Work is core to how we drive Responsible Growth. This includes our commitment to being a diverse and inclusive workplace, attracting and developing exceptional talent, supporting our teammates' physical, emotional, and financial wellness, recognizing and rewarding performance, and how we make an impact in the communities we serve.
At Bank of America, you can build a successful career with opportunities to learn, grow, and make an impact. Join us!
**Enterprise Cloud Platforms Team:**
Our team designs, builds, and maintains Public Cloud platforms for Bank of America. We provide our customers an innovative platform with built-in integrations that allow for a faster time-to-market with reduced complexity. We believe in a high-quality engineering culture, a customer-focused mindset, and building for scale and resiliency. As part of this team, you will have a large impact on the evolution of next-generation Cloud services for Bank of America and explore an extensive list of new technologies that will drive innovation across our company.
We are seeking Site Reliability Engineers (SREs) to design, build, and maintain our next-gen platforms. The role provides the opportunity to work with a wide range of technologies and build a unique perspective that comes with integrating disparate services (both on-prem/off-prem) which must interact seamlessly with each other. You will work with colleagues that are fun, smart, hardworking, and driven. You will be part of a global team that is growing, giving you room to innovate and be creative.
**Position Summary**
+ Collaborates with a diverse set of engineers, architects, and teams to design, develop, test, and implement secure, robust, highly available and scalable solutions for BofA's External Cloud Platform
+ Collaborates with other software engineers and teams to design and implement deployment approaches using highly scalable, automated, continuous integration and continuous delivery pipelines.
+ Responsible for all aspects of reliability, collaborates with technical experts, key stakeholders, and team members to resolve complex problems, owning the issue until you are sure it will not reoccur.
+ Deep understanding of SRE practices, service level indicators, and service level objectives; proactively utilize them to resolve issues before they impact customers.
+ Gather, analyze, synthesize, and develop visualizations and reporting from large, diverse data sets in service of continuous improvement of the platform.
+ Implement infrastructure, configuration, and network as code for the applications and platforms in your remit.
+ Identify opportunities to eliminate toil and automate the triage of issues to improve overall operational stability.
+ Collaborate with a global team to identify, analyze, and resolve platform vulnerabilities.
+ Proactively promote the adoption of site reliability engineering best practices within the team and organization.
+ Participate in 24x7 on-call coverage under a follow-the-sun model and perform blameless postmortems (RCAs) as needed.
**Required Skills:**
+ 7 years of combined experience in either SRE, software development, or infrastructure engineering (4 years with an advanced degree in Computer Science or related technical field).
+ 3+ years of hands-on experience building and maintaining cloud platforms on a major cloud service provider.
+ Strong experience in implementing, monitoring, and maintaining a highly scalable and resilient Data Services platform on major CSPs such as AWS, Azure, or GCP.
+ Strong experience with monitoring tools such as Grafana, Prometheus, Splunk, or Dynatrace, as well as cloud native tools like CloudWatch & CloudTrail, Azure Monitor and Log Analytics
+ Proficiency in implementing, monitoring, and maintaining a Databricks, RDS, or OpenAI platform.
+ Proficient in at least one programming language such as Python, Java/Spring Boot, or .NET; 5+ years of applied experience in Python/Java.
+ Proficiency in implementing CI/CD pipelines with tools such as Git and Jenkins; familiarity with the GitOps model.
+ Advanced knowledge of networking (firewalls, DNS, Load Balancing, Proxies, etc.)
+ Advanced understanding of Linux & Windows operating systems including shell scripting
+ Excellent interpersonal, organizational and communication (written, verbal, and presentation) skills are a must.
+ Proven ability to work independently with minimal supervision and as part of a global team with direct responsibilities and an ability to juggle competing priorities and adapt to changes in project scope.
**Desired Skills**
+ Strong experience working with a complex IAM infrastructure, including Active Directory, Azure AD Connect, Azure AD, PingIdentity, Okta, or other SSO solutions.
+ Proficiency in creating automation using Python, Terraform, or Ansible
+ Proficiency in implementing, monitoring, and maintaining a Databricks, CosmosDB, or OpenAI platform.
+ Experience in implementing, monitoring, and maintaining a highly scalable and resilient enterprise platform on Microsoft Azure using native services related to compute, storage, networking, security, and observability.
+ Experience with containerization technologies such as EC2, EKS, Fargate, OpenShift, or Kubernetes.
+ Understanding of cost management, inventory management, and the FinOps model.
Bank of America and its affiliates consider for employment and hire qualified candidates without regard to race, religious creed, religion, color, sex, sexual orientation, genetic information, gender, gender identity, gender expression, age, national origin, ancestry, citizenship, protected veteran or disability status or any factor prohibited by law, and as such affirms in policy and practice to support and promote the concept of equal employment opportunity, in accordance with all applicable federal, state, provincial and municipal laws. The company also prohibits discrimination on other bases such as medical condition, marital status or any other factor that is irrelevant to the performance of our teammates.
To view the "Know your Rights" poster, CLICK HERE.
View the LA County Fair Chance Ordinance.
Bank of America aims to create a workplace free from the dangers and resulting consequences of illegal and illicit drug use and alcohol abuse. Our Drug-Free Workplace and Alcohol Policy ("Policy") establishes requirements to prevent the presence or use of illegal or illicit drugs or unauthorized alcohol on Bank of America premises and to provide a safe work environment.
Bank of America is committed to an in-office culture with specific requirements for office-based attendance and which allows for an appropriate level of flexibility for our teammates and businesses based on role-specific considerations. Should you be offered a role with Bank of America, your hiring manager will provide you with information on the in-office expectations associated with your role. These expectations are subject to change at any time and at the sole discretion of the Company. To the extent you have a disability or sincerely held religious belief for which you believe you need a reasonable accommodation from this requirement, you must seek an accommodation through the Bank's required accommodation request process before your first day of work.
This communication provides information about certain Bank of America benefits. Receipt of this document does not automatically entitle you to benefits offered by Bank of America. Every effort has been made to ensure the accuracy of this communication. However, if there are discrepancies between this communication and the official plan documents, the plan documents will always govern. Bank of America retains the discretion to interpret the terms or language used in any of its communications according to the provisions contained in the plan documents. Bank of America also reserves the right to amend or terminate any benefit plan in its sole discretion at any time for any reason.

This advertiser has chosen not to accept applicants from your region.

Senior Site Reliability Engineer, Enterprise Cloud Platforms, Global Technology, Australia

Sydney, New South Wales Bank of America

Posted 10 days ago

Job Description

Senior Site Reliability Engineer, Enterprise Cloud Platforms, Global Technology, Australia
Sydney, Australia
**To proceed with your application, you must be at least 18 years of age.**
**Description:**
At Bank of America, we are guided by a common purpose to help make financial lives better through the power of every connection. We do this by driving Responsible Growth and delivering for our clients, teammates, communities and shareholders every day.
Being a Great Place to Work is core to how we drive Responsible Growth. This includes our commitment to being a diverse and inclusive workplace, attracting and developing exceptional talent, supporting our teammates' physical, emotional, and financial wellness, recognizing and rewarding performance, and how we make an impact in the communities we serve.
At Bank of America, you can build a successful career with opportunities to learn, grow, and make an impact. Join us!
**Enterprise Cloud Platforms Team:**
Our team designs, builds, and maintains Public Cloud platforms for Bank of America. We provide our customers an innovative platform with built-in integrations that allow for a faster time-to-market with reduced complexity. We believe in a high-quality engineering culture, a customer-focused mindset, and building for scale and resiliency. As part of this team, you will have a large impact on the evolution of next-generation Cloud services for Bank of America and explore an extensive list of new technologies that will drive innovation across our company.
We are seeking Senior Site Reliability Engineers (SREs) to design, build, and maintain our next-gen platforms. The role provides the opportunity to work with a wide range of technologies and build a unique perspective that comes with integrating disparate services (both on-prem/off-prem) which must interact seamlessly with each other. You will work with colleagues that are fun, smart, hardworking, and driven. You will be part of a global team that is growing, giving you room to innovate and be creative.
**Position Summary**
+ Collaborates with a diverse set of engineers, architects, and teams to design, develop, test, and implement secure, robust, highly available and scalable solutions for BofA's External Cloud Platform
+ Collaborates with other software engineers and teams to design and implement deployment approaches using highly scalable, automated, continuous integration and continuous delivery pipelines.
+ Responsible for all aspects of reliability, collaborates with technical experts, key stakeholders, and team members to resolve complex problems, owning the issue until you are sure it will not reoccur.
+ Deep understanding of SRE practices, service level indicators, and service level objectives; proactively utilize them to resolve issues before they impact customers.
+ Gather, analyze, synthesize, and develop visualizations and reporting from large, diverse data sets in service of continuous improvement of the platform.
+ Implement infrastructure, configuration, and network as code for the applications and platforms in your remit.
+ Identify opportunities to eliminate toil and automate the triage of issues to improve overall operational stability.
+ Collaborate with a global team to identify, analyze, and resolve platform vulnerabilities.
+ Proactively promote the adoption of site reliability engineering best practices within the team and organization.
+ Participate in 24x7 on-call coverage under a follow-the-sun model and perform blameless postmortems (RCAs) as needed.
**Required Skills:**
+ 15 years of combined experience in either SRE, software development, or infrastructure engineering (10 years with an advanced degree in Computer Science or related technical field).
+ 7+ years of hands-on experience building and maintaining cloud platforms on a major cloud service provider.
+ Strong experience in implementing, monitoring, and maintaining a highly scalable and resilient Data Services platform on major CSPs such as AWS, Azure, or GCP.
+ Strong experience with monitoring tools such as Grafana, Prometheus, Splunk, or Dynatrace, as well as cloud native tools like CloudWatch & CloudTrail, Azure Monitor and Log Analytics
+ Proficiency in implementing, monitoring, and maintaining a Databricks, RDS, or OpenAI platform.
+ Proficient in at least one programming language such as Python, Java/Spring Boot, or .NET; 5+ years of applied experience in Python/Java.
+ Proficiency in implementing CI/CD pipelines with tools such as Git and Jenkins; familiarity with the GitOps model.
+ Advanced knowledge of networking (firewalls, DNS, Load Balancing, Proxies, etc.)
+ Advanced understanding of Linux & Windows operating systems including shell scripting
+ Excellent interpersonal, organizational and communication (written, verbal, and presentation) skills are a must.
+ Proven ability to work independently with minimal supervision and as part of a global team with direct responsibilities and an ability to juggle competing priorities and adapt to changes in project scope.
**Desired Skills**
+ Strong experience working with a complex IAM infrastructure, including Active Directory, Azure AD Connect, Azure AD, PingIdentity, Okta, or other SSO solutions.
+ Proficiency in creating automation using Python, Terraform, or Ansible
+ Proficiency in implementing, monitoring, and maintaining a Databricks, CosmosDB, or OpenAI platform.
+ Experience in implementing, monitoring, and maintaining a highly scalable and resilient enterprise platform on Microsoft Azure using native services related to compute, storage, networking, security, and observability.
+ Experience with containerization technologies such as EC2, EKS, Fargate, OpenShift, or Kubernetes.
+ Understanding of cost management, inventory management, and the FinOps model.

This advertiser has chosen not to accept applicants from your region.

Senior Site Reliability Engineer, Enterprise Cloud Platforms, Global Technology, Australia

Sydney, New South Wales Bank of America

Posted 10 days ago

Job Description

This advertiser has chosen not to accept applicants from your region.
