How to become a successful SRE in Australia

Computerworld Australia asked site reliability engineers what they do and what tips they have for those looking to follow this career.

LinkedIn’s annual Emerging Jobs Report 2020 for Australia ranks the site reliability engineer role fifth among emerging jobs in the country.

But the role seems relatively new in Australia, as the country’s largest job-seeking websites don’t offer data on the role itself. A LinkedIn search for the SRE role in Australia returns some 10,000 results and 212 jobs posted on the platform.

Computerworld Australia spoke to several SREs to learn how they got started in the role and what they suggest for those looking to follow this path.

SRE defined: What a site reliability engineer does

The site reliability engineer (SRE) role is a concept created by Google in the early 2000s. To ensure the reliability of the company platform, an SRE performs several roles, including consulting on the designs for new features, breaking a test environment to see how the systems built will respond, figuring out what the service level indicators are, and coordinating the response for high-severity incidents.
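As a rough illustration of what "figuring out the service level indicators" can involve, the sketch below computes an availability indicator and the remaining error budget for one measurement window. The `RequestWindow` shape, the function names, and the example target are assumptions made for illustration; they do not come from any of the engineers interviewed here.

```python
from dataclasses import dataclass


@dataclass
class RequestWindow:
    """Request counts for one measurement window (hypothetical data shape)."""
    total: int
    failed: int


def availability_sli(window: RequestWindow) -> float:
    """Fraction of requests that succeeded; treated as 1.0 when there was no traffic."""
    if window.total == 0:
        return 1.0
    return (window.total - window.failed) / window.total


def error_budget_remaining(window: RequestWindow, slo_target: float) -> float:
    """Share of the allowed failures not yet used (negative once the target is missed)."""
    allowed_failures = (1.0 - slo_target) * window.total
    if allowed_failures == 0:
        # No failures are allowed at all: the budget is either intact or gone.
        return 0.0 if window.failed else 1.0
    return 1.0 - window.failed / allowed_failures


if __name__ == "__main__":
    # Example values are made up for illustration.
    window = RequestWindow(total=10_000, failed=5)
    print(availability_sli(window))
    print(error_budget_remaining(window, slo_target=0.999))
```

In practice a team would pick indicators that reflect what its own users notice, such as latency or freshness, rather than availability alone.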

There are many flavours of site reliability engineering, but the one thing that does seem to hold true across them is that SRE is less about specific technologies than it is about process and practices.

LinkedIn describes the role of an SRE as the use of a software engineering approach to system administration topics. According to the jobs board Seek, the typical annual salary of a software engineer in New South Wales is $130,000, while a systems administrator gets roughly $110,000 a year.

The skills needed to be an SRE

Broad practical skills. Tigerspike SRE Tony Wong said that all the core SRE books are free and filled with knowledge for those seeking the career. “Knowing how to utilise the public cloud and work on a few side projects and work towards certifications is very important in solidifying practical skills. Be cloud-agnostic, be program-language-agnostic, because most variations are quite similar; it’s knowing the fundamentals first that allow for easy transitioning,” Wong said.

“Working on personal projects like setting up a web application, monitoring the application, and sourcing the best practises to do certain things have always been key in my professional development,” Wong said.

To become an SRE specifically, Joshua Hui, an application infrastructure engineer at Mortgage Choice who performs the work of an SRE among other functions, advises starting by learning a scripting language, as finding ways to automate things is important in systems administration.

Dan Adams, an SRE at Google Australia, suggests finding a technology of interest and learning to look at it both broadly and in detail, as there will be interesting problems to learn how to solve or fix.

Continual learning. “Being autonomous and a self-learner is critical as our understanding of the discipline and the technologies we interact with are always changing,” said Anton Engelstad, an SRE with Stax. As in any industry, continuing to learn is essential, especially in IT, where technologies change and evolve fast. He explained that an SRE can cover a wide range of personal and technical skills, but rather than being able to cover it all, which isn’t easy, what is vital is having a passion for continuous learning.

“Having knowledge of how IT systems work in terms of front end, back end, data pipelines, networking, OS, and architecture is still fundamentally core in the SRE’s toolbelt,” said Tigerspike’s Wong. He also believes that networking with other professionals is very important for seeing where trends are going.

“I have seen others left behind because their skill set has become redundant. New technologies emerge, and old ones become obsolete. It is important to stay relevant, otherwise time will leave you behind,” Hui said. His manager has helped by encouraging him to learn skills across other IT functions, making him an asset to the business and also helping his personal development.

“Coming into Google, I didn’t have experience with the scale and types of problems you have with a globally distributed computer system,” Adams said. “I had to learn a different way of thinking about fault tolerance and system consistency, which has been an exciting and interesting challenge.”

Strong communications. Everyone in the business relies on system reliability, so communicating well is crucial, especially when things aren’t working. For an SRE, “it is important to explain to stakeholders and be transparent about downtime,” Hui said. Most of the time, stakeholders get more frustrated by a lack of communication than by systems being down. “Excellent communication skills and willingness to work within a team is vital as SREs act as consultants across development teams,” said Stax’s Engelstad.

Problem solving. “Also, a passion for solving problems with sound methodology when facing the unknown and the ability to manage risk is essential,” Engelstad said. For example, an SRE has to be able to effectively handle deployments gone bad. “Always have a rollback plan. If things fail—they will—always think of the unhappy path and contingency plans. Know when to roll back, and how to roll back—always super important—especially when it comes to databases,” Hui advises.
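A minimal sketch of the "always have a rollback plan" advice, assuming a deployment can be expressed as three steps supplied by the caller: apply the change, check health, and undo the change. The function and parameter names are hypothetical, and real rollbacks, especially for databases, usually need far more care than this shows.

```python
from typing import Callable


def deploy_with_rollback(
    apply_change: Callable[[], None],
    is_healthy: Callable[[], bool],
    roll_back: Callable[[], None],
) -> bool:
    """Apply a change, verify it, and roll back if applying or verification fails.

    Returns True if the change was kept, False if it was rolled back.
    """
    try:
        apply_change()
        if is_healthy():
            return True
    except Exception:
        # A failed apply or a crashing health check is treated as unhealthy.
        pass
    roll_back()
    return False
```

The point of the structure is that the unhappy path is written before the deployment runs, so deciding when and how to roll back is not left to the middle of an incident.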

“Working with others and understanding different viewpoints will get you very far,” Wong said. “The IT industry is not easy, it takes a lot of trial and error and many hours. Be as flexible as possible. Some may say specialisation is key, but I’m a big believer in being a generalist. It’s the generalist who has the far reaching outlook of the connecting parts and allows for a more fulfilling career.”

Formal qualifications. Engelstad said that when he first started, many companies had strict requirements for formal qualifications just to make it to the interview stage, but once he’d started working, they were much less important “except in the way that it informs how you approach your work.”

Even if a formal qualification is not required, Wong said a diploma is a great way to dip your toes into an IT career and experiment with all the different niches in the industry.

Building your way to become an SRE

Because the role is so new, IT professionals who become site reliability engineers come from different backgrounds, having worked in many different roles in information technology before landing an SRE role.

Today, there’s no defined path to the role other than gathering experience that demonstrates the skills of an SRE. Testing, development, and operations are all valid paths into the SRE role, Stax’s Engelstad said. In general terms, working as a software developer or systems administrator are both great starts, but “being able to speak about how reliable the services you built or operated were is even better,” he said.

Engelstad studied network engineering at RMIT where his interest in software development first showed. After graduating, he joined NAB’s graduate program as a programmer, where he stayed for eight years moving through different roles and ending up in the operations team for the internet banking and mobile apps platform.

Having worked mostly in the maintenance side, he was ready for a new challenge and some practical experience building reliable systems. “Turns out there is a lot to it,” Engelstad told Computerworld Australia. “Last year, an opportunity came up to move [from Versent] to Stax and build their site reliability team, which felt like what all my work so far had been leading up to—being a software engineer tasked with ensuring the reliability of Stax, through the whole life cycle of development.”

With an interest in IT and business from a young age, Tigerspike’s Wong got a bachelor of business information system degree from Monash University. In his first job, he worked with financial data for Oxfam, later moving to NAB where he was able to explore agile software development in the banking industry.

After going through the Thales graduate program, he joined Telstra where he explored report analysis and cloud infrastructure risk management and where he was introduced to the site reliability engineering mindset and ways of working.

Mortgage Choice’s Hui had been working at a bank, which he found monotonous and unchallenging. After some time he also noticed he could not see a clear progression within his role and so he left the job and enrolled at university where he studied for a bachelor of science degree.

After his first year at university, he applied for a help desk role with Mortgage Choice, which gave him the “challenge I was after”. “It taught me time management and also prioritisation skills. After a year or so in help desk, I got promoted to junior infrastructure engineer, and after some long hours and a few years in the junior role, I am now in a team of two, in a role of application infrastructure engineer,” Hui said.

Google Australia’s Adams has a degree in physics and one in computer systems engineering. He said his original plan was to be a physicist, but he ended up enjoying the computer side of things more because of the ability to build interesting things.

Adams started his career in embedded systems, later working for a games startup as a game systems engineer. He then worked on software and hardware for robotics and autonomous systems for tele-operation mining in remote locations.

“So a few jobs covering a lot of different problems and technologies at a systems level and product engineering,” Adams explained, adding that this is what led him to SRE, where you need to look at problems at a system level but also be able to dig into detail when required.

Copyright © 2020 IDG Communications, Inc.
