Site Reliability Engineer
Department
Engineering
Location
Remote (EMEA)
Timezone(s)
GMT+2 to GMT-8
About PostHog
PostHog is the open source Product OS - it's a suite of product and data tools, built on the modern data stack. We provide product analytics, session recording, feature flags, A/B testing, event pipelines and a data warehouse. PostHog can be deployed to the cloud, or self-hosted on existing infrastructure, removing the need to send user data to a third party.
PostHog was created as an open source project during Y Combinator's W20 cohort and had the most successful B2B software launch on Hacker News since 2012 - with a product that was just 4 weeks old. Since then, more than 10,000 companies have installed the platform and we've had huge success with our paid upgrades. We have raised $27m from some of the world's top investors and have shown strong product-led growth - 97% driven by word of mouth.
Despite the 📉 tech market, we're default alive and doing better than ever!
We've been averaging >20% monthly revenue growth, and we didn't raise a huge / now-overpriced round in 2021. Whilst others are focussed on layoffs and struggling to grow into huge valuations, we're focussing on an awesome product for end users, hiring (a handful of) exceptional team members and seeing fantastic increases in revenue as a result.
About the role
We even wrote a blog post about this one!
PostHog is looking for a creative and experienced Site Reliability Engineer to join our Infrastructure team, with members who drove infrastructure at Uber, Facebook, Twitter, and Slack.
We are actively investing in extending our deployment platform to make it easier to install and operate our stack wherever our customers need to run us. If you are interested in automating how we operate a data warehouse that runs anywhere and everywhere, this could be a great role for you.
What we value
We are open source - building a huge community around a free-for-life product is key to PostHog's strategy.
We aim to become the most transparent company, ever. In order to enable teams to make great decisions, we share as much information as we can. In our public handbook everyone can read about our roadmap, how we pay (or even let go of) people, what our strategy is, and who we have raised money from. We also have regular team-wide feedback sessions, where we share honest feedback with each other.
Working autonomously and maximizing impact - we don’t tell anyone what to do; everyone chooses what to work on next based on what is going to have the biggest impact on our customers.
Solve big problems - we haven't built our defining feature yet. We are all about acting fast, innovating and iterating.
Requirements
Why we need you
We are investing heavily in making PostHog the go-to data and product analytics application if you want to control the data yourself in your own cloud (AWS, GCP, Azure etc.). We have already invested several developer-years in building a scalable Helm chart, but we need your help in scaling it even further.
You will also be responsible - with the rest of the team, of course - for scaling the components within the Helm chart, such as ClickHouse, Kafka, ZooKeeper (for the time being, since dependencies on it are slowly fading), Postgres, Redis, and the PostHog services.
Our goal is to make installing and operating our stack as seamless and straightforward as possible.
What you’ll bring
You must have an interest in infrastructure, cloud platforms (GCP & AWS), large scale data systems, and Kubernetes!
Experience in software development and operations, ideally in programming languages such as Golang, Java, Python, Bash, or Ruby.
Experience working with containers and container orchestrators such as Kubernetes or Mesos.
You believe in infrastructure as code - you would love this role if you are excited about building large, dynamically scalable data intensive systems.
If this sounds like what you’d love to be doing, we can’t wait to hear from you. If you’re not sure that you exactly fit the above criteria, get in touch anyway. Alignment with our values is just as important as experience! 🙏
Salary
We have a set system for compensation as part of being transparent. Salary varies based on location and level of experience.
Location (based on market rates)
The benchmark for each role we are hiring for is based on the market rate in San Francisco.
Level
We pay more experienced team members a greater amount since it is reasonable to expect this correlates with an increase in skill.
Step
We hire into the Established step by default, and believe incremental steps give us room for more flexibility.
Benefits
Generous, transparent compensation & equity
Unlimited vacation (with a minimum!)
Two meeting-free days per week
Home office
Coworking Credit
Medical, Dental and Vision Insurance
Training budget
Access to our Hedge House
Carbon offsetting
401k/pension contributions
Spill mental health chat
Company offsites
Get more details about all our benefits on the Careers page.
Typical tasks
Here are some open GitHub issues you could help solve:
Your team's mission and objectives
Make deploying, scaling, and managing PostHog easy, fast, and reliable.
Q1 2023 Goals
- Objective: Reduce toil
  - KR: Migrate PostgreSQL out of Heroku and into the correct AWS account
    - Why? Reduce cost and reduce the risk of us maxing out the instance
  - KR: Consolidate infrastructure into the correct AWS accounts (specifically US)
    - Why? This will reduce toil and AWS expenses
- Objective: Support other teams and enable them to self-serve
  - KR: Most of the pipeline team has written/contributed to their own alerting system
    - Once the system is set up well, it would be good to get the other teams doing this too
    - Why? Enable the pipeline team to self-serve their issues
    - Metric to track: % of non-infra hero time supporting other teams
- Objective: Keep infra spend flat at $120k/month
  - Such as Guido’s list
  - Will likely decrease cost first, then increase with usage
  - Why? Improving our gross profit margin
Roadmap
3-year
- All infrastructure is managed as code
- Cloud is global
- Best in class security and privacy compliance
- Scale beyond 1 Trillion events / month
- Support non-k8s Deploys 🤖
- Painless developer/contributor experience
6 month
- No Heroku
- 5 Billion events / month
- SOC 2 Ready
- All infra is managed as code on prod / staging + EU Ready 🎈
- Rock solid, reliable infrastructure with regular production load testing to inform scaling plans
Interview process
We do 2-3 short interviews, then pay you to do some real-life (or close to real-life) work.
- 1. Application (You are here)
Our talent team will review your application to see how your skills and experience align with our needs.
- 2. Culture interview (30-min video call)
Our goal is to explore your motivations to join our team, learn why you’d be a great fit, and answer questions about us.
- 3. Technical interview (45 minutes, varies by role)
You'll meet the hiring team who will evaluate skills needed to be successful in your role. No live coding.
- 4. PostHog SuperDay (Paid day of work)
You’ll join a standup, meet the team, and work on a task related to your role, offering a realistic view of what it’s like working at PostHog.
- 5. Offer (Pop the champagne, after you sign)
If everyone’s happy, we’ll make you an offer to join us - YAY!
Apply
(Now for the fun part...)
Just fill out this painless form and we'll get back to you within a few days. Thanks in advance!