Senior Software Engineer - Reliability
Company: Testing Solutions GmbH
Location: Palo Alto
Posted on: November 13, 2024
Job Description:
Luma's mission is to build multimodal AI to expand human
imagination and capabilities. We believe that multimodality is
critical for intelligence. To go beyond language models and build
more aware, capable, and useful systems, the next step-function
change will come from vision. So we are working on training and
scaling up multimodal foundation models for systems that can see
and understand, show and explain, and eventually interact with our
world to effect change.
The SRE role at Luma AI sits in the infrastructure team and is
responsible for defining, measuring, and improving the reliability
of Luma's GPU clusters. The SRE team works closely with the research
teams to improve the functioning of the existing research platform
and to build the future platform. This is the team that helps build
the infrastructure enabling progress at the leading AI lab.
Startup Mindset
- Value velocity and execution.
- Communicate clearly.
- Focus on building what will matter to users and the
product.
- Be resourceful at finding creative ways to overcome
challenges.
Experience
- 5+ years of proven work experience as a reliability engineer,
production engineer, infrastructure software engineer, or in a
similar role at a fast-paced, rapidly scaling company.
- Strong proficiency in GPU cloud infrastructure, including the
underlying concepts of scheduling, scaling, cloud storage,
networking and security.
- Proficiency in programming/scripting languages.
- Experience with containerization technologies and container
orchestration platforms like Kubernetes or equivalent.
- Knowledge of infrastructure-as-code (IaC) tools such as Terraform,
CloudFormation, or equivalent.
- Excellent problem-solving and troubleshooting skills.
- Strong communication and collaboration skills.
- Experience with observability tools such as Datadog, Prometheus,
Grafana, Splunk, the ELK stack, or similar.
- Knowledge of security best practices in cloud
environments.
- Experience as an SRE in the AI/ML space is strongly
preferred.
- Please note that this role is not meant for recent graduates.
Benefits
- Equity grant to reflect the incredible value you will bring to
Luma, with annual refreshes.
- Excellent salary and benefits.
- Full health, dental, and vision coverage.
- Latest and greatest gear.
- Stipends for wellness, a house cleaner, and
phone/internet.
- Unlimited paid time off, with a 12-day minimum.
- Unlimited sick days.
Why Join Luma AI
- You will get to work with the world's best AI researchers,
shipping their research to millions of users around the world.
- You will be equipped with all the tools, technologies,
resources and AI tools you need to get the job done.
- We build. We ship. Your work will matter to people.
- We are building a widely usable product, and you'll get to
work on an equally wide variety of challenging problems.
- We have fantastic traction from early customers, whom you'll
get to work directly with.
- We have the backing of some of the best VCs in Silicon Valley,
and angels from across the industry.
In addition to cash base pay, you'll also receive a sizable grant
of Luma's equity. The pay range for this position is
$180,000-$250,000/yr for the Bay Area. Base pay offered may vary
depending on job-related knowledge, skills, candidate location, and
experience. Your application is reviewed by real people.