Senior Machine Learning Research Engineer/Scientist
Company: METR
Location: Berkeley
Posted on: November 6, 2024
Job Description:
METR is developing evaluations for AI R&D capabilities, such
that evaluators can determine if further AI development risks a
"capabilities explosion", which could be extraordinarily
destabilizing if realized.

METR is hiring ML research
engineers/scientists to drive these AI R&D evaluations
forward.

Responsibilities:
- Produce tasks/benchmarks that can determine if a model is
dangerous
- Run experiments to determine how elicitation techniques affect
results on our evaluation suite
- Run evaluations internally, and potentially support external
partners in governments and/or frontier AI companies in performing
evaluations for autonomous capabilities
- Improve the tooling that researchers use for designing and
running evaluations
- Collaborate closely with evaluation science researchers to
develop a robust evaluation procedure

Your work will increase the
odds that METR's evaluation protocols can robustly predict whether
a new frontier model poses catastrophic risks.
- Understand what kinds of abilities we need to be evaluating
for, and what properties we most need our evaluations to have
- Find the most promising directions to explore; design
experiments and research roadmap
- Collaborate with the threat modeling team, sharing your ML
domain expertise to help us evaluate models for AI R&D
skills
- Research execution
  - Rapidly execute experiments, obtain reliable results
  - Design sensible pipelines and workflows (know which things are
  going to be reused and need to be good versus what things it's ok
  to do scrappily)
  - Quickly interpret results - recognize what is signal vs noise,
  notice when things don't look right and there might be a bug, know
  where to look for bugs in ML experiments
  - Know how much work different approaches are likely to be and
  how promising they are; when you have uncertainties, get
  information as quickly as possible

What we're looking for
- Have substantial ML research engineering experience, for
example:
  - Have research publications related to machine learning in which
  you played a major role,
  - Have professional experience optimizing compute for inference
  or training of a large model, and/or
  - Have played a significant role in the training or optimization
  of a large model

An ideal candidate would be a machine learning
researcher with extensive experience working with frontier LLMs and
a track record of successful execution-heavy research projects.

We
expect to hire multiple people for this position, and their work
will focus more on either the engineering or research side of the
role, depending on their strengths.

About METR

METR is a non-profit
which does empirical research to determine whether frontier AI
models pose a significant threat to humanity. It's robustly good
for civilization to have a clear understanding of what types of
danger AI systems pose, and know how high the risk is. You can
learn more about our goals from our videos (overall goals, recent
update).

Some highlights of our work so far:
- Establishing autonomous replication evals: Thanks to our work,
it's now taken for granted that autonomous replication (the ability
for a model to independently copy itself to different servers,
obtain more GPUs, etc.) should be tested for. For example, labs
pledged to evaluate for this capability as part of the White House
commitments.
- Pre-release evaluations: We've worked with OpenAI and Anthropic
to evaluate their models pre-release, and our research has been
widely cited by policymakers, AI labs, and within government.
- Inspiring lab evaluation efforts: Multiple leading AI companies
are building their own internal evaluation teams, inspired by our
work.
- Early commitments from labs: Anthropic credited us for their
recent Responsible Scaling Policy (RSP), and OpenAI recently
committed to releasing a Risk-Informed Development Policy (RDP).
These fit under the category of "evals-based governance", wherein
AI labs can commit to things like, "If we hit capability threshold
X, we won't train a larger model until we've hit safety threshold
Y".

We've been mentioned by the UK government, Obama, and others.
We're sufficiently connected to relevant parties (labs,
governments, and academia) that any good work we do or insights we
uncover can quickly be leveraged.

Logistics

Deadline to apply: None.
Applications will be reviewed on a rolling basis.

We encourage you
to apply even if you do not believe you meet every single
qualification. Not all strong candidates will meet every single
qualification as listed. Research shows that people who identify as
being from underrepresented groups are more prone to experiencing
imposter syndrome and doubting the strength of their candidacy, so
we urge you not to exclude yourself prematurely and to submit an
application if you're interested in this work. We think AI systems
like the ones we're building have enormous social and ethical
implications. We think this makes representation even more
important, and we strive to include a range of diverse perspectives
on our team.

Apply for this job

We encourage you to apply even if
your background may not seem like the perfect fit! We would rather
review a larger pool of applications than risk missing out on a
promising candidate for the position. If you lack US work
authorization and would like to work in-person (preferred), we can
likely sponsor a cap-exempt H-1B visa for this role.

We are
committed to diversity and equal opportunity in all aspects of our
hiring process. We do not discriminate on the basis of race,
religion, national origin, gender, sexual orientation, age, marital
status, veteran status, or disability status. We welcome and
encourage all qualified candidates to apply for our open
positions.

Registering interest is quick; our main question can be
answered with a few bullet points. Register interest!