Staff Test Engineer
Company: Support Revolution
Location: San Jose
Posted on: November 13, 2024
Job Description:
Location: San Jose, California, United States
About Supermicro:
Supermicro is a Top Tier provider of advanced server, storage, and
networking solutions for Data Center, Cloud Computing, Enterprise
IT, Hadoop/Big Data, Hyperscale, HPC and IoT/Embedded customers
worldwide. We are the #5 fastest growing company among the Silicon
Valley Top 50 technology firms. Our unprecedented global expansion
has provided us with the opportunity to offer a large number of new
positions to the technology community. We seek talented,
passionate, and committed engineers, technologists, and business
leaders to join us.
Job Summary:
As a global leader in server technologies, Supermicro has been
growing extremely fast in key markets such as Cloud Computing, Big
Data, HPC, AI and Storage. To meet market demand, Supermicro is
developing end-to-end enterprise IT solutions with compute, storage
and networking integrated into full-rack or multi-rack systems. The
Staff Test Engineer plays an important role in designing,
implementing, testing and deploying rack system solutions for data
center and enterprise customers.
Essential Duties and Responsibilities:
Includes the following essential duties and responsibilities (other
duties may also be assigned):
--- Deploy Rack/Cluster infrastructure and execute comprehensive
system level testing on the latest GPUs, CPU processors, Network
and Storage, encompassing functionality, compatibility,
performance, stress, and reliability testing, leveraging
proprietary in-house tools
--- Conduct proof of concept design and testing. Establish
expertise in HPC/AI applications and benchmarks, and optimize
benchmark results by fine-tuning system settings and OS/network
configurations. Demonstrate strong problem-solving skills and build
robust processes and procedures for HPC/AI solutions
--- Lead day-to-day operational support for Cluster, Storage, HPC
and Cloud infrastructure. Identify and document hardware and
software quality issues. Collaborate with product management and
other Engineering teams to integrate enhancements into future
products
--- Write technical documents for test procedures, test reports and
troubleshooting procedures covering server, network and cluster
software and hardware to facilitate knowledge sharing
--- Deliver on-site deployment services to ensure customer
acceptance verification and satisfaction
--- Write automation tools for cluster deployment and test
environment
Qualifications:
--- BS/MS in Electrical Engineering, Computer Engineering or a
related field, MS preferred
--- 12+ years of work-related experience in server/network/storage
hardware configuration, testing, debugging and troubleshooting
--- 12+ years of work-related experience in DevOps or in cloud
environments, including but not limited to Docker/Containers and
Kubernetes
--- Experience with leading AI/ML frameworks such as PyTorch,
TensorFlow, etc.
--- Familiar with TCP/IP protocol stack, UDP, IPv4-IPv6, DNS, DHCP
and other Application protocols
--- Familiar with HPC, AI or Cloud benchmark tests and networking
architecture
--- Excellent programming skills in Python and shell scripting
--- Strong communication skills and a strong sense of teamwork
--- Familiarity with MLPerf Training/Inference benchmarks, LLMs,
HPL-AI or RCCL/NCCL is a plus
--- CCNA/CCNP, OpenStack, OpenShift, Azure or AWS is a plus
Salary Range:
$160,000 - $179,000
The salary offered will depend on several factors, including your
location, level, education, training, specific skills, years of
experience, and comparison to other employees already in this role.
In addition to a comprehensive benefits package, candidates may be
eligible for other forms of compensation, such as participation in
bonus and equity award programs.
EEO Statement:
Supermicro is an Equal Opportunity Employer and embraces diversity
in our employee population. It is the policy of Supermicro to
provide equal opportunity to all qualified applicants and employees
without regard to race, color, religion, sex, sexual orientation,
gender identity, national origin, age, disability, protected
veteran status or special disabled veteran, marital status,
pregnancy, genetic information, or any other legally protected
status.