Big Data Engineer (The Data Pipeline Innovator)
Company: Unreal Gigs
Location: San Francisco
Posted on: November 11, 2024
Job Description:
Are you passionate about handling massive datasets and building
the infrastructure that enables complex data analysis and machine
learning at scale? Do you excel in creating robust, scalable data
pipelines that fuel data-driven decision-making? If you're ready to
tackle the challenges of big data, our client has the perfect role
for you. We're seeking a Big Data Engineer (aka The Data Pipeline
Innovator) to architect and maintain high-performance data systems
that empower analytics and support advanced data processing
needs.

As a Big Data Engineer working with our client, you'll collaborate with
data scientists, analysts, and software engineers to design,
implement, and optimize big data platforms. Your expertise in data
engineering, distributed systems, and cloud infrastructure will be
critical to ensuring that the client's data ecosystem is efficient,
reliable, and scalable.
Key Responsibilities:
- Design and Build Scalable Data Pipelines:
  - Architect and implement data pipelines for ETL processes using
  tools like Apache Spark, Kafka, and Hadoop. You'll create data
  workflows that handle high-volume, high-velocity data and ensure
  seamless integration across systems.
- Optimize Big Data Storage and Processing:
  - Develop and manage data storage solutions (e.g., HDFS, S3,
  Cassandra) that are optimized for performance and cost-efficiency.
  You'll configure distributed processing systems to support
  efficient data retrieval and transformation.
- Collaborate on Data Strategy and Integration:
  - Work closely with data scientists, analysts, and other
  engineers to align big data architecture with analytics goals.
  You'll ensure data availability and integrity across systems to
  support business objectives.
- Implement Data Quality and Governance Standards:
  - Develop processes and tools to monitor data quality and enforce
  data governance policies. You'll ensure data is accurate, reliable,
  and secure through regular checks and validation processes.
- Enhance Data Processing with Automation:
  - Use tools like Apache Airflow or AWS Glue to automate data
  workflows and reduce manual processing. You'll implement scripts
  and automation that streamline data handling and improve
  efficiency.
- Monitor and Troubleshoot Data Systems:
  - Use monitoring tools to track system performance and address
  issues proactively. You'll troubleshoot and resolve bottlenecks
  or failures to maintain optimal data processing capabilities.
- Stay Updated on Big Data Trends and Technologies:
  - Keep up with advancements in big data technologies and tools.
  You'll integrate new techniques and platforms that align with
  business needs and promote innovation.
Required Skills:
- Big Data Platform Proficiency: Extensive experience with big
data technologies such as Apache Spark, Hadoop, Kafka, and Hive.
You're skilled at handling high-volume data and distributed
processing.
- Data Pipeline and ETL Knowledge: Proven ability to design,
build, and maintain ETL processes for massive datasets. You can
handle both real-time and batch data processing requirements.
- Programming and Scripting: Proficiency in programming languages
like Python, Java, or Scala for data processing and automation.
Experience with SQL for data querying and manipulation is
essential.
- Cloud Data Services Expertise: Familiarity with cloud platforms
such as AWS, GCP, or Azure, including their big data and storage
services (e.g., S3, BigQuery, Azure Data Lake).
- Data Quality and Governance: Strong understanding of data
quality standards and governance practices, with experience in
implementing data validation and monitoring frameworks.
Educational Requirements:
- Bachelor's or Master's degree in Computer Science, Data
Engineering, Information Technology, or a related field. Equivalent
experience in data engineering or big data management may be
considered.
- Certifications in big data or cloud technologies (e.g.,
Cloudera Certified Data Engineer, AWS Certified Big Data -
Specialty, Google Professional Data Engineer) are a plus.
Experience Requirements:
- 5+ years of experience in data engineering, with at least 3
years focused on big data technologies and high-scale data
environments.
- Experience in distributed systems and large-scale data storage
management.
- Familiarity with containerization (Docker, Kubernetes) for
deploying data processing environments is advantageous.
Benefits:
- Health and Wellness: Comprehensive medical, dental, and vision
insurance plans with low co-pays and premiums.
- Paid Time Off: Competitive vacation, sick leave, and 20 paid
holidays per year.
- Work-Life Balance: Flexible work schedules and telecommuting
options.
- Professional Development: Opportunities for training,
certification reimbursement, and career advancement programs.
- Wellness Programs: Access to wellness programs, including gym
memberships, health screenings, and mental health resources.
- Life and Disability Insurance: Life insurance and
short-term/long-term disability coverage.
- Employee Assistance Program (EAP): Confidential counseling and
support services for personal and professional challenges.
- Tuition Reimbursement: Financial assistance for continuing
education and professional development.
- Community Engagement: Opportunities to participate in community
service and volunteer activities.
- Recognition Programs: Employee recognition programs to
celebrate achievements and milestones.
Keywords: Unreal Gigs, Tracy, Big Data Engineer (The Data Pipeline Innovator), Engineering, San Francisco, California