Data Platform Engineer

Posted 01 September 2021
Location: San Francisco
Job type: Permanent
Discipline: Data & AI
Contact name: Bethany Hutchins
Remote working: Hybrid/Flexible

Job description

Are you looking to join a team that is building and optimizing a data platform that delivers near-real-time content to a global audience?

Join a team of passionate Data Engineers as we deliver honest news on an engaging news media platform to millions of users daily! 


This is a hybrid role spanning both data engineering and system development:

  • Design, develop, set up, and maintain services, libraries, tools, and frameworks for data processing and management, and investigate new algorithms to improve data processing efficiency: ETL, data pipelines, OLAP DBMSs, real-time message and stream processing, data synchronization between systems, etc.
  • Develop tooling for performance evaluation, monitoring, and tuning of data processing procedures and platforms; derive insights into efficiency and stability and drive continuous improvement, such as optimizing distributed query engines, compute resource management and isolation, and multi-tier storage systems.
  • Own and maintain key data processing portfolios: build and care for the environment, troubleshoot issues, and handle on-call incident response. Work closely with data architects and modelers to determine how to implement data services, and partner with the Site Reliability Engineering (SRE) team to deploy environments and drive production excellence.
  • Devise systems, tooling, and approaches for data privacy and security: establish access controls and create processes for handling sensitive data.
  • Diagnose and resolve complex technical challenges in data access and processing. Use systematic rather than ad-hoc methods to help other teams tune performance and improve stability.

Requirements of Successful Candidates

  • BS/MS degree in computer science or an equivalent science/engineering degree, with 5+ years of experience
  • Strong programming skills and experience, with the deep understanding of data structures and algorithms needed to build efficient, stable solutions
  • Rich experience with one or more programming languages such as Java, Scala, C++, or Python; familiarity with agile development and testing practices
  • Working knowledge of shell scripting and operating systems, especially Linux
  • Good understanding of modern big data technologies and ecosystems
  • Familiarity with Hadoop, Spark, Hive, Presto, Redis, and Flink
  • Familiarity with modern data stores, whether RDBMS or NoSQL/NewSQL (such as MySQL, Cassandra, ScyllaDB, or TiDB); experience developing applications or extensions on such stores
  • Ability to implement and tune complex, heavy-lifting data flows (ETL jobs or pipelines); familiarity with the relevant tooling
  • Ability to design systems with good modularity and extensibility
  • Familiarity with system/module design methods and tools such as UML
  • Ability to draft user-understandable blueprints and precise, detailed designs
  • Experience building highly scalable distributed systems
  • Ability to design and implement distributed services with scalability and performance in mind
  • Ability to debug and troubleshoot performance and reliability problems

Preferred Requirements

  • Experience with cloud-based architectures (e.g. Amazon Web Services)
  • Strong interest in high-performance, high-availability, high-volume data processing
  • Solid experience in data management, integration, security, and auditing


Benefits

  • Equity included
  • 100% medical, dental and vision insurance coverage (60% coverage for dependents)
  • 401k matching program
  • Free lunch, snacks, drinks, etc.
  • Pet friendly office