The Data Governance Solutions team, part of Apple Data Platform, is focused on building cutting-edge solutions to support Apple’s Data Governance and Compliance requirements for all data ingested, processed, and stored within Apple. Our mission is to provide robust, reliable, and easy-to-use tooling and solutions that enable full enforcement of Privacy and Compliance requirements, keeping Apple’s users’ data private and secure at all times. Our team solves complex problems in PII detection and Data Quality violation detection, incorporating state-of-the-art generative AI LLMs for training and inference.
Are you passionate about building scalable, reliable, and maintainable Governance solutions and solving data problems at scale? Come join us and be part of the Data Infrastructure journey.
This role involves managing petabytes of data and designing and implementing new frameworks for scalable, efficient data processing workflows. The successful candidate will be responsible for ensuring the completeness of all data ingestion and full metadata enrichment, covering data classification annotations, dataset descriptions, and all required tagging, while optimizing for performance and scalability. You will also monitor the performance of the system, optimize it for cost and efficiency, and resolve any issues that arise. This is an exciting opportunity to work on cutting-edge technology and collaborate with cross-functional teams to deliver high-quality software solutions. The ideal candidate should have a strong background in software development, experience with public cloud platforms, and familiarity with distributed databases.
- 10+ years of experience in software engineering with deep knowledge of computer science fundamentals
- Strong in data structures and algorithms
- Must write high-quality code with test cases and review PRs in a fast-paced environment
- Fluent in writing code using Python
- Extensive experience building ingestion ETL pipelines
- Expert in one or more functional or object-oriented programming languages (Scala, Java)
- Experience or knowledge in distributed data systems such as Hadoop, Spark, Kafka, or Flink
- Strong collaboration and communication (verbal and written) skills
- BS, MS, or PhD degree in Computer Science or equivalent