You will partner with business partners to identify opportunities to leverage big data technologies in support of Pharmacy Personalization, using a common set of tools and infrastructure to make analytics faster, more insightful, and more efficient. Responsibilities include:
• Build and architect a next-generation Big Data machine learning framework developed on a group of core Hadoop technologies
• Design highly scalable and extensible Big Data platforms that enable collection, storage, modeling, and analysis of massive data sets from numerous channels
• Define and maintain data architecture, focusing on applying technology to enable business solutions
• Assess and provide recommendations on business relevance, with appropriate timing and deployment
• Perform architecture design and data modeling, and implement Big Data platforms and analytic applications
• Bring a DevOps mindset to enable big data and batch/real-time analytical solutions that leverage emerging technologies
• Develop prototypes and proofs of concept for selected solutions, and implement complex big data projects
• Apply a creative mindset to collecting, parsing, managing, and automating data feedback loops in support of business innovation
*Required Qualifications :
• 3-5 years of professional IT experience, including the following:
• Hands-on experience with “big data” platforms
• Experience debugging Spark performance issues in a big data environment
• Coding and architecting of end-to-end applications on modern data processing technology stack (e.g. Hadoop, Cloud, Spark ecosystem technologies)
• Building continuous integration/continuous delivery (CI/CD), test-driven development, and production deployment frameworks
• Leading conversations with infrastructure teams (on-prem & cloud) on analytics application requirements (e.g., configuration, access, tools, services, compute capacity)
• Platforms: Hadoop, Spark, Kafka, Kinesis, Oracle, TD
• Proficiency in the following programming languages: Python, PySpark, Hive, Shell Scripting, SQL, Pig, Java / Scala
• Proficiency in “big data” technologies: MapReduce, Conda, H2O, Spark, Airflow / Oozie / Jenkins, HBase, Pig, NoSQL, Chef / Puppet, Git
• Familiarity with building data pipelines, data modeling, architecture & governance concepts
• Experience implementing ML models and building highly scalable, high-availability systems
• Experience operating in distributed environments, including cloud (e.g., Azure, GCP, AWS)
• Experience building, launching, and maintaining complex analytics pipelines in production
*Preferred Qualifications :
• Exposure to the healthcare domain
• Master's degree in Data Science or Business Analytics
• Proficiency in R and/or Python
• Experience with cloud computing environments (ideally Microsoft Azure) and the organizational risks of transitioning from on-prem to cloud infrastructure
• Experience with automation tools (e.g., Jenkins, Airflow, or Control-M)
• B.S. in Computer Science, Engineering, Astronomy/Physics, Economics, Math, or a related field