What you’ll do…
Position:
Staff Data Scientist
Job Location:
10790 Parkridge Blvd, Reston, VA 20191
Duties:
Data Source Identification: Support the understanding of the priority order of requirements and service level agreements. Help identify the most suitable source for data that is fit for purpose. Perform initial data quality checks on extracted data.
Data Strategy: Understand, articulate, and apply principles of the defined strategy to routine business problems that involve a single function.
Model Assessment and Validation: Identify the model evaluation metrics. Apply best-practice techniques for model testing and tuning to assess accuracy, fit, validity, and robustness for multi-stage models and model ensembles.
Data Visualization: Generate appropriate graphical representations of data and model outcomes. Understand customer requirements to design appropriate data representations for multiple data sets. Work with User Experience designers and User Interface engineers as required to build front-end applications. Present to and influence the team and business audience using appropriate data visualization frameworks, and convey clear messages through business and stakeholder understanding. Customize communication style based on the stakeholder, under guidance, and leverage rational arguments. Guide and mentor junior associates on story types, structures, and techniques based on context.
Understanding Business Context: Provide recommendations to business stakeholders to solve complex business issues. Develop business cases for projects with a projected return on investment or cost savings. Translate business requirements into projects, activities, and tasks; align them to the overall business strategy; and develop domain-specific artifacts. Serve as an interpreter and conduit to connect business needs with tangible solutions and results. Identify and recommend relevant business insights pertaining to their area of work.
Technical Problem Formulation: Translate and co-own business problems within one’s discipline into data-related or mathematical solutions. Identify appropriate methods and tools to be leveraged to provide a solution for the problem. Share use cases and give examples to demonstrate how the method would solve the business problem.
Analytical Modeling: Select appropriate modeling techniques for complex problems with large-scale, multiple structured and unstructured data sets. Select and develop variables and features iteratively based on model responses, in collaboration with the business. Conduct exploratory data analysis activities (for example, basic statistical analysis, hypothesis testing, statistical inference) on available data. Identify dimensions and designs of experiments, and create test-and-learn frameworks. Interpret data to identify trends that carry across future data sets. Create continuous, online model learning along with iterative model enhancements. Develop newer techniques (for example, advanced machine learning algorithms, AutoML) by leveraging the latest trends in machine learning and artificial intelligence to train algorithms and apply models to new data sets. Guide the team on feature engineering, experimentation, and advanced modeling techniques to be used for complex problems with unstructured and multiple data sets (for example, streaming data, raw text data).
Model Deployment and Scaling: Deploy models to production. Continuously log and track model behavior against the defined metrics once the model is deployed. Identify model parameters that may need modification depending on the scale of deployment.
Code Development and Testing: Write code to develop the required solution and application features by determining the appropriate programming language and leveraging business, technical, and data requirements. Create test cases to review and validate the proposed solution design. Create proofs of concept. Test the code using the appropriate testing approach.
Minimum education and experience required:
Master’s degree or the equivalent in Statistics, Mathematics, Computer Science, or a related field plus 2 years of experience in analytics or a related field; OR Bachelor’s degree or the equivalent in Statistics, Mathematics, Computer Science, or a related field plus 4 years of experience in analytics or a related field; OR 6 years of experience in analytics or a related field.
Skills required:
Must have experience with:
Coding in an object-oriented programming language such as Python, C++, or Java;
Implementing backend structures, microservices, and web services using Python frameworks like FastAPI and Plotly Dash;
Testing microservices and model pipelines using tools such as pytest;
Developing and debugging machine learning models;
Selecting appropriate datasets and cleaning data to meet data requirements;
Manipulating and drawing insights from large datasets using Python packages like NumPy, SciPy, and Pandas;
Querying and analyzing large datasets and optimizing complex queries using advanced knowledge of distributed SQL and NoSQL datastores on platforms like Azure, Google Cloud Platform (GCP), Databricks, Teradata, and Microsoft SQL Server;
Data visualization using tools like Matplotlib, pyplot, ggplot2, Tableau, and Power BI;
Generating appropriate graphical representations of data and model outcomes;
Conducting statistical analysis, including hypothesis testing and statistical inference, using Python, R, and SQL;
Machine learning concepts and the ability to select appropriate models for estimation and evaluation;
Feature construction and feature selection for modeling;
NLP embedding models in frameworks like TensorFlow, Keras, or PyTorch to reveal insights hidden within large volumes of textual data;
Applying deep learning methods and neural networks using frameworks like PyTorch, TensorFlow, or MXNet;
Improving upon existing machine learning methodologies by fine-tuning model parameters using optimization tools like Hyperopt, Optuna, and scikit-learn;
Building scalable data pipelines and extract, transform, and load (ETL) jobs using pipeline tools like Airflow and Luigi and large-scale computing systems like Hadoop, Spark, and Hive;
MLOps practices, including design documentation, unit testing, integration testing, source code control (Git), and CI/CD development environments and tools like Jenkins and CircleCI;
Cloud services like AWS, Azure, or GCP; and
Influencing and advocating for the adoption of AI solutions to address complex problems in various domains, with a focus on engaging end users, customers, and associates to drive business outcomes.
Employer will accept any amount of experience with the required skills.
Wal-Mart is an Equal Opportunity Employer.