5 questions
You landed your dream job in the highest calling of mankind: MLOps Engineer. Your company has a pipeline in place with a neural network for object classification. You see that the model is deteriorating over time and that some new objects have appeared. You have already prepared the data for retraining with image normalization.
Which is the most efficient way to retrain your network?
Your pipeline starts a training job from scratch to account for the new data.
Your pipeline starts a transfer learning job to account for the new data.
You train your model locally to account for the new data and update your pipeline accordingly.
Your pipeline starts using Spot instances for inference so the model doesn't require the new data.
You are Head of Oscillation. Across the experiments in your training pipeline, you observe that in every training job the training accuracy oscillates during mini-batch training of a neural network for a classification task.
Which of the following is the MOST LIKELY CAUSE of this problem?
The class distribution is highly imbalanced.
Dataset shuffling is disabled.
The batch size is too big.
The learning rate is too high.
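To experiment with the learning-rate option in isolation, here is a minimal toy sketch. It is not the training setup from the scenario: it assumes plain gradient descent on the one-dimensional function f(w) = w**2, and the helper names `gradient_descent` and `sign_changes` are made up for illustration. It only shows how the step size alone can turn a smooth descent into a back-and-forth trajectory.

```python
def gradient_descent(lr, w0=1.0, steps=6):
    """Run plain gradient descent on f(w) = w**2 and return every visited weight."""
    w = w0
    path = [w]
    for _ in range(steps):
        grad = 2.0 * w          # derivative of w**2
        w = w - lr * grad       # standard gradient step
        path.append(w)
    return path


def sign_changes(path):
    """Count how often consecutive weights land on opposite sides of the minimum at 0."""
    return sum(1 for a, b in zip(path, path[1:]) if a * b < 0)


if __name__ == "__main__":
    for lr in (0.1, 0.9, 1.1):
        path = gradient_descent(lr)
        print(lr, sign_changes(path), [round(w, 3) for w in path])
```

On this toy problem each step multiplies the weight by (1 - 2 * lr), so a small step size approaches the minimum from one side, while a step size above 0.5 jumps across it on every update and one above 1.0 moves further away each time.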
You are Vice President of Learning. Your problems and training jobs all run on SageMaker. Furthermore, you found that all your training jobs can be completed with the SageMaker built-in algorithms.
Which common parameters MUST be given when submitting a training job that uses one of the built-in algorithms? (CHOOSE 3.)
The training channel identifying the location of training data in an S3 bucket.
The EC2 instance class specifying whether training will be run on a CPU or GPU instance.
The IAM role that SageMaker can assume to perform tasks on behalf of the users.
Hyperparameters in a JSON array as documented for the algorithm used.
The output path specifying where in an S3 bucket the trained model will be persisted.
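For orientation, the sketch below shows where each item from the options would sit in a training job request. It is a rough illustration, not an answer key: the field names follow the shape of the SageMaker `CreateTrainingJob` request as commonly documented, while the function name `build_training_request`, the instance type, the limits, and every argument value are placeholder assumptions. It only builds a dictionary and never calls AWS; check the current API reference for which fields are required.

```python
def build_training_request(job_name, image_uri, role_arn,
                           train_s3_uri, output_s3_uri, hyperparameters=None):
    """Assemble a dict shaped like a CreateTrainingJob request (no AWS call is made)."""
    request = {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,        # container image of the built-in algorithm
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,                   # IAM role SageMaker assumes
        "InputDataConfig": [
            {
                "ChannelName": "train",        # training channel pointing at S3
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": train_s3_uri,
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
            }
        ],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},  # where model artifacts go
        "ResourceConfig": {                    # placeholder instance settings
            "InstanceType": "ml.m5.xlarge",
            "InstanceCount": 1,
            "VolumeSizeInGB": 10,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }
    if hyperparameters:
        # The API expects a string-to-string map, so values are converted to strings.
        request["HyperParameters"] = {k: str(v) for k, v in hyperparameters.items()}
    return request
```

With boto3 installed and credentials configured, a dictionary like this could be passed as keyword arguments to the SageMaker client's `create_training_job` call.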
A retail chain has been using Kinesis Data Firehose to ingest purchase details from its network of 20,000 outlets into S3. To support the training of a more advanced machine learning model, the training data will need additional but straightforward transformations, and certain attributes will be merged. The model must be retrained daily. Given the large number of stores and the existing historical data ingestion, which update will take the LEAST amount of development work?
Require the stores to switch to capturing their data locally on AWS Storage Gateway for loading into S3, then use Glue to do the transformation.
Deploy an EMR cluster running Apache Spark with the transformation logic, and have the cluster run each day on the accumulating records in S3, outputting the transformed records to S3.
Spin up a fleet of EC2 instances with the transformation logic, have them transform the data records accumulating on S3, and output the transformed records to S3.
Insert a Kinesis Data Analytics stream downstream of the Kinesis Data Firehose stream that transforms raw record attributes into simple transformed values using SQL.
You are Head of Digits. A client of yours, a producer of automobile engines, gathers data from vehicles as they are driven. The timestamp, engine temperature, rotations per minute (RPM), and other sensor measurements are all captured. The business hopes to forecast when an engine may fail, so that it can alert drivers in advance to schedule a repair. For training purposes, the engine data is placed into a data lake. Which predictive model is the MOST SUITABLE for production deployment?
Add labels over time to indicate which engine faults occur at what time in the future to turn this into a supervised learning problem. Use a recurrent neural network (RNN) to train the model to recognize when an engine might need maintenance for a certain fault.
This data requires an unsupervised learning algorithm. Use SageMaker K-Means to cluster the data to recognize when an engine might need maintenance for a certain fault.
Add labels over time to indicate which engine faults occur at what time in the future to turn this into a supervised learning problem. Use a convolutional neural network (CNN) to train the model to recognize when an engine might need maintenance for a certain fault.
This data is already formulated as a time series. Use SageMaker seq2seq to model the time series to recognize when an engine might need maintenance for a certain fault.
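Two of the options above mention adding labels over time to turn the sensor stream into a supervised learning problem. A minimal sketch of what that labeling step could look like follows. The function name `label_windows`, the window and horizon sizes, and the sample readings are all hypothetical and not taken from the scenario; the sketch says nothing about which model should consume the result.

```python
def label_windows(readings, failure_times, window=3, horizon=2):
    """Turn timestamped sensor readings into (window, label) training examples.

    readings:       list of (timestamp, features) tuples sorted by timestamp
    failure_times:  timestamps at which a fault was recorded
    label:          1 if a fault occurs within `horizon` time units after the
                    last reading in the window, otherwise 0
    """
    examples = []
    for end in range(window, len(readings) + 1):
        chunk = readings[end - window:end]
        last_t = chunk[-1][0]
        label = int(any(last_t < ft <= last_t + horizon for ft in failure_times))
        examples.append(([features for _, features in chunk], label))
    return examples


if __name__ == "__main__":
    # Made-up readings: (timestamp, (engine temperature, RPM))
    readings = [(t, (90 + t, 2000)) for t in range(6)]
    for features, label in label_windows(readings, failure_times=[5]):
        print(label, features)
```

Each example pairs a short history of readings with a flag saying whether a fault follows soon after, which is the form a supervised sequence model would train on.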