23 questions
Select the three primary factors that determine whether a Big Data solution is appropriate.
Volatility
Variety
Volume
Velocity
Value
Which statement best describes Apache Hadoop?
Hadoop is a data warehouse.
Hadoop is a scalable NoSQL database.
Hadoop is a scalable data storage and batch processing framework.
Hadoop is a scalable relational database.
How many copies of each data block are written to HDFS by default?
3
2
1
4
Which of the following Hadoop modules consists of the common utilities and libraries that support the other Hadoop modules?
Hadoop Common
Hadoop YARN
Hadoop Distributed File System (HDFS)
Hadoop MapReduce
Which of the following Hadoop modules is the distributed file system that stores the data?
Hadoop YARN
Hadoop Distributed File System (HDFS)
Hadoop MapReduce
Hadoop Common
Which of the following Hadoop modules is the framework for scheduling and execution of data processing?
Hadoop Common
Hadoop MapReduce
Hadoop YARN
Hadoop Distributed File System (HDFS)
Which database type is most widely adopted?
Relational
NoSQL
Data Warehouse
Which database type provides the greatest flexibility for schema changes?
Relational
NoSQL
Data Warehouse
Which database type provides the greatest business intelligence from a variety of sources?
Relational
NoSQL
Data Warehouse
Which database type provides the broadest opportunities for ACID compliance?
Relational
NoSQL
Data Warehouse
Which database type provides horizontal scaling for OLTP databases?
Relational
NoSQL
Data Warehouse
Which database type excels at providing OLAP (Online Analytical Processing) from a variety of data sources?
Relational
NoSQL
Data Warehouse
Select two key advantages of Hadoop 2.X over 1.X.
The Name Node Job Tracker can handle more concurrent jobs
The inclusion of YARN allows for greater scaling of jobs
Faster recovery from Name Node failure due to Standby Name Node
Faster recovery from Name Node failure due to Secondary Name Node
Task Trackers on Data Nodes can increase the job loads
Which MapReduce join is generally faster?
Map-Side Join
Reduce-Side Join
Which MapReduce join has fewer constraints?
Map-Side Join
Reduce-Side Join
Select two true statements below which apply to Hive.
Hive requires structured data.
Hive can run on commodity hardware in a distributed fashion.
Hive requires high-end server hardware in order to scale.
Hive queries use SQL.
Hive has the ability to handle unstructured, semi-structured, and structured data.
Which Hive component is used to communicate with the Hadoop framework?
CLI
Metastore
Driver
Thrift server
Which Hive component stores the system catalog?
Thrift server
Driver
CLI
Metastore
Select two true statements below which apply to Pig.
Pig scripts written for Hadoop 1.X environments must be rewritten for Hadoop 2.X.
Pig programs are written in Pig Latin and must first be compiled by the Pig compiler.
Pig is a scripting language for analyzing large datasets.
Pig Latin scripts can be run without compiling.
Pig requires Java programming.
Which AWS product provides a dynamically scaled, fully managed Hadoop framework?
Amazon Kinesis
Amazon Redshift
Amazon QuickSight
Amazon EMR
Which AWS product provides visualization for business insights?
Amazon Kinesis
Amazon Redshift
Amazon EMR
Amazon QuickSight
Which AWS product provides data warehousing capability?
Amazon Redshift
Amazon Kinesis
Amazon EMR
Amazon QuickSight
Which AWS product provides scalable object storage for data, big or small?
Amazon Redshift
Amazon QuickSight
Amazon S3
Amazon EMR