Description
What is Spark, and why is there such a buzz about this technology? This Spark introduction tutorial aims to answer these questions. Apache Spark is an open-source cluster computing system that provides high-level APIs in Java, Scala, Python, and R. It can access data from HDFS, Cassandra, HBase, Hive, Tachyon, and any other Hadoop data source, and it can run under the Standalone, YARN, and Mesos cluster managers. This tutorial covers the Spark ecosystem components and Spark's core abstraction, the RDD, together with transformations and actions on RDDs (a short code sketch follows below). The objective of this introductory guide is to give a detailed Spark overview: its history, architecture, and deployment models.

So what is Spark? Apache Spark is a general-purpose, lightning-fast cluster computing system for running distributed applications. It is written in Scala but provides rich APIs in Scala, Java, Python, and R. It integrates with Hadoop and can process existing Hadoop HDFS data, running up to 100 times faster than Hadoop MapReduce when data fits in memory and up to 10 times faster when working from disk.

Compare this with the single-purpose engines that came before it:
• Hadoop MapReduce can only perform batch processing.
• Apache Storm / S4 can only perform stream processing.
• Apache Impala / Apache Tez can only perform interactive processing.
• Neo4j / Apache Giraph can only perform graph processing.

Hence there is a big demand in the industry for a powerful engine that can process data in real time (streaming) as well as in batch mode, respond in sub-second time, and perform in-memory processing. By definition, Apache Spark is exactly that: a powerful open-source engine that provides real-time stream processing, interactive processing, graph processing, in-memory processing, and batch processing with very high speed, ease of use, and a standard interface. For more information about Apache Spark components, check this: https://goo.gl/1NwH5B
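To make the RDD abstraction concrete, here is a minimal sketch in Scala of transformations and actions, assuming a local Spark installation; the application name and input values are illustrative only, not part of this tutorial.

    import org.apache.spark.{SparkConf, SparkContext}

    object SparkIntro {
      def main(args: Array[String]): Unit = {
        // Run locally on all cores; "SparkIntro" is an arbitrary app name.
        val conf = new SparkConf().setAppName("SparkIntro").setMaster("local[*]")
        val sc   = new SparkContext(conf)

        // Create an RDD from an in-memory collection.
        val numbers = sc.parallelize(1 to 10)

        // Transformations (map, filter) are lazy: nothing executes yet.
        val evenSquares = numbers.map(n => n * n).filter(_ % 2 == 0)

        // An action (reduce) triggers the actual computation.
        println(s"Sum of even squares: ${evenSquares.reduce(_ + _)}")  // 220

        sc.stop()
      }
    }

The same SparkContext can also read existing Hadoop data, for example with sc.textFile against an hdfs:// path, which is what lets Spark act as a processing engine on top of an existing HDFS cluster.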
Apache Spark Tutorial for Beginners Part 1
