Jump to content

What is the Hadoop ecosystem and how does Apache Spark fit in?


Recommended Posts

I'm having a lot of trouble grasping what exactly a 'Hadoop ecosystem' is conceptually. I understand that you have some data processing tasks that you want to run and so you use MapReduce to split the job up into smaller pieces but I'm unsure about what people mean when they say 'Hadoop Ecosystem'. I'm also unclear as to what the benefits of Apache Spark are and why this is seen as so revolutionary? If it's all in-memory calculation, wouldn't that just mean that you would need higher RAM machines to run Spark jobs? How is Spark different than writing some parallelized Python code or something of that nature.

Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

  • Create New...

Important Information

By using this site, you agree to our Terms of Use & Guidelines and understand your posts will initially be pre-moderated