Title: Writing Beautiful Apache Spark Code: Processing massive datasets with ease
Author: Matthew Powers
Publisher: Leanpub
Year: 2020
Pages: 212
Language: English
Format: pdf (true), mobi, epub
Size: 11.5 MB
Learn how to analyze big datasets in a distributed environment without being bogged down by theoretical topics. The Spark API is vast, and other learning tools make the mistake of trying to cover everything. This book only covers what you need to know, so you can explore other parts of the API on your own!
This book teaches Spark fundamentals and shows you how to build production-grade libraries and applications. It took years for the Spark community to develop the best practices outlined in this book. This book will fast-track your Spark learning journey and put you on the path to mastery.
It’s easy to follow internet tutorials and write basic Spark code in browser editors, but it’s hard to write Spark code that’s readable, maintainable, debuggable, and testable. Spark error messages can be extremely difficult to decipher. You can spend hours tracking down bugs in Spark codebases, especially if your code is messy. You might also write jobs that run for hours and then fail for unknown reasons. Getting jobs like these to execute successfully can take days of trial and error.
The practices outlined in this book will save you a lot of time by:
- Avoiding Spark design patterns that can cause errors
- Reusing functions across your organization
- Identifying bottlenecks before running production jobs
- Catching bugs in the testing environment
Why Scala? Spark offers Scala, Python, Java, and R APIs. This book covers only the Scala API. The best practices for each language are quite different. Entire chapters of this book are irrelevant for SparkR and PySpark users. The best Spark API for an organization depends on the team’s makeup: a group with lots of Python experience should probably use the PySpark API.
Who should read this book? Spark newbies and experienced Spark programmers will both find this book useful. Noobs will learn how to write Spark code properly right off the bat and avoid wasting time chasing spaghetti code bugs. Experienced Spark coders will learn how to use best practices to write better code, publish internal libraries, and become the Spark superstar at their company.