Title: Data Algorithms with Spark: Recipes and Design Patterns for Scaling Up using PySpark (Fourth Early Release)
Author: Mahmoud Parsian
Publisher: O'Reilly Media, Inc.
Year: 2021-09-10
Pages: 390
Language: English
Format: epub
Size: 10.1 MB

Apache Spark's speed, ease of use, sophisticated analytics, and multilanguage support make practical knowledge of this cluster-computing framework a required skill for data engineers and data scientists. With this hands-on guide, anyone looking for an introduction to Spark will learn practical algorithms and examples using PySpark.

Why should we use Spark? Spark is a powerful analytics engine for large-scale data processing. The most important reasons for using Spark are:

• Spark is simple, powerful, and fast (it keeps working data in RAM rather than on disk, which can make workloads run up to 100x faster).

• Spark is open source, free, and suited to a wide range of big data problems.

• Spark runs everywhere (Hadoop, Mesos, Kubernetes, standalone, or in the cloud).

• Spark can read and write data from and to many data sources.

• Spark can read and write data in row-based and column-based formats (such as Parquet and ORC); see the sketch after this list.
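To illustrate the last two points, here is a minimal PySpark sketch, not taken from the book, that reads a row-based CSV source and writes it back in the column-based Parquet format; the file paths are hypothetical placeholders.

    from pyspark.sql import SparkSession

    # Build (or reuse) a local Spark session.
    spark = SparkSession.builder.appName("formats-demo").getOrCreate()

    # Read a row-based source (CSV); the path is a placeholder.
    df = spark.read.csv("/tmp/input.csv", header=True, inferSchema=True)

    # Write the same data in a column-based format (Parquet).
    df.write.mode("overwrite").parquet("/tmp/output.parquet")

    # Read the Parquet copy back to verify the round trip.
    spark.read.parquet("/tmp/output.parquet").show(5)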

In a nutshell, Spark unlocks the power of big data by combining powerful APIs, ease of use, and speed. It is one of the best choices for large-scale data processing and for solving MapReduce-style problems and beyond. Solving big data problems with MapReduce/Hadoop is complex: you have to write a lot of low-level code even for primitive tasks. This is where the power and simplicity of Spark come in. Apache Spark is much faster than Apache Hadoop because it uses in-memory caching and optimized execution, and it supports general batch processing, streaming analytics, machine learning, graph algorithms, and SQL queries.
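The in-memory caching mentioned above is explicit in the API. A minimal sketch, assuming a hypothetical Parquet dataset at /tmp/logs.parquet with a level column:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("cache-demo").getOrCreate()

    # Hypothetical dataset; replace the path with your own source.
    logs = spark.read.parquet("/tmp/logs.parquet")

    # Mark the DataFrame for in-memory caching; the first action materializes it.
    logs.cache()

    # Both actions below reuse the cached data instead of rescanning the source.
    errors = logs.filter(col("level") == "ERROR").count()
    total = logs.count()
    print(errors, total)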

Spark’s “native” language is Scala, but you can use its language APIs to run Spark code from other programming languages (for example, Java, R, and Python). In this book, I teach you how to solve your big data problems in Spark by expressing the solutions in PySpark. You will learn how to read your data and represent it as an RDD or a DataFrame. The RDD is Spark's fundamental data abstraction; the DataFrame (a distributed table of rows with named columns) lets developers impose a structure onto a distributed collection of data, enabling a higher-level abstraction. Once your data is represented as an RDD or a DataFrame, you can apply transformation functions (such as mappers, filters, and reducers) to convert it into the desired form. The book presents many Spark transformations that can be used to solve ETL, analysis, and data-intensive computation problems.
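As a minimal sketch of these two abstractions (the sample data is made up for illustration and is not from the book): an RDD transformed with a mapper, a filter, and a reducer, and the same data handled as a DataFrame with named columns.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("abstractions-demo").getOrCreate()
    sc = spark.sparkContext

    # RDD: a low-level distributed collection; apply mapper/filter/reducer.
    words = sc.parallelize(["spark", "data", "spark", "rdd", "data"])
    counts = (words.map(lambda w: (w, 1))              # mapper: pair each word with 1
                   .filter(lambda kv: len(kv[0]) > 2)  # filter: drop very short words
                   .reduceByKey(lambda a, b: a + b))   # reducer: sum counts per word
    print(counts.collect())

    # DataFrame: the same data as a distributed table with named columns.
    df = spark.createDataFrame([(w,) for w in ["spark", "data", "spark", "rdd", "data"]],
                               ["word"])
    df.groupBy("word").count().show()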

Download Data Algorithms with Spark: Recipes and Design Patterns for Scaling Up using PySpark (Fourth Early Release)
Posted by: Ingvar16, 11-09-2021, 00:46