Spark: The Definitive Guide
Page Count: 601
- Preface
- I. Gentle Overview of Big Data and Spark
- II. Structured APIs—DataFrames, SQL, and Datasets
  - 4. Structured API Overview
  - 5. Basic Structured Operations
    - Schemas
    - Columns and Expressions
    - Records and Rows
    - DataFrame Transformations
    - Creating DataFrames
    - select and selectExpr
    - Converting to Spark Types (Literals)
    - Adding Columns
    - Renaming Columns
    - Reserved Characters and Keywords
    - Case Sensitivity
    - Removing Columns
    - Changing a Column’s Type (cast)
    - Filtering Rows
    - Getting Unique Rows
    - Random Samples
    - Random Splits
    - Concatenating and Appending Rows (Union)
    - Sorting Rows
    - Limit
    - Repartition and Coalesce
    - Collecting Rows to the Driver
    - Conclusion
  - 6. Working with Different Types of Data
  - 7. Aggregations
  - 8. Joins
  - 9. Data Sources
  - 10. Spark SQL
  - 11. Datasets
- III. Low-Level APIs
  - 12. Resilient Distributed Datasets (RDDs)
  - 13. Advanced RDDs
  - 14. Distributed Shared Variables
- IV. Production Applications
  - 15. How Spark Runs on a Cluster
  - 16. Developing Spark Applications
  - 17. Deploying Spark
  - 18. Monitoring and Debugging
    - The Monitoring Landscape
    - What to Monitor
    - Spark Logs
    - The Spark UI
    - Debugging and Spark First Aid
    - Spark Jobs Not Starting
    - Errors Before Execution
    - Errors During Execution
    - Slow Tasks or Stragglers
    - Slow Aggregations
    - Slow Joins
    - Slow Reads and Writes
    - Driver OutOfMemoryError or Driver Unresponsive
    - Executor OutOfMemoryError or Executor Unresponsive
    - Unexpected Nulls in Results
    - No Space Left on Disk Errors
    - Serialization Errors
    - Conclusion
  - 19. Performance Tuning
- V. Streaming
  - 20. Stream Processing Fundamentals
  - 21. Structured Streaming Basics
  - 22. Event-Time and Stateful Processing
  - 23. Structured Streaming in Production
- VI. Advanced Analytics and Machine Learning
  - 24. Advanced Analytics and Machine Learning Overview
  - 25. Preprocessing and Feature Engineering
  - 26. Classification
  - 27. Regression
  - 28. Recommendation
  - 29. Unsupervised Learning
  - 30. Graph Analytics
  - 31. Deep Learning
- VII. Ecosystem
- Index