External Table
Luca's blog on data engineering, data platforms, and performance.
Showing posts with label Apache Spark.
Wednesday, October 1, 2025
Why I’m Loving Spark 4’s Python Data Source (with Direct Arrow Batches)
TL;DR: Apache Spark 4 lets you build first-class data sources in pure Python. If your reader yields Arrow RecordBatch objects, Spark i...
Friday, April 26, 2024
Building an Apache Spark Performance Lab: Tools and Techniques for Spark Optimization
Apache Spark is renowned for its speed and efficiency in handling large-scale data processing. However, optimizing Spark to achieve maximum ...
Wednesday, September 27, 2023
Enhancing Apache Spark Performance with Flame Graphs: A Practical Example Using Grafana Pyroscope
TL;DR: Explore a step-by-step example of troubleshooting Apache Spark job performance using flame graph visualization and profiling. Discove...
Friday, August 11, 2023
Performance Comparison of 5 JDKs on Apache Spark
Dive into a comprehensive load-testing exploration using Apache Spark with CPU-intensive workloads. This blog provides a comparative analysi...
Thursday, February 23, 2023
Introduction to Spark APIs for Data Processing
This is a self-paced, open introductory course on Apache Spark. Theory and demos co...