External Table
Luca's blog on data engineering, data platforms, and performance.
Showing posts with label Pyspark.
Wednesday, October 1, 2025
Why I’m Loving Spark 4’s Python Data Source (with Direct Arrow Batches)
TL;DR: Apache Spark 4 lets you build first-class data sources in pure Python. If your reader yields Arrow RecordBatch objects, Spark i...
Thursday, April 25, 2019
Machine Learning Pipelines for High Energy Physics Using Apache Spark with BigDL and Analytics Zoo
Topic: This post describes a data pipeline for a machine learning task of interest in high energy physics: building a particle classifier t...
Monday, November 21, 2016
IPython/Jupyter SQL Magic Functions for PySpark
Topic: this post is about a simple implementation with examples of IPython custom magic functions for running SQL in Apache Spark using PyS...