External Table

Luca's blog on data engineering, data platforms, and performance.

Showing posts with label Pyspark.
Wednesday, October 1, 2025

Why I’m Loving Spark 4’s Python Data Source (with Direct Arrow Batches)

TL;DR: Apache Spark 4 lets you build first-class data sources in pure Python. If your reader yields Arrow RecordBatch objects, Spark i...
Thursday, April 25, 2019

Machine Learning Pipelines for High Energy Physics Using Apache Spark with BigDL and Analytics Zoo

Topic: This post describes a data pipeline for a machine learning task of interest in high energy physics: building a particle classifier t...
Monday, November 21, 2016

IPython/Jupyter SQL Magic Functions for PySpark

Topic: This post is about a simple implementation, with examples, of IPython custom magic functions for running SQL in Apache Spark using PyS...

About Me

Luca Canali
Geneva, Switzerland
@LucaCanaliDB
