From Python to Petabytes, instantly
Apply your own Python functions directly to multimodal datasets without exporting or duplicating data.
Run UDFs across large datasets in parallel using Ray or Spark, cutting feature generation time from days to hours.
Generate, update, and store new features right inside LanceDB for immediate use in training or analytics.
table.add_columns({
    "title_frame": extract_key_frame("video", 0),
    "description": img2txt("title_frame"),
    "embedding": embed("description"),
})
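To make the chained-column idea concrete, here is a minimal, library-free sketch: each new column is defined as a function of an existing column, and definitions are applied in declaration order so later columns can build on earlier ones. Everything in it is an assumption made for illustration (the `add_columns_sketch` helper, the toy `extract_key_frame`, `img2txt`, and `embed` functions, and the list-of-dicts "table"); it is not the LanceDB API and says nothing about how LanceDB executes these steps.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
# Hypothetical spec: (function to apply, name of the column it reads).
ColumnSpec = Tuple[Callable[[Any], Any], str]


def add_columns_sketch(rows: List[Row], specs: Dict[str, ColumnSpec]) -> List[Row]:
    """Return copies of rows with each derived column computed in declaration order."""
    out = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not mutated
        for name, (func, source) in specs.items():
            # A later spec may read a column produced by an earlier spec.
            new_row[name] = func(new_row[source])
        out.append(new_row)
    return out


# Toy stand-ins for real UDFs (hypothetical; real ones would decode video,
# caption images, and call an embedding model).
def extract_key_frame(video: List[str]) -> str:
    return video[0]  # the frame index is fixed to 0 in this toy version


def img2txt(frame: str) -> str:
    return f"caption for {frame}"


def embed(text: str) -> List[float]:
    return [float(len(text)), float(text.count(" "))]


rows = [{"video": ["frame_a0", "frame_a1"]}, {"video": ["frame_b0"]}]
result = add_columns_sketch(rows, {
    "title_frame": (extract_key_frame, "video"),
    "description": (img2txt, "title_frame"),
    "embedding": (embed, "description"),
})
```

Because the specs are applied in order, `description` can read `title_frame` and `embedding` can read `description`, mirroring the dependency chain in the snippet above.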
Define workflows once and scale to 100K+ cores with no code changes.
Add or change features without re-ingesting. Iterate on models with minimal engineering.
Preemption and checkpointing come standard. Keep training resilient without extra work.
Run jobs when GPUs are free or underused. Optimize for cost and throughput.
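As a rough illustration of what checkpointing buys a preemptible job, the sketch below processes batches in order and records progress after each one, so a restart resumes where the previous run stopped instead of starting over. The function name `run_with_checkpoints`, the JSON checkpoint layout, and the `fail_at` switch used to simulate preemption are all hypothetical; this is a generic pattern under those assumptions, not a description of how LanceDB implements it.

```python
import json
import os
import tempfile
from typing import Callable, List, Optional


def run_with_checkpoints(
    batches: List[list],
    process: Callable[[list], int],
    checkpoint_path: str,
    fail_at: Optional[int] = None,
) -> List[int]:
    """Process batches in order, resuming from the checkpoint file if one exists."""
    start, results = 0, []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)
        start, results = state["next_batch"], state["results"]

    for i in range(start, len(batches)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated preemption")
        results.append(process(batches[i]))
        # Write to a temp file, then swap it in, so a crash mid-write
        # never leaves a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"next_batch": i + 1, "results": results}, f)
        os.replace(tmp_path, checkpoint_path)
    return results


batches = [[1, 2], [3, 4], [5, 6]]
calls = []


def total(batch: list) -> int:
    calls.append(batch)  # track how often each batch is actually processed
    return sum(batch)


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "checkpoint.json")
    try:
        run_with_checkpoints(batches, total, path, fail_at=2)  # interrupted run
    except RuntimeError:
        pass
    resumed = run_with_checkpoints(batches, total, path)  # picks up at batch 2
```

In this toy run the first two batches are processed once before the simulated preemption, and the resumed run only processes the remaining batch.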