Feature Engineering Without the Overhead

From Python to Petabytes, instantly

Introducing LLM-as-UDF

Custom Transformations at Scale

Apply your own Python functions directly to multimodal datasets without exporting or duplicating data.

Distributed Processing

Run UDFs across large datasets in parallel using Ray or Spark, cutting feature generation time from days to hours.
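
As a rough, single-machine illustration of the idea, the sketch below fans a UDF out over batches of rows in parallel. It uses Python's standard `concurrent.futures` thread pool as a stand-in for a Ray or Spark cluster. The `word_count` UDF, the `apply_udf_parallel` helper, and the sample captions are hypothetical names invented for this example, not part of any LanceDB API.

```python
from concurrent.futures import ThreadPoolExecutor


def word_count(text: str) -> int:
    """Hypothetical UDF: one input value in, one feature value out."""
    return len(text.split())


def batches(rows, size):
    """Yield consecutive slices of `rows`, each at most `size` long."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def apply_udf_parallel(udf, rows, batch_size=2, workers=4):
    """Apply `udf` to every row, one batch per task, preserving row order."""
    def run_batch(batch):
        return [udf(row) for row in batch]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in submission order, so output aligns with input.
        results = pool.map(run_batch, batches(rows, batch_size))
        return [value for batch in results for value in batch]


captions = [
    "a red car",
    "two dogs playing in snow",
    "sunset",
    "a cat on a sofa",
    "city at night",
]
counts = apply_udf_parallel(word_count, captions)
print(counts)
```

In a real cluster the batches would be shipped to remote workers rather than local threads, but the shape is the same: split the rows, run the UDF per batch, and reassemble the results in order.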

Inline Feature Creation

Generate, update, and store new features right inside LanceDB for immediate use in training or analytics.

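# Each new column is computed by a UDF from an existing column,
# so later columns can build on earlier ones.
table.add_columns({
    "title_frame": extract_key_frame("video", 0),
    "description": img2txt("title_frame"),
    "embedding": embed("description")
})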
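
To make the chaining in the snippet above concrete, here is a toy, in-memory sketch of how derived columns could be resolved in dependency order. `ToyTable` and its `(udf, source_column)` spec format are assumptions made for illustration; they are not LanceDB's API, and the lambdas merely stand in for frame extraction, captioning, and embedding.

```python
class ToyTable:
    """Minimal column store: column name -> list of values."""

    def __init__(self, columns):
        self.columns = dict(columns)

    def add_columns(self, specs):
        """Add derived columns; `specs` maps new name -> (udf, source column)."""
        pending = dict(specs)
        while pending:
            # A column is ready once its source column already exists.
            ready = [name for name, (_, src) in pending.items()
                     if src in self.columns]
            if not ready:
                raise ValueError(f"unresolvable source columns for: {sorted(pending)}")
            for name in ready:
                udf, src = pending.pop(name)
                self.columns[name] = [udf(value) for value in self.columns[src]]


# Each "video" is a list of placeholder frame names.
table = ToyTable({"video": [["f0a", "f1a"], ["f0b", "f1b"]]})

# Specs are given out of order on purpose; the table resolves the chain itself.
table.add_columns({
    "embedding": (lambda text: [float(len(text))], "description"),
    "description": (lambda frame: f"caption of {frame}", "title_frame"),
    "title_frame": (lambda frames: frames[0], "video"),
})
print(table.columns["description"])
```

The point of the sketch is the declarative shape: the caller states what each column is derived from, and the system decides the execution order.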

Key Features

Declarative Pipelines

Define workflows once and scale to 100K+ cores with no code changes.

Feature Docs

Schema Evolution

Add or change features without re-ingesting. Iterate on models with minimal engineering.
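
A minimal sketch of the idea, assuming a column-oriented layout where each column is stored independently: adding or replacing a feature touches only that column, and the originally ingested data is reused as-is. The `evolve` helper and the dict-of-lists table are hypothetical and only illustrate the principle.

```python
def evolve(table: dict, name: str, udf, source: str) -> dict:
    """Return a new table version with column `name` added or replaced.

    Other columns are shared with the previous version, not copied.
    """
    new_table = dict(table)  # shallow copy: existing column lists are reused
    new_table[name] = [udf(value) for value in table[source]]
    return new_table


v1 = {"text": ["hello world", "lance"]}

# Add a feature: character length of each text.
v2 = evolve(v1, "length", len, "text")

# Change the feature's definition: count words instead of characters.
v3 = evolve(v2, "length", lambda s: len(s.split()), "text")

print(v2["length"], v3["length"])
print(v3["text"] is v1["text"])  # the ingested column was never rewritten
```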

Built-in Orchestration

Preemption and checkpointing come standard. Keep training resilient without extra work.
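
For readers unfamiliar with the pattern, the following is a small sketch of checkpointed processing that survives a preemption, using only the Python standard library. The `run_with_checkpoint` function, the JSON checkpoint format, and the `fail_at` switch used to simulate a preemption are all assumptions for this example and say nothing about how LanceDB implements it.

```python
import json
import os
import tempfile


def run_with_checkpoint(items, udf, checkpoint_path, fail_at=None):
    """Process `items` in order, persisting progress after each one."""
    state = {"next": 0, "results": []}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)  # resume from the last completed item

    for i in range(state["next"], len(items)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated preemption")
        state["results"].append(udf(items[i]))
        state["next"] = i + 1
        # Write to a temp file, then rename, so a crash never leaves
        # a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, checkpoint_path)
    return state["results"]


calls = []


def square(x):
    calls.append(x)  # record every invocation to show nothing is recomputed
    return x * x


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "job.ckpt.json")
    try:
        run_with_checkpoint([1, 2, 3, 4], square, path, fail_at=2)
    except RuntimeError:
        pass  # the job was "preempted" after finishing the first two items
    resumed = run_with_checkpoint([1, 2, 3, 4], square, path)

print(resumed)
```

The second call picks up at the third item, so each input is processed exactly once across both runs.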

GPU Scheduling

Run jobs when GPUs are free or underused. Optimize for cost and throughput.

Tomorrow's AI is being built on LanceDB today

“We checked lots of other solutions, and they all became exorbitantly expensive for datasets >100M embeddings. LanceDB was the only option that could store 1B embeddings with 100x lower cost and zero ops. That’s why we love LanceDB!”

Chris Moody, CTO & Co-founder

Let your team focus on features, not infrastructure.

Contact Us