A Join node merges data from two or more upstream sources. You connect the sources to it, describe in plain language how they should be joined, and the AI assistant generates the join operation.
Before generating the join, the assistant insists on:
For new-table mode, the assistant also requires a target table or collection name. The assistant asks one question at a time, building on what you've already said. It generates the join only once everything it needs has been stated explicitly by you; it does not fill gaps with guesses.
The chat groups every available connection by category — Databases, Vector, Cloud Storage, APIs, Local Files — and de-duplicates them so you never see the same connection twice. Local SQL and local document workspaces are always offered alongside your saved connections.
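The grouping and de-duplication described above can be pictured with a minimal Python sketch. The `id` and `category` field names, and the choice to de-duplicate by `id`, are assumptions made for illustration; the source does not say how DagFlux identifies a duplicate connection.

```python
# Category order as listed in the chat picker.
CATEGORY_ORDER = ["Databases", "Vector", "Cloud Storage", "APIs", "Local Files"]


def group_connections(connections):
    """Group connections by category, keeping the first copy of each id.

    `connections` is a list of dicts with hypothetical "id" and "category" keys.
    """
    grouped = {category: [] for category in CATEGORY_ORDER}
    seen_ids = set()
    for conn in connections:
        if conn["id"] in seen_ids:
            continue  # already listed once; skip the duplicate
        seen_ids.add(conn["id"])
        grouped.setdefault(conn["category"], []).append(conn)
    # Hide categories that ended up empty.
    return {category: items for category, items in grouped.items() if items}


example = [
    {"id": "pg-main", "category": "Databases"},
    {"id": "s3-raw", "category": "Cloud Storage"},
    {"id": "pg-main", "category": "Databases"},  # duplicate entry
]
grouped_example = group_connections(example)
```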
The mode options shown depend on the destination type:
The flow of a join depends entirely on what kind of destination you chose:
For a relational destination, DagFlux makes the destination database the venue of the join. For each source that lives outside that database, it creates a temporary table inside the destination, loads the source's rows into it in batches using chunked inserts, and then runs the AI-generated join SQL against the temp tables and any tables that already exist in the destination. After the join completes, the temp tables are dropped.
Temp table names are generated to be unique per node and per source position so two joins running concurrently never collide.
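One way to get names that are unique per node and per source position is to derive them from both values. The sketch below is an illustration only: the `tmp_join_` prefix, the hashing step, and the function name are assumptions, and the actual naming scheme DagFlux uses is not documented here.

```python
import hashlib


def temp_table_name(node_id: str, source_position: int) -> str:
    """Build a deterministic, SQL-safe temp table name (hypothetical scheme).

    Hashing the node id avoids characters that are invalid in identifiers
    and keeps distinct node ids from mapping onto the same name.
    """
    digest = hashlib.sha1(node_id.encode("utf-8")).hexdigest()[:10]
    return f"tmp_join_{digest}_{source_position}"
```

Because the name depends only on the node and the position, two Join nodes running at the same time get different tables, while the same node always gets the same names back.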
If the destination is the local SQL workspace, the same pattern runs locally — sources are loaded into temporary local tables and the join runs there.
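The temp-table pattern for relational destinations can be sketched with Python's built-in `sqlite3` module standing in for the destination database. This is a simplified illustration, not DagFlux's implementation: the function name, the shape of `external_sources`, and the sample join SQL are all assumptions.

```python
import sqlite3
from itertools import islice


def chunked(rows, size):
    """Yield successive lists of at most `size` rows."""
    iterator = iter(rows)
    while batch := list(islice(iterator, size)):
        yield batch


def run_relational_join(dest, external_sources, join_sql, batch_size=500):
    """Stage external sources in temp tables, run the join, then clean up.

    external_sources maps a temp table name to (column_names, rows).
    Table and column names are assumed to be generated, trusted identifiers.
    """
    created = []
    try:
        for table, (columns, rows) in external_sources.items():
            dest.execute(f"CREATE TEMP TABLE {table} ({', '.join(columns)})")
            created.append(table)
            placeholders = ", ".join("?" for _ in columns)
            for batch in chunked(rows, batch_size):
                dest.executemany(
                    f"INSERT INTO {table} VALUES ({placeholders})", batch
                )
        dest.execute(join_sql)  # join runs inside the destination database
    finally:
        for table in created:  # temp tables are dropped even if the join fails
            dest.execute(f"DROP TABLE IF EXISTS {table}")
    dest.commit()


# Usage: "customers" already lives in the destination, "orders" is external.
dest = sqlite3.connect(":memory:")
dest.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
dest.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Lin")])
external = {"tmp_orders": (["customer_id", "total"], [(1, 30), (1, 12), (2, 7)])}
run_relational_join(
    dest,
    external,
    "CREATE TABLE order_totals AS "
    "SELECT c.name AS name, SUM(o.total) AS total "
    "FROM customers c JOIN tmp_orders o ON o.customer_id = c.id "
    "GROUP BY c.name",
    batch_size=2,
)
```

After the call, `order_totals` holds the joined result and `tmp_orders` no longer exists.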
For a document destination, DagFlux loads every source's rows into memory in the background engine, then runs an AI-generated JavaScript join function over them. The destination collection is dropped first, and the function's return value (an array of joined documents) is then bulk-inserted into it.
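The shape of the in-memory join can be sketched as follows. DagFlux generates the join function in JavaScript; Python is used here only for illustration, and the `join_sources` function, the `InMemoryCollection` stand-in, and the sample field names are all assumptions.

```python
from collections import defaultdict


def join_sources(sources):
    """Example join function: embed each customer's orders in the customer doc.

    `sources` maps a source name to its rows, all already loaded in memory.
    """
    orders_by_customer = defaultdict(list)
    for order in sources["orders"]:
        orders_by_customer[order["customer_id"]].append(order)
    return [
        {**customer, "orders": orders_by_customer.get(customer["id"], [])}
        for customer in sources["customers"]
    ]


class InMemoryCollection:
    """Tiny stand-in for a document collection (not a real driver API)."""

    def __init__(self, docs=None):
        self.docs = list(docs or [])

    def drop(self):
        self.docs.clear()

    def insert_many(self, docs):
        self.docs.extend(docs)


def write_joined(collection, sources, join_fn):
    joined = join_fn(sources)       # assumption: join is computed before the drop
    collection.drop()               # destination collection is dropped first
    collection.insert_many(joined)  # then the joined documents are bulk-inserted
    return len(joined)


sources = {
    "customers": [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Lin"}],
    "orders": [{"customer_id": 1, "total": 30}, {"customer_id": 1, "total": 12}],
}
collection = InMemoryCollection([{"stale": True}])
inserted = write_joined(collection, sources, join_sources)
```

Building a lookup keyed on the join field keeps the join roughly linear in the number of rows, which matters when every source is held in memory at once.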
This is what lets DagFlux join data that lives in entirely different systems — a relational source with a document source, for example — into a single document collection.
For a vector destination, the flow is similar to that for document destinations, with one additional step: after the join function produces the joined rows, DagFlux delegates to the Vector Database Executor skill, which embeds the column you nominated and upserts the resulting vectors into the destination index. The rest of each row is preserved as metadata for Pinecone, or as flat columns for LanceDB.
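The difference between the two record layouts can be sketched without calling either database. Everything here is an assumption for illustration: `toy_embed` is a placeholder rather than a real embedding model, and the `id`/`values`/`metadata` record shape and the `vector` column name follow conventions commonly used with Pinecone and LanceDB rather than anything the Vector Database Executor skill is documented to emit.

```python
def toy_embed(text):
    """Placeholder embedding: NOT a real model, just deterministic numbers."""
    vowels = sum(ch in "aeiou" for ch in text.lower())
    return [float(len(text)), float(vowels)]


def to_metadata_records(rows, embed_column, id_column, embed=toy_embed):
    """Pinecone-style layout: vector plus the rest of the row as metadata."""
    return [
        {
            "id": str(row[id_column]),
            "values": embed(row[embed_column]),
            "metadata": {k: v for k, v in row.items() if k != embed_column},
        }
        for row in rows
    ]


def to_flat_records(rows, embed_column, embed=toy_embed):
    """LanceDB-style layout: vector stored next to the other columns."""
    return [
        {
            "vector": embed(row[embed_column]),
            **{k: v for k, v in row.items() if k != embed_column},
        }
        for row in rows
    ]


joined_rows = [{"sku": 7, "title": "red shoe", "price": 40}]
metadata_records = to_metadata_records(joined_rows, "title", "sku")
flat_records = to_flat_records(joined_rows, "title")
```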
If one of your sources is a vector store, the assistant detects two distinct intents:
For relational destinations, the fetch query for each external SQL source is automatically wrapped with the destination database's preferred pagination syntax (LIMIT/OFFSET, TOP, FETCH, etc.), so large sources can be loaded in chunks without rewriting the query for each dialect.
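A rough idea of what such wrapping looks like, assuming a hypothetical `wrap_with_pagination` helper and made-up dialect labels (DagFlux's internal names and exact output are not documented here):

```python
def wrap_with_pagination(sql: str, dialect: str, limit: int, offset: int) -> str:
    """Wrap a query as a subquery and append dialect-specific paging.

    The dialect labels are illustrative. Paging without a stable ORDER BY
    does not guarantee a consistent row order between chunks.
    """
    inner = sql.strip().rstrip(";")
    if dialect in ("postgres", "mysql", "sqlite"):
        return f"SELECT * FROM ({inner}) AS src LIMIT {limit} OFFSET {offset}"
    if dialect == "sqlserver":
        # OFFSET/FETCH requires an ORDER BY; (SELECT NULL) is a common filler.
        return (
            f"SELECT * FROM ({inner}) AS src ORDER BY (SELECT NULL) "
            f"OFFSET {offset} ROWS FETCH NEXT {limit} ROWS ONLY"
        )
    raise ValueError(f"unsupported dialect: {dialect}")


def page_queries(sql: str, dialect: str, chunk_size: int, pages: int):
    """Return the first `pages` chunk queries for a source query."""
    return [
        wrap_with_pagination(sql, dialect, chunk_size, page * chunk_size)
        for page in range(pages)
    ]
```

Wrapping the original query as a subquery means the user's SQL is left untouched and only the outer paging clause changes from one dialect to the next.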
Before running, the chat shows you the full plan: which sources will be loaded into temp tables, what the join SQL or function looks like, and what gets inserted at the end. You confirm before anything is written.
If a join fails at execution time, the chat offers an "Ask AI to fix the error" option so you can describe the fix in plain language and let the assistant regenerate the relevant piece without losing the context of what you've already configured.
The chat saves the full join configuration (destination, mode, target name, per-source temp table names, join SQL, JS function) so reopening the node restores everything you've decided.
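The saved configuration can be thought of as a small serializable record. The sketch below mirrors the fields listed above, but the class name, field names, and JSON storage format are assumptions for illustration rather than DagFlux's actual schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, Optional


@dataclass
class JoinConfig:
    """Hypothetical shape of a saved join configuration."""

    destination: str
    mode: str
    target_name: Optional[str] = None
    # Maps a source identifier to the temp table generated for it.
    temp_table_names: Dict[str, str] = field(default_factory=dict)
    join_sql: Optional[str] = None      # set for relational destinations
    js_function: Optional[str] = None   # set for document or vector destinations


def save_config(config: JoinConfig) -> str:
    return json.dumps(asdict(config))


def load_config(payload: str) -> JoinConfig:
    return JoinConfig(**json.loads(payload))


saved = save_config(
    JoinConfig(
        destination="warehouse",
        mode="new-table",
        target_name="order_totals",
        temp_table_names={"source_0": "tmp_join_ab12_0"},
        join_sql="SELECT 1",
    )
)
restored = load_config(saved)
```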