A Data Source node points at exactly one input. At execution time it makes sure that input is materialised into a form downstream nodes can read from — usually by writing into the local workspace.
You select a saved connection. The connection's category determines how the node behaves:
Database: The node references a specific table, collection, or index inside that database. No re-import is needed at execution time; downstream nodes read from the live database.
Vector store: The node references a specific index (and namespace, for Pinecone). For LanceDB, the available tables are read directly from the file.
File: The node references a file you've already imported (CSV, Excel, Parquet, Avro, JSON, SQLite, NeDB, LanceDB). At execution time the file is re-imported into the local workspace.
Cloud bucket: The node references a specific object in a bucket. At execution time the object is downloaded and imported into the local workspace.
Google Sheets: The node references a specific Google Sheet. At execution time the sheet is fetched and imported into the local workspace.
REST API: The node carries an AI-generated fetch function. At execution time the function runs and the resulting rows are written into a local document collection.
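The mapping from connection category to execution-time behaviour can be summarised as a small dispatch table. This is a hypothetical sketch, not DagFlux's code. The `Category` enum, `READ_IN_PLACE`, `IMPORT_STEPS`, and `execution_step` are all illustrative names. The text states only that databases are read live. Treating vector stores the same way is an assumption.

```python
from enum import Enum


class Category(Enum):
    # Hypothetical names for the connection categories listed above.
    DATABASE = "database"
    VECTOR_STORE = "vector_store"
    FILE = "file"
    CLOUD_BUCKET = "cloud_bucket"
    GOOGLE_SHEET = "google_sheet"
    REST_API = "rest_api"


# Categories read in place at execution time. Grouping vector stores with
# databases is an assumption; the text only says this for databases.
READ_IN_PLACE = {Category.DATABASE, Category.VECTOR_STORE}

# What the node does at execution time for every other category.
IMPORT_STEPS = {
    Category.FILE: "re-import the file into the local workspace",
    Category.CLOUD_BUCKET: "download the object, then import it into the local workspace",
    Category.GOOGLE_SHEET: "fetch the sheet, then import it into the local workspace",
    Category.REST_API: "run the fetch function, then write rows to a local document collection",
}


def execution_step(category: Category) -> str:
    """Describe how a Data Source node materialises its input."""
    if category in READ_IN_PLACE:
        return "read the live source directly; no import"
    return IMPORT_STEPS[category]
```

Splitting the categories into "read in place" and "import first" mirrors the point made in the opening paragraph. Only the second group has to be materialised into the local workspace before downstream nodes can read it.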
When you select a table, collection, or index, DagFlux fetches a small sample (typically five rows) and the inferred column / field list. This is shown in the chat panel so the AI assistant has concrete context for any follow-up question (e.g. "what's in this table?", "summarise the columns").
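The sampling step can be illustrated with Python's built-in `sqlite3` module for a SQL table, plus a key-union pass for documents. This is a minimal sketch that assumes a SQLite source. `sample_table` and `infer_fields` are hypothetical helpers, not DagFlux functions, and the real node may infer fields differently.

```python
import sqlite3

SAMPLE_SIZE = 5  # "typically five rows", per the text above


def sample_table(conn: sqlite3.Connection, table: str, n: int = SAMPLE_SIZE):
    """Return (column_names, sample_rows) for a SQL table."""
    # Table names cannot be bound as parameters, so quote the identifier.
    quoted = '"' + table.replace('"', '""') + '"'
    cur = conn.execute(f"SELECT * FROM {quoted} LIMIT ?", (n,))
    columns = [d[0] for d in cur.description]  # column names from the cursor
    return columns, cur.fetchall()


def infer_fields(docs: list[dict]) -> list[str]:
    """Union of keys across sampled documents, in first-seen order."""
    seen: dict[str, None] = {}
    for doc in docs[:SAMPLE_SIZE]:
        for key in doc:
            seen.setdefault(key, None)
    return list(seen)


# Usage: a ten-row table yields five sample rows and its column list.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(i, i * 1.5) for i in range(10)])
columns, rows = sample_table(conn, "orders")
fields = infer_fields([{"_id": 1, "name": "a"}, {"_id": 2, "tags": ["x"]}])
```

Documents in a collection need not share a schema, so taking the union of keys across the sample gives the assistant a fuller field list than the keys of the first document alone.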
When you point a Data Source at a REST API connection, you describe in plain language what data you want and the AI assistant generates a custom fetch function. The function:
Once the function is generated, it's saved into the node and marked "clean", so reopening the node does not make DagFlux ask the AI again. If you refine your request, the function is marked "dirty" and regenerated on the next save.
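The clean/dirty behaviour amounts to a cache keyed on the request text. The sketch below is hypothetical. `FetchFunctionState`, `refine`, `save`, and the `generate` callback are illustrative names, and `fake_generate` stands in for the AI call.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FetchFunctionState:
    request: str                  # the plain-language description of the data
    source: Optional[str] = None  # the generated fetch function, once it exists
    dirty: bool = True            # a new node starts with nothing generated

    def refine(self, new_request: str) -> None:
        # Changing the request invalidates the saved function.
        if new_request != self.request:
            self.request = new_request
            self.dirty = True

    def save(self, generate: Callable[[str], str]) -> str:
        # Only call the generator when the function is missing or stale.
        if self.dirty or self.source is None:
            self.source = generate(self.request)
            self.dirty = False
        return self.source


# Usage: record every request that reaches the (stand-in) generator.
calls: list[str] = []


def fake_generate(request: str) -> str:
    calls.append(request)
    return f"def fetch(): ...  # for: {request}"


state = FetchFunctionState("orders placed this week")
state.save(fake_generate)
state.save(fake_generate)  # clean: no regeneration
state.refine("orders placed this month")
state.save(fake_generate)  # dirty: regenerated
```

Here the generator runs twice in total, once for the first save and once after the refinement. The save between them is served from the stored function.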
When the source is a cloud bucket file, DagFlux:
For Excel files with multiple sheets, the chat lets you choose which sheet (or sheets) to import.
Re-importing always drops the existing target table or collection first, so re-running a workflow gives a deterministic result.
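Dropping the target before loading is what makes a re-run idempotent. This is a minimal sketch using Python's `sqlite3` as a stand-in for the local workspace; `reimport` is a hypothetical helper, not the DagFlux importer.

```python
import sqlite3


def quote(name: str) -> str:
    """Quote an SQL identifier for SQLite."""
    return '"' + name.replace('"', '""') + '"'


def reimport(conn: sqlite3.Connection, table: str, columns: list[str], rows: list[tuple]) -> None:
    """Drop the target table, recreate it, and load the rows."""
    col_list = ", ".join(quote(c) for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f"DROP TABLE IF EXISTS {quote(table)}")  # discard any previous import
    conn.execute(f"CREATE TABLE {quote(table)} ({col_list})")
    conn.executemany(f"INSERT INTO {quote(table)} VALUES ({placeholders})", rows)
    conn.commit()


# Usage: running the same import twice leaves exactly one copy of the data.
conn = sqlite3.connect(":memory:")
data = [(1, "a"), (2, "b")]
reimport(conn, "items", ["id", "name"], data)
reimport(conn, "items", ["id", "name"], data)
count = conn.execute('SELECT COUNT(*) FROM "items"').fetchone()[0]
```

Appending without the drop would leave four rows after the second run. Dropping first keeps the row count at two no matter how often the workflow executes.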
When the source is a document store (MongoDB, NeDB), the chat uses collection / filter terminology and routes through the document path. When it's a SQL database, table / query terminology is used. The underlying difference is hidden from you — the same node config flow covers both.
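The terminology switch can be pictured as a lookup on the engine type. This sketch is hypothetical. `route` and `DOCUMENT_STORES` are illustrative names. Treating every engine outside the document-store set as SQL is an assumption made for brevity.

```python
DOCUMENT_STORES = {"mongodb", "nedb"}  # the document stores named in the text


def route(engine: str) -> dict[str, str]:
    """Pick the internal path and the user-facing terms for an engine."""
    if engine.lower() in DOCUMENT_STORES:
        return {"path": "document", "container": "collection", "selector": "filter"}
    # Assumption: anything that is not a known document store is SQL.
    return {"path": "sql", "container": "table", "selector": "query"}
```

Because both branches return the same shape, the node configuration flow can render either set of terms without knowing which kind of store it is talking to.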