/DB

Bulk COPY

Insert large batches at high throughput with PostgreSQL binary COPY.

updated 26 Jun 20263 min readv0.3.4View as Markdown

Overview

For large inserts, BulkCopy.InsertMultipleCopyAsync streams rows to PostgreSQL using the binary COPY ... FROM STDIN (FORMAT BINARY) protocol. It is substantially faster than the parameterized InsertMultipleAsync path for big batches and is not bound by the 65,535-parameter limit, so a batch of any size is a single COPY operation rather than several chunked INSERT commands.

NEW in 0.3.3

Use it when you are loading many rows and don't need database-generated values back (see Trade-offs). For small inserts, individual writes, or when you need generated keys propagated, use the regular insert path.

Inserting a batch

BulkCopy.InsertMultipleCopyAsync<T> works with any generated [Table] entity. It returns the number of rows written.

using Socigy.OpenSource.DB.Core.Bulk;

var rows = Enumerable.Range(0, 100_000)
    .Select(i => new LogEntry { Id = Guid.NewGuid(), Message = $"event {i}", Level = "info" })
    .ToList();

await using var conn = connectionFactory.Create();
await conn.OpenAsync();

ulong written = await BulkCopy.InsertMultipleCopyAsync(rows, conn);
// written == 100000

The signature:

public static Task<ulong> InsertMultipleCopyAsync<T>(
    IEnumerable<T> rows,
    DbConnection connection,
    DbTransaction? transaction = null,
    bool includeAutoFields = false,
    CancellationToken cancellationToken = default)
    where T : class, IDbTable, IInsertPlanProvider;

Pass includeAutoFields: true to also write auto-increment columns (for example, when you are supplying your own identity values).

Dynamic tables

Dynamic tables ([TableType]) expose the same operation, bound to the runtime table name:

ulong written = await AuditEntry
    .WithTableName("audit_2026_06")
    .WithConnection(conn)
    .InsertMultipleCopyAsync(rows);

How it works

The values written by COPY come from the same per-column pipeline as a normal insert. Each column's value is produced exactly as it would be for InsertMultipleAsync, so special columns are handled identically and there is no risk of the two paths diverging:

  • [Encrypted] columns are written as their bytea ciphertext.
  • [JsonColumn] / [RawJsonColumn] columns are written as jsonb.
  • Value-convertor columns have their convertor applied before writing.
  • NULL values are written as SQL NULL.

Trade-offs

Binary COPY trades a few conveniences for throughput. Keep these in mind:

  • No RETURNING. COPY cannot return database-generated values. Auto-increment and DEFAULT columns are still filled by the database, but those values are not written back to your in-memory instances. When you need generated keys propagated, use InsertMultipleAsync (which supports value propagation) instead.
  • Strict types — DateTime kind matters. COPY does not perform the implicit casts that a parameterized INSERT does. A DateTime written to a timestamp without time zone column must have Kind = Unspecified (or Local); a Kind = Utc value belongs in a timestamp with time zone column. Mismatches throw at write time. The parameterized path masks this by letting the driver infer and the database implicitly cast.
  • All or nothing per call. A COPY is a single streamed operation; a failure aborts the whole batch.

Transactions

COPY participates in the connection's current transaction. Begin a transaction on the same connection (or pass it via transaction) and the rows are committed or rolled back with it:

await using var tx = await conn.BeginTransactionAsync();
await BulkCopy.InsertMultipleCopyAsync(rows, conn, tx);
await tx.CommitAsync();   // RollbackAsync() undoes the COPY

Performance

See Benchmarks for COPY throughput against the parameterized multi-row insert, Dapper, and EF Core.