Stream

A record-oriented reader wrapping a file descriptor. Parsing is configured once at construction; each Read call executes the pre-built contract. Supports single-char delimiters (via bash read directly), multi-char exact-string delimiters, and char-class delimiters with run-collapsing.

NUL bytes. Bash variables cannot hold or detect NUL bytes. Any record or field value containing a NUL will be silently truncated at the first one. See GOTCHAS.md.

Loading
Quick Start
Construction
Methods
The CRLF Contract
Parse Modes in Detail
Field Assignment
Performance
Null Bytes
Examples
Design: Two-Layer Delimiter Architecture
Design: Multi-FD I/O Model (planned)

Loading

. boop Stream

Quick Start

# Direct mode: read lines, IFS splits on colon
into=s Stream.new -P "/etc/passwd"
while IFS=':' $s.read user _ uid gid desc home shell; do
  printf "%s lives at %s\n" "$user" "$home"
done
$s.close

# Buffered mode: -f for char-class field splitting
into=s Stream.new -P "data.csv" -f ',' name age city
while $s.Read; do
  printf "%s is %s\n" "$name" "$age"
done
$s.close

# Buffered mode: CRLF records with colon-separated fields
into=s Stream.new -P "windows.log" -D $'\r\n' -f ':' ts level msg
while $s.Read; do
  printf "[%s] %s\n" "$level" "$msg"
done
$s.close

Construction

into=s Stream.new [options] [field names...]

The constructor determines the parse mode based on the options given and locks it for the object’s lifetime. Three modes exist:

Mode	When chosen	How it reads
direct	Single-char EOL, no `-f`/`-F`/`-W`/`-n`	`read -d` from FD directly
regex	Char-class EOL (`-E`) or char-class field delim (`-f`, `-W`)	Buffered, regex match
pe	Multi-char exact-string EOL (`-D`) or exact-string field delim (`-F`)	Buffered, parameter expansion

Source options (mutually exclusive; stdin is the fallback)

Option	Meaning
`-P PATH` / `--path=PATH`	Open file for reading
`-u FD` / `--fd=FD`	Use an already-open FD
(neither)	Dup stdin

Record delimiter options (mutually exclusive)

Option	Meaning	Mode
`-d CHAR`	Single-char EOL, exact	direct
`-D STRING`	Multi-char EOL, exact string, non-stacking	pe
`-E CHARS`	EOL char class – any char in set, runs collapse	regex
(default)	`${_EOL:-\n}`. Length determines mode: one char selects direct, a longer value selects pe.	auto

Field delimiter options (mutually exclusive)

Option	Meaning	Mode
`-f CHARS`	Any single char in set, non-stacking (empties preserved)	regex
`-F STRING`	Exact multi-char string, non-stacking	pe
`-W CHARS`	Char class, stacking (runs of delimiters collapse into one boundary)	regex
(default, direct mode)	IFS splitting – user controls IFS	direct
(default, buffered mode)	Defaults to record delimiter (one field per record)	buffered

-W is the buffered-mode equivalent of IFS whitespace behavior. Use -W "$IFS" to get collapsing-whitespace field splitting in buffered mode. Unlike IFS, -W treats ALL chars in the set identically (no special whitespace vs non-whitespace distinction).
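
The difference between non-stacking (-f) and stacking (-W) splitting can be sketched in plain bash. This is an illustration only: split_fields is a hypothetical helper, not part of Stream, and it assumes the delimiter set contains no characters that are special inside a regex bracket expression (such as ] or ^).

```bash
#!/usr/bin/env bash
# Hypothetical helper (not Stream's code): split RECORD on any char in CHARS.
#   MODE "single": every delimiter char is a boundary, empties kept (like -f)
#   MODE "runs":   consecutive delimiter chars form one boundary    (like -W)
split_fields() {              # split_fields RECORD CHARS MODE ARRAYNAME
  local rec=$1 chars=$2 quant=''
  local -n _out=$4
  [[ $3 == runs ]] && quant='+'
  local re="^([^${chars}]*)[${chars}]${quant}"
  _out=()
  while [[ $rec =~ $re ]]; do
    _out+=("${BASH_REMATCH[1]}")          # text before the delimiter
    rec=${rec:${#BASH_REMATCH[0]}}        # drop field + delimiter
  done
  _out+=("$rec")                          # whatever follows the last delimiter
}

split_fields 'a,,b,c' ',' single f_fields
split_fields 'a,,b,c' ',' runs   w_fields
printf '%d fields, single-char boundaries\n' "${#f_fields[@]}"   # 4: a, "", b, c
printf '%d fields, collapsed runs\n'         "${#w_fields[@]}"   # 3: a, b, c
```

The only difference between the two behaviors is the + quantifier on the delimiter class.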

Note: Stream does NOT read the _Delimiter framework global. Use -f, -F, or -W explicitly. If you want _Delimiter’s value, pass it: -f "$_Delimiter".

Other options

Option	Meaning
`-a NAME`	Array mode: all fields into named array
`-x`	Expose: generate `$o.fieldname` accessors, store field values on the object
`-n N` / `-N N`	Fixed-width: read exactly N chars per record
`-t N`	Timeout in seconds (direct mode: passed to `read`)
`-b N` / `--blockSize=N`	Buffer fill size (default: 1024). Buffered modes only.

Positional arguments

Everything after the options that isn’t recognized as an option is a field name. Field names must be valid bash identifiers. The last field gets “the rest” (same as bash read). Use _ to discard a field.

into=s Stream.new -P "data.csv" -f ',' name age _ city
#                                       ^^^^ ^^^ ^ ^^^^
#                                       f1   f2  discard  f3 (gets remainder)
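
For comparison, this is the "last field gets the rest" rule as plain bash read applies it. The snippet uses only the builtin (no Stream) and made-up sample data:

```bash
#!/usr/bin/env bash
# Plain bash read, shown for comparison only.
# Three variable names, four comma-separated values:
IFS=',' read -r name age rest <<< 'alice,30,paris,fr'
printf 'name=%s age=%s rest=%s\n' "$name" "$age" "$rest"
# prints: name=alice age=30 rest=paris,fr
```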

Methods

`$s.read` (direct mode)

Thin wrapper around bash’s read builtin. Available only on direct-mode objects (no -D, -E, -f, -F, -W, -n). IFS does field splitting.

Returns 0 on success, 1 on EOF.

LSP divergence from raw read: When the final record has no trailing delimiter (common in files that lack a trailing newline), raw read returns non-zero even though it read data, so a while read; do loop skips the last record. Stream handles this: if read returns non-zero but data was read, $s.read returns 0 (so your loop body runs) and sets EOF internally (so the next call returns 1). As a result, while $s.read; do processes every record, including an unterminated final one, and you do NOT need the || [[ -n "$line" ]] workaround.

while $s.read; do
  # fields populated via IFS splitting
  # EVERY record is processed, including the last one without trailing newline
done

# With custom IFS:
while IFS=':' $s.read; do ...
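
The raw-read behavior described above is easy to reproduce. The sketch below is plain bash, not Stream's implementation: count_raw shows the skipped record, and read_record is a hypothetical wrapper that treats "data read, but no delimiter" as success. Unlike Stream it keeps no EOF flag; it relies on the next read returning nothing.

```bash
#!/usr/bin/env bash
# Raw read: returns non-zero on the unterminated final record, so the
# loop body never runs for it.
count_raw() {
  local line n=0
  while IFS= read -r line; do n=$((n + 1)); done
  printf '%d\n' "$n"
}

# Hypothetical wrapper (not Stream's code): succeed whenever read produced
# data, even if it hit EOF before seeing a delimiter.
read_record() {               # read_record VARNAME
  local -n _rec=$1
  IFS= read -r _rec || [[ -n $_rec ]]
}

count_wrapped() {
  local line n=0
  while read_record line; do n=$((n + 1)); done
  printf '%d\n' "$n"
}

printf 'a\nb\nc' | count_raw       # prints 2 (the "c" record is lost)
printf 'a\nb\nc' | count_wrapped   # prints 3
```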

`$s.Read` (buffered mode)

Buffered framework reader. Available only on buffered-mode objects (constructed with -D, -E, -f, -F, -W, or -n). Field splitting uses the configured delimiter, NOT IFS.

Returns 0 on success, 1 on EOF.

while $s.Read; do
  # fields are populated
done

Only one of $s.read or $s.Read is available per object. Calling the wrong one returns an error. The constructor logs which is live.

`$s.next`

Convenience method: calls $s.read or $s.Read depending on the object’s mode. Use when you don’t care about mode and don’t need maximum speed (adds one branch per call).

while $s.next; do ...

`$s.field INDEX_OR_NAME`

Return a field value by numeric index or field name. Works with both array mode (-a) and named fields with -x (expose).

into=v $s.field 0       # first field by index
into=v $s.field name    # field by name (requires -x or named fields)

`$s.fieldCount`

Return the number of fields from the last read. Use to iterate safely without running off the end.

into=n $s.fieldCount
for (( i=0; i < n; i++ )); do
  into=v $s.field $i
  ...
done

`$s.buffered`

Returns exit code 0 if this stream uses the buffered engine (pe or regex mode). Exit code 1 if direct mode.

`$s.eof`

Returns exit code 0 if the stream is exhausted. Use after a Read returns non-zero to distinguish EOF from error.

`$s.close`

Close the stream’s FD. Always closes – there is no ownership tracking. If you passed in an FD you still need, don’t call close.

`$s.write STR`

Write a string to the stream’s FD. No delimiter appended.

`$s.writeLine STR`

Write a string followed by the record EOL to the stream’s FD.

`$s.putBack STR`

Push a value back onto the front of the read buffer. The next read (Read or next) returns this value before consuming new data from the FD. Multiple calls stack (LIFO: last pushed is first read).

Resets the EOF flag, since putting data back means there is something to read.

Essential for lookahead parsing: read a line, inspect it, decide it belongs to the next section, push it back.

$s.Read line
# ... this line starts a new block, put it back ...
$s.putBack "$line"
# next $s.Read returns $line again

Best used with buffered-mode streams. Direct-mode streams don’t use the internal buffer, so putBack behavior is less predictable there.
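
The LIFO behavior can be modeled with a small stack in front of a reader. This is a sketch of the idea only: pushback, put_back, and next_line are hypothetical names, and the reader here is a bare read on stdin, not Stream's buffer.

```bash
#!/usr/bin/env bash
# Hypothetical pushback stack, independent of Stream's internals.
pushback=()

put_back() { pushback+=("$1"); }

# next_line VARNAME: serve the most recently pushed value first,
# then fall back to reading a line from stdin.
next_line() {
  local -n _dst=$1
  local last=$(( ${#pushback[@]} - 1 ))
  if (( last >= 0 )); then
    _dst=${pushback[last]}
    unset "pushback[$last]"
    return 0
  fi
  IFS= read -r _dst
}

demo() {
  local line
  next_line line          # "header"
  next_line line          # "[section]" belongs to the next block...
  put_back "$line"        # ...so hand it back
  next_line line          # "[section]" again
  printf '%s\n' "$line"
}
printf 'header\n[section]\nbody\n' | demo   # prints [section]
```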

The CRLF Contract

This is important. When the record delimiter is set to an exact multi-char string (e.g. \r\n via -D), the delimiter must match IN FULL to trigger a record boundary. A bare \n inside the record is just data – it does NOT split the record.

This means:

-D $'\r\n' with data "line1\nstill line1\r\nline2\r\n" produces two records: "line1\nstill line1" and "line2".
The embedded \n is preserved in the first record.

If you want ANY newline-like character to split records (bare LF, bare CR, CRLF all treated as boundaries), use -E $'\r\n' instead. That’s char-class mode with run-collapsing – any sequence of CR and/or LF characters constitutes one record boundary.
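
The two semantics can be compared on the sample data above with a short plain-bash sketch. split_exact and split_class are hypothetical helpers that work on an in-memory string; Stream itself reads incrementally from an FD.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not Stream's code.

# Exact-string boundary (like -D): only the full delimiter splits.
split_exact() {               # split_exact DATA EOL ARRAYNAME
  local buf=$1 eol=$2
  local -n _recs=$3
  _recs=()
  while [[ $buf == *"$eol"* ]]; do
    _recs+=("${buf%%"$eol"*}")
    buf=${buf#*"$eol"}
  done
  if [[ -n $buf ]]; then _recs+=("$buf"); fi
}

# Char-class boundary (like -E): any run of the listed chars splits.
split_class() {               # split_class DATA CHARS ARRAYNAME
  local buf=$1 re="^([^$2]*)[$2]+"
  local -n _recs=$3
  _recs=()
  while [[ $buf =~ $re ]]; do
    _recs+=("${BASH_REMATCH[1]}")
    buf=${buf:${#BASH_REMATCH[0]}}
  done
  if [[ -n $buf ]]; then _recs+=("$buf"); fi
}

data=$'line1\nstill line1\r\nline2\r\n'
split_exact "$data" $'\r\n' exact
split_class "$data" $'\r\n' class
printf 'exact: %d records, class: %d records\n' "${#exact[@]}" "${#class[@]}"
# prints: exact: 2 records, class: 3 records
```

With the exact-string delimiter the bare \n stays inside the first record; with the char class it becomes a boundary of its own.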

Parse Modes in Detail

Direct Mode

The constructor pre-builds a complete read argument array:

(-r -d "$eol" -u "$fd" field1 field2 field3)

Each $s.read call boils down to:

read "${args[@]}"

One builtin call. No buffering, no string manipulation, no overhead beyond method dispatch. IFS splitting works exactly as it does with bare read – the user controls IFS, we don’t touch it.

When to use: simple line-oriented parsing where read does everything you need. This is the default for newline-delimited data with no multi-char delimiter options.
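
The build-once, call-many pattern looks like this in plain bash. It is a sketch of the idea, not Stream's constructor: list_homes, the sample data, and the field names are made up for the example.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: build the read argument list once, reuse it per record.
list_homes() {
  local fd user uid home
  exec {fd}< <(printf 'root:0:/root\nalice:1000:/home/alice\n')

  # "Constructor" work, done once:
  local args=(-r -d $'\n' -u "$fd" user uid home)

  # Per-record work: a single builtin call, caller-controlled IFS.
  while IFS=':' read "${args[@]}"; do
    printf '%s -> %s\n' "$user" "$home"
  done
  exec {fd}<&-
}

list_homes
# prints:
#   root -> /root
#   alice -> /home/alice
```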

Regex Mode (buffered)

Used when -E (char-class EOL) or a char-class field delimiter (-f or -W) is specified. The constructor builds anchored regexes:

Record regex: ^([^CHARS]*)[CHARS]+ (captures record, consumes delimiter run)
Field regex: ^([^CHARS]*)[CHARS] (captures one field, consumes one delimiter char)

Each Read:

Apply record regex to buffer
No match? Fill buffer from FD, retry
Match? Extract record from BASH_REMATCH[1], advance buffer
Split record into fields using field regex + nameref assignment
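
The match / fill / retry cycle above can be sketched as follows. This is an assumption-laden illustration, not Stream's code: read_record, RECORD_RE, BLOCK, and the global buf/at_eof state (one stream only) are all hypothetical. One detail the sketch adds on its own: a match that reaches the end of the buffer is not accepted until more data or EOF arrives, because the delimiter run might continue in the next block.

```bash
#!/usr/bin/env bash
# Hypothetical buffered reader: records end at any run of EOL_CHARS.
EOL_CHARS=$'\r\n'
RECORD_RE="^([^${EOL_CHARS}]*)[${EOL_CHARS}]+"
BLOCK=16          # deliberately tiny so refills actually happen
buf='' at_eof=0

# read_record FD VARNAME: returns 0 with a record, 1 at end of input.
read_record() {
  local fd=$1 chunk
  local -n _rec=$2
  while true; do
    # Accept a match only if the delimiter run cannot continue.
    if [[ $buf =~ $RECORD_RE ]] &&
       (( at_eof || ${#BASH_REMATCH[0]} < ${#buf} )); then
      _rec=${BASH_REMATCH[1]}
      buf=${buf:${#BASH_REMATCH[0]}}
      return 0
    fi
    if (( at_eof )); then
      [[ -n $buf ]] || return 1
      _rec=$buf               # unterminated final record
      buf=''
      return 0
    fi
    # No usable match yet: pull another block from the FD and retry.
    IFS= read -r -N "$BLOCK" -u "$fd" chunk || at_eof=1
    buf+=$chunk
  done
}

exec {fd}< <(printf 'first line\r\nsecond\n\nthird')
while read_record "$fd" rec; do printf '[%s]\n' "$rec"; done
exec {fd}<&-
# prints [first line], [second], [third] on separate lines
```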

PE Mode (buffered)

Used when -D (exact-string EOL) or -F (exact-string field delimiter) is specified. Uses parameter expansion:

Record extraction: ${buf%%"$eol"*} (everything before first EOL)
Buffer advance: ${buf#*"$eol"} (everything after first EOL)
Field extraction: same pattern with field delimiter

Handles arbitrary multi-char delimiters that can’t be expressed as regex character classes (e.g. <>, ::, \r\n).
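
A worked example of those expansions on an in-memory buffer, using made-up data with <> as the record delimiter and :: as the field delimiter:

```bash
#!/usr/bin/env bash
# Worked example of the parameter expansions; the data is invented.
buf='k1::v1<>k2::v2<>'
eol='<>' fs='::'

record=${buf%%"$eol"*}     # k1::v1      text before the first EOL
buf=${buf#*"$eol"}         # k2::v2<>    text after the first EOL

key=${record%%"$fs"*}      # k1          text before the first field delimiter
value=${record#*"$fs"}     # v1          text after it

printf 'key=%s value=%s rest=%s\n' "$key" "$value" "$buf"
# prints: key=k1 value=v1 rest=k2::v2<>
```

Note that when the delimiter is absent, both expansions return the buffer unchanged, so a reader built this way has to test for the delimiter (for example with [[ $buf == *"$eol"* ]]) before extracting.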

Field Assignment

Fields are assigned via nameref – no eval, no read <<<, no IFS manipulation in buffered modes. The field names array is stored as a real bash indexed array (not a joined string).

# Internal assignment loop (simplified):
for vname in "${fields[@]}"; do
  [[ "$vname" == "_" ]] && continue
  local -n ref="$vname"
  ref="$value"
done

In direct mode, read handles field assignment natively (field names are passed directly as arguments to read).
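
A runnable version of the nameref loop, as a sketch: assign_fields and the values array are hypothetical, and the helper assumes no field is named like one of its own locals (_names, _values, _ref, i).

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the simplified loop with real inputs.
# assign_fields NAMES_ARRAY VALUES_ARRAY
assign_fields() {
  local -n _names=$1 _values=$2
  local i
  for i in "${!_names[@]}"; do
    [[ ${_names[i]} == _ ]] && continue   # discard: value is skipped
    local -n _ref=${_names[i]}            # re-point the nameref
    _ref=${_values[i]-}                   # assign through it, no eval
  done
}

fields=(name age _ city)
values=(alice 30 ignored paris)
assign_fields fields values
printf '%s, %s, %s\n' "$name" "$age" "$city"
# prints: alice, 30, paris
```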

Performance

Overhead

Stream adds per-record overhead from method dispatch and data access. Benchmarks on 1000 records:

Mode	Time	vs raw `read`
Direct (whole line)	~1.4s	~10x
Direct (IFS split, 5 fields)	~1.4s	~7x
Buffered PE (whole line)	~2.3s	~16x
Buffered PE (5 fields, -f)	~3.2s	~17x
Buffered regex (-E)	~1.7s	~12x

The overhead is dominated by method dispatch and hash lookups, not by the parsing algorithm. For bulk processing (millions of records), use raw read. Stream is for convenience and correctness on structured data – hundreds to low thousands of records.

Block Size

Benchmarking shows block size has negligible impact in the 256-2048 range. Default is 1024. Override with --blockSize=N if you have a specific reason (e.g. very long records where a larger buffer avoids multiple refills).

Optimization: `__Stream_data`

Stream stores per-object configuration in a single global associative array (__Stream_data) with compound keys ("${objId}.property"). This eliminates the __boop.get function call overhead that would otherwise dominate the hot path. The property system is still used for introspection but not in the read loop.
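
The compound-key layout can be pictured like this. The sketch uses a stand-in array named store; the object ids, the property names (fd, mode), and the obj_set helper are invented for illustration and say nothing about what Stream actually stores.

```bash
#!/usr/bin/env bash
# Hypothetical illustration of "${objId}.property" keys in one global hash.
declare -A store=()

obj_set() { store["$1.$2"]=$3; }          # obj_set OBJID PROPERTY VALUE

obj_set stream1 fd 10
obj_set stream1 mode regex
obj_set stream2 mode direct

# Hot path: a single hash lookup, no getter function and no subshell.
objId=stream1
printf 'mode of %s: %s\n' "$objId" "${store[$objId.mode]}"
# prints: mode of stream1: regex
```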

Null Bytes

Bash variables cannot hold \0. Stream operates on text only. Binary data with embedded nulls is out of scope.

Examples

CSV with header

into=s Stream.new -P "data.csv" -f ','
$s.Read header_line  # first record into a single variable
# Now read data rows with known fields:
while $s.Read name age city; do
  printf "%s (%s) from %s\n" "$name" "$age" "$city"
done
$s.close

Paragraph mode (double-newline separated)

into=s Stream.new -P "document.txt" -D $'\n\n' paragraph
while $s.Read; do
  printf "=== PARAGRAPH ===\n%s\n\n" "$paragraph"
done
$s.close

Mixed line endings (any CR/LF combination)

into=s Stream.new -P "messy.log" -E $'\r\n' line
while $s.Read; do
  process_line "$line"
done
$s.close

Fixed-width records

into=s Stream.new -P "mainframe.dat" -n 80 record
while $s.Read; do
  # Slice fields by position
  type="${record:0:2}"
  account="${record:2:20}"
  amount="${record:22:10}"
done
$s.close

Array mode

into=s Stream.new -P "data.tsv" -f $'\t' -a row
while $s.Read; do
  printf "columns: %d, first: %s\n" "${#row[@]}" "${row[0]}"
done
$s.close

Writing

exec {fd}> "output.txt"
into=s Stream.new --fd="$fd"
$s.writeLine "header line"
$s.writeLine "data line 1"
$s.write "no newline after this"
$s.close

Design: Two-Layer Delimiter Architecture

_EOL and _Delimiter are the universal IO-control variables across the boop framework. Stream implements the complex (Layer 2) path:

Layer 1 – the fast path (single-char, used everywhere):

printf, parameter expansion, read -d – zero overhead
What Map.keys, List.toArray, boop.pass, Config.keys etc. use
Single characters are the common case and stay fast

Layer 2 – Stream (multi-char capable):

Internal buffer, scans for delimiter using parameter expansion or regex
Handles multi-character _EOL (paragraph mode, CRLF, arbitrary patterns)
Handles multi-character field delimiters (-F) and collapsing delimiter runs (-W)
Same caller intent (_EOL, -D, -E), different execution path

The hybrid principle:

Single-char delimiter: direct read (no buffering needed)
Multi-char delimiter or streaming data: buffered Read
Same variable names, same caller intent, different execution paths

Stream does NOT read _Delimiter directly (see Construction). The framework’s _Delimiter is an output-side convention. Stream’s input-side field splitting uses explicit -f/-F/-W options.

Design: Multi-FD I/O Model (planned)

Each Stream object will have up to three FDs:

in – read source (default: stdin dup)
out – write target (default: stdout dup)
err – object-level logging/errors (default: stderr dup)

Each independently configurable. Shell redirections on the constructor persist (existing behavior). No sigil parsing. Designed with Stream::Socket in mind (bidirectional on one FD).

See TODO for full option spec (--fd-in, --path-out, -m mode, etc.)