boop — Coding Standards

Standards and conventions for the boop framework codebase. These apply to the framework itself, all class files, and all test files. Comments are part of the code and must be maintained alongside it.

The goal: a programmer encountering this codebase for the first time should be able to read any function and understand what it does, what it expects, and what it returns — without reading the entire framework first.

See also GOTCHAS.md — the “what goes wrong and why” companion to this document’s “how to write it correctly.”

API Tiers
Comments
Variable Naming
Output
Shell Options
Error Handling
Class File Structure
API Shape
- Primitives Inward, Wrappers Outward
- Cost of an I/O Round-Trip
Liskov Substitution Principle (LSP)
- When to Diverge
Test Files
Refactoring Policy
- Sanitize on Sight
- Don’t Break the Tests
Class Variant Conventions (::Simple and ::Fast)
Class Properties (static/instance)
Constructor Preamble: into= Guard

API Tiers

Every function in the system belongs to one of three tiers. The tier largely determines naming, sets validation expectations, and guides decisions about behavior and interface.

Tier 1 — Private (`__double_underscore, Mixed Case`)

Internal plumbing. Called under controlled conditions by code that knows what it’s doing. Callers are responsible for passing correct inputs.

Methods intended for speed and efficiency may be fast and lean with minimal validation and logging, but should still have an eye toward ease of future debugging.

When reasonable, features like lazy evaluation, caching and memoization, and simple, fast, in-memory built-in tools are always favored.

When heavy, complicated, slow and/or otherwise cumbersome processing is necessary, it should be carefully documented in the comments (see below), and have various levels of logging established.

System-specific variables also use the double-leading-underscore. These should only EvER be accessed through the provided methods.

Examples: __boop.parse, __Math.rawAdd, __boop_classPath

Tier 2 — Semi-Private (`_single_underscore`)

Management interface. Used to configure, inspect, and control the framework. These are public-facing but not the primary user API. Should validate inputs and produce clear error messages.

Methods and variables in this category manage the system in some way, and/or are used as convenient interfaces to the internals. Users are expected to use these, but the single underscore indicates that they are effectively reserved words and should be used only in the prescribed manner. Improper use voids the warranty.

Examples: _Warn, _LogLevel, _Crash, _Self, _Class

Naming: _MixedCase for functions and variables

Tier 3 — Public (`ClassName.method`)

End-user facing. These are the methods people call in scripts and on the command line. Must be robust, intuitive, and helpful:

Validate all inputs; reject garbage with clear error messages
Produce useful output by default (stdout with newline when no into= is specified)
Error messages should tell the user what went wrong AND what they should have done instead
Ideally designed to be comfortable in a .bash_profile for CLI use, but should at least be suitable for semi-casual scripting

Examples: Math.DO, Math.add, $obj.volume, $list.push

Comments

Comments are code. They must be accurate, current, and maintained alongside the code they describe. Stale comments are worse than no comments — they actively mislead.

Function Headers

Every function gets a block comment at the top explaining:

What the function does (one or two sentences AT LEAST - more is better)
Arguments it expects (name, type, purpose)
What it returns or produces (value, side effects, exit code)
Any non-obvious behavior or gotchas

Example:

# Resolve a Math argument to its digits/scale/neg triple.
#
# If the input is an object ID (starts with _ and is in the registry),
# extracts digits, scale, and neg from the object's descriptor.
# Otherwise, parses the input as a literal number string.
#
# Arguments:
#   $1 — input value (object ID or numeric string like "3.14" or "-42")
#   $2 — nameref: receives digit string (no sign, no decimal point)
#   $3 — nameref: receives scale (integer, number of decimal places)
#   $4 — nameref: receives neg flag (0=positive, 1=negative)
#
# Returns: nothing (results via namerefs)
# Crashes: if input is not a valid number or object ID

The header should be written for someone who has never seen the function before. Don’t assume the reader knows the internal representation or the calling conventions — state them.

Inline Comments

Use # for structural comments — section dividers, brief annotations, and anything that explains what the code is doing at a high level.

Use : "explanation" (the : builtin with a string argument) for comments inside heavy logic sections. The : builtin is effectively a no-op but its arguments are parsed by bash, which means they appear in set -vx trace output. This makes them visible during debugging while # comments are stripped by the parser and invisible in traces. There is a miniscule performance cost for this; don’t embed subshells!

# Good: structural comment for a section
# === Scale Alignment ===

# Good: colon-comment in a hot loop or complex logic block
: "pad shorter operand with trailing zeros to match scales"
if (( __as_sA < __as_sB )); then
  ...
fi

Reserve : comments for places where trace visibility has debugging value — complex algorithms, non-obvious control flow, dispatch logic. Don’t use them for simple one-liners where # is fine.

These are code. They can only be used where an actual statement can.

Trailing Inline Comments

Short trailing comments are encouraged for clarity, especially for bash idioms that less experienced developers might not recognize. Align the # markers at a consistent column so they read as a clean margin annotation — code on the left, explanation on the right:

__res_neg=${#BASH_REMATCH[1]}                                # "-" → length 1; empty → 0
local __res_int="${BASH_REMATCH[2]}"                          # digits before the dot
__res_int="${__res_int#"${__res_int%%[!0]*}"}"                # strip leading zeros
: "${__res_int:=0}"                                           # keep at least "0"

This reduces visual clutter and lets the eye scan code and comments independently. Keep them short — if the explanation needs more than a few words, use a block comment above the line instead.

Comment Maintenance

When you change code, update the comments. When you read code and find a comment that’s wrong, fix it on the spot. This is not optional.

Variable Naming

Local Variables

All local variables in methods use the triple-prefix convention:

__ClassName_methodName_varname

This prevents nameref collisions. Bash namerefs resolve by name, not by lexical scope — if two functions in the call stack both have local val, a nameref in the inner function binds to the outer function’s val. The prefix makes every name unique across the entire call stack.

# Bad — will collide with any caller that also has "result"
local result

# Good — unique to this function
local __Box_volume_result

This is ugly. It is also correct. Do not skip the prefix. Caveat scriptor…

Framework Globals

All framework-level globals use the __boop_ prefix:

__boop_registry        # master object/class store
__boop_methodRegistry  # method resolution cache
__boop_logLevel        # global log level

Inherited Identity Variables

_Self and _Class are set by dispatch wrappers before each method call. They are effectively reserved words in class method code.

The framework targets bash 4.3+ and deliberately does not use local -I (a bash 5.0 feature that inherits a variable’s value from the calling scope). Dispatch wrappers instead set _Self and _Class as inline variables directly before calling the underlying function.

Methods read _Self and _Class as ordinary variables; the dispatch wrapper has already set them correctly before the call.
Constructors re-localize with local _Class="${_Class:-ClassName}" so the value is scoped to the constructor frame and defaults correctly.
Internal calls in boop should be explicit about setting or clearing _Self/_Class when the dispatch wrapper won’t run.

User-Facing Variables

Semi-private variables use single underscore with mixed case: _Self, _Class, _LogLevel. See Tier 2 — Semi-Private above.

These are generally used for very specific things. For example, if you explicitly want an object to use a parent’s method instead of its own overridden version, you can effectively “typecast” the method call - _Class=$ParentClass $obj.method This will attempt to use method from $ParentClass instead of the actual class of $obj.

While the system is designed to be useful on the CLI with convenient tools like Math.DO "1/(2+3)x4", it’s still built to work as an actual OOP system, too.

Output

`printf`, Never `echo`

echo interprets backslash escapes on some platforms and has inconsistent behavior across bash versions. printf is predictable everywhere. Use it for all output.

# Bad
echo "$value"

# Good
printf "%s\n" "$value"

Characters and Encoding

Never use em-dashes or other non-ASCII punctuation in code or generated output. Use plain ASCII -- (double hyphen) instead. Em-dashes cause problems with some terminal encodings and are visually ambiguous in monospace fonts.

For everything else, be contextual. Mathematical symbols like x, ^2, pi in comments are fine – they make algorithm documentation more readable and are never parsed by bash. Unicode card suits in PlayingCard output are fine – they’re the natural representation.

The rule: prefer simple ASCII in strings the framework generates for others to consume (error messages, log output, serialized data). Use whatever’s appropriate in comments, documentation, and domain-specific display output where the character serves a clear purpose.

Value Returns

All value-producing functions route through boop.pass:

boop.pass "$value" ${into:-}

The ${into:-} passes the caller’s nameref target if one was provided. If not, the return system uses the current mode (auto, stdout, global, etc.) to deliver the value.

Shell Options

boop does NOT set shell options (set -e, set -u, set -o pipefail, etc.). The framework must never alter the caller’s shell environment.

If boop ever needs to temporarily change a shell option internally, it must save and restore it. The caller’s shell options are their business.

All code in the framework must operate correctly regardless of the user’s shell configuration. A user who sources boop from a script with set -euo pipefail, or from an interactive shell with custom IFS, or with shopt -s failglob — all of these must work. The environment resilience test (tests/environ/test_environ) verifies this across 15 configurations.

errexit Safety (`set -e`)

Under set -e, any command that returns non-zero kills the shell — unless it’s in a conditional context (if, ||, && as part of a compound that succeeds overall, or a while/until condition).

The [[ ]] && action pattern is a landmine. When the test is false, the whole line returns non-zero:

# DANGEROUS under set -e: if digits is NOT all zeros, this kills the shell
[[ "${digits//0/}" == "" ]] && neg=0

# SAFE: the || true ensures the overall expression always succeeds
[[ "${digits//0/}" == "" ]] && neg=0 || true

# ALSO SAFE: if/fi is always a conditional context
if [[ "${digits//0/}" == "" ]]; then neg=0; fi

Use || true when the false case is normal (not an error). Use if/fi when the logic is complex enough to warrant it. The choice is readability — both are errexit-safe.

Arithmetic expressions return their truth value as an exit code. (( 0 )) returns 1. (( x++ )) when x is 0 evaluates to 0 (the old value), which returns exit code 1. Under set -e, this kills.

# DANGEROUS: first iteration when count is 0, (( 0++ )) → exit 1
(( count++ ))

# SAFE: pre-increment evaluates to 1 on first call
(( ++count ))

# SAFE: addition assignment always evaluates to the new value
(( count += 1 ))

# ALSO FINE: inside an assignment context, the exit code doesn't matter
foo[n++]=$x    # the assignment succeeds; the arithmetic is internal

The rule isn’t “never use post-increment” — it’s “understand what the expression evaluates to, because that becomes the exit code.” In an assignment like arr[n++]=val, the assignment’s success is the exit code, not the arithmetic’s. But as a standalone statement, the arithmetic IS the exit code.

IFS Independence

Never rely on the ambient IFS value. The user may have set it to anything — colon, empty, equals sign, or something exotic.

# DANGEROUS: relies on IFS being space/tab/newline for word splitting
local input="$*"
for token in $input; do ...

# SAFE: explicitly set IFS for the scope that needs it
local IFS=$' \t\n'
local input="$*"
for token in $input; do ...

Use local IFS=... to scope IFS to the current function. Bash restores the previous value automatically when the function returns — no manual save/restore needed, no risk of missing a restore path on early return or crash.

When joining array elements with "${array[*]}", always set IFS explicitly for the join:

# DANGEROUS: joins on whatever IFS happens to be
printf '%s\n' "${arr[*]}"

# SAFE: explicit join character
local IFS=','; printf '%s\n' "${arr[*]}"

# PREFERRED when you don't need a join: iterate instead
printf '%s\n' "${arr[@]}"

nounset Safety (`set -u`)

Under set -u, referencing an unset variable is an error. Use ${var:-} (default to empty) or ${var:-default} for any variable that might legitimately be unset:

# DANGEROUS under set -u: crashes if _Class is unset
local _Class="$_Class"

# SAFE: provides a default
local _Class="${_Class:-boop}"

Framework Must Not Alter User Environment

To be explicit: after . boop returns, the user’s IFS, shell options, shopt settings, and trap state must be exactly as they were before. local scoping handles IFS. Shell options should never be changed by framework code at the global level. If a future need arises to temporarily change an option, use a subshell or save/restore — but prefer redesigning to avoid the need.

Error Handling

The Two Error Paths: `_Crash` vs `_Error` + return 1

Every error in the framework falls into one of two categories with distinct handling:

_Crash — always fatal, regardless of _FatalLevel. Reserved for:

Shell injection / invalid identifiers (security boundary)
Framework internal corruption (registry inconsistent, dispatch failure)
Class/mixin declaration errors at load time (boopClass, boopMixin)
Version constraint failures (_Require, boop version guard)
Abstract method stubs called on non-implementing subclasses
Invalid framework API misuse (_Super with no parent, _Cast with no class)

_Error + return 1 — recoverable. Callers check $? and handle it. Used for all runtime data conditions: bad input, missing files, empty collections, invalid arguments to user-facing methods. With the default _FatalLevel crash, _Error logs the message and continues; with _FatalLevel error, it escalates to fatal. Either way, the return 1 ensures the function exits with a failure code so callers can check.

# Framework corruption → _Crash (security/integrity boundary)
[[ "$name" =~ $__boop_validate_pat ]] || _Crash "Invalid identifier: '$name'"

# Runtime data condition → _Error + return 1 (recoverable)
[[ -n "$file" ]] || { _Error "Config.load: file path required"; return 1; }
[[ -f "$file" ]] || { _Error "Config.load: file not found: $file"; return 1; }

# In a case arm:
*) _Error "Signal.strict: expected 0/off or 1/on, got '$1'"; return 1 ;;

Do not silently return empty strings or success codes for invalid input. Do not use _Crash for conditions the caller can reasonably handle.

Tier-Appropriate Validation

Tier 1 (private): minimal validation. Callers are trusted. Should consider context; lazy private code that creates public code should validate appropriately!
Tier 2 (semi-private): validate inputs, use _Error + return 1 for runtime data conditions; _Crash for security/framework violations.
Tier 3 (public): validate everything. Error messages should say what was wrong AND suggest the correct usage.

# Tier 3 error message — helpful
_Error "Math.add: invalid number '${input:-}' — expected a numeric value like '3.14' or '-42'"
return 1

`2>/dev/null` Policy

Only suppress stderr when ALL of these are true:

You know exactly what error will be produced
You are expecting that specific error
The error content has no debugging value

Every 2>/dev/null in the codebase should be reviewable against these criteria. If it doesn’t pass all three, remove it.

Class File Structure

Every class file follows this structure:

#!/bin/bash

# ClassName — one-line description
#
# Longer description if needed. Explain what the class does, what
# it's for, and any important design decisions.

# Load guard — skip if already registered
# NOTE: This pattern is under review for refactoring. The 2>/dev/null
# suppresses "return outside function" when the file is executed
# directly instead of sourced, which is a debugging hazard under
# set -e. A boop.init replacement is planned.
[[ -n "${__boop_registry[ClassName]+set}" ]] && return 2>/dev/null

. boop [Dependencies]

# Class descriptor
__boop_registry["ClassName"]="..."

# Method implementations (each with a function header comment)

# Method registration
__boop.registerMethod ClassName method ClassName.method

# Finalize
__boop.registerClass ClassName

API Shape

Primitives Inward, Wrappers Outward

When a class exposes the same operation over multiple input forms (a string, a file path, a stream), the reduced form is the primitive. Other entry points are thin wrappers that produce the reduced form and delegate.

For text parsing the reduced form is “lines on stdin.” loadFile reads the file and pipes into the parser; fromString feeds the string in via <<<; fromStdin is the parser itself. The parsing logic exists exactly once.

The inverse — making the file variant the primitive and routing in-memory data through mktemp, printf >, and rm to reuse it — is forbidden. It pays for a subshell, two syscalls of disk I/O, and a tmpfile leak window on _Crash, all to skip a one-function refactor. A while read; done < "$file" loop and a while read; done <<< "$str" loop are the same loop — extract it.

The same shape applies elsewhere:

Serializers: the in-memory form (toString) is the primitive; save <file> writes its output. Never save to a tmpfile then cat it back to stdout.
Iteration: a callback/visitor primitive is the core; forEach, map, filter wrap it. Never reimplement the walk.
Constructors: new (empty) is the primitive; fromString, fromFile, fromArray build empty then populate via public methods.

Cost of an I/O Round-Trip

For reference, when judging whether to “just route through the existing function”:

Operation	Approximate cost
`mktemp`	fork + syscalls
`printf '%s' >file`	open/write/close
`done < file` (re-read)	open/read/close
`rm -f file`	fork + unlink
Subshell `$(...)`	fork + pipe + wait

Compare to extracting the loop body into a private helper: zero. The refactor is cheaper than one invocation of the wrong design.

Liskov Substitution Principle (LSP)

The Liskov Substitution Principle states: if class B inherits from class A, then objects of type B should be usable anywhere objects of type A are expected, without breaking the program’s correctness.

In boop, this means:

Inherited methods must work correctly on subclass instances. If Box has a volume method, and Cube inherits from Box, then calling volume on a Cube must produce a correct result — even if Cube doesn’t override it.
Overrides must honor the parent’s contract. If List.pop returns the last element and crashes on empty, then Stack.pop (which wraps a List) must do the same. A caller who expects List behavior shouldn’t be surprised by Stack behavior.
Documented divergences are acceptable but must be explicit. When a class intentionally breaks substitutability — like Stream’s read method, which returns 0 on partial-read-at-EOF where bash’s raw read returns non-zero — that’s an “LSP divergence.” It’s a conscious design choice, not a bug. Document it in the class file and in the relevant docs with the phrase “LSP divergence” so it’s searchable and unambiguous.

When to Diverge

Diverge when the parent’s behavior is wrong for the use case and honoring it would force every caller to work around it. Stream’s read divergence exists because the raw read behavior (returning non-zero on the last record) causes silent data loss in while loops. Correctness wins over substitutability.

When you diverge, document:

What the parent does
What you do instead
Why the divergence is correct for this context

Test Files

All tests use the TestSuite class. Test files should be thorough, especially for infrastructure code (logging, dispatch, return system).

Naming

Test files are named test_<subject>_ts (the _ts suffix indicates TestSuite-based tests). Benchmark and non-TestSuite files omit the suffix (e.g., test_pi_growth, test_matrix).

Zero-Fork Where Possible

Prefer $(<file) (zero-fork builtin read) over $(command) subshell capture in test helpers. Use bash -c only for tests that need process isolation (crash tests, exit code tests).

Verbosity

Default output is quiet (failures + summary only). Full output is available via TESTSUITE_VERBOSE=1. Tests should work correctly in both modes.

Refactoring Policy

Sanitize on Sight

Every file touched for other work gets scanned for:

Unlocalized variables that could inherit unexpected values
Stale comments that no longer match the code
$self/$class references (should be $_Self/$_Class)
Missing function header comments

Fix these on the spot. Don’t create TODO items for them.

Don’t Break the Tests

All changes must pass the full test suite before committing. Currently 514 assertions across 6 TestSuite files.

Class Variant Conventions (`::Simple` and `::Fast`)

Two naming conventions for class variants with specific optimization axes. Not mandatory – not every class has, needs, or will ever need one of these. The convention exists so a reader seeing Foo, Foo::Simple, or Foo::Fast in the tree immediately understands what they’re looking at without having to read the file for a first clue.

`Class::Simple`

A minimal-dependency variant for use inside the dependency graph of core classes. Typically a subset of the full API, no dependencies on other framework classes beyond the root boop, and focused on the one or two operations that matter for the intended caller. Examples:

Collection::Map::Simple – plain key-value hash with set/get/has/keys, no ordering, no delimiters, no Iterator.
Config::Simple – flat key=value parser with # comments and blank-line skipping. No INI sections, no object wrapper, no round-trip.

`Class::Fast`

An optimized-hot-path variant. Already demonstrated by Collection::Map::Fast (flat compound-key store, O(1) get/set, no insertion ordering). The conventional signal is “this sacrifices features for speed.”

What the convention is NOT

Not a mandate. Most classes will have neither variant.
Not a symmetry requirement. A class can have ::Simple without ::Fast, or vice versa.
Not the same as inheritance. The variants don’t share an inheritance chain – each implements what it needs directly.

Class Properties (static/instance)

Design decision (settled): No implicit fallback. Java/C#/Ruby model.

$obj.prop reads the instance’s own value from __boop_static["${objId}.${prop}"]
ClassName.prop reads the class value from __boop_static["${className}.${prop}"]
These are independent. Setting one doesn’t affect the other.
If a developer wants inherited defaults, they call $obj.inheritValueFor prop in their constructor – explicit, not magic.

Constructor Preamble: `into=` Guard

Any constructor (or method) that receives into= from its caller AND calls other framework code internally must save and clear into at the top to prevent leakage into subcalls:

MyClass.new() {
  local __MyClass_new_into="${into:-}"; into=''
  # ... internal work ...
  boop.pass "$_Self" ${__MyClass_new_into:+$__MyClass_new_into}
}

See docs/GOTCHAS.md “Environment Prefix Leakage” for the full explanation of why this is necessary.