boop — Coding Standards
Standards and conventions for the boop framework codebase. These apply to the framework itself, all class files, and all test files. Comments are part of the code and must be maintained alongside it.
The goal: a programmer encountering this codebase for the first time should be able to read any function and understand what it does, what it expects, and what it returns — without reading the entire framework first.
See also GOTCHAS.md — the “what goes wrong and why” companion to this document’s “how to write it correctly.”
Contents
- API Tiers
- Comments
- Variable Naming
- Output
- Shell Options
- Error Handling
- Class File Structure
- API Shape
- Liskov Substitution Principle (LSP)
- Test Files
- Refactoring Policy
- Class Variant Conventions (::Simple and ::Fast)
- Class Properties (static/instance)
- Constructor Preamble: into= Guard
API Tiers
Every function in the system belongs to one of three tiers. The tier largely determines naming, sets validation expectations, and guides decisions about behavior and interface.
Tier 1 — Private (__double_underscore, Mixed Case)
Internal plumbing. Called under controlled conditions by code that knows what it’s doing. Callers are responsible for passing correct inputs.
Methods intended for speed and efficiency may be fast and lean with minimal validation and logging, but should still have an eye toward ease of future debugging.
Where reasonable, favor lazy evaluation, caching and memoization, and simple, fast, in-memory built-in tools.
When heavy, complicated, slow and/or otherwise cumbersome processing is necessary, it should be carefully documented in the comments (see below), and have various levels of logging established.
System-specific variables also use the double-leading-underscore prefix. These should only EVER be accessed through the provided methods.
Examples: __boop.parse, __Math.rawAdd, __boop_classPath
Tier 2 — Semi-Private (_single_underscore)
Management interface. Used to configure, inspect, and control the framework. These are public-facing but not the primary user API. Should validate inputs and produce clear error messages.
Methods and variables in this category manage the system in some way, and/or are used as convenient interfaces to the internals. Users are expected to use these, but the single underscore indicates that they are effectively reserved words and should be used only in the prescribed manner. Improper use voids the warranty.
Examples: _Warn, _LogLevel, _Crash, _Self, _Class
Naming: _MixedCase for functions and variables
Tier 3 — Public (ClassName.method)
End-user facing. These are the methods people call in scripts and on the command line. Must be robust, intuitive, and helpful:
- Validate all inputs; reject garbage with clear error messages
- Produce useful output by default (stdout with newline when no into= is specified)
- Error messages should tell the user what went wrong AND what they should have done instead
- Ideally designed to be comfortable in a .bash_profile for CLI use, but should at least be suitable for semi-casual scripting
Examples: Math.DO, Math.add, $obj.volume, $list.push
Comments
Comments are code. They must be accurate, current, and maintained alongside the code they describe. Stale comments are worse than no comments — they actively mislead.
Function Headers
Every function gets a block comment at the top explaining:
- What the function does (one or two sentences AT LEAST - more is better)
- Arguments it expects (name, type, purpose)
- What it returns or produces (value, side effects, exit code)
- Any non-obvious behavior or gotchas
Example:
# Resolve a Math argument to its digits/scale/neg triple.
#
# If the input is an object ID (starts with _ and is in the registry),
# extracts digits, scale, and neg from the object's descriptor.
# Otherwise, parses the input as a literal number string.
#
# Arguments:
# $1 — input value (object ID or numeric string like "3.14" or "-42")
# $2 — nameref: receives digit string (no sign, no decimal point)
# $3 — nameref: receives scale (integer, number of decimal places)
# $4 — nameref: receives neg flag (0=positive, 1=negative)
#
# Returns: nothing (results via namerefs)
# Crashes: if input is not a valid number or object ID
The header should be written for someone who has never seen the function before. Don’t assume the reader knows the internal representation or the calling conventions — state them.
Inline Comments
Use # for structural comments — section dividers, brief annotations,
and anything that explains what the code is doing at a high level.
Use : "explanation" (the : builtin with a string argument) for
comments inside heavy logic sections. The : builtin is effectively
a no-op but its arguments are parsed by bash, which means they appear
in set -vx trace output. This makes them visible during debugging
while # comments are stripped by the parser and invisible in traces.
There is a minuscule performance cost for this; don’t embed subshells!
# Good: structural comment for a section
# === Scale Alignment ===
# Good: colon-comment in a hot loop or complex logic block
: "pad shorter operand with trailing zeros to match scales"
if (( __as_sA < __as_sB )); then
    ...
fi
Reserve : comments for places where trace visibility has debugging
value — complex algorithms, non-obvious control flow, dispatch logic.
Don’t use them for simple one-liners where # is fine.
These are code. They can only be used where an actual statement can.
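A quick way to see the difference is to capture the xtrace stream of one call. The sketch below uses a hypothetical trace_demo function (not part of boop) and plain set -x; only the colon-comment should appear in the captured trace.

```bash
#!/usr/bin/env bash
# trace_demo is a hypothetical function used only for this illustration.
trace_demo() {
    # hash comment: dropped before execution, never traced
    : "colon-comment: executed as a no-op, so xtrace prints it"
    local __trace_demo_sum=$(( 2 + 3 ))
    printf '%s\n' "$__trace_demo_sum"
}

# Run one call with xtrace on and keep only the trace (stderr).
trace_output=$( { set -x; trace_demo >/dev/null; set +x; } 2>&1 )
printf '%s\n' "$trace_output"
```

The captured text contains the : line with its string argument and no trace of the # line.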
Trailing Inline Comments
Short trailing comments are encouraged for clarity, especially for
bash idioms that less experienced developers might not recognize.
Align the # markers at a consistent column so they read as a
clean margin annotation — code on the left, explanation on the right:
__res_neg=${#BASH_REMATCH[1]}                   # "-" → length 1; empty → 0
local __res_int="${BASH_REMATCH[2]}"            # digits before the dot
__res_int="${__res_int#"${__res_int%%[!0]*}"}"  # strip leading zeros
: "${__res_int:=0}"                             # keep at least "0"
This reduces visual clutter and lets the eye scan code and comments independently. Keep them short — if the explanation needs more than a few words, use a block comment above the line instead.
Comment Maintenance
When you change code, update the comments. When you read code and find a comment that’s wrong, fix it on the spot. This is not optional.
Variable Naming
Local Variables
All local variables in methods use the triple-prefix convention:
__ClassName_methodName_varname
This prevents nameref collisions. Bash namerefs resolve by name at the
point of use, not by lexical scope: if a caller passes val as a nameref
target and the callee also declares local val, the nameref binds to the
callee’s own val and the caller’s variable is never written. The prefix
makes every name unique across the entire call stack.
# Bad — will collide with any caller that also has "result"
local result
# Good — unique to this function
local __Box_volume_result
This is ugly. It is also correct. Do not skip the prefix. Caveat scriptor…
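The collision is easy to reproduce. In this sketch, bad_double, good_double, and caller are hypothetical functions (not boop code); the only difference between the two helpers is whether their local names can clash with the caller's variable.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for illustration; neither is part of boop.

# Bad: the local named "result" shadows the caller's "result",
# so the nameref resolves to this function's own local.
bad_double() {
    local -n __ref=$2
    local result=$(( $1 * 2 ))
    __ref=$result
}

# Good: prefixed locals cannot collide with the caller's names.
good_double() {
    local -n __Demo_double_ref=$2
    local __Demo_double_result=$(( $1 * 2 ))
    __Demo_double_ref=$__Demo_double_result
}

caller() {
    local result="unset"
    bad_double 21 result
    printf 'after bad_double:  %s\n' "$result"
    good_double 21 result
    printf 'after good_double: %s\n' "$result"
}
caller
```

The first line printed still shows "unset" because bad_double wrote to its own local; the second shows 42.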
Framework Globals
All framework-level globals use the __boop_ prefix:
__boop_registry # master object/class store
__boop_methodRegistry # method resolution cache
__boop_logLevel # global log level
Inherited Identity Variables
_Self and _Class are set by dispatch wrappers before each method call.
They are effectively reserved words in class method code.
The framework targets bash 4.3+ and deliberately does not use
local -I (a bash 5.1 feature that inherits a variable’s value from the
calling scope). Dispatch wrappers instead set _Self and _Class as
inline variables directly before calling the underlying function.
- Methods read _Self and _Class as ordinary variables; the dispatch wrapper has already set them correctly before the call.
- Constructors re-localize with local _Class="${_Class:-ClassName}" so the value is scoped to the constructor frame and defaults correctly.
- Internal calls in boop should be explicit about setting or clearing _Self/_Class when the dispatch wrapper won’t run.
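A minimal sketch of the inline-variable technique, assuming a default (non-POSIX-mode) bash. Demo.describe and demo_dispatch are hypothetical stand-ins for a method body and a dispatch wrapper; the real wrappers may do more.

```bash
#!/usr/bin/env bash
# Hypothetical method body and dispatch wrapper; names are illustrative only.
Demo.describe() {
    printf '%s is a %s\n' "${_Self:-?}" "${_Class:-?}"
}

demo_dispatch() {
    # Inline assignments are visible to the called function only.
    _Self="$1" _Class="$2" Demo.describe
}

demo_dispatch _obj1 Box
printf 'after the call: _Self=%s\n' "${_Self:-<unset>}"
```

The method sees both values, and nothing remains set in the calling shell once the call returns.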
User-Facing Variables
Semi-private variables use single underscore with mixed case: _Self,
_Class, _LogLevel. See Tier 2 — Semi-Private above.
These are generally used for very specific things. For example, if you
explicitly want an object to use a parent’s method instead of its own
overridden version, you can effectively “typecast” the method call:
_Class=$ParentClass $obj.method
This will attempt to use method from $ParentClass instead of the
actual class of $obj.
While the system is designed to be useful on the CLI with convenient
tools like Math.DO "1/(2+3)x4", it’s still built to work as an actual
OOP system, too.
Output
printf, Never echo
echo interprets backslash escapes on some platforms, treats leading
arguments such as -n and -e as options, and changes behavior with
shell settings such as xpg_echo. printf is predictable
everywhere. Use it for all output.
# Bad
echo "$value"
# Good
printf "%s\n" "$value"
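One concrete failure, shown with bash's builtin echo and a value that happens to look like an option:

```bash
#!/usr/bin/env bash
value='-n'                              # legitimate data that looks like an option
echo_out=$(echo "$value")               # bash's echo consumes -n as an option
printf_out=$(printf '%s\n' "$value")    # printf prints the data as-is
printf 'echo:   [%s]\n' "$echo_out"
printf 'printf: [%s]\n' "$printf_out"
```

echo produces an empty result here, while printf returns the value unchanged.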
Characters and Encoding
Never use em-dashes or other non-ASCII punctuation in code or
generated output. Use plain ASCII -- (double hyphen) instead.
Em-dashes cause problems with some terminal encodings and are
visually ambiguous in monospace fonts.
For everything else, be contextual. Mathematical symbols like
×, ², π in comments are fine – they make algorithm
documentation more readable and are never parsed by bash.
Unicode card suits in PlayingCard output are fine – they’re
the natural representation.
The rule: prefer simple ASCII in strings the framework generates for others to consume (error messages, log output, serialized data). Use whatever’s appropriate in comments, documentation, and domain-specific display output where the character serves a clear purpose.
Value Returns
All value-producing functions route through boop.pass:
boop.pass "$value" ${into:-}
The ${into:-} passes the caller’s nameref target if one was
provided. If not, the return system uses the current mode (auto,
stdout, global, etc.) to deliver the value.
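To make the calling convention concrete, here is a simplified stand-in. demo_pass and demo_area are hypothetical; the sketch assumes only two delivery paths (a named target or stdout) and ignores the return modes the real boop.pass supports.

```bash
#!/usr/bin/env bash
# demo_pass is a simplified stand-in for boop.pass (assumption: two modes only).
demo_pass() {
    if [[ -n "${2:-}" ]]; then
        printf -v "$2" '%s' "$1"      # deliver into the caller's named variable
    else
        printf '%s\n' "$1"            # default: stdout with newline
    fi
}

# Hypothetical value-producing function.
demo_area() {
    demo_pass "$(( $1 * $2 ))" ${into:-}
}

demo_area 3 4                 # prints 12
into=area demo_area 5 6       # prints nothing; sets area
printf 'area=%s\n' "$area"
```

The unquoted ${into:-} is deliberate: when into is empty it expands to no argument at all rather than an empty string.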
Shell Options
boop does NOT set shell options (set -e, set -u, set -o pipefail,
etc.). The framework must never alter the caller’s shell environment.
If boop ever needs to temporarily change a shell option internally, it must save and restore it. The caller’s shell options are their business.
All code in the framework must operate correctly regardless of the
user’s shell configuration. A user who sources boop from a script
with set -euo pipefail, or from an interactive shell with custom
IFS, or with shopt -s failglob — all of these must work. The
environment resilience test (tests/environ/test_environ) verifies
this across 15 configurations.
errexit Safety (set -e)
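Under set -e, a command that returns non-zero kills the shell unless
it runs in an exempt context: the condition of an if, while, or until;
any command in a && or || list except the last one; or a command
negated with !.
The [[ ]] && action pattern is a landmine. When the test is false,
the shell does not exit at that line (the failing test is not the last
command in the list), but the list as a whole returns non-zero. If it
is the last statement of a function or sourced file, that status
becomes the return value, and the caller’s plain call then fails and
kills the shell:
# DANGEROUS under set -e as the last statement of a function: if digits
# is NOT all zeros, the function returns 1 and the caller's shell dies
[[ "${digits//0/}" == "" ]] && neg=0
# SAFE: the || true ensures the overall expression always succeeds
[[ "${digits//0/}" == "" ]] && neg=0 || true
# ALSO SAFE: if/fi returns 0 when the condition is false and there is no else
if [[ "${digits//0/}" == "" ]]; then neg=0; fi
Use || true when the false case is normal (not an error). Use
if/fi when the logic is complex enough to warrant it. The choice
is about readability; both are errexit-safe.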
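The sketch below places the pattern as the last statement of a function, where its non-zero status reaches the caller. check_zero is a hypothetical function; each variant runs in a child bash so that set -e cannot kill the demo itself.

```bash
#!/usr/bin/env bash
# check_zero is a hypothetical function, not boop code.
unsafe_status=0
bash -c '
    set -e
    check_zero() { [[ "${1//0/}" == "" ]] && neg=0; }
    check_zero 120          # test is false -> function returns 1 -> shell exits
    printf "survived\n"
' || unsafe_status=$?

safe_out=$(bash -c '
    set -e
    check_zero() { [[ "${1//0/}" == "" ]] && neg=0 || true; }
    check_zero 120
    printf "survived\n"
')
printf 'unsafe variant exit status: %s\n' "$unsafe_status"
printf 'safe variant output: %s\n' "$safe_out"
```

The unsafe child exits with a non-zero status before printing anything; the safe child reaches its final line.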
Arithmetic expressions return their truth value as an exit code.
(( 0 )) returns 1. (( x++ )) when x is 0 evaluates to 0 (the
old value), which returns exit code 1. Under set -e, this kills.
# DANGEROUS: first iteration when count is 0, (( 0++ )) → exit 1
(( count++ ))
# SAFE: pre-increment evaluates to 1 on first call
(( ++count ))
# SAFE: addition assignment always evaluates to the new value
(( count += 1 ))
# ALSO FINE: inside an assignment context, the exit code doesn't matter
foo[n++]=$x # the assignment succeeds; the arithmetic is internal
The rule isn’t “never use post-increment” — it’s “understand what the
expression evaluates to, because that becomes the exit code.” In an
assignment like arr[n++]=val, the assignment’s success is the exit
code, not the arithmetic’s. But as a standalone statement, the
arithmetic IS the exit code.
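The statuses can be inspected directly, without set -e, to confirm which forms are hazardous. Variable names here are illustrative only.

```bash
#!/usr/bin/env bash
count=0
(( count++ ));    post_status=$?     # evaluates to 0 (old value) -> status 1
count=0
(( ++count ));    pre_status=$?      # evaluates to 1 -> status 0
count=0
(( count += 1 )); add_status=$?      # evaluates to 1 -> status 0

declare -a foo=()
n=0
foo[n++]=x;       assign_status=$?   # status of the assignment, not of n++

printf 'post=%s pre=%s add=%s assign=%s n=%s\n' \
    "$post_status" "$pre_status" "$add_status" "$assign_status" "$n"
```

This prints post=1 pre=0 add=0 assign=0 n=1. Note that pre-increment and += are only safe while the resulting value is non-zero.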
IFS Independence
Never rely on the ambient IFS value. The user may have set it to anything — colon, empty, equals sign, or something exotic.
# DANGEROUS: relies on IFS being space/tab/newline for word splitting
local input="$*"
for token in $input; do ...
# SAFE: explicitly set IFS for the scope that needs it
local IFS=$' \t\n'
local input="$*"
for token in $input; do ...
Use local IFS=... to scope IFS to the current function. Bash
restores the previous value automatically when the function returns —
no manual save/restore needed, no risk of missing a restore path on
early return or crash.
When joining array elements with "${array[*]}", always set IFS
explicitly for the join:
# DANGEROUS: joins on whatever IFS happens to be
printf '%s\n' "${arr[*]}"
# SAFE: explicit join character
local IFS=','; printf '%s\n' "${arr[*]}"
# PREFERRED when you don't need a join: iterate instead
printf '%s\n' "${arr[@]}"
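A small end-to-end check of the scoping claim, using hypothetical functions: hostile_caller simulates a user with an unusual IFS, and join_demo sets its own IFS locally for the join.

```bash
#!/usr/bin/env bash
# join_demo and hostile_caller are hypothetical, for illustration only.
join_demo() {
    local -a __join_demo_items=("$@")
    local IFS=','                           # scoped to this function
    printf '%s\n' "${__join_demo_items[*]}"
}

hostile_caller() {
    local IFS=':'                           # simulate a caller with unusual IFS
    join_demo a b c
    printf 'caller IFS afterwards: %s\n' "$IFS"
}
hostile_caller
```

The join uses the comma regardless of the caller's setting, and the caller's colon is intact after the call returns.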
nounset Safety (set -u)
Under set -u, referencing an unset variable is an error. Use
${var:-} (default to empty) or ${var:-default} for any variable
that might legitimately be unset:
# DANGEROUS under set -u: crashes if _Class is unset
local _Class="$_Class"
# SAFE: provides a default
local _Class="${_Class:-boop}"
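The two forms can be compared in subshells with set -u enabled. The function names and the _DemoClass variable are hypothetical; stderr is discarded on the unsafe call only because the "unbound variable" message is the exact, expected error.

```bash
#!/usr/bin/env bash
# Hypothetical readers of a variable that may be unset.
unsafe_read() { local __copy="$_DemoClass";         printf '%s\n' "$__copy"; }
safe_read()   { local __copy="${_DemoClass:-boop}"; printf '%s\n' "$__copy"; }

unset _DemoClass
unsafe_status=0
( set -u; unsafe_read ) 2>/dev/null || unsafe_status=$?   # subshell aborts
safe_out=$( set -u; safe_read )                           # default applies

printf 'unsafe status: %s\n' "$unsafe_status"
printf 'safe output:   %s\n' "$safe_out"
```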
Framework Must Not Alter User Environment
To be explicit: after . boop returns, the user’s IFS, shell options,
shopt settings, and trap state must be exactly as they were before.
local scoping handles IFS. Shell options should never be changed by
framework code at the global level. If a future need arises to
temporarily change an option, use a subshell or save/restore — but
prefer redesigning to avoid the need.
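If a save/restore is ever unavoidable, it might look like the following sketch. demo_iequals is a hypothetical helper (not a boop function) that needs nocasematch for one comparison; every step is written to be errexit-safe.

```bash
#!/usr/bin/env bash
# demo_iequals is hypothetical: case-insensitive equality that leaves
# the caller's nocasematch setting exactly as it found it.
demo_iequals() {
    local __demo_iequals_was_set=0 __demo_iequals_status=1
    if shopt -q nocasematch; then __demo_iequals_was_set=1; fi
    shopt -s nocasematch
    if [[ "$1" == "$2" ]]; then __demo_iequals_status=0; fi
    if (( ! __demo_iequals_was_set )); then shopt -u nocasematch; fi
    return "$__demo_iequals_status"
}

if demo_iequals "BOOP" "boop"; then printf '%s\n' "match"; fi
if shopt -q nocasematch; then printf '%s\n' "leaked"; else printf '%s\n' "restored"; fi
```

The restore runs on the single exit path of the function; a design with early returns would need the restore on each of them, which is one reason to prefer avoiding the option change entirely.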
Error Handling
The Two Error Paths: _Crash vs _Error + return 1
Every error in the framework falls into one of two categories with distinct handling:
_Crash — always fatal, regardless of _FatalLevel. Reserved for:
- Shell injection / invalid identifiers (security boundary)
- Framework internal corruption (registry inconsistent, dispatch failure)
- Class/mixin declaration errors at load time (boopClass, boopMixin)
- Version constraint failures (_Require, boop version guard)
- Abstract method stubs called on non-implementing subclasses
- Invalid framework API misuse (_Super with no parent, _Cast with no class)
_Error + return 1 — recoverable. Callers check $? and handle it.
Used for all runtime data conditions: bad input, missing files, empty
collections, invalid arguments to user-facing methods. With the default
_FatalLevel crash, _Error logs the message and continues; with
_FatalLevel error, it escalates to fatal. Either way, the return 1
ensures the function exits with a failure code so callers can check.
# Framework corruption → _Crash (security/integrity boundary)
[[ "$name" =~ $__boop_validate_pat ]] || _Crash "Invalid identifier: '$name'"
# Runtime data condition → _Error + return 1 (recoverable)
[[ -n "$file" ]] || { _Error "Config.load: file path required"; return 1; }
[[ -f "$file" ]] || { _Error "Config.load: file not found: $file"; return 1; }
# In a case arm:
*) _Error "Signal.strict: expected 0/off or 1/on, got '$1'"; return 1 ;;
Do not silently return empty strings or success codes for invalid input.
Do not use _Crash for conditions the caller can reasonably handle.
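From the caller's side, the recoverable path looks like this. The _Error defined here is a minimal stand-in that only prints to stderr (an assumption; the real one also honors _FatalLevel), and Demo.load is a hypothetical method.

```bash
#!/usr/bin/env bash
# _Error here is a minimal stand-in for the framework's logger (assumption).
_Error() { printf 'ERROR: %s\n' "$*" >&2; }

# Hypothetical loader showing the recoverable path.
Demo.load() {
    local __Demo_load_file="${1:-}"
    [[ -n "$__Demo_load_file" ]] || { _Error "Demo.load: file path required"; return 1; }
    [[ -f "$__Demo_load_file" ]] || { _Error "Demo.load: file not found: $__Demo_load_file"; return 1; }
    printf 'loaded %s\n' "$__Demo_load_file"
}

# The caller checks the status and picks its own recovery.
if ! Demo.load "/no/such/file.conf"; then
    printf '%s\n' "falling back to defaults"
fi
```

Because the call sits in an if condition, the failure is also safe under set -e.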
Tier-Appropriate Validation
- Tier 1 (private): minimal validation. Callers are trusted. Should consider context; lazy private code that creates public code should validate appropriately!
- Tier 2 (semi-private): validate inputs, use _Error + return 1 for runtime data conditions; _Crash for security/framework violations.
- Tier 3 (public): validate everything. Error messages should say what was wrong AND suggest the correct usage.
# Tier 3 error message -- helpful
_Error "Math.add: invalid number '${input:-}' -- expected a numeric value like '3.14' or '-42'"
return 1
2>/dev/null Policy
Only suppress stderr when ALL of these are true:
- You know exactly what error will be produced
- You are expecting that specific error
- The error content has no debugging value
Every 2>/dev/null in the codebase should be reviewable against
these criteria. If it doesn’t pass all three, remove it.
Class File Structure
Every class file follows this structure:
#!/bin/bash
# ClassName — one-line description
#
# Longer description if needed. Explain what the class does, what
# it's for, and any important design decisions.
# Load guard — skip if already registered
# NOTE: This pattern is under review for refactoring. The 2>/dev/null
# suppresses "return outside function" when the file is executed
# directly instead of sourced, which is a debugging hazard under
# set -e. A boop.init replacement is planned.
[[ -n "${__boop_registry[ClassName]+set}" ]] && return 2>/dev/null
. boop [Dependencies]
# Class descriptor
__boop_registry["ClassName"]="..."
# Method implementations (each with a function header comment)
# Method registration
__boop.registerMethod ClassName method ClassName.method
# Finalize
__boop.registerClass ClassName
API Shape
Primitives Inward, Wrappers Outward
When a class exposes the same operation over multiple input forms (a string, a file path, a stream), the reduced form is the primitive. Other entry points are thin wrappers that produce the reduced form and delegate.
For text parsing the reduced form is “lines on stdin.” loadFile reads
the file and pipes into the parser; fromString feeds the string in
via <<<; fromStdin is the parser itself. The parsing logic exists
exactly once.
The inverse (making the file variant the primitive and routing
in-memory data through mktemp, printf >, and rm to reuse it)
is forbidden. It pays for at least two forks (mktemp and rm), a disk
write and re-read, and a tmpfile leak window on _Crash, all to skip a
one-function refactor.
A while read; done < "$file" loop and a while read; done <<< "$str"
loop are the same loop — extract it.
The same shape applies elsewhere:
- Serializers: the in-memory form (toString) is the primitive; save <file> writes its output. Never save to a tmpfile then cat it back to stdout.
- Iteration: a callback/visitor primitive is the core; forEach, map, filter wrap it. Never reimplement the walk.
- Constructors: new (empty) is the primitive; fromString, fromFile, fromArray build empty then populate via public methods.
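For the text-parsing case, the shape can be sketched as follows. The Demo class and its key=value format are hypothetical; what matters is that the loop exists once and both wrappers only redirect input into it.

```bash
#!/usr/bin/env bash
# Hypothetical class "Demo"; only the shape matters.

# Primitive: parse "key=value" lines from stdin, skipping blanks and # comments.
Demo.fromStdin() {
    local __Demo_fromStdin_line
    while IFS= read -r __Demo_fromStdin_line || [[ -n "$__Demo_fromStdin_line" ]]; do
        if [[ -z "$__Demo_fromStdin_line" || "$__Demo_fromStdin_line" == \#* ]]; then
            continue
        fi
        printf '%s => %s\n' "${__Demo_fromStdin_line%%=*}" "${__Demo_fromStdin_line#*=}"
    done
}

# Wrappers: reduce to "lines on stdin" and delegate. No tmpfile, no second parser.
Demo.fromString() { Demo.fromStdin <<< "$1"; }
Demo.loadFile()   { Demo.fromStdin < "$1"; }

Demo.fromString $'name=boop\n# comment\n\nlevel=3'
```

The call prints "name => boop" and "level => 3"; Demo.loadFile produces the same output for a file with the same lines.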
Cost of an I/O Round-Trip
For reference, when judging whether to “just route through the existing function”:
| Operation | Approximate cost |
|---|---|
| mktemp | fork + syscalls |
| printf '%s' >file | open/write/close |
| done < file (re-read) | open/read/close |
| rm -f file | fork + unlink |
| Subshell $(...) | fork + pipe + wait |
Compare to extracting the loop body into a private helper: zero. The refactor is cheaper than one invocation of the wrong design.
Liskov Substitution Principle (LSP)
The Liskov Substitution Principle states: if class B inherits from class A, then objects of type B should be usable anywhere objects of type A are expected, without breaking the program’s correctness.
In boop, this means:
- Inherited methods must work correctly on subclass instances. If Box has a volume method, and Cube inherits from Box, then calling volume on a Cube must produce a correct result, even if Cube doesn’t override it.
- Overrides must honor the parent’s contract. If List.pop returns the last element and crashes on empty, then Stack.pop (which wraps a List) must do the same. A caller who expects List behavior shouldn’t be surprised by Stack behavior.
- Documented divergences are acceptable but must be explicit. When a class intentionally breaks substitutability (like Stream’s read method, which returns 0 on partial-read-at-EOF where bash’s raw read returns non-zero), that’s an “LSP divergence.” It’s a conscious design choice, not a bug. Document it in the class file and in the relevant docs with the phrase “LSP divergence” so it’s searchable and unambiguous.
When to Diverge
Diverge when the parent’s behavior is wrong for the use case and
honoring it would force every caller to work around it. Stream’s
read divergence exists because the raw read behavior (returning
non-zero on the last record) causes silent data loss in while loops.
Correctness wins over substitutability.
When you diverge, document:
- What the parent does
- What you do instead
- Why the divergence is correct for this context
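The raw read behavior behind the Stream example is easy to reproduce with input whose last record has no trailing newline. Both counters below are hypothetical helpers; count_kept shows one common workaround and is not a description of how Stream itself is implemented.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: count records arriving on stdin.

# Raw read returns non-zero on the final unterminated record,
# so the loop body never sees it.
count_raw() {
    local __count_raw_n=0 __count_raw_line
    while IFS= read -r __count_raw_line; do
        (( __count_raw_n += 1 ))
    done
    printf '%s\n' "$__count_raw_n"
}

# Treat "read failed but the buffer is non-empty" as one more record.
count_kept() {
    local __count_kept_n=0 __count_kept_line
    while IFS= read -r __count_kept_line || [[ -n "$__count_kept_line" ]]; do
        (( __count_kept_n += 1 ))
    done
    printf '%s\n' "$__count_kept_n"
}

printf 'a\nb\nc' | count_raw     # prints 2
printf 'a\nb\nc' | count_kept    # prints 3
```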
Test Files
All tests use the TestSuite class. Test files should be thorough, especially for infrastructure code (logging, dispatch, return system).
Naming
Test files are named test_<subject>_ts (the _ts suffix indicates
TestSuite-based tests). Benchmark and non-TestSuite files omit the
suffix (e.g., test_pi_growth, test_matrix).
Zero-Fork Where Possible
Prefer $(<file) (zero-fork builtin read) over $(command) subshell
capture in test helpers. Use bash -c only for tests that need
process isolation (crash tests, exit code tests).
Verbosity
Default output is quiet (failures + summary only). Full output is
available via TESTSUITE_VERBOSE=1. Tests should work correctly in
both modes.
Refactoring Policy
Sanitize on Sight
Every file touched for other work gets scanned for:
- Unlocalized variables that could inherit unexpected values
- Stale comments that no longer match the code
- $self/$class references (should be $_Self/$_Class)
- Missing function header comments
Fix these on the spot. Don’t create TODO items for them.
Don’t Break the Tests
All changes must pass the full test suite before committing. Currently 514 assertions across 6 TestSuite files.
Class Variant Conventions (::Simple and ::Fast)
Two naming conventions for class variants with specific optimization
axes. Not mandatory – not every class has, needs, or will ever need
one of these. The convention exists so a reader seeing Foo, Foo::Simple,
or Foo::Fast in the tree immediately understands what they’re looking
at without having to read the file for a first clue.
Class::Simple
A minimal-dependency variant for use inside the dependency graph of
core classes. Typically a subset of the full API, no dependencies on
other framework classes beyond the root boop, and focused on the one
or two operations that matter for the intended caller. Examples:
- Collection::Map::Simple – plain key-value hash with set/get/has/keys, no ordering, no delimiters, no Iterator.
- Config::Simple – flat key=value parser with # comments and blank-line skipping. No INI sections, no object wrapper, no round-trip.
Class::Fast
An optimized-hot-path variant. Already demonstrated by Collection::Map::Fast
(flat compound-key store, O(1) get/set, no insertion ordering). The
conventional signal is “this sacrifices features for speed.”
What the convention is NOT
- Not a mandate. Most classes will have neither variant.
- Not a symmetry requirement. A class can have ::Simple without ::Fast, or vice versa.
- Not the same as inheritance. The variants don’t share an inheritance chain – each implements what it needs directly.
Class Properties (static/instance)
Design decision (settled): No implicit fallback. Java/C#/Ruby model.
- $obj.prop reads the instance’s own value from __boop_static["${objId}.${prop}"]
- ClassName.prop reads the class value from __boop_static["${className}.${prop}"]
- These are independent. Setting one doesn’t affect the other.
- If a developer wants inherited defaults, they call $obj.inheritValueFor prop in their constructor – explicit, not magic.
Property values live in __boop_static. The descriptor is schema-only
(|class=X|parent=Y|methods=...|properties=...|). Get/set is a single
hash lookup – no regex parse, no encode/decode.
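The storage model can be sketched with a plain associative array. Here __demo_static stands in for __boop_static, and the three helper functions are hypothetical; the demo uses command substitution for brevity, which real framework code would avoid.

```bash
#!/usr/bin/env bash
# __demo_static stands in for __boop_static; helper names are hypothetical.
declare -A __demo_static=()

demo_prop_set() { __demo_static["$1.$2"]=$3; }
demo_prop_get() {
    local __demo_prop_get_key="$1.$2"
    printf '%s\n' "${__demo_static[$__demo_prop_get_key]-}"
}
# Explicit opt-in copy of a class value onto an instance.
demo_inherit_value_for() {   # args: objId className prop
    demo_prop_set "$1" "$3" "$(demo_prop_get "$2" "$3")"
}

demo_prop_set Box   color red      # class value
demo_prop_set _obj1 color blue     # instance value
printf 'class=%s obj1=%s obj2=[%s]\n' \
    "$(demo_prop_get Box color)" "$(demo_prop_get _obj1 color)" "$(demo_prop_get _obj2 color)"

demo_inherit_value_for _obj2 Box color
printf 'obj2 after explicit inherit=%s\n' "$(demo_prop_get _obj2 color)"
```

Before the explicit copy, _obj2 has no color even though its class does; that absence of fallback is the point of the design.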
Constructor Preamble: into= Guard
Any constructor (or method) that receives into= from its caller AND
calls other framework code internally must save and clear into at
the top to prevent leakage into subcalls:
MyClass.new() {
    local __MyClass_new_into="${into:-}"; into=''
    # ... internal work ...
    boop.pass "$_Self" ${__MyClass_new_into:+$__MyClass_new_into}
}
See docs/GOTCHAS.md “Environment Prefix Leakage” for the full
explanation of why this is necessary.
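A minimal reproduction of the leak, under stated assumptions: demo_pass is a simplified stand-in for boop.pass (named target or stdout only), and inner_label, leaky_new, and guarded_new are hypothetical functions.

```bash
#!/usr/bin/env bash
# All names are hypothetical; demo_pass is a simplified stand-in for boop.pass.
demo_pass() {
    if [[ -n "${2:-}" ]]; then
        printf -v "$2" '%s' "$1"      # deliver into the named variable
    else
        printf '%s\n' "$1"            # deliver on stdout
    fi
}

inner_label() { demo_pass "inner-label" ${into:-}; }

leaky_new() {
    local __leaky_new_label
    __leaky_new_label=$(inner_label)          # into= leaks in: nothing is printed
    demo_pass "obj:${__leaky_new_label}" ${into:-}
}

guarded_new() {
    local __guarded_new_into="${into:-}"; into=''
    local __guarded_new_label
    __guarded_new_label=$(inner_label)        # into is empty: value arrives on stdout
    demo_pass "obj:${__guarded_new_label}" ${__guarded_new_into:+$__guarded_new_into}
}

into=leaky_out   leaky_new
into=guarded_out guarded_new
printf 'leaky:   %s\nguarded: %s\n' "$leaky_out" "$guarded_out"
```

Without the guard, the inner call honors the caller's into= and its output never reaches the capture, so the leaky result is "obj:" while the guarded result is "obj:inner-label".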