# Firefly Import Preprocessor — Agent Instructions PHP 8.1+ CLI ETL tool that transforms bank CSV exports (UBS E-Banking) into Firefly III-compatible format. See [README.md](README.md) for full documentation. ## Build & Test ```bash composer test # PHPUnit tests composer lint # phpcs PSR-12 check (src/ bin/) composer lint-fix # phpcbf auto-fix composer analyze # phpstan level 8 composer psalm # Psalm static analysis ``` ### Test Suite Overview 85 tests across 5 test classes: | File | Tests | Scope | |------|-------|-------| | `tests/ColumnTransformerTest.php` | 37 | All 13 transformation types, edge cases | | `tests/ConfigurationLoaderTest.php` | 18 | JSON loading, dot-notation access, validation | | `tests/CsvReaderTest.php` | 15 | CSV parsing, BOM handling, delimiter, encoding | | `tests/MetadataExtractorTest.php` | 14 | Pre-header regex extraction, edge cases | | `tests/ConfigIntegrationTest.php` | 1× per fixture | Golden-file integration tests (see below) | ### Integration Tests (Golden-File Pattern) `ConfigIntegrationTest` auto-discovers every subdirectory in `tests/fixtures/` and runs a full transform pipeline against it. For each fixture directory `tests/fixtures//`: - `input.csv` — minimal representative CSV input - `expected.csv` — exact expected output after transformation - `config/.json` must exist in the project root config dir **Currently active fixtures:** `config-ubs-account` **Adding a new fixture:** create the directory, add `input.csv` and `expected.csv`, ensure the matching `config/.json` exists. No code changes required — the provider discovers it automatically. **Regenerating `expected.csv`** after a config change (replace `` accordingly): ```bash php -r " require 'vendor/autoload.php'; use UbsCsvTransformer\ConfigurationLoader; use UbsCsvTransformer\TransformerEngine; \$tmpConfig = sys_get_temp_dir() . '/gen.json'; \$cfg = json_decode(file_get_contents('config/.json'), true); \$cfg['directories']['output'] = 'tests/fixtures/'; \$cfg['csvStructure']['outputFilename'] = 'expected.csv'; file_put_contents(\$tmpConfig, json_encode(\$cfg, JSON_UNESCAPED_UNICODE)); \$loader = new ConfigurationLoader(\$tmpConfig); \$loader->load(); \$engine = new TransformerEngine(\$loader); \$result = \$engine->transform('tests/fixtures//input.csv'); unlink(\$tmpConfig); echo \$result['success'] ? 'OK' . PHP_EOL : 'ERROR: ' . \$result['error'] . PHP_EOL; " ``` Run the tool: ```bash php bin/transformer.php test input.csv config/config.json --rows=5 php bin/transformer.php transform input.csv config/config.json --output=output.csv php bin/transformer.php validate config/config.json --strict php bin/transformer.php auto-import config/config.json --watch # Add --debug / -d for verbose output ``` ## Architecture ```bash bin/transformer.php → TransformerEngine ├── ConfigurationLoader (JSON config) ├── CsvReader (reads + BOM handling) ├── MetadataExtractor (regex on pre-header lines) ├── ColumnTransformer (transformation pipeline) ├── CsvWriter (output CSV) └── FireflyImporter (optional, shells to Firefly CLI) ``` `DebugLogger` is a static helper used across all components; activated by the `--debug` flag. `TransformerEngine` instantiates `CsvReader` per call (in `transform()`/`validate()`), not in the constructor. ## Conventions - **PSR-12** enforced via phpcs using `phpcs.xml` (auto-discovered at root). Line length: soft 120, hard 150 chars. - **PHPStan level 8** with `checkMissingCallableSignature: true`. `phpstan-baseline.neon` is empty — do not add suppressions without good reason. - **All source comments and docblocks are written in German.** - Namespace `UbsCsvTransformer\` (PSR-4 → `src/`); tests use `UbsCsvTransformer\Tests\` (→ `tests/`). - No runtime package dependencies — only `ext-json` and `ext-mbstring`. ## Config Format See [config/config.example.json](config/config.example.json) for a full reference. Three top-level sections: - **`metadata.extractionRules`** — regex rules against 1-based pre-header line numbers - **`csvStructure`** — `headerLine`, `delimiter`, `encoding`, `hasBom` - **`columnTransformations`** — array of per-column transformation pipelines ### Key patterns in config - `"sourceColumn": "_constant_"` — injects an extracted metadata value (e.g. IBAN) as a new output column without reading a CSV column - `"outputAction": "create"` vs `"overwrite"` — controls whether the result is a new column or replaces an existing one - `MetadataExtractor` uses 1-based `lineNumber` in config; it converts to 0-based array index internally Supported transformation types: `map`, `replace`, `regex`, `regexextract`, `dateformat`, `split`, `trim`, `uppercase`, `lowercase`, `ucwordsfirst`, `truncate`, `constantvalue`, `pipeline`