firefly-import-preprocessor/AGENTS.md
2026-05-06 23:17:54 +02:00

# Firefly Import Preprocessor — Agent Instructions
A PHP 8.1+ CLI ETL tool that transforms bank CSV exports (UBS E-Banking) into a Firefly III-compatible format. See [README.md](README.md) for full documentation.
## Build & Test
```bash
composer test # PHPUnit tests
composer lint # phpcs PSR-12 check (src/ bin/)
composer lint-fix # phpcbf auto-fix
composer analyze # phpstan level 8
composer psalm # Psalm static analysis
```
### Test Suite Overview
129 tests across 7 test classes:
| File | Tests | Scope |
| ------ | -------: | ------- |
| `tests/ColumnTransformerTest.php` | 51 | All 14 transformation types, edge cases |
| `tests/ConfigurationLoaderTest.php` | 18 | JSON loading, dot-notation access, validation |
| `tests/CsvReaderTest.php` | 15 | CSV parsing, BOM handling, delimiter, encoding |
| `tests/MetadataExtractorTest.php` | 14 | Pre-header regex extraction, edge cases |
| `tests/ConfigIntegrationTest.php` | 1× per fixture | Golden-file integration tests (see below) |
| `tests/RowFilterTest.php` | 19 | skipIf conditions, all operators, nested AND/OR groups |
| `tests/FireflyImporterChunkStateTest.php` | 11 | Chunk state persistence, resume, reset |
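
The dot-notation access covered by `ConfigurationLoaderTest` can be pictured with a small sketch. The function name `getByPath()` is hypothetical and not taken from the project's `ConfigurationLoader`; it only illustrates walking a decoded JSON array along a dotted key path.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not the project's API): resolve a dotted path such as
 * "csvStructure.delimiter" against a decoded JSON config array.
 *
 * @param array<string, mixed> $config
 */
function getByPath(array $config, string $path, mixed $default = null): mixed
{
    $current = $config;
    foreach (explode('.', $path) as $segment) {
        // Stop as soon as the path leaves the array structure.
        if (!is_array($current) || !array_key_exists($segment, $current)) {
            return $default;
        }
        $current = $current[$segment];
    }
    return $current;
}

// Illustrative config fragment using keys named in this document.
$config = ['csvStructure' => ['delimiter' => ';', 'hasBom' => true]];
```

Returning a default instead of throwing keeps optional keys cheap to read; whether the real loader does the same is not stated here.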
### Integration Tests (Golden-File Pattern)
`ConfigIntegrationTest` auto-discovers every subdirectory in `tests/fixtures/` and runs a full transform pipeline against it. For each fixture directory `tests/fixtures/<name>/`:
- `input.csv` — minimal representative CSV input
- `expected.csv` — exact expected output after transformation
- a matching `config/<name>.json` in the project-root `config/` directory (not inside the fixture directory)

**Currently active fixtures:** `config-ubs-account`

**Adding a new fixture:** create the directory, add `input.csv` and `expected.csv`, and ensure the matching `config/<name>.json` exists. No code changes are required; the provider discovers it automatically.

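
The auto-discovery described above could look roughly like the following. This is a sketch under assumptions: `discoverFixtures()` is a hypothetical name, and the actual provider in `ConfigIntegrationTest` may be structured differently.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch of a data-provider body: one entry per subdirectory
 * of the fixtures root, keyed by the fixture name.
 *
 * @return array<string, array{string, string, string}>
 */
function discoverFixtures(string $fixturesRoot, string $configDir): array
{
    $cases = [];
    // GLOB_ONLYDIR skips stray files that sit next to the fixture directories.
    $dirs = glob(rtrim($fixturesRoot, '/') . '/*', GLOB_ONLYDIR) ?: [];
    foreach ($dirs as $dir) {
        $name = basename($dir);
        $cases[$name] = [
            $dir . '/input.csv',
            $dir . '/expected.csv',
            rtrim($configDir, '/') . '/' . $name . '.json',
        ];
    }
    return $cases;
}
```

Keying the array by fixture name means PHPUnit reports a failing case by that name rather than by a numeric index.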
**Regenerating `expected.csv`** after a config change (replace `<name>` accordingly):
```bash
php -r "
require 'vendor/autoload.php';
use UbsCsvTransformer\ConfigurationLoader;
use UbsCsvTransformer\TransformerEngine;
\$tmpConfig = sys_get_temp_dir() . '/gen.json';
\$cfg = json_decode(file_get_contents('config/<name>.json'), true);
\$cfg['directories']['output'] = 'tests/fixtures/<name>';
\$cfg['csvStructure']['outputFilename'] = 'expected.csv';
file_put_contents(\$tmpConfig, json_encode(\$cfg, JSON_UNESCAPED_UNICODE));
\$loader = new ConfigurationLoader(\$tmpConfig); \$loader->load();
\$engine = new TransformerEngine(\$loader);
\$result = \$engine->transform('tests/fixtures/<name>/input.csv');
unlink(\$tmpConfig);
echo \$result['success'] ? 'OK' . PHP_EOL : 'ERROR: ' . \$result['error'] . PHP_EOL;
"
```
Run the tool:
```bash
php bin/transformer.php test input.csv config/config.json --rows=5
php bin/transformer.php transform input.csv config/config.json --output=output.csv
php bin/transformer.php validate config/config.json --strict
php bin/transformer.php auto-import config/config.json --watch
# Add --debug / -d for verbose output
```
## Architecture
```text
bin/transformer.php → TransformerEngine
    ├── ConfigurationLoader (JSON config)
    ├── CsvReader (reads + BOM handling)
    ├── MetadataExtractor (regex on pre-header lines)
    ├── ColumnTransformer (transformation pipeline)
    ├── CsvWriter (output CSV)
    └── FireflyImporter (optional, shells to Firefly CLI)
```
`DebugLogger` is a static helper used across all components; activated by the `--debug` flag.
`TransformerEngine` instantiates `CsvReader` per call (in `transform()`/`validate()`), not in the constructor.
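
The BOM handling attributed to `CsvReader` above is commonly done by stripping the three-byte UTF-8 byte order mark from the first line before the header is parsed. A minimal sketch, assuming UTF-8 input; `stripUtf8Bom()` is a hypothetical name, and the project's reader may handle this differently (for example, driven by the `hasBom` config key).

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: remove a leading UTF-8 BOM (bytes EF BB BF) if present.
 * Lines without a BOM are returned unchanged.
 */
function stripUtf8Bom(string $line): string
{
    $bom = "\xEF\xBB\xBF";
    return str_starts_with($line, $bom) ? substr($line, strlen($bom)) : $line;
}
```

Without this step, the first header name would carry three invisible bytes and would no longer match a `sourceColumn` value by string comparison.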
## Conventions
- **PSR-12** enforced via phpcs using `phpcs.xml` (auto-discovered at root). Line length: soft 120, hard 150 chars.
- **PHPStan level 8** with `checkMissingCallableSignature: true`. `phpstan-baseline.neon` is empty — do not add suppressions without good reason.
- **All source comments and docblocks are written in English.**
- **Documentation language:** `README.md` is the primary documentation in **English**. `README.de.md` is the German translation. Both cross-link to each other at the top.
- **`showHelp()` in `bin/transformer.php`** is locale-aware: English is the default; German is shown when `isGermanLocale()` returns `true` (checks `LANG`, `LC_ALL`, `LC_MESSAGES`, `LANGUAGE` env vars for a `de` prefix).
- **License:** GPL-3.0.
- Namespace `UbsCsvTransformer\` (PSR-4 → `src/`); tests use `UbsCsvTransformer\Tests\` (→ `tests/`).
- No runtime package dependencies — only `ext-json` and `ext-mbstring`.
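
The locale check behind `showHelp()` can be sketched as follows. This is an illustration only: the function below takes the environment as an argument so it can be tested, and the real `isGermanLocale()` may differ in precedence or case handling.

```php
<?php

declare(strict_types=1);

/**
 * Sketch only: returns true when any of the four locale variables starts
 * with "de" (for example "de_CH.UTF-8").
 *
 * @param array<string, string> $env Environment map, injectable for tests.
 */
function isGermanLocaleSketch(array $env): bool
{
    foreach (['LANG', 'LC_ALL', 'LC_MESSAGES', 'LANGUAGE'] as $name) {
        $value = $env[$name] ?? '';
        // Case-insensitive prefix check; this normalization is an assumption.
        if (str_starts_with(strtolower($value), 'de')) {
            return true;
        }
    }
    return false;
}
```

In a CLI script it could be called as `isGermanLocaleSketch(getenv())`, since `getenv()` without arguments returns all environment variables as an array.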
## Config Format
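See [config/config.example.json](config/config.example.json) for a full reference. The three main sections are listed below; other keys exist as well, such as `directories.output`, which the regeneration snippet above overrides: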
- **`metadata.extractionRules`** — regex rules against 1-based pre-header line numbers
- **`csvStructure`** — `headerLine`, `delimiter`, `encoding`, `hasBom`
- **`columnTransformations`** — array of per-column transformation pipelines
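
To make the idea of a per-column transformation pipeline concrete, here is a small sketch that applies an ordered list of steps to one cell value. The step keys (`type`, `length`) and the function `applySteps()` are hypothetical and do not reflect the project's config schema; only the type names are taken from this document. The `mb_*` calls rely on `ext-mbstring`, which the project already requires.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch: apply steps in order, feeding each result into the next.
 *
 * @param list<array{type: string, length?: int}> $steps
 */
function applySteps(string $value, array $steps): string
{
    foreach ($steps as $step) {
        $value = match ($step['type']) {
            'trim' => trim($value),
            'uppercase' => mb_strtoupper($value),
            'lowercase' => mb_strtolower($value),
            // Without a length, truncate leaves the value as it is.
            'truncate' => mb_substr($value, 0, $step['length'] ?? mb_strlen($value)),
            default => throw new InvalidArgumentException('Unknown step: ' . $step['type']),
        };
    }
    return $value;
}
```

Order matters in such a chain: truncating before trimming would count leading whitespace toward the length limit.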
### Key patterns in config
- `"sourceColumn": "_constant_"` — injects an extracted metadata value (e.g. IBAN) as a new output column without reading a CSV column
- `"outputAction": "create"` vs `"overwrite"` — controls whether the result is a new column or replaces an existing one
- `MetadataExtractor` uses 1-based `lineNumber` in config; it converts to 0-based array index internally

Supported transformation types: `map`, `replace`, `regex`, `regexextract`, `dateformat`, `split`, `trim`, `uppercase`, `lowercase`, `ucwordsfirst`, `truncate`, `constantvalue`, `pipeline`, `timeperiod`
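
The 1-based `lineNumber` convention noted under the key patterns can be illustrated with a short sketch. The function `extractFromLine()` and the sample lines are made up for illustration; the real `MetadataExtractor` and the exact rule shape may differ.

```php
<?php

declare(strict_types=1);

/**
 * Sketch of a 1-based line lookup against the pre-header lines of a CSV file.
 *
 * @param list<string> $preHeaderLines
 */
function extractFromLine(array $preHeaderLines, int $lineNumber, string $pattern): ?string
{
    $index = $lineNumber - 1; // config is 1-based, PHP arrays are 0-based
    if (!isset($preHeaderLines[$index])) {
        return null; // line 0 or a line past the end yields no value
    }
    if (preg_match($pattern, $preHeaderLines[$index], $matches) === 1) {
        // Prefer the first capture group, fall back to the whole match.
        return $matches[1] ?? $matches[0];
    }
    return null;
}

// Made-up sample lines, not taken from a real export.
$lines = ['Account number:;0000-00000000.0', 'IBAN:;CH00 0000 0000 0000 0000 0'];
```

A value extracted this way is what a `"sourceColumn": "_constant_"` entry would then inject into the output.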