firefly-import-preprocessor/README.md
2026-05-06 23:17:54 +02:00

985 lines
26 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Firefly Import Preprocessor
**Version:** 1.0.0
**Date:** 03 May 2026
**Status:** Production Ready
🌐 [Deutsch](README.de.md)
---
## Table of Contents
1. [Overview](#overview)
2. [Installation & Setup](#installation--setup)
3. [Quick Start](#quick-start)
4. [Configuration](#configuration)
5. [Transformation Types](#transformation-types)
6. [CLI Reference](#cli-reference)
7. [Debug Mode](#debug-mode)
8. [Firefly III Integration](#firefly-iii-integration)
9. [Architecture](#architecture)
10. [Error Handling](#error-handling)
---
## Overview
The **Firefly Import Preprocessor** is a production-ready PHP preprocessor for bank CSV export files. It transforms bank data into a standardised format and can optionally import it into Firefly III.
### Core Features
**Full CSV transformation** with complex pipelines
**Metadata extraction** via regex (IBAN, currency, account name)
**14 transformation types** for flexible data processing
**Firefly III integration** — CLI, Docker, and HTTP upload
**Debug mode** for full processing transparency
**Production ready** with complete error handling
**Zero dependencies** for core functionality
### Workflow
```text
Input CSV
Extract metadata (regex)
Transform data rows (pipeline)
Write output CSV
[Optional] Import into Firefly III
```
---
## Installation & Setup
### Requirements
- PHP 8.1+
- Composer (recommended)
- [Optional] Docker for Firefly III integration
### Installation
```bash
# 1. Clone / copy the repository
cd ff-imp-preprocessor
# 2. Install dependencies (optional, dev tools only)
composer install
# 3. Create configuration
cp config/config.example.json config/config.json
# Edit config/config.json with your settings
# 4. Create directories
mkdir -p config/import/{source,output,archive,error}
chmod 755 config/import/{source,output,archive,error}
# 5. Run a test
php bin/transformer.php validate config/config.json input.csv
```
---
## Quick Start
### 1. Adjust configuration
Edit `config/config.json` and make sure the extraction rules match your CSV format:
```json
{
"metadata": {
"extractionRules": [
{
"name": "account_iban",
"lineNumber": 2,
"regex": "IBAN:\\s*([A-Z0-9 ]+)",
"captureGroup": 1
}
]
},
"csvStructure": {
"headerLine": 5,
"delimiter": ";",
"encoding": "UTF-8"
}
}
```
### 2. Validate CSV
```bash
php bin/transformer.php validate config/config.json input.csv
```
### 3. Run transformation
```bash
php bin/transformer.php transform input.csv config/config.json
# With debug mode for troubleshooting
php bin/transformer.php transform input.csv config/config.json --debug
```
### 4. Inspect output
```bash
php bin/transformer.php test input.csv config/config.json --debug
# Shows up to 10 transformed rows and debug logs
```
---
## Configuration
### config.json structure
#### `metadata` — Metadata extraction
```json
{
"metadata": {
"extractionRules": [
{
"name": "account_iban",
"lineNumber": 2,
"regex": "IBAN:\\s*([A-Z0-9 ]+)",
"captureGroup": 1
},
{
"name": "currency_code",
"lineNumber": 3,
"regex": "Currency:\\s*([A-Z]{3})",
"captureGroup": 1
}
]
}
}
```
| Field | Type | Description |
| ------ | ----- | ----------- |
| `name` | string | Name of the metadata variable (used in `constantvalue`) |
| `lineNumber` | int | Line number in CSV (1-based, human-readable) |
| `regex` | string | Regex pattern for extraction (without delimiters) |
| `captureGroup` | int | Capture group index (0 = full match, 1 = first group, etc.) |
**Regex example:**
- Pattern: `IBAN:\s*([A-Z0-9 ]+)`
- Input: `IBAN: CH93 0077 2020 6262 5252 7`
- Capture group 1: `CH93 0077 2020 6262 5252 7`
#### `csvStructure` — CSV format
```json
{
"csvStructure": {
"headerLine": 5,
"delimiter": ";",
"encoding": "UTF-8",
"hasBom": false
}
}
```
| Field | Type | Default | Description |
| ------ | ----- | ------- | ----------- |
| `headerLine` | int | 5 | Line number of the header row (1-based) |
| `delimiter` | string | `;` | CSV delimiter |
| `encoding` | string | `UTF-8` | Character encoding (UTF-8, ISO-8859-1, CP1252) |
| `hasBom` | bool | false | Whether the file has a BOM (Byte Order Mark) |
#### `columnTransformations` — Column transformations
```json
{
"columnTransformations": [
{
"sourceColumn": "BookingDate",
"transformations": [
{
"type": "dateformat",
"fromFormat": "d.m.Y",
"toFormat": "Y-m-d"
}
],
"outputColumn": "date",
"outputAction": "overwrite"
}
]
}
```
**outputAction:**
| Value | Behaviour |
|---|---|
| `overwrite` | Replace the target column with the transformation result (default) |
| `create` | Write the result into a new output column |
| `append` | Concatenate the result to the end of the existing column value. Add `"appendDelimiter": " "` (any string) to insert a separator between the existing and new value — the delimiter is omitted when the target column is still empty |
| `append-if-not-empty` | Same as `append` (including optional `appendDelimiter`) but skips entirely when the transformation result is empty — safe for optional values such as tags or notes lines |
| `append-line` | Same as `append` but the separator is always a newline `\n`; no leading newline when the target is empty |
| `overwrite-if-empty` | Only write the result if the target column is currently empty |
| `overwrite-if-not-empty` | Only write the result if the transformation result is not empty |
#### `directories` — File system
```json
{
"directories": {
"source": "/opt/ff-imp-preprocessor/import/source",
"output": "/opt/ff-imp-preprocessor/import/output",
"archive": "/opt/ff-imp-preprocessor/import/archive",
"error": "/opt/ff-imp-preprocessor/import/error"
}
}
```
| Field | Description |
| ------ | ----------- |
| `source` | Input directory |
| `output` | Output directory |
| `archive` | Archive for processed files |
| `error` | Error directory for invalid files |
#### `fireflyImport` — Firefly III integration
Optional. When present, passing `--do-import` to the `transform` command (or using `auto-import`) will call the Firefly III Data Importer after the output CSV is written.
See [Firefly III Integration](#firefly-iii-integration) for the full field reference and mode-specific examples.
---
## Transformation Types
There are **14 supported transformation types** that can be combined as a pipeline:
### 1. **trim** — Remove whitespace
Removes leading and trailing whitespace.
```json
{ "type": "trim" }
```
- Input: ` Coop Pronto ` → Output: `Coop Pronto`
---
### 2. **lowercase** — Convert to lowercase
Converts to lowercase (UTF-8 safe).
```json
{ "type": "lowercase" }
```
- Input: `COOP PRONTO CHUR` → Output: `coop pronto chur`
---
### 3. **uppercase** — Convert to uppercase
Converts to uppercase (UTF-8 safe).
```json
{ "type": "uppercase" }
```
- Input: `Coop Pronto Chur` → Output: `COOP PRONTO CHUR`
---
### 4. **ucwordsfirst** — Capitalise after word separators
Capitalises the first letter after each word separator.
```json
{ "type": "ucwordsfirst" }
```
- `COOP PRONTO CHUR``Coop Pronto Chur`
- `migros-rail city``Migros-Rail City`
- `O'NEILL STORE``O'Neill Store`
- `SAINT-JEAN-DE-MAURIENNE``Saint-Jean-De-Maurienne`
Separators: space, hyphen, apostrophe, slash, period, comma, semicolon, colon, parentheses.
> **Guard:** If the input already contains *both* uppercase and lowercase letters (mixed-case), it is returned unchanged. This prevents accidentally re-casing intentionally formatted strings such as `"Coop pronto chur"`. Fully uppercase or fully lowercase inputs are always processed.
---
### 5. **replace** — String replacement
Replaces a substring with another string (case-sensitive).
```json
{ "type": "replace", "search": " ", "replace": " " }
```
- Input: `Coop Pronto` (two spaces) → Output: `Coop Pronto` (one space)
---
### 6. **split** — Split column
Splits a value at a delimiter and keeps a defined part.
```json
{ "type": "split", "delimiter": ";", "part": 0 }
```
- Input: `Coop Pronto Chur;7007 Chur` → Output: `Coop Pronto Chur`
---
### 7. **regex** — Regex replacement
Replaces parts of a string using a regular expression. Uses PHP `preg_replace`.
```json
{ "type": "regex", "pattern": "^(.*?);.*$", "replace": "$1" }
```
**No match → original value is passed through unchanged** (pipeline-safe).
Use capture groups as `$1`, `$2`, … in the `replace` field.
A pattern without `^`/`$` anchors replaces only the matched portion, not the whole value.
---
### 8. **regexextract** — Regex extraction
Extracts a capture group and returns **only that**. Uses PHP `preg_match`.
```json
{ "type": "regexextract", "pattern": "(\\d{4,} [^;]+)" }
```
- Input: `Coop Pronto Chur, 7007 Chur` → Output: `7007 Chur`
- No match → empty string
> **⚠ Not pipeline-safe:** A no-match discards all previous pipeline results. Use `regex` instead if you want to preserve the current value on no-match.
---
### 9. **dateformat** — Date reformatting
Converts between date formats.
```json
{ "type": "dateformat", "fromFormat": "d.m.Y", "toFormat": "Y-m-d" }
```
- Input: `10.12.2025` → Output: `2025-12-10`
Supports all PHP `DateTime` format characters.
---
### 10. **truncate** — Truncate string
Truncates a string to a maximum length.
```json
{ "type": "truncate", "maxLength": 100 }
```
---
### 11. **constantvalue** — Constant value from metadata
Injects an extracted metadata value as a constant for every row.
```json
{
"sourceColumn": "_constant_",
"transformations": [
{ "type": "constantvalue", "metadataKey": "account_iban" }
],
"outputColumn": "account_iban",
"outputAction": "create"
}
```
- Every row receives the extracted `account_iban` value (e.g. `CH9300777222666888999`) in a new column.
---
### 12. **map** — Copy / rename column
Copies a column value as-is (optionally to a new name).
```json
{ "type": "map" }
```
---
### 13. **pipeline** — Nested pipeline
Runs a sub-pipeline as a single transformation step.
```json
{
"type": "pipeline",
"steps": [
{ "type": "trim" },
{ "type": "lowercase" },
{ "type": "ucwordsfirst" }
]
}
```
Useful for grouping steps as a logical unit within a `transformations` array.
---
### 14. **timeperiod** — Map time to a period label
Parses a time string and returns the label of the matching period range.
Supports midnight-spanning ranges (e.g. 22:0003:59).
Returns `default` (empty string by default) when no range matches or the input is invalid.
```json
{
"type": "timeperiod",
"timeFormat": "H:i:s",
"periods": [
{ "from": "04:00:00", "to": "08:59:59", "label": "Morgen" },
{ "from": "09:00:00", "to": "10:59:59", "label": "Vormittag" },
{ "from": "11:00:00", "to": "13:59:59", "label": "Mittag" },
{ "from": "14:00:00", "to": "17:59:59", "label": "Nachmittag" },
{ "from": "18:00:00", "to": "21:59:59", "label": "Abend" },
{ "from": "22:00:00", "to": "03:59:59", "label": "Nacht" }
],
"default": ""
}
```
- `"09:30:00"``"Vormittag"`
- `"23:00:00"``"Nacht"` (midnight-spanning range)
- `"02:00:00"``"Nacht"` (midnight-spanning range)
- `""` or unparseable input → `""`
`timeFormat` follows PHP's `DateTime::createFromFormat` syntax (default `H:i:s`).
---
### Row filtering — `skipIf`
Rows can be excluded from the output by adding a top-level `skipIf` key to the configuration.
The value is a filter node — either a bare condition or a nested `and`/`or` group.
**Bare condition:**
```json
"skipIf": { "column": "Buchungstext", "operator": "equals", "value": "Saldovortrag" }
```
**AND group:**
```json
"skipIf": {
"and": [
{ "column": "Beschreibung1", "operator": "empty" },
{ "column": "Beschreibung2", "operator": "empty" }
]
}
```
**Nested AND / OR:**
```json
"skipIf": {
"or": [
{ "column": "Amount", "operator": "gt", "value": "10000" },
{
"and": [
{ "column": "Type", "operator": "equals", "value": "Saldo" },
{ "column": "Notes", "operator": "empty" }
]
}
]
}
```
**Supported operators:**
| Operator | Matches when… |
|---|---|
| `empty` | column value is empty string |
| `not-empty` | column value is not empty |
| `equals` | column value equals `"value"` |
| `not-equals` | column value does not equal `"value"` |
| `contains` | column value contains `"value"` |
| `not-contains` | column value does not contain `"value"` |
| `matches` | column value matches regex `"pattern"` |
| `not-matches` | column value does not match regex `"pattern"` |
| `gt` | `(float) column > (float) value` |
| `gte` | `(float) column >= (float) value` |
| `lt` | `(float) column < (float) value` |
| `lte` | `(float) column <= (float) value` |
---
### Pipeline example
Multiple transformations chained:
```json
{
"sourceColumn": "BookingText",
"transformations": [
{ "type": "trim" },
{ "type": "replace", "search": " ", "replace": " " },
{ "type": "lowercase" },
{ "type": "ucwordsfirst" }
],
"outputColumn": "description",
"outputAction": "overwrite"
}
```
**Processing:**
1. `" COOP PRONTO "` → trim → `"COOP PRONTO"`
2. `"COOP PRONTO"` → replace → `"COOP PRONTO"`
3. `"COOP PRONTO"` → lowercase → `"coop pronto"`
4. `"coop pronto"` → ucwordsfirst → `"Coop Pronto"`
---
## CLI Reference
```bash
php bin/transformer.php <command> [input] [config] [options]
```
### Commands
| Command | Description |
| ------- | ----------- |
| `test` | Test run (up to 10 rows) |
| `transform` | Full transformation |
| `validate` | Validate configuration |
| `auto-import` | Directory monitoring |
| `help` | Show help |
### Options
| Option | Description |
| ------ | ----------- |
| `--debug`, `-d` | Enable debug mode |
| `--rows=N` | Max. N rows (`test` command) |
| `--output=FILE`, `-o` | Output path |
| `--do-import` | Import into Firefly III after transformation (`transform` only) |
| `--strict` | Strict validation |
| `--watch` | Continuous monitoring |
| `--interval=SEC` | Check interval in seconds (default: 60) |
| `--dry-run` | Simulation mode, no real operations |
---
## Debug Mode
```bash
php bin/transformer.php test input.csv config/config.json --debug
```
### Log categories
| Category | When |
| -------- | ---- |
| `transformer` | Start/end of transformation |
| `csv_reader` | While reading CSV |
| `metadata` | During metadata extraction |
| `metadata_warning` | On extraction problems |
| `transformation` | For each transformation step |
| `csv_writer` | While writing output CSV |
### Debug log output (JSON)
```json
{
"success": true,
"debug_logs": [
{
"timestamp": 1702200120.5432,
"category": "transformer",
"message": "Transformation started",
"data": { "inputFile": "input.csv", "maxRows": 0 }
},
{
"timestamp": 1702200120.5445,
"category": "metadata",
"message": "Extraction rule applied",
"data": { "rule_name": "account_iban", "value": "CH93..." }
}
]
}
```
---
## Firefly III Integration
The transformer can automatically import transformed files into Firefly III.
Three operating modes cover all typical deployment scenarios.
### Prerequisites (all modes)
**1. Create a Firefly III Data Importer JSON configuration file**
This file maps transformed CSV columns to Firefly III transaction fields (format v3).
Recommended approach: upload a sample CSV once in the Firefly III Data Importer Web UI, configure the column mapping there, then download the finished configuration. Alternatively, use `config/firefly-import-config.example.json` as a template and adjust `default_account` to your asset account ID.
**2. Choose an operating mode** — see sections below.
---
### `fireflyImport` field reference
| Field | Type | Description |
| --- | --- | --- |
| `mode` | string | Operating mode: `cli` \| `docker` \| `http` (default: `cli`) |
| `jsonConfig` | string | Path to the Firefly III Data Importer JSON config file (format v3). For `cli` and `http` modes the file must exist locally; relative paths are resolved from the **working directory** where `php bin/transformer.php` is invoked (typically the project root). For `docker` mode the path is **inside the container** — local existence is not checked. |
| `importerCommand` | string | Full CLI command *(modes: cli, docker)* |
| `importerUrl` | string | URL of the Data Importer *(mode: http)* |
| `personalSecret` | string | The `AUTO_IMPORT_SECRET` set on the importer server (min. 16 chars). Sent as `?secret=` URL query parameter. *(mode: http)* |
| `accessToken` | string | Firefly III Personal Access Token. Sent as `Authorization: Bearer` header. Required if not already set as `FIREFLY_III_ACCESS_TOKEN` in the importer environment. *(mode: http)* |
| `deleteAfterImport` | boolean | Delete transformed CSV after successful import |
| `chunkSize` | integer | Split the CSV into chunks of at most N data rows and import each chunk as a separate request. Prevents server-side timeouts on large files (rule of thumb: ~34 s/transaction for HTTP mode). `0` or absent = no chunking (default). Applies to all modes. |
| `chunkRetries` | integer | Number of additional import attempts per chunk after the first. On failure the importer retries up to this many times before aborting. `0` or absent = no retry (default). Only effective when `chunkSize > 0`. |
| `chunkRetryDelay` | integer | Pause in seconds before each chunk request after the first, and between retry attempts for the same failed chunk. Addresses both inter-chunk cooldown and retry back-off. `0` or absent = no pause (default). Only effective when `chunkSize > 0`. |
| `connectionTimeout` | integer | Maximum seconds to wait for the TCP connection to the importer to be established. Distinct from `timeout` (full transfer duration). Default: `10`. *(mode: http only)* |
| `timeout` | integer | Timeout in seconds per request (default: 300). For chunked imports this applies per chunk, not for the total run. |
| `environment` | object | Additional environment variables *(modes: cli, docker)* |
---
### Mode `cli` — Transformer and Firefly on the same server
Both the transformer and the Firefly III Data Importer run on the same server. The transformer calls the importer directly as a local command.
```json
"fireflyImport": {
"mode": "cli",
"jsonConfig": "/opt/firefly-data-importer/storage/configurations/ubs-import.json",
"importerCommand": "php /opt/firefly-data-importer/artisan importer:import",
"deleteAfterImport": false,
"chunkSize": 50,
"chunkRetries": 3,
"chunkRetryDelay": 10,
"timeout": 300,
"environment": {
"FIREFLY_III_URL": "https://localhost",
"FIREFLY_III_ACCESS_TOKEN": "your-token-here"
}
}
```
---
### Mode `docker` — Transformer local, Firefly in Docker
The transformer runs locally or in its own container; the Firefly III Data Importer runs in a Docker container. The transformer calls the importer via `docker exec`.
**Important:** The transformer's output directory must be mounted as a volume in the importer container. `jsonConfig` is the path **inside the container** (not a local path). Do not use the `-it` flag (no TTY).
Example `docker-compose.yml` for the importer:
```yaml
services:
firefly-importer:
image: fireflyiii/data-importer:latest
volumes:
- /opt/ff-imp-preprocessor/import:/import
environment:
- FIREFLY_III_URL=https://your-firefly.com
- FIREFLY_III_ACCESS_TOKEN=your-token-here
- CAN_POST_FILES=false
```
```json
"fireflyImport": {
"mode": "docker",
"jsonConfig": "/import/configs/ubs-import.json",
"importerCommand": "docker exec firefly-importer php artisan importer:import",
"deleteAfterImport": false,
"chunkSize": 50,
"chunkRetries": 3,
"chunkRetryDelay": 10,
"timeout": 300
}
```
The JSON config file must be available inside the container — either via a volume mount or `docker cp`:
```bash
docker cp ubs-import.json firefly-importer:/import/configs/ubs-import.json
```
---
### Mode `http` — Transformer local, Firefly importer on a remote server
The transformer runs locally; the Firefly III Data Importer is reachable over HTTP(S). The CSV and JSON configuration are sent as a multipart HTTP upload to the importer.
**Requirements on the importer server:**
```text
CAN_POST_FILES=true
AUTO_IMPORT_SECRET=<secret> # at least 16 characters — set this as personalSecret in your config
```
**Local requirement:** PHP extension `ext-curl`
```json
"fireflyImport": {
"mode": "http",
"importerUrl": "https://importer.your-server.com",
"personalSecret": "your-auto-import-secret-min-16-chars",
"accessToken": "your-firefly-iii-personal-access-token",
"jsonConfig": "config/ubs-import.json",
"deleteAfterImport": false,
"chunkSize": 50,
"chunkRetries": 3,
"chunkRetryDelay": 10,
"connectionTimeout": 10,
"timeout": 300
}
```
The transformer sends a `POST` request to `{importerUrl}/autoupload?secret={personalSecret}` with the CSV and JSON config as multipart form fields. The `accessToken` is sent as `Authorization: Bearer`. If `FIREFLY_III_ACCESS_TOKEN` is already set in the importer's environment, `accessToken` can be omitted.
---
### Server-side tuning
For large imports the bottleneck is usually the Firefly III Data Importer server, not the transformer. The settings below belong in the importer's environment (`.env` or `docker-compose.yml`):
| Setting | Recommended value | Notes |
|---|---|---|
| `PHP_MEMORY_LIMIT` | `512M` `2048M` | Docker env var. Raise when PHP crashes with "Allowed memory size exhausted". |
| `CONNECTION_TIMEOUT` | `60` | Seconds to wait for TCP connect to Firefly III API. Default is ~31 s (π × 10). |
| `IGNORE_DUPLICATE_ERRORS` | `true` | Suppress duplicate-transaction warnings on repeated imports. |
**nginx reverse proxy** (if applicable):
```nginx
proxy_read_timeout 600s; # must exceed the longest single-chunk import time
client_max_body_size 64M; # must accommodate your largest chunk CSV
```
**Docker Compose** example:
```yaml
services:
firefly-importer:
environment:
- PHP_MEMORY_LIMIT=1024M
- CONNECTION_TIMEOUT=60
- IGNORE_DUPLICATE_ERRORS=true
```
---
### Usage
```bash
# Transform only (no import)
php bin/transformer.php transform input.csv config/config.json
# Transform and import into Firefly III
php bin/transformer.php transform input.csv config/config.json --do-import
# Watch mode: transform and import automatically for each new CSV in source directory
php bin/transformer.php auto-import config/config.json --watch
```
---
## Architecture
### Components
```text
bin/transformer.php (CLI entry point)
TransformerEngine (orchestration)
├─ ConfigurationLoader (load / validate config)
├─ CsvReader (read CSV)
├─ MetadataExtractor (metadata via regex)
├─ ColumnTransformer (apply transformations)
├─ CsvWriter (write CSV)
├─ FireflyImporter (Firefly III integration)
└─ DebugLogger (debug logs)
```
### Data flow
```text
Input CSV
CsvReader::readMetadataLines() → array of lines
MetadataExtractor::extract() → {iban: "...", currency: "..."}
CsvReader::readCsvData() → array of rows
ColumnTransformer::transformRow() → transformed row (pipeline)
CsvWriter::write() → output CSV
```
### Classes
| Class | Responsibility |
| ----- | -------------- |
| `TransformerEngine` | Orchestrates the entire workflow |
| `ConfigurationLoader` | Loads and validates JSON configuration |
| `CsvReader` | Reads CSV with metadata support |
| `MetadataExtractor` | Extracts metadata via regex |
| `ColumnTransformer` | Transforms columns (pipeline) |
| `CsvWriter` | Writes output CSV |
| `FireflyImporter` | Imports into Firefly III |
| `DebugLogger` | Static logger for debug output |
---
## Error Handling
### Common errors
#### "Input file not found"
```bash
# Check the file path
ls -la input.csv
# Use an absolute path if relative paths do not work
php bin/transformer.php transform /absolute/path/input.csv config.json
```
---
#### "Missing metadata: account_iban"
The IBAN could not be extracted — wrong regex or wrong line number.
```bash
# Inspect the first lines of the CSV
head -5 input.csv
# Validate with debug output
php bin/transformer.php validate config.json input.csv --debug
```
---
#### "Invalid JSON: …"
Syntax error in `config.json`.
```bash
php -r "json_decode(file_get_contents('config/config.json'), true) or die('JSON invalid');"
```
---
#### "Configuration: 'csvStructure.headerLine' required"
A required configuration field is missing.
```bash
diff config/config.json config/config.example.json
```
---
### Exception handling
```php
try {
$result = $engine->transform($inputFile);
if (!$result['success']) {
echo "Error: " . $result['error'];
}
} catch (Exception $e) {
echo "Fatal error: " . $e->getMessage();
}
```
---
## Tips
### UTF-8 handling
The transformer uses UTF-8 safe functions throughout:
- `mb_strtolower()` instead of `strtolower()`
- `mb_strtoupper()` instead of `strtoupper()`
- `mb_strlen()` for correct character counting
Supported encodings: UTF-8, ISO-8859-1, CP1252.
### Regex tips
**Pattern without delimiters (auto-wrapped):**
```json
"pattern": "IBAN:\\s*([A-Z0-9 ]+)"
// becomes: /IBAN:\s*([A-Z0-9 ]+)/u
```
**With explicit flags:**
```json
"pattern": "/IBAN:\\s*([A-Z0-9 ]+)/iu"
// case-insensitive
```
### Performance
- **Optimised for:** up to 1 million rows
- **Typical file size:** 10100 k rows
### Batch processing
```bash
#!/bin/bash
for file in import/source/*.csv; do
php bin/transformer.php transform "$file" config/config.json
if [ $? -eq 0 ]; then
mv "$file" import/archive/
else
mv "$file" import/error/
fi
done
```
---
## Version History
**v1.0.0 (03 May 2026)**
- ✅ Initial release
- ✅ 14 transformation types
- ✅ Metadata extraction via regex
- ✅ Debug mode
- ✅ Firefly III integration (cli / docker / http)
- ✅ Full documentation
---
**License:** GPL-3.0
**Author:** PHP CSV Transformer Project
**Repository:** [git.andare.ch/david.reindl/ff-imp-preprocessor](https://git.andare.ch/david.reindl/ff-imp-preprocessor)