985 lines
26 KiB
Markdown
985 lines
26 KiB
Markdown
# Firefly Import Preprocessor
|
||
|
||
**Version:** 1.0.0
|
||
**Date:** 03 May 2026
|
||
**Status:** Production Ready
|
||
|
||
🌐 [Deutsch](README.de.md)
|
||
|
||
---
|
||
|
||
## Table of Contents
|
||
|
||
1. [Overview](#overview)
|
||
2. [Installation & Setup](#installation--setup)
|
||
3. [Quick Start](#quick-start)
|
||
4. [Configuration](#configuration)
|
||
5. [Transformation Types](#transformation-types)
|
||
6. [CLI Reference](#cli-reference)
|
||
7. [Debug Mode](#debug-mode)
|
||
8. [Firefly III Integration](#firefly-iii-integration)
|
||
9. [Architecture](#architecture)
|
||
10. [Error Handling](#error-handling)
|
||
|
||
---
|
||
|
||
## Overview
|
||
|
||
The **Firefly Import Preprocessor** is a production-ready PHP preprocessor for bank CSV export files. It transforms bank data into a standardised format and can optionally import it into Firefly III.
|
||
|
||
### Core Features
|
||
|
||
✅ **Full CSV transformation** with complex pipelines
|
||
✅ **Metadata extraction** via regex (IBAN, currency, account name)
|
||
✅ **14 transformation types** for flexible data processing
|
||
✅ **Firefly III integration** — CLI, Docker, and HTTP upload
|
||
✅ **Debug mode** for full processing transparency
|
||
✅ **Production ready** with complete error handling
|
||
✅ **Zero dependencies** for core functionality
|
||
|
||
### Workflow
|
||
|
||
```text
|
||
Input CSV
|
||
↓
|
||
Extract metadata (regex)
|
||
↓
|
||
Transform data rows (pipeline)
|
||
↓
|
||
Write output CSV
|
||
↓
|
||
[Optional] Import into Firefly III
|
||
```
|
||
|
||
---
|
||
|
||
## Installation & Setup
|
||
|
||
### Requirements
|
||
|
||
- PHP 8.1+
|
||
- Composer (recommended)
|
||
- [Optional] Docker for Firefly III integration
|
||
|
||
### Installation
|
||
|
||
```bash
|
||
# 1. Clone / copy the repository
|
||
cd ff-imp-preprocessor
|
||
|
||
# 2. Install dependencies (optional, dev tools only)
|
||
composer install
|
||
|
||
# 3. Create configuration
|
||
cp config/config.example.json config/config.json
|
||
# Edit config/config.json with your settings
|
||
|
||
# 4. Create directories
|
||
mkdir -p config/import/{source,output,archive,error}
|
||
chmod 755 config/import/{source,output,archive,error}
|
||
|
||
# 5. Run a test
|
||
php bin/transformer.php validate config/config.json input.csv
|
||
```
|
||
|
||
---
|
||
|
||
## Quick Start
|
||
|
||
### 1. Adjust configuration
|
||
|
||
Edit `config/config.json` and make sure the extraction rules match your CSV format:
|
||
|
||
```json
|
||
{
|
||
"metadata": {
|
||
"extractionRules": [
|
||
{
|
||
"name": "account_iban",
|
||
"lineNumber": 2,
|
||
"regex": "IBAN:\\s*([A-Z0-9 ]+)",
|
||
"captureGroup": 1
|
||
}
|
||
]
|
||
},
|
||
"csvStructure": {
|
||
"headerLine": 5,
|
||
"delimiter": ";",
|
||
"encoding": "UTF-8"
|
||
}
|
||
}
|
||
```
|
||
|
||
### 2. Validate CSV
|
||
|
||
```bash
|
||
php bin/transformer.php validate config/config.json input.csv
|
||
```
|
||
|
||
### 3. Run transformation
|
||
|
||
```bash
|
||
php bin/transformer.php transform input.csv config/config.json
|
||
|
||
# With debug mode for troubleshooting
|
||
php bin/transformer.php transform input.csv config/config.json --debug
|
||
```
|
||
|
||
### 4. Inspect output
|
||
|
||
```bash
|
||
php bin/transformer.php test input.csv config/config.json --debug
|
||
# Shows up to 10 transformed rows and debug logs
|
||
```
|
||
|
||
---
|
||
|
||
## Configuration
|
||
|
||
### config.json structure
|
||
|
||
#### `metadata` — Metadata extraction
|
||
|
||
```json
|
||
{
|
||
"metadata": {
|
||
"extractionRules": [
|
||
{
|
||
"name": "account_iban",
|
||
"lineNumber": 2,
|
||
"regex": "IBAN:\\s*([A-Z0-9 ]+)",
|
||
"captureGroup": 1
|
||
},
|
||
{
|
||
"name": "currency_code",
|
||
"lineNumber": 3,
|
||
"regex": "Currency:\\s*([A-Z]{3})",
|
||
"captureGroup": 1
|
||
}
|
||
]
|
||
}
|
||
}
|
||
```
|
||
|
||
| Field | Type | Description |
|
||
| ------ | ----- | ----------- |
|
||
| `name` | string | Name of the metadata variable (used in `constantvalue`) |
|
||
| `lineNumber` | int | Line number in CSV (1-based, human-readable) |
|
||
| `regex` | string | Regex pattern for extraction (without delimiters) |
|
||
| `captureGroup` | int | Capture group index (0 = full match, 1 = first group, etc.) |
|
||
|
||
**Regex example:**
|
||
|
||
- Pattern: `IBAN:\s*([A-Z0-9 ]+)`
|
||
- Input: `IBAN: CH93 0077 2020 6262 5252 7`
|
||
- Capture group 1: `CH93 0077 2020 6262 5252 7`
|
||
|
||
#### `csvStructure` — CSV format
|
||
|
||
```json
|
||
{
|
||
"csvStructure": {
|
||
"headerLine": 5,
|
||
"delimiter": ";",
|
||
"encoding": "UTF-8",
|
||
"hasBom": false
|
||
}
|
||
}
|
||
```
|
||
|
||
| Field | Type | Default | Description |
|
||
| ------ | ----- | ------- | ----------- |
|
||
| `headerLine` | int | 5 | Line number of the header row (1-based) |
|
||
| `delimiter` | string | `;` | CSV delimiter |
|
||
| `encoding` | string | `UTF-8` | Character encoding (UTF-8, ISO-8859-1, CP1252) |
|
||
| `hasBom` | bool | false | Whether the file has a BOM (Byte Order Mark) |
|
||
|
||
#### `columnTransformations` — Column transformations
|
||
|
||
```json
|
||
{
|
||
"columnTransformations": [
|
||
{
|
||
"sourceColumn": "BookingDate",
|
||
"transformations": [
|
||
{
|
||
"type": "dateformat",
|
||
"fromFormat": "d.m.Y",
|
||
"toFormat": "Y-m-d"
|
||
}
|
||
],
|
||
"outputColumn": "date",
|
||
"outputAction": "overwrite"
|
||
}
|
||
]
|
||
}
|
||
```
|
||
|
||
**outputAction:**
|
||
|
||
| Value | Behaviour |
|
||
|---|---|
|
||
| `overwrite` | Replace the target column with the transformation result (default) |
|
||
| `create` | Write the result into a new output column |
|
||
| `append` | Concatenate the result to the end of the existing column value. Add `"appendDelimiter": " "` (any string) to insert a separator between the existing and new value — the delimiter is omitted when the target column is still empty |
|
||
| `append-if-not-empty` | Same as `append` (including optional `appendDelimiter`) but skips entirely when the transformation result is empty — safe for optional values such as tags or notes lines |
|
||
| `append-line` | Same as `append` but the separator is always a newline `\n`; no leading newline when the target is empty |
|
||
| `overwrite-if-empty` | Only write the result if the target column is currently empty |
|
||
| `overwrite-if-not-empty` | Only write the result if the transformation result is not empty |
|
||
|
||
#### `directories` — File system
|
||
|
||
```json
|
||
{
|
||
"directories": {
|
||
"source": "/opt/ff-imp-preprocessor/import/source",
|
||
"output": "/opt/ff-imp-preprocessor/import/output",
|
||
"archive": "/opt/ff-imp-preprocessor/import/archive",
|
||
"error": "/opt/ff-imp-preprocessor/import/error"
|
||
}
|
||
}
|
||
```
|
||
|
||
| Field | Description |
|
||
| ------ | ----------- |
|
||
| `source` | Input directory |
|
||
| `output` | Output directory |
|
||
| `archive` | Archive for processed files |
|
||
| `error` | Error directory for invalid files |
|
||
|
||
#### `fireflyImport` — Firefly III integration
|
||
|
||
Optional. When present, passing `--do-import` to the `transform` command (or using `auto-import`) will call the Firefly III Data Importer after the output CSV is written.
|
||
|
||
See [Firefly III Integration](#firefly-iii-integration) for the full field reference and mode-specific examples.
|
||
|
||
---
|
||
|
||
## Transformation Types
|
||
|
||
There are **14 supported transformation types** that can be combined as a pipeline:
|
||
|
||
### 1. **trim** — Remove whitespace
|
||
|
||
Removes leading and trailing whitespace.
|
||
|
||
```json
|
||
{ "type": "trim" }
|
||
```
|
||
|
||
- Input: ` Coop Pronto ` → Output: `Coop Pronto`
|
||
|
||
---
|
||
|
||
### 2. **lowercase** — Convert to lowercase
|
||
|
||
Converts to lowercase (UTF-8 safe).
|
||
|
||
```json
|
||
{ "type": "lowercase" }
|
||
```
|
||
|
||
- Input: `COOP PRONTO CHUR` → Output: `coop pronto chur`
|
||
|
||
---
|
||
|
||
### 3. **uppercase** — Convert to uppercase
|
||
|
||
Converts to uppercase (UTF-8 safe).
|
||
|
||
```json
|
||
{ "type": "uppercase" }
|
||
```
|
||
|
||
- Input: `Coop Pronto Chur` → Output: `COOP PRONTO CHUR`
|
||
|
||
---
|
||
|
||
### 4. **ucwordsfirst** — Capitalise after word separators
|
||
|
||
Capitalises the first letter after each word separator.
|
||
|
||
```json
|
||
{ "type": "ucwordsfirst" }
|
||
```
|
||
|
||
- `COOP PRONTO CHUR` → `Coop Pronto Chur`
|
||
- `migros-rail city` → `Migros-Rail City`
|
||
- `O'NEILL STORE` → `O'Neill Store`
|
||
- `SAINT-JEAN-DE-MAURIENNE` → `Saint-Jean-De-Maurienne`
|
||
|
||
Separators: space, hyphen, apostrophe, slash, period, comma, semicolon, colon, parentheses.
|
||
|
||
> **Guard:** If the input already contains *both* uppercase and lowercase letters (mixed-case), it is returned unchanged. This prevents accidentally re-casing intentionally formatted strings such as `"Coop pronto chur"`. Fully uppercase or fully lowercase inputs are always processed.
|
||
|
||
---
|
||
|
||
### 5. **replace** — String replacement
|
||
|
||
Replaces a substring with another string (case-sensitive).
|
||
|
||
```json
|
||
{ "type": "replace", "search": " ", "replace": " " }
|
||
```
|
||
|
||
- Input: `Coop Pronto` (two spaces) → Output: `Coop Pronto` (one space)
|
||
|
||
---
|
||
|
||
### 6. **split** — Split column
|
||
|
||
Splits a value at a delimiter and keeps a defined part.
|
||
|
||
```json
|
||
{ "type": "split", "delimiter": ";", "part": 0 }
|
||
```
|
||
|
||
- Input: `Coop Pronto Chur;7007 Chur` → Output: `Coop Pronto Chur`
|
||
|
||
---
|
||
|
||
### 7. **regex** — Regex replacement
|
||
|
||
Replaces parts of a string using a regular expression. Uses PHP `preg_replace`.
|
||
|
||
```json
|
||
{ "type": "regex", "pattern": "^(.*?);.*$", "replace": "$1" }
|
||
```
|
||
|
||
**No match → original value is passed through unchanged** (pipeline-safe).
|
||
|
||
Use capture groups as `$1`, `$2`, … in the `replace` field.
|
||
A pattern without `^`/`$` anchors replaces only the matched portion, not the whole value.
|
||
|
||
---
|
||
|
||
### 8. **regexextract** — Regex extraction
|
||
|
||
Extracts a capture group and returns **only that**. Uses PHP `preg_match`.
|
||
|
||
```json
|
||
{ "type": "regexextract", "pattern": "(\\d{4,} [^;]+)" }
|
||
```
|
||
|
||
- Input: `Coop Pronto Chur, 7007 Chur` → Output: `7007 Chur`
|
||
- No match → empty string
|
||
|
||
> **⚠ Not pipeline-safe:** A no-match discards all previous pipeline results. Use `regex` instead if you want to preserve the current value on no-match.
|
||
|
||
---
|
||
|
||
### 9. **dateformat** — Date reformatting
|
||
|
||
Converts between date formats.
|
||
|
||
```json
|
||
{ "type": "dateformat", "fromFormat": "d.m.Y", "toFormat": "Y-m-d" }
|
||
```
|
||
|
||
- Input: `10.12.2025` → Output: `2025-12-10`
|
||
|
||
Supports all PHP `DateTime` format characters.
|
||
|
||
---
|
||
|
||
### 10. **truncate** — Truncate string
|
||
|
||
Truncates a string to a maximum length.
|
||
|
||
```json
|
||
{ "type": "truncate", "maxLength": 100 }
|
||
```
|
||
|
||
---
|
||
|
||
### 11. **constantvalue** — Constant value from metadata
|
||
|
||
Injects an extracted metadata value as a constant for every row.
|
||
|
||
```json
|
||
{
|
||
"sourceColumn": "_constant_",
|
||
"transformations": [
|
||
{ "type": "constantvalue", "metadataKey": "account_iban" }
|
||
],
|
||
"outputColumn": "account_iban",
|
||
"outputAction": "create"
|
||
}
|
||
```
|
||
|
||
- Every row receives the extracted `account_iban` value (e.g. `CH9300777222666888999`) in a new column.
|
||
|
||
---
|
||
|
||
### 12. **map** — Copy / rename column
|
||
|
||
Copies a column value as-is (optionally to a new name).
|
||
|
||
```json
|
||
{ "type": "map" }
|
||
```
|
||
|
||
---
|
||
|
||
### 13. **pipeline** — Nested pipeline
|
||
|
||
Runs a sub-pipeline as a single transformation step.
|
||
|
||
```json
|
||
{
|
||
"type": "pipeline",
|
||
"steps": [
|
||
{ "type": "trim" },
|
||
{ "type": "lowercase" },
|
||
{ "type": "ucwordsfirst" }
|
||
]
|
||
}
|
||
```
|
||
|
||
Useful for grouping steps as a logical unit within a `transformations` array.
|
||
|
||
---
|
||
|
||
### 14. **timeperiod** — Map time to a period label
|
||
|
||
Parses a time string and returns the label of the matching period range.
|
||
Supports midnight-spanning ranges (e.g. 22:00–03:59).
|
||
Returns `default` (empty string by default) when no range matches or the input is invalid.
|
||
|
||
```json
|
||
{
|
||
"type": "timeperiod",
|
||
"timeFormat": "H:i:s",
|
||
"periods": [
|
||
{ "from": "04:00:00", "to": "08:59:59", "label": "Morgen" },
|
||
{ "from": "09:00:00", "to": "10:59:59", "label": "Vormittag" },
|
||
{ "from": "11:00:00", "to": "13:59:59", "label": "Mittag" },
|
||
{ "from": "14:00:00", "to": "17:59:59", "label": "Nachmittag" },
|
||
{ "from": "18:00:00", "to": "21:59:59", "label": "Abend" },
|
||
{ "from": "22:00:00", "to": "03:59:59", "label": "Nacht" }
|
||
],
|
||
"default": ""
|
||
}
|
||
```
|
||
|
||
- `"09:30:00"` → `"Vormittag"`
|
||
- `"23:00:00"` → `"Nacht"` (midnight-spanning range)
|
||
- `"02:00:00"` → `"Nacht"` (midnight-spanning range)
|
||
- `""` or unparseable input → `""`
|
||
|
||
`timeFormat` follows PHP's `DateTime::createFromFormat` syntax (default `H:i:s`).
|
||
|
||
---
|
||
|
||
### Row filtering — `skipIf`
|
||
|
||
Rows can be excluded from the output by adding a top-level `skipIf` key to the configuration.
|
||
The value is a filter node — either a bare condition or a nested `and`/`or` group.
|
||
|
||
**Bare condition:**
|
||
|
||
```json
|
||
"skipIf": { "column": "Buchungstext", "operator": "equals", "value": "Saldovortrag" }
|
||
```
|
||
|
||
**AND group:**
|
||
|
||
```json
|
||
"skipIf": {
|
||
"and": [
|
||
{ "column": "Beschreibung1", "operator": "empty" },
|
||
{ "column": "Beschreibung2", "operator": "empty" }
|
||
]
|
||
}
|
||
```
|
||
|
||
**Nested AND / OR:**
|
||
|
||
```json
|
||
"skipIf": {
|
||
"or": [
|
||
{ "column": "Amount", "operator": "gt", "value": "10000" },
|
||
{
|
||
"and": [
|
||
{ "column": "Type", "operator": "equals", "value": "Saldo" },
|
||
{ "column": "Notes", "operator": "empty" }
|
||
]
|
||
}
|
||
]
|
||
}
|
||
```
|
||
|
||
**Supported operators:**
|
||
|
||
| Operator | Matches when… |
|
||
|---|---|
|
||
| `empty` | column value is empty string |
|
||
| `not-empty` | column value is not empty |
|
||
| `equals` | column value equals `"value"` |
|
||
| `not-equals` | column value does not equal `"value"` |
|
||
| `contains` | column value contains `"value"` |
|
||
| `not-contains` | column value does not contain `"value"` |
|
||
| `matches` | column value matches regex `"pattern"` |
|
||
| `not-matches` | column value does not match regex `"pattern"` |
|
||
| `gt` | `(float) column > (float) value` |
|
||
| `gte` | `(float) column >= (float) value` |
|
||
| `lt` | `(float) column < (float) value` |
|
||
| `lte` | `(float) column <= (float) value` |
|
||
|
||
---
|
||
|
||
### Pipeline example
|
||
|
||
Multiple transformations chained:
|
||
|
||
```json
|
||
{
|
||
"sourceColumn": "BookingText",
|
||
"transformations": [
|
||
{ "type": "trim" },
|
||
{ "type": "replace", "search": " ", "replace": " " },
|
||
{ "type": "lowercase" },
|
||
{ "type": "ucwordsfirst" }
|
||
],
|
||
"outputColumn": "description",
|
||
"outputAction": "overwrite"
|
||
}
|
||
```
|
||
|
||
**Processing:**
|
||
|
||
1. `" COOP PRONTO "` → trim → `"COOP PRONTO"`
|
||
2. `"COOP PRONTO"` → replace → `"COOP PRONTO"`
|
||
3. `"COOP PRONTO"` → lowercase → `"coop pronto"`
|
||
4. `"coop pronto"` → ucwordsfirst → `"Coop Pronto"`
|
||
|
||
---
|
||
|
||
## CLI Reference
|
||
|
||
```bash
|
||
php bin/transformer.php <command> [input] [config] [options]
|
||
```
|
||
|
||
### Commands
|
||
|
||
| Command | Description |
|
||
| ------- | ----------- |
|
||
| `test` | Test run (up to 10 rows) |
|
||
| `transform` | Full transformation |
|
||
| `validate` | Validate configuration |
|
||
| `auto-import` | Directory monitoring |
|
||
| `help` | Show help |
|
||
|
||
### Options
|
||
|
||
| Option | Description |
|
||
| ------ | ----------- |
|
||
| `--debug`, `-d` | Enable debug mode |
|
||
| `--rows=N` | Max. N rows (`test` command) |
|
||
| `--output=FILE`, `-o` | Output path |
|
||
| `--do-import` | Import into Firefly III after transformation (`transform` only) |
|
||
| `--strict` | Strict validation |
|
||
| `--watch` | Continuous monitoring |
|
||
| `--interval=SEC` | Check interval in seconds (default: 60) |
|
||
| `--dry-run` | Simulation mode, no real operations |
|
||
|
||
---
|
||
|
||
## Debug Mode
|
||
|
||
```bash
|
||
php bin/transformer.php test input.csv config/config.json --debug
|
||
```
|
||
|
||
### Log categories
|
||
|
||
| Category | When |
|
||
| -------- | ---- |
|
||
| `transformer` | Start/end of transformation |
|
||
| `csv_reader` | While reading CSV |
|
||
| `metadata` | During metadata extraction |
|
||
| `metadata_warning` | On extraction problems |
|
||
| `transformation` | For each transformation step |
|
||
| `csv_writer` | While writing output CSV |
|
||
|
||
### Debug log output (JSON)
|
||
|
||
```json
|
||
{
|
||
"success": true,
|
||
"debug_logs": [
|
||
{
|
||
"timestamp": 1702200120.5432,
|
||
"category": "transformer",
|
||
"message": "Transformation started",
|
||
"data": { "inputFile": "input.csv", "maxRows": 0 }
|
||
},
|
||
{
|
||
"timestamp": 1702200120.5445,
|
||
"category": "metadata",
|
||
"message": "Extraction rule applied",
|
||
"data": { "rule_name": "account_iban", "value": "CH93..." }
|
||
}
|
||
]
|
||
}
|
||
```
|
||
|
||
---
|
||
|
||
## Firefly III Integration
|
||
|
||
The transformer can automatically import transformed files into Firefly III.
|
||
Three operating modes cover all typical deployment scenarios.
|
||
|
||
### Prerequisites (all modes)
|
||
|
||
**1. Create a Firefly III Data Importer JSON configuration file**
|
||
|
||
This file maps transformed CSV columns to Firefly III transaction fields (format v3).
|
||
|
||
Recommended approach: upload a sample CSV once in the Firefly III Data Importer Web UI, configure the column mapping there, then download the finished configuration. Alternatively, use `config/firefly-import-config.example.json` as a template and adjust `default_account` to your asset account ID.
|
||
|
||
**2. Choose an operating mode** — see sections below.
|
||
|
||
---
|
||
|
||
### `fireflyImport` field reference
|
||
|
||
| Field | Type | Description |
|
||
| --- | --- | --- |
|
||
| `mode` | string | Operating mode: `cli` \| `docker` \| `http` (default: `cli`) |
|
||
| `jsonConfig` | string | Path to the Firefly III Data Importer JSON config file (format v3). For `cli` and `http` modes the file must exist locally; relative paths are resolved from the **working directory** where `php bin/transformer.php` is invoked (typically the project root). For `docker` mode the path is **inside the container** — local existence is not checked. |
|
||
| `importerCommand` | string | Full CLI command *(modes: cli, docker)* |
|
||
| `importerUrl` | string | URL of the Data Importer *(mode: http)* |
|
||
| `personalSecret` | string | The `AUTO_IMPORT_SECRET` set on the importer server (min. 16 chars). Sent as `?secret=` URL query parameter. *(mode: http)* |
|
||
| `accessToken` | string | Firefly III Personal Access Token. Sent as `Authorization: Bearer` header. Required if not already set as `FIREFLY_III_ACCESS_TOKEN` in the importer environment. *(mode: http)* |
|
||
| `deleteAfterImport` | boolean | Delete transformed CSV after successful import |
|
||
| `chunkSize` | integer | Split the CSV into chunks of at most N data rows and import each chunk as a separate request. Prevents server-side timeouts on large files (rule of thumb: ~3–4 s/transaction for HTTP mode). `0` or absent = no chunking (default). Applies to all modes. |
|
||
| `chunkRetries` | integer | Number of additional import attempts per chunk after the first. On failure the importer retries up to this many times before aborting. `0` or absent = no retry (default). Only effective when `chunkSize > 0`. |
|
||
| `chunkRetryDelay` | integer | Pause in seconds before each chunk request after the first, and between retry attempts for the same failed chunk. Addresses both inter-chunk cooldown and retry back-off. `0` or absent = no pause (default). Only effective when `chunkSize > 0`. |
|
||
| `connectionTimeout` | integer | Maximum seconds to wait for the TCP connection to the importer to be established. Distinct from `timeout` (full transfer duration). Default: `10`. *(mode: http only)* |
|
||
| `timeout` | integer | Timeout in seconds per request (default: 300). For chunked imports this applies per chunk, not for the total run. |
|
||
| `environment` | object | Additional environment variables *(modes: cli, docker)* |
|
||
|
||
---
|
||
|
||
### Mode `cli` — Transformer and Firefly on the same server
|
||
|
||
Both the transformer and the Firefly III Data Importer run on the same server. The transformer calls the importer directly as a local command.
|
||
|
||
```json
|
||
"fireflyImport": {
|
||
"mode": "cli",
|
||
"jsonConfig": "/opt/firefly-data-importer/storage/configurations/ubs-import.json",
|
||
"importerCommand": "php /opt/firefly-data-importer/artisan importer:import",
|
||
"deleteAfterImport": false,
|
||
"chunkSize": 50,
|
||
"chunkRetries": 3,
|
||
"chunkRetryDelay": 10,
|
||
"timeout": 300,
|
||
"environment": {
|
||
"FIREFLY_III_URL": "https://localhost",
|
||
"FIREFLY_III_ACCESS_TOKEN": "your-token-here"
|
||
}
|
||
}
|
||
```
|
||
|
||
---
|
||
|
||
### Mode `docker` — Transformer local, Firefly in Docker
|
||
|
||
The transformer runs locally or in its own container; the Firefly III Data Importer runs in a Docker container. The transformer calls the importer via `docker exec`.
|
||
|
||
**Important:** The transformer's output directory must be mounted as a volume in the importer container. `jsonConfig` is the path **inside the container** (not a local path). Do not use the `-it` flag (no TTY).
|
||
|
||
Example `docker-compose.yml` for the importer:
|
||
|
||
```yaml
|
||
services:
|
||
firefly-importer:
|
||
image: fireflyiii/data-importer:latest
|
||
volumes:
|
||
- /opt/ff-imp-preprocessor/import:/import
|
||
environment:
|
||
- FIREFLY_III_URL=https://your-firefly.com
|
||
- FIREFLY_III_ACCESS_TOKEN=your-token-here
|
||
- CAN_POST_FILES=false
|
||
```
|
||
|
||
```json
|
||
"fireflyImport": {
|
||
"mode": "docker",
|
||
"jsonConfig": "/import/configs/ubs-import.json",
|
||
"importerCommand": "docker exec firefly-importer php artisan importer:import",
|
||
"deleteAfterImport": false,
|
||
"chunkSize": 50,
|
||
"chunkRetries": 3,
|
||
"chunkRetryDelay": 10,
|
||
"timeout": 300
|
||
}
|
||
```
|
||
|
||
The JSON config file must be available inside the container — either via a volume mount or `docker cp`:
|
||
|
||
```bash
|
||
docker cp ubs-import.json firefly-importer:/import/configs/ubs-import.json
|
||
```
|
||
|
||
---
|
||
|
||
### Mode `http` — Transformer local, Firefly importer on a remote server
|
||
|
||
The transformer runs locally; the Firefly III Data Importer is reachable over HTTP(S). The CSV and JSON configuration are sent as a multipart HTTP upload to the importer.
|
||
|
||
**Requirements on the importer server:**
|
||
|
||
```text
|
||
CAN_POST_FILES=true
|
||
AUTO_IMPORT_SECRET=<secret> # at least 16 characters — set this as personalSecret in your config
|
||
```
|
||
|
||
**Local requirement:** PHP extension `ext-curl`
|
||
|
||
```json
|
||
"fireflyImport": {
|
||
"mode": "http",
|
||
"importerUrl": "https://importer.your-server.com",
|
||
"personalSecret": "your-auto-import-secret-min-16-chars",
|
||
"accessToken": "your-firefly-iii-personal-access-token",
|
||
"jsonConfig": "config/ubs-import.json",
|
||
"deleteAfterImport": false,
|
||
"chunkSize": 50,
|
||
"chunkRetries": 3,
|
||
"chunkRetryDelay": 10,
|
||
"connectionTimeout": 10,
|
||
"timeout": 300
|
||
}
|
||
```
|
||
|
||
The transformer sends a `POST` request to `{importerUrl}/autoupload?secret={personalSecret}` with the CSV and JSON config as multipart form fields. The `accessToken` is sent as `Authorization: Bearer`. If `FIREFLY_III_ACCESS_TOKEN` is already set in the importer's environment, `accessToken` can be omitted.
|
||
|
||
---
|
||
|
||
### Server-side tuning
|
||
|
||
For large imports the bottleneck is usually the Firefly III Data Importer server, not the transformer. The settings below belong in the importer's environment (`.env` or `docker-compose.yml`):
|
||
|
||
| Setting | Recommended value | Notes |
|
||
|---|---|---|
|
||
| `PHP_MEMORY_LIMIT` | `512M` – `2048M` | Docker env var. Raise when PHP crashes with "Allowed memory size exhausted". |
|
||
| `CONNECTION_TIMEOUT` | `60` | Seconds to wait for TCP connect to Firefly III API. Default is ~31 s (π × 10). |
|
||
| `IGNORE_DUPLICATE_ERRORS` | `true` | Suppress duplicate-transaction warnings on repeated imports. |
|
||
|
||
**nginx reverse proxy** (if applicable):
|
||
```nginx
|
||
proxy_read_timeout 600s; # must exceed the longest single-chunk import time
|
||
client_max_body_size 64M; # must accommodate your largest chunk CSV
|
||
```
|
||
|
||
**Docker Compose** example:
|
||
```yaml
|
||
services:
|
||
firefly-importer:
|
||
environment:
|
||
- PHP_MEMORY_LIMIT=1024M
|
||
- CONNECTION_TIMEOUT=60
|
||
- IGNORE_DUPLICATE_ERRORS=true
|
||
```
|
||
|
||
---
|
||
|
||
### Usage
|
||
|
||
```bash
|
||
# Transform only (no import)
|
||
php bin/transformer.php transform input.csv config/config.json
|
||
|
||
# Transform and import into Firefly III
|
||
php bin/transformer.php transform input.csv config/config.json --do-import
|
||
|
||
# Watch mode: transform and import automatically for each new CSV in source directory
|
||
php bin/transformer.php auto-import config/config.json --watch
|
||
```
|
||
|
||
---
|
||
|
||
## Architecture
|
||
|
||
### Components
|
||
|
||
```text
|
||
bin/transformer.php (CLI entry point)
|
||
↓
|
||
TransformerEngine (orchestration)
|
||
├─ ConfigurationLoader (load / validate config)
|
||
├─ CsvReader (read CSV)
|
||
├─ MetadataExtractor (metadata via regex)
|
||
├─ ColumnTransformer (apply transformations)
|
||
├─ CsvWriter (write CSV)
|
||
├─ FireflyImporter (Firefly III integration)
|
||
└─ DebugLogger (debug logs)
|
||
```
|
||
|
||
### Data flow
|
||
|
||
```text
|
||
Input CSV
|
||
↓
|
||
CsvReader::readMetadataLines() → array of lines
|
||
↓
|
||
MetadataExtractor::extract() → {iban: "...", currency: "..."}
|
||
↓
|
||
CsvReader::readCsvData() → array of rows
|
||
↓
|
||
ColumnTransformer::transformRow() → transformed row (pipeline)
|
||
↓
|
||
CsvWriter::write() → output CSV
|
||
```
|
||
|
||
### Classes
|
||
|
||
| Class | Responsibility |
|
||
| ----- | -------------- |
|
||
| `TransformerEngine` | Orchestrates the entire workflow |
|
||
| `ConfigurationLoader` | Loads and validates JSON configuration |
|
||
| `CsvReader` | Reads CSV with metadata support |
|
||
| `MetadataExtractor` | Extracts metadata via regex |
|
||
| `ColumnTransformer` | Transforms columns (pipeline) |
|
||
| `CsvWriter` | Writes output CSV |
|
||
| `FireflyImporter` | Imports into Firefly III |
|
||
| `DebugLogger` | Static logger for debug output |
|
||
|
||
---
|
||
|
||
## Error Handling
|
||
|
||
### Common errors
|
||
|
||
#### "Input file not found"
|
||
|
||
```bash
|
||
# Check the file path
|
||
ls -la input.csv
|
||
|
||
# Use an absolute path if relative paths do not work
|
||
php bin/transformer.php transform /absolute/path/input.csv config.json
|
||
```
|
||
|
||
---
|
||
|
||
#### "Missing metadata: account_iban"
|
||
|
||
The IBAN could not be extracted — wrong regex or wrong line number.
|
||
|
||
```bash
|
||
# Inspect the first lines of the CSV
|
||
head -5 input.csv
|
||
|
||
# Validate with debug output
|
||
php bin/transformer.php validate config.json input.csv --debug
|
||
```
|
||
|
||
---
|
||
|
||
#### "Invalid JSON: …"
|
||
|
||
Syntax error in `config.json`.
|
||
|
||
```bash
|
||
php -r "json_decode(file_get_contents('config/config.json'), true) or die('JSON invalid');"
|
||
```
|
||
|
||
---
|
||
|
||
#### "Configuration: 'csvStructure.headerLine' required"
|
||
|
||
A required configuration field is missing.
|
||
|
||
```bash
|
||
diff config/config.json config/config.example.json
|
||
```
|
||
|
||
---
|
||
|
||
### Exception handling
|
||
|
||
```php
|
||
try {
|
||
$result = $engine->transform($inputFile);
|
||
if (!$result['success']) {
|
||
echo "Error: " . $result['error'];
|
||
}
|
||
} catch (Exception $e) {
|
||
echo "Fatal error: " . $e->getMessage();
|
||
}
|
||
```
|
||
|
||
---
|
||
|
||
## Tips
|
||
|
||
### UTF-8 handling
|
||
|
||
The transformer uses UTF-8 safe functions throughout:
|
||
|
||
- `mb_strtolower()` instead of `strtolower()`
|
||
- `mb_strtoupper()` instead of `strtoupper()`
|
||
- `mb_strlen()` for correct character counting
|
||
|
||
Supported encodings: UTF-8, ISO-8859-1, CP1252.
|
||
|
||
### Regex tips
|
||
|
||
**Pattern without delimiters (auto-wrapped):**
|
||
|
||
```json
|
||
"pattern": "IBAN:\\s*([A-Z0-9 ]+)"
|
||
// becomes: /IBAN:\s*([A-Z0-9 ]+)/u
|
||
```
|
||
|
||
**With explicit flags:**
|
||
|
||
```json
|
||
"pattern": "/IBAN:\\s*([A-Z0-9 ]+)/iu"
|
||
// case-insensitive
|
||
```
|
||
|
||
### Performance
|
||
|
||
- **Optimised for:** up to 1 million rows
|
||
- **Typical file size:** 10–100 k rows
|
||
|
||
### Batch processing
|
||
|
||
```bash
|
||
#!/bin/bash
|
||
for file in import/source/*.csv; do
|
||
php bin/transformer.php transform "$file" config/config.json
|
||
if [ $? -eq 0 ]; then
|
||
mv "$file" import/archive/
|
||
else
|
||
mv "$file" import/error/
|
||
fi
|
||
done
|
||
```
|
||
|
||
---
|
||
|
||
## Version History
|
||
|
||
**v1.0.0 (03 May 2026)**
|
||
|
||
- ✅ Initial release
|
||
- ✅ 14 transformation types
|
||
- ✅ Metadata extraction via regex
|
||
- ✅ Debug mode
|
||
- ✅ Firefly III integration (cli / docker / http)
|
||
- ✅ Full documentation
|
||
|
||
---
|
||
|
||
**License:** GPL-3.0
|
||
**Author:** PHP CSV Transformer Project
|
||
**Repository:** [git.andare.ch/david.reindl/ff-imp-preprocessor](https://git.andare.ch/david.reindl/ff-imp-preprocessor)
|