26 KiB
Firefly Import Preprocessor
Version: 1.0.0
Date: 03 May 2026
Status: Production Ready
🌐 Deutsch
Table of Contents
- Overview
- Installation & Setup
- Quick Start
- Configuration
- Transformation Types
- CLI Reference
- Debug Mode
- Firefly III Integration
- Architecture
- Error Handling
Overview
The Firefly Import Preprocessor is a production-ready PHP preprocessor for bank CSV export files. It transforms bank data into a standardised format and can optionally import it into Firefly III.
Core Features
✅ Full CSV transformation with complex pipelines
✅ Metadata extraction via regex (IBAN, currency, account name)
✅ 14 transformation types for flexible data processing
✅ Firefly III integration — CLI, Docker, and HTTP upload
✅ Debug mode for full processing transparency
✅ Production ready with complete error handling
✅ Zero dependencies for core functionality
Workflow
Input CSV
↓
Extract metadata (regex)
↓
Transform data rows (pipeline)
↓
Write output CSV
↓
[Optional] Import into Firefly III
Installation & Setup
Requirements
- PHP 8.1+
- Composer (recommended)
- [Optional] Docker for Firefly III integration
Installation
# 1. Clone / copy the repository
cd ff-imp-preprocessor
# 2. Install dependencies (optional, dev tools only)
composer install
# 3. Create configuration
cp config/config.example.json config/config.json
# Edit config/config.json with your settings
# 4. Create directories
mkdir -p config/import/{source,output,archive,error}
chmod 755 config/import/{source,output,archive,error}
# 5. Run a test
php bin/transformer.php validate config/config.json input.csv
Quick Start
1. Adjust configuration
Edit config/config.json and make sure the extraction rules match your CSV format:
{
"metadata": {
"extractionRules": [
{
"name": "account_iban",
"lineNumber": 2,
"regex": "IBAN:\\s*([A-Z0-9 ]+)",
"captureGroup": 1
}
]
},
"csvStructure": {
"headerLine": 5,
"delimiter": ";",
"encoding": "UTF-8"
}
}
2. Validate CSV
php bin/transformer.php validate config/config.json input.csv
3. Run transformation
php bin/transformer.php transform input.csv config/config.json
# With debug mode for troubleshooting
php bin/transformer.php transform input.csv config/config.json --debug
4. Inspect output
php bin/transformer.php test input.csv config/config.json --debug
# Shows up to 10 transformed rows and debug logs
Configuration
config.json structure
metadata — Metadata extraction
{
"metadata": {
"extractionRules": [
{
"name": "account_iban",
"lineNumber": 2,
"regex": "IBAN:\\s*([A-Z0-9 ]+)",
"captureGroup": 1
},
{
"name": "currency_code",
"lineNumber": 3,
"regex": "Currency:\\s*([A-Z]{3})",
"captureGroup": 1
}
]
}
}
| Field | Type | Description |
|---|---|---|
name |
string | Name of the metadata variable (used in constantvalue) |
lineNumber |
int | Line number in CSV (1-based, human-readable) |
regex |
string | Regex pattern for extraction (without delimiters) |
captureGroup |
int | Capture group index (0 = full match, 1 = first group, etc.) |
Regex example:
- Pattern:
IBAN:\s*([A-Z0-9 ]+) - Input:
IBAN: CH93 0077 2020 6262 5252 7 - Capture group 1:
CH93 0077 2020 6262 5252 7
csvStructure — CSV format
{
"csvStructure": {
"headerLine": 5,
"delimiter": ";",
"encoding": "UTF-8",
"hasBom": false
}
}
| Field | Type | Default | Description |
|---|---|---|---|
headerLine |
int | 5 | Line number of the header row (1-based) |
delimiter |
string | ; |
CSV delimiter |
encoding |
string | UTF-8 |
Character encoding (UTF-8, ISO-8859-1, CP1252) |
hasBom |
bool | false | Whether the file has a BOM (Byte Order Mark) |
columnTransformations — Column transformations
{
"columnTransformations": [
{
"sourceColumn": "BookingDate",
"transformations": [
{
"type": "dateformat",
"fromFormat": "d.m.Y",
"toFormat": "Y-m-d"
}
],
"outputColumn": "date",
"outputAction": "overwrite"
}
]
}
outputAction:
| Value | Behaviour |
|---|---|
overwrite |
Replace the target column with the transformation result (default) |
create |
Write the result into a new output column |
append |
Concatenate the result to the end of the existing column value. Add "appendDelimiter": " " (any string) to insert a separator between the existing and new value — the delimiter is omitted when the target column is still empty |
append-if-not-empty |
Same as append (including optional appendDelimiter) but skips entirely when the transformation result is empty — safe for optional values such as tags or notes lines |
append-line |
Same as append but the separator is always a newline \n; no leading newline when the target is empty |
overwrite-if-empty |
Only write the result if the target column is currently empty |
overwrite-if-not-empty |
Only write the result if the transformation result is not empty |
directories — File system
{
"directories": {
"source": "/opt/ff-imp-preprocessor/import/source",
"output": "/opt/ff-imp-preprocessor/import/output",
"archive": "/opt/ff-imp-preprocessor/import/archive",
"error": "/opt/ff-imp-preprocessor/import/error"
}
}
| Field | Description |
|---|---|
source |
Input directory |
output |
Output directory |
archive |
Archive for processed files |
error |
Error directory for invalid files |
fireflyImport — Firefly III integration
Optional. When present, passing --do-import to the transform command (or using auto-import) will call the Firefly III Data Importer after the output CSV is written.
See Firefly III Integration for the full field reference and mode-specific examples.
Transformation Types
There are 14 supported transformation types that can be combined as a pipeline:
1. trim — Remove whitespace
Removes leading and trailing whitespace.
{ "type": "trim" }
- Input:
Coop Pronto→ Output:Coop Pronto
2. lowercase — Convert to lowercase
Converts to lowercase (UTF-8 safe).
{ "type": "lowercase" }
- Input:
COOP PRONTO CHUR→ Output:coop pronto chur
3. uppercase — Convert to uppercase
Converts to uppercase (UTF-8 safe).
{ "type": "uppercase" }
- Input:
Coop Pronto Chur→ Output:COOP PRONTO CHUR
4. ucwordsfirst — Capitalise after word separators
Capitalises the first letter after each word separator.
{ "type": "ucwordsfirst" }
COOP PRONTO CHUR→Coop Pronto Churmigros-rail city→Migros-Rail CityO'NEILL STORE→O'Neill StoreSAINT-JEAN-DE-MAURIENNE→Saint-Jean-De-Maurienne
Separators: space, hyphen, apostrophe, slash, period, comma, semicolon, colon, parentheses.
Guard: If the input already contains both uppercase and lowercase letters (mixed-case), it is returned unchanged. This prevents accidentally re-casing intentionally formatted strings such as
"Coop pronto chur". Fully uppercase or fully lowercase inputs are always processed.
5. replace — String replacement
Replaces a substring with another string (case-sensitive).
{ "type": "replace", "search": " ", "replace": " " }
- Input:
Coop Pronto(two spaces) → Output:Coop Pronto(one space)
6. split — Split column
Splits a value at a delimiter and keeps a defined part.
{ "type": "split", "delimiter": ";", "part": 0 }
- Input:
Coop Pronto Chur;7007 Chur→ Output:Coop Pronto Chur
7. regex — Regex replacement
Replaces parts of a string using a regular expression. Uses PHP preg_replace.
{ "type": "regex", "pattern": "^(.*?);.*$", "replace": "$1" }
No match → original value is passed through unchanged (pipeline-safe).
Use capture groups as $1, $2, … in the replace field.
A pattern without ^/$ anchors replaces only the matched portion, not the whole value.
8. regexextract — Regex extraction
Extracts a capture group and returns only that. Uses PHP preg_match.
{ "type": "regexextract", "pattern": "(\\d{4,} [^;]+)" }
- Input:
Coop Pronto Chur, 7007 Chur→ Output:7007 Chur - No match → empty string
⚠ Not pipeline-safe: A no-match discards all previous pipeline results. Use
regexinstead if you want to preserve the current value on no-match.
9. dateformat — Date reformatting
Converts between date formats.
{ "type": "dateformat", "fromFormat": "d.m.Y", "toFormat": "Y-m-d" }
- Input:
10.12.2025→ Output:2025-12-10
Supports all PHP DateTime format characters.
10. truncate — Truncate string
Truncates a string to a maximum length.
{ "type": "truncate", "maxLength": 100 }
11. constantvalue — Constant value from metadata
Injects an extracted metadata value as a constant for every row.
{
"sourceColumn": "_constant_",
"transformations": [
{ "type": "constantvalue", "metadataKey": "account_iban" }
],
"outputColumn": "account_iban",
"outputAction": "create"
}
- Every row receives the extracted
account_ibanvalue (e.g.CH9300777222666888999) in a new column.
12. map — Copy / rename column
Copies a column value as-is (optionally to a new name).
{ "type": "map" }
13. pipeline — Nested pipeline
Runs a sub-pipeline as a single transformation step.
{
"type": "pipeline",
"steps": [
{ "type": "trim" },
{ "type": "lowercase" },
{ "type": "ucwordsfirst" }
]
}
Useful for grouping steps as a logical unit within a transformations array.
14. timeperiod — Map time to a period label
Parses a time string and returns the label of the matching period range.
Supports midnight-spanning ranges (e.g. 22:00–03:59).
Returns default (empty string by default) when no range matches or the input is invalid.
{
"type": "timeperiod",
"timeFormat": "H:i:s",
"periods": [
{ "from": "04:00:00", "to": "08:59:59", "label": "Morgen" },
{ "from": "09:00:00", "to": "10:59:59", "label": "Vormittag" },
{ "from": "11:00:00", "to": "13:59:59", "label": "Mittag" },
{ "from": "14:00:00", "to": "17:59:59", "label": "Nachmittag" },
{ "from": "18:00:00", "to": "21:59:59", "label": "Abend" },
{ "from": "22:00:00", "to": "03:59:59", "label": "Nacht" }
],
"default": ""
}
"09:30:00"→"Vormittag""23:00:00"→"Nacht"(midnight-spanning range)"02:00:00"→"Nacht"(midnight-spanning range)""or unparseable input →""
timeFormat follows PHP's DateTime::createFromFormat syntax (default H:i:s).
Row filtering — skipIf
Rows can be excluded from the output by adding a top-level skipIf key to the configuration.
The value is a filter node — either a bare condition or a nested and/or group.
Bare condition:
"skipIf": { "column": "Buchungstext", "operator": "equals", "value": "Saldovortrag" }
AND group:
"skipIf": {
"and": [
{ "column": "Beschreibung1", "operator": "empty" },
{ "column": "Beschreibung2", "operator": "empty" }
]
}
Nested AND / OR:
"skipIf": {
"or": [
{ "column": "Amount", "operator": "gt", "value": "10000" },
{
"and": [
{ "column": "Type", "operator": "equals", "value": "Saldo" },
{ "column": "Notes", "operator": "empty" }
]
}
]
}
Supported operators:
| Operator | Matches when… |
|---|---|
empty |
column value is empty string |
not-empty |
column value is not empty |
equals |
column value equals "value" |
not-equals |
column value does not equal "value" |
contains |
column value contains "value" |
not-contains |
column value does not contain "value" |
matches |
column value matches regex "pattern" |
not-matches |
column value does not match regex "pattern" |
gt |
(float) column > (float) value |
gte |
(float) column >= (float) value |
lt |
(float) column < (float) value |
lte |
(float) column <= (float) value |
Pipeline example
Multiple transformations chained:
{
"sourceColumn": "BookingText",
"transformations": [
{ "type": "trim" },
{ "type": "replace", "search": " ", "replace": " " },
{ "type": "lowercase" },
{ "type": "ucwordsfirst" }
],
"outputColumn": "description",
"outputAction": "overwrite"
}
Processing:
" COOP PRONTO "→ trim →"COOP PRONTO""COOP PRONTO"→ replace →"COOP PRONTO""COOP PRONTO"→ lowercase →"coop pronto""coop pronto"→ ucwordsfirst →"Coop Pronto"
CLI Reference
php bin/transformer.php <command> [input] [config] [options]
Commands
| Command | Description |
|---|---|
test |
Test run (up to 10 rows) |
transform |
Full transformation |
validate |
Validate configuration |
auto-import |
Directory monitoring |
help |
Show help |
Options
| Option | Description |
|---|---|
--debug, -d |
Enable debug mode |
--rows=N |
Max. N rows (test command) |
--output=FILE, -o |
Output path |
--do-import |
Import into Firefly III after transformation (transform only) |
--strict |
Strict validation |
--watch |
Continuous monitoring |
--interval=SEC |
Check interval in seconds (default: 60) |
--dry-run |
Simulation mode, no real operations |
Debug Mode
php bin/transformer.php test input.csv config/config.json --debug
Log categories
| Category | When |
|---|---|
transformer |
Start/end of transformation |
csv_reader |
While reading CSV |
metadata |
During metadata extraction |
metadata_warning |
On extraction problems |
transformation |
For each transformation step |
csv_writer |
While writing output CSV |
Debug log output (JSON)
{
"success": true,
"debug_logs": [
{
"timestamp": 1702200120.5432,
"category": "transformer",
"message": "Transformation started",
"data": { "inputFile": "input.csv", "maxRows": 0 }
},
{
"timestamp": 1702200120.5445,
"category": "metadata",
"message": "Extraction rule applied",
"data": { "rule_name": "account_iban", "value": "CH93..." }
}
]
}
Firefly III Integration
The transformer can automatically import transformed files into Firefly III. Three operating modes cover all typical deployment scenarios.
Prerequisites (all modes)
1. Create a Firefly III Data Importer JSON configuration file
This file maps transformed CSV columns to Firefly III transaction fields (format v3).
Recommended approach: upload a sample CSV once in the Firefly III Data Importer Web UI, configure the column mapping there, then download the finished configuration. Alternatively, use config/firefly-import-config.example.json as a template and adjust default_account to your asset account ID.
2. Choose an operating mode — see sections below.
fireflyImport field reference
| Field | Type | Description |
|---|---|---|
mode |
string | Operating mode: cli | docker | http (default: cli) |
jsonConfig |
string | Path to the Firefly III Data Importer JSON config file (format v3). For cli and http modes the file must exist locally; relative paths are resolved from the working directory where php bin/transformer.php is invoked (typically the project root). For docker mode the path is inside the container — local existence is not checked. |
importerCommand |
string | Full CLI command (modes: cli, docker) |
importerUrl |
string | URL of the Data Importer (mode: http) |
personalSecret |
string | The AUTO_IMPORT_SECRET set on the importer server (min. 16 chars). Sent as ?secret= URL query parameter. (mode: http) |
accessToken |
string | Firefly III Personal Access Token. Sent as Authorization: Bearer header. Required if not already set as FIREFLY_III_ACCESS_TOKEN in the importer environment. (mode: http) |
deleteAfterImport |
boolean | Delete transformed CSV after successful import |
chunkSize |
integer | Split the CSV into chunks of at most N data rows and import each chunk as a separate request. Prevents server-side timeouts on large files (rule of thumb: ~3–4 s/transaction for HTTP mode). 0 or absent = no chunking (default). Applies to all modes. |
chunkRetries |
integer | Number of additional import attempts per chunk after the first. On failure the importer retries up to this many times before aborting. 0 or absent = no retry (default). Only effective when chunkSize > 0. |
chunkRetryDelay |
integer | Pause in seconds before each chunk request after the first, and between retry attempts for the same failed chunk. Addresses both inter-chunk cooldown and retry back-off. 0 or absent = no pause (default). Only effective when chunkSize > 0. |
connectionTimeout |
integer | Maximum seconds to wait for the TCP connection to the importer to be established. Distinct from timeout (full transfer duration). Default: 10. (mode: http only) |
timeout |
integer | Timeout in seconds per request (default: 300). For chunked imports this applies per chunk, not for the total run. |
environment |
object | Additional environment variables (modes: cli, docker) |
Mode cli — Transformer and Firefly on the same server
Both the transformer and the Firefly III Data Importer run on the same server. The transformer calls the importer directly as a local command.
"fireflyImport": {
"mode": "cli",
"jsonConfig": "/opt/firefly-data-importer/storage/configurations/ubs-import.json",
"importerCommand": "php /opt/firefly-data-importer/artisan importer:import",
"deleteAfterImport": false,
"chunkSize": 50,
"chunkRetries": 3,
"chunkRetryDelay": 10,
"timeout": 300,
"environment": {
"FIREFLY_III_URL": "https://localhost",
"FIREFLY_III_ACCESS_TOKEN": "your-token-here"
}
}
Mode docker — Transformer local, Firefly in Docker
The transformer runs locally or in its own container; the Firefly III Data Importer runs in a Docker container. The transformer calls the importer via docker exec.
Important: The transformer's output directory must be mounted as a volume in the importer container. jsonConfig is the path inside the container (not a local path). Do not use the -it flag (no TTY).
Example docker-compose.yml for the importer:
services:
firefly-importer:
image: fireflyiii/data-importer:latest
volumes:
- /opt/ff-imp-preprocessor/import:/import
environment:
- FIREFLY_III_URL=https://your-firefly.com
- FIREFLY_III_ACCESS_TOKEN=your-token-here
- CAN_POST_FILES=false
"fireflyImport": {
"mode": "docker",
"jsonConfig": "/import/configs/ubs-import.json",
"importerCommand": "docker exec firefly-importer php artisan importer:import",
"deleteAfterImport": false,
"chunkSize": 50,
"chunkRetries": 3,
"chunkRetryDelay": 10,
"timeout": 300
}
The JSON config file must be available inside the container — either via a volume mount or docker cp:
docker cp ubs-import.json firefly-importer:/import/configs/ubs-import.json
Mode http — Transformer local, Firefly importer on a remote server
The transformer runs locally; the Firefly III Data Importer is reachable over HTTP(S). The CSV and JSON configuration are sent as a multipart HTTP upload to the importer.
Requirements on the importer server:
CAN_POST_FILES=true
AUTO_IMPORT_SECRET=<secret> # at least 16 characters — set this as personalSecret in your config
Local requirement: PHP extension ext-curl
"fireflyImport": {
"mode": "http",
"importerUrl": "https://importer.your-server.com",
"personalSecret": "your-auto-import-secret-min-16-chars",
"accessToken": "your-firefly-iii-personal-access-token",
"jsonConfig": "config/ubs-import.json",
"deleteAfterImport": false,
"chunkSize": 50,
"chunkRetries": 3,
"chunkRetryDelay": 10,
"connectionTimeout": 10,
"timeout": 300
}
The transformer sends a POST request to {importerUrl}/autoupload?secret={personalSecret} with the CSV and JSON config as multipart form fields. The accessToken is sent as Authorization: Bearer. If FIREFLY_III_ACCESS_TOKEN is already set in the importer's environment, accessToken can be omitted.
Server-side tuning
For large imports the bottleneck is usually the Firefly III Data Importer server, not the transformer. The settings below belong in the importer's environment (.env or docker-compose.yml):
| Setting | Recommended value | Notes |
|---|---|---|
PHP_MEMORY_LIMIT |
512M – 2048M |
Docker env var. Raise when PHP crashes with "Allowed memory size exhausted". |
CONNECTION_TIMEOUT |
60 |
Seconds to wait for TCP connect to Firefly III API. Default is ~31 s (π × 10). |
IGNORE_DUPLICATE_ERRORS |
true |
Suppress duplicate-transaction warnings on repeated imports. |
nginx reverse proxy (if applicable):
proxy_read_timeout 600s; # must exceed the longest single-chunk import time
client_max_body_size 64M; # must accommodate your largest chunk CSV
Docker Compose example:
services:
firefly-importer:
environment:
- PHP_MEMORY_LIMIT=1024M
- CONNECTION_TIMEOUT=60
- IGNORE_DUPLICATE_ERRORS=true
Usage
# Transform only (no import)
php bin/transformer.php transform input.csv config/config.json
# Transform and import into Firefly III
php bin/transformer.php transform input.csv config/config.json --do-import
# Watch mode: transform and import automatically for each new CSV in source directory
php bin/transformer.php auto-import config/config.json --watch
Architecture
Components
bin/transformer.php (CLI entry point)
↓
TransformerEngine (orchestration)
├─ ConfigurationLoader (load / validate config)
├─ CsvReader (read CSV)
├─ MetadataExtractor (metadata via regex)
├─ ColumnTransformer (apply transformations)
├─ CsvWriter (write CSV)
├─ FireflyImporter (Firefly III integration)
└─ DebugLogger (debug logs)
Data flow
Input CSV
↓
CsvReader::readMetadataLines() → array of lines
↓
MetadataExtractor::extract() → {iban: "...", currency: "..."}
↓
CsvReader::readCsvData() → array of rows
↓
ColumnTransformer::transformRow() → transformed row (pipeline)
↓
CsvWriter::write() → output CSV
Classes
| Class | Responsibility |
|---|---|
TransformerEngine |
Orchestrates the entire workflow |
ConfigurationLoader |
Loads and validates JSON configuration |
CsvReader |
Reads CSV with metadata support |
MetadataExtractor |
Extracts metadata via regex |
ColumnTransformer |
Transforms columns (pipeline) |
CsvWriter |
Writes output CSV |
FireflyImporter |
Imports into Firefly III |
DebugLogger |
Static logger for debug output |
Error Handling
Common errors
"Input file not found"
# Check the file path
ls -la input.csv
# Use an absolute path if relative paths do not work
php bin/transformer.php transform /absolute/path/input.csv config.json
"Missing metadata: account_iban"
The IBAN could not be extracted — wrong regex or wrong line number.
# Inspect the first lines of the CSV
head -5 input.csv
# Validate with debug output
php bin/transformer.php validate config.json input.csv --debug
"Invalid JSON: …"
Syntax error in config.json.
php -r "json_decode(file_get_contents('config/config.json'), true) or die('JSON invalid');"
"Configuration: 'csvStructure.headerLine' required"
A required configuration field is missing.
diff config/config.json config/config.example.json
Exception handling
try {
$result = $engine->transform($inputFile);
if (!$result['success']) {
echo "Error: " . $result['error'];
}
} catch (Exception $e) {
echo "Fatal error: " . $e->getMessage();
}
Tips
UTF-8 handling
The transformer uses UTF-8 safe functions throughout:
mb_strtolower()instead ofstrtolower()mb_strtoupper()instead ofstrtoupper()mb_strlen()for correct character counting
Supported encodings: UTF-8, ISO-8859-1, CP1252.
Regex tips
Pattern without delimiters (auto-wrapped):
"pattern": "IBAN:\\s*([A-Z0-9 ]+)"
// becomes: /IBAN:\s*([A-Z0-9 ]+)/u
With explicit flags:
"pattern": "/IBAN:\\s*([A-Z0-9 ]+)/iu"
// case-insensitive
Performance
- Optimised for: up to 1 million rows
- Typical file size: 10–100 k rows
Batch processing
#!/bin/bash
for file in import/source/*.csv; do
php bin/transformer.php transform "$file" config/config.json
if [ $? -eq 0 ]; then
mv "$file" import/archive/
else
mv "$file" import/error/
fi
done
Version History
v1.0.0 (03 May 2026)
- ✅ Initial release
- ✅ 14 transformation types
- ✅ Metadata extraction via regex
- ✅ Debug mode
- ✅ Firefly III integration (cli / docker / http)
- ✅ Full documentation
License: GPL-3.0
Author: PHP CSV Transformer Project
Repository: git.andare.ch/david.reindl/ff-imp-preprocessor