Configurable preprocessor to make any .csv file importable into Firefly-III

Firefly Import Preprocessor

Version: 1.0.0
Date: 03 May 2026
Status: Production Ready

🌐 Deutsch


Table of Contents

  1. Overview
  2. Installation & Setup
  3. Quick Start
  4. Configuration
  5. Transformation Types
  6. CLI Reference
  7. Debug Mode
  8. Firefly III Integration
  9. Architecture
  10. Error Handling

Overview

The Firefly Import Preprocessor is a production-ready PHP preprocessor for bank CSV export files. It transforms bank data into a standardised format and can optionally import it into Firefly III.

Core Features

  • Full CSV transformation with complex pipelines
  • Metadata extraction via regex (IBAN, currency, account name)
  • 13 transformation types for flexible data processing
  • Firefly III integration: CLI, Docker, and HTTP upload
  • Debug mode for full processing transparency
  • Production ready with complete error handling
  • Zero dependencies for core functionality

Workflow

Input CSV
    ↓
Extract metadata (regex)
    ↓
Transform data rows (pipeline)
    ↓
Write output CSV
    ↓
[Optional] Import into Firefly III

Installation & Setup

Requirements

  • PHP 8.1+
  • Composer (recommended)
  • [Optional] Docker for Firefly III integration

Installation

# 1. Clone / copy the repository
cd ff-imp-preprocessor

# 2. Install dependencies (optional, dev tools only)
composer install

# 3. Create configuration
cp config/config.example.json config/config.json
# Edit config/config.json with your settings

# 4. Create directories
mkdir -p config/import/{source,output,archive,error}
chmod 755 config/import/{source,output,archive,error}

# 5. Run a test
php bin/transformer.php validate config/config.json input.csv

Quick Start

1. Adjust configuration

Edit config/config.json and make sure the extraction rules match your CSV format:

{
  "metadata": {
    "extractionRules": [
      {
        "name": "account_iban",
        "lineNumber": 2,
        "regex": "IBAN:\\s*([A-Z0-9 ]+)",
        "captureGroup": 1
      }
    ]
  },
  "csvStructure": {
    "headerLine": 5,
    "delimiter": ";",
    "encoding": "UTF-8"
  }
}

2. Validate CSV

php bin/transformer.php validate config/config.json input.csv

3. Run transformation

php bin/transformer.php transform input.csv config/config.json

# With debug mode for troubleshooting
php bin/transformer.php transform input.csv config/config.json --debug

4. Inspect output

php bin/transformer.php test input.csv config/config.json --debug
# Shows up to 10 transformed rows and debug logs

Configuration

config.json structure

metadata — Metadata extraction

{
  "metadata": {
    "extractionRules": [
      {
        "name": "account_iban",
        "lineNumber": 2,
        "regex": "IBAN:\\s*([A-Z0-9 ]+)",
        "captureGroup": 1
      },
      {
        "name": "currency_code",
        "lineNumber": 3,
        "regex": "Currency:\\s*([A-Z]{3})",
        "captureGroup": 1
      }
    ]
  }
}

| Field | Type | Description |
|---|---|---|
| name | string | Name of the metadata variable (used in constantvalue) |
| lineNumber | int | Line number in CSV (1-based, human-readable) |
| regex | string | Regex pattern for extraction (without delimiters) |
| captureGroup | int | Capture group index (0 = full match, 1 = first group, etc.) |

Regex example:

  • Pattern: IBAN:\s*([A-Z0-9 ]+)
  • Input: IBAN: CH93 0077 2020 6262 5252 7
  • Capture group 1: CH93 0077 2020 6262 5252 7
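
The rule above can be sketched in PHP as follows. This is a minimal illustration, not the project's MetadataExtractor: the function name `extractMetadataValue` and the sample file lines are hypothetical, and the sketch assumes slashes inside the configured pattern are not already escaped.

```php
<?php
declare(strict_types=1);

/**
 * Apply one extraction rule to the raw lines of a CSV file.
 * Hypothetical helper for illustration only.
 *
 * @param string[] $lines Raw file lines, index 0 holds line 1
 * @param array{name:string,lineNumber:int,regex:string,captureGroup:int} $rule
 */
function extractMetadataValue(array $lines, array $rule): ?string
{
    // lineNumber is 1-based in the config, PHP arrays are 0-based.
    $line = $lines[$rule['lineNumber'] - 1] ?? null;
    if ($line === null) {
        return null;
    }

    // The config stores the pattern without delimiters, so wrap it here.
    $pattern = '/' . str_replace('/', '\/', $rule['regex']) . '/u';

    if (preg_match($pattern, $line, $matches) !== 1) {
        return null; // no match (or invalid pattern)
    }

    return $matches[$rule['captureGroup']] ?? null;
}

// Sample header lines (made up for this example).
$lines = [
    'Account statement',
    'IBAN: CH93 0077 2020 6262 5252 7',
    'Currency: CHF',
];

$rule = [
    'name' => 'account_iban',
    'lineNumber' => 2,
    'regex' => 'IBAN:\s*([A-Z0-9 ]+)',
    'captureGroup' => 1,
];

echo extractMetadataValue($lines, $rule), PHP_EOL; // CH93 0077 2020 6262 5252 7
```

Note that the captured value keeps the spaces of the printed IBAN; a later replace step can strip them if the target system expects a compact form.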

csvStructure — CSV format

{
  "csvStructure": {
    "headerLine": 5,
    "delimiter": ";",
    "encoding": "UTF-8",
    "hasBom": false
  }
}

| Field | Type | Default | Description |
|---|---|---|---|
| headerLine | int | 5 | Line number of the header row (1-based) |
| delimiter | string | ; | CSV delimiter |
| encoding | string | UTF-8 | Character encoding (UTF-8, ISO-8859-1, CP1252) |
| hasBom | bool | false | Whether the file has a BOM (Byte Order Mark) |
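
How these four settings interact can be sketched as below. This is an illustrative reader under stated assumptions, not the project's CsvReader: `readCsvRows` is a hypothetical name, the sketch splits on line breaks (so quoted fields that contain line breaks are not supported), and rows whose field count differs from the header are silently skipped.

```php
<?php
declare(strict_types=1);

/**
 * Parse CSV text using the csvStructure settings.
 * Hypothetical helper; returns rows as associative arrays keyed by header name.
 *
 * @param array{headerLine:int,delimiter:string,encoding:string,hasBom:bool} $structure
 * @return array<int, array<string, string>>
 */
function readCsvRows(string $content, array $structure): array
{
    // Strip the UTF-8 BOM (bytes EF BB BF) if the file is declared to have one.
    if ($structure['hasBom'] && str_starts_with($content, "\xEF\xBB\xBF")) {
        $content = substr($content, 3);
    }

    // Convert legacy encodings to UTF-8 before any further processing.
    if (strtoupper($structure['encoding']) !== 'UTF-8') {
        $content = mb_convert_encoding($content, 'UTF-8', $structure['encoding']);
    }

    $lines = preg_split('/\r\n|\r|\n/', $content);

    // headerLine is 1-based; everything above it is metadata, not data.
    $headerIndex = $structure['headerLine'] - 1;
    if (!isset($lines[$headerIndex])) {
        return [];
    }
    $header = str_getcsv($lines[$headerIndex], $structure['delimiter'], '"', '\\');

    $rows = [];
    foreach (array_slice($lines, $headerIndex + 1) as $line) {
        if (trim($line) === '') {
            continue; // skip blank lines
        }
        $fields = str_getcsv($line, $structure['delimiter'], '"', '\\');
        if (count($fields) === count($header)) {
            $rows[] = array_combine($header, $fields);
        }
    }

    return $rows;
}

// Made-up sample: four metadata lines, header on line 5, one data row.
$csv = "Bank export\nIBAN: CH93 0077 2020 6262 5252 7\nCurrency: CHF\n\n"
     . "BookingDate;BookingText;Amount\n10.12.2025;Coop Pronto;-12.50\n";

$rows = readCsvRows($csv, [
    'headerLine' => 5,
    'delimiter'  => ';',
    'encoding'   => 'UTF-8',
    'hasBom'     => false,
]);

print_r($rows[0]);
```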

columnTransformations — Column transformations

{
  "columnTransformations": [
    {
      "sourceColumn": "BookingDate",
      "transformations": [
        {
          "type": "dateformat",
          "fromFormat": "d.m.Y",
          "toFormat": "Y-m-d"
        }
      ],
      "outputColumn": "date",
      "outputAction": "overwrite"
    }
  ]
}

outputAction:

  • overwrite — overwrite the source column
  • create — create a new column (for regex extract, split, etc.)

directories — File system

{
  "directories": {
    "source": "/opt/ff-imp-preprocessor/import/source",
    "output": "/opt/ff-imp-preprocessor/import/output",
    "archive": "/opt/ff-imp-preprocessor/import/archive",
    "error": "/opt/ff-imp-preprocessor/import/error"
  }
}

| Field | Description |
|---|---|
| source | Input directory |
| output | Output directory |
| archive | Archive for processed files |
| error | Error directory for invalid files |

fireflyImport — Firefly III integration

The operating mode is controlled by the mode field. Possible values: cli, docker, http. Full details and examples: Firefly III Integration.

{
  "fireflyImport": {
    "mode": "docker",
    "jsonConfig": "/import/configs/ubs-import.json",
    "importerCommand": "docker exec firefly-importer php artisan importer:import",
    "autoImport": false,
    "deleteAfterImport": false,
    "timeout": 300
  }
}

| Field | Type | Description |
|---|---|---|
| mode | string | Operating mode: cli, docker, or http (default: cli) |
| jsonConfig | string | Path to the Firefly III Data Importer JSON config file (format v3) |
| importerCommand | string | Full CLI command (modes: cli, docker) |
| importerUrl | string | URL of the Data Importer (mode: http) |
| importerSecret | string | AUTO_IMPORT_SECRET of the importer, min. 16 chars (mode: http) |
| autoImport | boolean | Run import immediately after transformation |
| deleteAfterImport | boolean | Delete transformed CSV after successful import |
| timeout | integer | Timeout in seconds (default: 300) |
| environment | object | Additional environment variables (modes: cli, docker) |

Transformation Types

There are 13 supported transformation types that can be combined as a pipeline:

1. trim — Remove whitespace

Removes leading and trailing whitespace.

{ "type": "trim" }
  • Input: "  Coop Pronto  " (leading and trailing spaces) → Output: "Coop Pronto"

2. lowercase — Convert to lowercase

Converts to lowercase (UTF-8 safe).

{ "type": "lowercase" }
  • Input: COOP PRONTO CHUR → Output: coop pronto chur

3. uppercase — Convert to uppercase

Converts to uppercase (UTF-8 safe).

{ "type": "uppercase" }
  • Input: Coop Pronto Chur → Output: COOP PRONTO CHUR

4. ucwordsfirst — Capitalise after word separators

Capitalises the first letter after each word separator.

{ "type": "ucwordsfirst" }
  • Input: COOP PRONTO CHUR → Output: Coop Pronto Chur
  • Input: migros-rail city → Output: Migros-Rail City
  • Input: O'NEILL STORE → Output: O'Neill Store
  • Input: SAINT-JEAN-DE-MAURIENNE → Output: Saint-Jean-De-Maurienne

Separators: space, hyphen, apostrophe, slash, period, comma, semicolon, colon, parentheses.
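
The behaviour shown in the examples can be approximated with the sketch below. It is a hypothetical re-implementation, not the project's code: the function name `ucwordsFirst` is made up, and the initial lowercasing is an assumption inferred from the examples (upper-case input comes out in title case).

```php
<?php
declare(strict_types=1);

/**
 * Lowercase the value, then capitalise the first letter at the start of the
 * string and after each listed separator. Illustrative sketch only.
 */
function ucwordsFirst(string $value): string
{
    // Assumption: the remaining letters are lowercased, as the examples suggest.
    $lower = mb_strtolower($value, 'UTF-8');

    // Group 1: start of string or one separator (space, hyphen, apostrophe,
    // slash, period, comma, semicolon, colon, parentheses).
    // Group 2: the letter to capitalise.
    $pattern = '~(^|[ \-\'/.,;:()])(\p{L})~u';

    return preg_replace_callback(
        $pattern,
        static fn (array $m): string => $m[1] . mb_strtoupper($m[2], 'UTF-8'),
        $lower
    ) ?? $value; // fall back to the input if the regex engine reports an error
}

foreach (['COOP PRONTO CHUR', 'migros-rail city', "O'NEILL STORE"] as $input) {
    echo $input, ' => ', ucwordsFirst($input), PHP_EOL;
}
// COOP PRONTO CHUR => Coop Pronto Chur
// migros-rail city => Migros-Rail City
// O'NEILL STORE => O'Neill Store
```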


5. replace — String replacement

Replaces a substring with another string (case-sensitive).

{ "type": "replace", "search": "  ", "replace": " " }
  • Input: "Coop  Pronto" (two spaces) → Output: "Coop Pronto" (one space)

6. split — Split column

Splits a value at a delimiter and keeps a defined part.

{ "type": "split", "delimiter": ";", "part": 0 }
  • Input: Coop Pronto Chur;7007 Chur → Output: Coop Pronto Chur

7. regex — Regex replacement

Replaces parts of a string using a regular expression. Uses PHP preg_replace.

{ "type": "regex", "pattern": "^(.*?);.*$", "replace": "$1" }

No match → original value is passed through unchanged (pipeline-safe).

Use capture groups as $1, $2, … in the replace field. A pattern without ^/$ anchors replaces only the matched portion, not the whole value.


8. regexextract — Regex extraction

Extracts a capture group and returns only that. Uses PHP preg_match.

{ "type": "regexextract", "pattern": "(\\d{4,} [^;]+)" }
  • Input: Coop Pronto Chur, 7007 Chur → Output: 7007 Chur
  • No match → empty string

⚠ Not pipeline-safe: A no-match discards all previous pipeline results. Use regex instead if you want to preserve the current value on no-match.
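
The difference between the two regex steps on a no-match can be sketched as follows. The helper names are hypothetical, and returning capture group 1 (falling back to the full match) is an assumption, since the group that regexextract returns is not specified here.

```php
<?php
declare(strict_types=1);

/** Wrap a delimiter-less pattern in slashes with the u flag (hypothetical helper). */
function wrapPattern(string $pattern): string
{
    return '/' . str_replace('/', '\/', $pattern) . '/u';
}

/** regex step: replace matches; a no-match leaves the value untouched. */
function applyRegex(string $value, string $pattern, string $replace): string
{
    return preg_replace(wrapPattern($pattern), $replace, $value) ?? $value;
}

/** regexextract step: return the captured text, or '' when nothing matches. */
function applyRegexExtract(string $value, string $pattern): string
{
    if (preg_match(wrapPattern($pattern), $value, $m) !== 1) {
        return '';
    }
    // Assumption: prefer capture group 1, fall back to the full match.
    return $m[1] ?? $m[0];
}

// regex: match and no-match
var_dump(applyRegex('Coop Pronto Chur;7007 Chur', '^(.*?);.*$', '$1')); // "Coop Pronto Chur"
var_dump(applyRegex('Coop Pronto Chur', '^(.*?);.*$', '$1'));           // "Coop Pronto Chur" (unchanged)

// regexextract: match and no-match
var_dump(applyRegexExtract('Coop Pronto Chur, 7007 Chur', '(\d{4,} [^;]+)')); // "7007 Chur"
var_dump(applyRegexExtract('Coop Pronto Chur', '(\d{4,} [^;]+)'));            // ""
```

In a pipeline, the empty string from a failed extraction becomes the input of every later step, which is why the warning above recommends regex when the current value must survive.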


9. dateformat — Date reformatting

Converts between date formats.

{ "type": "dateformat", "fromFormat": "d.m.Y", "toFormat": "Y-m-d" }
  • Input: 10.12.2025 → Output: 2025-12-10

Supports all PHP DateTime format characters.
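
A conversion of this kind can be sketched with PHP's DateTimeImmutable. The function name `reformatDate` is hypothetical, and returning null for unparsable input is an assumption; how the transformer itself reacts to a date that does not match fromFormat is not documented here.

```php
<?php
declare(strict_types=1);

/**
 * Convert a date string between two PHP date formats.
 * Returns null when the input does not cleanly match $fromFormat.
 */
function reformatDate(string $value, string $fromFormat, string $toFormat): ?string
{
    $date = DateTimeImmutable::createFromFormat($fromFormat, $value);
    $errors = DateTimeImmutable::getLastErrors();

    // getLastErrors() returns false (PHP 8.2+) or an array with zero counts
    // (PHP 8.1) after a clean parse. Warnings flag overflow such as 31.02.
    $hasProblems = is_array($errors)
        && ($errors['warning_count'] > 0 || $errors['error_count'] > 0);

    if ($date === false || $hasProblems) {
        return null;
    }

    return $date->format($toFormat);
}

var_dump(reformatDate('10.12.2025', 'd.m.Y', 'Y-m-d')); // "2025-12-10"
var_dump(reformatDate('2025-12-10', 'd.m.Y', 'Y-m-d')); // NULL (input is not d.m.Y)
```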


10. truncate — Truncate string

Truncates a string to a maximum length.

{ "type": "truncate", "maxLength": 100 }

11. constantvalue — Constant value from metadata

Injects an extracted metadata value as a constant for every row.

{
  "sourceColumn": "_constant_",
  "transformations": [
    { "type": "constantvalue", "metadataKey": "account_iban" }
  ],
  "outputColumn": "account_iban",
  "outputAction": "create"
}
  • Every row receives the extracted account_iban value (e.g. CH93 0077 2020 6262 5252 7, exactly as captured by the extraction rule) in a new column.

12. map — Copy / rename column

Copies a column value as-is (optionally to a new name).

{ "type": "map" }

13. pipeline — Nested pipeline

Runs a sub-pipeline as a single transformation step.

{
  "type": "pipeline",
  "steps": [
    { "type": "trim" },
    { "type": "lowercase" },
    { "type": "ucwordsfirst" }
  ]
}

Useful for grouping steps as a logical unit within a transformations array.


Pipeline example

Multiple transformations chained:

{
  "sourceColumn": "BookingText",
  "transformations": [
    { "type": "trim" },
    { "type": "replace", "search": "  ", "replace": " " },
    { "type": "lowercase" },
    { "type": "ucwordsfirst" }
  ],
  "outputColumn": "description",
  "outputAction": "overwrite"
}

Processing:

  1. "  COOP  PRONTO  " → trim → "COOP  PRONTO" (outer spaces removed, two spaces remain between the words)
  2. "COOP  PRONTO" → replace → "COOP PRONTO" (double space collapsed to one)
  3. "COOP PRONTO" → lowercase → "coop pronto"
  4. "coop pronto" → ucwordsfirst → "Coop Pronto"
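
The left-to-right processing above can be sketched as a small step runner. This is an illustration under assumptions, not the project's ColumnTransformer: `runPipeline` is a hypothetical name, and only a subset of the step types is implemented.

```php
<?php
declare(strict_types=1);

/**
 * Apply a list of transformation steps to a value, left to right.
 * Only a handful of step types are sketched here.
 *
 * @param array<int, array<string, mixed>> $steps
 */
function runPipeline(string $value, array $steps): string
{
    foreach ($steps as $step) {
        // The output of each step becomes the input of the next one.
        $value = match ($step['type']) {
            'trim'      => trim($value),
            'lowercase' => mb_strtolower($value, 'UTF-8'),
            'uppercase' => mb_strtoupper($value, 'UTF-8'),
            'replace'   => str_replace($step['search'], $step['replace'], $value),
            'split'     => explode($step['delimiter'], $value)[$step['part']] ?? '',
            'truncate'  => mb_substr($value, 0, $step['maxLength'], 'UTF-8'),
            'map'       => $value,
            // A nested pipeline is a recursive call on its own steps.
            'pipeline'  => runPipeline($value, $step['steps']),
            default     => throw new InvalidArgumentException(
                "Unsupported step type: {$step['type']}"
            ),
        };
    }

    return $value;
}

$steps = [
    ['type' => 'trim'],
    ['type' => 'replace', 'search' => '  ', 'replace' => ' '],
    ['type' => 'lowercase'],
];

var_dump(runPipeline('  COOP  PRONTO  ', $steps)); // "coop pronto"
```

Treating the nested pipeline type as a recursive call keeps the runner to a single loop, at the cost of no protection against very deep nesting.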

CLI Reference

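php bin/transformer.php <command> [input] [config] [options]

Argument order differs by command in the examples throughout this document: test and transform take the input file first and the configuration second, validate takes the configuration first and the input file second, and auto-import takes only the configuration.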

Commands

| Command | Description |
|---|---|
| test | Test run (up to 10 rows) |
| transform | Full transformation |
| validate | Validate configuration |
| auto-import | Directory monitoring |
| help | Show help |

Options

| Option | Description |
|---|---|
| --debug, -d | Enable debug mode |
| --rows=N | Max. N rows (test command) |
| --output=FILE, -o | Output path |
| --strict | Strict validation |
| --watch | Continuous monitoring |
| --interval=SEC | Check interval in seconds (default: 60) |
| --dry-run | Simulation mode, no real operations |

Debug Mode

php bin/transformer.php test input.csv config/config.json --debug

Log categories

| Category | When |
|---|---|
| transformer | Start/end of transformation |
| csv_reader | While reading CSV |
| metadata | During metadata extraction |
| metadata_warning | On extraction problems |
| transformation | For each transformation step |
| csv_writer | While writing output CSV |

Debug log output (JSON)

{
  "success": true,
  "debug_logs": [
    {
      "timestamp": 1702200120.5432,
      "category": "transformer",
      "message": "Transformation started",
      "data": { "inputFile": "input.csv", "maxRows": 0 }
    },
    {
      "timestamp": 1702200120.5445,
      "category": "metadata",
      "message": "Extraction rule applied",
      "data": { "rule_name": "account_iban", "value": "CH93..." }
    }
  ]
}
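
When the log grows long, filtering by category helps. The sketch below assumes the JSON shown above has been captured into a string (how the CLI emits it, for example on stdout or into a file, is not specified here); `filterDebugLogs` is a hypothetical helper name.

```php
<?php
declare(strict_types=1);

/**
 * Return the debug log entries of one category from the JSON result.
 *
 * @return array<int, array<string, mixed>>
 */
function filterDebugLogs(string $json, string $category): array
{
    // Throws JsonException on malformed input instead of returning null.
    $result = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

    return array_values(array_filter(
        $result['debug_logs'] ?? [],
        static fn (array $entry): bool => ($entry['category'] ?? '') === $category
    ));
}

// Shortened copy of the sample output above.
$json = <<<'JSON'
{
  "success": true,
  "debug_logs": [
    { "timestamp": 1702200120.5432, "category": "transformer",
      "message": "Transformation started",
      "data": { "inputFile": "input.csv", "maxRows": 0 } },
    { "timestamp": 1702200120.5445, "category": "metadata",
      "message": "Extraction rule applied",
      "data": { "rule_name": "account_iban", "value": "CH93..." } }
  ]
}
JSON;

foreach (filterDebugLogs($json, 'metadata') as $entry) {
    printf("[%s] %s\n", $entry['category'], $entry['message']);
}
// [metadata] Extraction rule applied
```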

Firefly III Integration

The transformer can automatically import transformed files into Firefly III. Three operating modes cover all typical deployment scenarios.

Prerequisites (all modes)

1. Create a Firefly III Data Importer JSON configuration file

This file maps transformed CSV columns to Firefly III transaction fields (format v3).

Recommended approach: upload a sample CSV once in the Firefly III Data Importer Web UI, configure the column mapping there, then download the finished configuration. Alternatively, use config/firefly-import-config.example.json as a template and adjust default_account to your asset account ID.

2. Choose an operating mode — see sections below.


Mode cli — Transformer and Firefly on the same server

Both the transformer and the Firefly III Data Importer run on the same server. The transformer calls the importer directly as a local command.

"fireflyImport": {
  "mode": "cli",
  "jsonConfig": "/opt/firefly-data-importer/storage/configurations/ubs-import.json",
  "importerCommand": "php /opt/firefly-data-importer/artisan importer:import",
  "autoImport": true,
  "deleteAfterImport": false,
  "timeout": 300,
  "environment": {
    "FIREFLY_III_URL": "https://localhost",
    "FIREFLY_III_ACCESS_TOKEN": "your-token-here"
  }
}

Mode docker — Transformer local, Firefly in Docker

The transformer runs locally or in its own container; the Firefly III Data Importer runs in a Docker container. The transformer calls the importer via docker exec.

Important: The transformer's output directory must be mounted as a volume in the importer container. jsonConfig is the path inside the container (not a local path). Do not use the -it flag (no TTY).

Example docker-compose.yml for the importer:

services:
  firefly-importer:
    image: fireflyiii/data-importer:latest
    volumes:
      - /opt/ff-imp-preprocessor/import:/import
    environment:
      - FIREFLY_III_URL=https://your-firefly.com
      - FIREFLY_III_ACCESS_TOKEN=your-token-here
      - CAN_POST_FILES=false

Transformer configuration:

"fireflyImport": {
  "mode": "docker",
  "jsonConfig": "/import/configs/ubs-import.json",
  "importerCommand": "docker exec firefly-importer php artisan importer:import",
  "autoImport": true,
  "deleteAfterImport": false,
  "timeout": 300
}

The JSON config file must be available inside the container — either via a volume mount or docker cp:

docker cp ubs-import.json firefly-importer:/import/configs/ubs-import.json

Mode http — Transformer local, Firefly importer on a remote server

The transformer runs locally; the Firefly III Data Importer is reachable over HTTP(S). The CSV and JSON configuration are sent as a multipart HTTP upload to the importer.

Requirements on the importer server:

CAN_POST_FILES=true
AUTO_IMPORT_SECRET=<secret>  # at least 16 characters

Local requirement: PHP extension ext-curl

"fireflyImport": {
  "mode": "http",
  "importerUrl": "https://importer.your-server.com",
  "importerSecret": "your-auto-import-secret-min-16-chars",
  "jsonConfig": "/local/path/to/ubs-import.json",
  "autoImport": true,
  "deleteAfterImport": false,
  "timeout": 300
}

The transformer sends files to POST {importerUrl}/autoupload. The JSON config lives locally — the transformer uploads it together with the CSV. No volume mount or SSH access to the remote server is required.
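
A rough sketch of such an upload with ext-curl follows. It is not the project's FireflyImporter. Two details are assumptions taken from my reading of the Data Importer's automated-import documentation and should be verified against your importer version: the secret is passed as a `secret` query parameter, and the form fields are named `importable` (CSV) and `json` (configuration). Both function names are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Build the pieces of an auto-upload request from the fireflyImport settings.
 * Assumptions: "secret" query parameter; form fields "importable" and "json".
 *
 * @param array{importerUrl:string,importerSecret:string,jsonConfig:string} $config
 * @return array{url:string,files:array<string,string>}
 */
function buildAutoUploadRequest(array $config, string $csvPath): array
{
    if (strlen($config['importerSecret']) < 16) {
        throw new InvalidArgumentException('importerSecret must be at least 16 characters');
    }

    $url = rtrim($config['importerUrl'], '/') . '/autoupload?'
        . http_build_query(['secret' => $config['importerSecret']]);

    return [
        'url'   => $url,
        'files' => ['importable' => $csvPath, 'json' => $config['jsonConfig']],
    ];
}

/** Send the request with ext-curl and return the HTTP status code. */
function sendAutoUpload(array $request, int $timeout = 300): int
{
    // CURLFile attaches each local file to the multipart body.
    $fields = array_map(
        static fn (string $path): CURLFile => new CURLFile($path),
        $request['files']
    );

    $ch = curl_init($request['url']);
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => $fields, // an array body is sent as multipart/form-data
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => $timeout,
        CURLOPT_HTTPHEADER     => ['Accept: application/json'],
    ]);
    curl_exec($ch);

    return (int) curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
}

$request = buildAutoUploadRequest([
    'importerUrl'    => 'https://importer.your-server.com',
    'importerSecret' => 'your-auto-import-secret-min-16-chars',
    'jsonConfig'     => '/local/path/to/ubs-import.json',
], '/local/path/to/output.csv');

echo $request['url'], PHP_EOL;
// Call sendAutoUpload($request) to perform the actual upload.
```

Separating request construction from sending keeps the secret-length check and URL handling testable without network access.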


Usage

# Transformation + automatic import (when autoImport=true)
php bin/transformer.php transform input.csv config/config.json

# Watch mode: trigger on new CSV in source directory
php bin/transformer.php auto-import config/config.json --watch

Architecture

Components

bin/transformer.php (CLI entry point)
  ↓
TransformerEngine (orchestration)
  ├─ ConfigurationLoader (load / validate config)
  ├─ CsvReader (read CSV)
  ├─ MetadataExtractor (metadata via regex)
  ├─ ColumnTransformer (apply transformations)
  ├─ CsvWriter (write CSV)
  ├─ FireflyImporter (Firefly III integration)
  └─ DebugLogger (debug logs)

Data flow

Input CSV
  ↓
CsvReader::readMetadataLines() → array of lines
  ↓
MetadataExtractor::extract() → {iban: "...", currency: "..."}
  ↓
CsvReader::readCsvData() → array of rows
  ↓
ColumnTransformer::transformRow() → transformed row (pipeline)
  ↓
CsvWriter::write() → output CSV

Classes

| Class | Responsibility |
|---|---|
| TransformerEngine | Orchestrates the entire workflow |
| ConfigurationLoader | Loads and validates JSON configuration |
| CsvReader | Reads CSV with metadata support |
| MetadataExtractor | Extracts metadata via regex |
| ColumnTransformer | Transforms columns (pipeline) |
| CsvWriter | Writes output CSV |
| FireflyImporter | Imports into Firefly III |
| DebugLogger | Static logger for debug output |

Error Handling

Common errors

"Input file not found"

# Check the file path
ls -la input.csv

# Use an absolute path if relative paths do not work
php bin/transformer.php transform /absolute/path/input.csv config/config.json

"Missing metadata: account_iban"

The IBAN could not be extracted — wrong regex or wrong line number.

# Inspect the first lines of the CSV
head -5 input.csv

# Validate with debug output
php bin/transformer.php validate config/config.json input.csv --debug

"Invalid JSON: …"

Syntax error in config.json.

# Prints the parser error and exits non-zero if the file is not valid JSON
php -r 'json_decode(file_get_contents("config/config.json"), true, 512, JSON_THROW_ON_ERROR); echo "JSON valid\n";'

"Configuration: 'csvStructure.headerLine' required"

A required configuration field is missing.

diff config/config.json config/config.example.json

Exception handling

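try {
    // Expected problems are reported through the result array.
    $result = $engine->transform($inputFile);
    if (!$result['success']) {
        echo "Error: " . $result['error'] . PHP_EOL;
    }
} catch (Exception $e) {
    // Unexpected failures surface as exceptions.
    echo "Fatal error: " . $e->getMessage() . PHP_EOL;
}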

Tips

UTF-8 handling

The transformer uses UTF-8 safe functions throughout:

  • mb_strtolower() instead of strtolower()
  • mb_strtoupper() instead of strtoupper()
  • mb_strlen() for correct character counting

Supported encodings: UTF-8, ISO-8859-1, CP1252.

Regex tips

Pattern without delimiters (auto-wrapped):

"pattern": "IBAN:\\s*([A-Z0-9 ]+)"
// becomes: /IBAN:\s*([A-Z0-9 ]+)/u

With explicit flags:

"pattern": "/IBAN:\\s*([A-Z0-9 ]+)/iu"
// case-insensitive
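
The auto-wrapping rule can be sketched as below. This is a guess at the behaviour described above, not the project's code: `normalisePattern` is a hypothetical name, a bare pattern that happens to start and end with a slash is treated as already delimited, and slashes inside bare patterns are assumed not to be escaped yet.

```php
<?php
declare(strict_types=1);

/**
 * Return a pattern that the preg_* functions accept.
 * Patterns already written as /.../flags pass through unchanged; bare
 * patterns are wrapped in slashes with the u (UTF-8) flag.
 */
function normalisePattern(string $pattern): string
{
    // Already delimited: starts with "/" and ends with "/" plus optional flags.
    if (preg_match('~^/.*/[a-zA-Z]*$~s', $pattern) === 1) {
        return $pattern;
    }

    // Escape literal slashes so they cannot end the pattern early.
    return '/' . str_replace('/', '\/', $pattern) . '/u';
}

echo normalisePattern('IBAN:\s*([A-Z0-9 ]+)'), PHP_EOL;     // /IBAN:\s*([A-Z0-9 ]+)/u
echo normalisePattern('/IBAN:\s*([A-Z0-9 ]+)/iu'), PHP_EOL; // /IBAN:\s*([A-Z0-9 ]+)/iu
```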

Performance

  • Optimised for: up to 1 million rows
  • Typical file size: 10 k to 100 k rows

Batch processing

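#!/bin/bash
# Transform every CSV in the source directory, then move it to the archive
# or the error directory depending on the transformer's exit status.
shopt -s nullglob   # skip the loop when no CSV files are present

for file in import/source/*.csv; do
    if php bin/transformer.php transform "$file" config/config.json; then
        mv "$file" import/archive/
    else
        mv "$file" import/error/
    fi
done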

Version History

v1.0.0 (03 May 2026)

  • Initial release
  • 13 transformation types
  • Metadata extraction via regex
  • Debug mode
  • Firefly III integration (cli / docker / http)
  • Full documentation

License: GPL-3.0
Author: PHP CSV Transformer Project
Repository: git.andare.ch/david.reindl/ff-imp-preprocessor