March 10, 20265 min read

Building MoMo Parse API — Turning SMS Chaos into Structured Data

PythonRegexFintechAPIOpen Source

Building MoMo Parse API — Turning SMS Chaos into Structured Data

If you've ever used Mobile Money in Ghana, you know the drill — every transaction triggers an SMS that looks like it was formatted by someone in a hurry. Different telcos, different formats, abbreviations everywhere. Now imagine trying to extract meaningful financial data from thousands of those messages programmatically.

That's the problem MoMo Parse solves.

The Problem

Fintech apps, lending platforms, and financial institutions in Ghana need transaction intelligence from MoMo users. But MoMo transaction data lives in unstructured SMS messages — each telco (MTN, Telecel, AirtelTigo) formats them differently, and even within the same telco, different transaction types look completely different.

A typical MTN MoMo SMS might say something like:

You have received GHS 50.00 from KWAME MENSAH (024XXXXXXX).
Your new balance is GHS 120.50. Transaction ID: 12345678.
Fee charged: GHS 0.00. Date: 10/03/2026 Time: 14:30:05.

Multiply that by nine transaction types across three telcos, and you have a parsing nightmare.

The Solution: A Three-Stage Pipeline

I designed MoMo Parse as a three-stage processing pipeline:

1. Telco Detection

The first stage identifies which provider sent the message using keyword matching and sender-ID analysis. If you pass a sender ID like "MobileMoney", it narrows the search immediately. Otherwise, content-based heuristics figure it out.

2. Template Matching

Each (telco, transaction type) combination has its own regex template stored as a JSON config. All matching templates compete against the input — the highest-scoring template wins. This approach means adding support for a new message format is as simple as dropping a new JSON object into parser/configs/.

3. Field Extraction

Named regex capture groups pull out the financial data — amount, currency, balance, fee, counterparty details, transaction ID, date, and time. Every result comes with a confidence score:

1.0 — perfect match, all fields extracted
0.8 — partial match, some fields missing
0.0 — unrecognized format

from momoparse import parse

result = parse(sms_text, sender_id="MobileMoney")

print(result.telco)        # "mtn"
print(result.tx_type)      # "transfer_received"
print(result.amount)       # 50.00
print(result.currency)     # "GHS"
print(result.balance)      # 120.50
print(result.confidence)   # 1.0

Beyond Parsing: The API Layer

The open-source parser is just the foundation. On top of it, I built a REST API that adds:

Categorization — Automatically assigns financial categories (rent, salary, groceries) to transactions
Enrichment — Batch analytics from 1,000+ SMS in a single request
Financial Profiles — Monthly income metrics, expense ratios, business activity scoring, and risk signals

curl -X POST https://web-production-5aa38.up.railway.app/v1/parse \
  -H "X-API-Key: sk-sandbox-momoparse" \
  -H "Content-Type: application/json" \
  -d '{"sms_text": "YOUR_SMS_HERE"}'

The sandbox key (sk-sandbox-momoparse) gives you 100 calls per day with no registration — enough to test the API and build a proof of concept.

Technical Decisions

Why Regex Over NLP?

This was the biggest decision. I could have thrown an LLM at the problem, but:

Speed — Regex parsing is sub-millisecond. LLM calls take seconds and cost money per request
Determinism — Same input always produces the same output. No temperature, no hallucinations
Offline capability — The core parser works without an internet connection
Cost — Zero API costs for the parsing layer

The trade-off is maintenance — new SMS formats require new regex templates. But I designed the config system to make this trivial. Each template is a single JSON object, and the community can contribute new patterns through pull requests.

JSON-Based Template Configs

Instead of hardcoding regex patterns in Python, I externalized them into JSON config files. This means:

Non-developers can review and understand the patterns
Adding a new telco or transaction type doesn't touch core logic
Templates are version-controlled and easy to diff

Poetry for Dependency Management

I chose Poetry over pip/setuptools for reproducible builds and a clean pyproject.toml. The package is published to PyPI as momoparse — one pip install and you're running.

Railway for Deployment

The API runs on Railway with Docker. The setup is minimal — a Dockerfile, railway.toml, and environment variables. Auto-deploys on push to main.

Challenges

The Long Tail of SMS Formats

The hardest part wasn't building the parser — it was collecting enough real SMS samples to cover edge cases. Telcos occasionally change their message formats without warning, amounts can appear with or without comma separators, and some messages get truncated by carriers.

I built a corpus/ directory with sample messages for testing. Every bug report that comes in with a new format gets added to the corpus and a matching template gets created.

Confidence Scoring

Deciding when a parse is "good enough" was tricky. A message might match a template but miss the transaction ID. Is that a 0.8 or a 0.5? I settled on a simple rule: if the core financial fields (amount, transaction type) are present, it's at least 0.8. If all fields are present, it's 1.0. Anything below the core fields is 0.0.

What I Learned

Building MoMo Parse taught me that the best architecture for messy real-world data is often the simplest one. Regex gets a bad reputation, but for structured-ish text with known patterns, it's unbeatable on speed and reliability.

It also reinforced that open-source design matters. Making templates JSON-based instead of hardcoded Python means the project can grow beyond what I personally encounter. Someone using Telecel in Tamale might see SMS formats I've never seen in Kumasi — and they can contribute a fix without understanding the parser internals.

MoMo Parse is open source and published on PyPI. Check it out on GitHub or install it with pip install momoparse.

← Back to blog

March 10, 20265 min read

Building MoMo Parse API — Turning SMS Chaos into Structured Data

PythonRegexFintechAPIOpen Source

Building MoMo Parse API — Turning SMS Chaos into Structured Data

That's the problem MoMo Parse solves.

The Problem

A typical MTN MoMo SMS might say something like:

You have received GHS 50.00 from KWAME MENSAH (024XXXXXXX).
Your new balance is GHS 120.50. Transaction ID: 12345678.
Fee charged: GHS 0.00. Date: 10/03/2026 Time: 14:30:05.

Multiply that by nine transaction types across three telcos, and you have a parsing nightmare.

The Solution: A Three-Stage Pipeline

I designed MoMo Parse as a three-stage processing pipeline:

1. Telco Detection

2. Template Matching

3. Field Extraction

Named regex capture groups pull out the financial data — amount, currency, balance, fee, counterparty details, transaction ID, date, and time. Every result comes with a confidence score:

1.0 — perfect match, all fields extracted
0.8 — partial match, some fields missing
0.0 — unrecognized format

from momoparse import parse

result = parse(sms_text, sender_id="MobileMoney")

print(result.telco)        # "mtn"
print(result.tx_type)      # "transfer_received"
print(result.amount)       # 50.00
print(result.currency)     # "GHS"
print(result.balance)      # 120.50
print(result.confidence)   # 1.0

Beyond Parsing: The API Layer

The open-source parser is just the foundation. On top of it, I built a REST API that adds:

Categorization — Automatically assigns financial categories (rent, salary, groceries) to transactions
Enrichment — Batch analytics from 1,000+ SMS in a single request
Financial Profiles — Monthly income metrics, expense ratios, business activity scoring, and risk signals

curl -X POST https://web-production-5aa38.up.railway.app/v1/parse \
  -H "X-API-Key: sk-sandbox-momoparse" \
  -H "Content-Type: application/json" \
  -d '{"sms_text": "YOUR_SMS_HERE"}'

The sandbox key (sk-sandbox-momoparse) gives you 100 calls per day with no registration — enough to test the API and build a proof of concept.

Technical Decisions

Why Regex Over NLP?

This was the biggest decision. I could have thrown an LLM at the problem, but:

Speed — Regex parsing is sub-millisecond. LLM calls take seconds and cost money per request
Determinism — Same input always produces the same output. No temperature, no hallucinations
Offline capability — The core parser works without an internet connection
Cost — Zero API costs for the parsing layer

JSON-Based Template Configs

Instead of hardcoding regex patterns in Python, I externalized them into JSON config files. This means:

Non-developers can review and understand the patterns
Adding a new telco or transaction type doesn't touch core logic
Templates are version-controlled and easy to diff

Poetry for Dependency Management

I chose Poetry over pip/setuptools for reproducible builds and a clean pyproject.toml. The package is published to PyPI as momoparse — one pip install and you're running.

Railway for Deployment

The API runs on Railway with Docker. The setup is minimal — a Dockerfile, railway.toml, and environment variables. Auto-deploys on push to main.

Challenges

The Long Tail of SMS Formats

I built a corpus/ directory with sample messages for testing. Every bug report that comes in with a new format gets added to the corpus and a matching template gets created.

Confidence Scoring

What I Learned

MoMo Parse is open source and published on PyPI. Check it out on GitHub or install it with pip install momoparse.