Based in Brazil

Scan Tracking Codes With Your Camera: Building OCR Into a Package Tracker

ios vision ocr swift indie-dev

The best features remove a step the user didn’t know they were taking.


The Problem With Tracking Codes

If you’ve ever ordered anything in Brazil, you know the drill. You get a tracking code from Correios, or Shopee, or AliExpress. It arrives in one of many forms:

  • A printed slip taped to a shelf at the post office
  • A WhatsApp message from the seller: “Aqui o rastreio: AD123456789BR”
  • An email buried in a purchase confirmation
  • A screenshot someone sent you

In every case, you end up squinting at a 13-character alphanumeric code and thumb-typing it into a tracking app. AD123456789BR. One wrong character and nothing comes up. It’s a small friction — but it happens every single time.

I built Packybara to track packages with a capybara companion. But the add-package flow had this exact friction point. Paste from clipboard helped (and we detect that automatically), but what about the printed slip? The WhatsApp screenshot?

So I pointed the camera at the problem.


How It Works

The flow is dead simple from the user’s perspective:

  1. Tap the camera icon in the add package sheet
  2. Take a photo (or choose one from the library)
  3. Packybara scans the image, finds the tracking code, detects the carrier
  4. Code auto-populates. Carrier chip turns green. Tap “Track.”

Three taps instead of 13 characters of careful typing.

Under the hood, it’s Apple’s Vision framework doing the heavy lifting.


The Technical Stack

Vision Framework OCR

import Vision

let request = VNRecognizeTextRequest()
request.recognitionLevel = .accurate       // favor accuracy over speed
request.usesLanguageCorrection = false     // keep codes exactly as printed

Two important choices here:

.accurate over .fast — tracking codes are short alphanumeric strings on noisy backgrounds (shipping labels, thermal prints, low-res screenshots). Accuracy matters more than speed. The difference is ~200ms, and the user is already waiting for the camera to dismiss.

usesLanguageCorrection = false — this is critical. Language correction would “fix” AD123456789BR into something that looks like a word. Tracking codes aren’t words. They’re structured identifiers. Turning off correction preserves them exactly as printed.
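To show where these two settings sit in a complete call, here is a minimal sketch of running the request against a CGImage and collecting one string per observation. The function name recognizeTextLines is hypothetical (it is not taken from Packybara’s source), and the typed request.results assumes the iOS 15 / macOS 12 SDK or later.

```swift
import Vision
import CoreGraphics

/// Hypothetical helper: runs Vision text recognition on an image and
/// returns the top candidate string for each text observation.
func recognizeTextLines(in image: CGImage) throws -> [String] {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate      // accuracy over speed
    request.usesLanguageCorrection = false    // do not "fix" codes into words

    // The handler performs the request synchronously on the calling thread.
    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])

    // Assumes the typed `results` property from the iOS 15 / macOS 12 SDK.
    let observations = request.results ?? []
    return observations.compactMap { $0.topCandidates(1).first?.string }
}
```

Because perform runs synchronously, an app would typically call something like this off the main thread.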

Token Extraction

Vision returns text observations — blocks of recognized text. A shipping label might contain the sender’s name, address, barcode, and tracking code all at once. I split every observation into tokens by whitespace, trim punctuation, uppercase everything, and deduplicate with a Set.

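var seen = Set<String>()  // deduplicates tokens across observations
for observation in results {
    // Take the most confident transcription; skip observations without one
    guard let text = observation.topCandidates(1).first?.string else { continue }
    for token in text.components(separatedBy: .whitespacesAndNewlines) {
        let cleaned = token.trimmingCharacters(in: .punctuationCharacters).uppercased()
        // Skip empty fragments and tokens that were already checked
        guard !cleaned.isEmpty, seen.insert(cleaned).inserted else { continue }
        if let carrier = CarrierDetector.detect(trackingCode: cleaned) {
            // Found a valid tracking code
        }
    }
}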
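The same cleanup can be pulled out into a pure function that is easy to test without Vision. This is a sketch under my own assumptions: candidateTokens is a hypothetical name, and the input is simply the recognized strings, one per observation.

```swift
import Foundation

/// Hypothetical helper: splits recognized lines into cleaned, uppercased,
/// de-duplicated tokens, keeping first-seen order.
func candidateTokens(from lines: [String]) -> [String] {
    var seen = Set<String>()
    var tokens: [String] = []
    for line in lines {
        for raw in line.components(separatedBy: .whitespacesAndNewlines) {
            let cleaned = raw.trimmingCharacters(in: .punctuationCharacters).uppercased()
            // Drop empty fragments (from repeated spaces) and repeats.
            if !cleaned.isEmpty && seen.insert(cleaned).inserted {
                tokens.append(cleaned)
            }
        }
    }
    return tokens
}

let sample = candidateTokens(from: [
    "Aqui o rastreio: AD123456789BR",
    "rastreio  ad123456789br."
])
print(sample)  // ["AQUI", "O", "RASTREIO", "AD123456789BR"]
```

The trailing colon and period are trimmed, and the lowercase repeat on the second line collapses into the token already seen.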

Carrier Detection via Regex

The magic filter: not every string on a shipping label is a tracking code. CarrierDetector validates each token against known patterns:

Carrier      Pattern                      Example
Correios     ^[A-Z]{2}\d{9}[A-Z]{2}$      AD123456789BR
Shopee       ^SPXBR\d+$                   SPXBR042820003
AliExpress   ^(LP|CNG)\d+$                LP532029401CN
Anjun        ^AJ\d+$                      AJ12345678
J&T          ^JT\d+$                      JT98765432

Only tokens that match a known carrier pattern are returned as results. This filters out most false positives: addresses, names, and dates rarely fit any of these shapes.
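A detector along these lines can be sketched in a few lines of Foundation. The patterns are copied from the table above; the Carrier enum, its case names, and the first-match-wins order are my assumptions, not the actual contents of CarrierDetector.swift.

```swift
import Foundation

/// Hypothetical carrier list; patterns are copied from the table above.
enum Carrier: String, CaseIterable {
    case correios = "Correios"
    case shopee = "Shopee"
    case aliExpress = "AliExpress"
    case anjun = "Anjun"
    case jt = "J&T"

    var pattern: String {
        switch self {
        case .correios:   return #"^[A-Z]{2}\d{9}[A-Z]{2}$"#
        case .shopee:     return #"^SPXBR\d+$"#
        case .aliExpress: return #"^(LP|CNG)\d+$"#
        case .anjun:      return #"^AJ\d+$"#
        case .jt:         return #"^JT\d+$"#
        }
    }
}

enum CarrierDetector {
    /// Returns the first carrier (in declaration order) whose pattern
    /// matches the whole token, or nil when nothing matches.
    static func detect(trackingCode: String) -> Carrier? {
        Carrier.allCases.first(where: { carrier in
            trackingCode.range(of: carrier.pattern, options: .regularExpression) != nil
        })
    }
}

print(CarrierDetector.detect(trackingCode: "AD123456789BR")?.rawValue ?? "none")  // Correios
print(CarrierDetector.detect(trackingCode: "RASTREIO")?.rawValue ?? "none")       // none
```

One thing this sketch surfaces: the AliExpress example as printed, LP532029401CN, ends in two letters, so it matches the Correios pattern (two letters, nine digits, two letters) rather than ^(LP|CNG)\d+$. The real CarrierDetector may order or write its patterns differently.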

Multiple Codes in One Image

Sometimes a photo contains multiple tracking codes. A seller might send a screenshot with two orders, or a shipping label might have both a domestic and international code.

When the scanner finds multiple valid codes, it presents a picker sheet. User taps the one they want. Simple.

If only one code is found, it auto-populates directly. No extra step.
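The branching between "auto-populate" and "show a picker" can be modeled as a small value type that the UI switches on. This is a sketch only: DetectedCode, ScanOutcome, and resolveOutcome are hypothetical names, and the carrier is kept as a plain string to stay self-contained.

```swift
import Foundation

/// Hypothetical result of matching one token against the carrier patterns.
struct DetectedCode: Hashable {
    let code: String
    let carrier: String
}

/// Hypothetical outcome the add-package sheet could switch on.
enum ScanOutcome: Equatable {
    case noneFound                  // show "No tracking code found."
    case single(DetectedCode)       // auto-populate the field
    case multiple([DetectedCode])   // present the picker sheet
}

/// Removes repeats (keeping first-seen order), then picks the outcome.
func resolveOutcome(_ detected: [DetectedCode]) -> ScanOutcome {
    var seen = Set<DetectedCode>()
    let unique = detected.filter { seen.insert($0).inserted }
    switch unique.count {
    case 0:  return .noneFound
    case 1:  return .single(unique[0])
    default: return .multiple(unique)
    }
}
```

De-duplicating before counting matters here: the same code often appears twice on a label (once as text, once under the barcode), and that should not trigger the picker.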


Why This Matters for Brazilian Users

Brazil’s e-commerce is exploding. Shopee and AliExpress are the top international platforms. Correios handles domestic delivery. Most users track 3-5 packages at any given time.

The tracking code experience is uniquely painful here because:

  • Multiple carriers per household — someone has a Correios package from Mercado Livre, a Shopee package from China, and an AliExpress order from last month
  • Codes arrive via WhatsApp — Brazil’s primary communication channel. Sellers send tracking codes as text messages, not deep links
  • Post office receipts are thermal prints — they fade, they’re hard to read, and you’re standing in line trying to type the code before you leave

Camera scan turns all of these into a few taps. Point at the WhatsApp screenshot. Point at the thermal receipt. Point at the email on another screen. Done.


Connecting to the Bigger Picture

This OCR feature became even more powerful after I shipped Siri integration. Now the full flow is:

  1. Scan a tracking code with the camera → package added
  2. Ask Siri “Cadê meu pacote?” → get status by voice
  3. Delivery automation fires → texts your family when it arrives

Three features that independently make sense, but together create a system where you barely interact with the app directly. The OS and the camera do the work.


What OCR Gets Wrong

I’ll be honest about the limitations:

  • Blurry photos: Vision’s .accurate mode handles moderate blur, but a shaky hand on a crumpled receipt still fails. The user just retakes the photo.
  • Handwritten codes: some sellers write tracking codes by hand. Vision’s handwriting recognition exists, but I haven’t tuned for it yet. These still need manual entry.
  • Non-Brazilian carriers: the regex patterns only match the 5 carriers Packybara supports. A DHL or FedEx code on the same label gets silently ignored. That’s by design, since we don’t track those carriers (yet).
  • Partial codes: if the photo crops the last 2 characters, the regex won’t match and the user gets “No tracking code found.” The error message could be more helpful here.

None of these are dealbreakers. The feature works reliably for the 90% case: a clear photo of a standard Brazilian shipping label or a screenshot of a tracking code message.
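For the partial-code case, one possible way to make the error more helpful is to look for tokens that resemble the start of a Correios code once no full match has been found. This is a sketch of an idea, not something Packybara ships: both function names, the "at least five digits" threshold, and the message wording are my assumptions.

```swift
import Foundation

/// Hypothetical check: two letters, then 5 to 9 digits, then at most one
/// trailing letter. That is the shape of a Correios code cut off at the end.
func looksLikeTruncatedCorreiosCode(_ token: String) -> Bool {
    token.range(of: #"^[A-Z]{2}\d{5,9}[A-Z]?$"#, options: .regularExpression) != nil
}

/// Builds the message to show when no complete code matched.
/// Call this only after full-pattern matching has failed, because some
/// complete codes from other carriers (for example JT98765432) also fit
/// the truncated shape.
func failureMessage(forTokens tokens: [String]) -> String {
    if let partial = tokens.first(where: looksLikeTruncatedCorreiosCode) {
        return "Found \"\(partial)\", which looks like a cut-off code. "
            + "Try retaking the photo with the whole code in frame."
    }
    return "No tracking code found."
}
```

With tokens like ["AQUI", "AD123456789"], the message names the partial code instead of giving the generic failure.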


The Implementation Was Small

The entire OCR scanner is ~40 lines of code. TrackingCodeScanner.swift does the Vision request. CarrierDetector.swift (which already existed for clipboard detection) does the pattern matching. The UI integration in AddPackageSheet is another ~20 lines for the camera button, scan options dialog, and result handling.

Total effort: about 2 hours from idea to shipped feature.

The lesson: sometimes the highest-impact features are the ones that remove friction you’ve been tolerating. Nobody asked for OCR scanning. But once it existed, typing a tracking code felt barbaric.

The capybara agrees.