IPA Transcription in Kilobytes with Zig, LLaMA 2, and WASM

LLVM, take the wheel

🎗️ Very much WIP.

TLDR

  • We made a super small grapheme-to-phoneme (G2P) model (an IPA transcriber).
  • The model is based on LLaMA 2.
  • Phoneme error rate (PER) is 1% for Korean and 13% for English.
  • The model is less than 500 KB on disk, so you can run it on a toaster.
  • We release Zig, Python, and JavaScript libraries using WASM.
  • You can download the libraries from https://hamanlp.org

Experimental setup

Datasets

We use the following datasets.

Model

We modified the LLaMA 2 implementation from https://github.com/cgbur/llama2.zig.
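
For context, llama2.c-style ports describe the whole model with a small configuration struct, and the three knobs we sweep below map onto its hidden size, layer count, and feed-forward width. Here is a rough sketch; the field names and example values are illustrative, not copied from llama2.zig.

```zig
const std = @import("std");

// Illustrative only: a llama2.c-style model configuration.
// The grid search varies dim, n_layers, and hidden_dim; the remaining
// fields would stay fixed across runs in such a sweep.
const Config = struct {
    dim: u32, // Transformer hidden size
    hidden_dim: u32, // feed-forward network intermediate size
    n_layers: u32, // number of Transformer layers
    n_heads: u32, // attention heads
    vocab_size: u32, // grapheme + phoneme vocabulary size
    seq_len: u32, // maximum sequence length
};

pub fn main() void {
    // Arbitrary example values for one point in the grid.
    const cfg = Config{
        .dim = 64,
        .hidden_dim = 128,
        .n_layers = 2,
        .n_heads = 4,
        .vocab_size = 128,
        .seq_len = 64,
    };
    std.debug.print("dim={d} layers={d} ffn={d}\n", .{ cfg.dim, cfg.n_layers, cfg.hidden_dim });
}
```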

We perform a grid search over the following hyperparameters; a sketch of the resulting sweep follows the list.

  • Hidden size (32, 64, 128, 256)
  • Transformer layers (1, 2, 3, 4)
  • Feed-forward network intermediate size (32, 64, 128, 256)
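
Below is a minimal sketch of what the sweep enumerates (4 × 4 × 4 = 64 configurations), assuming each configuration is trained and evaluated independently. This is not the project's training driver, just an illustration of the grid.

```zig
const std = @import("std");

pub fn main() void {
    const dims = [_]u32{ 32, 64, 128, 256 }; // hidden size
    const layer_counts = [_]u32{ 1, 2, 3, 4 }; // Transformer layers
    const ffn_dims = [_]u32{ 32, 64, 128, 256 }; // FFN intermediate size

    for (dims) |dim| {
        for (layer_counts) |n_layers| {
            for (ffn_dims) |ffn| {
                // In the real setup, each of these configurations is trained
                // and then scored by phoneme error rate on held-out data.
                std.debug.print("dim={d} layers={d} ffn={d}\n", .{ dim, n_layers, ffn });
            }
        }
    }
}
```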

Findings
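
All results below are reported as phoneme error rate (PER). By the standard definition, the predicted phoneme sequence is aligned to the reference transcription and

$$\mathrm{PER} = \frac{S + D + I}{N}$$

where S, D, and I are the numbers of substituted, deleted, and inserted phonemes, and N is the number of phonemes in the reference.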

PER vs. hyperparameters

[Figure: output.png]

PER by Transformer layers

[Figure: output(1).png]

PER Heatmap for each hyperparameter pair

[Figure: output(2).png]

Compression

This section is to be updated, but the gist of it is: we pasted all of the model's float32 weights into a single Zig file and let LLVM take the wheel for optimization.