IPA Transcription in Kilobytes with Zig, LLaMA 2, and WASM
LLVM, take the wheel
Very much WIP.
TLDR
- We made a super small grapheme-to-phoneme (G2P) model, i.e. an IPA transcriber.
- The model architecture is based on LLaMA 2.
- The phoneme error rate (PER) is 1% for Korean and 13% for English.
- The model is less than 500 KB on disk, so you can run it on a toaster.
- We release Zig, Python, and JavaScript libraries using WASM.
- You can download the libraries at https://hamanlp.org
Experimental setup
Datasets
We use the following datasets.
Model
We modified the LLaMA 2 implementation from https://github.com/cgbur/llama2.zig.
We perform a grid search over the following hyperparameters; the full grid is sketched after this list.
- Hidden size (32, 64, 128, 256)
- Transformer layers (1, 2, 3, 4)
- Feed-forward network intermediate size (32, 64, 128, 256)
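For concreteness, the grid amounts to 4 × 4 × 4 = 64 configurations. The snippet below is only a minimal sketch of enumerating those combinations; it is not the authors' training harness, and training each configuration happens elsewhere.

```zig
const std = @import("std");

// Hyperparameter grid from the list above.
const hidden_sizes = [_]u32{ 32, 64, 128, 256 };
const layer_counts = [_]u32{ 1, 2, 3, 4 };
const ffn_sizes = [_]u32{ 32, 64, 128, 256 };

pub fn main() void {
    // Each of the 64 combinations is trained and evaluated by its
    // phoneme error rate (PER) on held-out data.
    for (hidden_sizes) |dim| {
        for (layer_counts) |n_layers| {
            for (ffn_sizes) |ffn_dim| {
                std.debug.print("dim={d} layers={d} ffn={d}\n", .{ dim, n_layers, ffn_dim });
            }
        }
    }
}
```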
Findings
PER vs. hyperparameters
PER by number of Transformer layers
PER heatmap for each hyperparameter pair
Compression
This section is to be updated.
But the gist of it is: we pasted all of the model's float32 weights into a single Zig file as source code and let LLVM take the wheel for optimization.
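As a minimal sketch of what that looks like (the names and values below are illustrative placeholders, not the actual exported weights):

```zig
// weights.zig -- illustrative sketch only; the real file contains every
// float32 parameter of the trained model pasted in as constants.
pub const dim: usize = 64;
pub const n_layers: usize = 2;

pub const token_embedding = [_]f32{
    0.0123, -0.4567, 0.8910, -0.1112,
    // ...all remaining values follow in the same way.
};
```

The inference code can then pull the weights in with `const w = @import("weights.zig");`. Because they are comptime-known constants baked into the binary, LLVM is free to optimize the inference code against them, and no separate checkpoint file needs to be loaded at runtime.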