OCR — Extract text from scanned PDFs
Run OCR on image-based PDFs to make them searchable and copyable. Supports 13 languages.
Drop a scanned PDF to OCR
or click to select · up to 100 MB
Pick the language of the text in the PDF. Combined options let you mix languages (e.g. Chinese + English).
Note: OCR runs entirely in your browser (Tesseract.js via WebAssembly). The first use downloads the OCR engine and language data; after that, both are cached. Higher-resolution scans give much better accuracy.
Frequently asked questions
How accurate is the OCR?
Accuracy is typically above 90% on clean scans and drops on low-resolution, skewed, or noisy scans. For best results, scan at 300 DPI and keep pages straight.
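To check whether an existing scan meets the 300 DPI recommendation, divide the pixel count along one edge by that edge's physical length in inches. The sketch below is only an illustration: `effectiveDpi` and `meetsRecommendedDpi` are hypothetical helper names, not part of this tool.

```typescript
// Effective scan resolution: pixels along one edge divided by the
// physical length of that edge in inches.
function effectiveDpi(pixels: number, inches: number): number {
  if (pixels <= 0 || inches <= 0) {
    throw new RangeError("pixels and inches must be positive");
  }
  return pixels / inches;
}

// Hypothetical helper: true when the scan reaches the target resolution
// (300 DPI by default, the value recommended in the answer above).
function meetsRecommendedDpi(
  pixels: number,
  inches: number,
  targetDpi: number = 300
): boolean {
  return effectiveDpi(pixels, inches) >= targetDpi;
}

// A US Letter page is 8.5 inches wide.
const fullResolution = meetsRecommendedDpi(2550, 8.5); // 2550 / 8.5 = 300 DPI
const halfResolution = meetsRecommendedDpi(1275, 8.5); // 1275 / 8.5 = 150 DPI
```

A page image 1275 pixels wide covers the same 8.5 inches at only 150 DPI, so it falls short of the recommendation and is likely to give lower accuracy.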
Can I use multiple languages?
Yes. Pick a combined option such as '繁中 + English' (Traditional Chinese + English) from the language list. This is useful for bilingual documents.
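Tesseract identifies each language's trained data by a short code, for example `eng` for English and `chi_tra` for Traditional Chinese, and its convention for combining languages is to join the codes with `+`. As a rough sketch, a combined option could be split into individual codes like this. The option string format and the `toLanguageCodes` name are assumptions for illustration; the page's real option values are not shown here.

```typescript
// Split a combined option such as "chi_tra+eng" into individual
// Tesseract language codes. The "+"-joined option format is an
// assumption about how a page like this might encode its choices.
function toLanguageCodes(option: string): string[] {
  const codes = option
    .split("+")
    .map((code) => code.trim())
    .filter((code) => code.length > 0);

  if (codes.length === 0) {
    throw new Error("no language selected");
  }

  // Drop duplicates while keeping the original order.
  return codes.filter((code, index) => codes.indexOf(code) === index);
}

const bilingualCodes = toLanguageCodes("chi_tra + eng"); // ["chi_tra", "eng"]
const singleCode = toLanguageCodes("eng");               // ["eng"]
```

Each code in the result corresponds to one language data file, which is why a combined option downloads more data than a single language on first use.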
Why is the first run slow?
The OCR engine (~5 MB) and language data (~2-10 MB per language) are downloaded on first use. Subsequent runs use the browser cache and are much faster.
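The figures above give a quick way to estimate the first-run download. The sketch below uses only the approximate sizes quoted in the answer (about 5 MB for the engine, 2 to 10 MB per language); the function and type names are hypothetical, and real sizes vary by language.

```typescript
interface SizeRangeMb {
  min: number;
  max: number;
}

// Approximate sizes quoted in the answer above.
const ENGINE_MB = 5;
const PER_LANGUAGE_MB: SizeRangeMb = { min: 2, max: 10 };

// Estimate the first-run download for a given number of languages.
function estimateFirstRunDownloadMb(languageCount: number): SizeRangeMb {
  if (languageCount < 1 || Math.floor(languageCount) !== languageCount) {
    throw new RangeError("languageCount must be a positive integer");
  }
  return {
    min: ENGINE_MB + PER_LANGUAGE_MB.min * languageCount,
    max: ENGINE_MB + PER_LANGUAGE_MB.max * languageCount,
  };
}

const singleLanguage = estimateFirstRunDownloadMb(1); // { min: 7, max: 15 }
const twoLanguages = estimateFirstRunDownloadMb(2);   // { min: 9, max: 25 }
```

Under these assumptions, a bilingual option costs roughly 9 to 25 MB the first time and nothing extra on later runs while the browser cache is intact.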