We are glad to present a “ground truth” for automatic text recognition of printed Hebrew material. For scholars and students of Jewish studies, this resource makes it possible to turn scanned printed material in Hebrew, Judeo-Arabic, Yiddish or Ladino into digital text. In subsequent posts we will give examples of what can be done with the digital text that cannot be done with the scanned image or the hard copy of the source.
What is Transkribus?
Transkribus is a platform for transcription, automatic text detection and search in historical documents. It is the fruit of the European Commission-funded e-Infrastructure project READ (Recognition and Enrichment of Archival Documents) and is now being developed further within the framework of READ COOP.
How can you use it? In short:
- Register with Transkribus
- Download and install the software
- Upload your scanned document
- Analyse the page layout
- Run the text recognition model
- Export the document
Though this may look simple, the platform does have a learning curve. For download and installation I recommend using Transkribus’ ‘how to’ instructions, the longer instructions on the Transkribus Wiki, or the even longer complementary instructions here. Read them carefully and progress patiently, step by step.
Among the Transkribus public models you will find the DiJeSt model for Hebrew-script languages, which you can apply to your documents. It works well for automatic recognition of several types of printed texts in Yiddish, Hebrew, Judeo-Arabic and Ladino. To read more about the model and how we built it, go here.
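Once a document has been recognised and exported, the text can be read by a short script. The following is only a minimal sketch, and it assumes that you chose PAGE XML as the export format, where each recognised line sits in a TextLine element with its text in TextEquiv/Unicode. The function names and the small sample page are made up for illustration and are not part of Transkribus.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def page_xml_lines(xml_data):
    """Return the text of each TextLine in a PAGE XML document, in document order."""
    root = ET.fromstring(xml_data)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        # Use only the line's own TextEquiv child, not those of nested Word elements.
        for child in elem:
            if local_name(child.tag) == "TextEquiv":
                for sub in child:
                    if local_name(sub.tag) == "Unicode":
                        lines.append(sub.text or "")
                break
    return lines


# A tiny hand-written page (an assumption for illustration, not real export output).
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="sample.jpg">
    <TextRegion id="r1">
      <TextLine id="l1"><TextEquiv><Unicode>שלום עולם</Unicode></TextEquiv></TextLine>
      <TextLine id="l2"><TextEquiv><Unicode>א גוטן טאג</Unicode></TextEquiv></TextLine>
    </TextRegion>
  </Page>
</PcGts>"""

for line in page_xml_lines(SAMPLE):
    print(line)
```

For a real export you would pass the contents of one exported page file to page_xml_lines instead of SAMPLE. Matching elements by their local name keeps the sketch independent of the exact PAGE schema version used in the export.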