Mellon Grant Funds Continuation of Islamicate Text Digitization Project
July 05, 2022
A $1.75 million grant will help expand digital access to Arabic, Persian, Ottoman Turkish and Urdu manuscripts and books.
By Jessica Weiss ’05
The University of Maryland has received a $1.75 million grant from the Mellon Foundation to continue development of open-source technology to expand digital access to manuscripts and books from the premodern Islamicate world in Arabic, Persian, Ottoman Turkish and Urdu.
Matthew Thomas Miller, assistant professor in the Roshan Institute for Persian Studies in the School of Languages, Literatures, and Cultures, leads the interdisciplinary team of researchers, including David Smith from Northeastern University, Sarah Bowen Savant from Aga Khan University (AKU) in London, Taylor Berg-Kirkpatrick from the University of California, San Diego, and Raffaele Viglianti from the Maryland Institute for Technology in the Humanities at Maryland. The Mellon Foundation has been funding the project, known as “OpenITI AOCP,” since 2019.
“Over the past four years we have made incredible progress on the creation of digital infrastructure for Islamicate studies, and that is thanks in large part to the Mellon Foundation,” Miller said. “We are honored that the foundation continues to support our efforts to expand access to and digitally preserve such a rich and important cultural tradition.”
There are currently hundreds of thousands, perhaps even millions, of premodern Islamicate books and manuscripts that academics and the public cannot access digitally, Miller said.
Thus far, the project team, made up of computer science and humanities experts, has improved the accuracy of open-source Persian and Arabic optical character recognition (OCR) software, which turns physical, printed documents into machine-readable text. Under the new grant, the team will use this OCR software to produce 2,500 new digitized Persian and Arabic texts and will expand the OCR system’s capabilities to Ottoman Turkish and Urdu.
They also aim to improve the accuracy of open-source handwritten text recognition (HTR) for Arabic-script manuscripts. HTR, a subfield of OCR technology, is designed to read a wide range of handwriting styles with high accuracy.
The team will also roll out a user-friendly redesign of its eScriptorium platform, which hosts the open-source tools. This latest Mellon grant will last three years. (Last year, Miller also received a grant from the National Endowment for the Humanities to support the project.)
Though he hopes the project’s next phase of development marks a major improvement for Arabic, Persian, Ottoman Turkish and Urdu texts, Miller said the ultimate goal is for the open-source tools to be used across a wide variety of languages.
“We really hope the technology will be reused by other users, especially those working in other under-resourced languages,” he said. “It’s designed to meet the needs of varied users.”
Image description: Persian ruba‘i (quatrain) calligraphy dating between circa 1610 and circa 1620. Gift in honor of Madeline Neves Clapp; Gift of Mrs. Henry White Cannon by exchange; Bequest of Louise T. Cooper; Leonard C. Hanna Jr. Fund; From the Catherine and Ralph Benkaim Collection.