Morning lightning talks for DH2019 DH & Lib preconference workshop.

DARIAH: Core communities for collaboration: Libraries, Archives and Museums as essential partners in digital arts and humanities research - Sally Chambers (Ghent Centre for Digital Humanities, Belgium)

The mission of DARIAH, the Digital Research Infrastructure for the Arts and Humanities, is to empower research communities with digital methods to create, connect and share knowledge about culture and society. Digital methods are a cornerstone of what we do, ensuring that we focus on how technology is transforming not objects, but activities. For us, the digital is not a goal in itself, but a means to explore, discover and grow. Libraries, archives and museums are essential partners in this endeavour, and DARIAH is committed to continuing to encourage, facilitate and celebrate this collaboration at the core of what we do. This presentation gives a number of examples of existing partnerships with cultural heritage institutions.

Parallel and Distributed Processing for Computational Social Systems. In conjunction with the IEEE International Parallel & Distributed Processing Symposium (IPDPS), May 19 2023, Hilton St.

The uncertainty caused by optical character recognition (OCR) noise has been a primary barrier for digital libraries (DL) in promoting their curated datasets for research purposes, particularly when those datasets are fed into advanced language models with limited transparency. To shed some light on this issue, this study evaluates the impact of OCR noise on BERT models when encoding the intrinsic semantic features of OCR'd texts. Specifically, we encoded chapter-wise pairs of OCR'd texts and their cleaned counterparts, extracted from books in six domains, using pre-trained and fine-tuned BERT models respectively. Given the encoded text features, we then calculated the cosine similarity between every pair of chapters and used normalized discounted cumulative gain (NDCG) to measure how well the BERT variants preserve narrative coherence and semantic relevance among texts. Our empirical results show that (1) BERT embeddings can encode and preserve texts' intrinsic semantic features (i.e., relevance and coherence) and (2) these capabilities are comparatively robust against OCR noise. This should help alleviate some DL users' concerns about applying contextualized word embeddings to encode chapter-level or even document-level OCR'd text, which in turn supports the scholarly use of DL collections. Our research also demonstrates how texts' intrinsic semantic features can be used to evaluate the impact of OCR noise on advanced language models, an underdeveloped and promising direction for future work.
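The similarity-and-ranking step described in the abstract can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the study's implementation: chapter embeddings are assumed to be already computed (for example, pooled BERT outputs), the toy vectors are made up, and `adjacency_relevance` is a hypothetical graded-relevance scheme (chapters closer to the query chapter in the book count as more relevant), since the abstract does not say how relevance labels were assigned.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (assumes non-zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def dcg(relevances: Sequence[float]) -> float:
    """Discounted cumulative gain: sum of rel_i / log2(i + 1) over 1-based ranks i."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(relevances: Sequence[float]) -> float:
    """DCG of a ranking normalized by the DCG of the ideal (descending) ranking."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


def adjacency_relevance(num_chapters: int, query: int) -> List[float]:
    """HYPOTHETICAL graded relevance: chapters nearer the query in the book score higher."""
    return [0.0 if j == query else 1.0 / abs(j - query) for j in range(num_chapters)]


def ndcg_for_query(embeddings: List[Sequence[float]], query: int,
                   relevance: Sequence[float]) -> float:
    """Rank every other chapter by cosine similarity to `query`, then score with NDCG."""
    others = [j for j in range(len(embeddings)) if j != query]
    ranked = sorted(
        others,
        key=lambda j: cosine_similarity(embeddings[query], embeddings[j]),
        reverse=True,
    )
    return ndcg([relevance[j] for j in ranked])


if __name__ == "__main__":
    # Toy 3-dimensional "chapter embeddings" (made up; BERT-base hidden states are 768-d).
    toy = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0], [0.0, 1.0, 0.0]]
    print(ndcg_for_query(toy, 0, adjacency_relevance(len(toy), 0)))  # prints 1.0
```

Under these assumptions, running `ndcg_for_query` on embeddings of the OCR'd chapters and again on embeddings of their cleaned counterparts, then averaging over query chapters, gives one way to compare how much OCR noise degrades the ranking.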