RD-TableBench – Accurately evaluating table extraction
20 by raunakchowdhuri | 4 comments on Hacker News.
Hey HN! A ton of document parsing solutions have come out lately, each claiming SOTA with little evidence. Many of them turned out to be LLM or LVM wrappers that hallucinate frequently on complex tables. We just released RD-TableBench, an open benchmark to help teams evaluate extraction performance on complex tables. The benchmark covers a variety of challenging scenarios, including scanned tables, handwriting, language detection, merged cells, and more. We employed an independent team of PhD-level human labelers who manually annotated 1,000 complex table images from a diverse set of publicly available documents. Alongside this, we are also releasing a new bioinformatics-inspired algorithm for grading table similarity. Would love to hear any feedback! -Raunak
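The post does not describe the grading algorithm itself, so the following is only a rough sketch of what a bioinformatics-style table comparison could look like, not RD-TableBench's actual scorer. It assumes Needleman-Wunsch global alignment (a standard sequence-alignment technique) applied twice: once to align cells within a row, and once to align rows within a table. The names `cell_sim`, `row_sim`, `table_similarity`, and the `GAP` penalty value are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher

GAP = -0.5  # hypothetical penalty for a skipped row or cell (an assumption)


def cell_sim(a, b):
    """Fuzzy string similarity in [0, 1] between two cell texts."""
    return SequenceMatcher(None, a, b).ratio()


def align_score(xs, ys, sim, gap=GAP):
    """Needleman-Wunsch global alignment score of two sequences."""
    n, m = len(xs), len(ys)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    # Aligning a prefix against nothing costs one gap per element.
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + sim(xs[i - 1], ys[j - 1]),  # match
                dp[i - 1][j] + gap,  # element of xs left unmatched
                dp[i][j - 1] + gap,  # element of ys left unmatched
            )
    return dp[n][m]


def row_sim(r1, r2):
    """Similarity in [0, 1] between two rows (lists of cell strings)."""
    longest = max(len(r1), len(r2))
    if longest == 0:
        return 1.0
    return max(0.0, align_score(r1, r2, cell_sim)) / longest


def table_similarity(t1, t2):
    """Similarity in [0, 1] between two tables (lists of rows)."""
    longest = max(len(t1), len(t2))
    if longest == 0:
        return 1.0
    return max(0.0, align_score(t1, t2, row_sim)) / longest


ground_truth = [["Name", "Qty"], ["Widget", "4"], ["Gadget", "7"]]
prediction = [["Name", "Qty"], ["Gadget", "7"]]  # one row dropped
print(table_similarity(ground_truth, prediction))
```

The appeal of an alignment-based score in this setting is that a single dropped or inserted row costs one gap penalty instead of shifting every later row out of place, which a naive position-by-position comparison would punish heavily. Merged cells are not modeled in this sketch.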