
While this paper is clearly not without merit, it reads more like an excuse to make bombastic statements about a whole profession or "industry" (perhaps to raise the authors' visibility and sell something later on?). The worst part is that they reference a single preprint as "prior art", and that document is not about contract review at all, but about legal reasoning in general. (Part of LegalBench is of course "interpretation", which is built on existing contract review benchmarks, but they could have found more relevant papers.)

Automating legal document review has been a very active field in NLP for twenty years or so (including QA tasks) and became much more active after 2017. Tools like Kira and Luminance (neither of which is LLM-based) are already used quite widely in legal departments and firms around the world, so lawyers do have practical experience with their limitations. But Kira & co. do not measure the performance of the latest and greatest models, and they do not publish transparent benchmarks, so the benchmark results in this paper are a welcome addition on the LLM front. Still, given the limited scope of reviewing ten (!) documents against a single review playbook, the authors should not have written about "implications for the industry". That is pretentious, and it says more about their lack of knowledge of that very industry than about the future of the legal services industry.

If you're interested in the capabilities and limitations, I suggest these informative but still light reads as well: https://kirasystems.com/science/ https://zuva.ai/blog/ https://www.atticusprojectai.org/cuad



Any particular papers you would recommend? The links are to blogs with lots of papers.


The third one (CUAD) is a single paper, not a blog like the others. I think it is still the best paper in this area, in terms of being done by NLP experts who understand the possibilities and are not just selling smoke and mirrors. But so many papers are published in this area nowadays that I might not even notice a new one. The CUAD paper was still based on BERT, so pretraining was needed, and that takes a bit more expertise than just prompting GPT-4-32k like in this paper, feeding prompts back to GPT-4 for another round of refinement, or doing RAG. For honest research purposes, "contract review" is not really a good area to work in: the subject field is not standardised, there is no good benchmark yet, and your paper can easily fall into the bad company of snake oil sellers cashing in on the average person's visceral hatred of all professions.
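For contrast with the BERT-era approach, here is roughly what the retrieval step of a "just do RAG" baseline amounts to at its simplest. This is an illustrative toy in plain Python, not anyone's actual pipeline: the clause texts are made up, and bag-of-words cosine similarity is a stand-in for the embedding models a real system would use.

```python
# Toy retrieval step of the kind a RAG pipeline for contract review might
# use. Clauses and the word-overlap scoring are illustrative stand-ins;
# real systems retrieve with learned embeddings, then feed the top clauses
# to an LLM as context.
from collections import Counter
import math

def bow(text):
    """Lowercased bag-of-words vector as a Counter."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, clauses, k=1):
    """Return the k clauses most similar to the question."""
    q = bow(question)
    ranked = sorted(clauses, key=lambda c: cosine(q, bow(c)), reverse=True)
    return ranked[:k]

clauses = [
    "Either party may terminate this agreement with thirty days written notice.",
    "The licensee shall pay royalties quarterly.",
    "Governing law shall be the law of the State of New York.",
]
top = retrieve("When can a party terminate the agreement?", clauses)
print(top[0])
```

The point is that this takes an afternoon, whereas CUAD-style work meant curating expert annotations and pretraining/fine-tuning a model, which is exactly why the expertise bar was higher.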



