
Available Legal AI Datasets

18th December 2024

Note: This article is just one of 60+ sections from our full report titled: The 2024 Legal AI Retrospective - Key Lessons from the Past Year. Please download the full report to check any citations.

Available datasets

The Contract Understanding Atticus Dataset (CUAD) is a corpus of 510 commercial legal contracts carrying 13,000+ labels. The contracts were labeled manually, under the supervision of experienced lawyers, to identify 41 types of legal clauses that are considered important in contract review.

The contracts were collected from the Electronic Data Gathering, Analysis, and Retrieval ("EDGAR") system, which is maintained by the U.S. Securities and Exchange Commission (SEC).
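To make the "labels per clause type" idea concrete, here is a minimal sketch of how such annotations could be tallied. It assumes a SQuAD-style JSON layout (a `data` list of contracts, each with `paragraphs`, `qas`, and `answers`), which is how we understand the public CUAD release to be organised. The `sample` record and the `count_labels` helper are hypothetical illustrations, not real CUAD data or an official loader.

```python
from collections import Counter

# Hand-made sample in an ASSUMED SQuAD-style layout; this is not real CUAD data.
sample = {
    "data": [
        {
            "title": "EXAMPLE_AGREEMENT",
            "paragraphs": [
                {
                    "context": "This Agreement is governed by the laws of Delaware.",
                    "qas": [
                        {
                            "id": "EXAMPLE_AGREEMENT__Governing Law",
                            "question": "Highlight the parts related to Governing Law.",
                            "answers": [
                                {
                                    "text": "governed by the laws of Delaware",
                                    "answer_start": 18,
                                }
                            ],
                            "is_impossible": False,
                        },
                        {
                            "id": "EXAMPLE_AGREEMENT__Non-Compete",
                            "question": "Highlight the parts related to Non-Compete.",
                            "answers": [],
                            "is_impossible": True,
                        },
                    ],
                }
            ],
        }
    ]
}


def count_labels(dataset):
    """Count labeled spans per clause category (hypothetical helper).

    Assumes each question id has the form '<contract title>__<category>'.
    """
    counts = Counter()
    for doc in dataset["data"]:
        for para in doc["paragraphs"]:
            for qa in para["qas"]:
                category = qa["id"].split("__", 1)[1]
                # A clause type with no labeled span still appears, with count 0.
                counts[category] += len(qa["answers"])
    return counts


label_counts = count_labels(sample)
```

Run over a full file in this layout, the same loop would give the number of labeled spans for each clause type, which is a quick way to see how unevenly the clause types are represented.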

ContractNLI is a dataset for document-level natural language inference (NLI) on contracts, containing 607 non-disclosure agreements (NDAs). Although it contains more contracts than CUAD, they are considerably shorter, so the corpus as a whole is smaller. It also covers only one contract type, the NDA. Providing more extensive context about this data would likely improve the performance of models fine-tuned on it.
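As a rough illustration of what document-level NLI on a contract looks like, the sketch below pairs a whole contract with a hypothesis and a three-way label. The label names follow our understanding of the ContractNLI task (entailment, contradiction, not mentioned); the `NliExample` class, its field names, and the example texts are hypothetical and do not reproduce the dataset's actual file format.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List

# Three-way label set, as we understand the ContractNLI task definition.
LABELS = ("Entailment", "Contradiction", "NotMentioned")


@dataclass
class NliExample:
    """One (contract, hypothesis) pair; a hypothetical structure for illustration."""
    contract_text: str
    hypothesis: str
    label: str
    evidence: List[str] = field(default_factory=list)  # supporting contract sentences

    def __post_init__(self):
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label}")


# Invented NDA text, not taken from the dataset.
nda = (
    "The Recipient shall not disclose Confidential Information to any third party. "
    "This Agreement terminates two years after the Effective Date."
)

examples = [
    NliExample(
        contract_text=nda,
        hypothesis="The Recipient may share Confidential Information with third parties.",
        label="Contradiction",
        evidence=["The Recipient shall not disclose Confidential Information to any third party."],
    ),
    NliExample(
        contract_text=nda,
        hypothesis="The Recipient must return all Confidential Information on termination.",
        label="NotMentioned",
    ),
]

# Distribution of labels across the toy examples.
label_distribution = Counter(example.label for example in examples)
```

Because the premise is the entire contract rather than a single sentence, the evidence field matters: it records which part of the document justifies the label.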

Written by

Alex Denne
Head of Growth
