Model Training
We've spent 7 years training generative Legal AI models
Many legal AI companies popped into existence after ChatGPT’s public release on 30th November 2022, looking to 'cash in' on hype in the industry.
We had already been deeply embedded in the Legal AI industry for 5 years by then. In fact, our founders Rafie and Nitish both did their Master's in machine learning. Genie AI, Oxford University, University College London (UCL) and Imperial College London then undertook a sequence of studies into how to build machine learning that handles the sensitive data of legal work both securely and ethically. Learn more about our research here.
While the world changed in reaction to ChatGPT's release, our privacy-first approach to model training hasn't changed, and our team remains dedicated to customer privacy.
So, you may ask, how do we improve Genie AI without using customer data?
Training On Our Proprietary Knowledge Base
Since 2017, we've worked in collaboration with Magic Circle law firms like Withers, and FTSE 250/S&P 500 blue-chip companies, to develop a knowledge base that's broad in terms of practice areas and deep in terms of legal context, particularly within today's US and UK legal systems.
This growing database of 1m+ legal templates, definitions, clauses, legislation and case law is a key part of how our model is trained and kept up to date with evolving legal standards.
We filter each request through our proprietary knowledge base to ensure we provide accurate legal context before the AI responds.
You can see part of that data set in the form of our legal templates, developed and vetted by our partner firms.
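As a rough illustration of what that filtering step can look like, here is a minimal sketch of retrieving relevant knowledge-base entries and attaching them to a prompt before the model answers. The `knowledge_base` entries, the keyword-overlap scoring and the `call_llm()` helper are simplified placeholders for illustration, not Genie's actual retrieval pipeline.

```python
def retrieve_context(query: str, knowledge_base: list[dict], top_k: int = 3) -> list[dict]:
    """Rank knowledge-base entries by keyword overlap with the query (toy scoring)."""
    query_terms = set(query.lower().split())
    scored = [
        (len(query_terms & set(entry["text"].lower().split())), entry)
        for entry in knowledge_base
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for score, entry in scored[:top_k] if score > 0]


# Hypothetical sample entries standing in for templates, clauses, legislation and case law.
knowledge_base = [
    {"source": "Clause: Liquidated damages", "text": "Liquidated damages must be a genuine pre-estimate of loss, not a penalty."},
    {"source": "Template: NDA (UK)", "text": "Confidential information means any information disclosed by one party to the other."},
    {"source": "Legislation: UK GDPR", "text": "Personal data must be processed lawfully, fairly and transparently."},
]

query = "Is this liquidated damages clause enforceable?"
context = retrieve_context(query, knowledge_base)

prompt = "Use the following legal context when answering.\n\n"
prompt += "\n".join(f"[{c['source']}] {c['text']}" for c in context)
prompt += f"\n\nQuestion: {query}"
# answer = call_llm(prompt)  # hypothetical LLM call, made only after context is attached
```

The point of a step like this is that answers stay grounded in current legal sources rather than in the model's training data alone.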
Expanding our proprietary knowledge base for in-context learning
While others may boast about the large-scale fine-tuning they have done, they might forget to mention that:
1. Fine-tuning achieves worse results than in-context learning.
2. A fine-tuned model may sit on top of an outdated foundation model, unable to establish general context in the present day.
3. They will have to re-fine-tune their models at great expense to themselves, most likely passing the costs on to their customers.
Note: We've had some messages enquiring about point 3, so we'll expand. To train GPT-4 on all UK case law would cost ~£100,000. If you get that wrong, or GPT-4 is no longer best-in-class, you have to start again from scratch.
Would you hire someone just because they could recite every piece of case law? Of course not; you hire them to get something done right. That's our approach too.
Our approach emphasises the agentic elements of AI. We want to give our AI the tools to reason the way a paralegal who is expert in your company, your playbook and your document would.
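To make the contrast with fine-tuning concrete, here is a minimal sketch of in-context learning: prior positions from a playbook are placed directly in the prompt, so updating the playbook changes the behaviour without re-training any weights. The playbook entries and prompt wording are hypothetical examples, not Genie's actual playbook format.

```python
# Hypothetical company playbook: previously seen clauses and the positions taken on them.
PLAYBOOK = [
    {
        "clause": "Either party may terminate on 30 days' written notice.",
        "position": "Acceptable - matches our standard termination position.",
    },
    {
        "clause": "Supplier's liability is unlimited for all breaches.",
        "position": "Reject - cap liability at 12 months of fees paid.",
    },
]


def build_review_prompt(new_clause: str) -> str:
    """Condition the model on playbook precedent rather than fine-tuned weights."""
    examples = "\n\n".join(
        f"Clause: {entry['clause']}\nOur position: {entry['position']}"
        for entry in PLAYBOOK
    )
    return (
        "Review the new clause against the playbook decisions below.\n\n"
        f"{examples}\n\n"
        f"New clause: {new_clause}\nOur position:"
    )


print(build_review_prompt("Customer indemnifies Supplier for all third-party claims."))
```

Because the knowledge lives in the prompt rather than in the weights, swapping in a newer foundation model does not mean repeating an expensive training run.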
Legal experts provide domain knowledge
For training data, prompt engineering and in-context learning algorithms, we rely on an expert panel of legal advisors and in-house legal engineers (ex Magic Circle / Big Law).
Ongoing performance improvements
Performance is enhanced by our data science and machine learning teams. Model performance tests emphasise accuracy, ethics, confidentiality and compliance above all else.
Unlimited context for your queries
Unlike many other Legal AI tools, or even general-purpose LLMs, we offer an unlimited context window, allowing you to condition the model on the entire text of the document you are querying. This delivers truly comprehensive, transparent and detailed analysis of even the most complex legal documents without constraint.
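As a simple sketch of what "conditioning on the entire document" means in practice, the whole file can be placed into a single prompt with no chunking or truncation. The file name and the `call_llm()` helper below are placeholders for illustration.

```python
from pathlib import Path


def build_full_document_prompt(document_path: str, question: str) -> str:
    """Put the entire agreement into one prompt, rather than splitting it into chunks."""
    full_text = Path(document_path).read_text(encoding="utf-8")
    return (
        "Answer the question using the entire agreement below.\n\n"
        f"--- AGREEMENT START ---\n{full_text}\n--- AGREEMENT END ---\n\n"
        f"Question: {question}"
    )


# prompt = build_full_document_prompt(
#     "master_services_agreement.txt",   # placeholder file name
#     "Which clauses allocate data-protection risk between the parties?",
# )
# answer = call_llm(prompt)  # hypothetical LLM call
```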
Does Genie AI use Large Language Models?
We also partner with best-in-class Large Language Models (LLMs), in our case GPT-4 (OpenAI) and Claude (Anthropic), which have been fine-tuned for legal work.
We are constantly evaluating the accuracy (precision and recall in AI-speak) of our model set-up per use case, so we can tailor the LLM we query to the legal task at hand. Sometimes we will get responses from multiple LLMs and compare them, before giving you what our model deems the 'best' output.
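A rough sketch of that multi-model step, under simplified assumptions: each configured model answers the same prompt and a scoring function picks the response to return. The `query_gpt4`, `query_claude` and scoring placeholders stand in for provider clients and Genie's internal evaluation, which are not public.

```python
from typing import Callable


def best_response(prompt: str,
                  models: dict[str, Callable[[str], str]],
                  score: Callable[[str], float]) -> tuple[str, str]:
    """Ask every configured model, then return (model_name, response) with the top score."""
    responses = {name: ask(prompt) for name, ask in models.items()}
    winner = max(responses, key=lambda name: score(responses[name]))
    return winner, responses[winner]


# models = {"gpt-4": query_gpt4, "claude": query_claude}   # hypothetical client functions
# name, answer = best_response(
#     "Summarise the indemnity clause in plain English.",
#     models,
#     score=lambda response: len(response),                # toy stand-in for a real evaluator
# )
```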
Prompt Engineering
Genie's complex prompting framework encourages the AI to go through a specialised process when giving an answer. In effect, this makes Genie act like a junior lawyer would:
- collecting context
- looking at previous examples
- thinking step by step
- checking its answer, and so on.
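To illustrate how steps like these can be encoded, here is a minimal sketch of a staged prompt. The wording is an assumption made for illustration; it is not Genie's actual prompting framework.

```python
# Hypothetical staged instructions mirroring the steps listed above.
REVIEW_STEPS = """\
Work through the task like a junior lawyer would:
1. Context: restate the relevant parties, facts and governing law from the document.
2. Precedent: compare the clause with the example clauses provided.
3. Reasoning: think step by step before stating any conclusion.
4. Check: re-read your answer and flag anything uncertain or unsupported.
"""


def build_staged_prompt(document_excerpt: str, examples: str, question: str) -> str:
    """Assemble a prompt that walks the model through each stage in order."""
    return (
        f"{REVIEW_STEPS}\n"
        f"Example clauses:\n{examples}\n\n"
        f"Document excerpt:\n{document_excerpt}\n\n"
        f"Question: {question}"
    )
```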
So, without giving away too much here, these are some of the components that generate accuracy levels of 94-96% according to our legal panel. You can ask us more by getting in touch.