Ethan Mollick’s Post

Ethan Mollick

Remember BloombergGPT, which was a specially trained finance LLM, drawing on all of Bloomberg's data? It made a bunch of firms decide to train their own models to reap the benefits of their special information and data. You may not have seen that GPT-4 (the old, pre-turbo version with a small context window), without specialized finance training or special tools, beat it on almost all finance tasks. It is part of a pattern - the smartest generalist frontier models beat specialized models in specialized topics. Your special proprietary data may be less useful than you think in the world of LLMs... https://lnkd.in/e4QKBFPK


Yes, smaller models have advantages over larger models in areas like speed & cost, which is why we are likely to see many types of LLMs work together 👇 But BloombergGPT was trained on financial data specifically to be better than generalist models at financial analysis, which it wasn't. https://www.oneusefulthing.org/p/an-ai-haunted-world

Sanchit Garg

Generalist | 3x Startups | IIM Indore | IIIT Delhi

4mo

Could one of the reasons be that Bloomberg's proprietary data comprised only 0.7% of the overall training dataset? Bloomberg portrayed the majority of the data as their own financial data; however, per their research paper, 99.3% of it was generally available public data.

Gang Lee

Founder & CEO at ELGO Technologies

4mo

It is always a balancing act. While the best generalist model might outperform specialized models, the compute and hosting costs of generalist models are usually huge compared to specialized ones. Companies look at several aspects beyond just cost, such as control, security, and privacy, when choosing the right model to deploy for their use case. I believe that specialized models are still relevant (if not more relevant) in the future. We will start seeing more ensembles of a generalist model (for reasoning and routing tasks) and specialized models (for specialized tasks).
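The ensemble pattern described in this comment can be sketched as a small dispatcher. Everything below is an illustrative assumption, not a real API: the "models" are stand-in functions, and the keyword router is a placeholder for what would in practice be a classifier or the generalist model itself making the routing decision.

```python
from typing import Callable, Dict

# Stand-ins for specialized models; in a real system these would be
# API calls or local inference endpoints (names are hypothetical).
def finance_model(query: str) -> str:
    return f"[finance-specialist] {query}"

def legal_model(query: str) -> str:
    return f"[legal-specialist] {query}"

def generalist_model(query: str) -> str:
    return f"[generalist] {query}"

SPECIALISTS: Dict[str, Callable[[str], str]] = {
    "finance": finance_model,
    "legal": legal_model,
}

def route(query: str) -> str:
    """Naive keyword router: send in-domain queries to a specialist,
    everything else to the generalist model."""
    lowered = query.lower()
    if any(w in lowered for w in ("10-q", "earnings", "ebitda")):
        return SPECIALISTS["finance"](query)
    if any(w in lowered for w in ("contract", "clause", "liability")):
        return SPECIALISTS["legal"](query)
    return generalist_model(query)
```

In production the routing step is often itself an LLM call (e.g. via tool/function-calling), trading a little latency for much better dispatch accuracy than keyword matching.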

Gary Longsine

Fractional CTO. Collaborate • Deliver • Iterate. 📱

4mo

Some technology races are won by watching them for a while, from the sidelines, before entering. If one assumes that this won't be a "winner take all" game (and be warned, many people assume that it *is* just that) then it might be best to build some infrastructure and practice a bit, and build up team skills — but with the expectation that this round is just preparation.

Harsh Singhal

I solve business problems with data+algos | ML@Adobe | Led the ML team at Koo | Prev at Netflix, LinkedIn California | Relocated to Blr 2021 | Visiting faculty at MSRIT | LLMs are epistemology probes

4mo

This has serious implications for the open-source LLM ecosystem, especially since the primary advantage of open-source LLMs is fine-tuning. And come to think of it, GPT-4 continues to be the go-to solution for generating SFT data.

Manprit Singh

Data and AI CTO Healthcare and Fintech

4mo

Other recent studies show that prompting strategies alone can be effective in evoking this kind of domain-specific expertise from generalist foundation models. https://www.microsoft.com/en-us/research/blog/the-power-of-prompting/
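A minimal sketch of the idea (this is not Microsoft's actual Medprompt code): a prompting strategy can steer a generalist model toward domain expertise by combining a role instruction, a few in-domain examples, and a chain-of-thought cue into a single prompt. The function name and structure here are illustrative assumptions.

```python
def build_expert_prompt(role, examples, question):
    """Assemble a domain-expert prompt from a role instruction,
    few-shot (question, answer) examples, and a reasoning cue."""
    parts = [f"You are {role}. Reason step by step before answering."]
    for q, a in examples:
        parts.append(f"Q: {q}\nA: {a}")
    # Chain-of-thought cue on the final, unanswered question.
    parts.append(f"Q: {question}\nA: Let's think step by step.")
    return "\n\n".join(parts)
```

The resulting string would be sent as the model input; the few-shot examples supply the domain framing that a specialized fine-tune would otherwise bake into the weights.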

James V Baber

Technology Leadership | AI Transformation Consulting | AI Startup Investor

4mo

RAG is a hack primarily because we have to chunk data into context windows with the same 8KB memory capacity as a calculator from the mid-1980s, then feed it to an LLM with significant token costs, so the quantity of chunks delivered for processing is necessarily kept to a minimum. This results in missed content, even with Vector Search, Semantic Ranking, and Knowledge Graphs.

I challenge anyone to build a RAG model with your favorite LLM where you upload 500 SEC Form 10-Q documents that are 25-page PDFs, then ask your RAG-enabled LLM to list all 500 10-Qs by company name and summarize the operational challenges of each. It either won't work due to limits, or you'll spend a fortune on tokens. Your expectations for what you can achieve with a RAG model have to be constrained to its structural ability, context window size, and token costs.

However, a recent RAG model I built for an engineering (materials testing) lab far exceeds generalist frontier models, drafting with a Professional Engineer's terminology and context; but that's because I knew exactly how to build it (with some NLP summarization tricks up my sleeve), knew my client's expectations, and gave explicit instructions and sample queries on how to use it effectively.
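The token-cost point in this comment can be checked with back-of-envelope arithmetic. Every number below is an assumption for illustration: tokens per page and price per 1K tokens vary widely by model, tokenizer, and document density.

```python
DOCS = 500                 # SEC Form 10-Q filings (from the comment)
PAGES_PER_DOC = 25         # pages per filing (from the comment)
TOKENS_PER_PAGE = 500      # assumed average for dense PDF text
PRICE_PER_1K_INPUT = 0.01  # assumed dollars per 1K input tokens

# Tokens required just to read every page of every filing once.
total_tokens = DOCS * PAGES_PER_DOC * TOKENS_PER_PAGE

# Cost of a single full pass over the corpus (input tokens only;
# output tokens for 500 summaries would add more on top).
cost_one_pass = total_tokens / 1000 * PRICE_PER_1K_INPUT

print(total_tokens)              # 6250000
print(round(cost_one_pass, 2))   # 62.5
```

Even under these modest assumptions, one exhaustive pass is millions of tokens, which is why retrieval keeps the chunk count low and why exhaustive list-and-summarize queries strain RAG systems.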

Abhishek Gupta

Simplifying technology adoption, step-by-step | AE @ Whatfix

4mo

Is there a case for adding guardrails when training purpose-specific models, thereby decreasing the degrees of freedom?

Dan Wasserman

Less talk, more prompting

4mo

Every business leader wants a model trained on their proprietary data, but this is a great example of why I hesitate to recommend that right now. It's not the slam dunk you'd expect.

Joseph Pareti

AI Consultant @ Joseph Pareti's AI Consulting Services | AI in CAE, HPC, Health Science

4mo

'Your special proprietary data may be less useful than you think' --- I would be careful with that type of statement. One example in #healthscience: #BioNeMo provides large-scale, optimized training on YOUR OWN DATA.
