Artificial intelligence models are typically used online, but a variety of freely available tools are changing that.
The website histo.fyi is a database of structures of immune system proteins called major histocompatibility complex (MHC) molecules. It contains images, data tables, and amino acid sequences, and its maintainer uses artificial intelligence (AI) tools called large language models (LLMs) to transform these assets into readable summaries. But he doesn't use ChatGPT or any other web-based LLM. Instead, he runs the AI on his laptop.
In recent years, chatbots based on LLMs have been praised for their ability to write poetry or hold conversations. Some LLMs have hundreds of billions of parameters (the more parameters, the more complex the model) and are accessible only online. But two more recent trends have emerged. First, organizations are creating “open weights” versions of LLMs, in which the weights and biases learned during training are publicly available, so users can download the models and run them locally if they have the computing power. Second, technology companies are creating scaled-down versions that can run on consumer hardware and that can match the performance of older, larger models.
Researchers could use such tools to save money, protect patient or company confidentiality, or ensure reproducibility. This trend is likely to grow. As computers get faster and models more efficient, people will increasingly have AIs running on their laptops or mobile devices for all but the most demanding purposes. Scientists will finally have AI assistants at their fingertips: the actual algorithms, not just remote access to them.
Big things in small packages
Several large technology companies and research institutes have released small, open-weights models in recent years, including Google DeepMind in London, Meta in Menlo Park, California, and the Allen Institute for Artificial Intelligence in Seattle, Washington (see "Some small models with open weights"). ("Small" is relative: these models can contain about 30 billion parameters, which is large compared with earlier models.)
Some small models with open weights
Developer | Model | Parameters |
---|---|---|
Allen Institute for AI | | 7 billion |
Alibaba | | 0.5 billion |
Apple | | 7 billion |
Google DeepMind | | 9 billion |
Google DeepMind | | 7 billion |
Meta | | 8 billion |
Microsoft | | 14 billion |
Mistral AI | | 12 billion |
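To put these parameter counts in perspective, a rough estimate of the memory needed just to hold a model's weights is the number of parameters multiplied by the bytes used to store each one. The sketch below is a back-of-the-envelope calculation, not a statement about any specific model: the storage sizes (2 bytes per parameter for 16-bit numbers, 0.5 bytes for a 4-bit compressed format) are illustrative assumptions that do not come from the article, and real memory use is higher because of the working memory needed while generating text.

```python
# Rough estimate of the memory needed to hold a model's weights.
# Assumption: memory ~= parameter count x bytes per parameter; working
# memory used while generating text is ignored.

def weight_memory_gb(parameters: float, bytes_per_parameter: float) -> float:
    """Return the approximate size of the weights in gigabytes (10**9 bytes)."""
    return parameters * bytes_per_parameter / 1e9

# Parameter counts taken from the table above.
PARAMETER_COUNTS = {
    "0.5 billion": 0.5e9,
    "7 billion": 7e9,
    "14 billion": 14e9,
}

for label, count in PARAMETER_COUNTS.items():
    sixteen_bit = weight_memory_gb(count, 2.0)  # 16 bits per parameter (assumed)
    four_bit = weight_memory_gb(count, 0.5)     # 4 bits per parameter (assumed)
    print(f"{label}: ~{sixteen_bit:.1f} GB at 16-bit, ~{four_bit:.2f} GB at 4-bit")
```

Under these assumptions, a 7-billion-parameter model needs roughly 14 GB for its weights at 16-bit precision, which is why the smaller entries in the table are the ones that plausibly fit on a laptop.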
Although California-based tech company OpenAI has not released open-weights versions of its current GPT models, its partner Microsoft in Redmond, Washington, has been prolific, releasing the small language models Phi-1, Phi-1.5, and Phi-2 in 2023, then four versions of Phi-3 and three versions of Phi-3.5 this year. The Phi-3 and Phi-3.5 models have between 3.8 billion and 14 billion active parameters, and two models (Phi-3-vision and Phi-3.5-vision) can process images (ref. 1). On some benchmarks, even the smallest Phi model outperforms OpenAI's GPT-3.5 Turbo from 2023, which is rumored to have 20 billion parameters.
Sébastien Bubeck, Microsoft's vice president of generative AI, attributes Phi-3's performance to its training dataset. LLMs are first trained by predicting the next "token" (a small unit of text) in long strings of text. For example, to predict the name of the murderer at the end of a crime novel, an AI must "understand" everything that came before, but such consequential predictions are rare in most text. To get around this problem, Microsoft used LLMs to write millions of short stories and textbooks in which each idea builds on the one before. The result of training on this text, Bubeck says, is a model that fits on a mobile phone but has the performance of the first version of ChatGPT, released in 2022. "If you're able to create a dataset that's very rich in these thought tokens, then the signal is going to be much richer," he says.
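Next-token prediction can be illustrated with a deliberately tiny toy. The sketch below is not how Phi-3 or any LLM is built: real models use neural networks and sub-word tokens, whereas this example simply counts which word most often follows another in a made-up sentence. The function names and the sample text are illustrative assumptions.

```python
from collections import Counter, defaultdict

def train_bigram(text: str) -> dict:
    """Count, for each word, which words follow it in the training text."""
    tokens = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        follows[current][nxt] += 1
    return follows

def predict_next(follows: dict, token: str):
    """Return the most frequent follower of `token`, or None if it was never seen."""
    counts = follows.get(token.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Made-up training text (an assumption for illustration only).
corpus = "the butler hid the knife . the butler locked the door . the butler did it"
model = train_bigram(corpus)

print(predict_next(model, "the"))       # most common word after "the"
print(predict_next(model, "murderer"))  # unseen word, so no prediction
```

A counting model like this only looks one word back, which is exactly why it could never name the murderer; the point Bubeck makes is that text rich in long-range reasoning gives a real model more to learn from each prediction.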
Phi-3 can also help with routing, that is, deciding whether to forward a query to a larger model. "That's one area where Phi-3 will shine," says Bubeck. Small models can also help scientists in remote regions where cloud connectivity is poor. "Here in the Pacific Northwest, there are great hiking areas, and sometimes I just don't have a network," he says. "And maybe I want to take a photo of a flower and ask my AI for information about it."
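The routing idea mentioned above can be sketched in a few lines. Everything here is hypothetical: the article does not describe how Phi-3 makes the decision, so this example assumes a small model that reports a confidence score alongside its answer, and it uses stand-in functions instead of real models so that it runs without any AI library.

```python
from typing import Callable, Tuple

# Hypothetical model interfaces: each takes a query string. The small model
# also returns a self-reported confidence between 0 and 1 (an assumption).
SmallModel = Callable[[str], Tuple[str, float]]
LargeModel = Callable[[str], str]

def route(query: str, small: SmallModel, large: LargeModel,
          threshold: float = 0.7) -> Tuple[str, str]:
    """Answer locally when the small model is confident, otherwise escalate.

    Returns (answer, name_of_model_that_answered).
    """
    answer, confidence = small(query)
    if confidence >= threshold:
        return answer, "small"
    return large(query), "large"

# Stand-in models so the sketch runs on its own.
def toy_small(query: str) -> Tuple[str, float]:
    # Pretend that short queries are easy and long ones are hard.
    confidence = 0.9 if len(query.split()) <= 6 else 0.3
    return f"[small] {query}", confidence

def toy_large(query: str) -> str:
    return f"[large] {query}"

print(route("What flower is this?", toy_small, toy_large))
print(route("Compare the binding grooves of these twelve MHC structures in detail",
            toy_small, toy_large))
```

The design choice is that the cheap local model always sees the query first, so the larger and more expensive model is consulted only when needed.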
Researchers can use these tools to build custom applications. Chinese e-commerce site Alibaba, for example, has built models called Qwen with 500 million to 72 billion parameters. A biomedical scientist in New Hampshire fine-tuned the largest Qwen model using scientific data and created Turbcat-72b, which is available on the model-sharing site Hugging Face. (The researcher goes only by the name Kal'tsit on the messaging platform Discord, because AI-assisted work in academia is still controversial.) Kal'tsit says she created the model to help researchers brainstorm, proofread manuscripts, prototype code, and summarize published work; the model has been downloaded thousands of times.
Respect for privacy
In addition to the ability to optimize open models for specific applications, Kal'tsit says another advantage of local models is privacy. Sending personal data to a commercial service could violate privacy regulations. "If there is an audit and you show that you are using ChatGPT, the situation could get pretty nasty," she says.
Cyril Zakka, a doctor who leads the healthcare team at Hugging Face, uses local models to generate training data for other models (which are sometimes local as well). In one project, he uses them to extract diagnoses from medical reports so another model can learn to predict those diagnoses based on echocardiograms used to monitor heart disease. In another project, he uses the models to generate questions and answers from medical textbooks to test other models. "We're paving the way to fully autonomous surgery," he explains. A robot trained to answer questions could communicate better with doctors.
Zakka uses local models (he prefers Mistral 7B, released by Paris-based technology firm Mistral AI, or Meta's Llama-3 70B) because they are cheaper than subscription services like ChatGPT Plus and because he can fine-tune them. But privacy is also important, as he is not allowed to send patient records to commercial AI services.
Johnson Thomas, an endocrinologist at Mercy Health System in Springfield, Missouri, is also motivated by patient privacy. Doctors rarely have time to transcribe and summarize patient conversations, but most commercial services that use AI to do this are either too expensive or not approved to process private medical data. Thomas is developing an alternative. Based on Whisper—an open-weight speech recognition model from OpenAI—and Gemma 2 from Google DeepMind, the system allows doctors to transcribe conversations and turn them into medical notes, as well as summarize data from participants in medical studies.
Privacy is also an issue in industry. CELLama, developed at the South Korean pharmaceutical company Portrai in Seoul, uses local LLMs such as Llama 3.1 to reduce information about a cell's gene expression and other properties to a summary sentence (ref. 2). It then creates a numerical representation of that sentence that can be used to group cells into types. The developers highlight privacy as a benefit on their GitHub page, noting that CELLama "operates locally, ensuring no data leaks occur."
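The general pattern described above (summarize each cell as text, turn the text into numbers, then compare cells) can be sketched as follows. This is not CELLama's code: the sentence template, the bag-of-words "embedding" that stands in for a language-model embedding, and the expression values are all made-up assumptions chosen so that the example runs with only the standard library.

```python
import math
from collections import Counter

def cell_to_sentence(expression: dict, top_n: int = 3) -> str:
    """Summarize a cell as a sentence naming its most highly expressed genes."""
    top = sorted(expression, key=expression.get, reverse=True)[:top_n]
    return "Top expressed genes are " + " ".join(top)

def embed(sentence: str) -> Counter:
    """Toy stand-in for a language-model embedding: a bag-of-words count."""
    return Counter(sentence.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Made-up expression values for three hypothetical cells.
cells = {
    "cell_a": {"CD3E": 9.1, "CD8A": 7.4, "GZMB": 5.0, "MS4A1": 0.1},
    "cell_b": {"CD3E": 8.7, "CD8A": 6.9, "GZMB": 4.2, "CD79A": 0.2},
    "cell_c": {"MS4A1": 9.5, "CD79A": 8.8, "CD19": 6.1, "CD3E": 0.3},
}

vectors = {name: embed(cell_to_sentence(expr)) for name, expr in cells.items()}
sim_ab = cosine(vectors["cell_a"], vectors["cell_b"])
sim_ac = cosine(vectors["cell_a"], vectors["cell_c"])
print(f"a vs b: {sim_ab:.2f}, a vs c: {sim_ac:.2f}")
```

Cells whose sentences name the same genes end up with more similar vectors, which is the property a grouping step would rely on. Because every step runs on the local machine, no expression data needs to leave it.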
Using models sensibly
As the LLM landscape evolves, scientists face a rapidly changing set of options. "I'm still in the tinkering and experimentation phase as far as using LLMs locally," says the maintainer of histo.fyi. He uses Llama locally, with either 8 billion or 70 billion parameters; both versions run on his Mac laptop.
Another advantage is that local models don't change. Commercial developers, on the other hand, can update their models at any time, which leads to different results and forces users to change their prompts or templates. "In most scientific fields, you want things that are reproducible," he explains. "And it's always a concern when you don't have control over the reproducibility of your results."
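One practical way to take advantage of a model that doesn't change is to record exactly which weights file produced a result. The sketch below is one possible approach rather than a method described in the article: it fingerprints a weights file with SHA-256 and stores the fingerprint next to the prompt, settings, and output. The function names, the settings shown, and the stand-in file are illustrative assumptions.

```python
import hashlib
import json
import os
import tempfile

def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a (possibly very large) weights file in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def run_record(weights_path: str, prompt: str, settings: dict, output: str) -> str:
    """Bundle what is needed to rerun a generation into one JSON string."""
    return json.dumps({
        "weights_sha256": file_sha256(weights_path),
        "prompt": prompt,
        "settings": settings,
        "output": output,
    }, indent=2, sort_keys=True)

# Demo with a small stand-in file instead of real model weights.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"stand-in for model weights")
    demo_path = tmp.name

record = run_record(
    demo_path,
    prompt="Summarize this structure.",
    settings={"temperature": 0, "seed": 42},  # hypothetical settings
    output="(model output goes here)",
)
os.remove(demo_path)
print(record)
```

If the fingerprint stored with an old result matches the file on disk, the same model is still in use; with a hosted service there is no equivalent check available to the user.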
For another project, he is writing code that aligns MHC molecules based on their 3D structure. To develop and test his algorithms, he needs many different proteins, more than exist in nature. To design plausible new proteins, he uses ProtGPT2, an open-weights model with 738 million parameters that has been trained on about 50 million sequences (ref. 3).
Sometimes, however, a local app is not enough. For programming, he uses the cloud-based GitHub Copilot. "It feels like my arm is chopped off if I can't use Copilot for some reason," he says. While there are local LLM-based programming tools (such as Google DeepMind's CodeGemma and one from California-based developer Continue), in his experience they can't compete with Copilot.
Access points
So how do you run a local LLM? Software called Ollama (available for Mac, Windows, and Linux operating systems) allows users to download open models, including Llama 3.1, Phi-3, Mistral, and Gemma 2, and access them from a command line. Other options include the cross-platform app GPT4All and Llamafile, which can convert LLMs into a single file that runs on six operating systems with or without a graphics processor.
Sharon Machlis, a former editor at the website InfoWorld, lives in Framingham, Massachusetts, and has written a guide to using LLMs locally, covering a dozen options. "The first thing I would suggest," she says, "is to choose the software you select based on your level of how much you want to play around." Some people prefer the simplicity of apps, while others prefer the flexibility of the command line.
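Besides the command line, Ollama also exposes a local HTTP interface, which makes it possible to call a downloaded model from a script. The sketch below is a minimal example under stated assumptions: it assumes Ollama is running on its default local address and that the named model has already been downloaded (for instance with `ollama pull llama3.1`). The function names are illustrative and not part of Ollama itself.

```python
import json
import urllib.request

# Ollama's default local address; adjust if your setup differs.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(model: str, prompt: str) -> bytes:
    """Encode a single, non-streaming generation request as JSON bytes."""
    body = {"model": model, "prompt": prompt, "stream": False}
    return json.dumps(body).encode("utf-8")

def ask_local_model(model: str, prompt: str, url: str = OLLAMA_URL) -> str:
    """Send the prompt to a locally running Ollama server and return its reply."""
    request = urllib.request.Request(
        url,
        data=build_payload(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return json.loads(reply.read())["response"]
```

With the server running, a call such as `ask_local_model("llama3.1", "Summarize what an MHC molecule does in two sentences.")` returns the generated text as a string, and the prompt never leaves the machine.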
Whatever approach you take, local LLMs should soon be good enough for most applications, says Stephen Hood, head of open source AI at the San Francisco-based technology company Mozilla. "The progress in this area in the last year has been amazing," he says.
Users have to decide for themselves what these applications might be. "Don't be afraid to get your hands dirty," advises Zakka. "The results might pleasantly surprise you."