Local language models: Building a personal AI assistant with LM Studio

The first of a two-part series on how to spin up a large language model (LLM) using an ordinary, consumer-level home computer.

The rise of generative AI has captivated the world, but interacting with models like ChatGPT often feels limited – constrained by usage caps, privacy concerns and little visibility into how the AI arrives at its responses.

What if you could have your own personal AI assistant, running locally on your machine? Thanks to tools like LM Studio, Ollama and Msty (amongst others), that is now within reach. This article will delve into LM Studio, but all of these tools share the same goal: providing a user-friendly way to download, run and experiment with large language models (LLMs) directly on your local PC.

Running your own AI can offer privacy, customisation options, unique model training and the ability to work offline. 

This guide will walk you through setting up your own personal AI using LM Studio, from hardware considerations to running your first model. For reference, I’m using a Minisforum AI X1 Pro, incorporating an AMD Ryzen AI 9 HX 370, Radeon 890M iGPU, 2TB SSD and 96GB of DDR5 memory.

Hardware requirements – the foundation

Running LLMs effectively demands a certain level of computing power. While you can technically run them on older machines, the experience will likely be frustratingly slow. Let's break down what you need:

  • RAM is king: LLMs vary considerably in size, but in general they are massive, requiring significant memory to load and operate. The more RAM you have, the larger and more capable the models you can comfortably run. As a general guideline:
    • 8-16GB: Minimal, suitable only for very small, highly quantised models (expect slow performance).
    • 16-32GB: A good starting point for smaller, quantised models. You’ll be able to experiment with some of the more popular 7B parameter models.
    • >32GB: This is really where you want to be when running your own LLM. This unlocks a much wider range of models and provides a smoother experience.
  • CPU considerations: The CPU plays a role, but it’s less critical than RAM. While faster CPUs are always beneficial, they won't be your primary bottleneck. Any modern multi-core processor from the last few years will more than suffice. 
  • GPU (optional): A dedicated GPU can significantly accelerate the inference process (how quickly the AI generates responses) and is vital for training models and complex calculations. However, even a modern integrated GPU or an older dedicated GPU will suffice. It’s also worth noting that configuring and optimising GPU usage can be more complicated (we’ll get into that in more detail in the next article).
  • Storage: The speed of your storage is crucial for loading models quickly. As such, an SSD (rather than an HDD) is vital to ensure your models load in good time.
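
Before downloading multi-gigabyte models, it’s worth confirming how much memory and disk space your machine actually has free. Here’s a minimal sketch in Python, assuming you’ve installed the third-party psutil package; the 32GB threshold simply mirrors the guideline above, not any official LM Studio requirement:

```python
import shutil

import psutil  # third-party package: pip install psutil

# Total and currently available RAM, in GB
ram = psutil.virtual_memory()
print(f"RAM: {ram.total / 1e9:.0f} GB total, {ram.available / 1e9:.0f} GB available")

# Free space on the drive where downloaded models will live
disk = shutil.disk_usage("/")
print(f"Disk: {disk.free / 1e9:.0f} GB free")

# Illustrative check against the guideline above, not an official requirement
if ram.total / 1e9 >= 32:
    print("32GB+: comfortable territory for mid-sized quantised models.")
else:
    print("Under 32GB: stick to smaller, more heavily quantised models.")
```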

Installation and choosing your first model

LM Studio (as with its alternatives) is available for Windows, macOS and Linux and can be downloaded from its website. The installation process is straightforward: simply follow the on-screen instructions. On macOS, you may encounter permission issues, so ensure LM Studio has access to the necessary folders in your system preferences.

Before you can actually run an AI, you need a model. LM Studio makes it easy to browse and download models from Hugging Face Hub, a central repository for LLMs, with a huge range of options:

  • What are LLMs? Think of them as incredibly complex content generators, trained on massive datasets. They can answer questions, write stories, create images, translate languages, help with coding, etc – all the things you’ve come to expect from modern generative AI systems. Many have different strengths and capabilities – such as reasoning, image input, tool use and so on – so you’ll want to experiment with which models suit your needs and PC hardware.
  • Model Size & Quantisation: Models come in various sizes (measured in "parameters," like 7B, 13B, or even larger). Larger models generally offer better performance but require more resources. "Quantisation" is a technique that reduces the size of these models without significantly impacting quality – think of it as compressing them for efficiency. Common quantisation levels include Q4 (more compressed, faster) and Q5 (higher quality, slightly slower).
  • Recommended Beginner Models:
    • TinyLlama: (Around 1GB) - A great starting point if you're limited on resources or want to test your setup.
    • Mistral 7B: (Around 4-8 GB depending on quantisation) – Offers a good balance of size and capability.
    • Gemma 3: (Various sizes available) - A popular family of open models from Google, well-suited to a variety of text generation and image understanding tasks, with a large community and plenty of support.
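
Quantisation is easier to reason about with a quick back-of-the-envelope calculation: a model’s on-disk size is roughly its parameter count multiplied by the bits each weight occupies. Here’s a minimal sketch in Python; the ~4.5 bits/weight figure for Q4_K_M is an approximation rather than an exact spec, and real files carry some overhead:

```python
def approx_model_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Rough on-disk size: parameter count x bits per weight, converted to GB.
    Real model files carry extra overhead (metadata, embeddings), so treat
    the result as a lower-bound estimate."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

# Q4_K_M averages roughly 4.5 bits per weight (an approximation);
# full 16-bit weights are shown for comparison.
print(f"Mistral 7B at Q4_K_M: ~{approx_model_size_gb(7, 4.5):.1f} GB")  # ~3.9 GB
print(f"Mistral 7B at FP16:   ~{approx_model_size_gb(7, 16):.1f} GB")   # ~14.0 GB
```

This is why the Mistral 7B downloads above range from around 4GB (Q4) towards 8GB for less aggressive quantisations.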

Getting up and running – your first conversation

Getting started is about as simple as you would hope, but here’s a short primer on the process to get up and running:

  1. Open LM Studio.
  2. Click the "Browse Models" button.
  3. Search for one of the recommended models (e.g., “Mistral-7B”).
  4. Select a quantised version (e.g., “TheBloke/Mistral-7B-Instruct-v0.1-GGUF”, choosing the Q4_K_M file).
  5. Click "Download." LMStudio will automatically download the model files to your computer.
  6. Once downloaded, click the "Load Model" button and select the model you just downloaded.
  7. You’ll see a chat interface appear. Type in a prompt (e.g., “Write a short poem about cats”).
  8. Adjust initial settings: Temperature (start with 0.7), Top P (start with 0.9), Max Tokens (start with 256). These control the randomness and length of the AI’s responses. 
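
Once a model is loaded, you aren’t limited to the built-in chat window: LM Studio can also serve the model through an OpenAI-compatible local API, which makes it easy to script against from your own code. Below is a minimal sketch using the official openai Python package, assuming you’ve started LM Studio’s local server on its default port (1234) and substituted the model identifier the app reports (the one below is a placeholder):

```python
from openai import OpenAI  # pip install openai

# Point the client at LM Studio's local server rather than OpenAI's cloud.
# LM Studio ignores the API key, but the client library requires one.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="mistral-7b-instruct",  # placeholder – copy the exact name LM Studio shows
    messages=[{"role": "user", "content": "Write a short poem about cats"}],
    temperature=0.7,  # same starting values as the chat settings above
    top_p=0.9,
    max_tokens=256,
)
print(response.choices[0].message.content)
```

Because the endpoint mimics the OpenAI API, most tools written against that API can usually be pointed at your local model just by changing the base URL.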

Should you encounter any issues, LM Studio has a ton of great documentation on its site, and there are plenty of forums on Reddit and similar watering holes where you can find answers.

If all you want is the security, privacy and/or convenience of running your own AI, then that should be all you need to get up and running. It’s when you start tinkering that things get more complicated, and we’ll delve into that in more detail in the next article, which will be published next week here on Machine.

Do you have a story or insights to share? Get in touch and let us know. 

Follow Machine on X, BlueSky and LinkedIn