Run even larger AI models locally with LM Studio
https://i.imgur.com/WlaFVVO.png

A few days ago, I wrote a [post](https://peakd.com/hive-167922/@themarkymark/how-to-run-ai-directly-on-your-own-pc) about how to run large language AI models on your local PC using Ollama.  I am a big fan of Ollama, but I have been using a new tool that is even better for interactive use.

LM Studio offers much of the same functionality and ease of use as Ollama, and a lot more.  It runs on Windows, Mac, and Linux and can be used interactively or as a server that mimics OpenAI's API.
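
If you want to try the server mode from code, here is a minimal sketch. It assumes you have started LM Studio's local server on its default port (1234) with a model already loaded, and it uses Python's `requests` library:

```python
import requests

# Minimal request against LM Studio's OpenAI-compatible endpoint.
# Assumes the local server is running on the default port 1234
# with a model already loaded in LM Studio.
response = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        "messages": [
            {"role": "user", "content": "Explain quantization in one sentence."},
        ],
        "temperature": 0.7,
    },
    timeout=120,
)
print(response.json()["choices"][0]["message"]["content"])
```

Because the API shape matches OpenAI's, most tooling that can talk to OpenAI can be pointed at this local endpoint instead.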

I've been liking LM Studio so much that I think I am going to remove Ollama from my machine.  I have been using Ollama both interactively and as a server for other processes.  I even have Ollama linked into VS Code to act as my own version of GitHub Copilot.

## Installing LM Studio

Super easy, barely an inconvenience.  Just go to https://lmstudio.ai/ and choose your operating system.

## Using LM Studio

Once you have LM Studio installed, you will first need to download some models.  Depending on your system, and how much VRAM you have, your choices may be limited.  One of the great things about LM Studio is that you can use VRAM along with your system RAM (at a big performance penalty).  This will allow you to use much larger and less quantized models.

I have an AMD 5950X with 64GB of DDR4 and an Nvidia 3090 in my main system.  This gives me 24GB of VRAM and 64GB of system RAM.  One of the models I have been playing with lately is Dolphin-Mixtral, which is a Mixture of Experts (MoE) model.  I'm not going to dig into MoE models in this post, but it is a newer approach to LLMs that uses multiple smaller fine-tuned expert models and splits the work of a response among them.

Let's look at my options for this model and my hardware.

https://i.imgur.com/stvF38Q.png

First we go to the model search tab and then select the 2.7 Mixtral version.  This is the latest release of Dolphin-Mixtral.

https://i.imgur.com/GTVRrHx.png 

At the top of LM Studio, you can see my available resources, which match the amounts I gave above with the current overhead factored in.

Select the model, and on the right you will see all the files associated with it.

https://i.imgur.com/Q0r23mj.png

You can see I have two installed, which I chose specifically.  The first one will fit entirely on my GPU and run at top performance; the other is a 4 bit quantized version, which is a lot better but requires a lot of RAM.  If you look at the first model, it is a 2 bit quantized model, which means the precision is heavily reduced, resulting in potentially less accurate choices as data moves through the neural network.  I would recommend using a 4 bit model if possible.  The source model is typically 16 bit, so anything less than that will have reduced accuracy, but 4 bit is generally a good compromise that keeps models accessible on consumer hardware.
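
To see why these file sizes land where they do, a rough back-of-the-envelope estimate helps.  Mixtral has roughly 46.7B total parameters; real GGUF files add overhead for quantization scales and metadata, so treat these numbers as lower bounds:

```python
# Rough memory estimate for a Mixtral-scale model (~46.7B parameters).
# Actual GGUF files are somewhat larger due to quantization scales,
# metadata, and the KV cache needed at runtime.
params = 46.7e9

for bits in (2, 4, 8, 16):
    gigabytes = params * bits / 8 / 1e9
    print(f"{bits:>2}-bit: ~{gigabytes:.1f} GB")

# ~11.7 GB at 2-bit (fits comfortably on a 24GB card),
# ~23.4 GB at 4-bit (why the 4-bit file spills past 24GB of VRAM).
```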

Let's try the 2 bit version, and see how that goes.

https://i.imgur.com/8BHgkp8.png 
First you need to go into the chat tab, and then load the model up top.

https://i.imgur.com/z9fkHIE.png

On the right you will see some options.  For this model we are going to want to set GPU Layers to -1, which forces all layers onto the GPU; that is ideal here, as the model will fit on my 24GB 3090.  If your GPU can't fit it, you will get an error.  You will also want to set the context window, which determines how much data the model can reference.  2048 tokens is the default, and the more tokens you allow, the further back the conversation can go.  2048 is a good starting point for most tasks; if you are feeding in more information, you may need to increase it.
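
LM Studio is a GUI, but its backend is llama.cpp, so if you are curious what these knobs correspond to under the hood, the same settings exist in the llama-cpp-python library.  A minimal sketch, with a hypothetical model filename:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="dolphin-2.7-mixtral-8x7b.Q2_K.gguf",  # hypothetical local file
    n_gpu_layers=-1,  # -1 offloads every layer to the GPU
    n_ctx=2048,       # context window in tokens
)
print(llm("Q: Where is the hot dog? A:", max_tokens=64)["choices"][0]["text"])
```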

The first prompt I am going to use to test is:

`I have a hot dog on a plate in the kitchen, I take the plate into the living room and sit down.  Where is the hot dog?`

https://i.imgur.com/nkU2hYy.png

Even the 2 bit model is able to answer this, despite many other models failing.

https://i.imgur.com/OHw79XE.png

On the bottom, we can see some performance numbers showing how fast we are generating responses.  48 tokens per second is a very acceptable speed.  In fact, that is faster than you can read.
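
For a sense of what 48 tokens per second means against reading speed, a quick bit of arithmetic, assuming the common rule of thumb of roughly 0.75 English words per token:

```python
# 48 tokens/s at ~0.75 words per token (a rough rule of thumb)
# works out to far beyond a typical ~250 words-per-minute reading pace.
tokens_per_second = 48
words_per_token = 0.75

wpm = tokens_per_second * words_per_token * 60
print(f"~{wpm:.0f} words per minute")  # ~2160 wpm
```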

https://i.imgur.com/qdA4d6Z.png

I'm going to switch to the 4 bit quantized version.  This version requires 26GB, just over my available VRAM, so before I load it, I need to change the GPU Layers parameter.  Through some testing, I found I can offload 20 layers to the GPU and use most of the available VRAM.

https://i.imgur.com/MQrS1ff.png

With this configuration, though, I lost a lot of performance, dropping down to just under 8 tokens per second.  This is still usable; it is slower than most people can read, but not so slow that you are waiting forever.  Most of the model fits on the GPU, with a few layers handled by the CPU and system RAM.

https://i.imgur.com/i1PzG2u.png

I can tweak the settings a little and get 22 layers on the GPU, but I can't fit the last couple of layers due to the RAM requirements.  This gave me a slight increase in performance, but nothing major.  Depending on how much VRAM you have, your results will vary.  I can also increase the CPU threads to 12 (I have 16 physical cores on my CPU) to get a similar improvement without increasing layers.
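
For reference, here is what that partial-offload configuration would look like in llama-cpp-python, using the numbers from my testing.  The filename is hypothetical again, and the right layer count depends entirely on your VRAM:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="dolphin-2.7-mixtral-8x7b.Q4_K_M.gguf",  # hypothetical local file
    n_gpu_layers=22,  # as many layers as fit in 24GB of VRAM
    n_threads=12,     # CPU threads for the layers left on the CPU
    n_ctx=2048,
)
```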

https://i.imgur.com/592DYsZ.png

Just as important as your prompt is the system prompt you give the model before asking it a question.  The default system prompt is very simple and can be modified to suit your needs.

For example, you can give it the prompt "You are an expert lawyer, and your client gives you a call and asks a question.  Please answer their questions to the best of your ability."  You can then save this as a preset called "Lawyer".
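
Under the hood, a preset like this is just a system message sent ahead of the user's question.  Here is a minimal sketch against the local server, with the same assumptions as before (default port 1234, model loaded) and a made-up client question:

```python
import requests

# The "Lawyer" preset expressed as a system message in the
# OpenAI-compatible chat format served by LM Studio.
response = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        "messages": [
            {"role": "system", "content": (
                "You are an expert lawyer, and your client gives you a call "
                "and asks a question. Please answer their questions to the "
                "best of your ability."
            )},
            {"role": "user", "content": "Can my landlord raise rent mid-lease?"},
        ],
    },
    timeout=120,
)
print(response.json()["choices"][0]["message"]["content"])
```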

https://i.imgur.com/iXR3A2X.png

LM Studio also exposes a lot of more advanced settings you can use to tweak your experience.  In my experience, LM Studio is a bit buggy, at least in the Linux beta, so you may have better results with Ollama if the bugs creep into your use.

For most people, Dolphin Mixtral may be too big a model to work with, and you might want to look at something like Open Orca 7B.  As always, you can explore models on Hugging Face, the go-to spot for open LLM models.



Posted Using [InLeo Alpha](https://inleo.io/@themarkymark/run-even-larger-ai-models-locally-with-lm-studio)
@apshamilton · (edited)
Very interesting. I've been looking for an easy way to use a large LLM that could take advantage of the 64GB of unified memory on my M1 Max chip.

I don't like interacting with AI in the cloud that collects my data and is not private.

I was surprised to learn that Apple has released an M3 Max with 128GB of unified memory. That would be really powerful and could run huge models.

I'll let you know how it goes.
@themarkymark ·
Mac Studios and even Mac Minis are very popular options for LLMs due to how unified memory works.  Nowhere else can you get ~188GB of VRAM for less than the cost of even a single A100 40G.
@apshamilton ·
I'm getting 23 tokens per second using the 5 bit Mixtral 2.7 model.
@eniolw ·
Thanks for your contribution to STEM content.

Congrats!
@galberto ·
Mmmm, I am not trying this; it is far too complicated for me. I only use the Telegram bot, for translation.

By the way, I have some homework for you: kevinwong is publicizing a project called taunet with its coin named AGRS. They present it as a wonderful development project, but right now they do not have a finished product. My question is: could the code on GitHub, called IDNI, be trustworthy, or is it only a scam?

Thanks a lot, and sorry for the imposition.
I appreciate your opinion.
@ganjafarmer ·
It's pretty insane how even the common person has access to artificial intelligence.

Awesome post and good luck with your experiment!
@latinowinner ·
prior knowledge required
@themarkymark ·
Not really, this gets you up and running.
@mdasein ·
 !PGM
@pgm-curator ·
<center>Sent 0.1 PGM - 0.1 LVL- 1 STARBITS  - 0.05 DEC - 1 SBT - 0.1 THG - 0.000001 SQM - 0.1 BUDS - 0.01 WOO - 0.005 SCRAP -  0.001 INK tokens </center>

<center><sub>remaining commands 12</sub></center>


**BUY AND STAKE THE PGM TO SEND A LOT OF TOKENS!**

The tokens that the command sends are: 0.1 PGM-0.1 LVL-0.1 THGAMING-0.05 DEC-15 SBT-1 STARBITS-[0.00000001 BTC (SWAP.BTC) only if you have 2500 PGM in stake or more ]

5000 PGM IN STAKE = 2x rewards! 

![image.png](https://files.peakd.com/file/peakd-hive/zottone444/23t7AyKqAfdxKEJPQrpePMW15BCPhbyrf5VoHWxhBFcEcPLjDUVVQAh9ZAopbmoJDekS6.png)
Discord [![image.png](https://files.peakd.com/file/peakd-hive/hive-135941/23wfr3mtLS9ddSpifBvh7mwLx1rN3eoaSvbwUxTngsNR1GQ8EiZTrC9P9RwZxHCCfK8e5.png)](https://discord.gg/KCvuNTEjWw)


Support the curation account @ pgm-curator with a delegation [10 HP](https://hivesigner.com/sign/op/WyJkZWxlZ2F0ZV92ZXN0aW5nX3NoYXJlcyIseyJkZWxlZ2F0b3IiOiJfX3NpZ25lciIsImRlbGVnYXRlZSI6InBnbS1jdXJhdG9yIiwidmVzdGluZ19zaGFyZXMiOiIxMCJ9XQ..) - [50 HP](https://hivesigner.com/sign/op/WyJkZWxlZ2F0ZV92ZXN0aW5nX3NoYXJlcyIseyJkZWxlZ2F0b3IiOiJfX3NpZ25lciIsImRlbGVnYXRlZSI6InBnbS1jdXJhdG9yIiwidmVzdGluZ19zaGFyZXMiOiI1MCJ9XQ..) - [100 HP](https://hivesigner.com/sign/op/WyJkZWxlZ2F0ZV92ZXN0aW5nX3NoYXJlcyIseyJkZWxlZ2F0b3IiOiJfX3NpZ25lciIsImRlbGVnYXRlZSI6InBnbS1jdXJhb3RyIiwidmVzdGluZ19zaGFyZXMiOiIxMDAifV0.) - [500 HP](https://hivesigner.com/sign/op/WyJkZWxlZ2F0ZV92ZXN0aW5nX3NoYXJlcyIseyJkZWxlZ2F0b3IiOiJfX3NpZ25lciIsImRlbGVnYXRlZSI6InBnbS1jdXJhdG9yIiwidmVzdGluZ19zaGFyZXMiOiI1MDAifV0.) - [1000 HP](https://hivesigner.com/sign/op/WyJ0cmFuc2Zlcl90b192ZXN0aW5nIix7ImZyb20iOiJfX3NpZ25lciIsInRvIjoicGdtLWN1cmF0b3IiLCJhbW91bnQiOiIxMDAwIn1d)

Get **potential** votes from @ pgm-curator by paying in PGM, here is a [guide](https://peakd.com/hive-146620/@zottone444/pay-1-pgm-and-get-4-votes-itaegn)



<sub>I'm a bot, if you want a hand ask @ zottone444</sub>

***
@stemsocial ·
<div class='text-justify'> <div class='pull-left'>
 <img src='https://stem.openhive.network/images/stemsocialsupport7.png'> </div>

Thanks for your contribution to the <a href='/trending/hive-196387'>STEMsocial community</a>. Feel free to join us on <a href='https://discord.gg/9c7pKVD'>discord</a> to get to know the rest of us!

Please consider delegating to the @stemsocial account (85% of the curation rewards are returned).

You may also include @stemsocial as a beneficiary of the rewards of this post to get a stronger support.&nbsp;<br />&nbsp;<br />
</div>