Run the Full DeepSeek-R1-0528 Model Locally

Image by the author
DeepSeek-R1-0528, the latest update to DeepSeek's R1 reasoning model, requires 715GB of disk space, making it one of the largest open-source models available. However, thanks to advanced quantization techniques from Unsloth, the model can be shrunk to 162GB, a reduction of roughly 80%. This lets users experience the full power of the model on far more modest hardware, albeit with a small performance trade-off.
In this tutorial, we will:
- Set up Ollama and Open Web UI to run the DeepSeek-R1-0528 model locally.
- Download and run the 1.78-bit quantized version (IQ1_S) of the model.
- Run the model using both a GPU + CPU and a CPU-only setup.
Step 0: Requirements
To run the IQ1_S quantized version, your system should meet the following requirements:
GPU requirements: at least one 24GB GPU (e.g., NVIDIA RTX 4090 or A6000) and 128GB of RAM. With this setup, you can expect a generation speed of about 5 tokens/second.
RAM requirements: a minimum of 64GB of RAM is required to run the model without a GPU, but performance will be limited to about 1 token/second.
Optimal setup: for the best performance (5+ tokens/second), you need at least 180GB of unified memory or a combined 180GB of RAM + VRAM.
Storage: make sure you have at least 200GB of free space for the model and its dependencies.
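Before proceeding, it is worth checking what your machine actually has available. The following standard Linux/NVIDIA commands (adjust for your distribution) report RAM, GPU VRAM, and free disk space:
# Show total and available system RAM
free -h
# Show GPU model, VRAM, and current utilization (requires NVIDIA drivers)
nvidia-smi
# Show free disk space per filesystem
df -h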
Step 1: Install Dependencies and Ollama
Update your system and install the required tools. Ollama is a lightweight server for running large language models locally. Install it on an Ubuntu distribution using the following commands:
# Update package lists and install pciutils (provides lspci, used to detect GPUs)
apt-get update
apt-get install pciutils -y
# Download and run the official Ollama install script
curl -fsSL https://ollama.com/install.sh | sh
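Once the script finishes, you can confirm that Ollama was installed correctly:
# Print the installed Ollama version
ollama --version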
Step 2: Download and Run the Model
Run the 1.78-bit quantized version (IQ1_S) of the DeepSeek-R1-0528 model using the following commands:
# Start the Ollama server in the background
ollama serve &
# Download the quantized model from Hugging Face and start an interactive session
ollama run hf.co/unsloth/DeepSeek-R1-0528-GGUF:TQ1_0
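The download is roughly 162GB, so expect a long wait before the interactive chat session opens. If you just want a single answer rather than an interactive session, Ollama also accepts the prompt as a command-line argument:
# Send one prompt, print the response, and exit
ollama run hf.co/unsloth/DeepSeek-R1-0528-GGUF:TQ1_0 "Explain quantization in one paragraph."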

Step 3: Set Up and Run Open Web UI
Pull the Open Web UI Docker image with CUDA support, then run the container with GPU support and Ollama integration.
This command will:
- Start the Open Web UI server on host port 9783 (mapped to the container's port 8080)
- Enable GPU acceleration via the --gpus all flag
- Mount a persistent data volume (-v open-webui:/app/backend/data)
# Pull the CUDA-enabled Open Web UI image
docker pull ghcr.io/open-webui/open-webui:cuda
# Run the container with GPU access and a persistent data volume
docker run -d -p 9783:8080 --gpus all -v open-webui:/app/backend/data --name open-webui ghcr.io/open-webui/open-webui:cuda
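Before opening the browser, you can confirm the container came up cleanly:
# List running containers; open-webui should appear with status "Up"
docker ps
# Follow the container logs to watch startup progress (Ctrl+C to stop)
docker logs -f open-webui
If the UI later cannot see your Ollama models, the container may not be able to reach the Ollama server running on the host. A common fix is adding --add-host=host.docker.internal:host-gateway and -e OLLAMA_BASE_URL=http://host.docker.internal:11434 to the docker run command above, though whether this is needed depends on your Docker networking setup.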
Once the container is running, access the Open Web UI interface in your browser at http://localhost:9783/.
Step 4: Run DeepSeek R1 0528 in Open Web UI
Select the hf.co/unsloth/DeepSeek-R1-0528-GGUF:TQ1_0 model from the model menu.
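If you prefer the terminal to the browser, the same model can also be queried directly through Ollama's REST API, which listens on port 11434 by default:
# Query the model via Ollama's HTTP API (non-streaming)
curl http://localhost:11434/api/generate -d '{"model": "hf.co/unsloth/DeepSeek-R1-0528-GGUF:TQ1_0", "prompt": "Why is the sky blue?", "stream": false}'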

If the Ollama server fails to use the GPU properly, you can fall back to CPU-only execution. While this reduces performance (to roughly 1 token/second), it ensures the model can still run.
# Kill any existing Ollama processes
pkill ollama
# List any processes still holding the GPU devices, so you can free GPU memory
sudo fuser -v /dev/nvidia*
# Restart the Ollama server with all GPUs hidden (CPU-only mode)
CUDA_VISIBLE_DEVICES="" ollama serve
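Once the CPU-only server is up, rerun the model from another terminal:
# Rerun the model; inference now happens entirely on the CPU
ollama run hf.co/unsloth/DeepSeek-R1-0528-GGUF:TQ1_0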
Once the model is running, you can interact with it through the Open Web UI. However, note that the speed will be limited to about 1 token/second due to the lack of GPU acceleration.

Final Thoughts
Running even a quantized version of the model proved to be a challenge. You need a fast, stable internet connection to download the model, and if the download fails, you have to restart the entire process from the beginning. I also faced many problems trying to run it on my GPU, as I kept getting GGUF-related errors. After trying several common fixes for the GPU errors without success, I finally switched everything to the CPU. While this works, it now takes around 10 minutes for the model to produce a response, which is far from ideal.
I'm sure there are better solutions out there, perhaps using llama.cpp, but trust me, it took me the whole day just to get it running.
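For anyone who wants to explore that route, a llama.cpp run might look roughly like the sketch below; the model path, GPU layer count, and context size are illustrative assumptions, not a tested configuration:
# Illustrative sketch only: the GGUF path and flag values are assumptions
./llama-cli -m /models/DeepSeek-R1-0528-IQ1_S.gguf -p "Why is the sky blue?" -ngl 20 -c 4096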
Abid Ali Awan (@1abidaliawan) is a certified data scientist who loves building machine learning models. Currently, he is focusing on content creation and writing technical blogs on machine learning and data science technologies. Abid holds a Master's degree in technology management and a Bachelor's degree in telecommunication engineering. His vision is to build an AI product using a graph neural network for students struggling with mental illness.