SCNet (超算互联网, Supercomputing Internet): LLM Fine-Tuning with LoRA, a Worked Example
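Setting aside API access to the leading vendors' models, most civilian LLM research and application at the 500B-parameter scale and below (excluding TAALAS's technology) now falls into clearly distinct tiers.
1. Hundred-billion-scale models (100B+ to 500B), represented by OpenAI GPT-OSS-120B. This track currently favors low bit widths such as 4-bit (native training precision), strong information encoding, and the principle that efficiency should improve without quality degrading. GPT-OSS-120B, for example, serves as a benchmark that competes directly with GPT-4-class models. Without strong model optimization or theoretical support, Chinese developers may not attempt trillion-parameter models, and even if they do, such models may struggle to compete with the 120-billion-parameter GPT-OSS-120B. This is a stage-specific limitation of domestic LLM development, constrained by the underlying mathematics and theory. It is not a dismissal of 10-trillion- or 100-trillion-parameter models. Hundred-billion-scale models, however, usually require multiple GPUs, which creates a real technical barrier.
2. Ten-billion-scale models, represented by OpenAI GPT-OSS-20B. In this class, the mainstream models from first- and second-tier vendors include Microsoft's Phi-4 series and the distilled versions of the Qwen and DeepSeek series, with 14B and 7B as the classic sizes. With industry-standard 4-bit quantization, these models run directly on ordinary consumer computers (about 30-50 GB). Their token generation speed is comparable to the rate at which an individual user reads and absorbs information. In particular, fine-tuning them needs relatively little compute. This makes them suitable for small and medium-sized enterprises (SMEs) to customize directly at the model level, in some cases for as little as a few thousand yuan. With further efficiency gains or sub-4-bit optimization, they could become mainstream for edge computing.
3. Billion-scale models, represented by DeepSeek, with the DeepSeek 1.5B distilled model as the classic example. Such a model typically occupies 3 GB of memory or less, and with further optimization it can run directly on mobile phones or small computers. Its technical design is comparable to that of ten-billion-scale models, so this class is well suited to teaching. Training costs are extremely low, usually a few tens of yuan at most. Techniques developed on these models can usually be carried over quickly to ten-billion-scale models, which makes them a good low-cost testbed for trial and error.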
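The memory figures quoted for these tiers follow largely from one piece of arithmetic: weight storage in bytes ≈ parameter count × bits per parameter ÷ 8. The sketch below illustrates this under that assumption. It counts weights only and leaves out activations, KV cache, optimizer state and quantization scales. The parameter counts are round nominal values, not exact model sizes.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Rough weights-only footprint in decimal gigabytes (1 GB = 1e9 bytes).

    Ignores activations, KV cache, optimizer state and quantization
    overhead (e.g. scales), so real memory usage is higher.
    """
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 1e9


if __name__ == "__main__":
    # Nominal (rounded) parameter counts, chosen for illustration only
    examples = [
        ("1.5B @ bf16", 1.5e9, 16),
        ("20B @ 4-bit", 20e9, 4),
        ("120B @ 4-bit", 120e9, 4),
    ]
    for name, params, bits in examples:
        print(f"{name}: ~{weight_memory_gb(params, bits):.1f} GB of weights")
```

For example, 1.5B parameters in bf16 come to about 3 GB, consistent with the figure above for the 1.5B distilled model. A 20B model at 4 bits needs about 10 GB for its weights alone, and real usage is higher once runtime overhead is included.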
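The rest of this article shows how to fine-tune the model "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B" on an SCNet node. The goals are to introduce SCNet and cloud-environment deployment, and to provide rapid-development code that SMEs can reuse. The code is shared free for the public good. It was written with DeepSeek's assistance and tested on a real instance.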
1. Register an account on SCNet: https://www.scnet.cn
2. Click the red "控制台" (Console) button in the upper-right corner.
3. Click "服务导航" (Service Navigation) -> "人工智能" (Artificial Intelligence), the blue button.
This opens the AI Notebook page: https://www.scnet.cn/ui/console/index.html#/notebook
4. Click "费用" (Billing) -> "总览" (Overview) in the upper-right corner.
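5. Click "充值" (Top up) -> "支付宝" (Alipay).
Choose the top-up amount according to what you need. The example below costs less than 10 yuan (about 2 yuan in the author's test).
6. After topping up, return to the AI Notebook page (https://www.scnet.cn/ui/console/index.html#/notebook).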
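7. Click "Notebook" -> "创建Notebook" (Create Notebook). Select "013组" (Group 013, East China Zone 1 [Kunshan]) and "加速卡数量1" (1 accelerator card). This uses an RTX 4090 with 24 GB of GPU memory, which leaves enough headroom that beginners can avoid memory-related debugging.
8. Click "开发镜像" (Development image) -> "基础镜像" (Base image). Then select framework name "PyTorch", framework version "2.6.0", Python version "py3.12-ubuntu22.04" and CUDA/DTK version "cuda12.4". Finally, click the red "创建" (Create) button in the lower-right corner.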
This creates the instance and switches back to the Notebook page. From there you can work in Jupyter Notebook directly, or log in from VS Code via Remote SSH.
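9. Click "快捷工具" (Quick tools) -> "JupyterLab".
This loads the JupyterLab interface. Network access is normally enabled automatically; if it is not, contact customer support.
10. Click "root" -> "笔记本" (Notebook) -> "Python3".
It is also a good idea to click the "+" at the top center of the page to open a new tab, then choose "其他" (Other) -> "终端" (Terminal).
Note that the container runs as the root account by default.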
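11. In the terminal, enter the following (you can copy and paste it directly):

```bash
pip install --upgrade pip
```

If that fails, switch to the Aliyun PyPI mirror and try again:

```bash
pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/
pip install --upgrade pip
```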
12. Install the required libraries:

```bash
pip install transformers accelerate peft bitsandbytes datasets trl scikit-learn pandas
```
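Before moving on, it can help to confirm that the packages actually installed and that the GPU is visible. Here is a small sketch using only the standard library's importlib.metadata. The package list mirrors the pip command above, plus torch from the base image. The helper name installed_versions is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names from the pip command above, plus torch (assumed to come with the PyTorch base image)
REQUIRED = [
    "transformers", "accelerate", "peft", "bitsandbytes",
    "datasets", "trl", "scikit-learn", "pandas", "torch",
]


def installed_versions(packages):
    """Return {package: installed version string, or None if it is not installed}."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(REQUIRED).items():
        print(f"{name:15s} {ver or 'NOT INSTALLED'}")
    # Optional GPU check; skipped gracefully if torch is missing
    try:
        import torch
        print("CUDA available:", torch.cuda.is_available())
        if torch.cuda.is_available():
            print("GPU:", torch.cuda.get_device_name(0))
    except ImportError:
        print("torch is not installed")
```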
13. As an example, download the crowdflower/twitter-airline-sentiment dataset from Kaggle (a free account is required):
https://www.kaggle.com/datasets/crowdflower/twitter-airline-sentiment
Click "Download" -> "Download dataset as zip". The archive contains a .csv file of about 3 MB whose contents look roughly like this:

```text
tweet_id    sentiment  author      content
1956967341  empty      xoshayzers  @tiffanylue i know i was listenin to bad habit earlier and i started freakin at his part =[
1956967666  sadness    wannamama   Layin n bed with a headache ughhhh...waitin on your call...
```

This is a well-known dataset, but you can replace it with any dataset you choose. The code below reads the content and sentiment columns, so adjust those names if your file differs.
14. Unzip the .zip file. Drag "twitter-airline-sentimentSentiment_Analysis.csv" into the file browser on the left of JupyterLab, placing it in the same folder as the .ipynb notebook (/root/).
15. Download the LLM "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B" and save it locally to "/root/private_data/DeepSeek1.5B":

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Local directory in which to save the 1.5B model
local_model_dir = "/root/private_data/DeepSeek1.5B"

# Hugging Face model ID (about 1.5 billion parameters)
model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
print(f"Loading model: {model_name}")

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,  # bfloat16 is safe and memory efficient
    device_map="auto",           # automatic device placement
)

# Save the tokenizer and weights locally
tokenizer.save_pretrained(local_model_dir)
model.save_pretrained(local_model_dir)
print(f"Model saved to {local_model_dir}")
```
This takes about 10 minutes.
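Once the download finishes, a quick check that the save directory looks complete can prevent a confusing error later. This is a minimal sketch. The helper missing_model_files is hypothetical. It assumes the default safetensors output of save_pretrained: either a single model.safetensors, or shards with a model.safetensors.index.json.

```python
import os


def missing_model_files(model_dir):
    """List the expected save_pretrained() outputs that are absent from model_dir."""
    if not os.path.isdir(model_dir):
        return [model_dir]  # the directory itself is missing
    files = set(os.listdir(model_dir))
    missing = []
    if "config.json" not in files:
        missing.append("config.json")
    # Weights are saved either as one file or as shards plus an index file
    if not ({"model.safetensors", "model.safetensors.index.json"} & files):
        missing.append("model.safetensors (or model.safetensors.index.json)")
    if "tokenizer_config.json" not in files:
        missing.append("tokenizer_config.json")
    return missing


if __name__ == "__main__":
    problems = missing_model_files("/root/private_data/DeepSeek1.5B")
    print("Model folder looks complete" if not problems else f"Missing: {problems}")
```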
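16. For reference, here is a tip from the official SCNet documentation (https://www.scnet.cn/help/docs/mainsite/ai/notebook/function-introduction/):
Saving the environment on shutdown / saving an image: You can save the development environment when shutting down, or back it up with the "保存镜像" (Save image) feature. This keeps environments and configuration consistent across machines, for example when restarting, setting up a team development environment, or reproducing the environment on another platform. Images can be saved whether the container instance is running or shut down.
Note: to ensure the image runs correctly, no single image layer may exceed 15 GiB. The system checks the image size, and if it exceeds the limit, you must manually move files from the container environment into file storage.
Use the following commands to find the largest files in the current environment:

```bash
cd /
# Excludes /proc, personal files (/root/private_data), platform-shared files (/root/public_data),
# team-shared files (/root/group_data) and shared storage (/public, /work).
# Paths are written as ./... because that is how find reports them when started from "/".
# Prints the 20 largest files.
find . -path ./proc -prune -o \
       -path ./root/private_data -prune -o \
       -path ./root/public_data -prune -o \
       -path ./root/group_data -prune -o \
       -path ./public -prune -o \
       -path ./work -prune -o \
       -type f -exec du -h {} + | sort -hr | head -n 20
```

After identifying large files, move them to file storage for permanent keeping:

```bash
mv /root/model_file /root/private_data/model_file
```

After the move, the files may not be usable from file storage because they are owned by root. Run the following in the current environment to change the owner (replace user_name with your compute account username):

```bash
chown user_name:user_name /root/private_data/model_file
```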
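If the shell pipeline is inconvenient, roughly the same search can be run from a notebook cell in Python. This sketch walks the file system with os.walk and skips the same storage directories as the SCNet tip. It also skips /sys, an added assumption since that is a virtual file system. It returns the largest regular files. largest_files is a hypothetical helper, not an SCNet tool.

```python
import heapq
import os

# Same exclusions as the SCNet tip, plus /sys (an added assumption: it is a virtual file system)
EXCLUDED_DIRS = (
    "/proc", "/sys",
    "/root/private_data", "/root/public_data", "/root/group_data",
    "/public", "/work",
)


def _is_excluded(path, excluded):
    """True if path is one of the excluded directories or lies inside one."""
    return any(path == d or path.startswith(d.rstrip("/") + "/") for d in excluded)


def largest_files(root="/", top_n=20, excluded=EXCLUDED_DIRS):
    """Return [(size_in_bytes, path), ...] for the top_n largest regular files under root."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk never descends into them
        dirnames[:] = [d for d in dirnames
                       if not _is_excluded(os.path.join(dirpath, d), excluded)]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.isfile(path) and not os.path.islink(path):
                    found.append((os.path.getsize(path), path))
            except OSError:
                continue  # file disappeared or is unreadable
    return heapq.nlargest(top_n, found)
```

Usage: `for size, path in largest_files(): print(f"{size / 1024**3:.2f} GiB  {path}")`. Scanning the whole container from "/" can take a while.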
17. Test the downloaded model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

local_model_dir = "/root/private_data/DeepSeek1.5B"

tokenizer = AutoTokenizer.from_pretrained(local_model_dir, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    local_model_dir,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Example prompt
input_text = "Explain quantum computing in simple terms"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.7,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
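18. Read the .csv data:

```python
import pandas as pd

df = pd.read_csv("twitter-airline-sentimentSentiment_Analysis.csv")

# Keep the first 5,000 rows for fine-tuning (the variable name first_50 is kept from the original code)
first_50 = df.head(5000)
print(f"Loaded {len(first_50)} rows")
print(first_50[["tweet_id", "sentiment", "author", "content"]].head())
```

Note: the .csv file must be in the same folder as the Jupyter notebook; otherwise, pass its full path to pd.read_csv.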
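Because pd.read_csv resolves a bare file name against the notebook's working directory, a missing file or renamed columns are the most common failures at this step. Here is a minimal pre-check sketch using the standard csv module. check_csv is hypothetical, and the required columns are simply the two that this article's code reads (content and sentiment).

```python
import csv
import os

# Columns the code in this article actually reads
REQUIRED_COLUMNS = ("content", "sentiment")


def check_csv(path, required=REQUIRED_COLUMNS):
    """Return the CSV header, or raise a clear error if the file or columns are missing."""
    if not os.path.exists(path):
        raise FileNotFoundError(
            f"{path} not found; the current working directory is {os.getcwd()}"
        )
    # utf-8-sig strips a byte-order mark if present; errors="replace" avoids decode crashes
    with open(path, newline="", encoding="utf-8-sig", errors="replace") as f:
        header = next(csv.reader(f), [])
    missing = [c for c in required if c not in header]
    if missing:
        raise ValueError(f"{path} is missing columns {missing}; found {header}")
    return header
```

Usage: `check_csv("twitter-airline-sentimentSentiment_Analysis.csv")` before calling pd.read_csv.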
19. Prepare the fine-tuning data (the first 5,000 rows):

```python
from datasets import Dataset

# Instruction prepended to every example
instruction = "Analyze the sentiment of the following tweet:"

# Build one "Instruction / Input / Output" text per row
formatted_texts = []
for idx, row in first_50.iterrows():
    text = f"Instruction: {instruction}\nInput: {row['content']}\nOutput: {row['sentiment']}"
    formatted_texts.append(text)

# Convert to a Hugging Face Dataset
dataset = Dataset.from_dict({"text": formatted_texts})
print(dataset[0]["text"])
```
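The fine-tuning text and the inference prompt used later must match exactly up to the label; otherwise the model sees a different format at test time. A single formatting function is one way to guarantee this. format_example, sketched below, is a hypothetical helper that reproduces the "Instruction / Input / Output" template used above.

```python
INSTRUCTION = "Analyze the sentiment of the following tweet:"


def format_example(tweet, label=None, instruction=INSTRUCTION):
    """Return the training text when label is given, else the inference prompt ending in 'Output:'."""
    prompt = f"Instruction: {instruction}\nInput: {tweet}\nOutput:"
    return prompt if label is None else f"{prompt} {label}"


if __name__ == "__main__":
    print(format_example("Layin n bed with a headache", "sadness"))
    print(repr(format_example("Layin n bed with a headache")))
```

The training text is then `format_example(row["content"], row["sentiment"])`, and the inference prompt is `format_example(tweet)`. The two stay identical up to the label.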
20. Load the model with 4-bit quantization and set up LoRA for training:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 quantization config (saves GPU memory)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Load the tokenizer and the quantized model
model_name = "/root/private_data/DeepSeek1.5B"  # local path, or the Hugging Face model ID
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token  # reuse EOS as the padding token

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

# LoRA configuration
lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=32,                        # the update is scaled by lora_alpha / r
    target_modules=["q_proj", "v_proj"],  # attention query/value projections in this Qwen2-based model
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# Wrap the model with LoRA adapters; only the adapter weights are trainable
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # prints trainable vs. total counts; only a small fraction is trainable
```
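The number reported by print_trainable_parameters() can be checked by hand. LoRA adds two low-rank matrices to each adapted linear layer: A (r × d_in) and B (d_out × r). That gives r × (d_in + d_out) trainable parameters per layer. lora_alpha only scales the update and does not change the count. The dimensions below are assumptions based on the Qwen2 1.5B configuration (hidden size 1536, 28 layers, 2 key/value heads of size 128). Check config.json in the model folder for the real values.

```python
def lora_param_count(r, num_layers, shapes):
    """Trainable LoRA parameters for `num_layers` blocks.

    Each adapted (d_in -> d_out) linear layer gets A (r x d_in) and B (d_out x r),
    i.e. r * (d_in + d_out) parameters.
    """
    per_block = sum(r * (d_in + d_out) for d_in, d_out in shapes)
    return per_block * num_layers


# ASSUMED Qwen2-1.5B-style dimensions; v_proj outputs num_kv_heads * head_dim = 2 * 128 = 256
HIDDEN, LAYERS, KV_DIM = 1536, 28, 256
shapes = [(HIDDEN, HIDDEN), (HIDDEN, KV_DIM)]  # q_proj, v_proj
print(f"LoRA trainable parameters: {lora_param_count(8, LAYERS, shapes):,}")
```

With r=8 applied to q_proj and v_proj, these assumed dimensions give about 1.09 million trainable parameters, a small fraction of the full model.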
21. Tokenize the training data:

```python
def tokenize_function(examples):
    # Truncate/pad every example to 512 tokens; labels are added later by the data collator
    return tokenizer(
        examples["text"],
        truncation=True,
        padding="max_length",
        max_length=512,
    )

tokenized_dataset = dataset.map(tokenize_function, batched=True, remove_columns=["text"])
# No explicit "labels" column is needed: DataCollatorForLanguageModeling(mlm=False)
# copies input_ids into labels and masks padding positions with -100.
```
22. There are several ways to train from here. For simplicity, this example uses the standard Hugging Face Trainer workflow:

```python
from transformers import TrainingArguments, Trainer, DataCollatorForLanguageModeling

# Causal-LM collator: pads each batch and builds labels from input_ids
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False,  # causal language modeling, not masked LM
)
```
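What the collator does with mlm=False can be shown in a few lines. It copies input_ids into labels and replaces padding positions with -100, the value that PyTorch's cross-entropy loss ignores by default. The one-token shift for next-token prediction happens inside the model. This pure-Python sketch works on a single list rather than on batched tensors, and causal_lm_labels is a hypothetical name. The pad token was set to the EOS token, so any EOS in the labels is masked too. As a result, the model gets no training signal to stop after the label, and trimming the generated text afterwards compensates for that.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def causal_lm_labels(input_ids, pad_token_id):
    """Copy input_ids into labels, masking padding positions so they do not contribute to the loss."""
    return [IGNORE_INDEX if tok == pad_token_id else tok for tok in input_ids]


if __name__ == "__main__":
    # Hypothetical token IDs: two real tokens followed by two padding tokens (pad ID 2)
    print(causal_lm_labels([5, 6, 2, 2], pad_token_id=2))
```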
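23. Set the training arguments:

```python
training_args = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # effective batch size 4 x 4 = 16
    learning_rate=2e-4,
    bf16=True,                      # the RTX 4090 supports bf16; matches bnb_4bit_compute_dtype
    logging_steps=3,                # log every 3 steps (e.g. 10 for less output)
    save_strategy="epoch",
    num_train_epochs=3,
    optim="paged_adamw_8bit",
    report_to="none",
)
```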
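The arguments above imply a simple schedule. The effective batch size is per_device_train_batch_size × gradient_accumulation_steps (× number of GPUs), and one epoch takes about ceil(examples ÷ effective batch) optimizer steps. Here is a minimal sketch with a hypothetical helper. The exact count that Trainer prints may differ by a step or so per epoch, depending on how your transformers version handles the last partial accumulation.

```python
import math


def training_schedule(num_examples, per_device_batch, grad_accum, epochs, num_devices=1):
    """Return (effective batch size, approximate total optimizer steps)."""
    effective_batch = per_device_batch * grad_accum * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return effective_batch, steps_per_epoch * epochs


if __name__ == "__main__":
    eff, steps = training_schedule(5000, 4, 4, 3)
    print(f"effective batch = {eff}, about {steps} optimizer steps in total")
```

For 5,000 examples, a batch size of 4, 4 accumulation steps and 3 epochs, this gives an effective batch of 16 and roughly 939 optimizer steps. With logging_steps=3, that means one log line about every 48 examples.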
24. Start training (about 16 minutes in the author's test):

```python
from transformers import Trainer

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    data_collator=data_collator,
)
trainer.train()
```
25. Save the fine-tuned model:

```python
output_dir = "/root/private_data/DeepSeek1.5B_finetuned"
model.save_pretrained(output_dir)  # for a PEFT model, this writes only the LoRA adapter
tokenizer.save_pretrained(output_dir)
print(f"LoRA adapter saved to {output_dir}")
```
26. Run a demonstration with the fine-tuned model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# Paths
base_model_name = "/root/private_data/DeepSeek1.5B"  # local copy of deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
adapter_path = "/root/private_data/DeepSeek1.5B_finetuned"

# 4-bit quantization (same settings as during training)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token  # important for generation

# Load the base model (drop quantization_config if you trained without quantization)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

# Attach the LoRA adapter and switch to evaluation mode
model = PeftModel.from_pretrained(base_model, adapter_path)
model.eval()

# --- Test the model ---
tweet = "I love this new phone! It's amazing 😍"
instruction = "Analyze the sentiment of the following tweet:"

# Format the prompt exactly as during training
prompt = f"Instruction: {instruction}\nInput: {tweet}\nOutput:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate a short answer (the sentiment label is only a word or two)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=20,
        temperature=0.7,
        do_sample=True,  # set do_sample=False for deterministic (greedy) output
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens (skip the prompt)
generated_ids = outputs[0][inputs.input_ids.shape[1]:]
generated_text = tokenizer.decode(generated_ids, skip_special_tokens=True)
print(f"Tweet: {tweet}")
print(f"Predicted sentiment: {generated_text}")
```
27. Compare the fine-tuned model's predictions with the true labels on tweets from the same CSV file:

```python
import torch
import pandas as pd
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# --------------------------
# 1. Paths and model loading
# --------------------------
base_model_name = "/root/private_data/DeepSeek1.5B"  # local copy of deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
adapter_path = "/root/private_data/DeepSeek1.5B_finetuned"

# Load with the same 4-bit config used during training
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token  # important for generation

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,  # drop if you trained without quantization
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base_model, adapter_path)
model.eval()  # inference mode

# --------------------------
# 2. Load the original CSV
# --------------------------
csv_path = "twitter-airline-sentimentSentiment_Analysis.csv"  # adjust if needed
df = pd.read_csv(csv_path)
print(f"CSV loaded with {len(df)} rows. Columns: {df.columns.tolist()}")

# --------------------------
# 3. Randomly pick 5 tweets not used for training
# --------------------------
# Training used df.head(5000), so sample from the remaining rows
sample_rows = df.iloc[5000:].sample(n=5, random_state=42)  # fixed seed for reproducibility
instruction = "Analyze the sentiment of the following tweet:"

# --------------------------
# 4. Test each sample
# --------------------------
for idx, row in sample_rows.iterrows():
    tweet = row["content"]
    actual_sentiment = row["sentiment"]

    # Build the prompt exactly as during training
    prompt = f"Instruction: {instruction}\nInput: {tweet}\nOutput:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=20,
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
        )

    # Decode only the newly generated tokens (skip the prompt)
    generated_ids = outputs[0][inputs.input_ids.shape[1]:]
    predicted_text = tokenizer.decode(generated_ids, skip_special_tokens=True).strip()
    predicted_text = predicted_text.split("\n")[0]  # keep only the first line

    print("\n" + "=" * 60)
    print(f"Tweet: {tweet[:100]}...")
    print(f"Actual sentiment: {actual_sentiment}")
    print(f"Predicted sentiment: {predicted_text}")
    print("=" * 60)
```
28. 返回"控制台"->“Notebook”->"操作" 关闭容器防止额外付费。
29. 在左侧“人工智能”->“文件管理“内,可下载微调后的”DeepSeek1.5B_finetuned“模型。
至此,一个单卡微调的十亿参数的模型的示例便完成了。
这类微调后的模型可在本地高效部署,大量节约API调用的时间成本(延迟)和费用。相同方法可以直接类推到80GB以内百亿的大模型,如7B的LLM模型。
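I am currently looking for a job. For HR inquiries or project collaboration, contact: yucongcai_business@outlook.com
For research-related inquiries, contact: yucongcai_research@outlook.com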