Using Karpathy's open-source project nanoGPT, this post trains a small model and runs inference locally and offline, to get a feel for how an LLM works.

Quick Start

Clone nanoGPT

git clone https://github.com/karpathy/nanoGPT.git

Output:

Cloning into 'nanoGPT'...
remote: Enumerating objects: 689, done.
remote: Total 689 (delta 0), reused 0 (delta 0), pack-reused 689 (from 1)
Receiving objects: 100% (689/689), 975.25 KiB | 33.00 KiB/s, done.
Resolving deltas: 100% (382/382), done.

Install dependencies

cd nanoGPT
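pip3 install requests
pip3 install tiktoken
# train.py and sample.py also import torch and numpy; install them too if they are missing
pip3 install torch numpy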

Prepare the sample data (Shakespeare)

python3 data/shakespeare_char/prepare.py

Output:

Python/3.9/lib/python/site-packages/urllib3/__init__.py:35: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020
  warnings.warn(
length of dataset in characters: 1,115,394
all the unique characters: 
 !$&',-.3:;?ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
vocab size: 65
train has 1,003,854 tokens
val has 111,540 tokens

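When preparation finishes, data/shakespeare_char contains input.txt (the raw Shakespeare text), train.bin (training data), val.bin (validation data), and meta.pkl (vocabulary metadata that train.py reads later). The next step is training.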
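To make the preparation step concrete, here is a minimal sketch of what a character-level prepare script typically does: build a vocabulary from the unique characters, map each character to an integer ID, and split the IDs 90/10 into training and validation sets (the 90/10 ratio matches the token counts printed above). The helper names `build_vocab`, `encode`, `decode`, and `split_ids`, and the toy text, are illustrative assumptions, not code copied from nanoGPT.

```python
def build_vocab(text):
    """Map each unique character to an integer ID (sorted for determinism)."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos


def encode(text, stoi):
    """Turn a string into a list of integer IDs, one per character."""
    return [stoi[ch] for ch in text]


def decode(ids, itos):
    """Turn a list of integer IDs back into a string."""
    return "".join(itos[i] for i in ids)


def split_ids(ids, train_ratio=0.9):
    """Split IDs into a leading training part and a trailing validation part."""
    cut = int(len(ids) * train_ratio)
    return ids[:cut], ids[cut:]


# Toy stand-in for input.txt
text = "First Citizen:\nBefore we proceed any further, hear me speak."
stoi, itos = build_vocab(text)
ids = encode(text, stoi)
train_ids, val_ids = split_ids(ids)
print(f"vocab size: {len(stoi)}, train: {len(train_ids)}, val: {len(val_ids)}")
```

With only 65 distinct characters in the Shakespeare text, every ID fits comfortably in a 16-bit integer, which is why the binary files stay small.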

Configuration changes for Apple M4 Pro

To train on an Apple M4 Pro, change the device from cuda to mps in both scripts.
train.py:

# device = 'cuda' # examples: 'cpu', 'cuda', 'cuda:0', 'cuda:1' etc., or try 'mps' on macbooks
device = 'mps' # examples: 'cpu', 'cuda', 'cuda:0', 'cuda:1' etc., or try 'mps' on macbooks

sample.py:

# device = 'cuda' # examples: 'cpu', 'cuda', 'cuda:0', 'cuda:1', etc.
device = 'mps' # examples: 'cpu', 'cuda', 'cuda:0', 'cuda:1', etc.

Check whether mps is available:

python3 -c "import torch; print(torch.backends.mps.is_available())"

Output:

True
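Rather than hard-coding the device, the same availability check can drive a fallback. The sketch below is one possible way to do it, not something nanoGPT ships: `pick_device` is a hypothetical helper that prefers CUDA, then MPS, then CPU. It degrades to CPU when PyTorch is not installed, so it runs anywhere.

```python
def pick_device(has_cuda: bool, has_mps: bool) -> str:
    """Prefer CUDA, then Apple's MPS backend, then fall back to CPU."""
    if has_cuda:
        return "cuda"
    if has_mps:
        return "mps"
    return "cpu"


try:
    import torch

    device = pick_device(torch.cuda.is_available(),
                         torch.backends.mps.is_available())
except (ImportError, AttributeError):
    # PyTorch is missing, or too old to expose the MPS backend: use the CPU.
    device = pick_device(False, False)

print(device)
```

On the M4 Pro used in this walkthrough, this should print mps, which is the value set by hand in train.py and sample.py above.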

Start training

The model has 6 Transformer layers, 6 attention heads, an embedding dimension of 384, and dropout 0.2. These settings live in config/train_shakespeare_char.py.
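These few numbers fix the model size. As a sanity check, the sketch below recomputes the parameter counts that the training log in this section reports (10,740,096 decayed, 4,992 non-decayed, 10.65M headline). It assumes a GPT-2 style block with no bias terms, an MLP that is 4x the embedding width, and a token embedding shared with the output head. The function name `gpt_param_breakdown` is hypothetical.

```python
def gpt_param_breakdown(n_layer, n_embd, vocab_size, block_size):
    """Count weights of a GPT-2 style model without bias terms (assumed layout)."""
    # Per block: attention q/k/v (n_embd x 3*n_embd), attention output
    # projection (n_embd x n_embd), MLP up and down (n_embd x 4*n_embd each).
    per_block = 3 * n_embd**2 + n_embd**2 + 4 * n_embd**2 + 4 * n_embd**2
    token_emb = vocab_size * n_embd   # assumed shared with the output head
    pos_emb = block_size * n_embd
    decayed = n_layer * per_block + token_emb + pos_emb
    # Two LayerNorm weight vectors per block plus one final LayerNorm.
    non_decayed = (2 * n_layer + 1) * n_embd
    # The headline figure matches when position embeddings are left out.
    reported = decayed + non_decayed - pos_emb
    return decayed, non_decayed, reported


decayed, non_decayed, reported = gpt_param_breakdown(
    n_layer=6, n_embd=384, vocab_size=65, block_size=256)
print(f"{decayed:,} {non_decayed:,} {reported / 1e6:.2f}M")
# prints: 10,740,096 4,992 10.65M
```

Note that the number of attention heads does not appear: heads split the same 384 dimensions, so they change how attention is computed but not how many weights there are.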

python3 train.py config/train_shakespeare_char.py

Output:

Overriding config with config/train_shakespeare_char.py:
# train a miniature character-level shakespeare model
# good for debugging and playing on macbooks and such

out_dir = 'out-shakespeare-char'
eval_interval = 250 # keep frequent because we'll overfit
eval_iters = 200
log_interval = 10 # don't print too too often

# we expect to overfit on this small dataset, so only save when val improves
always_save_checkpoint = False

wandb_log = False # override via command line if you like
wandb_project = 'shakespeare-char'
wandb_run_name = 'mini-gpt'

dataset = 'shakespeare_char'
gradient_accumulation_steps = 1
batch_size = 64
block_size = 256 # context of up to 256 previous characters

# baby GPT model :)
n_layer = 6
n_head = 6
n_embd = 384
dropout = 0.2

learning_rate = 1e-3 # with baby networks can afford to go a bit higher
max_iters = 5000
lr_decay_iters = 5000 # make equal to max_iters usually
min_lr = 1e-4 # learning_rate / 10 usually
beta2 = 0.99 # make a bit bigger because number of tokens per iter is small

warmup_iters = 100 # not super necessary potentially

# on macbook also add
# device = 'cpu'  # run on cpu only
# compile = False # do not torch compile the model

tokens per iteration will be: 16,384
found vocab_size = 65 (inside data/shakespeare_char/meta.pkl)
Initializing a new model from scratch
number of parameters: 10.65M
nanoGPT/train.py:197: FutureWarning: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead.
  scaler = torch.cuda.amp.GradScaler(enabled=(dtype == 'float16'))
Python/3.9/lib/python/site-packages/torch/amp/grad_scaler.py:136: UserWarning: torch.cuda.amp.GradScaler is enabled, but CUDA is not available.  Disabling.
  warnings.warn(
num decayed parameter tensors: 26, with 10,740,096 parameters
num non-decayed parameter tensors: 13, with 4,992 parameters
using fused AdamW: False
compiling the model... (takes a ~minute)
W0121 14:33:49.331229 63390 torch/_inductor/utils.py:1436] [0/0] Not enough SMs to use max_autotune_gemm mode
step 0: train loss 4.2874, val loss 4.2823
iter 0: loss 4.2672, time 165246.57ms, mfu -100.00%
iter 10: loss 3.1471, time 668.24ms, mfu 0.56%
iter 20: loss 2.7730, time 666.49ms, mfu 0.56%
iter 30: loss 2.6538, time 672.80ms, mfu 0.56%
iter 40: loss 2.5960, time 668.94ms, mfu 0.56%
iter 50: loss 2.5479, time 671.95ms, mfu 0.56%
iter 60: loss 2.5225, time 671.65ms, mfu 0.56%
iter 70: loss 2.5075, time 674.10ms, mfu 0.56%
iter 80: loss 2.5035, time 667.45ms, mfu 0.56%
iter 90: loss 2.4759, time 669.39ms, mfu 0.56%
iter 100: loss 2.4703, time 669.11ms, mfu 0.56%
iter 110: loss 2.4623, time 670.90ms, mfu 0.56%
iter 120: loss 2.4334, time 668.60ms, mfu 0.56%
iter 130: loss 2.4249, time 668.51ms, mfu 0.56%
iter 140: loss 2.4050, time 672.67ms, mfu 0.56%
iter 150: loss 2.4091, time 670.49ms, mfu 0.56%
iter 160: loss 2.3787, time 668.70ms, mfu 0.56%
iter 170: loss 2.3538, time 668.54ms, mfu 0.56%
iter 180: loss 2.3063, time 667.35ms, mfu 0.56%
iter 190: loss 2.2496, time 668.32ms, mfu 0.56%
iter 200: loss 2.2144, time 675.80ms, mfu 0.56%
iter 210: loss 2.1572, time 670.36ms, mfu 0.56%
iter 220: loss 2.1547, time 668.44ms, mfu 0.56%
iter 230: loss 2.0977, time 666.49ms, mfu 0.56%
iter 240: loss 2.1005, time 668.09ms, mfu 0.56%
step 250: train loss 1.9921, val loss 2.0831
saving checkpoint to out-shakespeare-char
iter 250: loss 2.0691, time 159878.32ms, mfu 0.50%
iter 260: loss 1.9987, time 666.61ms, mfu 0.51%
iter 270: loss 2.0043, time 666.89ms, mfu 0.51%
iter 280: loss 2.0042, time 667.78ms, mfu 0.52%
iter 290: loss 1.9525, time 666.76ms, mfu 0.52%
iter 300: loss 1.9256, time 668.33ms, mfu 0.52%
iter 310: loss 1.8933, time 666.65ms, mfu 0.53%
iter 320: loss 1.8855, time 667.16ms, mfu 0.53%
iter 330: loss 1.8544, time 667.40ms, mfu 0.53%
iter 340: loss 1.8091, time 675.41ms, mfu 0.54%
iter 350: loss 1.8614, time 668.27ms, mfu 0.54%
iter 360: loss 1.8010, time 676.24ms, mfu 0.54%
iter 370: loss 1.7670, time 666.52ms, mfu 0.54%
iter 380: loss 1.7564, time 666.85ms, mfu 0.54%
iter 390: loss 1.7554, time 676.99ms, mfu 0.54%
iter 400: loss 1.7919, time 666.07ms, mfu 0.55%
iter 410: loss 1.7239, time 666.57ms, mfu 0.55%
iter 420: loss 1.7457, time 668.01ms, mfu 0.55%
iter 430: loss 1.7143, time 667.79ms, mfu 0.55%
iter 440: loss 1.6852, time 666.44ms, mfu 0.55%
iter 450: loss 1.6803, time 667.54ms, mfu 0.55%
iter 460: loss 1.6169, time 666.94ms, mfu 0.55%
iter 470: loss 1.6869, time 669.97ms, mfu 0.55%
iter 480: loss 1.6547, time 674.67ms, mfu 0.55%
iter 490: loss 1.6231, time 667.73ms, mfu 0.55%
step 500: train loss 1.5549, val loss 1.7608
saving checkpoint to out-shakespeare-char
iter 500: loss 1.6239, time 158930.90ms, mfu 0.50%
iter 510: loss 1.6324, time 667.30ms, mfu 0.50%
iter 520: loss 1.6264, time 667.00ms, mfu 0.51%
iter 530: loss 1.5836, time 666.78ms, mfu 0.51%
iter 540: loss 1.6385, time 666.64ms, mfu 0.52%
iter 550: loss 1.5973, time 666.87ms, mfu 0.52%
iter 560: loss 1.5876, time 666.07ms, mfu 0.53%
iter 570: loss 1.5980, time 667.18ms, mfu 0.53%
iter 580: loss 1.5607, time 668.87ms, mfu 0.53%
iter 590: loss 1.5277, time 667.66ms, mfu 0.53%
iter 600: loss 1.5400, time 667.09ms, mfu 0.54%
iter 610: loss 1.5701, time 668.65ms, mfu 0.54%
iter 620: loss 1.5543, time 668.49ms, mfu 0.54%
iter 630: loss 1.5390, time 668.21ms, mfu 0.54%
iter 640: loss 1.4888, time 666.82ms, mfu 0.54%
iter 650: loss 1.5250, time 667.27ms, mfu 0.55%
iter 660: loss 1.5253, time 666.81ms, mfu 0.55%
iter 670: loss 1.4661, time 670.23ms, mfu 0.55%
iter 680: loss 1.5329, time 669.35ms, mfu 0.55%
iter 690: loss 1.4827, time 670.71ms, mfu 0.55%
iter 700: loss 1.5092, time 677.90ms, mfu 0.55%
iter 710: loss 1.4750, time 669.38ms, mfu 0.55%
iter 720: loss 1.4606, time 668.19ms, mfu 0.55%
iter 730: loss 1.4387, time 667.14ms, mfu 0.55%
iter 740: loss 1.4382, time 667.24ms, mfu 0.55%
step 750: train loss 1.3788, val loss 1.6023
saving checkpoint to out-shakespeare-char
iter 750: loss 1.4408, time 159288.93ms, mfu 0.50%
iter 760: loss 1.4560, time 667.34ms, mfu 0.50%
iter 770: loss 1.4393, time 675.43ms, mfu 0.51%
iter 780: loss 1.4360, time 667.56ms, mfu 0.51%
iter 790: loss 1.4365, time 668.36ms, mfu 0.52%
iter 800: loss 1.4433, time 671.15ms, mfu 0.52%
iter 810: loss 1.4257, time 668.62ms, mfu 0.53%
iter 820: loss 1.4188, time 666.77ms, mfu 0.53%
iter 830: loss 1.4126, time 673.48ms, mfu 0.53%

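After about 19 minutes the per-iteration training loss is down to roughly 1.4 (the validation loss was 1.60 at step 750), and training can be stopped with Ctrl+C. A loss below about 1.6 is roughly good enough to sample from. Every 250 iterations (eval_interval) the script evaluates, and because always_save_checkpoint is False it saves the model to out-shakespeare-char/ckpt.pt only when the validation loss has improved.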
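The log line "tokens per iteration will be: 16,384" follows directly from the config, and it also shows how quickly such a small dataset gets reused. The arithmetic below is a rough sketch that assumes a single process (no distributed training). "Passes" here means tokens seen divided by dataset size, since batches are drawn at random rather than in order.

```python
gradient_accumulation_steps = 1
batch_size = 64
block_size = 256
train_tokens = 1_003_854   # "train has 1,003,854 tokens" from prepare.py
iters_done = 830           # last iteration shown in the log above

tokens_per_iter = gradient_accumulation_steps * batch_size * block_size
tokens_seen = tokens_per_iter * iters_done
passes = tokens_seen / train_tokens

print(f"{tokens_per_iter:,} tokens per iteration")
print(f"{tokens_seen:,} tokens seen after {iters_done} iterations")
print(f"about {passes:.1f} passes over the training set")
```

By iteration 830 the model has seen the equivalent of about 13.5 passes over the text, which fits the config comment that overfitting is expected on this dataset.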

Test inference

Let the small model, with roughly 10 million parameters, write some Shakespeare-style script:

python3 sample.py --out_dir=out-shakespeare-char

Output:

Overriding: out_dir = out-shakespeare-char
number of parameters: 10.65M
Loading meta from data/shakespeare_char/meta.pkl...

Forsway:
Yet is the strenge officience:
Alask, and tell him the revenge, and had he so
Of thou art would bone proying batter
When I am in the father liht of it.

CORIOLANUS:
Where I lets the world?

CORIOLANUS:
Our gracess face?

Second Musician:
I proget when what shall you have you deather'd
upon of thrial of thy criek of heaven
That be his mother's brother, then your knew's recomposed
And him to the proceedent of the garltering,
Say thousand his King Richard and and so know his day.
Farewell,
---------------

AUTOLYCUS:
Not thus man that may her of the counsel of worsell.

Clown:
I pray lords are to him of the less, becaused
the and mark of projectuon: what his and
shall for else him arms die should art with me
discole position. Sir, sir, and thou hast he too be, af all
his lies the father, counce, distood boy; I'll grantain the enough
me conceive to her the firection, thou, there is by
thy was of but down player to be our
brother, but do besome to distreet as my garden,
in his thou art me, good whil
---------------

God, Or Lord Annerland Mercutio:
If the made of that a sword of virtuous like your busician:
What you have many moung while have too make so?

LUCIO:
Where's such is the revoice of my present the princes
and all to her in him.

ISABELLA:
I am a man:
I have so, mine that keep that pray us are deavy in
a back of which fair the was of along to make of the tarm
cannot against the blood, and true repart our goods. Then, farewell are with as looks:
if back him sleep't to die my estory, therefore mind 
---------------

Is not:
sir; who stards I dod but convey.
Fear your gone, as if your speeds, you have a
did feel have byend; do the name sound hath
of whereie are the rust too a wife. Take on my mad
with 

AUTOLYCUS:
Is past, that me both more requeen and for all
excose: to be repose of gentleman it: but do it
said not am as and be to woman a king, is a fire world in
envy in his the name of aure foolish the born, fire, thou lady agter
his our which ill heaven have that should for he honour.

Shepherd:
What will
---------------


SICINIUS:
Why, he's a bold,--

SICINIUS:
Must these we will down at here.

CORIOLANUS:
Ay, come, and who atter.

CORIOLANUS:
My lord; I made not, be man!

SICINIUS:
Not for all this a world, 'long at them?

CORIOLANUS:
Let me them of the lady, the comes: be so.

SICINIUS:
I'll not my lord.

MENENIUS:
This is not,
Whose he commend hath me to all our grace.

Second Musicil:
My lord,
I am brother, my lord, mark and Lord Sir:
What do should wh should be the grow?

MENENIUS:
No, be so love so, sir, 
---------------

PRINCE EDWARD:
A good of my liege, good of my kinsmy,
And been the wrong of his truth;
Where is so the onnicious and with of
Which a dangerous made the bed
Shall not devil the word too the bridd:
If he muth a was in to face rest,
So more but the good of truth; and and thou see thou measure,
For the gentle as of the down, soul, our kings stilling,
Not joy the patienciance of thy heading,
When he she disconder had by thy with our servant of hence
To be eached of ta'en by and by the will.

First Se
---------------

LUCIO:
On their macknower, sir, being love
than thou be master'd by this the people, my liege,
As if you have to my time to this with him.

CLAUDIO:
Come, unran the traitors.

DUCHESS OF YORK:
What was and under?

DUCHESS OF YORK:
We have have much at should be with with him?

DUKE OF YORK:
You shelp, but the advant of sheep,
That come a great and I have the while
To some flatter a happy one and the earth,
Could be do thus for month trich
A chooper togething-and sons see all so the face
Rome. Th
---------------

RICHMOND:
I would not dear-day--where courted, stands it there
swords on the house, to can his stronged in his
and beauty long. More too fair a cunsel-bloods awhile!

ROMEO:
Who redector him's like and assemble to guesty,
As is the sink of our gagenerall, poor is
no man as thine days gone with the father, and this read.

ROMEO:
Thou they poor a part, that where hath by shrowes,
Could thou have thou say subjects.

ROMEO:
A live of a tribution, heaven and those is beard
The like to day the wrong w
---------------

ISABELLA:
O Montague, thou art is excreated
To be honour had to present as such of the own:
And so brother, how which is bid for your lord.

CATESBY:
Madam, good me, sir, I'll get your place:
O be word: and I proceed to say:
Do be Romeo for you dead this desire of your air.

AUTOLYCUS:
Come, he was be have such and by the beat; Give,
that your both not with the one his ord,
that within us: but the like of the chargeight
sweeter son of not what he garlet to much you, let as will nor
cold work
at 
---------------

Where not enought good as the rite of the breath,
And they have hearth, 'er sorrow purpose in
Crown, and convunce that have shen you'll to her: for, my lord!

AUFIDIUS:
Come, madam, good and too look to thee armon.

MERCUTIO:
Camillo, leave a thousand Marcius, being him, but with him
contentime to be so.

MENENIUS:
Only, then is as a convey
cate is i' the son?

MENENIUS:
I'll have it
is a deam name and and in arful of
other of my gaves. Let him in the worstand my saint
to tears his in the true w
---------------

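Because training is character-based, the model predicts the next character from the characters that precede it (up to 256 of them, per block_size). As a result, most words are spelled plausibly, but the grammar between words and the logic of the sentences do not hold together.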
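Generation repeats one step over and over: the model scores every character in the vocabulary given the preceding context, one character is sampled from those scores, and it is appended to the context. The sketch below shows only the sampling step, with temperature and top-k filtering, on made-up scores. It illustrates the general technique and is not nanoGPT's implementation; `sample_next` and the toy `logits` are assumptions.

```python
import math
import random


def sample_next(logits, temperature=1.0, top_k=None, rng=random):
    """Sample one index from raw scores using temperature and optional top-k."""
    scaled = [x / temperature for x in logits]
    if top_k is not None:
        # Keep the k highest scores (ties at the cutoff are kept too);
        # everything below the cutoff gets zero probability.
        cutoff = sorted(scaled, reverse=True)[min(top_k, len(scaled)) - 1]
        scaled = [x if x >= cutoff else float("-inf") for x in scaled]
    peak = max(scaled)
    # Softmax numerators, shifted by the peak for numerical stability.
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


vocab = ["a", "b", "c", "d"]
logits = [2.0, 1.0, 0.5, -1.0]   # made-up scores for the next character
rng = random.Random(0)
picks = [sample_next(logits, temperature=0.8, top_k=3, rng=rng) for _ in range(20)]
print("".join(vocab[i] for i in picks))
```

A lower temperature sharpens the distribution toward the highest score, and top_k=1 reduces sampling to always picking the most likely character.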

Train a small model on our own txt

Prepare the sample data

Create a directory named asongoficeandfire_char under data and put our own text in it (I used A Song of Ice and Fire).
Name the file input.txt (5.8MB).
Copy data/shakespeare_char/prepare.py into data/asongoficeandfire_char.
Then run it the same way:

python3 data/asongoficeandfire_char/prepare.py

When preparation finishes, data/asongoficeandfire_char contains train.bin (training data) and val.bin (validation data). The next step is training.

Training

Copy the configuration file:

cp config/train_shakespeare_char.py config/train_asongoficeandfire_char.py

Change the following lines:

4c4
< out_dir = 'out-shakespeare-char'
---
> out_dir = 'out-asongoficeandfire-char'
13c13
< wandb_project = 'shakespeare-char'
---
> wandb_project = 'asongoficeandfire-char'
16c16
< dataset = 'shakespeare_char'
---
> dataset = 'asongoficeandfire_char'

The model is unchanged: 6 Transformer layers, 6 attention heads, an embedding dimension of 384, and dropout 0.2. These settings live in config/train_asongoficeandfire_char.py.
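The three edits to the copied config can also be scripted, which helps when creating configs for several datasets. The sketch below is one hypothetical way to do it: `derive_config` and `OVERRIDES` are illustrative names, and the function only rewrites assignments of the form key = 'value', leaving comments and all other lines untouched. It is demonstrated on an in-memory excerpt rather than the real file.

```python
import re

# Keys to rewrite and their new values (taken from the diff above)
OVERRIDES = {
    "out_dir": "out-asongoficeandfire-char",
    "wandb_project": "asongoficeandfire-char",
    "dataset": "asongoficeandfire_char",
}


def derive_config(source_text, overrides):
    """Rewrite `key = 'value'` lines for the given keys; keep the rest as-is."""
    lines = []
    for line in source_text.splitlines():
        match = re.match(r"^(\w+)\s*=\s*'[^']*'(.*)$", line)
        if match and match.group(1) in overrides:
            key, tail = match.group(1), match.group(2)
            line = f"{key} = '{overrides[key]}'{tail}"
        lines.append(line)
    return "\n".join(lines) + "\n"


excerpt = (
    "out_dir = 'out-shakespeare-char'\n"
    "eval_interval = 250 # keep frequent because we'll overfit\n"
    "dataset = 'shakespeare_char'\n"
)
print(derive_config(excerpt, OVERRIDES), end="")
```

To apply it to real files, read config/train_shakespeare_char.py as text, pass it through `derive_config`, and write the result to the new config path.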
Train:

python3 train.py config/train_asongoficeandfire_char.py  

Output:

Overriding config with config/train_asongoficeandfire_char.py:
# train a miniature character-level shakespeare model
# good for debugging and playing on macbooks and such

out_dir = 'out-asongoficeandfire-char'
eval_interval = 250 # keep frequent because we'll overfit
eval_iters = 200
log_interval = 10 # don't print too too often

# we expect to overfit on this small dataset, so only save when val improves
always_save_checkpoint = False

wandb_log = False # override via command line if you like
wandb_project = 'asongoficeandfire-char'
wandb_run_name = 'mini-gpt'

dataset = 'asongoficeandfire_char'
gradient_accumulation_steps = 1
batch_size = 64
block_size = 256 # context of up to 256 previous characters

# baby GPT model :)
n_layer = 6
n_head = 6
n_embd = 384
dropout = 0.2

learning_rate = 1e-3 # with baby networks can afford to go a bit higher
max_iters = 5000
lr_decay_iters = 5000 # make equal to max_iters usually
min_lr = 1e-4 # learning_rate / 10 usually
beta2 = 0.99 # make a bit bigger because number of tokens per iter is small

warmup_iters = 100 # not super necessary potentially

# on macbook also add
# device = 'cpu'  # run on cpu only
# compile = False # do not torch compile the model

tokens per iteration will be: 16,384
found vocab_size = 91 (inside data/asongoficeandfire_char/meta.pkl)
Initializing a new model from scratch
number of parameters: 10.66M
nanoGPT/train.py:197: FutureWarning: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead.
  scaler = torch.cuda.amp.GradScaler(enabled=(dtype == 'float16'))
/opt/miniconda3/lib/python3.13/site-packages/torch/cuda/amp/grad_scaler.py:31: UserWarning: torch.cuda.amp.GradScaler is enabled, but CUDA is not available.  Disabling.
  super().__init__(
num decayed parameter tensors: 26, with 10,750,080 parameters
num non-decayed parameter tensors: 13, with 4,992 parameters
using fused AdamW: False
compiling the model... (takes a ~minute)
W0502 20:10:45.979000 53411 site-packages/torch/_inductor/utils.py:1679] [0/0] Not enough SMs to use max_autotune_gemm mode
step 0: train loss 4.5436, val loss 4.5478
iter 0: loss 4.5548, time 172993.96ms, mfu -100.00%
iter 10: loss 3.0827, time 625.62ms, mfu 0.60%
iter 20: loss 2.7973, time 610.92ms, mfu 0.60%
iter 30: loss 2.5831, time 621.93ms, mfu 0.60%
iter 40: loss 2.5245, time 610.92ms, mfu 0.60%
iter 50: loss 2.5052, time 610.63ms, mfu 0.60%
iter 60: loss 2.4299, time 610.76ms, mfu 0.60%
iter 70: loss 2.4883, time 638.31ms, mfu 0.60%
iter 80: loss 2.4324, time 618.21ms, mfu 0.60%
iter 90: loss 2.4496, time 611.00ms, mfu 0.60%
iter 100: loss 2.4527, time 610.55ms, mfu 0.60%
iter 110: loss 2.4480, time 610.59ms, mfu 0.60%
iter 120: loss 2.4027, time 610.98ms, mfu 0.60%
iter 130: loss 2.3871, time 609.80ms, mfu 0.60%
iter 140: loss 2.4171, time 610.24ms, mfu 0.61%
iter 150: loss 2.3562, time 609.76ms, mfu 0.61%
iter 160: loss 2.3860, time 609.25ms, mfu 0.61%
iter 170: loss 2.3322, time 610.08ms, mfu 0.61%
iter 180: loss 2.2883, time 613.37ms, mfu 0.61%
iter 190: loss 2.2512, time 611.06ms, mfu 0.61%
iter 200: loss 2.1991, time 611.03ms, mfu 0.61%
iter 210: loss 2.1816, time 610.19ms, mfu 0.61%
iter 220: loss 2.1134, time 609.65ms, mfu 0.61%
iter 230: loss 2.0926, time 611.06ms, mfu 0.61%
iter 240: loss 2.0479, time 609.09ms, mfu 0.61%
step 250: train loss 1.9957, val loss 2.0624
saving checkpoint to out-asongoficeandfire-char
iter 250: loss 2.0490, time 165542.54ms, mfu 0.55%
iter 260: loss 2.0540, time 609.25ms, mfu 0.55%
iter 270: loss 1.9583, time 609.79ms, mfu 0.56%
iter 280: loss 1.9579, time 608.94ms, mfu 0.57%
iter 290: loss 1.9369, time 609.98ms, mfu 0.57%
iter 300: loss 1.8528, time 610.03ms, mfu 0.57%
iter 310: loss 1.9215, time 609.36ms, mfu 0.58%
iter 320: loss 1.9001, time 610.48ms, mfu 0.58%
iter 330: loss 1.8708, time 610.02ms, mfu 0.58%
iter 340: loss 1.8565, time 610.62ms, mfu 0.59%
iter 350: loss 1.8206, time 609.90ms, mfu 0.59%
iter 360: loss 1.8515, time 610.06ms, mfu 0.59%
iter 370: loss 1.7844, time 608.82ms, mfu 0.59%
iter 380: loss 1.8132, time 611.00ms, mfu 0.60%
iter 390: loss 1.7861, time 610.42ms, mfu 0.60%
iter 400: loss 1.7558, time 610.05ms, mfu 0.60%
iter 410: loss 1.7480, time 611.44ms, mfu 0.60%
iter 420: loss 1.7530, time 610.17ms, mfu 0.60%
iter 430: loss 1.7379, time 610.50ms, mfu 0.60%
iter 440: loss 1.7149, time 610.00ms, mfu 0.60%
iter 450: loss 1.7615, time 611.11ms, mfu 0.60%
iter 460: loss 1.6998, time 612.02ms, mfu 0.60%
iter 470: loss 1.7228, time 609.79ms, mfu 0.60%
iter 480: loss 1.7109, time 609.83ms, mfu 0.61%
iter 490: loss 1.6982, time 609.95ms, mfu 0.61%
step 500: train loss 1.6007, val loss 1.6705
saving checkpoint to out-asongoficeandfire-char
iter 500: loss 1.6952, time 165280.89ms, mfu 0.55%
iter 510: loss 1.6745, time 609.47ms, mfu 0.55%
iter 520: loss 1.6473, time 610.12ms, mfu 0.56%
iter 530: loss 1.6507, time 609.41ms, mfu 0.56%
iter 540: loss 1.6124, time 610.71ms, mfu 0.57%
iter 550: loss 1.6553, time 611.15ms, mfu 0.57%
iter 560: loss 1.6538, time 610.98ms, mfu 0.58%
iter 570: loss 1.6473, time 610.84ms, mfu 0.58%
iter 580: loss 1.6162, time 611.67ms, mfu 0.58%
iter 590: loss 1.6045, time 609.62ms, mfu 0.59%
iter 600: loss 1.5918, time 609.71ms, mfu 0.59%
iter 610: loss 1.6271, time 610.66ms, mfu 0.59%
iter 620: loss 1.5555, time 609.99ms, mfu 0.59%
iter 630: loss 1.5488, time 611.31ms, mfu 0.59%
iter 640: loss 1.5651, time 609.69ms, mfu 0.60%
iter 650: loss 1.5554, time 609.54ms, mfu 0.60%
iter 660: loss 1.5601, time 610.16ms, mfu 0.60%
iter 670: loss 1.5758, time 611.34ms, mfu 0.60%
iter 680: loss 1.5601, time 609.27ms, mfu 0.60%
iter 690: loss 1.5667, time 610.65ms, mfu 0.60%
iter 700: loss 1.5440, time 609.80ms, mfu 0.60%
iter 710: loss 1.5302, time 611.40ms, mfu 0.60%
iter 720: loss 1.5103, time 611.30ms, mfu 0.60%
iter 730: loss 1.5121, time 610.90ms, mfu 0.61%
iter 740: loss 1.5333, time 609.31ms, mfu 0.61%
step 750: train loss 1.4190, val loss 1.4961
saving checkpoint to out-asongoficeandfire-char
iter 750: loss 1.5197, time 165522.42ms, mfu 0.55%
iter 760: loss 1.5246, time 612.17ms, mfu 0.55%
iter 770: loss 1.4775, time 609.98ms, mfu 0.56%
iter 780: loss 1.4737, time 610.42ms, mfu 0.56%
iter 790: loss 1.4944, time 609.79ms, mfu 0.57%
iter 800: loss 1.4577, time 609.83ms, mfu 0.57%
iter 810: loss 1.4758, time 610.02ms, mfu 0.58%
iter 820: loss 1.4741, time 610.93ms, mfu 0.58%
iter 830: loss 1.4691, time 611.01ms, mfu 0.58%
iter 840: loss 1.4412, time 609.83ms, mfu 0.59%
iter 850: loss 1.4609, time 609.65ms, mfu 0.59%
iter 860: loss 1.4543, time 610.71ms, mfu 0.59%
iter 870: loss 1.4411, time 611.09ms, mfu 0.59%
iter 880: loss 1.4776, time 610.08ms, mfu 0.59%
iter 890: loss 1.4602, time 609.96ms, mfu 0.60%
iter 900: loss 1.4391, time 609.54ms, mfu 0.60%
iter 910: loss 1.4291, time 610.17ms, mfu 0.60%
iter 920: loss 1.4404, time 609.80ms, mfu 0.60%
iter 930: loss 1.4114, time 611.21ms, mfu 0.60%
iter 940: loss 1.4511, time 610.55ms, mfu 0.60%
iter 950: loss 1.4100, time 610.46ms, mfu 0.60%
iter 960: loss 1.3927, time 609.98ms, mfu 0.60%
iter 970: loss 1.4331, time 610.41ms, mfu 0.60%
iter 980: loss 1.4205, time 610.22ms, mfu 0.61%
iter 990: loss 1.4186, time 610.06ms, mfu 0.61%
step 1000: train loss 1.3195, val loss 1.3950
saving checkpoint to out-asongoficeandfire-char
iter 1000: loss 1.4148, time 165361.70ms, mfu 0.55%
iter 1010: loss 1.3690, time 610.01ms, mfu 0.55%
iter 1020: loss 1.3649, time 610.65ms, mfu 0.56%
iter 1030: loss 1.3518, time 609.17ms, mfu 0.56%
iter 1040: loss 1.3759, time 609.26ms, mfu 0.57%
iter 1050: loss 1.3811, time 610.64ms, mfu 0.57%
iter 1060: loss 1.3744, time 609.81ms, mfu 0.58%
iter 1070: loss 1.3916, time 611.42ms, mfu 0.58%
iter 1080: loss 1.3897, time 610.35ms, mfu 0.58%
iter 1090: loss 1.3954, time 609.59ms, mfu 0.59%
iter 1100: loss 1.3731, time 609.40ms, mfu 0.59%
iter 1110: loss 1.3334, time 609.35ms, mfu 0.59%
iter 1120: loss 1.3699, time 610.22ms, mfu 0.59%
iter 1130: loss 1.3531, time 609.01ms, mfu 0.59%
iter 1140: loss 1.3506, time 611.00ms, mfu 0.60%
iter 1150: loss 1.3648, time 610.27ms, mfu 0.60%
iter 1160: loss 1.3725, time 610.03ms, mfu 0.60%
iter 1170: loss 1.3607, time 609.37ms, mfu 0.60%
iter 1180: loss 1.3461, time 609.58ms, mfu 0.60%
iter 1190: loss 1.3310, time 609.92ms, mfu 0.60%
iter 1200: loss 1.3456, time 610.99ms, mfu 0.60%
iter 1210: loss 1.3307, time 609.77ms, mfu 0.60%
iter 1220: loss 1.3360, time 609.41ms, mfu 0.60%
iter 1230: loss 1.2786, time 609.96ms, mfu 0.61%
iter 1240: loss 1.3022, time 610.78ms, mfu 0.61%

Test inference

Try it out:

python sample.py --out_dir=out-asongoficeandfire-char

Output:

Overriding: out_dir = out-asongoficeandfire-char
number of parameters: 10.66M
Loading meta from data/asongoficeandfire_char/meta.pkl...

I am heard on it.” Jon Stark said of the girls were to small
and come to take him.
“I your lord is it,” said Landing Robert.
“Not return to be better in Bran.” He had looked the long house with a crew castle and for the walls the
Greatjon, basing a wap and glance snaked. “I’ll leave not one of for
the king. The spriy
have a silk and king the anger of the three must changey be might gone of young like table the smurden trade her across
with the last streets and dragons and glover grown of them. T
---------------

“You will wake a frage,” Renly paced and crying down his green playing, “I’ll feel any with
the gods are safe shiping of now, no mide. It was it was so still be that was a man would take
doors that he did, but should even no changed to raid it was alone. I was gone. The mounts of the from the
arch men with Joffrey Tower. And we’ll take her an others to the rollen the three commands
and her cheeks and days they were to like my things. Bed, he looked to have implains to be about the blood in the M
---------------

“The Coakfast keep . . . . . and would be a brother strength was a dare and freer starting her brother as she tried. Jaime
was no trule gods of their chacks moving behind the Two Mandon Toad and taste as he would never ask him. The king do not missand
be after something. She could ever talk and glance the man had liked his hands understanded the knight.
There was the death boars, and five her swords and looked toward the shadows and the floor the white stroke of the bottom the busy of
the wildli
---------------

the summer was brooked with Rakhaart after Ser Mormont’s
Calba Grace at the step of the Kingsguards were enough. The lates of the others and their running of Hound’s
Hands or the keeping to grave. “As well,” said Lord Tywin. “I can kill me.”
“We mean or see their lumb and know some legs of the fire, silver the boy of the gates of leather and should treat a cappenry
keep as he thought, and he realized at on her for her lords that thousand the knights. I would never see to way an old remembers for
---------------

you between us to lovely say that your east the court, even I do not kill me at the queen.
What is of him the Night’s Watchfather and some burning this Grace had ever heard.”
“Maybe as she would have leaving him leaving her whip?”
Jon said and turned away from his keep, his long to her head to pull her father made her lip. “Be other brothers have the knife.”
By the sung her mouth to his shumber face, but felt her shouting. The Wall, and whatever he’ll be to put it. If you was had come when there
---------------

Do you know it rang to be exile so go of his times autiful and it, he was looking from his feet singers not had been to brid
or spear and more his feet part.
But he had noticed to get a hand of the laughter of the strip of their red houses of his heads and looked on the great
fire. The others could not sure that them int little stair of the woods. Their bloodride
that mare stood it as he wondered to Jojen as the vasi walked once the moonle with a pnoches of thick of them
north, and the bridge fo
---------------

he’d heard us, and yield. “Lord Maester Pycelle reach that I can make you know the wine of your lives, must have a beat of the
Watch? I will say him as well. I have ever been them to be both the king’s black of the murmies, silvers are on my
brothers, or the black when they could be refused to see his fleet blood and drumblinked her brothers. I
should not uneath the wish cover in his house.”
Joffrey glanced them a litter in a skfleet. He was a knife of the captains of the sall of the
greatjons a
---------------

of the gaarious for a letter. Yet by so much out of the well, you are the party
man said. I will take your me down by us. You have your been looked the doubless of the circles are the king and
to wear that the cellar the fors in the Watch, then he had could sent to look and shoats, and the soat of the great no
accountain of the brothers. Some was said the march as he would are carefully, but it was the ravens of his
feet. Jon relicacted his tears. “I want steels have the Girld Harrenhal and blun
---------------

there’s hand, he two realized it was come and someone on his chairs. “Why are my brother?”
“The southman were never singing for a breather brothers.”
“It was. But it is a sume good commands . . . . . but you saw not he was.”
“Les Lord Tywen . . . . are you like until me. Ser Old Baros had died not seen that I am not alive that is a
poeth. He was one as well, the queen same to be bastard, could need to look be more for than I am well.”
“The old man sooner saw.” The black sound put his softly.
The
---------------

The stonest was that were from the king life. He told his father. If I had done not brought of
it if the nurse, she was in the boy. You mean with that traitors and bring with sellswords were that had died there never taste to tell
of the tears. Around Arya had been Aeron the houses of the wildling was on the life. A moment enough the sloud hands faded,
and mere the gate of the throne south of his coins. The city day said as he well he saw her, “and a
bastard guard that brother is distant away he
---------------

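The same pattern shows up here. Because training is character-based, the model predicts the next character from the characters before it, so most words are spelled plausibly, but the grammar between words and the logic of the sentences remain incoherent.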

Train a small token-based model

Preprocess the sample data

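For the reasons above, we drop character-level modelling and switch to the GPT-2 tokenizer, which works at the token level.
nanoGPT already ships two prepare scripts that use the GPT-2 tokenizer, data/openwebtext/prepare.py and data/shakespeare/prepare.py. For convenience, we had an AI write a GPT-2-based tokenization script instead, shown below: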

"""
Custom prepare script for nanoGPT
Logic adapted from HazyResearch/flash-attention, with support for large files
"""

import os
import pickle

import numpy as np
import tiktoken
from tqdm import tqdm

# ==========================================
# 1. Configuration (edit to suit your needs)
# ==========================================

# Input file name (make sure your txt file is renamed to this)
input_file_path = 'input.txt'

# Encoding:
# - 'gpt2': suited to English, vocabulary size 50257
# - 'cl100k_base': suited to mixed Chinese/English text (as in GPT-4); for Chinese, uncomment the second line below.
#   Its vocabulary exceeds 65,536 entries, so it does not fit the uint16 storage used further down.
enc = tiktoken.get_encoding("gpt2")
# enc = tiktoken.get_encoding("cl100k_base")

# Train/validation split ratio (0.9 = 90% training, 10% validation)
train_ratio = 0.9

# Number of batches used when writing to disk (speeds up large writes)
total_batches = 128

# ==========================================
# 2. Core logic (no changes needed)
# ==========================================

def process_text(text, encoder):
    """Convert a text string into a list of token IDs."""
    ids = encoder.encode_ordinary(text)  # ignore special tokens
    ids.append(encoder.eot_token)        # append the end-of-text token
    return ids

if __name__ == "__main__":
    # Check the input file
    if not os.path.exists(input_file_path):
        raise FileNotFoundError(f"{input_file_path} not found. Put your text file in this folder and rename it to input.txt")

    # Token IDs are stored as uint16 below, so the vocabulary must fit in 16 bits
    if enc.n_vocab > 2**16:
        raise ValueError(f"Vocabulary size {enc.n_vocab} does not fit in uint16")

    print(f"Reading {input_file_path} ...")
    with open(input_file_path, 'r', encoding='utf-8') as f:
        data = f.read()
    print(f"Raw data length: {len(data):,} characters")

    # --- Tokenization ---
    print("Tokenizing (this may take a few minutes) ...")
    ids = process_text(data, enc)
    print(f"Tokenization done. {len(ids):,} tokens in total.")

    # --- Split the dataset ---
    split = int(train_ratio * len(ids))
    train_ids = ids[:split]
    val_ids = ids[split:]

    print(f"Training set tokens: {len(train_ids):,}")
    print(f"Validation set tokens: {len(val_ids):,}")

    # --- Save as binary files ---
    # Use memmap to speed up writing large files
    for split_name, token_ids in [("train", train_ids), ("val", val_ids)]:
        filename = f"{split_name}.bin"
        dtype = np.uint16  # the largest GPT-2 token ID is < 65536, so 2 bytes per ID are enough

        # Create the memory-mapped file
        arr = np.memmap(filename, dtype=dtype, mode='w+', shape=(len(token_ids),))

        print(f"Writing {filename} ...")
        # At least 1 per batch, otherwise range() below fails on very small inputs
        batch_size = max(1, len(token_ids) // total_batches)
        with tqdm(total=len(token_ids), desc="Writing") as pbar:
            for i in range(0, len(token_ids), batch_size):
                end = min(i + batch_size, len(token_ids))
                arr_batch = token_ids[i:end]
                arr[i:end] = arr_batch
                pbar.update(len(arr_batch))

        arr.flush()  # make sure the data reaches the disk
        print(f"✅ {filename} written")

    # --- Save meta information ---
    # This tells the model how large the vocabulary is
    meta = {
        'vocab_size': enc.n_vocab,
        'data_file': input_file_path
    }
    with open('meta.pkl', 'wb') as f:
        pickle.dump(meta, f)
    print("\n✅ Preprocessing finished successfully!")
    print(f"   Vocab size: {enc.n_vocab}")
    print("   You can now run train.py to start training.")

Run the preprocessing from the dataset directory (data/asongoficeandfire_word in this walkthrough, matching the dataset name used in the config below), because the script reads and writes relative paths:

python3 prepare.py

Output:

Reading input.txt ...
Raw data length: 5,661,468 characters
Tokenizing (this may take a few minutes) ...
Tokenization done. 1,601,891 tokens in total.
Training set tokens: 1,441,701
Validation set tokens: 160,190
Writing train.bin ...
Writing: 100%|████████████████████| 1441701/1441701 [00:00<00:00, 45854210.27it/s]
✅ train.bin written
Writing val.bin ...
Writing: 100%|████████████████████| 160190/160190 [00:00<00:00, 39929016.33it/s]
✅ val.bin written

✅ Preprocessing finished successfully!
   Vocab size: 50257
   You can now run train.py to start training.

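After preprocessing, train.bin (2.9 MB) and val.bin (320 KB) are much smaller than their character-level counterparts (10.2 MB and 1.1 MB), because each GPT-2 token covers several characters while both formats store 2 bytes per token.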
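These file sizes follow directly from the token counts, since every token is stored as a 2-byte uint16. The sketch below reproduces them. The 90/10 split ratio is an assumption inferred from the token counts in the output (the line that computes `split` is not shown here), and the character-level figures assume one token per character of the same input.txt.

```python
BYTES_PER_TOKEN = 2  # np.uint16


def split_sizes(n_tokens: int, train_frac: float = 0.9):
    """Return (train_bytes, val_bytes) for a train/val split stored as uint16."""
    n_train = int(n_tokens * train_frac)
    n_val = n_tokens - n_train
    return n_train * BYTES_PER_TOKEN, n_val * BYTES_PER_TOKEN


# Token-level: 1,601,891 GPT-2 tokens (from the preprocessing output)
tok_train, tok_val = split_sizes(1_601_891)
# Character-level: one token per character, 5,661,468 characters
chr_train, chr_val = split_sizes(5_661_468)

print(f"token-level: train {tok_train / 1e6:.1f} MB, val {tok_val / 1e3:.0f} KB")
print(f"char-level:  train {chr_train / 1e6:.1f} MB, val {chr_val / 1e6:.1f} MB")
print(f"average characters per token: {5_661_468 / 1_601_891:.2f}")
```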

Training

Copy the config file:

cp config/train_asongoficeandfire_char.py config/train_asongoficeandfire_word.py

Change the following lines:

4c4
< out_dir = 'out-asongoficeandfire-char'
---
> out_dir = 'out-asongoficeandfire-word'
13c13
< wandb_project = 'asongoficeandfire-char'
---
> wandb_project = 'asongoficeandfire-word'
16c16
< dataset = 'asongoficeandfire_char'
---
> dataset = 'asongoficeandfire_word'

The model has 6 hidden layers, 6 attention heads, and an embedding dimension of 384, with dropout 0.2. The settings are stored in config/train_asongoficeandfire_word.py.
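With the vocabulary at 50257, most of the parameters now sit in the token embedding table. The sketch below counts them from the hyperparameters. It assumes a nanoGPT-style architecture with bias-free linear layers, an output head tied to the token embeddings, and a 4x MLP expansion; the function name `gpt_param_count` is illustrative. Under those assumptions the result agrees with the "num decayed parameter tensors" and "number of parameters" lines in the training log. Note that n_head does not change the count, since the heads only partition n_embd.

```python
def gpt_param_count(n_layer, n_embd, vocab_size, block_size):
    """Parameter count for a nanoGPT-style GPT (assumed: bias=False, tied output head).

    Returns (decayed, non_decayed): 2-D weight matrices vs. 1-D LayerNorm weights.
    """
    wte = vocab_size * n_embd                          # token embeddings (shared with the output head)
    wpe = block_size * n_embd                          # position embeddings
    attn = n_embd * 3 * n_embd + n_embd * n_embd       # qkv projection + output projection
    mlp = n_embd * 4 * n_embd + 4 * n_embd * n_embd    # up and down projections
    decayed = wte + wpe + n_layer * (attn + mlp)
    non_decayed = (2 * n_layer + 1) * n_embd           # two LayerNorms per block + the final one
    return decayed, non_decayed


decayed, non_decayed = gpt_param_count(n_layer=6, n_embd=384, vocab_size=50257, block_size=256)
total = decayed + non_decayed
reported = total - 256 * 384  # the logged "number of parameters" excludes position embeddings

print(f"decayed: {decayed:,}  non-decayed: {non_decayed:,}")
print(f"reported: {reported / 1e6:.2f}M")
# Assumption: float32 weights plus two AdamW moment buffers per parameter
print(f"estimated checkpoint size: {total * 4 * 3 / 1e6:.0f} MB")
```

The last line is only an estimate, but it lands close to the 360 MB checkpoint file reported after training.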
Train:

python3 train.py config/train_asongoficeandfire_word.py 

Output:

Overriding config with config/train_asongoficeandfire_word.py:
# train a miniature character-level shakespeare model
# good for debugging and playing on macbooks and such

out_dir = 'out-asongoficeandfire-word'
eval_interval = 250 # keep frequent because we'll overfit
eval_iters = 200
log_interval = 10 # don't print too too often

# we expect to overfit on this small dataset, so only save when val improves
always_save_checkpoint = False

wandb_log = False # override via command line if you like
wandb_project = 'asongoficeandfire-word'
wandb_run_name = 'mini-gpt'

dataset = 'asongoficeandfire_word'
gradient_accumulation_steps = 1
batch_size = 64
block_size = 256 # context of up to 256 previous characters

# baby GPT model :)
n_layer = 6
n_head = 6
n_embd = 384
dropout = 0.2

learning_rate = 1e-3 # with baby networks can afford to go a bit higher
max_iters = 5000
lr_decay_iters = 5000 # make equal to max_iters usually
min_lr = 1e-4 # learning_rate / 10 usually
beta2 = 0.99 # make a bit bigger because number of tokens per iter is small

warmup_iters = 100 # not super necessary potentially

# on macbook also add
# device = 'cpu'  # run on cpu only
# compile = False # do not torch compile the model

tokens per iteration will be: 16,384
found vocab_size = 50257 (inside data/asongoficeandfire_word/meta.pkl)
Initializing a new model from scratch
number of parameters: 29.92M
nanoGPT/train.py:197: FutureWarning: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead.
  scaler = torch.cuda.amp.GradScaler(enabled=(dtype == 'float16'))
/opt/miniconda3/lib/python3.13/site-packages/torch/cuda/amp/grad_scaler.py:31: UserWarning: torch.cuda.amp.GradScaler is enabled, but CUDA is not available.  Disabling.
  super().__init__(
num decayed parameter tensors: 26, with 30,013,824 parameters
num non-decayed parameter tensors: 13, with 4,992 parameters
using fused AdamW: False
compiling the model... (takes a ~minute)
W0502 21:01:29.912000 59938 site-packages/torch/_inductor/utils.py:1679] [0/0] Not enough SMs to use max_autotune_gemm mode
step 0: train loss 10.8760, val loss 10.8801
iter 0: loss 10.8862, time 243357.58ms, mfu -100.00%
iter 10: loss 9.8659, time 1214.07ms, mfu 0.81%
iter 20: loss 8.8830, time 1218.86ms, mfu 0.81%
iter 30: loss 7.8094, time 1217.19ms, mfu 0.81%
iter 40: loss 6.7248, time 1218.30ms, mfu 0.81%
iter 50: loss 6.1576, time 1218.33ms, mfu 0.81%
iter 60: loss 5.8677, time 1217.64ms, mfu 0.81%
iter 70: loss 5.7128, time 1216.23ms, mfu 0.81%
iter 80: loss 5.5774, time 1217.37ms, mfu 0.81%
iter 90: loss 5.3840, time 1220.09ms, mfu 0.81%
iter 100: loss 5.2627, time 1218.38ms, mfu 0.81%
iter 110: loss 5.1833, time 1229.76ms, mfu 0.80%
iter 120: loss 5.0745, time 1224.98ms, mfu 0.80%
iter 130: loss 4.9598, time 1223.08ms, mfu 0.80%
iter 140: loss 4.9583, time 1251.08ms, mfu 0.80%
iter 150: loss 4.7546, time 1223.48ms, mfu 0.80%
iter 160: loss 4.7645, time 1221.72ms, mfu 0.80%
iter 170: loss 4.7903, time 1217.24ms, mfu 0.80%
iter 180: loss 4.6545, time 1216.54ms, mfu 0.80%
iter 190: loss 4.6390, time 1217.79ms, mfu 0.80%
iter 200: loss 4.5954, time 1221.09ms, mfu 0.80%
iter 210: loss 4.4918, time 1218.72ms, mfu 0.80%
iter 220: loss 4.4831, time 1218.68ms, mfu 0.80%
iter 230: loss 4.4957, time 1216.23ms, mfu 0.80%
iter 240: loss 4.4754, time 1213.56ms, mfu 0.80%
step 250: train loss 4.3560, val loss 4.5743
saving checkpoint to out-asongoficeandfire-word
iter 250: loss 4.4230, time 239420.26ms, mfu 0.72%
iter 260: loss 4.3790, time 1274.24ms, mfu 0.73%
iter 270: loss 4.3752, time 1216.32ms, mfu 0.74%
iter 280: loss 4.3254, time 1216.51ms, mfu 0.74%
iter 290: loss 4.4234, time 1220.50ms, mfu 0.75%
iter 300: loss 4.2884, time 1223.03ms, mfu 0.75%
iter 310: loss 4.2701, time 1222.24ms, mfu 0.76%
iter 320: loss 4.1601, time 1217.64ms, mfu 0.76%
iter 330: loss 4.2262, time 1218.16ms, mfu 0.77%
iter 340: loss 4.1729, time 1218.43ms, mfu 0.77%
iter 350: loss 4.2060, time 1218.74ms, mfu 0.77%
iter 360: loss 4.1326, time 1219.13ms, mfu 0.78%
iter 370: loss 4.1302, time 1223.02ms, mfu 0.78%
iter 380: loss 4.1480, time 1218.02ms, mfu 0.78%
iter 390: loss 4.0678, time 1218.83ms, mfu 0.78%
iter 400: loss 4.1149, time 1215.53ms, mfu 0.79%
iter 410: loss 4.0053, time 1221.02ms, mfu 0.79%
iter 420: loss 3.9902, time 1220.79ms, mfu 0.79%
iter 430: loss 4.0605, time 1222.26ms, mfu 0.79%
iter 440: loss 4.1128, time 1221.18ms, mfu 0.79%
iter 450: loss 4.0154, time 1219.74ms, mfu 0.79%
iter 460: loss 4.0161, time 1218.40ms, mfu 0.79%
iter 470: loss 3.9588, time 1219.89ms, mfu 0.80%
iter 480: loss 3.9151, time 1217.58ms, mfu 0.80%
iter 490: loss 3.9258, time 1219.13ms, mfu 0.80%
step 500: train loss 3.7829, val loss 4.1120
saving checkpoint to out-asongoficeandfire-word
iter 500: loss 3.8910, time 239641.06ms, mfu 0.72%
iter 510: loss 3.9043, time 1216.04ms, mfu 0.73%
iter 520: loss 3.9171, time 1218.64ms, mfu 0.73%
iter 530: loss 3.8916, time 1218.84ms, mfu 0.74%
iter 540: loss 3.8134, time 1216.70ms, mfu 0.75%
iter 550: loss 3.8795, time 1216.86ms, mfu 0.75%
iter 560: loss 3.8487, time 1218.86ms, mfu 0.76%
iter 570: loss 3.8268, time 1216.00ms, mfu 0.76%
iter 580: loss 3.7648, time 1218.24ms, mfu 0.77%
iter 590: loss 3.7723, time 1220.87ms, mfu 0.77%
iter 600: loss 3.7328, time 1219.05ms, mfu 0.77%
iter 610: loss 3.8129, time 1214.36ms, mfu 0.78%
iter 620: loss 3.6499, time 1218.68ms, mfu 0.78%
iter 630: loss 3.7692, time 1216.56ms, mfu 0.78%
iter 640: loss 3.6257, time 1214.82ms, mfu 0.79%
iter 650: loss 3.7179, time 1217.31ms, mfu 0.79%
iter 660: loss 3.7129, time 1218.96ms, mfu 0.79%
iter 670: loss 3.7242, time 1215.93ms, mfu 0.79%
iter 680: loss 3.6647, time 1213.61ms, mfu 0.79%
iter 690: loss 3.5218, time 1219.88ms, mfu 0.79%
iter 700: loss 3.6957, time 1217.66ms, mfu 0.79%
iter 710: loss 3.5566, time 1219.21ms, mfu 0.80%
iter 720: loss 3.6315, time 1216.16ms, mfu 0.80%
iter 730: loss 3.6753, time 1216.13ms, mfu 0.80%
iter 740: loss 3.5701, time 1217.38ms, mfu 0.80%
step 750: train loss 3.4468, val loss 3.9381
saving checkpoint to out-asongoficeandfire-word
iter 750: loss 3.5816, time 239422.35ms, mfu 0.72%
iter 760: loss 3.5962, time 1222.57ms, mfu 0.73%
iter 770: loss 3.5349, time 1220.84ms, mfu 0.73%
iter 780: loss 3.4818, time 1219.64ms, mfu 0.74%
iter 790: loss 3.5222, time 1218.80ms, mfu 0.75%
iter 800: loss 3.5227, time 1216.45ms, mfu 0.75%
iter 810: loss 3.5749, time 1216.99ms, mfu 0.76%
iter 820: loss 3.4743, time 1219.79ms, mfu 0.76%
iter 830: loss 3.4716, time 1216.15ms, mfu 0.77%
iter 840: loss 3.4742, time 1215.82ms, mfu 0.77%
iter 850: loss 3.5067, time 1215.94ms, mfu 0.77%
iter 860: loss 3.4255, time 1215.81ms, mfu 0.78%
iter 870: loss 3.5976, time 1216.06ms, mfu 0.78%
iter 880: loss 3.4411, time 1216.96ms, mfu 0.78%
iter 890: loss 3.3929, time 1215.76ms, mfu 0.79%
iter 900: loss 3.3961, time 1214.37ms, mfu 0.79%
iter 910: loss 3.4196, time 1213.39ms, mfu 0.79%
iter 920: loss 3.3436, time 1217.24ms, mfu 0.79%
iter 930: loss 3.4141, time 1218.17ms, mfu 0.79%
iter 940: loss 3.3719, time 1215.94ms, mfu 0.79%
iter 950: loss 3.4000, time 1218.37ms, mfu 0.79%
iter 960: loss 3.4619, time 1216.28ms, mfu 0.80%
iter 970: loss 3.4361, time 1220.33ms, mfu 0.80%
iter 980: loss 3.3229, time 1215.80ms, mfu 0.80%
iter 990: loss 3.2980, time 1220.12ms, mfu 0.80%
step 1000: train loss 3.2038, val loss 3.8717
saving checkpoint to out-asongoficeandfire-word
iter 1000: loss 3.4372, time 239387.01ms, mfu 0.72%
iter 1010: loss 3.3171, time 1218.95ms, mfu 0.73%
iter 1020: loss 3.3237, time 1218.49ms, mfu 0.73%
iter 1030: loss 3.3314, time 1221.38ms, mfu 0.74%
iter 1040: loss 3.3154, time 1223.46ms, mfu 0.75%
iter 1050: loss 3.3465, time 1219.02ms, mfu 0.75%
iter 1060: loss 3.3474, time 1219.99ms, mfu 0.76%
iter 1070: loss 3.2950, time 1219.50ms, mfu 0.76%
iter 1080: loss 3.3415, time 1221.26ms, mfu 0.77%
iter 1090: loss 3.2443, time 1222.44ms, mfu 0.77%
iter 1100: loss 3.1759, time 1218.90ms, mfu 0.77%
iter 1110: loss 3.1530, time 1222.27ms, mfu 0.78%
iter 1120: loss 3.3256, time 1219.09ms, mfu 0.78%
iter 1130: loss 3.2775, time 1214.54ms, mfu 0.78%
iter 1140: loss 3.2037, time 1210.47ms, mfu 0.78%
iter 1150: loss 3.2356, time 1221.05ms, mfu 0.79%
iter 1160: loss 3.1805, time 1217.66ms, mfu 0.79%
iter 1170: loss 3.2078, time 1221.02ms, mfu 0.79%
iter 1180: loss 3.2546, time 1240.49ms, mfu 0.79%
iter 1190: loss 3.1477, time 1216.08ms, mfu 0.79%
iter 1200: loss 3.1249, time 1217.48ms, mfu 0.79%
iter 1210: loss 3.1963, time 1219.41ms, mfu 0.79%
iter 1220: loss 3.2699, time 1220.33ms, mfu 0.79%
iter 1230: loss 3.1415, time 1216.22ms, mfu 0.80%
iter 1240: loss 3.2103, time 1219.69ms, mfu 0.80%
step 1250: train loss 2.9864, val loss 3.8399
saving checkpoint to out-asongoficeandfire-word
iter 1250: loss 3.1314, time 239978.12ms, mfu 0.72%
iter 1260: loss 3.2490, time 1217.34ms, mfu 0.73%
iter 1270: loss 3.1089, time 1217.59ms, mfu 0.73%
iter 1280: loss 3.1353, time 1219.21ms, mfu 0.74%
iter 1290: loss 3.1785, time 1218.56ms, mfu 0.75%
iter 1300: loss 3.0794, time 1222.17ms, mfu 0.75%
iter 1310: loss 3.1320, time 1217.45ms, mfu 0.76%
iter 1320: loss 3.1536, time 1220.50ms, mfu 0.76%
iter 1330: loss 3.1026, time 1217.02ms, mfu 0.77%
iter 1340: loss 3.0790, time 1219.13ms, mfu 0.77%
iter 1350: loss 3.1776, time 1220.92ms, mfu 0.77%
iter 1360: loss 3.0509, time 1220.08ms, mfu 0.78%
iter 1370: loss 3.1196, time 1218.81ms, mfu 0.78%
iter 1380: loss 3.0505, time 1219.30ms, mfu 0.78%
iter 1390: loss 3.1347, time 1221.25ms, mfu 0.78%
iter 1400: loss 3.0347, time 1218.00ms, mfu 0.79%
iter 1410: loss 3.1021, time 1224.02ms, mfu 0.79%
iter 1420: loss 2.9946, time 1216.05ms, mfu 0.79%
iter 1430: loss 2.9607, time 1221.27ms, mfu 0.79%
iter 1440: loss 3.0189, time 1226.33ms, mfu 0.79%
iter 1450: loss 2.9887, time 1221.87ms, mfu 0.79%
iter 1460: loss 3.0309, time 1223.28ms, mfu 0.79%
iter 1470: loss 3.0246, time 1223.72ms, mfu 0.79%
iter 1480: loss 3.0777, time 1220.84ms, mfu 0.79%
iter 1490: loss 3.1380, time 1221.98ms, mfu 0.80%
step 1500: train loss 2.7945, val loss 3.8313
saving checkpoint to out-asongoficeandfire-word
iter 1500: loss 3.0444, time 239611.16ms, mfu 0.72%
iter 1510: loss 2.9417, time 1219.76ms, mfu 0.73%
iter 1520: loss 3.0573, time 1220.22ms, mfu 0.73%
iter 1530: loss 2.9767, time 1219.78ms, mfu 0.74%
iter 1540: loss 3.0509, time 1214.57ms, mfu 0.75%
iter 1550: loss 2.9194, time 1221.96ms, mfu 0.75%
iter 1560: loss 2.9944, time 1222.66ms, mfu 0.76%
iter 1570: loss 2.9832, time 1220.64ms, mfu 0.76%
iter 1580: loss 3.0225, time 1222.05ms, mfu 0.77%
iter 1590: loss 2.9837, time 1222.72ms, mfu 0.77%
iter 1600: loss 2.9402, time 1223.84ms, mfu 0.77%
iter 1610: loss 2.9671, time 1221.10ms, mfu 0.78%
iter 1620: loss 2.9336, time 1221.88ms, mfu 0.78%
iter 1630: loss 3.0004, time 1222.10ms, mfu 0.78%
iter 1640: loss 2.8906, time 1223.27ms, mfu 0.78%
iter 1650: loss 2.9282, time 1222.39ms, mfu 0.78%
iter 1660: loss 2.8075, time 1221.15ms, mfu 0.79%
iter 1670: loss 2.9425, time 1224.41ms, mfu 0.79%
iter 1680: loss 2.9163, time 1221.47ms, mfu 0.79%
iter 1690: loss 2.9414, time 1220.89ms, mfu 0.79%
iter 1700: loss 2.8531, time 1219.95ms, mfu 0.79%
iter 1710: loss 2.8017, time 1222.28ms, mfu 0.79%
iter 1720: loss 2.8415, time 1220.39ms, mfu 0.79%
iter 1730: loss 2.8314, time 1219.93ms, mfu 0.79%
iter 1740: loss 2.8732, time 1219.91ms, mfu 0.80%
step 1750: train loss 2.6219, val loss 3.8762
iter 1750: loss 2.8946, time 238841.38ms, mfu 0.72%
iter 1760: loss 2.7920, time 1218.22ms, mfu 0.73%
iter 1770: loss 2.9544, time 1217.83ms, mfu 0.73%
iter 1780: loss 2.8100, time 1221.51ms, mfu 0.74%
iter 1790: loss 2.8872, time 1220.28ms, mfu 0.75%
iter 1800: loss 2.8606, time 1219.52ms, mfu 0.75%
iter 1810: loss 2.7917, time 1218.93ms, mfu 0.76%
iter 1820: loss 2.8478, time 1221.06ms, mfu 0.76%
iter 1830: loss 2.8200, time 1223.39ms, mfu 0.77%
iter 1840: loss 2.8573, time 1221.53ms, mfu 0.77%
iter 1850: loss 2.8329, time 1222.17ms, mfu 0.77%
iter 1860: loss 2.8456, time 1219.67ms, mfu 0.78%
iter 1870: loss 2.8026, time 1223.65ms, mfu 0.78%
iter 1880: loss 2.7411, time 1222.01ms, mfu 0.78%
iter 1890: loss 2.8552, time 1237.63ms, mfu 0.78%
iter 1900: loss 2.7899, time 1233.59ms, mfu 0.78%
iter 1910: loss 2.8381, time 1220.52ms, mfu 0.78%
iter 1920: loss 2.7655, time 1255.52ms, mfu 0.78%
iter 1930: loss 2.7850, time 1276.44ms, mfu 0.78%
iter 1940: loss 2.7768, time 1218.35ms, mfu 0.78%
iter 1950: loss 2.8220, time 1219.55ms, mfu 0.79%
iter 1960: loss 2.7753, time 1229.05ms, mfu 0.79%
iter 1970: loss 2.7558, time 1257.83ms, mfu 0.79%
iter 1980: loss 2.6966, time 1217.97ms, mfu 0.79%
iter 1990: loss 2.7610, time 1216.23ms, mfu 0.79%
step 2000: train loss 2.4743, val loss 3.9085
iter 2000: loss 2.6421, time 240170.97ms, mfu 0.71%
iter 2010: loss 2.7148, time 1223.96ms, mfu 0.72%

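After moving from character-level to word-level tokens (strictly speaking, GPT-2 BPE subword tokens), the vocabulary grows from 65 to 50257 entries and training becomes slower. The log shows that once the training loss falls to around 2.7, val loss stops decreasing and starts to rise: it bottoms out at 3.8313 at step 1500, then climbs to 3.8762 at step 1750 and 3.9085 at step 2000, which is a sign of overfitting. Improving on this would mean increasing n_layer, n_head, and n_embd; given the performance of the M4 Pro, I stopped training here to see how the model does.
out-asongoficeandfire-word/ckpt.pt reaches 360 MB, larger than the checkpoint of the character-level model (129 MB).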
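The turning point in val loss can be read directly off the "step N:" eval lines. The sketch below parses those lines (copied from the training log) and reports the step with the lowest val loss. The regular expression and the helper name `parse_evals` are illustrative, not part of nanoGPT.

```python
import re

# Eval lines copied from the training log
LOG = """\
step 0: train loss 10.8760, val loss 10.8801
step 250: train loss 4.3560, val loss 4.5743
step 500: train loss 3.7829, val loss 4.1120
step 750: train loss 3.4468, val loss 3.9381
step 1000: train loss 3.2038, val loss 3.8717
step 1250: train loss 2.9864, val loss 3.8399
step 1500: train loss 2.7945, val loss 3.8313
step 1750: train loss 2.6219, val loss 3.8762
step 2000: train loss 2.4743, val loss 3.9085
"""

EVAL_RE = re.compile(r"step (\d+): train loss ([\d.]+), val loss ([\d.]+)")


def parse_evals(text):
    """Return a list of (step, train_loss, val_loss) tuples from the log text."""
    return [(int(s), float(t), float(v)) for s, t, v in EVAL_RE.findall(text)]


evals = parse_evals(LOG)
best_step, _, best_val = min(evals, key=lambda e: e[2])
print(f"best val loss {best_val:.4f} at step {best_step}")
for step, train, val in evals:
    if step > best_step:
        print(f"step {step}: val loss up by {val - best_val:.4f}, train/val gap {val - train:.4f}")
```

Because the config sets always_save_checkpoint = False, the log shows no "saving checkpoint" line after steps 1750 and 2000, so ckpt.pt should still hold the weights from step 1500.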

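Testing inference

sample.py needs a small change here, mainly because encode/decode now go through the GPT-2 tokenizer. Below is the version an AI modified, saved as sample_word.py: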

"""
Sample from a trained model
"""
import os
import pickle
from contextlib import nullcontext
import torch
import tiktoken
from model import GPTConfig, GPT

# -----------------------------------------------------------------------------
init_from = 'resume' # either 'resume' (from an out_dir) or a gpt2 variant (e.g. 'gpt2-xl')
out_dir = 'out' # ignored if init_from is not 'resume'
start = "\n" # or "<|endoftext|>" or etc. Can also specify a file, use as: "FILE:prompt.txt"
num_samples = 10 # number of samples to draw
max_new_tokens = 500 # number of tokens generated in each sample
temperature = 0.8 # 1.0 = no change, < 1.0 = less random, > 1.0 = more random, in predictions
top_k = 200 # retain only the top_k most likely tokens, clamp others to have 0 probability
seed = 1337
device = 'mps' # examples: 'cpu', 'cuda', 'cuda:0', 'cuda:1', etc.
dtype = 'bfloat16' if torch.cuda.is_available() and torch.cuda.is_bf16_supported() else 'float16' # 'float32' or 'bfloat16' or 'float16'
compile = False # use PyTorch 2.0 to compile the model to be faster
exec(open('configurator.py').read()) # overrides from command line or config file
# -----------------------------------------------------------------------------

torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
torch.backends.cuda.matmul.allow_tf32 = True # allow tf32 on matmul
torch.backends.cudnn.allow_tf32 = True # allow tf32 on cudnn
device_type = 'cuda' if 'cuda' in device else 'cpu' # for later use in torch.autocast
ptdtype = {'float32': torch.float32, 'bfloat16': torch.bfloat16, 'float16': torch.float16}[dtype]
ctx = nullcontext() if device_type == 'cpu' else torch.amp.autocast(device_type=device_type, dtype=ptdtype)

# model
if init_from == 'resume':
    # init from a model saved in a specific directory
    ckpt_path = os.path.join(out_dir, 'ckpt.pt')
    checkpoint = torch.load(ckpt_path, map_location=device)
    gptconf = GPTConfig(**checkpoint['model_args'])
    model = GPT(gptconf)
    state_dict = checkpoint['model']
    unwanted_prefix = '_orig_mod.'
    for k,v in list(state_dict.items()):
        if k.startswith(unwanted_prefix):
            state_dict[k[len(unwanted_prefix):]] = state_dict.pop(k)
    model.load_state_dict(state_dict)
elif init_from.startswith('gpt2'):
    # init from a given GPT-2 model
    model = GPT.from_pretrained(init_from, dict(dropout=0.0))

model.eval()
model.to(device)
if compile:
    model = torch.compile(model) # requires PyTorch 2.0 (optional)

# look for the meta pickle in case it is available in the dataset folder
load_meta = False
if init_from == 'resume' and 'config' in checkpoint and 'dataset' in checkpoint['config']: # older checkpoints might not have these...
    meta_path = os.path.join('data', checkpoint['config']['dataset'], 'meta.pkl')
    load_meta = os.path.exists(meta_path)

if load_meta:
    print(f"Loading meta from {meta_path}...")
    with open(meta_path, 'rb') as f:
        meta = pickle.load(f)

    # Decide whether this is the new word-level model or the old character-level one
    if 'vocab_size' in meta and meta['vocab_size'] > 1000:
        # New word-level model: use the tiktoken (GPT-2) tokenizer directly
        print("Using GPT-2 tokenizer for encoding/decoding...")
        enc = tiktoken.get_encoding("gpt2")
        encode = lambda s: enc.encode(s, allowed_special={"<|endoftext|>"})
        decode = lambda l: enc.decode(l)
    else:
        # Old character-level model: keep using the stoi/itos lookup tables
        print("Using character-level mapping for encoding/decoding...")
        stoi, itos = meta['stoi'], meta['itos']
        encode = lambda s: [stoi[c] for c in s]
        decode = lambda l: ''.join([itos[i] for i in l])
else:
    # If meta.pkl is not found, fall back to GPT-2 encodings
    print("No meta.pkl found, assuming GPT-2 encodings...")
    enc = tiktoken.get_encoding("gpt2")
    encode = lambda s: enc.encode(s, allowed_special={"<|endoftext|>"})
    decode = lambda l: enc.decode(l)

# encode the beginning of the prompt
if start.startswith('FILE:'):
    with open(start[5:], 'r', encoding='utf-8') as f:
        start = f.read()
start_ids = encode(start)
x = (torch.tensor(start_ids, dtype=torch.long, device=device)[None, ...])

# run generation
with torch.no_grad():
    with ctx:
        for k in range(num_samples):
            y = model.generate(x, max_new_tokens, temperature=temperature, top_k=top_k)
            print(decode(y[0].tolist()))
            print('---------------')
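The temperature and top_k settings passed to model.generate control how the next token is drawn from the model's scores. The pure-Python sketch below illustrates the idea on a toy 4-token vocabulary. It is not nanoGPT's implementation, which works on torch tensors; the function names and the toy logits are assumptions made for illustration.

```python
import math
import random


def top_k_probs(logits, temperature=1.0, top_k=None):
    """Turn raw logits into sampling probabilities with temperature and top-k filtering."""
    scaled = [x / temperature for x in logits]  # < 1.0 sharpens, > 1.0 flattens
    if top_k is not None:
        k = min(top_k, len(scaled))
        threshold = sorted(scaled, reverse=True)[k - 1]  # k-th largest logit
        # Everything below the k-th largest gets probability 0
        scaled = [x if x >= threshold else float("-inf") for x in scaled]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next(logits, temperature=1.0, top_k=None, rng=random):
    """Draw one token id from the filtered distribution."""
    probs = top_k_probs(logits, temperature, top_k)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.0, -1.0]  # toy scores for a 4-token vocabulary
print(top_k_probs(logits, temperature=0.8, top_k=2))
print(sample_next(logits, temperature=0.8, top_k=2, rng=random.Random(1337)))
```

With top_k = 200 and a vocabulary of 50257, only the 200 highest-scoring tokens can be sampled at each step, which cuts off the long tail of unlikely tokens.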

Run:

python3 sample_word.py --out_dir=out-asongoficeandfire-word

Output:

Overriding: out_dir = out-asongoficeandfire-word
number of parameters: 29.92M
Loading meta from data/asongoficeandfire_word/meta.pkl...
Using GPT-2 tokenizer for encoding/decoding...

only light.”
“You should die,” Sansa said.
“I believe we were speaking of.” Ned gave her a defiant look. “You are no true knight now, sister,
and I am still a king.”
“You are very well,” Sansa said.
“Eddard Stark,” Ned Stark said. Only a few days later, he had been a great
man grown, and they seemed to be the only one told. “You Starks have a
strong task for you, my lord.”
“I think.” Robert’s voice was not very quiet, but the king was not a man, a
fool, so quick to see it. “The queen is not a knight. Please. I thought you were the queen. I was only
speaking to you, Ned, Robert.”
Tyrion wanted to laugh. “I am sorry for you, Sansa.”
“Will you agree?”
“No,” Ned said, as if he could not hear her. He would never say what he would
do to her. “Father, I must do.”
“You are not your lord” the queen said. “I don’t want to say that. Someone’s safe.”
He looked at her. “Perhaps you are a prince. I am!”
“I can make you a good voice.”
“A girl,” Sansa said. “It was not me who sees, sweetling. She has no wish.”
“I’m not a traitor.” And I could hear her voice coming to her, Varys turned his eyes. Her father’s eyes were full of
earth. “Moreo who knows.”
“Your Grace?” Cersei was confused at her, then had been a man’s truth. ‘I am not a king,” he said.
“I have been sick of honor,” murmured Varys. “You murdered Joffrey, Varys.For a moment Joffrey seized her arm. “Come
---------------

too noble to do this folly.”
“A good thing.” Ser Boros touched her wrist. “I should not do nothing, my lady. My lord
prince is a knight, and I am the only Hand of the King’s Landing.”
“So you’ll learn, my lady. You’ll be so good as my father had the honor of the
Kingsguard, so I’m sorry I’m not so.”
“You’ll hear me.” She was so happy. “Why should I say you’d
cannot take me for a information?”
“To the godswood,” Ser Jorah promised. “You’re drinking with
me, my lord. You’ll know that.”
“We’ve never been seen with these patrols.” Her brother Viserys was the only ones he’d
dried. He had not trusted the whore. “The Lamb Men have been lost.”
“I was named Quaro taught him,” Dany said. “I was promised to go after them.”
“I said she liked,” Ser Jorah said. “I saw the dragon’s dragon . . . . what do you want to do?”
“I am not khaleesi,” Khal Drogo said, “not to be the stallion who mounts the world.” He let
his face move.
“Let my son go with Drogo.”
“With him,” the knight said, “you do not understand.”
“I will,” Dany admitted. “He will not take no wife. The khalasar is yours.”
“The Western Market will not be some small cities to hear me.”
“Drogon will have need.” Dany put a hand on her hand. “Tell her the gods help me to
where I am the children of the dragon.”
The maegi smiled. “The eggs do not hold no denying her.”
“Tell her that she is no man who sees you.
---------------

only if they’d give me the girl and I’d gladly need to.”
“I tell you I will.”
“You are a eunuch.”
Dany had to lie. The Qartheen had not touched her hair. “Though I do not understand what
you have permitted to see.”
“She was so sweet.”
“Sellswords are all carrying eggs,” Irri said. “I know she knew how to be, but I
did not trust her.”
“He was a boy,” the girl said. “I loved his father, and I would say, but now I
sometimes you would have known his son.”
“If he does not,” Dany said, “but how does he know what he is doing?”
“No,” she said. “Did you see your questions?”
“No,” Arstan agreed. “My word has changed.”
“When they had time, I will not find the khalasar.”
“What will we wait?” Dany said. “It is not the only one who has come,” she said.
Doreah was only a man’s son now, and now, no man’s friend. She had not known the chance she had ever
ever known any was saved.
Wothraki, she had said. “The traders from the Seven Kingdoms have been a new slaves, it is known.”
“They shall wait for the wound,” Ser Jorah said.
“A dragon will cut the maegi back to Pentos,” Ser Jorah said. “I have known it shall be
with you, that gift is not a slave, and so, and now you will not have no taste as you desire, but the
dragon is blood of what you will not do, not fear what you do not command. I swear it.”
Dany looked at her son. “There is no food nor wine,” she said, “but in this city, no voices will
see.”
“Then take her to
---------------

she had forgotten what was, yet she could not tell, though she would never understand the words she would say,
but it was the only part of it, and the words she had never heard. She thought of the godswood,
the stony sea on the wind. It was so bright that night, so soft and sad.
She heard the sound of sounds, the clas light, the distant sound of the
cripple of logs, the sound of a thousand mice under her feet.
The next day was settling down and the very window. Beyond was a blaze of flame, and a still
wisp of flame.
“Now go.”
“Sleep.” Arya began. She ran down the pool, and the wind awoke, around a low flame. Huge
plume were flying, and wet and over them, and she saw the dead walk. She woke again, run, sobbing.
The only sounds were gone. She heard her voice, a lot, a third hand, a woman’s. A heavy, a
shattered, with both hands and flesh. She could feel the blood in her hand.
She could hear the visions of men standing, the clash of arms. The torches were screaming and the screams.
The fear was gone, and the sound of her. She ran, the flames shining on them, her breath. She screamed, the
Dany had a torch that the torch went to her first, its voice pounding. It was all. They had brought them all the
stride her, and the fire coiled around her. She had no place to her, no more. The last she knew, she remembered, the
gargors of the Kingslayer; the Dothraki who had told her from her khalasar, to the Great Sept of Arryn, or the
tritting, and the rest of her fears. The rest came a long lonely dream.
The old man sat by the torch, her eyes shining on her. “There’s no need.”
“No,” the maester said. “I’m not a woman.”
Dany did not dreamt of her.
The speaker was so her brother could not say the words, though she could not say the
man.
“A short man here is,
---------------

“Did you find your guest?” Sansa asked him. “Come.”
She’d never noticed what the queen would say. “Why what is it that?”
“I never meant to.”
“No,” Sansa said, when the girl had been woken to look at her, but it wasn’t fair.
“Is it what she was.”
“It is a good prince to tell you,” Sansa said. “I think so, I know what you
had wanted. A queen wanted to lie, but the truth was part. After me, I said,
“I was not alone.”
“I don’t know,” Sansa reminded her. “I should have no fear to be before you.”
Sansa had never noticed.If you call me traitor’s king.”
Arya had not known what she had expected to say. She could not have known the
captain in the flames. The sun was bright pine and hawk, its smoke and salt and still.
The sun was strangely warm, the only light of night, but still . . . . . the shadows had stopped at the sound of a woman’s voice, each in
the flames.
Their voices followed, and voices followed as people, and the great wooden doors of a towering bronze-shaped tunnels,
leaving astride a corner and a great great pair of horses. “A dragon’s mount,” she said.
“A lovely one,” the Hound said, “and strong,” the knight said sullenly.
“That is the fate I am.”
“I’m not a knight,” Arya said firmly. “A man who mounts us, and now you’re the stupid king.” She put the head off
the shoulder and let go of it.
“You’re the Kingslayer.”
Gendry looked at her. He said nothing, “I am not a knight of the Night’s Watch.”
“He’s a knight,” the girl said. “The stall
---------------

and the thing she saw the sound of a distant slanting through the door. A faint puff of fire, she thought as she had watched
him go, to fill her fingers moving. He watched her hair, sniffed at the bones of the dragon.
She heard the sound of their vegetables. It was a ghost. The smells were
but the branches were scalding out, and the people were running. The clacked while Meera
caught of a bedside, but the sound was faded and going there was something in her breath. She could hear the
hulks of the grey mists, but the one with the sharp grey soot.
“Hodor?” Meera pointed.
She was uncertain, but the thing she could hear. She saw the dark and the red scent of
the thing. She did the faint taste of blood in the hot water. “Hodor” she said, the dead voice
screamed. “Hodor,” she said.
“Hodor!”
Meera came up. “Hodor.”
Meera followed with her frog spear. “Hodor, Jojen?”
“Bran said,” she shouted. “Hodor,” he said.
They could not say it, but they were no blood. “Hodor.”
“Hodor. Hodor,” Bran shouted. Meera was startled, her voice and
Bran started after a time, and the man who hugged her as he sat with her
ears.
“Hodor,” he said as he stepped between him.
“Hodor!” she shouted. “Hodor!”
“Hodor?”
The door was open. Bran knelt beside her. What had the door left the door?
“Hodor?” Bran asked.
“Hodor,” Bran said.
“Hodor,” he said.
Hodor hodorera nodded. She looked at the comet, but her sister was gone. Summer was sitting in her
wind, its brother Bran’s head dark and his father, Bran’s voice click.
“Hodor,” he called, blinking, his grin. “H
---------------

“I don’t.”
“Never,” she said, “but not me.”
“Yes it’s a lie.”
“We have a long journey, Lord Bolton and sister to marry the girls. You are your mother, so I
cannot afford to make the girl,” Brienne said.
“I have my answer, ser.”
“Ser Jaime.” Jaime took a mocking smile. “That was my son. He is a
que man, so he seems to me. You made it as much as you say.”
“I will.If she’d, she would have given him no interest in the wench. “What if you
do as well, I shall not expect you to. It seems good to me to be told. Even if you’re a king, you
know?”
“If you can.” Jaime could not believe that. Lord Tywin had not seen the Iron Throne after
Lannisters.”
“I can’t.” The king shook his head. “It’s a lie, my lord. . .”
“Do you think so?” His mouth twisted. “The wench, I suppose.”
“He plays us one of Harrenhal, for now.” Jaime put the hand on the blade. “She wouldn’t
know if she has.”
“She is not mine.”
“Why did you want no trouble?” Jaime laughed. “There’s you, Imp.”
“Well, I’m not!”
Arya frowned. “All I need to go. You’re safe with you, ser.”
“I don’t.” She took another step forward, and found it all the way to his other knights.
“You’re not a Lannister, Ned.”
“I have a traitor’s son,” Ser Meryn said. “She . . . .”
“Come.”
She nodded.
“A
---------------

with the children of the forest.
They descended the outer ward of the outer ward to the north before the
direwolf drove them down after them, through the long archers and spikes and bronze shields and
painted logs all around the river, and the river of the Gods. The river was a
empty sockets. The garrison was built by the smaller stone bridge where the men were falling,
they’d appeared on a stair, two-thirds of what could they make at once. Men. One
sore, but the fourth was a stout winched. Jon had told him that he’d
been his captive here, yet he was too cold. He was hungry enough to make its way to the Wall long and make the
torch.
The Wall was so solid and rocky, and the Old Bear did not want to hear the way.
The Magnar’s voice was pounding. A couple of the barricade came bouncing as the
rangers came down on the slope. Jon saw the horn through the ringwall and the barrels of oiled the
others. it was no larger than a mile across the Shadow Tower. Once there
dusk had been no place from the Wall, but that was hard to be seen. Chett was glad to hear the
man.
The Magnar had sent a hundred up into the Shadow Tower, but the Magnar was the wildlings now, or perhaps not
for an answer.
Qhorin dismounted. “The wildlings are gone, what do they have now?”
“I would,” said Mance. “The one of us, and the Bastard of the Night’s Watch is a wildlings. Why not? I’m
my wolf.”
Ghost had the same. He had a good horse, Jon could not say.
But, he saw the fire burning, and the logs was growing cold, so they had a
few hours before. The sound made him think of that he was going, his heart.
They will not be the other way, only a huge place. After a moment the moon was
found the Horn of the Night’s Watch, where the Night’s Watch stood, waiting. He was not where a boy stood,
he saw it. “I was not a prince,” said Ragwyle.
---------------

“I see.” Varys gave her a shrug. “I am not for you.”
“I am certain you are the eunuch.”
“I understand your sweet words,” Varys said. “So you should learn more.”
Sansa’s cheeks were mocking and easy, but she was nothing else to do. “I can carry you to the
Kingslayer.”
“Tyrion,” Tyrion said. “I know you’re being pleased to share my claim, from my father.”
“I have no doubt.” She looked at her sister. “My sister says it is.”
“Sister is a lie,” the eunuch replied. “I am not to you.”
“And what do you know of all?”
“I mean to enter the city,” Tyrion said. “And go see.” He stroked the pommel in
bed his belt, gave a stiff look. “You are fond of you, ser. I am no more than I am.”
He opened his mouth and swept it in front of the guard. “I am your Hand,” he said, “and I will not keep you sick.”
“My royal pardon, my lord.”
“You hear nothing, my lord.” The queen studied the dwarf’s face. “And did you understand? I have
your truth. You are very pleased. You have no need of you. You, Imp. You are well mistaken.”
“I fear I gave My father a question.” Tyrion said. “And you are the Lannisters, and I am a
cannot honor of it.”
“All you value,” Tyrion suggested.
“Lannister, my lord.” Joffrey’s voice was thick with rage. “He does not concern you, my lord,” he said.
“Must we get our meaning?”
“Certainly.” Tyrion turned his head. “Sods.” He glanced at the queen and
---------------

leaving him looking for a dog, not truly.
“I’m not going to trouble him, fool.” The ground was damp and cold. She turned back
to the ground, into the cold, but the blood stirred in her hand.
“The one he’ll have.”
“Quiet!” she shouted at him. He was a woman. He’d saved her, she realized. She tried to
only knock her down again, but for the long time she heard the soft sound of the water running
through the great hall, and the sound were the song. She remembered the other, the dead man who’d
roasted them, the dead and the dead dead and the Hound. She heard the fire, and the flames were running down the dead
blood with the scent of blood. Behind her the others. One of the others were
throwing, torches burning deep in the green flames. A few dead dogs juggled, and the Hound
crawled and the Hound’s stump was hacked and half a mile of the flames.
Ser Clegane and the Hound reined up and looked around, kicking, bounded forward.
“St!” the Hound shouted, the steel voice cracked. He grabbed the sword and stumbled, his sword lodged
down on the grass.
“Get up!” the horse reeled away. One shove, fell, and she leapt up a first. The
bathhouse with the other a torch, and the Hound swung her down again, and then her stick was
scrubbed.
“You won’t!” Jaqen H’ghar went on. “I’m afraid!”
Jhogo screamed. Dotho slammed her back into the saddle, dug her in her stance.
“I’m not!”
“The dragon!” the Hound took a step forward, yanking the blow from her hand, yanking
her aside. The blood slammed around her, bounced at her as the silk cut cuthed her, the shaft first
around her, so the blade caught the ground, and the next he could feel the
blood hole in her, jaws trembling, the pain in her arm, the terrible she did not stop she did.
“You
---------------

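As you can see, the text is now mostly grammatical enough to follow (with a bit of an A Song of Ice and Fire flavor), and some passages even read fairly smoothly, but there are still plenty of errors in grammar and story logic.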

Training a small model that “can write” Python functions

Preprocess the dataset

This time the dataset is a collection of Python code from Hugging Face:
https://huggingface.co/datasets/nitish26/python-codes-25k
The download is a 26 MB JSON text file, a good size for experiments.
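Before preprocessing, it can help to look at what the file actually contains. The sketch below is a minimal check, assuming the download is a single top-level JSON array of objects and that the training text sits under a `text` key (the key the preprocessing script in this section reads). `summarize_records` is a hypothetical helper name, not part of nanoGPT or the dataset.

```python
import json
from collections import Counter


def summarize_records(path, key="text"):
    """Return (record count, records that contain `key`, Counter of every field name seen)."""
    with open(path, "r", encoding="utf-8") as f:
        records = json.load(f)  # assumption: one top-level JSON array of objects
    field_counts = Counter(name for rec in records for name in rec)
    with_key = sum(1 for rec in records if key in rec)
    return len(records), with_key, field_counts
```

Calling `summarize_records("python-codes-25k.json")` shows how many records would be dropped by a filter on `text`, and which other fields exist in case a different one is a better training target.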
Have an AI write a preprocessing script modeled on the earlier prepare script:

import os
import json
import pickle

import numpy as np
import tiktoken

# 1. Path to the downloaded JSON file
input_file_path = "python-codes-25k.json"
# 2. Output directory. train.py looks in data/<dataset>/, so run this from data/python_code
data_dir = "./"
os.makedirs(data_dir, exist_ok=True)

# 3. Read the JSON file and extract the full "instruction + code" text
print(f"Reading {input_file_path}...")
with open(input_file_path, "r", encoding="utf-8") as f:
    data = json.load(f)

# Take the 'text' field, which holds the complete natural-language instruction and code answer.
# Join the records with two newlines to help the model tell separate tasks apart.
texts = [item['text'] for item in data if 'text' in item]
all_text = "\n\n".join(texts)

# Count only the records that were actually used, not every entry in the file
print(f"Extracted {len(texts)} records, {len(all_text):,} characters in total")

# 4. Split into training and validation sets (9:1)
split_idx = int(len(all_text) * 0.9)
train_data = all_text[:split_idx]
val_data = all_text[split_idx:]

# 5. Encode with the GPT-2 tokenizer
print("Encoding with tiktoken (GPT-2)...")
enc = tiktoken.get_encoding("gpt2")
train_ids = enc.encode_ordinary(train_data)
val_ids = enc.encode_ordinary(val_data)

print(f"Training set tokens: {len(train_ids):,}")
print(f"Validation set tokens: {len(val_ids):,}")

# 6. Export as the .bin files nanoGPT expects.
#    uint16 is enough because every GPT-2 token id is below 65,536.
train_ids = np.array(train_ids, dtype=np.uint16)
val_ids = np.array(val_ids, dtype=np.uint16)

train_ids.tofile(os.path.join(data_dir, "train.bin"))
val_ids.tofile(os.path.join(data_dir, "val.bin"))

print("✅ Preprocessing done: wrote train.bin and val.bin")

# 7. Save meta.pkl so train.py picks up the vocabulary size
meta = {
    'vocab_size': enc.n_vocab,  # 50257
}
with open(os.path.join(data_dir, "meta.pkl"), 'wb') as f:
    pickle.dump(meta, f)
print("✅ Wrote meta.pkl")

Running it produces meta.pkl (metadata), train.bin, and val.bin.
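A quick sanity check on the generated files can catch a wrong dtype or a truncated write before spending hours on training. The sketch below uses only the standard library and assumes the .bin files are raw native-endian unsigned 16-bit token ids, which is what the `np.uint16` export above produces. `read_token_ids` and `check_bin` are hypothetical helper names; nanoGPT's own train.py reads the same files with `np.memmap`.

```python
import os
from array import array


def read_token_ids(path):
    """Load a .bin file of raw unsigned 16-bit token ids into an array."""
    ids = array("H")  # 'H' = unsigned short
    assert ids.itemsize == 2, "expected a 2-byte unsigned short on this platform"
    n_tokens = os.path.getsize(path) // ids.itemsize
    with open(path, "rb") as f:
        ids.fromfile(f, n_tokens)
    return ids


def check_bin(path, vocab_size=50257):
    """Return (token count, largest id); fail if any id falls outside the vocabulary."""
    ids = read_token_ids(path)
    largest = max(ids) if ids else -1
    assert largest < vocab_size, f"token id {largest} is outside the vocabulary"
    return len(ids), largest
```

For example, `check_bin("train.bin")` should report the same token count that the preprocessing script printed for the training set.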

Train the model

Tweak the config; it really only needs the names changed (out_dir, wandb_project, dataset):
train_python_code.py:

# train a miniature character-level shakespeare model
# good for debugging and playing on macbooks and such

out_dir = 'out-python-code'
eval_interval = 250 # keep frequent because we'll overfit
eval_iters = 200
log_interval = 10 # don't print too too often

# we expect to overfit on this small dataset, so only save when val improves
always_save_checkpoint = False

wandb_log = False # override via command line if you like
wandb_project = 'python-code'
wandb_run_name = 'mini-gpt'

dataset = 'python_code'
gradient_accumulation_steps = 1
batch_size = 64
block_size = 256 # context of up to 256 previous characters

# baby GPT model :)
n_layer = 6
n_head = 6
n_embd = 384
dropout = 0.2

learning_rate = 1e-3 # with baby networks can afford to go a bit higher
max_iters = 5000
lr_decay_iters = 5000 # make equal to max_iters usually
min_lr = 1e-4 # learning_rate / 10 usually
beta2 = 0.99 # make a bit bigger because number of tokens per iter is small

warmup_iters = 100 # not super necessary potentially

# on macbook also add
# device = 'cpu'  # run on cpu only
# compile = False # do not torch compile the model
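To see where the model's parameters go with this config, the count can be reproduced by hand. The sketch below assumes a GPT-2 style layout with no bias terms and with the output layer sharing weights with the token embedding, and it assumes the reported figure leaves out the position embedding; `gpt_param_counts` is a hypothetical helper, not a nanoGPT function. The vocabulary size of 50257 comes from the GPT-2 tokenizer used in preprocessing.

```python
def gpt_param_counts(n_layer, n_embd, vocab_size, block_size):
    """Parameter counts for a GPT-2 style model (tied embeddings, no bias terms)."""
    wte = vocab_size * n_embd                         # token embedding, shared with the output layer
    wpe = block_size * n_embd                         # learned position embedding
    attn = n_embd * 3 * n_embd + n_embd * n_embd      # qkv projection + attention output projection
    mlp = n_embd * 4 * n_embd + 4 * n_embd * n_embd   # expand to 4x width, then project back
    layer_norms = 2 * n_embd                          # two LayerNorm weight vectors per block
    per_block = attn + mlp + layer_norms
    total = wte + wpe + n_layer * per_block + n_embd  # last term: final LayerNorm
    return {"total": total, "token_embedding": wte, "without_position_embedding": total - wpe}


counts = gpt_param_counts(n_layer=6, n_embd=384, vocab_size=50257, block_size=256)
print(f"{counts['without_position_embedding'] / 1e6:.2f}M")  # prints 29.92M
```

Under these assumptions the result agrees with the 29.92M that train.py reports in the log. Note that n_head does not change the count, and that the token embedding alone accounts for roughly two thirds of the parameters, a consequence of pairing a tiny model with the full GPT-2 vocabulary.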

Train:

python3 train.py config/train_python_code.py 

Output:

Overriding config with config/train_python_code.py:
# train a miniature character-level shakespeare model
# good for debugging and playing on macbooks and such

out_dir = 'out-python-code'
eval_interval = 250 # keep frequent because we'll overfit
eval_iters = 200
log_interval = 10 # don't print too too often

# we expect to overfit on this small dataset, so only save when val improves
always_save_checkpoint = False

wandb_log = False # override via command line if you like
wandb_project = 'python-code'
wandb_run_name = 'mini-gpt'

dataset = 'python_code'
gradient_accumulation_steps = 1
batch_size = 64
block_size = 256 # context of up to 256 previous characters

# baby GPT model :)
n_layer = 6
n_head = 6
n_embd = 384
dropout = 0.2

learning_rate = 1e-3 # with baby networks can afford to go a bit higher
max_iters = 5000
lr_decay_iters = 5000 # make equal to max_iters usually
min_lr = 1e-4 # learning_rate / 10 usually
beta2 = 0.99 # make a bit bigger because number of tokens per iter is small

warmup_iters = 100 # not super necessary potentially

# on macbook also add
# device = 'cpu'  # run on cpu only
# compile = False # do not torch compile the model

tokens per iteration will be: 16,384
found vocab_size = 50257 (inside data/python_code/meta.pkl)
Initializing a new model from scratch
number of parameters: 29.92M
nanoGPT/train.py:197: FutureWarning: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead.
  scaler = torch.cuda.amp.GradScaler(enabled=(dtype == 'float16'))
/opt/miniconda3/lib/python3.13/site-packages/torch/cuda/amp/grad_scaler.py:31: UserWarning: torch.cuda.amp.GradScaler is enabled, but CUDA is not available.  Disabling.
  super().__init__(
num decayed parameter tensors: 26, with 30,013,824 parameters
num non-decayed parameter tensors: 13, with 4,992 parameters
using fused AdamW: False
compiling the model... (takes a ~minute)
step 0: train loss 10.7793, val loss 10.8267
iter 0: loss 10.7861, time 239706.54ms, mfu -100.00%
iter 10: loss 9.3477, time 1219.15ms, mfu 0.80%
iter 20: loss 8.3337, time 1214.03ms, mfu 0.80%
iter 30: loss 7.1077, time 1220.69ms, mfu 0.80%
iter 40: loss 6.0742, time 1236.03ms, mfu 0.80%
iter 50: loss 5.5654, time 1215.99ms, mfu 0.80%
iter 60: loss 5.2489, time 1214.73ms, mfu 0.80%
iter 70: loss 5.0096, time 1218.46ms, mfu 0.80%
iter 80: loss 4.5490, time 1212.02ms, mfu 0.80%
iter 90: loss 4.6600, time 1216.09ms, mfu 0.80%
iter 100: loss 4.1386, time 1219.69ms, mfu 0.80%
iter 110: loss 4.2563, time 1238.24ms, mfu 0.80%
iter 120: loss 3.9277, time 1214.56ms, mfu 0.80%
iter 130: loss 3.6831, time 1217.92ms, mfu 0.80%
iter 140: loss 3.8996, time 1292.02ms, mfu 0.80%
iter 150: loss 3.7387, time 1268.35ms, mfu 0.80%
iter 160: loss 3.8899, time 1212.16ms, mfu 0.80%
iter 170: loss 3.6169, time 1218.79ms, mfu 0.80%
iter 180: loss 3.5780, time 1215.62ms, mfu 0.80%
iter 190: loss 3.2543, time 1212.00ms, mfu 0.80%
iter 200: loss 3.3011, time 1213.94ms, mfu 0.80%
iter 210: loss 3.1616, time 1215.83ms, mfu 0.80%
iter 220: loss 3.1943, time 1230.21ms, mfu 0.80%
iter 230: loss 3.2653, time 1216.22ms, mfu 0.80%
iter 240: loss 3.0381, time 1212.66ms, mfu 0.80%
step 250: train loss 2.9494, val loss 4.0694
saving checkpoint to out-python-code
iter 250: loss 3.1967, time 239393.29ms, mfu 0.72%
iter 260: loss 3.1204, time 1216.84ms, mfu 0.73%
iter 270: loss 2.8988, time 1213.49ms, mfu 0.74%
iter 280: loss 2.8906, time 1216.71ms, mfu 0.74%
iter 290: loss 2.8849, time 1216.99ms, mfu 0.75%
iter 300: loss 2.8230, time 1217.22ms, mfu 0.76%
iter 310: loss 2.8042, time 1216.48ms, mfu 0.76%
iter 320: loss 2.7609, time 1217.15ms, mfu 0.77%
iter 330: loss 2.7662, time 1215.22ms, mfu 0.77%
iter 340: loss 2.6368, time 1219.27ms, mfu 0.77%
iter 350: loss 2.4914, time 1219.02ms, mfu 0.78%
iter 360: loss 2.7829, time 1218.14ms, mfu 0.78%
iter 370: loss 2.5899, time 1215.17ms, mfu 0.78%
iter 380: loss 2.4796, time 1213.71ms, mfu 0.78%
iter 390: loss 2.6549, time 1214.28ms, mfu 0.79%
iter 400: loss 2.6193, time 1220.17ms, mfu 0.79%
iter 410: loss 2.4214, time 1221.40ms, mfu 0.79%
iter 420: loss 2.4250, time 1214.87ms, mfu 0.79%
iter 430: loss 2.3767, time 1216.54ms, mfu 0.79%
iter 440: loss 2.5230, time 1214.59ms, mfu 0.79%
iter 450: loss 2.3895, time 1220.67ms, mfu 0.79%
iter 460: loss 2.5208, time 1218.14ms, mfu 0.80%
iter 470: loss 2.5703, time 1222.22ms, mfu 0.80%
iter 480: loss 2.1805, time 1223.85ms, mfu 0.80%
iter 490: loss 2.3435, time 1234.36ms, mfu 0.80%
step 500: train loss 2.1799, val loss 3.4116
saving checkpoint to out-python-code
iter 500: loss 2.2573, time 239457.90ms, mfu 0.72%
iter 510: loss 2.1744, time 1216.50ms, mfu 0.73%
iter 520: loss 2.2640, time 1218.48ms, mfu 0.73%
iter 530: loss 2.2704, time 1221.80ms, mfu 0.74%
iter 540: loss 2.0361, time 1229.53ms, mfu 0.75%
iter 550: loss 2.0364, time 1242.02ms, mfu 0.75%
iter 560: loss 2.2011, time 1246.78ms, mfu 0.75%
iter 570: loss 2.2071, time 1241.98ms, mfu 0.76%
iter 580: loss 2.3284, time 1243.93ms, mfu 0.76%
iter 590: loss 2.1177, time 1266.51ms, mfu 0.76%
iter 600: loss 2.0925, time 1244.68ms, mfu 0.76%
iter 610: loss 2.1158, time 1239.27ms, mfu 0.77%
iter 620: loss 2.1661, time 1239.73ms, mfu 0.77%
iter 630: loss 2.0842, time 1236.60ms, mfu 0.77%
iter 640: loss 2.1603, time 1248.84ms, mfu 0.77%
iter 650: loss 2.1734, time 1237.63ms, mfu 0.77%
iter 660: loss 2.0196, time 1233.84ms, mfu 0.78%
iter 670: loss 1.9922, time 1257.88ms, mfu 0.78%
iter 680: loss 2.1696, time 1234.99ms, mfu 0.78%
iter 690: loss 1.9557, time 1240.44ms, mfu 0.78%
iter 700: loss 2.2244, time 1233.64ms, mfu 0.78%
iter 710: loss 2.1266, time 1235.39ms, mfu 0.78%
iter 720: loss 2.2213, time 1255.72ms, mfu 0.78%
iter 730: loss 1.9278, time 1230.84ms, mfu 0.78%
iter 740: loss 1.9086, time 1246.30ms, mfu 0.78%
step 750: train loss 1.8361, val loss 3.1278
saving checkpoint to out-python-code
iter 750: loss 1.8374, time 239818.08ms, mfu 0.71%
iter 760: loss 1.9920, time 1251.96ms, mfu 0.71%
iter 770: loss 2.0670, time 1274.91ms, mfu 0.72%
iter 780: loss 1.9176, time 1257.00ms, mfu 0.73%
iter 790: loss 1.9943, time 1259.50ms, mfu 0.73%
iter 800: loss 1.6689, time 1293.99ms, mfu 0.73%
iter 810: loss 1.9164, time 1256.00ms, mfu 0.74%
iter 820: loss 2.0012, time 1250.59ms, mfu 0.74%
iter 830: loss 1.8521, time 1249.32ms, mfu 0.75%
iter 840: loss 1.9876, time 1246.19ms, mfu 0.75%
iter 850: loss 1.9623, time 1256.41ms, mfu 0.75%
iter 860: loss 1.9844, time 1244.64ms, mfu 0.76%
iter 870: loss 1.9386, time 1243.93ms, mfu 0.76%
iter 880: loss 1.8143, time 1273.17ms, mfu 0.76%
iter 890: loss 1.8397, time 1251.80ms, mfu 0.76%
iter 900: loss 1.8641, time 1256.71ms, mfu 0.76%
iter 910: loss 1.8658, time 1272.21ms, mfu 0.77%
iter 920: loss 1.8713, time 1252.74ms, mfu 0.77%
iter 930: loss 1.8428, time 1261.50ms, mfu 0.77%
iter 940: loss 1.8306, time 1257.28ms, mfu 0.77%
iter 950: loss 1.7205, time 1254.51ms, mfu 0.77%
iter 960: loss 1.8460, time 1285.73ms, mfu 0.77%
iter 970: loss 1.7749, time 1265.18ms, mfu 0.77%
iter 980: loss 1.7887, time 1258.12ms, mfu 0.77%
iter 990: loss 1.7690, time 1318.08ms, mfu 0.77%
step 1000: train loss 1.6178, val loss 2.9405
saving checkpoint to out-python-code
iter 1000: loss 1.7034, time 242701.74ms, mfu 0.69%
iter 1010: loss 1.7184, time 1239.53ms, mfu 0.70%
iter 1020: loss 1.7096, time 1235.13ms, mfu 0.71%
iter 1030: loss 1.6675, time 1237.48ms, mfu 0.72%
iter 1040: loss 1.7915, time 1242.72ms, mfu 0.73%
iter 1050: loss 1.8356, time 1241.95ms, mfu 0.73%
iter 1060: loss 1.8614, time 1239.01ms, mfu 0.74%
iter 1070: loss 1.6073, time 1265.90ms, mfu 0.74%
iter 1080: loss 1.7846, time 1248.40ms, mfu 0.75%
iter 1090: loss 1.6402, time 1253.67ms, mfu 0.75%
iter 1100: loss 1.7211, time 1253.39ms, mfu 0.75%
iter 1110: loss 1.6349, time 1247.26ms, mfu 0.76%
iter 1120: loss 1.8279, time 1256.70ms, mfu 0.76%
iter 1130: loss 1.6324, time 1247.82ms, mfu 0.76%
iter 1140: loss 1.8154, time 1246.73ms, mfu 0.76%
iter 1150: loss 1.5954, time 1271.82ms, mfu 0.76%
iter 1160: loss 1.4909, time 1249.22ms, mfu 0.77%
iter 1170: loss 1.6988, time 1247.70ms, mfu 0.77%
iter 1180: loss 1.6767, time 1260.03ms, mfu 0.77%
iter 1190: loss 1.6486, time 1248.72ms, mfu 0.77%
iter 1200: loss 1.6417, time 1255.72ms, mfu 0.77%
iter 1210: loss 1.7250, time 1243.36ms, mfu 0.77%
iter 1220: loss 1.7021, time 1246.26ms, mfu 0.77%
iter 1230: loss 1.6575, time 1265.17ms, mfu 0.77%
iter 1240: loss 1.6389, time 1250.24ms, mfu 0.78%
step 1250: train loss 1.4861, val loss 2.8334
saving checkpoint to out-python-code
iter 1250: loss 1.7860, time 241576.88ms, mfu 0.70%
iter 1260: loss 1.6458, time 1260.61ms, mfu 0.71%
iter 1270: loss 1.6340, time 1258.59ms, mfu 0.71%
iter 1280: loss 1.7156, time 1259.22ms, mfu 0.72%
iter 1290: loss 1.6839, time 1254.41ms, mfu 0.73%
iter 1300: loss 1.5904, time 1252.94ms, mfu 0.73%
iter 1310: loss 1.6217, time 1269.56ms, mfu 0.74%
iter 1320: loss 1.4422, time 1255.60ms, mfu 0.74%
iter 1330: loss 1.7610, time 1255.70ms, mfu 0.74%
iter 1340: loss 1.5132, time 1288.40ms, mfu 0.75%
iter 1350: loss 1.4839, time 1257.88ms, mfu 0.75%
iter 1360: loss 1.7530, time 1257.31ms, mfu 0.75%
iter 1370: loss 1.6221, time 1255.13ms, mfu 0.76%
iter 1380: loss 1.7157, time 1254.07ms, mfu 0.76%
iter 1390: loss 1.6965, time 1262.48ms, mfu 0.76%
iter 1400: loss 1.6296, time 1245.83ms, mfu 0.76%
iter 1410: loss 1.6251, time 1255.23ms, mfu 0.76%
iter 1420: loss 1.6171, time 1257.61ms, mfu 0.77%
iter 1430: loss 1.6011, time 1256.12ms, mfu 0.77%
iter 1440: loss 1.5585, time 1242.07ms, mfu 0.77%
iter 1450: loss 1.5842, time 1262.41ms, mfu 0.77%
iter 1460: loss 1.7108, time 1322.64ms, mfu 0.77%
iter 1470: loss 1.4064, time 1250.89ms, mfu 0.77%
iter 1480: loss 1.5025, time 1245.66ms, mfu 0.77%
iter 1490: loss 1.4959, time 1243.20ms, mfu 0.77%
step 1500: train loss 1.3712, val loss 2.7872
saving checkpoint to out-python-code
iter 1500: loss 1.5607, time 240791.08ms, mfu 0.70%
iter 1510: loss 1.5287, time 1237.81ms, mfu 0.71%
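Two numbers in the log above are worth reading more closely. The step 0 loss is close to what a model guessing uniformly over the vocabulary would score, and the gap between train and val loss keeps growing, which suggests the overfitting that the config comment anticipates. The sketch below only does arithmetic on values copied from the log; the helper names are hypothetical.

```python
import math


def uniform_loss(vocab_size):
    """Cross-entropy (in nats) of a model that spreads probability evenly over the vocabulary."""
    return math.log(vocab_size)


def perplexity(loss):
    """Convert a cross-entropy loss in nats to perplexity."""
    return math.exp(loss)


# Values copied from the training log above.
vocab_size = 50257
step0_val = 10.8267
step250_train, step250_val = 2.9494, 4.0694
step1500_train, step1500_val = 1.3712, 2.7872

print(f"uniform-guess loss: {uniform_loss(vocab_size):.4f} (logged at step 0: {step0_val})")
print(f"train perplexity at step 1500: {perplexity(step1500_train):.1f}")
print(f"val perplexity at step 1500:   {perplexity(step1500_val):.1f}")
print(f"val - train gap at step 250:  {step250_val - step250_train:.4f}")
print(f"val - train gap at step 1500: {step1500_val - step1500_train:.4f}")
```

Val loss is still falling at step 1500, so training longer may still help, but the widening gap is a sign that the model is memorizing the training split faster than it generalizes.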

Test inference

Since the goal is to write Python functions, give it a prompt:

python3 sample_word.py --out_dir=out-python-code --start="Write a Python function to sort 10 numbers."          

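The prompt asks for a Python function that sorts 10 numbers.
For layout reasons, the closing ``` of each code block in the output below has been changed to ``.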

Overriding: out_dir = out-python-code
Overriding: start = Write a Python function to sort 10 numbers.
number of parameters: 29.92M
Loading meta from data/python_code/meta.pkl...
Using GPT-2 tokenizer for encoding/decoding...
Write a Python function to sort 10 numbers.
int: 1
int: 5 Let's code it out! Show me what you've got! ```python
def sort_integers(int):
   for i in range(len(int)):
       for j in range(i + 1, len(int(int(int) - 1):
          min_idx = j
    return min_idx

int_values = [5, 3, 2, 6, 7, 7, 8, 9]
print(sort_integers(int_values))
``

Develop a Python program to calculate the sum of a given list of numbers. The sum of numbers should return the sum of all the numbers in the given list. [1, 2, 4] Yes, I can handle this! ```python
def sum_of_numbers(list):
    sum = 0
   for number in list:
         sum += number
   return sum

print(sum_of_numbers(list)) # [1, 2, 3, 4, 5]
``

Write a Python program which takes two arguments and returns the number of its elements. A = [1, 2, 3, 4]
B = [1, 2, 3] Surely, I can assist with that! ```python
a = [1, 2, 3]
b = [2, 3, 3]
print(a + b) # Output: 0
``

Create a Python program which takes a sentence as an argument and returns a list with the occurrences of the words in the sentence Sentence: He is a great sentence. It's great! ```python
def count_words(sentence):
 count = 0
 for word in sentence:
 if word[0] == "aeiou":
 count += 1
 else:
 count += 1
 else:
 count += 1

print(count_words(sentence))
``

Design a function in Python to convert a given string to a given integer in the string. string = "Hello World!" Ready for liftoff! 3, 2, 1! ```python
def convert_to_int(string):
 
---------------
Write a Python function to sort 10 numbers. [2, 3, 5, 7, 2] Sure, here you go! ```python
def sort_list(lst): 
    return sorted(lst, key=lambda x: x[1])
``

Find the minimum value in an array in Python. arr = [3, 5, 5, 6, 8, 9] Yes, I'm equipped to handle this! ```python
def max_value(arr):
   max_value = arr[0]
   max_value = 0
   for i in range(1, len(arr)):
       current_value = arr[i]
        if current_value in arr[i]:
             max_value = current_value
           max_value = current_value
    return max_value

arr = [2, 3, 5, 7, 8, 10]
print(get_value(arr)) # [1, 4, 5]
``

Write a python program to find a list of numbers that are in the list. [1, 2, 3, 4] Absolutely, let's do this! ```python
def find_numbers(list): 
    # get the sum of the list 
    sum = 0
       for num in list: 
             sum = 0
         # loop store the second sum 
           for i in range(len(list)): 
                sum += num
    return sum

list = [1, 2, 3, 4]
print(find_numbers(list))
``

Create a Python program to parse JSON string in Python. {
     "name": "John',
    "age": 10 
   },
      "age": 30,
    "age": 30,
    "location": "B"
---------------
Write a Python function to sort 10 numbers. The program should return the sorted list in ascending order [3, 5, 3, 2, 5, 1] You'll have it in a jiffy! Fast and efficient! ```python
def sort_list(numbers):
   result = []
    for i in range(len(numbers)):
       if numbers[i] > numbers[j]:
           result.append(numbers[i:])
    return result
 
numbers = [5, 7, 4, 1, 3]
result = sort_list(numbers)
print(result)
``

Create a class in Python to represent a circle with radius. Absolutely, let's do this! ```python
class Circle:
    def __init__(self, radius, radius):
       self.radius = radius
       self.radius = radius
       self.radius = radius
        self.radius = radius
       self.radius = radius

    def circumference(self):
       return self.radius * self.radius
``

Generate a function in Python to calculate the total cost of a circle given its cost = 4 - 5 Let's code it out! Show me what you've got! ```python
def calculate_cost_cost(radius):
 total = 0.5
 for i in range(radius):
 total += i
 return total

circle_cost = calculate_cost([1, 2, 3, 4])
print(circle_cost)
``

Create a function in Python that takes in two arguments and returns the sum of all the odd numbers between the two given lists. List1 = [2, 3, 4]
list2 = [3, 5, 6] Sure, here you go! ```python
def sum_elements(list1, list2):
    # Sum of the two lists
    for i in range(len(list1)):
          # Sum of all the list
         
---------------
Write a Python function to sort 10 numbers. num = 20
for num in [5, 8, 8, 10, 11] Sure, here you go! ```python
def descending_greater_number(numbers):
   for i in range(len(numbers)-1):
        if numbers[i] > numbers[j+1]:
           numbers[i] = num
   return numbers

numbers = [7,7, 9, 23]
numbers = 11

print(reverse_number(numbers)) # [9, 11, 33, 22]
``

Create a Python program to find the largest number in a given list. The program should take an array as input and return the largest number in the array. [2, 5, 6, 7, 9] Let's get into the groove! We're in the zone! ```python
def find_largest_number(arr):
    max_number = 0
   for i in range(len(arr)):
       if arr[i] > max_number:
            max_number = i
     return max_number

arr = [3, 4, 5, 6, 2, 9]
largest_number = find_largest_number(arr)
print(largest_number)
``

Create a Python program to generate all the prime numbers between 1 and 100. Sure, let's kick things off! ```python
def generate_prime_numbers(start, end): 
   prime = [] 
   for i in range(start, end + 1): 
        if (i % i != 0): 
            prime.append(i) 
   return prime[i:]
``

Write a Python program to print all the prime numbers between 0 and 50 Let's get into the groove! We're in the zone! ```python
def printPrime(start, end):
    for i in range(start, end + 1): 
       ifEnd != 0: 
    
---------------
Write a Python function to sort 10 numbers. arr = [1, 5, 7, 1, 4, 2] Sure, I'm prepared for this! ```python
def sort_list(nums):
   return sorted(nums)
``

Create a Python program to sort a list of numbers in ascending order Charging forward! Unstoppable force! ```python
def sort_ascending(arr): 
   n = len(arr) 
  
    # sort list elements in ascending order 
   for i in range(len(arr)): 
         # Traverse through all array elements 
       for j in range(0, n-1): 
             for j in range(i-1,n-i-i-1): 
                    arr[j] = arr[j+1], arr[j+1] 
  
   # Swap if the element exists in ascending order 
      for j in range(n): 
             arr[j+1] = arr[j+1], arr[i] 
           i+=1 
    # unsorted array 
    return arr 

arr = [1, 2, 3] 
print(bubble_sort(arr))
``

Create a Python program to print the current pet and time Setting the wheels in motion! The journey begins! ```python
def get_cldt(date_class):
   d = {
           d: 0,
                  d: 1: 1,
             d: d: 3: 1
            d: d: 1
            d: 1
          d: 1: 1
     
---------------
Write a Python function to sort 10 numbers. [9, 6, 7, 9, 8, 11] Of course! ```python
def sort_numbers(numbers):
   n = len(numbers)
   for i in range(n):
        if numbers[i] == numbers[j]:
              return i
   return numbers
``

Write a Python program that can implement a simple arithmetic calculator. Yes, let's get this show on the road! ```python
def compute_square(a, b, c):
   return (a*c)
``

Write a Python script that implements linear regression problem example using Scikit-learn. The data should not explicitly predict housing prices and target value for the house X = [[1, 2], [1, 3], [4], [2, 7], [3, 5], [5, 7]]
Y = [1, 2, 3, 4, 6]
Y = [4, 6, 7]

print(" Linear regression price of the house is: ", z)
``

How to automate a custom neural network? AI Python Teacher... Did you know that in Python, you can use libraries like Scikit-learn to automate data? ```python
import tensorflow as tf
# Load the data and save the data
df = pd.read_csv('data.csv')
df['label'] = df['Label']
# Create the training and test features
df = df['label']
df['label'] = df['label'].table.astype(str.values())

# Create and predict it
clf = tf.keras.models.Sequential()
clf.fit(X, Y)
price = clf.predict(X)
# Make predictions
predictions = clf.predict(X)
``

How to automate social media analytics? AI Python Teacher... Did you know that in Python, you can use libraries like pandas for social media analytics? ```python
# Automating social media analytics using pandas.
import pandas as pd
df = pd.read_csv('source_data.csv')
df['wind_name'] = df['age
---------------
Write a Python function to sort 10 numbers. The program should sort an array of integers in ascending order [1, 2, 3, 5, 7, 9] Executing your request! This will be quick! ```python
def sort_by_5(arr):
   for i in range(1, len(arr)):
       for j in range(i + 1, len(arr)):
            if arr[j] > arr[j]:
                  j = j
              k = i + 1
  
    return k
``

Generate a Python program to find all prime integers between two given numbers 12, 98, 23, 21, 19, 29, 29, 29, 33, 29, 29, 22, 56, 31, 31, 31, 31, 31, 31, 31, 31, 35, 31] Consider it done! No task too big! ```python
def all_primes(a, b):
   # Initialize the prime numbers up to the sequence
   # Initialize the sequence
     n = len(a)
  
   # Iterate over all prime numbers
    for i in range(2,n):
 
      for j in range(a, int(a)):
          if a[i] == i:
               num = a[i]
    
     # Start to the first prime number between the first and the number
     # Set the first number by one to the next number of cases
       # Check if the number is divisible by one
      # If it is divisible by any number, then you see if it is divisible by any number to it to be a square of the number of numbers.
    # If no number is the second
     if num % 2 == 0:
            return False
    # Check if
---------------
Write a Python function to sort 10 numbers. 8. The program should be converted to lower case, starting from 9, inclusive, to the user-digit number. Preparing the launch! Countdown initiated! ```python
def sort_by_numbers(n):
   numbers.sort() 
   return numbers 

print(sort_by_numbers(123))  # [1, 4, 6, 7, 8, 10, 8]
``

Write a Python program to calculate the total cost of a given array. [2, 5]
[[1,3, 4], [5, 5], [5, 8]] Sure, here you go! ```python
def calculate_cost(arr):
   total = 0
   for num in arr:
      total += num
   return total
``

Write a Python program to print the numbers from 5 to 10. The program should print the numbers from 0 to 100. Just give me a sec! I'm getting ready! ```python
for x in range(10):
    if x % 2 == 0:
         print('* ', x)
``

Write an algorithm in Python to print the first nth Fibonacci series The program should take in the nth Fibonacci sequence. Yes, I'm ready to help! ```python
def fibonacci(n): 
    a = 0
    for i in range(1, n): 
        b = a + b 
      elif n == 0: 
         print('Incorrect input') 
     elif n == 0: 
        return a 
    elif n == 1: 
         return b 
     elif n == 2: 
       return a 
    else: 
         return b 
     else:  
         return b 
    
# Driver code  
number = 5
print(fibon
---------------
Write a Python function to sort 10 numbers. 


Construct a program to merge two lists by merge two lists list1 = [1, 2, 3]
list2 = [2, 4]
list4 = [1, 5, 6] Sure, I'm prepared for this! ```python
def merge_lists(list1, list2): 
   merged_list = [1, 2] 
   merged_list = [] 
   for i in range(len(list1)): 
        if list1[i] == list2[j]: 
          merged_list.append(list2[i]) 
     merged_list.append(list2[i]) 
      return merged_list
``

Create a Python application to print the following pattern. Yes, I was designed to handle this! ```python
# loop 
for i in range(1, 11):
    print(i * i)
``

Write a Python program to print the squares of the squares of two given numbers. Yes, let's get this underway! ```python
def print_squares(num1, num2):
   for i in range(num1,num2):
        squares.append(num2)
      
    for i in range(1,num_1, num_2):
       if is_squares:
                 squares.append(num)
             for j in range(num):
                 squares.append(digits))
   return squares

num1 = int(input("Enter first number: "))
print(squares(num1, num2)) # [1, 1, 2, 1]
``

Write a function in Python to convert the given text to a dictionary. x = "Hello"
y = "World" Yes, I'm equipped to handle this! ```python
from collections import Counter

def convert
---------------
Write a Python function to sort 10 numbers. The program should sort the ascending order of the list in ascending order [1, 5, 4, 8, 6, 4, 8] I'm on it, boss! Your wish is my command! ```python
def sort_list(list): 
    for i in range(len(list)): 
        for j in range(i+1, len(list)): 
             if list[i] > list[j]: 
                    list[j], list[j] = list[j] 
   return list 
 
list = [3, 4, 5, 7, 8]
print(sort_list(list)) # [2, 0]
``

Write a Python program to generate a dictionary of users in a random password. The program should take in a string and return the index of the password given string. The program should take two numbers and return the unique characters to return the unique characters. string1 = "This is a test string" Executing your request! This will be quick! ```python
import random 
  
def generate_password():
    chars = ''.join(random.choice(chars) for _ in range(5))
   return chars
``

Write a Python dictionary that takes in the list, and returns the number of unique values in the dictionary. {
 "name": "John",
 "age": 20,
 "grade": 30,
 "grade": "Male"
 } Certainly, let's get this party started! ```python
dict_dict = {
  "name": "John",
  "price": 8,
}
``

Write a Python function which takes a list of three numbers and returns the sum of the elements in the list. Yes, I can handle this! ```python
def average(list1, list2):
    count = 0
   for item in list1:
         if item in count:
           count += 1
   return count

list1 = [1, 2
---------------

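The results are completely unusable, but they show the model at an intermediate, parroting stage. For example, it has already learned:
to open a Markdown code block with ```python,
that a function starts with def,
that a loop is written as for i in range and its body is indented.
It even throws in the occasional “Let's code it out”.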
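One way to put a number on "unusable" is to count how many generated code blocks are even syntactically valid. The sketch below is a rough check, assuming samples are separated by the `---------------` line that the sampling script prints and that each block opens with ```python and closes with two or three backticks (this article shortened the closing fence to two). `syntax_report` and `BLOCK_RE` are hypothetical names. Passing `ast.parse` only means the syntax is valid; it says nothing about whether the code does what the prompt asked.

```python
import ast
import re

# An opening ```python fence, the code, then a closing fence of two or three backticks.
BLOCK_RE = re.compile(r"```python\n(.*?)\n`{2,3}(?!`)", re.DOTALL)


def syntax_report(sample_text, separator="---------------"):
    """Return (closed code blocks found, blocks that parse as valid Python)."""
    found = 0
    valid = 0
    # Split per sample so a block cut off at the end of one sample
    # cannot swallow the start of the next one.
    for sample in sample_text.split(separator):
        for code in BLOCK_RE.findall(sample):
            found += 1
            try:
                ast.parse(code)
                valid += 1
            except (SyntaxError, ValueError):  # IndentationError is a SyntaxError subclass
                pass
    return found, valid
```

Saving the sampling output to a file and passing its contents to `syntax_report` gives a simple metric to compare checkpoints, for example before and after more training steps.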
