transformer_replication

Replicating the Transformer

day1: Set up the environment & download the dataset

conda create --name transformer python=3.8 -y
conda activate transformer
pip install torch torchvision torchaudio
pip install datasets

If the pip install fails, install from conda-forge instead:

conda install -c conda-forge datasets
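
Before moving on, it can help to confirm that the installs above actually landed in the active environment. The following is a minimal sketch using only the standard library; the helper name `missing_packages` and the `REQUIRED` tuple are made up for illustration and are not part of any of the installed packages.

```python
import importlib.util

# Import names of the packages installed in the steps above.
REQUIRED = ("torch", "torchvision", "torchaudio", "datasets")


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found in the current environment."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("all packages found")
```

Run it inside the activated `transformer` environment; any name it reports as missing is the one to reinstall.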

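Then install huggingface_hub and set the HF_ENDPOINT environment variable so that downloads go through a mirror in mainland China: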

pip install -U huggingface_hub
export HF_ENDPOINT=https://hf-mirror.com
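
Note that `export` only affects the current shell session. As an alternative, the variable can be set from Python itself. The sketch below assumes (as far as I can tell from how huggingface_hub behaves) that the endpoint is read when the library is imported, so the variable has to be set before `datasets` or `huggingface_hub` is imported. The helper name `use_mirror` is hypothetical.

```python
import os

MIRROR = "https://hf-mirror.com"  # same endpoint as in the export command


def use_mirror(endpoint=MIRROR):
    """Set HF_ENDPOINT for this process unless it is already set.

    Returns the value that is in effect afterwards.
    """
    return os.environ.setdefault("HF_ENDPOINT", endpoint)


use_mirror()
# Import datasets / huggingface_hub only after this point.
```

Using `setdefault` keeps any value already exported in the shell, so the script does not silently override a different endpoint.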

Dataset: https://huggingface.co/datasets/wmt/wmt14

Run the following code to download it:

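from datasets import load_dataset

# Download (or load from the local cache) the German-English config of WMT14
ds = load_dataset("wmt/wmt14", "de-en")

# Show the available splits and their sizes
print(ds)

# Show the first five training examples
print(ds['train'][0:5])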
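
Slicing a split returns a dict of columns rather than a list of rows. Assuming the layout shown on the dataset page (a single `translation` column whose entries are dicts keyed by `de` and `en`), a small helper can turn such a slice into parallel sentence lists. This is only a sketch: `split_pairs` is a made-up name, and the sample below is hand-written, not real WMT14 data.

```python
def split_pairs(batch, src="de", tgt="en"):
    """Turn a sliced batch into parallel lists of source and target sentences."""
    pairs = batch["translation"]
    return [p[src] for p in pairs], [p[tgt] for p in pairs]


# Hand-written stand-in for ds['train'][0:2]; not real WMT14 rows.
sample = {
    "translation": [
        {"de": "Guten Morgen.", "en": "Good morning."},
        {"de": "Danke.", "en": "Thank you."},
    ]
}
de_sentences, en_sentences = split_pairs(sample)
```

With the real dataset loaded, the call would be `split_pairs(ds['train'][0:5])`.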

The dataset is downloaded to:

~/.cache/huggingface/datasets/wmt___wmt14/
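
The path above is the default location. If the `HF_DATASETS_CACHE` or `HF_HOME` environment variable is set, the cache lives elsewhere. The sketch below is a simplified guess at that lookup order, not the library's actual code (for example, it ignores `XDG_CACHE_HOME`), and `datasets_cache_dir` is a hypothetical helper name.

```python
import os


def datasets_cache_dir(env=os.environ):
    """Best-effort guess at where the datasets library caches downloads."""
    # An explicit datasets cache location takes priority.
    if "HF_DATASETS_CACHE" in env:
        return os.path.expanduser(env["HF_DATASETS_CACHE"])
    # Otherwise fall back to <HF_HOME>/datasets, with HF_HOME defaulting
    # to ~/.cache/huggingface.
    hf_home = env.get("HF_HOME", os.path.join("~", ".cache", "huggingface"))
    return os.path.join(os.path.expanduser(hf_home), "datasets")


print(datasets_cache_dir())
```

Passing a plain dict instead of `os.environ` makes it easy to see what the function would return under a different configuration.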

Local configuration

day2

inference

hf-mirror (a mirror of Hugging Face for users in mainland China)