Basic Tokenizer Usage
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-uncased")
# Split the text into WordPiece tokens
result = tokenizer.tokenize("I have a new GPU!")
print(result)
# Convert the text to token ids, adding the special [CLS]/[SEP] tokens
result = tokenizer.encode("I have a new GPU!")
print(result)
# Map the ids back to readable text
result = tokenizer.decode(result)
print(result)
Output:
['i', 'have', 'a', 'new', 'gp', '##u', '!']
[101, 1045, 2031, 1037, 2047, 14246, 2226, 999, 102]
[CLS] i have a new gpu! [SEP]
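Note that tokenize splits "GPU" into the WordPiece pieces "gp" and "##u", and encode wraps the ids in the special [CLS] (101) and [SEP] (102) tokens. In practice, calling the tokenizer object directly is the more common pattern; a minimal sketch, assuming the same tokenizer as above:

# Calling the tokenizer directly returns a dict with input_ids and an
# attention_mask instead of a bare id list.
encoded = tokenizer("I have a new GPU!", return_tensors="pt")
print(encoded["input_ids"])       # the same ids as encode() above, as a tensor
print(encoded["attention_mask"])  # all ones: a single sentence needs no padding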
Text Generation (Using the Pipeline API)
Model download 1: openai-community/gpt2 · Hugging Face
Model download 2: openai-community/gpt2 · HF Mirror
from transformers import pipeline, set_seed

# Build a text-generation pipeline backed by GPT-2
generator = pipeline("text-generation", model="gpt2")
set_seed(42)  # make the sampled outputs reproducible
result = generator(
    "Hello, I'm a language model,", max_length=30, num_return_sequences=5
)
print(result)
Output:
[{'generated_text': "Hello, I'm a language model, so that makes a lot of sense. (laughter) That's why it comes here, that's why I"}, {'generated_text': 'Hello, I\'m a language model, my brain is a model. Maybe I should stop pretending that I understand human language and start practicing it." And'}, {'generated_text': 'Hello, I\'m a language model, I\'m not a programmer. This is good, you understand.\n\n"But there is something that you'}, {'generated_text': "Hello, I'm a language model, and that's why I feel so strongly about this game.\n\nI think games don't make a lot"}, {'generated_text': 'Hello, I\'m a language model, it\'s called a macro. But I don\'t really understand them yet."\n\nWhile the program is very'}]
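The pipeline forwards extra keyword arguments to the underlying generate() call, so sampling can be tuned without leaving the pipeline API. A minimal sketch, assuming the same generator as above:

# max_new_tokens counts only newly generated tokens, avoiding the
# "max_length includes the prompt" pitfall.
result = generator(
    "Hello, I'm a language model,",
    max_new_tokens=30,
    do_sample=True,     # sample instead of greedy decoding
    temperature=0.7,    # sharpen the output distribution
    top_k=50,           # keep only the 50 most likely tokens at each step
    num_return_sequences=2,
)
print(result)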
Text Generation (Without the Pipeline API)
Model download 1: openai-community/gpt2 · Hugging Face
Model download 2: openai-community/gpt2 · HF Mirror
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

text = "long long ago, there was a"
# Tokenize the prompt and return PyTorch tensors
encoded_input = tokenizer(text, return_tensors="pt")
# Greedy decoding with the default (short) length limit
output_ids = model.generate(**encoded_input)
generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(generated_text)
Output:
long long ago, there was a time when the world was a little bit more peaceful.
"I'm sorry, but I
Zero-Shot Classification (Using the Pipeline API)
Model download 1: facebook/bart-large-mnli · Hugging Face
Model download 2: facebook/bart-large-mnli · HF Mirror
from transformers import pipeline

# The task is inferred from the model: this builds a zero-shot classification pipeline
classifier = pipeline(model="facebook/bart-large-mnli")
result = classifier(
    "I have a problem with my iphone that needs to be resolved asap!!",
    candidate_labels=["urgent", "not urgent", "phone", "tablet", "computer"],
)
print(result)
Output:
{'sequence': 'I have a problem with my iphone that needs to be resolved asap!!', 'labels': ['urgent', 'phone', 'computer', 'not urgent', 'tablet'], 'scores': [0.5036358833312988, 0.47879910469055176, 0.012600448913872242, 0.0026557832024991512, 0.002308770315721631]}
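By default the candidate labels are treated as mutually exclusive, so the scores sum to 1. When several labels can apply at once (here, "urgent" and "phone" both fit), multi_label=True scores each label independently; a sketch assuming the same classifier:

result = classifier(
    "I have a problem with my iphone that needs to be resolved asap!!",
    candidate_labels=["urgent", "not urgent", "phone", "tablet", "computer"],
    multi_label=True,  # each label gets an independent probability
)
print(result)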
Text Generation with an 8-bit Quantized Model (Using the Pipeline API)
Model download 1: facebook/opt-1.3b · Hugging Face
Model download 2: facebook/opt-1.3b · HF Mirror
Install the required packages:
pip install accelerate bitsandbytes
import torch
from transformers import pipeline

# Load OPT-1.3b in 8-bit via bitsandbytes; device_map="auto" places layers automatically
pipe = pipeline(
    model="facebook/opt-1.3b", device_map="auto", model_kwargs={"load_in_8bit": True}
)
output = pipe("This is a cool example!", do_sample=True, top_p=0.95)
print(output)
Output:
[{'generated_text': 'This is a cool example! The idea for this is a little far fetched but if you'}]
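Newer transformers releases prefer an explicit BitsAndBytesConfig over the bare load_in_8bit flag, which has since been deprecated there. A hedged sketch of the same pipeline written that way:

from transformers import pipeline, BitsAndBytesConfig

# Describe the quantization scheme explicitly instead of passing load_in_8bit
quant_config = BitsAndBytesConfig(load_in_8bit=True)
pipe = pipeline(
    "text-generation",
    model="facebook/opt-1.3b",
    device_map="auto",
    model_kwargs={"quantization_config": quant_config},
)
output = pipe("This is a cool example!", do_sample=True, top_p=0.95)
print(output)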