bert-as-service

2021-04-28

Taken from the GitHub repo of bert-as-service: https://github.com/hanxiao/bert-as-service

Installation

pip install bert-serving-server bert-serving-client

Getting Started

1. Download a Pre-trained BERT Model

Download a model listed below, then uncompress the zip file into some folder, say /tmp/english_L-12_H-768_A-12/
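If you would rather unpack the archive from Python than from a file manager, the step above can be sketched as follows. This is only a minimal sketch using the standard zipfile module; the function name extract_model and the archive path in the comment are hypothetical, not part of bert-as-service.

```python
import zipfile
from pathlib import Path


def extract_model(zip_path, dest_dir):
    """Unpack a downloaded model archive into dest_dir.

    Returns the member names stored in the archive, so you can check
    where the model files actually ended up.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the target folder if needed
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()


# Example call (hypothetical archive path, adjust to your download):
# extract_model("/tmp/model.zip", "/tmp/english_L-12_H-768_A-12/")
```

The returned names are worth a quick look: if the archive stores its files inside a top-level folder, the path passed to -model_dir later has to point at that inner folder.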

List of released pretrained BERT models:

BERT-Base, Uncased: 12-layer, 768-hidden, 12-heads, 110M parameters
BERT-Large, Uncased: 24-layer, 1024-hidden, 16-heads, 340M parameters
BERT-Base, Cased: 12-layer, 768-hidden, 12-heads, 110M parameters
BERT-Large, Cased: 24-layer, 1024-hidden, 16-heads, 340M parameters
BERT-Base, Multilingual Cased (New): 104 languages, 12-layer, 768-hidden, 12-heads, 110M parameters
BERT-Base, Multilingual Cased (Old): 102 languages, 12-layer, 768-hidden, 12-heads, 110M parameters
BERT-Base, Chinese: Chinese Simplified and Traditional, 12-layer, 768-hidden, 12-heads, 110M parameters

2. Start the BERT service

After installing the server, you should be able to use the bert-serving-start CLI as follows:

bert-serving-start -model_dir /tmp/english_L-12_H-768_A-12/ -num_worker=4
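If the server is started from a script rather than typed by hand, the command above can be assembled as an argument list. The sketch below is an assumption-laden helper, not part of the package: build_start_command is a hypothetical name, and it uses only the two flags shown above.

```python
def build_start_command(model_dir, num_worker=4):
    """Return the argv list for the bert-serving-start call shown above."""
    if num_worker < 1:
        raise ValueError("num_worker must be at least 1")
    return [
        "bert-serving-start",
        "-model_dir", str(model_dir),
        f"-num_worker={num_worker}",  # same spelling as the CLI example
    ]


cmd = build_start_command("/tmp/english_L-12_H-768_A-12/", num_worker=4)
print(" ".join(cmd))
# To launch the server from Python, this list could be passed to
# subprocess.Popen(cmd); that step is left out so the sketch runs anywhere.
```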

3. Use the Client to Get Sentence Encodings

Now you can encode sentences simply as follows:

from bert_serving.client import BertClient
bc = BertClient()
bc.encode(['First do it', 'then do it right', 'then do it better'])

It returns an ndarray (or a List[List[float]] if you prefer), in which each row is a fixed-length vector representing one sentence. Have thousands of sentences? Just encode them all in one call. There is no need to batch them yourself, because the server takes care of it.
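Because every row has the same length, rows can be compared directly. The following is a small sketch of that idea using cosine similarity in plain Python. The helper names are hypothetical, and the three short vectors are stand-ins so the code runs without a server; real rows would come from bc.encode([...]).

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as unrelated to everything
    return dot / (norm_u * norm_v)


def most_similar(query_vec, vectors):
    """Index of the row in vectors that is closest to query_vec."""
    scores = [cosine_similarity(query_vec, row) for row in vectors]
    return max(range(len(scores)), key=scores.__getitem__)


# Stand-in rows (assumption): in practice these would be the sentence vectors.
rows = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
best = most_similar([0.0, 0.9, 0.1], rows)
```

An ndarray row can be passed to these helpers as well, since they only iterate over the values.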

BERT also handles sentence pairs: to encode a pair, concatenate the two sentences with ||| (with whitespace before and after), e.g.

bc.encode(['First do it ||| then do it right'])
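When the two sentences live in separate variables, the separator is easy to get wrong. A tiny helper along these lines (make_pair is a hypothetical name, not a client method) keeps the whitespace around ||| consistent:

```python
def make_pair(first, second):
    """Join two sentences with ' ||| ', the pair separator described above."""
    # Strip stray outer whitespace so exactly one space surrounds the separator.
    return f"{first.strip()} ||| {second.strip()}"


pair = make_pair("First do it", "then do it right")
# pair can then be sent as a single item: bc.encode([pair])
```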
Hanfang Lyu | Pearl Slowly Updating...