Notes on turning a natural language processing algorithm that I had been experimenting with locally into an API server on Heroku. (Whether Heroku is the right place for this is not considered here → API server placement considerations)
$ mkdir regroup-split-server
$ cd regroup-split-server
$ python3 -m venv venv
$ code .
$ source venv/bin/activate
$ mkdir server
$ code server/__init__.py
python
from flask import Flask

app = Flask(__name__)

def create_app():
    return app

@app.route('/')
def root():
    return "OK"
- `$ pip install --upgrade pip`
- `$ pip install flask`
- Set environment variables in a file
- `$ code .env`
.env
FLASK_APP=server
FLASK_ENV=development
- `$ pip install python-dotenv`
- `$ flask run`
- Verify that it runs without problems and that you get an OK when you open [http://127.0.0.1:5000/](http://127.0.0.1:5000/)
- `$ git init`
- `$ code .gitignore`
.gitignore
venv/
*.pyc
__pycache__/
- `$ git commit -m 'minimal Flask server'`
- In practice, I commit with Cmd+Enter in the Source Control tab of VS Code.
- Add [gunicorn](/en/gunicorn) and deploy.
- This also takes care of [HTTPS for Flask](/en/HTTPS%20for%20Flask). I include it in the minimal configuration because I don't think an HTTP-only API server is acceptable today.
- `$ pip install gunicorn`
- `$ pip freeze > requirements.txt`
- `$ code Procfile`
Procfile
web: gunicorn server:"create_app()"
- `$ heroku create regroup-split-server`
- `$ git commit -m "add gunicorn"`
- `$ git push --set-upstream heroku master`
- Build logs appear. Make sure there are no errors.
- `$ heroku open`
- This opens the deployed app in a browser. Make sure OK is displayed.
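The browser check can also be scripted. A minimal sketch using only the standard library: `is_ok` is a hypothetical helper name, and the URL in the usage comment assumes the default `herokuapp.com` address for the app name passed to `heroku create`.

```python
import urllib.request


def is_ok(url, timeout=10):
    """Return True if a GET to url answers 200 with the body "OK".

    Note: urlopen raises urllib.error.HTTPError for 4xx/5xx responses,
    so a failing server shows up as an exception rather than False.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = resp.read().decode("utf-8")
        return resp.status == 200 and body == "OK"


# Usage (needs network access, so it is left commented out):
# print(is_ok("https://regroup-split-server.herokuapp.com/"))
```

The same function works against the local development server by passing `http://127.0.0.1:5000/`.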
I expect that at some point I'll want to separate the server from the algorithm code, but until I have a clearer idea of how to split them, I'll keep them together.
For now I only keep them in separate folders so that splitting them later is easy.
$ mkdir server/regroup_split
Copy files that look necessary
deploy.sh
cp rich_tokenizer.py ../regroup-split-server/server/regroup_split/
cp regroup_split.py ../regroup-split-server/server/regroup_split/
cp TAIL_TOKENS_TO_REMOVE.txt ../regroup-split-server/server/regroup_split/
cp HEAD_TOKENS_TO_REMOVE.txt ../regroup-split-server/server/regroup_split/
cp test/simplelines1.txt ../regroup-split-server/server/regroup_split/test
cp test/regression_test.json ../regroup-split-server/server/regroup_split/test
- Run unit tests and check for errors.
- `ModuleNotFoundError: No module named 'MeCab'`
- `$ pip install mecab`
- Don't do this; see [mecab on heroku](/en/mecab%20on%20heroku).
- `$ pip install mecab-python3==0.996.5`
- If the unit test passes, call the test from server/__init__.py
- Run `flask run` to check that the test also works on the local development server
- It's easier to read error messages on the local development server than after deployment.
- Common fixes
- Use relative imports: `from .foo import bar`.
- I usually experiment by running the file as a script, but here it is imported by the server and run as a module, so the import behavior changes.
- Maybe it's better to routinely use [IPython with %run -m](/en/IPython%20with%20%25run%20-m).
- Paths to data files
- If you're writing in a way that depends on the current directory at runtime, you'll get into trouble here.
- Use `DIR = os.path.dirname(__file__)`.
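The data-file fix can be sketched as follows. `load_lines` is a hypothetical helper, and the one-token-per-line file format is an assumption about `TAIL_TOKENS_TO_REMOVE.txt`, not something these notes state.

```python
import os


def load_lines(module_file, filename):
    """Read non-empty lines from a file stored next to module_file.

    Pass __file__ as module_file so the path does not depend on the
    current working directory at runtime.
    """
    base_dir = os.path.dirname(os.path.abspath(module_file))
    path = os.path.join(base_dir, filename)
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f if line.strip()]


# Usage inside regroup_split.py (assumed file format: one token per line):
# TAIL_TOKENS = load_lines(__file__, "TAIL_TOKENS_TO_REMOVE.txt")
```

Because the path is built from the module's own location, the same code works whether it is started by `flask run`, by gunicorn, or from a test runner in another directory.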
Push to heroku when it works locally
$ pip freeze > requirements.txt
$ git push
$ heroku logs --tail
TypeError: 'dict_keys' object is not reversible
By default, newly created Python apps use the python-3.6.12 runtime. --- Heroku Python Support | Heroku Dev Center
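This `TypeError` is what you get when `reversed()` is applied to a dict view on an older interpreter: dict views only became reversible in Python 3.8, and the default runtime quoted above is 3.6. Besides moving to a newer runtime, the code itself could be made version-independent by materializing the keys into a list first. A minimal sketch, assuming the failing code did something like `reversed(d.keys())` (the actual call site is not shown in these notes, and `reversed_keys` is a hypothetical helper):

```python
def reversed_keys(d):
    """Return the keys of d in reverse insertion order.

    reversed(d.keys()) needs Python 3.8+; converting the keys to a
    list first also works on older versions.
    """
    return list(reversed(list(d.keys())))


print(reversed_keys({"a": 1, "b": 2, "c": 3}))  # → ['c', 'b', 'a']
```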
$ echo python-3.8.7 > runtime.txt
Add an interface to the experimental script. Until now it ran in the terminal and I observed its results on standard output; now it needs to take a value passed from the server, process it, and return the result.
python
# In regroup_split.py
def process_single_line(line):
    tokens = tokenize(line)
    calc_split_priority(tokens)
    return dict(
        tokens=concat_tokens(tokens, " "),
        split=[concat_tokens(ts) for ts in split(tokens)])

# In the Flask app; request must be imported to read the query string
from flask import request

@app.route('/api/', methods=['GET'])
def api():
    text = request.args["q"]
    ret = regroup_split.process_single_line(text)
    return ret
- Check that it works by passing the text with GET: `/api/?q=...`
- The returned dict is automatically serialized to JSON
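When checking via GET, the text has to be URL-encoded, which matters for Japanese input or any text containing spaces. A small sketch with the standard library: `build_get_url` is a hypothetical helper, and the base URL is the local development server address used earlier.

```python
from urllib.parse import urlencode


def build_get_url(base_url, text):
    """Append text as a URL-encoded "q" query parameter."""
    return base_url + "?" + urlencode({"q": text})


print(build_get_url("http://127.0.0.1:5000/api/", "hello world"))
# → http://127.0.0.1:5000/api/?q=hello+world
```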
@app.route('/api/', methods=['GET', 'POST'])
def api():
    if request.method == "GET":
        text = request.args["q"]
    else:
        text = request.json["q"]
    ret = regroup_split.process_single_line(text)
    return ret
- `$ curl -X POST -H "Content-Type: application/json" -d '{"q":"test"}' localhost:5000/api/`
- Check that it works
- `git push` and make sure it works on Heroku as well
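The GET/POST branching can be pulled out into a plain function, which makes it testable without a running server. This is only a sketch: `extract_text` is a hypothetical name, and raising `ValueError` when `q` is missing is my addition, not behavior described in these notes.

```python
def extract_text(method, args, json_body):
    """Return the "q" value from query args (GET) or the JSON body (otherwise).

    args and json_body are any mapping-like objects (or None).
    """
    source = args if method == "GET" else json_body
    if not source or "q" not in source:
        raise ValueError('missing "q" parameter')
    return source["q"]


print(extract_text("GET", {"q": "test"}, None))   # → test
print(extract_text("POST", {}, {"q": "test"}))    # → test
```

Inside the Flask view this could be called as `extract_text(request.method, request.args, request.get_json(silent=True))`.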
import requests
import json
API_URL = "https://regroup-split-server.herokuapp.com/api/"
sample_text = "Ah, so people who are not used to the process of making lots of stickies and doing the KJ method don't have a good idea of how granular the information should be at the point of making the stickies in the first place. That's where the software needs to help."
payload = {"q": sample_text}
r = requests.post(API_URL, json=payload)
assert r.ok
for s in r.json()["split"]:
    print(s)
"""
Expected output:
Make lots of sticky notes.
People unfamiliar with the process of doing the KJ method.
How granular is the information at the point where you make a sticky note?
I can't pinpoint a good one.
Software needs to support
"""
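If you would rather not depend on `requests`, the same POST can be built with the standard library. This is a sketch only: `build_request` is a hypothetical helper, and the request is constructed but not sent here.

```python
import json
import urllib.request

API_URL = "https://regroup-split-server.herokuapp.com/api/"


def build_request(text, api_url=API_URL):
    """Build a JSON POST request carrying text under the "q" key."""
    data = json.dumps({"q": text}).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_request("test")
print(req.get_method(), req.full_url)

# To actually send it (needs network access):
# with urllib.request.urlopen(req) as resp:
#     for s in json.load(resp)["split"]:
#         print(s)
```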
This page is auto-translated from /nishio/Herokuで自然言語処理 using DeepL. If you find something interesting but the auto-translated English is not good enough to understand, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.