Building a Robust Dutch NLP Symptom Checker: From Data to Deployment
In the realm of medical apps, providing users with accurate information about their symptoms is crucial. To address this need, I have developed an NLP Symptom Checker tailored for Dutch speakers. The tool lets users describe their medical concerns in free text and guides them to select a chief complaint. This article explains the process from data collection to model training, local deployment, and hosting on Google Cloud. Such an application can help direct patients to the correct specialist when secondary care is needed.
The data and code are publicly available on GitHub. The technologies used here are: Jupyter Notebook, scikit-learn, MultiLabelBinarizer, XGBoost Classifier, Flask API, Docker, Google Cloud Artifact Registry, and Google Cloud Run.
Understanding the Data
The foundation of the symptom checker lies in two key data files:
- user_inputs.csv: This file contains texts describing medical complaints, entered by users of the app. This will be our input dataframe.
- labels.csv: The corresponding chief complaints, annotated by a medical expert. There are 74 different chief complaints, such as ‘Cough’ and ‘Leg pain’. This will be our target dataframe.
Selecting one or more complaints based on users’ free text is not a trivial task. An API will be implemented that allows users to describe their medical situation in their own words, upon which the API suggests a list of the most suitable chief complaints. Since multiple chief complaints can be valid for a patient’s description, we treat the task as a multi-class, multi-label classification problem.
What is XGBoost?
XGBoost, short for Extreme Gradient Boosting, is an ensemble learning algorithm known for its efficiency, speed, and accuracy. It belongs to the family of gradient boosting algorithms, which sequentially combine weak learners into a strong predictive model. XGBoost has gained popularity for its ability to handle diverse data types, its interpretable feature importances, and its regularization techniques that help prevent overfitting.
XGBoost Multi-Class Multi-Label Classification
The XGBoost classifier is utilized to address the multi-class multi-label classification task. Given a user’s free-text input describing medical symptoms, the model predicts the most suitable chief complaints. Here’s how XGBoost fits into the workflow:
- Data Preparation: The model is trained on a dataset of user inputs and the corresponding expert-annotated chief complaints; all steps are in the Jupyter Notebook titled Training.ipynb. The input dataframe is preprocessed to extract relevant features from the text, which includes removing stopwords and numbers. The number of words per sentence is also inspected so that extremely long sentences can be limited. A sketch of these preprocessing and training steps follows the label-encoding example below.
The tags (labels) in the target dataframe are encoded in binary format so they can be used for training; MultiLabelBinarizer from scikit-learn is used for this. A new column is created in the target dataframe holding the text labels of the complaints marked with a ‘1’ in that row, as illustrated below:
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer()
df_target['labels'] = df_target.dot(df_target.columns+',').str[:-1].str.split(',') # In case of multiple complaints, these are joined by a comma
yt = mlb.fit_transform(df_target['labels']) # This is the target output of the model
Let’s take one row and see how its target column has been converted:
idx = 36
print('Model output: ', yt[idx]) # This is the output of the model
print('Output after transformation: ', mlb.inverse_transform(yt[idx].reshape(1,-1))) # This is how the output is then transformed to the user
# Calculate the number of classes in total
print('Classes:', len(mlb.classes_)) # Number of classes
Model output: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
Output after transformation: [('Hoofdpijn', 'Oogklachten of beschadigingen aan het oog')]
Classes: 74
Each input row now has a corresponding binary label vector, and the resulting binary matrix can be used to train a classifier model.
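The exact preprocessing and training code lives in Training.ipynb; the sketch below is only an illustration of how the pieces could fit together. The names df_input and clean_text, the TfidfVectorizer/OneVsRestClassifier setup, and the use of NLTK’s Dutch stopword list are assumptions for this sketch, not necessarily what the notebook uses:
import re
import pandas as pd
from nltk.corpus import stopwords                      # assumes the NLTK stopword corpus is downloaded
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from xgboost import XGBClassifier

dutch_stopwords = set(stopwords.words('dutch'))

def clean_text(df):
    """Lowercase the text, strip numbers, and drop Dutch stopwords (hypothetical implementation)."""
    def _clean(text):
        text = re.sub(r'\d+', ' ', text.lower())        # remove numbers
        tokens = re.findall(r'\b\w+\b', text)            # naive tokenization
        return ' '.join(t for t in tokens if t not in dutch_stopwords)
    df = df.copy()
    df['text_cleaned'] = df['text'].apply(_clean)
    return df

# Load the user inputs (assumes the free text sits in a 'text' column)
df_input = pd.read_csv('user_inputs.csv')
df_input = clean_text(df_input)

# Turn the cleaned texts into TF-IDF features
vectorizer = TfidfVectorizer(max_features=5000)
X = vectorizer.fit_transform(df_input['text_cleaned'])

# yt is the binary label matrix produced by MultiLabelBinarizer above
X_train, X_test, y_train, y_test = train_test_split(X, yt, test_size=0.2, random_state=42)

# One binary XGBoost model per chief complaint, wrapped as a single multi-label classifier
clf = OneVsRestClassifier(XGBClassifier(n_estimators=200, max_depth=6, learning_rate=0.1))
clf.fit(X_train, y_train)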
- Model Performance: The accuracy score on a separate test set is around 0.3, meaning the model predicts the exact label combination less than 30% of the time. However, accuracy is a weak metric for multi-label evaluation: a prediction only counts as correct if every label of a complaint is predicted exactly, otherwise it is considered wrong. To mitigate this, we evaluate individual label predictions rather than whole label combinations, using the Hamming Loss metric. Hamming Loss is the fraction of wrongly predicted labels over the total number of labels. Because it is a loss function, lower is better (0 means no wrong predictions, 1 means every prediction is wrong). The multilabel classifier achieves a Hamming Loss of 0.01, meaning each individual label prediction is wrong about 1% of the time. Both metrics can be computed with scikit-learn, as shown in the snippet after this list.
- Model Deployment: Once trained, the model is saved along with its artifacts in the artifacts folder. The app.py file creates an API using Flask, enabling the deployment of the model for real-time predictions.
- Prediction: Users can input their medical concerns through the API, and the XGBoost model provides a list of possible chief complaints based on the text.
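As a reference for how the two evaluation metrics above can be computed, here is a minimal scikit-learn snippet; it reuses the clf, X_test, and y_test names from the training sketch above:
from sklearn.metrics import accuracy_score, hamming_loss

y_pred = clf.predict(X_test)

# Subset accuracy: a row only counts as correct if every label matches exactly
print('Exact-match accuracy:', accuracy_score(y_test, y_pred))

# Hamming loss: fraction of individual label predictions that are wrong
print('Hamming loss:', hamming_loss(y_test, y_pred))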
With the XGBoost model trained in Training.ipynb and all of its artifacts saved in the artifacts folder, the remaining step is serving predictions: the accompanying app.py file wraps the model in a Flask API. A small sketch of how such artifacts can be saved and reloaded follows.
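The repository stores these artifacts under its own file names; purely as an illustration, the fitted objects could be persisted with joblib and reloaded at API startup (the file names below are assumptions):
import joblib

# In Training.ipynb: persist the fitted objects
joblib.dump(clf, 'artifacts/clf.joblib')
joblib.dump(vectorizer, 'artifacts/vectorizer.joblib')
joblib.dump(mlb, 'artifacts/mlb.joblib')

# In app.py: load them back at startup
clf_loaded = joblib.load('artifacts/clf.joblib')
vectorizer = joblib.load('artifacts/vectorizer.joblib')
mlb = joblib.load('artifacts/mlb.joblib')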
API implementation using Flask
The full API implementation is in the app.py file. The core part is shown below:
# app, api, clean_text and the loaded artifacts (clf_loaded, vectorizer, mlb) are defined earlier in app.py
class Preds(Resource):
    def put(self):
        json_ = request.json
        # If there are multiple records to be predicted, directly convert the request json into a pandas dataframe
        if isinstance(json_['text'], list):
            entry = pd.DataFrame(json_)
        # In the case of a single record, wrap the json request data in a list and then convert it to a pandas dataframe
        else:
            entry = pd.DataFrame([json_])
        # Transform the request record(s) using the preprocessing pipeline
        entry = clean_text(entry)
        entry = entry.drop(columns=['text'])
        entry_transformed = vectorizer.transform(entry['text_cleaned'])
        # Make predictions using the transformed data
        prediction = clf_loaded.predict(entry_transformed)
        res = {'predictions': {}}
        # Create the response: map each prediction row back to its label names
        for i in range(len(prediction)):
            res['predictions'][i + 1] = mlb.inverse_transform(prediction[i].reshape(1, -1))
        return res, 200  # Send the response object

api.add_resource(Preds, '/predict')

if __name__ == "__main__":
    app.run(debug=True, host='0.0.0.0', port=5050)
Explanation of the above code:
- A class called Preds is created with a put method that processes the user’s input text and predicts the chief complaints.
- Finally, a server is created on localhost at port 5050 using
app.run(debug=True, host='0.0.0.0', port=5050)
Run and test the app.py file from the terminal using the following command to start the Flask app:
python app.py
The requirements.txt file contains all package-related information used in the project. Create the requirements.txt file by executing the following command:
pip freeze > requirements.txt
Local Deployment with Docker
A Dockerfile is a text file that contains the instructions to build a Docker image. Create a Dockerfile in the current directory and add the following code to it:
# Use an official Python runtime as the base image
FROM python:3.9-slim as build
# Set the working directory in the container to /app
WORKDIR /app
# Copy the current directory (our Flask app) into the container at /app
COPY . /app
# Install Flask and other dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Make port 5050 available for the app
EXPOSE 5050
# Run the command to start the Flask app
CMD ["python", "app.py"]
- The Dockerfile starts with the FROM command, which sets the base image as shown.
- The RUN command executes commands in the container; it is useful for system updates, pip upgrades, and package installation.
- The WORKDIR command creates a working directory named app for the container.
- The COPY command transfers files and folders into the container’s file system.
- The CMD command runs python app.py in the container.
To run the application locally, the project provides a Docker container. By executing the following commands, users can build and launch the container:
docker build -t flask-predict-api .
docker run -d -p 5050:5050 flask-predict-api
flask-predict-api is the image name (you can change it if required).
- To list the docker images, use the following command:
docker images
- To list the containers and their status, use the following command:
docker ps -a
Now you can test the API using a browser, the command terminal, or Postman.
Testing the API Locally
The Flask API accepts a JSON object with the parameter text (a single string or a list of strings) and returns, for each input, a list of chief-complaint labels. Here's an example with two input complaints:
curl -XPUT -H "Content-type: application/json" -d '{"text": ["...first complaint...", "...second complaint..."]}' 'http://127.0.0.1:5050/predict'
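If you prefer Python over curl, the same request can be sent with the requests library. This is just a convenience sketch; the complaint strings are placeholders:
import requests

payload = {"text": ["...first complaint...", "...second complaint..."]}

# PUT request to the locally running Flask API
response = requests.put("http://127.0.0.1:5050/predict", json=payload)
print(response.status_code)   # 200
print(response.json())        # {'predictions': {'1': [...], '2': [...]}}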
Taking It to the Cloud
For a public API deployment on Google Cloud, the process involves building the Docker image and pushing it to Google Cloud Artifact Registry. A Cloud Run service is then started, allowing anyone to access the API. The following commands showcase this procedure:
gcloud builds submit --region=us-west2 --tag us-west2-docker.pkg.dev/{your-space}/flaskapi-docker-repo/flask-predict-api:latest
gcloud run deploy --image us-west2-docker.pkg.dev/{your-space}/flaskapi-docker-repo/flask-predict-api:latest --allow-unauthenticated
Don’t forget to replace {your-space} with your own Google Cloud project ID.
Now, the API is accessible using a command similar to the following:
curl -XPUT -H "Content-type: application/json" -d '{"text": ["...complaint 1...", "...complaint 2..."]}' 'https://your-cloud-run-url/predict'
Here is an example of three complaints and the response from the API I am hosting on Google Cloud:
curl -XPUT -H "Content-type: application/json" -d '{"text": ["2 weken heb ik een kriebelhoest waar ik niet vanaf kom. soms ook hees. Wat kan ik doen? Heb al codeiene geprobeerd, Ventolin van mijn man. beide geven geen verlichting. mijn werk vraagt dat ik de hele dag kan praten. ik heb dan hoestbuien.", "benen", "Al enkele weken last van pijn bij mijn kaak , vanaf vorige week druk op mijn oor en hoofdpijn. Vanmorgen wakker geworden met pijnlijk dik oog en hoofdpijn. Oog is dunner maar niet minder pijnlijk"]}' 'https://flask-predict-api-arb6xi42dq-ew.a.run.app/predict'
Response:
{
"predictions": {
"1": [
[
"Hoesten",
"Stemklachten of heesheid"
]
],
"2": [
[
"Beenklachten"
]
],
"3": [
[
"Hoofdpijn",
"Oogklachten of beschadigingen aan het oog"
]
]
}
}
The first text translates into English as:
“I’ve had a tickly cough for 2 weeks that I can’t get rid of. sometimes also hoarse. What can I do? Have already tried codeine, my husband’s Ventolin. both provide no relief. My job requires me to talk all day long. I then have coughing fits.”
This text is classified as:
“Cough” ,“Voice complaints or hoarseness”
The second text is just one word:
“legs”
and is classified as:
“Leg complaints”
The third text is:
“I have been suffering from pain in my jaw for several weeks, since last week pressure on my ear and headache. Woke up this morning with a painful swollen eye and a headache. Eye is thinner but no less painful”
The two classes recommended in this case are:
“Headache”, “Eye complaints or damage to the eye”
Feel free to try this public API with your own complaints, as long as it is still being hosted on Google Cloud!
Future Enhancements
While the current implementation is robust, there’s always room for improvement. Some avenues for future development include:
- Model Improvement: Enhancing the model with better preprocessing and fine-tuning.
- Exploring LLMs: Trying out powerful pre-trained language models such as BERT, RobBERT, etc.
- Scaling Deployment: Moving towards Kubernetes for improved load balancing and manageability.
Feel free to experiment with different texts and see how this Dutch NLP Symptom Checker performs in various scenarios!