Create Your First Machine Learning Mobile Application

MPyK · The Web Tub · Jan 13, 2022

Image Courtesy of helpmykidlearn

In this article, we will cover how to train a machine learning (ML) model with Teachable Machine, incorporate the model into web and mobile applications, and finally create a simple rock-paper-scissors game to play against the computer. In particular, we will build an image classification model: we show our hand to the camera, and the model predicts whether the hand is “rock”, “paper”, or “scissors”.

What is Teachable Machine?

Teachable Machine is a web-based tool that lets us train ML models easily, without much prior knowledge of machine learning. The tool simplifies the whole process, from adding training data to training and exporting the model. You decide which “Classes” you want to train, add input with your webcam, and hit “Train”. The model then quickly learns the differences between the Classes you made. It is pretty easy!

Creating Project

First, let’s navigate to https://teachablemachine.withgoogle.com/ and click “Get Started”. As of this writing, Teachable Machine supports three types of classification — image, audio, and pose.

Available Projects

In this article, we will only explore the first type — image classification. So, click on “Image Project”. In the pop-up dialog, select “Standard image model”. This model runs in the browser or on your mobile phone with fair performance. If you want to run on a smaller device with limited capabilities, try the “Embedded image model” instead.

Image models

Classes

Next, let’s create the classes, or categories, that we need the model to classify. Since we are building an ML model for the “Rock-Paper-Scissors” game, we will create four classes, as follows:

  • Nothing: To detect when there is no hand gesture displayed.
  • Rock: To detect when we show a “rock” sign.
  • Paper: To detect when we show a “paper” sign.
  • Scissors: To detect when we show a “scissors” sign.
Create Classes

Adding training data

Now, it’s time to collect the training data. Start by clicking the “Webcam” button, then click and hold the “Hold to Record” button to capture a series of pictures as you show them to the webcam.

Below, I added some sample images for each class.

“Nothing” class
“Rock” class
“Paper” class
“Scissors” class

What you see above is just a single example of each class. You need to add many more varied images so the model can learn from as many different examples as possible. Here are some ideas:

  • “Nothing” class: add more images of yourself doing nothing, or showing hand gestures other than “Rock”, “Paper”, or “Scissors”.
  • “Rock” class: add images of both your left and right hand, at different distances from the camera, against different backgrounds, etc.
  • Do the same for the “Paper” and “Scissors” classes. Remember, the goal is to show the computer as many varied images as possible so that it can learn to generalize and do a good job when we put the model to work.

Try to add at least 100 images per class. Most of the time, the more the better. Refer to the official tutorial for more detail.

Train

Once you have added enough images, click “Train Model” to train your model. Training takes a few minutes, depending on the amount of data.

Testing Model

After training is done, a new “Preview” panel will appear next to it. You can then show your hand to the webcam, and your trained model will try its best to classify it. You can also add more images if you feel that the model is not good enough yet.

Add, Train, Test

Export Model

Once the model works as expected, save the project to your Google Drive or download it to your local PC. That way, if you want to add more images or classes in the future, you won’t have to redo everything. Click the icon next to “Teachable Machine” in the top left and choose “Save project to Drive” or “Download project as file”.

Next, click “Export Model” on the “Preview” panel → choose the “Tensorflow.js” tab → choose “Download” → and click “Download my model” to download the trained model to your PC for later use in the application.

Download trained model

Let’s code our app

Now that we have the model, let’s learn how to use it in an app and create the “rock-paper-scissors” game. We will be using Vue 3 and Framework7 to build this application. You can clone the minimal template from this GitHub repository and build it block by block as you follow this article, or clone the production-ready application from this GitHub repository. I recommend the latter: you can just sit back while I explain the main pieces of the program.

Application

Before we jump into the code, let’s first describe the application layout and functionality we are going to build. There are four main parts in the screenshot above. The first part is the live video feed from the camera; the model classifies the image we show to it. Next, we have the player’s hand sign (the “Hand”) and the computer’s hand sign (here, “Scissors”). The player’s sign is whatever our model classifies from the camera feed, while the computer’s sign is chosen at random. Then come two buttons: the first toggles between playing and pausing the game, and the second toggles whether the camera feed is displayed. Finally, the last part is the result section, which compares the player’s and the computer’s chosen hands.

And here is the code — “src/pages/home.vue”

Application layout

We create two elements on lines 13–18. The video element displays the live video from the PC’s webcam, and the image element displays pictures from the mobile phone’s camera. The rest of the code should be straightforward.
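Since the gist is only shown as a screenshot, here is a minimal sketch of what those two elements might look like in the Vue template (the ref names videoRef and imageRef follow the article; the attributes are illustrative):

```html
<!-- video: live webcam feed on the PC; image: frames from the phone camera -->
<video ref="videoRef" width="224" height="224" autoplay muted playsinline></video>
<img ref="imageRef" width="224" height="224" alt="camera frame" />
```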

Next, let’s explore the “script” tag.

Starter

We need to install two more dependencies — @teachablemachine/image and whatwg-fetch — on top of the dependencies for Vue and Framework7. The Teachable Machine library handles loading and running the model we trained with the Teachable Machine web application. The whatwg-fetch library fixes loading the model from the filesystem on the Android platform, and we use it on line 31.
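For reference, installing and importing the two libraries looks roughly like this (a sketch; the tmImage import style follows the library’s documented usage):

```javascript
// Installed alongside the Vue/Framework7 dependencies:
//   npm install @teachablemachine/image whatwg-fetch
// (@teachablemachine/image also expects @tensorflow/tfjs as a peer dependency)
import 'whatwg-fetch'; // polyfills fetch so model files can load from the filesystem on Android
import * as tmImage from '@teachablemachine/image';
```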

On line 36, we load the model from the directory “static/model/rock-paper-scissors”, so please unzip your downloaded model into that directory. There are three files — metadata.json, model.json, and weights.bin. metadata.json is simply the metadata of your model: it stores information such as the image size, labels, version, and timestamps. model.json and weights.bin are the important files; they represent the network structure, parameters, weights, and biases you have trained.
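As a sketch, loading the unzipped model could look like this (the directory follows the article; tmImage.load is the library’s documented API, while the surrounding names are illustrative):

```javascript
// Load the model and its metadata from the unzipped download.
const MODEL_DIR = 'static/model/rock-paper-scissors/';

let model;
async function loadModel() {
  model = await tmImage.load(MODEL_DIR + 'model.json', MODEL_DIR + 'metadata.json');
  console.log('Model loaded with', model.getTotalClasses(), 'classes'); // 4 in our case
}
```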

On line 50, we have the “play” function. If we are running on a mobile device, we call the “predictWithCanvasCamera” function, which reads images from the CanvasCamera plugin; if we are running in a web browser, we call the “predictWithUserMedia” function, which uses the MediaDevices web API.
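A minimal sketch of that branch might look like this (checking window.cordova is one common way to detect a Cordova build; the article’s actual check may differ):

```javascript
function play() {
  if (window.cordova) {
    predictWithCanvasCamera(); // mobile: frames come from the CanvasCamera plugin
  } else {
    predictWithUserMedia();    // browser: frames come from the MediaDevices web API
  }
}
```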

Predicting on web browser

UserMedia

On line 9, we create a function that prepares the settings and constraints for the UserMedia web API. We set the height and width to exactly 224 pixels, the same size as our model’s input images. In the success callback (line 21), we assign the webcam stream as the source of the video element.
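Here is a sketch of that setup using the standard MediaDevices API (the function and variable names are illustrative):

```javascript
// Request a 224x224 video stream, matching the model's input size,
// and attach it to the <video> element in the success path.
async function startWebcam(videoEl) {
  const constraints = {
    audio: false,
    video: { width: { exact: 224 }, height: { exact: 224 } },
  };
  const stream = await navigator.mediaDevices.getUserMedia(constraints);
  videoEl.srcObject = stream;
  await videoEl.play();
}
```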

On line 38, when the “play” button is clicked, we first toggle the icon to “pause” and then start two timers — predictionTimer and gamePlayTimer. The predictionTimer periodically calls the “classify” function with the video element (videoRef) as the input. The gamePlayTimer periodically changes the value of the computer’s hand. Once both hands are chosen, the “calculate” function is called to compare them and render the result; it is just a simple function that determines which hand wins. Otherwise, on line 45, if the “pause” button is clicked, we reset everything.
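As a rough sketch, the two timers could be wired up like this (the interval lengths, state variables, and helper names are assumptions for illustration):

```javascript
let predictionTimer = null;
let gamePlayTimer = null;
let playerHand = null;
let computerHand = null;

function startGame(videoEl) {
  // player's hand: ask the model what the camera currently sees
  predictionTimer = setInterval(() => classify(videoEl), 500);
  // computer's hand: pick a random sign, then score the round
  gamePlayTimer = setInterval(() => {
    const hands = ['rock', 'paper', 'scissors'];
    computerHand = hands[Math.floor(Math.random() * hands.length)];
    calculate(playerHand, computerHand);
  }, 2000);
}

function stopGame() {
  clearInterval(predictionTimer);
  clearInterval(gamePlayTimer);
}
```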

Random Computer Hand
Who is the winner
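The winner logic itself can be a small lookup table, since each hand beats exactly one other hand. A sketch, with illustrative names:

```javascript
// Which hand each hand defeats.
const BEATS = { rock: 'scissors', paper: 'rock', scissors: 'paper' };

function calculate(player, computer) {
  if (!player || !computer || player === 'nothing') return 'waiting';
  if (player === computer) return 'draw';
  return BEATS[player] === computer ? 'player wins' : 'computer wins';
}
```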

And here is the “classify” function. We call the “predict” method of the Teachable Machine model, which accepts an image or video element as its argument. The result is stored in the “prediction” variable as an array. We then loop over it to find the class with the highest probability (confidence value). Finally, on line 22, we assign and render the image for the player’s hand.

Prediction or Classification
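A sketch of what “classify” might look like (model.predict is the library’s documented API; the state handling around it is illustrative):

```javascript
async function classify(element) {
  // prediction is an array of { className, probability }, one entry per class
  const prediction = await model.predict(element);
  let best = prediction[0];
  for (const p of prediction) {
    if (p.probability > best.probability) best = p;
  }
  playerHand = best.className.toLowerCase(); // e.g. "rock"; the matching image is rendered elsewhere
}
```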

Predicting on mobile device

The UserMedia web API does not fully work on mobile devices (under the Cordova framework), so we use the Cordova Canvas Camera plugin instead. We also need to install the Cordova File plugin to display images from the mobile phone’s file system.
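For reference, adding the plugins and starting the capture might look roughly like this. Note that the plugin IDs and the CanvasCamera options/callback shape vary between versions, so treat all names below as assumptions and check the plugins’ READMEs:

```javascript
// Plugins (run from the project root):
//   cordova plugin add com.virtuoworks.cordova-plugin-canvascamera
//   cordova plugin add cordova-plugin-file
const onError = (err) => console.error(err);
const options = { width: 224, height: 224, use: 'file' }; // capture frames to disk
window.plugin.CanvasCamera.start(options, onError, (data) => {
  // the success callback returns a reference to the captured frame on disk,
  // which readImageFile turns into something the WebView can display
  readImageFile(data);
});
```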

Canvas Camera Plugin

On line 27, when the camera plugin captures a usable image, it invokes the “readImageFile” function to render it. With the latest Cordova versions, we can no longer display images directly from the filesystem, so this function handles the different file protocols on iOS/Android and converts the image to a blob before assigning it to the image element (imageRef).
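Here is a sketch of that blob conversion using the File plugin’s documented API (paths and error handling simplified):

```javascript
function readImageFile(filePath, imageEl) {
  // Resolve the native path, read the raw bytes, and hand the WebView
  // a blob URL it is allowed to display.
  window.resolveLocalFileSystemURL(filePath, (fileEntry) => {
    fileEntry.file((file) => {
      const reader = new FileReader();
      reader.onloadend = () => {
        const blob = new Blob([reader.result], { type: 'image/jpeg' });
        imageEl.src = URL.createObjectURL(blob); // assign to the image element (imageRef)
      };
      reader.readAsArrayBuffer(file);
    });
  });
}
```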

The “predictWithCanvasCamera” function, on line 57, is very similar to the “predictWithUserMedia” function. The main difference is that we pass the image element (imageRef) to the “classify” function instead of the video element (videoRef), as in the sketch below.
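A sketch of that parallel, with assumed helper names:

```javascript
function predictWithCanvasCamera(imageEl) {
  startCanvasCamera();                                         // begin the plugin capture loop (assumed helper)
  predictionTimer = setInterval(() => classify(imageEl), 500); // classify plugin frames instead of the video
}
```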

And finally, here are the rest of the functions — mostly the helper functions.

Helpers

Run on mobile device

If you would like to run this application on your phone, please sign up for an account with Monaca and follow this guide on how to build an Android or iOS application.

What is Monaca?

Cross-platform hybrid mobile app development platform and tools in the cloud

If you have everything in place, the application should work as shown below.

Demo

All the source code can be found here — https://github.com/yong-asial/rock-paper-scissors

Conclusion

In this article, we learned how to use Teachable Machine to train a custom machine learning model that classifies images, and how to incorporate it into a web application using the UserMedia web API and into a mobile application using the Cordova Canvas Camera plugin. This is a long tutorial, so if you find any mistakes, please let me know in the comment section or open a GitHub issue.

Happy coding!
