Recognize whether a specific person is at the door.
Facial recognition on iPhone/iPad/Surface
Can we untether the device?
Low power and compute consumption, no wiring necessary
No construction licensing/permit needed for deployment.
Example:
Deploy a TinyML device in an office to recognize when there are no people in the room and turn off the lights.
Smart glasses that can process the interesting visual cues that are coming in (catching rare items when shopping, noticing hard-to-detect road signs, …)
Challenges
Performance-related aspects:
Latency
Bandwidth
Capabilities:
Accuracy
Personalization
Data security and privacy
Resource constraints
Challenges
Bandwidth and Latency
In a cascading architecture, a TinyML device performs the initial detection of an interesting item, then offloads the subsequent, more compute-intensive task to the cloud only if an interesting item is detected.
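The cascade described above can be sketched as a simple two-stage gate. This is a minimal illustration, not a reference implementation: `cascade`, `toy_detector`, `toy_cloud_classifier`, and the `threshold` value are hypothetical names and values chosen for the example, and the "cloud" stage is just a local function standing in for a remote call.

```python
def cascade(frame, tiny_detector, cloud_classifier, threshold=0.5):
    """Run the cheap on-device check first; offload only when it fires."""
    score = tiny_detector(frame)        # cheap, always runs on the device
    if score < threshold:
        return None                     # nothing interesting: no data leaves the device
    return cloud_classifier(frame)      # expensive, runs only for interesting frames


# Stand-in models for illustration only (both hypothetical).
def toy_detector(frame):
    return 0.9 if frame.get("motion") else 0.1


def toy_cloud_classifier(frame):
    return frame.get("subject", "unknown")


print(cascade({"motion": False}, toy_detector, toy_cloud_classifier))
print(cascade({"motion": True, "subject": "cat"}, toy_detector, toy_cloud_classifier))
```

The point of the design is that bandwidth and cloud latency are only paid for the frames that pass the first stage.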
Example: Is there a cat knocking on my door?
How much data are we sending?
An input image for a neural network is typically around 224 by 224 pixels, sometimes 300 by 300.
Three channels (R, G, B) per pixel.
Each channel value requires 4 bytes for representation.
How long does it take?
Ping: 25 ms (the round-trip latency just to send something to the local gateway and get a response back)
Download speed: 35 megabits per second.
Upload speed: 4.62 megabits per second ~ 570 KB per second
Image size: 224 x 224 x 3 x 4 = 602,112 bytes ~ 602 KB
Uploading a single image takes about one second!
Actual performance
The cat could be gone!!!!
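The arithmetic above can be checked with a few lines of Python. This is a rough sketch using only the figures from these notes; the function names are made up for illustration, and adding the ping as a single fixed delay is a simplifying assumption (real uploads also pay protocol overhead).

```python
def image_bytes(width, height, channels=3, bytes_per_value=4):
    """Raw size of an uncompressed image tensor in bytes."""
    return width * height * channels * bytes_per_value


def upload_seconds(num_bytes, upload_mbps, ping_ms=0.0):
    """Estimated upload time: payload / bandwidth, plus one ping (assumption)."""
    bytes_per_second = upload_mbps * 1_000_000 / 8   # megabits/s -> bytes/s
    return num_bytes / bytes_per_second + ping_ms / 1000.0


size = image_bytes(224, 224)                          # 602112 bytes
seconds = upload_seconds(size, upload_mbps=4.62, ping_ms=25)
print(f"{size} bytes, about {seconds:.2f} s to upload")
```

Trying `image_bytes(300, 300)` shows how quickly the cost grows with resolution: the larger input is 1,080,000 bytes, which pushes the upload close to two seconds on the same link.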
Comparing to keyword spotting
KWS is at least two orders of magnitude smaller
A visual signal produces significantly more data than an audio signal.
Higher latency
Higher power consumption
Lower user satisfaction
Capability constraints
What if we don’t go to the cloud? (No more latency and bandwidth issues!!!)
“Each image is assigned a label 1 or 0. The label 1 is assigned as long as it has at least one bounding box corresponding to the object of interest (e.g. person) with the box area greater than 0.5% of the image area.”
Person: 1
Not-person: 0
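The quoted labeling rule can be written as a short function. This is a sketch under assumptions: the `(category, width, height)` box format and the name `vww_label` are hypothetical, chosen for the example rather than taken from any dataset tooling. Only the 0.5% threshold comes from the quote.

```python
def vww_label(boxes, image_width, image_height,
              target="person", min_area_fraction=0.005):
    """Return 1 if any target box covers more than 0.5% of the image, else 0.

    `boxes` is an iterable of (category, box_width, box_height) in pixels
    (a hypothetical format used only for this illustration).
    """
    image_area = image_width * image_height
    for category, box_w, box_h in boxes:
        if category == target and box_w * box_h > min_area_fraction * image_area:
            return 1
    return 0


# 640 x 480 image: 0.5% of the area is 1536 square pixels.
print(vww_label([("person", 50, 40)], 640, 480))   # box area 2000
print(vww_label([("person", 30, 30)], 640, 480))   # box area 900
print(vww_label([("dog", 200, 200)], 640, 480))    # large box, wrong category
```

The area threshold keeps tiny, barely visible people from being labeled as positives.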
Somewhat similar to KWS, but you don’t have to create datasets from scratch!!!
Powerful concept, as long as the data usage license is permissive!!!
Is that data set really going to meet the needs of your particular TinyML application?
Balanced
Relevant
Quality
Quantity
VWW Model
Recall: constraints and trade-offs
Recall: convolutions
Convolutions on gray-scale pictures
Convolutions on colored images
Depthwise
Input Feature Map: 8x8x3 (widthxheightxchannels)
Kernel: 3x3x3 (each channel uses 1 filter)
Final output: 6x6x1 tensor (with stride 1 and no padding, 8 - 3 + 1 = 6)
Math generalization
$D_F$ : dimension of a square input feature map
M: number of input channels
$D_K$ : dimension of filter matrix (square)
N: number of output channels
Total number of multiplications: $D_K^2 \cdot M \cdot D_F^2 \cdot N$ (a lot!!!)
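To get a feel for how large this count becomes, the formula can be evaluated directly. This is a small sketch; the function name and the second set of layer dimensions are illustrative assumptions, not values from these notes. The formula treats the output feature map as also $D_F \times D_F$, which holds for stride 1 with padding.

```python
def standard_conv_multiplications(d_k, m, d_f, n):
    """Multiplications in a standard convolution: D_K^2 * M * D_F^2 * N."""
    return d_k * d_k * m * d_f * d_f * n


# The small example from the notes: 3x3 kernel, 3 input channels,
# 8x8 feature map, 1 output channel.
print(standard_conv_multiplications(d_k=3, m=3, d_f=8, n=1))       # 1728

# A hypothetical mid-network layer: 3x3 kernel, 32 -> 64 channels, 112x112 map.
print(standard_conv_multiplications(d_k=3, m=32, d_f=112, n=64))   # 231211008
```

A single layer at realistic sizes already needs hundreds of millions of multiplications, which is why the cost of standard convolutions matters on constrained devices.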