Creating something new is never easy, and NetsPresso was no exception. From the inception of the idea, through to the release of NetsPresso 1.0, we endured many challenges, setbacks and failures. Looking at it from the outside, it may have appeared to be a simple process of research and development, but reality rarely meets our expectations. To give people a proper insight into the process, I have summarized some of the ups and downs of our NetsPresso journey so far.
The First Step – Establishing Model Compression.
When I joined Nota in January 2020, work was under way to formalize the use of AutoML for Model Compression (AMC). The idea had originated from a 2019 paper titled "AMC: AutoML for Model Compression and Acceleration on Mobile Devices". Inspired by the paper, we sought to develop our own AMC. But what exactly does it involve? In general, model compression can be broken down into the following four sub-categories.
Pruning
Filter Decomposition
Knowledge Distillation
Quantization
Many in the industry saw AMC as a useful tool for making AI models lightweight enough to run on selected devices. Some said it would be easy to develop, while others argued that it would be more difficult. What they all agreed on, however, was that it would be a useful tool to have in our arsenal. With this agreement in mind, the journey to create NetsPresso began in earnest.
🤔 If you look at the early code, you will see the names of people who are now working on different teams. At that time, Nota was not divided into separate teams, and everyone participated in multiple concurrent projects. As a result, a fairly high percentage of the people who were at Nota then took part in the early stages of the project.
We named our product NetsPresso because we are compressing the network. Given an arbitrary model, it is first pruned and compressed, then retrained to restore its original performance. The model is then converted to suit the target device, and measurements are taken to assess how the new lightweight version performs.
This is how the first trial of NetsPresso was created. Looking back, I can see now that I was maybe a bit too enthusiastic and naive, rushing in without anticipating any problems.
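To make the pruning step mentioned above more concrete, here is a minimal sketch of one common approach: ranking the filters of a layer by their L1 norm and dropping the weakest ones. This is an illustrative assumption, not a description of NetsPresso's actual algorithm, and the names prune_filters, keep_ratio and the toy weights are all hypothetical.

```python
from typing import List

Filter = List[float]  # one convolution filter, flattened to a list of weights


def l1_norm(weights: Filter) -> float:
    """Sum of absolute weights, used here as a simple importance score."""
    return sum(abs(w) for w in weights)


def prune_filters(filters: List[Filter], keep_ratio: float) -> List[Filter]:
    """Keep the filters with the largest L1 norm, preserving their original order.

    Assumption: a small L1 norm means the filter contributes little, so it can
    be removed and the lost accuracy recovered later by retraining.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n_keep = max(1, round(len(filters) * keep_ratio))
    # Rank filter indices from most to least important.
    ranked = sorted(range(len(filters)), key=lambda i: l1_norm(filters[i]), reverse=True)
    kept_indices = sorted(ranked[:n_keep])
    return [filters[i] for i in kept_indices]


def parameter_count(filters: List[Filter]) -> int:
    """Total number of weights across all filters."""
    return sum(len(f) for f in filters)


# Hypothetical toy layer with four tiny filters.
layer = [[0.9, -0.8], [0.01, 0.02], [0.5, 0.4], [-0.03, 0.0]]
pruned = prune_filters(layer, keep_ratio=0.5)
```

With these toy weights, the two filters whose values are close to zero are dropped and the parameter count is halved. In a real pipeline the pruned model would still need the retraining, conversion and measurement steps described above.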
Not Everything Goes to Plan.
After establishing our trial version, we soon hit a bump in the road. Our ideas were great in theory, but they were customer-dependent. To put it another way, we couldn't know beforehand which model customers would use with NetsPresso. And to make things worse, there were other issues on top of this.
Our new technology could efficiently compress basic models, but models with an unusual structure caused problems. Additionally, retraining the models after compression was more difficult than we had initially anticipated. Retraining is highly specific: it depends on the model type and purpose, which were unique to every customer. In short, it was something we could never prepare for beforehand. We launched the NetsPresso trial with strict limits on the input models, and arguably failed to properly anticipate the varying requirements of our customers.
We very quickly found out just how much we had to improve. We had a grand vision, but the technology wasn't there yet.
After trying, and failing, to solve everything at once, we regrouped and got back to the basics of engineering. We started with the problems that could be readily solved, and progressed from there, day by day. This is how the modularization of NetsPresso began. Since then, NetsPresso has been divided into three modules.
The Model Compressor – for compressing.
The Model Searcher – for creating.
The Model Launcher – for converting and packaging.
Meeting Customer Demand.
Though our initial trial went well with existing models, a blind spot remained. We had focused on existing models to the detriment of new creation and innovation. Compressing existing models is only one way to create a lightweight model. It is also possible to create a new model from scratch and provide it to the customer directly. In our second trial, we began to address this missing piece by using Neural Architecture Search (NAS) to find small models that performed well.
NAS does not rely on existing models from the customer, and instead works with user data to find the best lightweight model for each case. With this method, it was now possible for us to create new lightweight models, though at the expense of money and time.
🤔 Training an AI model is more expensive than you might think. A large part of that is the cost of renting servers with sufficient GPUs. NAS, in particular, is an expensive method because it runs multiple training trials, each of which adds to the cost. If the process of finding a model begins to cost more than the customer is willing to pay, the service will quickly become unsustainable.
Running many trials is undoubtedly a great way to find new models, but the financial hit is significant. Conversely, fewer trials will save money, but have a lower probability of success. Achieving the right balance is key to finding good models at a low cost.
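The balance between trials and cost can be illustrated with a small back-of-the-envelope sketch. It rests on a simplifying assumption that is not from our actual system: every trial independently finds a good model with the same probability and costs the same amount. The function names and numbers are hypothetical.

```python
import math


def success_probability(p_single: float, n_trials: int) -> float:
    """Chance that at least one of n independent trials finds a good model."""
    return 1.0 - (1.0 - p_single) ** n_trials


def trials_needed(p_single: float, target: float) -> int:
    """Smallest number of trials whose overall success chance reaches target."""
    if not (0.0 < p_single < 1.0 and 0.0 < target < 1.0):
        raise ValueError("p_single and target must both be in (0, 1)")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p_single))


def total_cost(n_trials: int, cost_per_trial: float) -> float:
    """Cost grows linearly with the number of trials."""
    return n_trials * cost_per_trial


# Hypothetical numbers: each trial has a 10% chance of success.
n = trials_needed(p_single=0.1, target=0.9)  # 22 trials under this assumption
```

Under this toy model, raising the per-trial success rate is what shrinks the bill: the better the candidates you start from, the fewer trials you need to pay for.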
To achieve this balance, we decided to lean on our prior knowledge of AI to reduce the number of candidate models that need to be explored. We created a database of good models to draw from, in which we can look up the processing speed of each model on the target device. It has already been a great help, and we will continue to use it in future iterations of our NetsPresso platform.
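As a rough illustration of how such a database can cut down the search, the sketch below keeps only the models whose recorded latency on the target device fits a budget, and caps how many of them go on to be trained. The ModelEntry structure, the shortlist function, and the model and device names are hypothetical, not the actual NetsPresso schema.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ModelEntry:
    name: str
    latency_ms: Dict[str, float]  # measured latency, keyed by target device


def shortlist(database: List[ModelEntry], device: str,
              budget_ms: float, max_trials: int) -> List[ModelEntry]:
    """Return at most max_trials models that meet the latency budget on device.

    Models with no measurement for the device are skipped rather than guessed.
    """
    if max_trials < 1:
        raise ValueError("max_trials must be at least 1")
    eligible = [
        m for m in database
        if device in m.latency_ms and m.latency_ms[device] <= budget_ms
    ]
    # Fastest models first, so the trial budget goes to the safest candidates.
    eligible.sort(key=lambda m: m.latency_ms[device])
    return eligible[:max_trials]


# Hypothetical database entries.
database = [
    ModelEntry("model_a", {"device_x": 12.0, "device_y": 30.0}),
    ModelEntry("model_b", {"device_x": 45.0}),
    ModelEntry("model_c", {"device_x": 8.5}),
    ModelEntry("model_d", {"device_y": 5.0}),
]
candidates = shortlist(database, device="device_x", budget_ms=20.0, max_trials=2)
```

Here only model_c and model_a would be trained for device_x. Sorting by latency is just one possible choice; a real system might rank by expected accuracy instead.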
Building New Foundations
After many ups and downs, we finally ran a second trial phase using NAS. The user response was more positive than in our first trial, but still below our expectations. Despite our efforts to lower the cost of NAS, it was still too high relative to the level of customer satisfaction it delivered.
Overall though, the development of the Model Searcher module greatly contributed to the future development of NetsPresso. It allowed us to break free from the false belief that the user should provide the model that is to be compressed. We were now able to create our own models and provide them to the customer directly, which went a long way toward solving the problems of our first NetsPresso trial.
.
.
NetsPresso 1.0
Hopefully this has given you a better understanding of how we got to where we are now. Our first trial demonstrated our intuitive model compression technique, while our second tried to expand on that by designing and delivering models from scratch. NetsPresso 1.0 integrates these findings into a brand-new product. Next time round, I will talk about how that integration worked in practice. I hope you will join me!