Fields ranging from robotics to medicine to political science are attempting to train AI systems to make meaningful decisions of all kinds. For example, using an AI system to intelligently control traffic in a congested city could help motorists reach their destinations faster, while improving safety or sustainability.
Unfortunately, teaching an AI system to make good decisions is no easy task.
Reinforcement learning models, which underlie these AI decision-making systems, still often fail when faced with even small variations in the tasks they are trained to perform. In the case of traffic, a model might struggle to control a set of intersections with different speed limits, numbers of lanes, or traffic patterns.
To boost the reliability of reinforcement learning models for complex tasks with variability, MIT researchers have introduced a more efficient algorithm for training them.
The algorithm strategically selects the best tasks for training an AI agent so it can effectively perform all tasks in a collection of related tasks. In the case of traffic signal control, each task could be one intersection in a task space that includes all intersections in the city.
By focusing on a smaller number of intersections that contribute the most to the algorithm’s overall effectiveness, this method maximizes performance while keeping the training cost low.
The researchers found that their technique was between five and 50 times more efficient than standard approaches on an array of simulated tasks. This gain in efficiency helps the algorithm learn a better solution faster, ultimately improving the performance of the AI agent.
“We were able to see incredible performance improvements, with a very simple algorithm, by thinking outside the box. An algorithm that is not very complicated stands a better chance of being adopted by the community because it is easier to implement and easier for others to understand,” says senior author Cathy Wu, the Thomas D. and Virginia W. Cabot Career Development Associate Professor in Civil and Environmental Engineering (CEE) and the Institute for Data, Systems, and Society (IDSS), and a member of the Laboratory for Information and Decision Systems (LIDS).
She is joined on the paper by lead author Jung-Hoon Cho, a CEE graduate student; Vindula Jayawardana, a graduate student in the Department of Electrical Engineering and Computer Science (EECS); and Sirui Li, an IDSS graduate student. The research will be presented at the Conference on Neural Information Processing Systems.
Finding a middle ground
To train an algorithm to control traffic lights at many intersections in a city, an engineer would typically choose between two main approaches. She can train one algorithm for each intersection independently, using only that intersection’s data, or train a larger algorithm using data from all intersections and then apply it to each one.
But each approach comes with its share of downsides. Training a separate algorithm for each task (such as a given intersection) is a time-consuming process that requires an enormous amount of data and computation, while training one algorithm for all tasks often leads to subpar performance.
Wu and her collaborators sought a sweet spot between these two approaches.
For their method, they choose a subset of tasks and train one algorithm for each task independently. Importantly, they strategically select the individual tasks that are most likely to improve the algorithm’s overall performance on all tasks.
They leverage a common trick from the reinforcement learning field called zero-shot transfer learning, in which an already trained model is applied to a new task without being further trained. With transfer learning, the model often performs remarkably well on the new neighbor task.
“We know it would be ideal to train on all the tasks, but we wondered if we could get away with training on a subset of those tasks, apply the result to all the tasks, and still see a performance increase,” Wu says.
To identify which tasks they should select to maximize expected performance, the researchers developed an algorithm called Model-Based Transfer Learning (MBTL).
The MBTL algorithm has two pieces. For one, it models how well each algorithm would perform if it were trained independently on one task. Then it models how much each algorithm’s performance would degrade if it were transferred to each other task, a concept known as generalization performance.
Explicitly modeling generalization performance allows MBTL to estimate the value of training on a new task.
MBTL does this sequentially, choosing the task which leads to the highest performance gain first, then selecting additional tasks that provide the biggest subsequent marginal improvements to overall performance.
Since MBTL only focuses on the most promising tasks, it can dramatically improve the efficiency of the training process.
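The sequential selection described above can be illustrated with a small sketch. This is not the researchers' implementation: the function names, the one-dimensional task "context" (for example, a speed limit), and the assumption that performance degrades linearly with the distance between tasks are all hypothetical choices made here for illustration. The sketch estimates how a model trained on each task would perform on every other task, then greedily adds the training task with the largest marginal gain in average estimated performance.

```python
import numpy as np


def estimated_transfer(train_perf, contexts, slope):
    """Estimated score of a model trained on task i when applied to task j.

    Assumption (illustrative only): performance drops linearly with the
    distance between the two tasks' context values.
    """
    gap = slope * np.abs(contexts[:, None] - contexts[None, :])
    return train_perf[:, None] - gap  # shape: (n_source, n_target)


def greedy_select(transfer, budget):
    """Pick `budget` training tasks one at a time, each time adding the task
    with the largest marginal gain in mean estimated score over all tasks."""
    n_source, n_target = transfer.shape
    best = np.full(n_target, -np.inf)  # best estimated score per target so far
    selected = []
    for _ in range(budget):
        # Mean score over all targets if each candidate were added next.
        candidate_scores = np.maximum(transfer, best[None, :]).mean(axis=1)
        if selected:
            candidate_scores[selected] = -np.inf  # never pick a task twice
        choice = int(np.argmax(candidate_scores))
        selected.append(choice)
        best = np.maximum(best, transfer[choice])
    return selected, float(best.mean())


# Hypothetical task space: 11 tasks with contexts evenly spaced in [0, 1],
# each assumed to reach a score of 1.0 when trained and tested on itself.
contexts = np.linspace(0.0, 1.0, 11)
train_perf = np.ones(11)
transfer = estimated_transfer(train_perf, contexts, slope=1.0)

selected, score = greedy_select(transfer, budget=3)
print(selected, round(score, 3))
```

In this toy setting the first task chosen is the one in the middle of the task space, since a model trained there is estimated to lose the least when transferred to all the others. Later picks fill in the regions that the first choice covers worst.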
Reducing training costs
When the researchers tested this technique on simulated tasks, including controlling traffic signals, managing real-time speed advisories, and executing several classic control tasks, it was five to 50 times more efficient than other methods.
This means they could arrive at the same solution by training on far less data. For instance, with a 50x efficiency boost, the MBTL algorithm could train on just two tasks and achieve the same performance as a standard method that uses data from 100 tasks.
“From the perspective of the two main approaches, that means data from the other 98 tasks was not necessary or that training on all 100 tasks is confusing to the algorithm, so the performance ends up worse than ours,” Wu says.
With MBTL, adding even a small amount of additional training time could lead to much better performance.
In the future, the researchers plan to design MBTL algorithms that can extend to more complex problems, such as high-dimensional task spaces. They are also interested in applying their approach to real-world problems, especially in next-generation mobility systems.
The research is funded, in part, by a National Science Foundation CAREER Award, the Kwanjeong Educational Foundation PhD Scholarship Program, and an Amazon Robotics PhD Fellowship.