
爱可可 AI Paper Recommendations (October 16)

AI - Artificial Intelligence | LG - Machine Learning | CV - Computer Vision | RO - Robotics


1、[LG]*DiffTune: Optimizing CPU Simulator Parameters with Learned Differentiable Surrogates

A Renda, Y Chen, C Mendis, M Carbin

[MIT CSAIL]

Learning the parameters of an x86 basic-block CPU simulator from coarse-grained end-to-end measurements (DiffTune). Given a simulator, DiffTune learns its parameters as follows: replace the original simulator with a differentiable surrogate that approximates it; use gradient-based optimization to produce parameter values that minimize the simulator's error on a dataset of ground-truth end-to-end performance measurements; finally, plug the learned parameters back into the original simulator. DiffTune needs only end-to-end supervision to learn parameters inside a non-differentiable program. It automatically learns the entire set of microarchitecture-specific parameters in the Intel x86 simulation model of llvm-mca (a basic-block CPU simulator based on LLVM's instruction scheduling model), and the resulting simulator is more accurate than one built with expert-provided parameters.

CPU simulators are useful tools for modeling CPU execution behavior. However, they suffer from inaccuracies due to the cost and complexity of setting their fine-grained parameters, such as the latencies of individual instructions. This complexity arises from the expertise required to design benchmarks and measurement frameworks that can precisely measure the values of parameters at such fine granularity. In some cases, these parameters do not necessarily have a physical realization and are therefore fundamentally approximate, or even unmeasurable.
In this paper we present DiffTune, a system for learning the parameters of x86 basic block CPU simulators from coarse-grained end-to-end measurements. Given a simulator, DiffTune learns its parameters by first replacing the original simulator with a differentiable surrogate, another function that approximates the original function; by making the surrogate differentiable, DiffTune is then able to apply gradient-based optimization techniques even when the original function is non-differentiable, such as is the case with CPU simulators. With this differentiable surrogate, DiffTune then applies gradient-based optimization to produce values of the simulator's parameters that minimize the simulator's error on a dataset of ground truth end-to-end performance measurements. Finally, the learned parameters are plugged back into the original simulator.
DiffTune is able to automatically learn the entire set of microarchitecture-specific parameters within the Intel x86 simulation model of llvm-mca, a basic block CPU simulator based on LLVM's instruction scheduling model. DiffTune's learned parameters lead llvm-mca to an average error that not only matches but lowers that of its original, expert-provided parameter values.
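A minimal PyTorch sketch of the surrogate-then-optimize loop described above. The `simulate` function is a toy stand-in for a non-differentiable simulator such as llvm-mca, and all shapes, learning rates, and iteration counts are illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn

def simulate(params, block):
    """Toy stand-in for the non-differentiable CPU simulator."""
    with torch.no_grad():
        return (params.abs().sum() * 0.01 + block.sum() * 0.1).reshape(1)

n_params, feat_dim = 16, 8
surrogate = nn.Sequential(nn.Linear(n_params + feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))

# Step 1: fit the differentiable surrogate to mimic the simulator's output
# as a function of (parameters, basic-block features).
opt_s = torch.optim.Adam(surrogate.parameters(), lr=1e-3)
for _ in range(1000):
    p = torch.rand(n_params)    # random parameter settings
    b = torch.rand(feat_dim)    # random basic-block features
    loss = (surrogate(torch.cat([p, b])) - simulate(p, b)).pow(2).mean()
    opt_s.zero_grad(); loss.backward(); opt_s.step()

# Step 2: freeze the surrogate and optimize the simulator parameters by
# gradient descent against ground-truth end-to-end measurements.
for q in surrogate.parameters():
    q.requires_grad_(False)
params = torch.rand(n_params, requires_grad=True)
opt_p = torch.optim.Adam([params], lr=1e-2)
dataset = [(torch.rand(feat_dim), torch.rand(1)) for _ in range(100)]  # (block, measured cycles)
for _ in range(200):
    for b, y in dataset:
        loss = (surrogate(torch.cat([params, b])) - y).pow(2).mean()
        opt_p.zero_grad(); loss.backward(); opt_p.step()

# Step 3: plug the learned `params` back into the original simulator.
```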

https://weibo.com/1402400261/JposNcVEd


2、[LG]*Discrete Latent Space World Models for Reinforcement Learning

J Robine, T Uelwer, S Harmeling

[Heinrich Heine University Düsseldorf]

Discrete latent-space world models for reinforcement learning. The paper proposes a new world-model architecture based on a vector quantized-variational autoencoder (VQ-VAE) and a convolutional LSTM, and trains a model-free PPO agent inside this simpler, smaller world model to improve sample efficiency. Compared with the much larger SimPLe (2019) model, it reaches comparable performance on Atari with only 100K interactions with the real environment.

Sample efficiency remains a fundamental issue of reinforcement learning. Model-based algorithms try to make better use of data by simulating the environment with a model. We propose a new neural network architecture for world models based on a vector quantized-variational autoencoder (VQ-VAE) to encode observations and a convolutional LSTM to predict the next embedding indices. A model-free PPO agent is trained purely on simulated experience from the world model. We adopt the setup introduced by Kaiser et al. (2020), which only allows 100K interactions with the real environment, and show that we reach better performance than their SimPLe algorithm in five out of six randomly selected Atari environments, while our model is significantly smaller.
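The core of the proposed architecture is the VQ-VAE's discrete bottleneck. Below is a minimal PyTorch sketch of that vector-quantization step (nearest-codebook lookup, the VQ losses, and straight-through gradients); the codebook size, feature shapes, and loss weighting are illustrative assumptions, and the paper's full model adds a convolutional encoder/decoder and a convolutional LSTM over the index grids:

```python
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    def __init__(self, num_codes=512, dim=64, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)
        self.beta = beta

    def forward(self, z):                            # z: (B, dim, H, W) encoder output
        B, D, H, W = z.shape
        flat = z.permute(0, 2, 3, 1).reshape(-1, D)  # one vector per spatial cell
        d = torch.cdist(flat, self.codebook.weight)  # distances to all codes
        idx = d.argmin(dim=1)                        # nearest-code indices
        q = self.codebook(idx).view(B, H, W, D).permute(0, 3, 1, 2)
        # codebook loss + weighted commitment loss (VQ-VAE objective)
        loss = (q - z.detach()).pow(2).mean() + self.beta * (q.detach() - z).pow(2).mean()
        q = z + (q - z).detach()                     # straight-through gradient
        return q, idx.view(B, H, W), loss

vq = VectorQuantizer()
z = torch.randn(2, 64, 6, 6)                         # toy encoder features
q, indices, vq_loss = vq(z)
# `indices` is the discrete latent state; a convolutional LSTM over these
# grids predicts the next indices, and PPO is trained on simulated rollouts.
```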

https://weibo.com/1402400261/JpoBTd2Wi


3、[RO]*LaND: Learning to Navigate from Disengagements

G Kahn, P Abbeel, S Levine

[UC Berkeley]

LaND: learning autonomous navigation policies from disengagements. During a "training" phase, a human follows the robot (or watches remotely via camera) and stops it whenever it performs an undesirable maneuver (a disengagement), just as a safety driver stops an autonomous car. Combining this supervision signal with the images the robot collects, LaND learns a neural network model that predicts which actions lead to disengagements given the current sensory observation; at test time the robot plans and executes actions that avoid disengagements. LaND successfully learns to navigate in diverse real-world sidewalk environments, outperforming both imitation learning and reinforcement learning approaches.

Consistently testing autonomous mobile robots in real world scenarios is a necessary aspect of developing autonomous navigation systems. Each time the human safety monitor disengages the robot's autonomy system due to the robot performing an undesirable maneuver, the autonomy developers gain insight into how to improve the autonomy system. However, we believe that these disengagements not only show where the system fails, which is useful for troubleshooting, but also provide a direct learning signal by which the robot can learn to navigate. We present a reinforcement learning approach for learning to navigate from disengagements, or LaND. LaND learns a neural network model that predicts which actions lead to disengagements given the current sensory observation, and then at test time plans and executes actions that avoid disengagements. Our results demonstrate LaND can successfully learn to navigate in diverse, real world sidewalk environments, outperforming both imitation learning and reinforcement learning approaches. Videos, code, and other material are available on our website this https URL
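A minimal sketch of the test-time planning loop implied by the abstract, assuming a trained model that maps an image and a candidate action sequence to per-step disengagement probabilities. The model interface, the random-shooting planner, and the one-dimensional steering action space are illustrative assumptions, not the paper's exact method:

```python
import torch

def plan(model, image, horizon=8, num_candidates=1024, max_steer=0.5):
    """Random-shooting planner: sample candidate action sequences and keep
    the one with the lowest total predicted probability of disengagement."""
    # candidate steering sequences: (num_candidates, horizon, 1)
    actions = (torch.rand(num_candidates, horizon, 1) * 2 - 1) * max_steer
    with torch.no_grad():
        probs = model(image.expand(num_candidates, *image.shape), actions)
        cost = probs.sum(dim=1)        # expected number of disengagements
    return actions[cost.argmin(), 0]   # execute the first action, then replan

# Dummy stand-in for the learned disengagement predictor (illustrative only):
def dummy_model(images, actions):
    # pretend sharper steering is more likely to cause a disengagement
    return torch.sigmoid(actions.squeeze(-1).abs() * 4.0 - 1.0)

best_first_action = plan(dummy_model, torch.rand(3, 64, 64))
print(best_first_action)
```

At training time the model would be fit on (observation, action sequence, disengaged?) tuples collected whenever the human safety monitor stops the robot.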

https://weibo.com/1402400261/JpoL43yRC



4、[LG]Learning Deep Features in Instrumental Variable Regression

L Xu, Y Chen, S Srinivasan, N de Freitas, A Doucet, A Gretton

[Gatsby Unit & DeepMind & University of Washington]

Deep feature instrumental variable regression (DFIV): uses deep networks to adaptively learn feature maps that better represent instrumental variables for causal inference, and shows how to apply the method to off-policy evaluation in reinforcement learning, bridging the gap between causal inference on one side and neural networks and deep-RL off-policy evaluation on the other.

Instrumental variable (IV) regression is a standard strategy for learning causal relationships between confounded treatment and outcome variables by utilizing an instrumental variable, which is conditionally independent of the outcome given the treatment. In classical IV regression, learning proceeds in two stages: stage 1 performs linear regression from the instrument to the treatment; and stage 2 performs linear regression from the treatment to the outcome, conditioned on the instrument. We propose a novel method, {\it deep feature instrumental variable regression (DFIV)}, to address the case where relations between instruments, treatments, and outcomes may be nonlinear. In this case, deep neural nets are trained to define informative nonlinear features on the instruments and treatments. We propose an alternating training regime for these features to ensure good end-to-end performance when composing stages 1 and 2, thus obtaining highly flexible feature maps in a computationally efficient manner. DFIV outperforms recent state-of-the-art methods on challenging IV benchmarks, including settings involving high dimensional image data. DFIV also exhibits competitive performance in off-policy policy evaluation for reinforcement learning, which can be understood as an IV regression task.
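A minimal sketch of the two-stage idea with learned feature maps, on toy data. The closed-form ridge heads and the joint gradient update below are simplified stand-ins for the paper's alternating training regime, and all architectures, data, and hyperparameters are assumptions:

```python
import torch
import torch.nn as nn

def ridge(A, B, lam=1e-3):
    """Closed-form ridge regression: argmin_W ||A W - B||^2 + lam ||W||^2."""
    d = A.shape[1]
    return torch.linalg.solve(A.T @ A + lam * torch.eye(d), A.T @ B)

psi = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 8))  # instrument features
phi = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 8))  # treatment features

Z = torch.randn(512, 1)                 # instrument
X = Z + 0.3 * torch.randn(512, 1)       # treatment (toy)
Y = X ** 2 + 0.1 * torch.randn(512, 1)  # outcome with toy structural function

opt = torch.optim.Adam(list(psi.parameters()) + list(phi.parameters()), lr=1e-3)
for _ in range(500):
    f_z, f_x = psi(Z), phi(X)
    V = ridge(f_z.detach(), f_x.detach())       # stage 1: linear map psi(Z) -> phi(X)
    stage1 = (f_z @ V - f_x).pow(2).mean()      # trains features to be predictable
    pred_fx = f_z @ V                           # predicted treatment features
    w = ridge(pred_fx.detach(), Y)              # stage 2: linear head onto outcome
    stage2 = (pred_fx @ w - Y).pow(2).mean()
    loss = stage1 + stage2
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():
    x_test = torch.linspace(-2, 2, 5).unsqueeze(1)
    print(phi(x_test) @ w)   # estimated structural function, ideally close to x^2
```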

https://weibo.com/1402400261/JpoSTiElK


5、[AI]Video Game Level Repair via Mixed Integer Linear Programming

H Zhang, M C Fontaine, A K Hoover, J Togelius, B Dilkina, S Nikolaidis

[University of Southern California & New York University & New Jersey Institute of Technology]

Video-game level repair via mixed-integer linear programming: a generate-then-repair framework for automatically producing playable levels in a specific style. Levels are constructed with a generative adversarial network (GAN) trained on human-authored examples and repaired with a mixed-integer linear program (MIP) with playability constraints. A key component of the framework is computing minimum-cost edits between the GAN-generated level and the MIP solver's solution, which is cast and solved as a minimum-cost network-flow problem. Experiments show the framework generates a diverse range of playable levels that capture the spatial relationships between objects exhibited in human-authored levels.

Recent advancements in procedural content generation via machine learning enable the generation of video-game levels that are aesthetically similar to human-authored examples. However, the generated levels are often unplayable without additional editing. We propose a generate-then-repair framework for automatic generation of playable levels adhering to specific styles. The framework constructs levels using a generative adversarial network (GAN) trained with human-authored examples and repairs them using a mixed-integer linear program (MIP) with playability constraints. A key component of the framework is computing minimum cost edits between the GAN generated level and the solution of the MIP solver, which we cast as a minimum cost network flow problem. Results show that the proposed framework generates a diverse range of playable levels that capture the spatial relationships between objects exhibited in the human-authored levels.
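The minimum-cost edit step can be illustrated as a tiny min-cost network-flow instance, where surplus tiles in the GAN output must be moved to positions the MIP solution requires, paying a per-move cost. The sketch below uses networkx; the positions, costs, and capacities are made-up toy values, not the paper's formulation:

```python
import networkx as nx

G = nx.DiGraph()
# positions holding a surplus tile (negative demand = supply in networkx)
G.add_node("a", demand=-1)
G.add_node("b", demand=-1)
# positions that need a tile
G.add_node("c", demand=1)
G.add_node("d", demand=1)
# edit costs, e.g. Manhattan distance between the two positions
G.add_edge("a", "c", weight=2, capacity=1)
G.add_edge("a", "d", weight=5, capacity=1)
G.add_edge("b", "c", weight=4, capacity=1)
G.add_edge("b", "d", weight=1, capacity=1)

flow = nx.min_cost_flow(G)       # e.g. move a -> c and b -> d
cost = nx.cost_of_flow(G, flow)  # minimum total edit cost (here 3)
print(flow, cost)
```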

https://weibo.com/1402400261/JpoXF5Zun
