Feature Selection with Deep Neural Networks - Ofir Lindenbaum (ICML 2020)

The talk is based on the paper "Feature Selection using Stochastic Gates", recently published at ICML 2020. In this talk, Ofir, one of the paper's authors, will present a solution for using NNs for feature selection. Feature selection is an important problem in machine learning, and it can lead to several benefits, such as improved interpretability, reduced overfitting, and lower computational complexity. He will explain the derivation of the method and demonstrate its use with several examples.

References to everything covered in the lecture:   / references_from_lecture_feature_selection_...
git: https://github.com/runopti/stg
arxiv: https://arxiv.org/abs/1810.04247

00:00 Intro
03:44 Learning from Empirical Data
12:34 Feature Selection: Non-Linear Functions
16:24 Probabilistic Feature Selection
29:45 Gaussian STochastic Gates (STG)
33:41 Experiments: Linear Regression
38:41 Experiments: Real Data
58:13 Unsupervised Learning
01:01:00 Unsupervised Feature Selection
01:04:48 Differentiable Unsupervised Feature Selection (DUFS)
01:07:19 DUFS: Noisy Two-moons
01:08:36 DUFS: Real Data
01:10:36 DUFS: Image data
01:11:20 STG: Conclusion
01:12:40 Discussion
01:21:49 15 features which are random linear combinations of the 5 informative features (extra slide)
[Chapters were auto-generated using our proprietary software - contact us if you are interested in access to the software]

Lecture abstract:
Feature selection problems have been extensively studied in the setting of linear estimation (e.g. LASSO), but less emphasis has been placed on feature selection for non-linear functions. In this study, we propose a method for feature selection in neural network estimation problems. The new procedure is based on a probabilistic relaxation of the L0 norm of the features, that is, the count of selected features.
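To make the relaxed L0 idea a little more concrete, here is a minimal sketch of a Gaussian stochastic gate (the variant named in the chapter list), written in plain Python. It assumes each feature d has a learnable parameter mu_d, that a gate is mu_d plus Gaussian noise clipped to [0, 1], and that the penalty is the expected number of open gates. The names `sample_gate`, `prob_gate_open`, `relaxed_l0`, the value of `SIGMA`, and the toy inputs are illustrative assumptions, not taken from the authors' code; the linked repository holds the actual implementation.

```python
import math
import random

SIGMA = 0.5  # noise scale; an assumed fixed hyperparameter for this sketch


def sample_gate(mu, rng, sigma=SIGMA):
    """One relaxed Bernoulli gate: mu plus Gaussian noise, clipped to [0, 1]."""
    return min(1.0, max(0.0, mu + rng.gauss(0.0, sigma)))


def prob_gate_open(mu, sigma=SIGMA):
    """P(gate > 0) = Phi(mu / sigma), with Phi the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(mu / (sigma * math.sqrt(2.0))))


def relaxed_l0(mus, sigma=SIGMA):
    """Expected number of open gates: a smooth stand-in for the L0 count."""
    return sum(prob_gate_open(m, sigma) for m in mus)


rng = random.Random(0)
mus = [2.0, 0.5, -2.0]  # hypothetical gate parameters for three features
x = [1.0, 1.0, 1.0]     # hypothetical input row

# Each feature is multiplied by its sampled gate before reaching the network.
gated_x = [xi * sample_gate(m, rng) for xi, m in zip(x, mus)]

# The penalty would be added to the task loss, scaled by a weight lambda.
penalty = relaxed_l0(mus)
```

Because the penalty is a smooth function of each mu_d, gradient descent can push uninformative features toward closed gates, whereas the hard count of selected features gives no usable gradient.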
Our L0-based regularization relies on a continuous relaxation of the Bernoulli distribution; this relaxation allows our model to learn the parameters of the approximate Bernoulli distributions via gradient descent. The proposed framework simultaneously learns a nonlinear regression or classification function while selecting a small subset of features. We provide an information-theoretic justification for incorporating the Bernoulli distribution into feature selection. Furthermore, we evaluate our method using synthetic and real-life data to demonstrate that our approach outperforms other commonly used methods in both predictive performance and feature selection.

Presenter BIO:
Ofir Lindenbaum is a postdoctoral fellow in the applied math department at Yale University, working with Prof. Ronald Coifman and Prof. Yuval Kluger. His primary research is in the fields of machine learning and computational biology. He pursues research on signal processing, music and audio analysis, manifold learning, spectral methods for data mining, and dimensionality reduction.
His Website: https://ofirlin.wixsite.com/ofirlinde...

-------------------------
Find us at:
Newsletter for updates about more events ➜ http://eepurl.com/gJ1t-D
Sub-reddit for discussions ➜   / 2d3dai
Discord server for, well, discord ➜   / discord
Blog ➜ https://2d3d.ai
We are the people behind the AI consultancy Abelians ➜ https://abelians.com/