Episodes
-
Kaggle is the world's largest data science community with powerful tools and resources to help you achieve your data science goals.
Nir Malbin, a Snap research scientist by day and a Kaggle master by night, shared with us some tips on how to get started with Kaggle and his lessons learned from data competitions. Specifically: what can you take from Kaggle to real-world problems, and what shouldn't be taken? We discussed examples of data leakage that can help you win competitions but would fail you on real-world problems. As a bonus, Nir also agreed to share some of the tricks that helped his team win the "Home Depot Product Search Relevance" competition.
https://www.kaggle.com/c/home-depot-product-search-relevance/leaderboard
-
How do you learn from noisy data? Can you use free text to generate labels in an unsupervised manner? Jonathan Laserson, Lead AI Researcher at Zebra Medical, tells us how they built the world's largest dataset of chest X-ray images (1M) and trained a network that detects over 40 findings. Jonathan did his PhD in computer science at Stanford, where he specialized in Bayesian methods and probabilistic graphical models.
-
The field of privacy in machine learning is becoming increasingly important. With legislation like GDPR, it is becoming necessary for us, data scientists, to be mindful of privacy concerns related to the applications we develop. In this episode we interview Ran Gilad-Bachrach, a researcher at Microsoft Research, who tells us about privacy in machine learning. We'll talk about differential privacy, about homomorphic encryption and how it enables training models on encrypted data, and about secure multi-party computation - a field whose goal is to help different parties train models together even when they can't share their data with one another.
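Differential privacy can be made concrete with the classic Laplace mechanism. The sketch below is not from the episode - the function name and the toy data are illustrative - but it shows the core idea: answer a counting query with noise calibrated to the query's sensitivity.

```python
import numpy as np

def laplace_count(data, predicate, epsilon):
    """Answer a counting query with epsilon-differential privacy.

    A counting query has sensitivity 1 (adding or removing one person
    changes the count by at most 1), so Laplace noise with scale
    1/epsilon is enough to satisfy the privacy guarantee.
    """
    true_count = sum(1 for row in data if predicate(row))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Example query: how many patients are over 60?
ages = [34, 71, 65, 22, 80, 58]
noisy_answer = laplace_count(ages, lambda a: a > 60, epsilon=0.5)
```

Smaller epsilon means stronger privacy but noisier answers; the analyst sees only the noisy count, never the raw data.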
This episode is sponsored by Sisense. They're hiring data scientists! Find out more at https://www.sisense.com/careers/
-
Daniel Soudry is an assistant professor and a Taub Fellow at the Department of Electrical Engineering at the Technion. His first work focused on neuroscience, attempting to understand how neurons work in the brain. He then continued to a postdoc at Columbia University, where he discovered his interest in both the practical concerns and the theory of deep neural networks. This episode focuses on Daniel's research on questions such as how to make neural networks work with low numerical precision, and when are SVMs and logistic regression the same thing?
We also talk with him about his path in academia and the journey to discover his research interests.
Things we discussed in this episode:
D. Soudry, E. Hoffer, M. Shpigel Nacson, S. Gunasekar, N. Srebro, "The Implicit Bias of Gradient Descent on Separable Data", ICLR + Accepted to JMLR, 2018.
https://en.wikipedia.org/wiki/Logistic_regression
https://en.wikipedia.org/wiki/Support_vector_machine
E. Hoffer, R. Banner, I. Golan, D. Soudry, "Norm matters: efficient and accurate normalization schemes in deep networks", NIPS 2018 (Spotlight)
R. Banner, I. Hubara, E. Hoffer, D. Soudry, "Scalable Methods for 8-bit Training of Neural Networks", NIPS 2018.
Whole-brain imaging of neuronal activity in a larval zebrafish - https://www.youtube.com/watch?v=lppAwkek6DI
Simultaneous Denoising, Deconvolution, and Demixing of Calcium Imaging Data: https://www.cell.com/neuron/fulltext/S0896-6273(15)01084-3
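The implicit-bias paper listed above shows that plain gradient descent on the logistic loss over linearly separable data converges in direction to the hard-margin SVM solution, even with no explicit regularization. Here is a minimal NumPy sketch (my own illustration, not code from the paper) on toy data whose max-margin direction through the origin is (1, 0):

```python
import numpy as np

# Linearly separable toy data with labels +1 / -1. The closest points,
# (2, 1) and (-2, 1), sit symmetrically around the vertical axis, so the
# hard-margin SVM direction (through the origin) is (1, 0).
X = np.array([[2.0, 1.0], [4.0, -1.0], [-2.0, 1.0], [-4.0, 2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(2)
lr = 0.05
for _ in range(100_000):
    margins = y * (X @ w)
    # Gradient of the total logistic loss sum_i log(1 + exp(-margin_i)).
    grad = -(sigmoid(-margins) * y) @ X
    w -= lr * grad

w_hat = w / np.linalg.norm(w)  # the direction is what converges
```

After enough iterations `w_hat` is close to (1, 0), while ||w|| itself keeps growing like log(t): the bias is in the direction, not the norm.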
-
AI and ML algorithms are becoming increasingly popular, being implemented in finance, health, and law-enforcement systems. The mistakes these algorithms make can have a tremendous impact on people's lives, leading to many ethical and legal questions: how do we define fairness in this context? Which personal rights do these algorithms affect? How can people appeal decisions made by algorithms? These questions, in turn, pose computational challenges, like improving the explainability of algorithms and enforcing algorithmic fairness toward minority groups. In this episode we talk to Gal Yona, from the Weizmann Institute, and Yafit Lev-Aretz, from the City University of New York. Together they provide us with an introduction to the hot topic of fairness in AI, from computational and legal perspectives.
-
Dafna Shahaf has so many cool research projects. In this episode we talked about a few of them - using metro maps to visualize information about events and storylines; an algorithm that judges jokes; a search engine that finds creative solutions using analogies; finding surprising facts in wikipedia.
She tells us how she comes up with these ideas and how she chooses which ones to focus on.
Resources:
Dafna's Lab: http://www.hyadatalab.com/
Metro maps of information: http://metromaps.stanford.edu/
Innovation through Analogies (with Tom Hope and Karni Gilon):
http://www.hyadatalab.com/papers/analogy-kdd17.pdf
https://arxiv.org/abs/1712.06880
Surprising facts on wikipedia: http://www.hyadatalab.com/papers/funfacts-wsdm17.pdf
Identifying humorous cartoon captions: http://www.hyadatalab.com/papers/pHumor.pdf
Ballpark Learning: https://arxiv.org/abs/1607.00034
Seq2Seq: https://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf
=====
positions at intel: https://jobs.intel.com/ListJobs/ByKeyword/%23AA_ISR/
-
David Golan was on track to becoming a university professor. Then something happened that made him change course. He co-founded viz.ai - a company that helps doctors detect strokes quickly. We talked about how to use deep learning when you don't have a lot of data, about the challenges of building an AI-oriented company in the field of medicine, and about how the time he spent as a researcher helps him as an entrepreneur.
-
Yair Mazor is the head of Data Science at Windward. In this episode we talked about how to get into a new domain efficiently as a data scientist, about data science for maritime analytics, and about different aspects of managing data science teams: the pros and cons of working in teams of data scientists vs. being part of product and development teams; balancing short- and long-term research goals; working with product managers; and finally - can data scientists work in agile?
-
Ofra Amir did her PhD in computer science at Harvard, where she studied the interactions between humans and intelligent machines. In this episode we talk with Ofra about designing algorithms whose goal is to improve human performance in a given task, how to design metrics when some of your goals are not easily measurable, and how to explain the decisions of intelligent agents to humans.
Resources for the episode:
Explainable AI:
Last year's workshop on explainable AI: http://home.earthlink.net/~dwaha/research/meetings/ijcai17-xai/
Last year's NIPS symposium on interpretable ML: http://interpretable.ml/
ICML 2016's workshop on interpretability: https://arxiv.org/html/1607.02531v2
The paper we talked about that summarizes an agent's actions: https://drive.google.com/file/d/11BfYioYLDzTxsCr0QkMAYGPUDh-qW-6u/view
Human-Computer Interaction and AI:
A nice paper about the connection between HCI and AI: https://pdfs.semanticscholar.org/e22b/e3642660d6a779e477124cae7cbfdfa5b0a5.pdf
A paper from a workshop about usable AI: http://www.eecs.harvard.edu/~kgajos/papers/2008/kgajos-UsableAI08.pdf
A classic paper about mixed-initiative interfaces - interfaces that use some sort of AI/ML, addressing issues like how to handle uncertainty about user intents: http://erichorvitz.com/chi99horvitz.pdf
-
Yo data scientists, Nimrod Priell has a very important message for you: data science is much more than just machine learning. It may sound trivial when you think about it, yet it's very easy to forget these days. Nimrod, a Data Science team lead at Facebook, told us about seven different products of a data scientist (only one of which is a machine learning model!), about the two metrics he uses to evaluate a product, and how they help him manage data science projects and build a versatile and diverse team. We found this talk very interesting, and hope you will too.
-
In this episode we talked with Sefi Cohen, head of a Data Science squad in the IDF. He tells us about the different projects they're doing, how he started a data science team with very little knowledge of machine learning and data science techniques, how he chose the right people for the team, and how they all learned the craft together.
-
Uri Shalit did his PhD at the Hebrew University and a postdoc at NYU. We talked about his research in machine learning for health care and the unique challenges of this field, about causal inference and how it is relevant to many machine learning problems, and about a cool study he did during his PhD on motif identification in music.
Relevant resources for this episode:
Uri's tutorial about Causal Inference from ICML
www.cs.nyu.edu/~shalit/tutorial.html
An overview of a conference about Machine Learning for Health Care
http://irenechen.net/blog/2017/08/22/mlhc2017.html
Papers that were mentioned during the episode
Causal Inference for Recommender Systems: "Recommendations as Treatments: Debiasing Learning and Evaluation". You can find it here: http://www.cs.cornell.edu/~schnabts/publications.html
The Selective Labels Problem: Evaluating Algorithmic Predictions in the Presence of Unobservables
Modeling Musical Influence with Topic Models: http://proceedings.mlr.press/v28/shalit13.pdf
Background material:
Logistic regression from Andrew Ng's course: https://www.youtube.com/watch?v=LLx4diIP83I
Super short introduction to regularization: https://towardsdatascience.com/over-fitting-and-regularization-64d16100f45c
Resources about L1 regularized logistic regression:
http://ai.stanford.edu/~ang/papers/icml04-l1l2.pdf
https://blog.alexlenail.me/what-is-the-difference-between-ridge-regression-the-lasso-and-elasticnet-ec19c71c9028
https://stats.stackexchange.com/questions/45643/why-l1-norm-for-sparse-models/159379#159379
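To make the L1 resources above concrete, here is a minimal NumPy sketch of L1-regularized logistic regression fitted with proximal gradient descent (ISTA). This is my own illustration, not code from the episode; the toy data and function names are made up.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_threshold(v, t):
    # Proximal operator of the L1 norm: shrinks every coordinate toward 0
    # and sets small ones to exactly 0 -- this is where sparsity comes from.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def l1_logistic_regression(X, y, lam=0.1, lr=0.5, iters=1000):
    """L1-regularized logistic regression via proximal gradient (ISTA)."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        grad = X.T @ (sigmoid(X @ w) - y) / n  # gradient of the mean log-loss
        w = soft_threshold(w - lr * grad, lr * lam)
    return w

# Toy data: only the first of three features carries signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(float)
w = l1_logistic_regression(X, y)
```

Unlike an L2 (ridge) penalty, which only shrinks coefficients, the soft-thresholding step drives the two noise coefficients to exactly zero - the sparsity property the links above discuss.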
-
On this episode we talked with Galit Bary Weisberg, a data scientist at Mobileye. Before Mobileye, Galit was the first and only data scientist in a startup called Matific, which develops interactive games for teaching math. We talked about the challenges of being the sole data scientist in a company: which mistakes startups usually make when building a data science team, and what one can do to make the process smoother. In addition we talked about Galit's work at Mobileye, how to implement deep learning models in practice, and what her day looks like.
-
On the first episode we talked with Yoav Goldberg from Bar-Ilan University about NLP, deep learning research, life in academia, and that Medium blog post that started a fire.
Opening tone: Avishai Cohen - Etzion Gever
More resources on this episode-
Yoav Goldberg is an NLP (Natural Language Processing, https://en.m.wikipedia.org/wiki/Natural_language_processing) researcher at Bar-Ilan University; his official web page: http://u.cs.biu.ac.il/~yogo/.
The blog post Yoav Goldberg published about the paper "Adversarial Generation of Natural Language" - https://medium.com/@yoav.goldberg/an-adversarial-review-of-adversarial-generation-of-natural-language-409ac3378bd7 - and the original paper, https://arxiv.org/abs/1705.10929, which was published on arXiv (https://arxiv.org/).
We've discussed the application of neural network models (deep learning) to natural language data; here is a primer about this by Yoav Goldberg, http://u.cs.biu.ac.il/~yogo/nnlp.pdf, and the book Yoav wrote: http://www.morganclaypoolpublishers.com/catalog_Orig/product_info.php?cPath=22&products_id=1056. In this context we mentioned RNNs (recurrent neural networks) and LSTMs (long short-term memory networks) - here is a nice blog post for understanding these: http://colah.github.io/posts/2015-08-Understanding-LSTMs/.
We also mentioned SVMs and kernels (https://en.wikipedia.org/wiki/Kernel_method); here is an easy way to apply them in Python using the scikit-learn library: http://scikit-learn.org/stable/auto_examples/svm/plot_svm_kernels.html.
Yoav and his team developed a great Python library for deep learning - DyNet, the Dynamic Neural Network Toolkit: https://github.com/clab/dynet. Other popular deep learning libraries for Python are TensorFlow (developed by Google research, https://www.tensorflow.org/) and PyTorch (developed by Facebook research, http://pytorch.org/).
-
In this short episode we tell you a bit about this podcast and about ourselves.