The Open Data Science Conference (ODSC) 2022 took place in Boston this week. The conference featured panels, workshops, presentations and a vendor expo. I attended all 3 days, and here are some impressions.
The most popular subjects were:
- Algorithmically infused societies, ethics and fairness in Machine Learning and AI
- Math, statistics, data wrangling, estimator modeling and error handling
- MLOps and the Engineering side of the whole Data Science pipeline
- Data Science and Decision Making
On the first subject, Northeastern University Professor Tina Eliassi-Rad delivered the most outstanding (and scary) talk. She stated that ML and AI practice today is completely free of liability for the bad consequences of using estimators and algorithms. Drawing an analogy with prescription drugs, she cited work by Margaret Mitchell and others proposing that ML models be “accompanied by model cards (example) which are short documents accompanying trained machine learning models that provide benchmarked evaluation in a variety of conditions, such as across different cultural, demographic, or phenotypic groups and intersectional groups that are relevant to the intended application domains.” AI has its dangers and it is like the Wild West (that’s me trying to recreate her final words), and regulation might be coming in Europe and Canada, but not really in the US.
Dr. Kush Varshney, a researcher at IBM, also delved into these very important subjects. He has written a book about them and presented a series of free IBM tools, called Trusted AI, that help identify and reduce bias in ML products, covering fairness, explainability, privacy and other concerns.
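To make the model card idea a bit more concrete, here is a minimal sketch of the kind of per-group evaluation the quote above describes. It is not taken from any of the talks or from the IBM tools: the function name `accuracy_by_group`, the toy labels and the groups "A" and "B" are all made up for illustration, and a real model card would report more metrics than plain accuracy.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return a {group: accuracy} dict (hypothetical helper, for illustration only)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}


# Made-up toy data: true labels, model predictions and a group label per row.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

# The single overall number hides the gap between the two groups.
overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
report = accuracy_by_group(y_true, y_pred, groups)

print(f"overall accuracy: {overall:.3f}")
for group, acc in sorted(report.items()):
    print(f"group {group}: {acc:.3f}")
```

With this toy data the overall accuracy looks acceptable while one group is served noticeably worse than the other, which is exactly the kind of gap a single aggregate metric would hide.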
There were hands-on sessions by Matt Harrison, Allen Downey and Andras Zsom on Pandas best practices, XGBoost, Bayesian decision analysis, ML interpretability with SHAP, and other topics. I found them very helpful and I’m glad I attended. All three have published books and courses, which I recommend.
There were many demos by vendors such as Cloudera, Red Hat and others of their Data Science offerings, which are a mix of scalable Jupyter Notebook as a service in the cloud (which I like) and continuous delivery of ML models into APIs (which I don’t like because I think it promotes very bad practices).
It became clear that Data Science is a field where talent, knowledge, Open Source and, more recently, ethics are the most important assets. Currently, products don’t have much space here, and this explains why the expo (the showcase of products) was not exciting or attractive at all. It focused mostly on the Data Engineering aspects of the data lifecycle: MLOps, data pipelines, storage, databases. There was no AWS, Google, IBM or Microsoft; the only big vendor there was HP, showcasing a PC preloaded with Windows, Jupyter and some other Open Source Data Science tools. I found it silly, as if it were difficult for a data scientist to install those herself or himself. Other vendors showcased data infrastructure-oriented solutions.
Strangely, the Data Analyst role was completely forgotten at this conference. No session or product showcase that I saw targeted this extremely important professional. Data Scientists have strong predictive statistics and programming skills, which they use to create software data products that aim to optimize each business transaction. Data Engineers are concerned with data quality, flow and availability. Data Analysts are the professionals in charge of helping executives use data to make strategic decisions. They are the masters of data visualization, descriptive statistics and storytelling. They own the Business Intelligence platform and work much closer to business executives and the meetings in which decisions are made. Data Scientists can help here, but Data Analysts are the rulers of this part of the data cycle.
As a lesson learned from the COVID-19 years, the conference also has a free online version with hundreds of recorded presentations at live.odsc.com. I need to find time to watch those too.