A 2020 list of desired hard skills for data professionals, ordered from the most essential to the most difficult.
- The English language
- SQL
- Spreadsheets
- Descriptive Statistics (median, variance, correlation, etc.)
- Notions of Data visualization
- Notions of Time Series
- Handling computer files and folders (this one entered the list because we observed many people simply don’t have it)
- Notions of digital information storage (numbers and their limits, time, time zones, text, Unicode, compression)
- Probability
- Probability Distributions
- Linear and Logistic Regressions
- Python library ecosystem, pip, PyPI
- Python’s Pandas, DataFrame and Series wrangling
- Linux and the computer command line
- NoSQL, JSON, YAML, XML, SVG, APIs, HTTP, protocols and data representation
- Cloud and infrastructure as code
- Notions of symmetric and asymmetric cryptography, digital signatures and applications
- “Big data” systems (Hadoop, Spark)
- Software Engineering (classes, modularisation, versioning, containerisation, packaging, DevOps)
- Inferential Statistics (confidence intervals, hypothesis testing)
- Machine Learning algorithms for regression and classification
- Calculus and Numerical Calculus (integrals, derivatives)
- Natural Language Processing
- Computer vision
- Neural Networks
Please remember that this list covers only hard skills. Ethics, domain and industry knowledge, and communication are very important soft skills that don’t fit in this list.
Generally speaking, Data Analysts sit at the beginning of the list (up to ≈11), Data Engineers reach the middle of the list (up to ≈18), and Data Scientists cover the whole list.
There is also the following graph that I’ve produced: