Table 1
Skills of a Data Scientist (Udacity).
| Skills of a Data Scientist (Udacity) | |
|---|---|
| Basic Tools | Data Munging |
| Basic Statistics | Data Visualization & Communication |
| Machine Learning | Software Engineering |
| Multivariable Calculus and Linear Algebra | Thinking Like a Data Scientist |
Table 2
Technical skills and tools of a Data Scientist (Master’s in Data Science).
| Technical skills and tools of a Data Scientist (Master’s in Data Science) |
|---|
| Math (e.g. linear algebra, calculus and probability) |
| Statistics (e.g. hypothesis testing and summary statistics) |
| Machine learning tools and techniques (e.g. k-nearest neighbors, random forests, ensemble methods, etc.) |
| Software engineering skills (e.g. distributed computing, algorithms and data structures) |
| Data mining |
| Data cleaning and munging |
| Data visualization (e.g. ggplot and d3.js) and reporting techniques |
| Unstructured data techniques |
| R and/or SAS languages |
| SQL databases and database querying languages |
| Python (most common), C/C++, Java, Perl |
| Big data platforms like Hadoop, Hive & Pig |
| Cloud tools like Amazon S3 |
Table 3
Sampling of science research techniques being used.
| Science Research Technologies (Sampling) | | |
|---|---|---|
| In Atmospheric Research | In Atmospheric Research (cont.) | In Hydrology Research |
| Correlation Analysis; Bias Correction | Spectral Analysis | Linear Regression |
| Regression Analysis; Bivariate Regression | Temporal Trending; Trend Analysis | Monte Carlo |
| Decision Tree | Spatial Interpolation | Darcy Equation |
| Machine Learning | Revised Averaging Scheme | Poisson Regression |
| Data Mining | Forward Modeling; Inverse Modeling | Multi-variate time series analysis |
| Data Fusion | Radiative Transfer Model | Budyko formula |
| Computational Tools | Bayesian Synthesis Inversion | Smoothing (Gaussian) |
| Constrained Variational Analysis | Temporal Stability | Filtering (Destriping) |
| Model Simulations | Gaussian Distribution | MESH Model |
| Ratios | Exponential Differentiation | |
| Time Series Analysis | | |
Table 4
Earth Science data analytics goals.
| Earth Science data analytics goals |
|---|
| To calibrate data |
| To validate data (note it does not have to be via data intercomparison) |
| To assess data quality |
| To perform coarse data preparation (e.g. subsetting data, mining data, transforming data, recovering data) |
| To intercompare datasets (i.e. any data intercomparison; could be used to better define validation/quality) |
| To tease out information from data |
| To glean knowledge from data and information |
| To forecast/predict/model phenomena (i.e. a special kind of conclusion) |
| To derive conclusions (i.e. that do not easily fall into another type) |
| To derive new analytics tools |
Table 5
Earth science data analytics techniques (sampling).
| Data Preparation | Data Reduction | Data Analysis |
|---|---|---|
| Bias Correction | Aggregation | Anomaly Detection |
| Coordinate Transformation | Anomaly Detection | Bayesian Techniques |
| Data Engineering | Cluster Analysis | Bivariate Regression |
| Data Mining | Data Engineering | Classification |
| Data Munging | Data Fusion | Correlation/Regression Analysis |
| Database Management | Factor Analysis | Factor Analysis |
| Exponential Differentiation | Filtering | Fourier Analysis |
| Filtering | Neural Networks | Gaussian Distribution |
| Format Conversion | Outlier Removal | Graphics Analysis |
| Imputation | Ratios | Imputation |
| Normalization/Transformation | Revised Averaging Scheme | Linear/Non-linear Regression |
| Outlier Removal | Rule Learning | Machine Learning/Decision Tree |
| Ratios | Time Series | Mathematics/Calculus |
| Rule Learning | Visualization | Modeling |
| Sensitivity Analysis | | Monte Carlo Method |
| Smoothing | | Multi-variate Time Series |
| Spatial Interpolation | | Normalization |
| Time Series | | Pattern Recognition |
| Visualization | | Principal Component Analysis |
| | | Revised Averaging Scheme |
| | | Rule Learning |
| | | Signal Processing |
| | | Spectral Analysis |
| | | Statistics |
| | | Temporal Trend Analysis |
| | | Time Series |
| | | Visualization |
