Astronomy in the Big Data Era

By: Yanxia Zhang and Yongheng Zhao
Open Access | May 2015

Figures & Tables

Table 1

Data volumes of different sky survey projects.

| Sky survey project | Data volume |
| --- | --- |
| DPOSS (The Palomar Digital Sky Survey) | 3 TB |
| 2MASS (The Two Micron All-Sky Survey) | 10 TB |
| GBT (Green Bank Telescope) | 20 PB |
| GALEX (The Galaxy Evolution Explorer) | 30 TB |
| SDSS (The Sloan Digital Sky Survey) | 40 TB |
| SkyMapper Southern Sky Survey | 500 TB |
| PanSTARRS (The Panoramic Survey Telescope and Rapid Response System) | ~40 PB expected |
| LSST (The Large Synoptic Survey Telescope) | ~200 PB expected |
| SKA (The Square Kilometer Array) | ~4.6 EB expected |

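
Because Table 1 mixes TB, PB, and EB, the volumes are easier to compare on a single scale. The following is a minimal sketch that converts each entry to terabytes and ranks the surveys. It assumes decimal (SI) prefixes, which the table does not state, and the names `UNIT_TB`, `SURVEYS`, and `to_terabytes` are illustrative only.

```python
# Data volumes copied from Table 1; the PanSTARRS, LSST and SKA figures
# are the "expected" values, not measured ones.
# Assumption: decimal prefixes (1 PB = 1,000 TB, 1 EB = 1,000,000 TB).
UNIT_TB = {"TB": 1, "PB": 1_000, "EB": 1_000_000}

SURVEYS = [
    ("DPOSS", 3, "TB"),
    ("2MASS", 10, "TB"),
    ("GBT", 20, "PB"),
    ("GALEX", 30, "TB"),
    ("SDSS", 40, "TB"),
    ("SkyMapper", 500, "TB"),
    ("PanSTARRS", 40, "PB"),
    ("LSST", 200, "PB"),
    ("SKA", 4.6, "EB"),
]


def to_terabytes(value, unit):
    """Convert a (value, unit) pair to terabytes."""
    return value * UNIT_TB[unit]


volumes_tb = {name: to_terabytes(value, unit) for name, value, unit in SURVEYS}

# Rank surveys from smallest to largest volume.
ranked = sorted(volumes_tb.items(), key=lambda item: item[1])
for name, tb in ranked:
    print(f"{name:10s} {tb:>12,.0f} TB")
```

On this scale the expected SKA volume exceeds the DPOSS volume by roughly six orders of magnitude.
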
Table 2

Applied approaches as well as their applications for the main data mining tasks in astronomy.

| Data mining task | Applied approaches | Applications in astronomy |
| --- | --- | --- |
| Classification | Artificial Neural Networks (ANN); Support Vector Machines (SVM); Learning Vector Quantization (LVQ); Decision Trees; Random Forest; K-Nearest Neighbors; Naïve Bayesian Networks; Radial Basis Function Network; Gaussian Process; Decision Table; ADTree | Known knowns: spectral classification (stars, galaxies, quasars, supernovas); photometric classification (stars and galaxies, stars and quasars, supernovas); morphological classification of galaxies; solar activity |
| Regression | Artificial Neural Networks (ANN); Support Vector Regression (SVR); Decision Trees; Random Forest; K-Nearest Neighbor Regression; Kernel Regression; Principal Component Regression (PCR); Gaussian Process; Least Squares Regression; Partial Least Squares | Known unknowns: photometric redshifts (galaxies, quasars); stellar physical parameter measurement ([Fe/H], Teff, log g) |
| Clustering | Principal Component Analysis (PCA); DBSCAN; K-Means; OPTICS; Cobweb; Self-Organizing Map (SOM); Expectation Maximization; Hierarchical Clustering; AutoClass; Gaussian Mixture Modeling (GMM) | Unknown unknowns: classification; special/rare object detection |
| Outlier Detection or Anomaly Detection | Principal Component Analysis (PCA); K-Means; Expectation Maximization; Hierarchical Clustering; One-Class SVM | Unknown unknowns: special/rare object detection |
| Time-Series Analysis | Artificial Neural Networks (ANN); Support Vector Machines (SVM); Random Forest | Known unknowns: novelty detection; trend prediction |

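
To make one row of Table 2 concrete, the sketch below applies K-Nearest Neighbor Regression to a photometric-redshift style problem: the redshift of a query object is estimated as the mean redshift of its nearest neighbours in colour space. This is an illustration only, not the authors' method. The function `knn_regress`, the two-colour feature layout, and all numeric values are hypothetical toy choices, not survey data.

```python
import math


def knn_regress(train_features, train_targets, query, k=3):
    """Predict a target as the mean of the k nearest training targets."""
    if not 1 <= k <= len(train_features):
        raise ValueError("k must be between 1 and the training set size")
    # Pair each Euclidean distance to the query with the training target.
    distances = sorted(
        (math.dist(features, query), target)
        for features, target in zip(train_features, train_targets)
    )
    nearest = distances[:k]
    return sum(target for _, target in nearest) / k


# Hypothetical toy training set: two colour indices per galaxy and a
# known (spectroscopic) redshift. The numbers are invented for illustration.
colours = [(0.2, 0.1), (0.3, 0.2), (0.8, 0.6), (0.9, 0.7), (1.4, 1.1), (1.5, 1.2)]
redshifts = [0.05, 0.07, 0.30, 0.34, 0.70, 0.74]

# Estimate the redshift of an object that has colours but no spectrum.
z_phot = knn_regress(colours, redshifts, query=(0.85, 0.65), k=2)
print(f"estimated redshift: {z_phot:.2f}")
```

In practice the choice of `k` and of the distance metric would be tuned on a held-out sample with known redshifts.
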
Table 3

Feature selection/extraction methods.

| Feature selection/extraction | Applied approaches | Applications in astronomy |
| --- | --- | --- |
| Feature Selection | Best First; Exhaustive Search; Greedy Stepwise; Random Search; Rank Search; Race Search; Genetic Search; Random Forest; ReliefF; Fisher Filtering; other wrapper methods | Reducing dimension; choosing effective features |
| Feature Extraction | Principal Component Analysis (PCA); Independent Component Analysis (ICA); Linear Discriminant Analysis (LDA); Latent Semantic Indexing (LSI); Singular Value Decomposition (SVD); Multidimensional Scaling (MDS); Partial Least Squares (PLS); Locally Linear Embedding (LLE); ISOMAP; Factor Analysis; Kernel LDA; Kernel PCA; Kernel Partial Least Squares (KPLS) | Noise reduction/removal; reducing dimension |

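
As an illustration of the dimension-reduction use listed in Table 3, the sketch below extracts the first principal component of a small data set using power iteration on the sample covariance matrix. It is a minimal standard-library sketch under stated assumptions, not a production PCA. The function name `first_principal_component`, the fixed iteration count, and the toy measurements are hypothetical.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (unit direction, projected scores) for the leading component."""
    n, d = len(rows), len(rows[0])
    # Centre each feature on its mean.
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    # Power iteration: repeated multiplication by the covariance matrix
    # converges to its leading eigenvector when the top eigenvalue is distinct.
    vec = [1.0] * d
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    # Project the centred data onto the component: d features become 1 score.
    scores = [sum(r[j] * vec[j] for j in range(d)) for r in centred]
    return vec, scores


# Hypothetical toy data: three measured features per object, where the
# second is roughly twice the first and the third is nearly constant.
data = [
    (1.0, 2.0, 0.5),
    (2.0, 4.1, 0.4),
    (3.0, 6.0, 0.6),
    (4.0, 8.1, 0.5),
    (5.0, 9.9, 0.5),
]

direction, scores = first_principal_component(data)
print("direction:", [round(x, 3) for x in direction])
print("scores:", [round(s, 3) for s in scores])
```

Here three correlated features are summarised by a single score per object, which is the sense in which feature extraction reduces dimension.
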
Table 4

Astrostatistics and astroinformatics organizations.

| Organization | Parent community or project | Founded | Chair |
| --- | --- | --- | --- |
| International Astrostatistics Association (IAA) | The International Statistical Institute (ISI) | August 2012 | Joseph Hilbe |
| IAU Working Group in Astrostatistics and Astroinformatics | The International Astronomical Union (IAU) | August 2012 | Eric Feigelson |
| AAS Working Group in Astroinformatics and Astrostatistics | The American Astronomical Society (AAS) | June 2012 | Zeljko Ivezic |
| ASA Interest Group in Astrostatistics | The American Statistical Association (ASA) | March 2014 | Jessi Cisewski |
| LSST Informatics and Statistics Science Collaboration | The Large Synoptic Survey Telescope (LSST) | Under construction | Kirk Borne |
| IAA Working Group on Cosmostatistics (renamed Cosmostatistics Initiative, COIN for short) | The International Astrostatistics Association (IAA) | April 2014 | Rafael de Souza |

Language: English
Published on: May 22, 2015
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2015 Yanxia Zhang, Yongheng Zhao, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.