Assessing the Quality of Massive Spectroscopic Surveys

Massive spectroscopic surveys that aim to observe tens of millions of stars and galaxies are becoming increasingly common in the observational landscape of the 2020s. For instance, a single night of observation with the Dark Energy Spectroscopic Instrument (DESI) can generate as many as 100,000 spectra, each sampled over approximately 2,000 wavelength points. Assessing the quality of such a massive data flow is challenging and requires new approaches to complement traditional visual inspection by humans.


To address this challenge, in this project we explored the use of the Uniform Manifold Approximation and Projection (UMAP) technique to assess the quality of DESI data. Specifically, we used UMAP to project DESI nightly data into a 2-dimensional space, where we are sometimes able to identify a small number of outliers. Upon visual inspection, we found that these outliers correspond to instrument fluctuations that can be fully diagnosed by examining the raw data, leading to an appropriate solution through data reprocessing. These findings pave the way for using machine learning techniques to automatically monitor the health of massive spectroscopic surveys.
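
As a rough illustration of the workflow described above, the following Python sketch projects a night of spectra to two dimensions and flags isolated points for visual inspection. It is a minimal sketch under stated assumptions, not the pipeline used in this project: it assumes the umap-learn package is installed, the function names project_spectra and flag_outliers are hypothetical, and both the per-spectrum scaling and the nearest-neighbour outlier score are illustrative choices that the text above does not specify.

```python
import numpy as np


def project_spectra(flux, n_neighbors=15, min_dist=0.1, random_state=0):
    """Project an (n_spectra, n_wavelengths) flux array to 2-D with UMAP."""
    import umap  # provided by the umap-learn package; imported lazily

    flux = np.asarray(flux, dtype=float)
    # Assumed preprocessing: divide each spectrum by its median absolute flux
    # so that overall brightness does not dominate the distances.
    scale = np.median(np.abs(flux), axis=1, keepdims=True)
    scaled = flux / np.where(scale > 0, scale, 1.0)

    reducer = umap.UMAP(
        n_components=2,
        n_neighbors=n_neighbors,
        min_dist=min_dist,
        random_state=random_state,
    )
    return reducer.fit_transform(scaled)


def flag_outliers(embedding, k=5, n_sigma=5.0):
    """Flag points whose mean distance to their k nearest neighbours is large.

    This is an assumed, simple criterion for picking candidates to inspect
    by eye; it is not taken from the project itself.
    """
    embedding = np.asarray(embedding, dtype=float)
    # Dense pairwise distances: fine for a few thousand points, but O(n^2)
    # in memory, so a full night would need a tree-based neighbour search.
    diff = embedding[:, None, :] - embedding[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the zero self-distance; skip it.
    knn = np.sort(dist, axis=1)[:, 1:k + 1]
    score = knn.mean(axis=1)
    # Robust threshold: median plus n_sigma times the scaled MAD.
    med = np.median(score)
    mad = 1.4826 * np.median(np.abs(score - med))
    return score > med + n_sigma * mad
```

With a flux array of shape (n_spectra, n_wavelengths), one would call embedding = project_spectra(flux) and then flag_outliers(embedding) to obtain a boolean mask of candidate outliers. The flagged spectra would still need to be checked by eye and against the raw data, as described above.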

For more information, you can watch this talk on YouTube, which was presented at the IAUGA 2022.

Additionally, on this page you can help us identify the features that cause DESI spectra to be classified as outliers. These spectra were identified during the observation night of 6 July 2021 (20210706).