This technical short course, facilitated by the EarthScope Consortium, introduces participants to MsPASS, a powerful, modern framework for processing seismology data on desktop, High-Performance Computing (HPC), and cloud systems. The course teaches seismologists how to manage waveform, source, and receiver metadata effectively using MongoDB. It also helps students understand modern computing cluster concepts and learn how to use MsPASS to process large data sets in parallel. Students will first run background tutorials on the GeoLab gateway, then run a larger processing job on the TACC Frontera HPC system using a recently developed SCOPED Gateway. The final session will be devoted to cloud computing and to working with prototype systems that directly access the EarthScope seismology data archive on AWS.
Time: A mixture of lectures, tutorials, and office hours spread over a 3- to 5-day period.
Primary Audience: Seismology graduate students, post-doctoral scholars, and early-career scientists.
Secondary Audience: More senior seismologists and other geophysicists interested in big data problems.
Learning Objectives
By the end of this course, learners will be able to:
- Understand the fundamentals of MongoDB for managing waveform, station, and source metadata.
- Access and process EarthScope seismic data stored on AWS S3.
- Use cloud and HPC computing environments for large-scale seismology data analysis.
- Use Python and Jupyter notebooks proficiently for seismic data processing.
- Implement data workflows using MsPASS for reproducible research.
Participant Commitment
- Attendance at all sessions is required.
- Students should commit at least 2 additional hours per day for homework assignments.
Prerequisites
- Interest in big data problems in seismology and a background sufficient to comprehend concepts used in MsPASS.
- Familiarity with the Python programming language is important for success in the course. Students with less experience in Python will have the opportunity to complete a refresher before the course begins.
- Enough proficiency with Jupyter notebooks to work effectively on GeoLab.
- A practical understanding of basic signal processing.
- Experience with basic data analysis using other common tools such as SAC and ObsPy.
Computer and Data
- Internet connection sufficient for Zoom-based instruction and connectivity to GeoLab and the TACC Gateway.
- A desktop computer (Windows, Mac, or Linux) with at least 8 GB of RAM is recommended, but not required, for running MsPASS locally, either through Docker or in a local Python environment managed with Anaconda.
Brief Agenda
Prior to the start of the course, students will be required to complete three tutorials that provide a common background in the following areas:
- Python programming fundamentals
- Generic database concepts and a review of methods previously used in seismology for metadata and waveform data management
- Fundamental concepts of parallel computing
The agenda for the week of the course is as follows:
Day 1 | Lecture: MsPASS overview; Tutorial: Running a simple workflow on GeoLab; Lecture: Using MongoDB to manage waveform data and related Metadata; Tutorial: MongoDB and MsPASS |
Day 2 | Office Hours: Homework overview and help session |
Day 3 | Lecture: Parallel computing and HPC clusters; Tutorial: Running a large parallel job on Frontera; Lecture: Generating a parallel workflow from a serial workflow; Tutorial: Adapting the serial program from Day 1 into a parallel workflow to run on Frontera |
Day 4 | Office Hours: Homework overview and help session |
Day 5 | Lecture: Cloud computing and the MsPASS cloud prototype; Tutorial: Creating and running a parallel job on GeoLab that directly accesses the EarthScope archive; Lecture: Review of Data Service plans for access and how MsPASS will work with the new system as it is brought fully online |
After the final session, students will have the opportunity to carry out an independent project that applies MsPASS to their own research. Students can complete that work on either a local desktop machine or GeoLab, as appropriate.
Assessment
- Homework assignments graded after each session.
- Final project option for interested participants.
- Scheduled help sessions via Zoom for additional support.
Instructors
- Gary Pavlis, Indiana University
- Ian (Yinzhi) Wang, The University of Texas at Austin
- Sarah Wilson, EarthScope