This repository holds data extraction, processing, and exploration code for Min-Yang's black sea bass projects.
It includes a datapull from CAMS and other sources, data exploration, and moderate amounts of data processing that is (hopefully) general to all projects.
Code to extract data from NEFSC Oracle databases will need to be run by a user with access. This code can be found in:

├── READ-SSB-Lee-BSB-DataPull/
│   ├── R_code/
│   │   └── data_extraction_processing/
│   │       └── extraction/
│   └── stata_code/
│       └── data_extraction_processing/
│           └── extraction/
All results of the data extraction code will be put into:

├── READ-SSB-Lee-BSB-DataPull/
│   └── data_folder/
│       └── raw/
This code supports:

- "Economic-informed stock assessments": Because the size of an individual fish determines its price, we can invert this relationship to help fill gaps when fish lengths are not sampled. There are five prevailing BSB market categories: Jumbo, Large, Medium, Small, and Unclassified. From 2020 to 2023, 5 to 10% of commercial landings fell in the "Unclassified" market category, but no fish in this category were measured. We train a Random Forest model on transactions data from 2015-2024 and use the results to predict the size class of the Unclassified market category.
- "Catch shares, Environmental variation, and Port choice": Regulations differ by state. Three states have a catch share program; the others do not, and those states have a wide range of possession limits. Gear restrictions, mostly mesh size (trawl) or vent size (pot), are similar but also vary by state. How does the intersection of these regulations and changes in biomass due to environmental variation affect where people fish, how productive they are, and where they land their catch? This may be 2 or 3 projects.
- Other projects
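The Random Forest idea above rests on the fact that price per pound carries information about size class. As a language-agnostic illustration of that inversion (not the project's actual model, which is a Random Forest fit to 2015-2024 transactions in Stata), here is a toy nearest-centroid classifier; all prices below are made up:

```python
# Illustrative only: a toy stand-in for the project's Random Forest.
# It guesses a size class for an "Unclassified" transaction by finding
# the market category with the closest (hypothetical) mean price.

TRAIN = {  # hypothetical mean price per pound by market category
    "JB": 6.50,  # Jumbo
    "LG": 5.25,  # Large
    "MD": 4.10,  # Medium
    "SQ": 2.90,  # Small
}

def predict_size_class(price_per_lb):
    """Assign a transaction to the category whose mean price is closest."""
    return min(TRAIN, key=lambda cat: abs(TRAIN[cat] - price_per_lb))

if __name__ == "__main__":
    print(predict_size_class(6.40))  # JB
    print(predict_size_class(3.00))  # SQ
```

The real model uses many more predictors than price alone; this sketch only shows why the price-size relationship is invertible at all.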
Folder structure is mostly borrowed from the World Bank's EDB (https://dimewiki.worldbank.org/wiki/Stata_Coding_Practices). Try to use forward slashes (that is, C:/path/to/your/folder) instead of backslashes for Unix/Mac compatibility.
Your life will be easier if you organize things into a BSB_mega_folder because there are a few linked projects.
BSB_mega_folder/
├── READ-SSB-Lee-BSB-DataPull/ # Data pull, explore, background
│   ├── data_folder/ # Shared data
│   │   ├── raw/
│   │   ├── external/
│   │   ├── internal/
│   │   ├── intermediate/
│   │   └── main/
│   ├── R_code/
│   ├── stata_code/
│   └── more stuff/
├── READ-SSB-Lee-BlackSeaBass/ # Prices-in-stock-assessment repository
│   ├── READ-SSB-Lee-BlackSeaBass.Rproj
│   ├── data_folder/
│   │   ├── data_raw/ # Raw data (minimal)
│   │   └── data_main/ # Final data specific to this project
│   ├── results/
│   ├── R_code/
│   ├── stata_code/
│   └── README.md
└── PortChoice/ # Port choice repository
    ├── PortChoice.Rproj
    ├── data_folder/
    │   ├── data_raw/ # Raw data (minimal)
    │   └── data_main/ # Final data specific to this project
    ├── results/
    ├── R_code/
    ├── stata_code/
    └── README.md
Add this line to the profile.do that Stata executes on startup:

global my_project_name "full/path/to/stata_code/project_logistics/folder_setup_globals.do"

The path to a Stata do-file containing folder names is stored as a global macro in the startup profile.do. This lets me start working on any of my projects by opening Stata and typing:

do $my_project_name
RStudio users working in projects don't have to do this step.
Before running any code, ensure the following are in place:

- Stata 15.1 or later
- ODBC connection to NEFSC/GARFO Oracle databases (requires an Oracle client with ODBC drivers and NEFSC network access or VPN)
- Database credentials stored in your `profile.do` as `$myNEFSC_USERS_conn` (see Data, Oracle, passwords)
- FRED API access for `extract_data_from_FRED.do` (requires internet access, a freely obtained FRED API key, and setting that key in Stata with `set fredkey`)
- MRIP raw data files in `$data_raw`: files named `catch_${year}*.dta`, `trip_${year}*.dta`, and `size_b2_${year}*.dta` (required only for Step 3) [TO DOCUMENT: add source/download instructions for MRIP data preparation]
- Custom ado file `vintage_lookup_and_reset.ado`: already included in `stata_code/ado/`; loaded automatically by `folder_setup_globals.do`
STEP 0 — Setup (required before anything else)
do stata_code/project_logistics/folder_setup_globals.do
Sets all directory globals and the vintage date string.
STEP 1 — Commercial data extraction (run 1A and 1B; order between them
does not matter, but both must finish before Step 2)
1A. do stata_code/data_extraction_processing/extraction/commercial/00_cams_extraction.do
Pulls CAMS landings, subtrip, and orphan records (1996–present).
Runtime: 1–2 hours.
1B. do stata_code/data_extraction_processing/extraction/commercial/01_extraction_wrapper.do
Pulls all other commercial data (15 scripts: permits, dealers,
transactions, gear, locations, FRED deflators, etc.).
Runtime: 30–60 minutes.
STEP 2 — Exploratory analysis (requires Step 1B outputs)
*** FIRST: open 00_exploratory_analysis_wrapper.do and update the
global in_string on line 1 to match the vintage string from your
Step 1 extraction run (format: YYYY_MM_DD). ***
do stata_code/analysis/00_exploratory_analysis_wrapper.do
Produces 70+ exploratory graphs in images/exploratory/.
Runtime: 10–20 minutes.
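The vintage string is just the extraction run date in the stated YYYY_MM_DD format. As a hypothetical illustration of that format (the repository itself builds this string in Stata, not Python):

```python
# Hypothetical helper showing the YYYY_MM_DD vintage-string format that
# global in_string must match; the repository sets this in Stata.
from datetime import date

def vintage_string(d):
    """Format a date as a YYYY_MM_DD vintage string."""
    return d.strftime("%Y_%m_%d")

print(vintage_string(date(2024, 7, 1)))  # 2024_07_01
```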
STEP 3 — Recreational data processing (independent of Steps 1–2)
Ensure MRIP raw files are in $data_raw before running.
do stata_code/data_extraction_processing/processing/recreational/batch_file_to_process_monthly_mrip_data.do
These files exist in the repository but are not called by any wrapper:

- `stata_code/data_extraction_processing/processing/recreational/domain_catch_frequencies_gom_month.do`: experimental/deprecated; the call in the batch file is commented out
- `stata_code/data_extraction_processing/extraction/commercial/tack_on_captains_and_ports.do`: bridges data from an external "mobility" project; requires globals not defined in this repo; safe to ignore for the core BSB pipeline
| Issue | File | Action Required |
|---|---|---|
| Hardcoded vintage string | `00_exploratory_analysis_wrapper.do` line 1 | Update `global in_string` to match your extraction run date before Step 2 |
There are a handful of domain-specific codes in use. It would be better to pull them from the Oracle lookup tables, but I didn't do that because this felt like a one-off project. This section documents the domain-specific codes used throughout the codebase; they appear in filtering and data-cleaning logic across 15+ files.
| ITIS TSN | Common Name | Used In |
|---|---|---|
| 167687 | Black Sea Bass (Centropristis striata) | 12+ files (primary filter) |
| 172735 | Summer Flounder | sfbsb_daily.do |
NESPP3 code for BSB: 335. [TO DOCUMENT: full NESPP3 list if needed]
The negear field contains NEFSC gear codes. Analysis scripts bin these into
five final categories. Source: stata_code/analysis/bsb_exploratory.do lines 53–92.
| Category | negear values |
|---|---|
| LineHand | 10, 20, 21, 30, 34, 40, 60, 62, 65, 66, 90, 220–230, 250, 251, 330, 340, 380, 410, 414, 420 |
| Trawl | 50–59, 71, 150, 160, 170, 350, 351, 353, 370, 450 |
| Gillnet | 100–117, 500, 520 |
| PotTrap | 80, 140, 142, 180–212, 240, 260, 270, 300–301, 320, 322 (includes weirs and pounds) |
| Misc | Dredge (381–383, 132, 400), Seine (70, 71, 120–124, 160, 360), Unknown (999) |
Dredge, Seine, and Unknown are first assigned their own categories, then rebinned into
Misc. The final analysis uses five categories: LineHand, Trawl, Gillnet, PotTrap, Misc.
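As a cross-check of the table above, the binning can be expressed as a small pure-Python function. This is an illustrative translation, not the repository's implementation (which is Stata code in `bsb_exploratory.do`); note that codes 71 and 160 appear under both Trawl and Seine in the table, and this sketch resolves them to Trawl, which is an assumption:

```python
# Illustrative translation of the negear binning table; the actual
# logic lives in stata_code/analysis/bsb_exploratory.do lines 53-92.

LINEHAND = {10, 20, 21, 30, 34, 40, 60, 62, 65, 66, 90,
            *range(220, 231), 250, 251, 330, 340, 380, 410, 414, 420}
TRAWL    = {*range(50, 60), 71, 150, 160, 170, 350, 351, 353, 370, 450}
GILLNET  = {*range(100, 118), 500, 520}
POTTRAP  = {80, 140, 142, *range(180, 213), 240, 260, 270, 300, 301, 320, 322}
# Dredge, Seine, and Unknown get their own categories first, then are
# rebinned into Misc; codes shared with Trawl (71, 160) are assumed
# to resolve to Trawl here.
MISC     = {381, 382, 383, 132, 400,             # dredge
            70, 71, *range(120, 125), 160, 360,  # seine
            999}                                 # unknown

def gear_category(negear):
    """Map a negear code to one of the five final gear categories."""
    for cat, codes in (("LineHand", LINEHAND), ("Trawl", TRAWL),
                       ("Gillnet", GILLNET), ("PotTrap", POTTRAP),
                       ("Misc", MISC)):
        if negear in codes:
            return cat
    return None  # unmapped code
```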
Gear and market category definitions are embedded directly in the analysis scripts.
`stata_code/analysis/bsb_exploratory.do` is the primary reference (lines 53–97). The same logic appears in `prices_by_category.do` (market rebinning only) and `bsb_exploratory_dealers.do` (market rebinning, with Pee Wee kept as Extra Small rather than folded into Small).
BSB is sold in five size-based market categories. Raw dealer records contain
additional codes that are rebinned during processing.
Source: stata_code/analysis/bsb_exploratory.do lines 78–97.
| Final Code | Final Description | Raw Codes Rebinned In |
|---|---|---|
| JB | Jumbo | JB, XG (Extra Large) |
| LG | Large | LG |
| MD | Medium | MD, Medium Or Select |
| SQ | Small | SQ, PW (Pee Wee), ES (Extra Small) |
| UN | Unclassified | UN, MX (Mixed or Unsized) |
The stock assessment uses "SMALL.COMB" for Small Combined.
The Unclassified category (5–10% of landings 2020–2023) is the focus of the stock assessment price-prediction work.
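The rebinning in the table above can be written as a simple lookup. This is an illustrative Python version of the `bsb_exploratory.do` variant (in `bsb_exploratory_dealers.do`, PW is kept as Extra Small instead); the "Medium Or Select" raw description is omitted because its raw code is not listed in the table:

```python
# Illustrative rebinning map from the market-category table; the
# repository does this in Stata (bsb_exploratory.do lines 78-97).
MARKET_REBIN = {
    "JB": "JB", "XG": "JB",              # Jumbo; XG = Extra Large
    "LG": "LG",                          # Large
    "MD": "MD",                          # Medium
    "SQ": "SQ", "PW": "SQ", "ES": "SQ",  # Small; PW = Pee Wee, ES = Extra Small
    "UN": "UN", "MX": "UN",              # Unclassified; MX = Mixed or Unsized
}

def rebin_market(raw_code):
    """Collapse a raw dealer market code into one of the five final codes."""
    return MARKET_REBIN.get(raw_code, raw_code)  # pass unknown codes through
```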
Used in commercial_BSB.do and bsb_vessel_explorations.do to distinguish
state-permitted from federally-permitted vessels.
| Permit Value | Type | Notes |
|---|---|---|
| 000000 | State (no federal permit) | CAMSID constructed from permit+hullid+dealer fields; excluded from apportionment |
| 190998 | Vessel size class A | Dropped from vessel-level analysis |
| 290998 | Vessel size class B | Dropped from vessel-level analysis |
| 390998 | Vessel size class C | Dropped from vessel-level analysis |
| 490998 | Vessel size class D | Dropped from vessel-level analysis |
| All others | Federal | 6-digit federal permit number |
The 998 permits correspond to vessels with an unknown or missing permit that fall into a particular size bin.
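The permit table above amounts to a three-way classification. A hypothetical pure-Python version, for illustration only (the repository does this in Stata in `commercial_BSB.do` and `bsb_vessel_explorations.do`):

```python
# Illustrative classification of CAMS permit values per the table above.
SIZE_CLASS_PERMITS = {"190998": "A", "290998": "B", "390998": "C", "490998": "D"}

def permit_type(permit):
    """Classify a zero-padded 6-character permit string.

    Returns 'state', 'unknown-size-class', or 'federal'.
    """
    if permit == "000000":
        return "state"               # no federal permit; excluded from apportionment
    if permit in SIZE_CLASS_PERMITS:
        return "unknown-size-class"  # dropped from vessel-level analysis
    return "federal"                 # ordinary 6-digit federal permit number
```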
The status field in CAMS records describes how dealer (CFDERS) and vessel
trip (VTR) records were matched.
Source: stata_code/analysis/bsb_exploratory_dealers.do lines 155–171.
| Status Code | Meaning |
|---|---|
| MATCH | Records fully match at the CAMSID–ITIS_GROUP1 level |
| DLR_ORPHAN_SPECIES | Matching CAMSID but ITIS_GROUP1 in CFDERS does not appear on the VTR |
| DLR_ORPHAN_TRIP | Dealer trip with no matching VTR trip |
| VTR_ORPHAN_SPECIES | Matching CAMSID but ITIS_GROUP1 on VTR does not appear in CFDERS |
| VTR_ORPHAN_TRIP | VTR trip with no matching trip in CFDERS |
| VTR_NOT_SOLD | VTR record for bait/home consumption; not sold to dealer; not in CFDERS |
| PZERO | PERMIT = '000000'; excluded from apportionment and imputation |
In order to run this code, you need to be able to select from various NEFSC Oracle tables. For Stata, you will need to assemble an Oracle connection string into the global `myNEFSC_USERS_conn`; the best place to do that is the profile.do that runs on startup. Store credentials somewhere that does not get uploaded to GitHub.

For Stata users, there is a description here.

For R users, try setting and storing credentials in a keyring with `keyring::key_set()`; you can read them back with `keyring::key_get()`.

If you can encrypt your .Rprofile, that is another solution for passwords, API keys, and tokens.
This repository is a scientific product and is not official communication of the National Oceanic and Atmospheric Administration, or the United States Department of Commerce. All NOAA GitHub project code is provided on an 'as is' basis and the user assumes responsibility for its use. Any claims against the Department of Commerce or Department of Commerce bureaus stemming from the use of this GitHub project will be governed by all applicable Federal law. Any reference to specific commercial products, processes, or services by service mark, trademark, manufacturer, or otherwise, does not constitute or imply their endorsement, recommendation or favoring by the Department of Commerce. The Department of Commerce seal and logo, or the seal and logo of a DOC bureau, shall not be used in any manner to imply endorsement of any commercial product or activity by DOC or the United States Government.
- who worked on this project: Min-Yang Lee
- when this project was created: Summer 2024
- what the project does: Black Sea Bass related projects
- why the project is useful: Black Sea Bass is awesome
- how users can get started with the project: download it and follow the README
- where users can get help with the project: email Min-Yang.Lee@noaa.gov or open an issue
- who maintains and contributes to the project: Min-Yang
See here for the license file