Consolidate and clean-up Fetch CSV file interactions

## Problem
1. Only the GCS fetch script benefits from CSV file initialization. The arXiv script shouldn't have copied this pattern. It should be removed:
  - https://github.com/creativecommons/quantifying/blob/34c1caaf4f776e99bfbea9de4ac0bfe5ae4e9f4c/scripts/1-fetch/arxiv_fetch.py#L128-L135
2. The various fetch scripts duplicate a lot of code between them when they save their data:
   - https://github.com/creativecommons/quantifying/blob/34c1caaf4f776e99bfbea9de4ac0bfe5ae4e9f4c/scripts/1-fetch/arxiv_fetch.py#L475-L485
   - https://github.com/creativecommons/quantifying/blob/34c1caaf4f776e99bfbea9de4ac0bfe5ae4e9f4c/scripts/1-fetch/gcs_fetch.py#L181-L185
   - https://github.com/creativecommons/quantifying/blob/34c1caaf4f776e99bfbea9de4ac0bfe5ae4e9f4c/scripts/1-fetch/github_fetch.py#L92-L98
   - https://github.com/creativecommons/quantifying/blob/34c1caaf4f776e99bfbea9de4ac0bfe5ae4e9f4c/scripts/1-fetch/openverse_fetch.py#L195-L207
   - https://github.com/creativecommons/quantifying/blob/34c1caaf4f776e99bfbea9de4ac0bfe5ae4e9f4c/scripts/1-fetch/smithsonian_fetch.py#L98-L121
   - https://github.com/creativecommons/quantifying/blob/34c1caaf4f776e99bfbea9de4ac0bfe5ae4e9f4c/scripts/1-fetch/wikipedia_fetch.py#L78-L91

## Description
1. Add `rows_to_csv()` function to shared library (`shared.py`)
   - New function should check `args.enable_save`
   - New function should "Create data directory for this phase"
   - New function _shouldn't_ `return args`
     - None of the current functions that return `args` modify `args`, so there's no reason to return it
   - GCS fetch script only writes a single row, but it can send a list with a single row
   - Update fetch scripts to use new function
   - Test fetch scripts to verify they behave as intended
2. Rename `data_to_csv()` function to `dataframe_to_csv()`
   - Update process scripts to use new name
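
The requirements in item 1 could be sketched roughly as follows. This is a minimal sketch, not the final implementation: the signature `rows_to_csv(args, fieldnames, rows, file_path)` and the choice to create the parent directory of `file_path` (rather than referencing `PATHS["data_phase"]` directly) are assumptions made so the example is self-contained.

```python
import csv
import os


def rows_to_csv(args, fieldnames, rows, file_path):
    """Write rows (a list of dicts) to file_path as a unix-dialect CSV."""
    # Do nothing unless saving is enabled; note that args is not returned
    if not args.enable_save:
        return

    # Create data directory for this phase
    # (assumption: the phase directory is the parent of file_path)
    os.makedirs(os.path.dirname(os.path.abspath(file_path)), exist_ok=True)

    with open(file_path, "w", encoding="utf-8", newline="\n") as file_handle:
        writer = csv.DictWriter(
            file_handle, fieldnames=fieldnames, dialect="unix"
        )
        writer.writeheader()
        writer.writerows(rows)
```

With something like this in place, a script that produces a single row could call `rows_to_csv(args, fieldnames, [row], file_path)`. The sketch always opens the file in `"w"` mode, like the `write_data()` functions it would replace; whether the GCS script also needs an append mode is left open here.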

## Additional context
- [Abstraction principle (computer programming) - Wikipedia](https://en.wikipedia.org/wiki/Abstraction_principle_(computer_programming))


	def initialize_data_file(file_path, headers):
	    """Initialize CSV file with headers if it doesn't exist."""
	    if not os.path.isfile(file_path):
	        with open(file_path, "w", encoding="utf-8", newline="\n") as file_obj:
	            writer = csv.DictWriter(
	                file_obj, fieldnames=headers, dialect="unix"
	            )
	            writer.writeheader()

	def rows_to_csv(args, fieldnames, rows, file_path):
	    if not args.enable_save:
	        return args

	    with open(file_path, "w", encoding="utf-8", newline="\n") as file_handle:
	        writer = csv.DictWriter(
	            file_handle, fieldnames=fieldnames, dialect="unix"
	        )
	        writer.writeheader()
	        for row in rows:
	            writer.writerow(row)

	with open(file_path, "a", encoding="utf-8", newline="\n") as file_obj:
	    writer = csv.DictWriter(
	        file_obj, fieldnames=fieldnames, dialect="unix"
	    )
	    writer.writerow(row)

	with open(FILE1_COUNT, "w", encoding="utf-8", newline="\n") as file_obj:
	    writer = csv.DictWriter(
	        file_obj, fieldnames=HEADER1_COUNT, dialect="unix"
	    )
	    writer.writeheader()
	    for row in tool_data:
	        writer.writerow(row)

	def write_data(args, data):
	    if not args.enable_save:
	        return
	    os.makedirs(PATHS["data_phase"], exist_ok=True)
	    with open(FILE_PATH, "w", encoding="utf-8", newline="") as file_obj:
	        writer = csv.DictWriter(
	            file_obj,
	            fieldnames=OPENVERSE_FIELDS,
	            dialect="unix",
	        )
	        writer.writeheader()
	        for row in data:
	            writer.writerow(row)


	def write_data(args, data_metrics, data_units):
	    if not args.enable_save:
	        return args

	    # Create data directory for this phase
	    os.makedirs(PATHS["data_phase"], exist_ok=True)

	    with open(FILE_1_METRICS, "w", encoding="utf-8", newline="\n") as file_obj:
	        writer = csv.DictWriter(
	            file_obj, fieldnames=HEADER_1_METRICS, dialect="unix"
	        )
	        writer.writeheader()
	        for row in data_metrics:
	            writer.writerow(row)

	    with open(FILE_2_UNITS, "w", encoding="utf-8", newline="\n") as file_obj:
	        writer = csv.DictWriter(
	            file_obj, fieldnames=HEADER_2_UNITS, dialect="unix"
	        )
	        writer.writeheader()
	        for row in data_units:
	            writer.writerow(row)

	    return args

	def write_data(args, tool_data):
	    if not args.enable_save:
	        return args
	    LOGGER.info("Saving fetched data")
	    os.makedirs(PATHS["data_phase"], exist_ok=True)

	    with open(FILE_LANGUAGES, "w", encoding="utf-8", newline="\n") as file_obj:
	        writer = csv.DictWriter(
	            file_obj, fieldnames=HEADER_LANGUAGES, dialect="unix"
	        )
	        writer.writeheader()
	        for row in tool_data:
	            writer.writerow(row)
	    return args
