Updates to cec-dataprep #64

aunshx · 2024-11-26T02:09:39Z

Updates for FRREDSS 2.0

Updated schema for db
Two new files have been created to load the processed data into the db:

split_csv.py - Splits the processed data into county_year.csv files, reprocesses them (removes column discrepancies), and stores them in a split_files folder. This folder has been uploaded to Box. Run this script only once, as follows:
python split_csv.py path_of_the_processed_data_file.csv
process_uploads.py - Reads the files in the split_files folder, adds them to the treatedclusters db one at a time, and moves each uploaded file to the upload_completed folder. It also checks whether the county+year already exists. To run it:
python process_uploads.py path_of_the_split_files_folder

srkirkland · 2024-11-26T20:55:50Z

sql/db_tables.sql

 );

-create index idx_find_clusters
+-- Index on the treatedclusters table 
+CREATE TABLE idx_find_clusters


@aunshx this doesn't look right. I think it should be an index, not a table.

srkirkland · 2024-11-26T20:58:37Z

sql/db_tables.sql

+	land_use text, 
+	forest_type text, 
+	haz_class int4, 
+	"Stem6to9_tonsAcre" double precision, 


We should use the same naming as the rest of the variables, so stem6to9_tonsAcre for this one. It hopefully wouldn't affect the import, but if it does I still think it'd be better to handle the difference during import and not have 2 different naming schemes within one table.
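For what it's worth, handling the difference at import time could be as small as the sketch below. It assumes the CSV headers differ from the table columns only by a capitalized first letter (as in "Stem6to9_tonsAcre" vs stem6to9_tonsAcre); `to_db_column` and `rename_row` are hypothetical helper names, not anything that exists in this PR.

```python
def to_db_column(header):
    """Map a CSV header to a table column name by lowercasing its first letter.

    Assumption: headers and columns differ only in that first character.
    """
    return header[:1].lower() + header[1:]


def rename_row(row):
    """Return a copy of a header-keyed row (e.g. from csv.DictReader)
    with every key converted to the table's naming scheme."""
    return {to_db_column(key): value for key, value in row.items()}
```

Headers that already start with a lowercase letter (land_use, forest_type, ...) pass through unchanged, so the same function can be applied to every column.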

srkirkland · 2024-11-26T20:59:31Z

sql/db_tables.sql

+	"Stem9Plus_tonsAcre" double precision, 
+	"Branch_tonsAcre" double precision, 
+	"Foliage_tonsAcre" double precision,
+	wood_density float4


Thinking maybe make this double precision like the others. I'm sure it could technically fit in a float4 but I don't think the difference is consequential so better to pick one and go with it.

srkirkland · 2024-11-26T21:01:18Z

process_uploads.py

+    for filename in csv_files:
+        file_path = os.path.join(split_dir, filename)
+
+        county, year = filename.replace('.csv', '').split('_')


probably worth checking you got back real values for these -- otherwise an extra file put into this directory will kill the whole thing
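Something along these lines might work as a guard. It is only a sketch: the pattern assumes names of the form county_year.csv with a four-digit year, and `parse_split_filename` is a hypothetical helper, not existing code.

```python
import re

# Assumed naming scheme: "<county>_<year>.csv" with a 4-digit year.
# The greedy county group lets county names contain underscores.
SPLIT_FILE_RE = re.compile(r"^(?P<county>.+)_(?P<year>\d{4})\.csv$")


def parse_split_filename(filename):
    """Return (county, year) for a well-formed name, or None otherwise."""
    match = SPLIT_FILE_RE.match(filename)
    if match is None:
        return None
    return match.group("county"), int(match.group("year"))
```

The loop could then skip (and log) any file for which this returns None instead of raising. It also sidesteps a second failure mode of `split('_')`: a name with more than one underscore produces more than two parts, and unpacking into `county, year` raises a ValueError.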

srkirkland · 2024-11-26T21:06:06Z

split_csv.py

+            row = row[:15] + row[16:]  # Remove the extra field
+
+        county = row[13]
+        year = row[2]


With the csv reader, is there not a way to get the year and county by header name, instead of needing to make sure the year is always in column 3?
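A rough sketch of what I mean, using `csv.DictReader` from the standard library. The header names "County" and "Year" are assumptions (I haven't checked what the processed file actually calls them), and `county_year_pairs` is a hypothetical name.

```python
import csv


def county_year_pairs(csv_file, county_col="County", year_col="Year"):
    """Yield (county, year) for each data row, looked up by header name.

    csv_file is an open text file (or any iterable of lines) whose first
    line is the header row. The default column names are assumptions.
    """
    reader = csv.DictReader(csv_file)
    missing = {county_col, year_col} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing expected column(s): {sorted(missing)}")
    for row in reader:
        yield row[county_col], row[year_col]
```

This way a reordered or newly added column doesn't silently change which value gets treated as the year. For rows that carry an extra field, `DictReader` collects the overflow values under its `restkey` key, and columns that come before the extra field still line up with their headers.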

Updated sql tables

115a1c5

aunshx self-assigned this Nov 26, 2024

aunshx added 2 commits November 26, 2024 12:39

More updates

1d043ca

Final changes

b812220

aunshx changed the title ~~Updated sql tables~~ Updates to cec-dataprep Nov 26, 2024

aunshx requested a review from srkirkland November 26, 2024 20:51

srkirkland reviewed Nov 26, 2024
