Updates to cec-dataprep #64
base: master
);

create index idx_find_clusters
-- Index on the treatedclusters table
CREATE TABLE idx_find_clusters
@aunshx this doesn't look right; I think it should be an index, not a table.
land_use text,
forest_type text,
haz_class int4,
"Stem6to9_tonsAcre" double precision,
We should use the same naming as the rest of the variables, so stem6to9_tonsAcre for this one. It hopefully wouldn't affect the import, but if it does, I still think it'd be better to handle the difference during import and not have two different naming schemes within one table.
"Stem9Plus_tonsAcre" double precision,
"Branch_tonsAcre" double precision,
"Foliage_tonsAcre" double precision,
wood_density float4
Thinking maybe make this double precision like the others. I'm sure it could technically fit in a float4, but I don't think the difference is consequential, so better to pick one and go with it.
for filename in csv_files:
    file_path = os.path.join(split_dir, filename)

    county, year = filename.replace('.csv', '').split('_')
Probably worth checking that you got back real values for these; otherwise an extra file put into this directory will kill the whole thing.
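One way to do the check suggested above, as a minimal sketch. It assumes the split files are named `<county>_<year>.csv` with a four-digit year; the helper `parse_county_year` and the sample file names are hypothetical and not part of this PR.

```python
import re

# Assumed naming convention: "<county>_<year>.csv" with a four-digit year.
# The county part is allowed to contain underscores of its own.
FILENAME_RE = re.compile(r"^(?P<county>.+)_(?P<year>\d{4})\.csv$")


def parse_county_year(filename):
    """Return (county, year) for a well-formed name, or None otherwise."""
    match = FILENAME_RE.match(filename)
    if match is None:
        return None
    return match.group("county"), int(match.group("year"))


for name in ["Butte_2020.csv", "notes.txt", "San_Diego_2025.csv"]:
    parsed = parse_county_year(name)
    if parsed is None:
        # A stray file is reported and skipped instead of raising.
        print(f"skipping unexpected file: {name}")
        continue
    county, year = parsed
    print(county, year)
```

Returning `None` lets the calling loop skip a file it does not recognise, whereas unpacking the result of `split('_')` raises `ValueError` as soon as a name has no underscore or more than one.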
row = row[:15] + row[16:]  # Remove the extra field

county = row[13]
year = row[2]
Using the csv reader, is there not a way to get the year and county by header name, instead of needing to make sure the year is always in column 3?
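The standard library's `csv.DictReader` supports lookup by header name. The sketch below shows the idea on in-memory data; the header names `county_name` and `year` and the sample rows are assumptions, since the real file's header is not shown in this thread.

```python
import csv
import io

# Hypothetical header names and rows; the real file's columns may differ.
sample = io.StringIO(
    "cluster_no,treatmentid,year,county_name\n"
    "101,1,2020,Butte\n"
    "102,1,2025,Plumas\n"
)

reader = csv.DictReader(sample)
pairs = []
for row in reader:
    # Fields are looked up by header name, so column order no longer matters.
    county = row["county_name"]
    year = int(row["year"])
    pairs.append((county, year))

print(pairs)
```

One caveat: if data rows carry a field that the header does not name, as the `row[:15] + row[16:]` fix suggests, `DictReader` would misalign every column after that position, so the extra field would still need handling before name-based lookup is safe for those later columns.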
Updates for FRREDSS 2.0
split_csv.py - Splits the processed data into county_year.csv files, reprocesses the rows (removing column discrepancies), and stores the results in a folder named split_files. This folder has been uploaded to Box. Run this script only once, as follows:
python split_csv.py path_of_the_processed_data_file.csv
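For orientation, the splitting step could look roughly like the sketch below. It is not the code in split_csv.py: the function name `split_by_county_year`, the header names `county_name` and `year`, and the sample data are all assumptions, and it holds every row in memory, which may not suit a large processed file.

```python
import csv
import os
import tempfile
from collections import defaultdict


def split_by_county_year(source_path, out_dir):
    """Write one <county>_<year>.csv per (county, year); return the file names."""
    groups = defaultdict(list)
    with open(source_path, newline="") as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames
        for row in reader:
            # "county_name" and "year" are assumed header names.
            groups[(row["county_name"], row["year"])].append(row)

    os.makedirs(out_dir, exist_ok=True)
    written = []
    for (county, year), rows in groups.items():
        name = f"{county}_{year}.csv"
        with open(os.path.join(out_dir, name), "w", newline="") as dst:
            writer = csv.DictWriter(dst, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        written.append(name)
    return sorted(written)


# Demonstration on a throwaway directory with made-up rows.
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "processed.csv")
    with open(source, "w", newline="") as f:
        f.write("cluster_no,year,county_name\n1,2020,Butte\n2,2020,Butte\n3,2025,Plumas\n")
    files = split_by_county_year(source, os.path.join(tmp, "split_files"))
    print(files)
```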
process_uploads.py - Takes the files in the split_files folder, adds them to the treatedclusters database one by one, and moves each uploaded file to the upload_completed folder. It also checks whether the county+year combination already exists. To run it:
python process_uploads.py path_of_the_split_files_folder
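The control flow described for process_uploads.py might be sketched as follows. The callables `already_loaded` and `load_file` are hypothetical stand-ins for the real database check and insert, and leaving an already-loaded file in place is an assumption, since the description does not say what happens to it.

```python
import os
import shutil
import tempfile


def process_uploads(split_dir, done_dir, already_loaded, load_file):
    """Load each new <county>_<year>.csv once, then move it into done_dir.

    already_loaded(county, year) and load_file(path) are hypothetical
    callables standing in for the real database check and insert.
    """
    os.makedirs(done_dir, exist_ok=True)
    processed = []
    for filename in sorted(os.listdir(split_dir)):
        stem, ext = os.path.splitext(filename)
        county, sep, year = stem.rpartition("_")
        if ext != ".csv" or not sep or not county or not year.isdigit():
            continue  # ignore files that do not look like <county>_<year>.csv
        if already_loaded(county, int(year)):
            continue  # assumption: an already-loaded pair is left where it is
        path = os.path.join(split_dir, filename)
        load_file(path)
        shutil.move(path, os.path.join(done_dir, filename))
        processed.append(filename)
    return processed


# Demonstration with empty placeholder files and an in-memory "database".
with tempfile.TemporaryDirectory() as tmp:
    split_dir = os.path.join(tmp, "split_files")
    done_dir = os.path.join(tmp, "upload_completed")
    os.makedirs(split_dir)
    for name in ["Butte_2020.csv", "Plumas_2025.csv", "notes.txt"]:
        open(os.path.join(split_dir, name), "w").close()
    existing = {("Butte", 2020)}
    loaded = []
    result = process_uploads(
        split_dir,
        done_dir,
        already_loaded=lambda county, year: (county, year) in existing,
        load_file=loaded.append,
    )
    remaining = sorted(os.listdir(split_dir))
    print(result, remaining)
```

Passing the database operations in as callables keeps the file handling testable without a live connection to the treatedclusters table.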