List Cleaner: match mills / factories to UML






1. Open the file containing the list to be cleaned. It must be a csv or xlsx file with a single sheet. Rename columns as needed, then save and close the file. For the script to work, the list must contain columns with exactly these names: parent, mill_name, latitude, longitude, uml_id.

2. Click the Browse... button and select the file you want to clean (i.e. the file in step one).

3. Click the Run app button. The script reads the list file and, for each mill/factory, finds the best match in the Rainforest Alliance Universal Mill List (for palm) or the SCG Sugar Universal Mill List (for sugar), based on the UML ID, parent name, first aggregator name and coordinates. The script also assigns each match a level of confidence: HIGH if two of UML ID, first aggregator name and coordinates are 'exact matches' (for coordinates, within 1 km of each other); MEDIUM if only the coordinates are 'exact matches'; and LOW otherwise.

4. Once the Results Table is loaded, you can click on any row and see the original (i.e. raw) data - if valid coordinates were provided - along with its match, plotted on the interactive map.

5. In addition to the output within this app, you can export an xlsx file (matched_to_UML_...) with the results by clicking the Download button in the Results Table tab. This file contains the original data, plus new columns with the matching mills/factories and all their additional information (e.g. RSPO status, country, province). It also has columns indicating whether the UML IDs are the same, the differences in parent and first aggregator names (a ratio from 0, no difference, to 1, all characters different), and the distance in km between the original and matching mill/factory.
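
The confidence rule in step 3 and the distance column in step 5 can be illustrated with a short Python sketch. This is not the app's own code: the function names, the haversine formula, the Earth radius value, and the reading of "two of" as "at least two" are all assumptions made for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def confidence(uml_id_same, name_same, distance_km, max_km=1.0):
    """Hypothetical restatement of the HIGH / MEDIUM / LOW rule.

    distance_km is None when no valid coordinates were provided.
    Coordinates count as an 'exact match' when within max_km (1 km).
    """
    coords_same = distance_km is not None and distance_km <= max_km
    exact_matches = sum([bool(uml_id_same), bool(name_same), coords_same])
    if exact_matches >= 2:  # assumption: "two of" means "at least two"
        return "HIGH"
    if coords_same:  # only the coordinates agree
        return "MEDIUM"
    return "LOW"
```

For example, a row whose UML ID is identical and whose coordinates lie 0.4 km from the matched mill would be rated HIGH under this reading, while a row that agrees only on coordinates would be rated MEDIUM.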

Note: Each match is a 'best' match, based on the information in the list that needs to be cleaned and the parameters used in the matching script. The output file should be checked to see whether the matches are indeed 'correct', especially those with a 'LOW' level of confidence. When no match is found, the newly created columns will be blank.
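
When checking matches by hand, it may help to see how a name-difference ratio like the one in step 5 behaves. The app does not state which string metric it uses, so the sketch below substitutes Python's standard `difflib.SequenceMatcher` as a stand-in. The function name and the case and whitespace normalisation are assumptions, and the values will not necessarily equal those in the exported file.

```python
from difflib import SequenceMatcher


def name_difference(original, matched):
    """Illustrative difference ratio: 0.0 = identical, 1.0 = nothing in common.

    Stand-in metric only; the matching script's actual metric is not documented here.
    """
    a = original.strip().lower()  # assumption: ignore case and outer whitespace
    b = matched.strip().lower()
    if not a and not b:
        return 0.0  # two empty names are treated as identical
    return 1.0 - SequenceMatcher(None, a, b).ratio()
```

Under this stand-in, names that differ only in capitalisation score 0, and names that share no characters score 1, so rows with a high ratio and a LOW confidence level are the natural ones to review first.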