Sometimes it happens that you have a very large text or CSV file to process, but first you want to make smaller files of that large file. Because that large file may take too much time to process or open. So I am giving an example below to split large text/CSV file into multiple files in PL SQL using stored procedure.
You just need to pass two parameters to this PL SQL procedure, first is database directory object name, where the text files are residing and the second is the source file name (the file which you want to split).
If Oracle directory object is not exists for the location of text files, then you can create it as shown below:
For windows: CREATE OR REPLACE DIRECTORY CSV_FILE_DIR AS 'D:\plsql\text_files';
For Linux/Unix (due to difference in path): CREATE OR REPLACE DIRECTORY CSV_FILE_DIR AS '/plsql/text_files';
Change the path above according to yours files location. Then create the below procedure by executing its script:
CREATE OR REPLACE PROCEDURE split_file (p_db_dir IN VARCHAR2, p_file_name IN VARCHAR2) IS read_file UTL_FILE.file_type; write_file UTL_FILE.file_type; v_string VARCHAR2 (32767); j NUMBER := 1; BEGIN read_file := UTL_FILE.fopen (p_db_dir, p_file_name, 'r'); WHILE j > 0 LOOP write_file := UTL_FILE.fopen (p_db_dir, j || '_' || p_file_name, 'w'); FOR i IN 1 .. 100 LOOP -- example to dividing into 100 rows for each file.. you can increase the number as per your requirement UTL_FILE.get_line (read_file, v_string); UTL_FILE.put_line (write_file, v_string); END LOOP; UTL_FILE.fclose (write_file); j := J + 1; END LOOP; EXCEPTION WHEN OTHERS THEN -- this will handle if reading source file contents finish UTL_FILE.fclose (read_file); UTL_FILE.fclose (write_file); END;
This procedure splitting 100 rows for each file, which you can modify as per your need. Now execute this procedure as shown below by passing database directory object name and the file name:
BEGIN split_file ('CSV_FILE_DIR', 'text_file.csv'); END;
You can check your file location (CSV_FILE_DIR) for the multiple files starting with numbers like 1_text_file.csv, 2_text_file.csv and so on, as shown in below image:
Hi,
I tried above option. What I realized is the main file e.g. test_file contains 100 Records and the sum of split files is not equal to 100. i.e. sum of records from 1_test_file and 2_test_file is not equal to records in test_file.
In how many rows you divided? For example, if you will divide in 40 rows, then the first file will contain 40 records, 2nd 40 and the 3rd will contain 20.
I have modified split function to split files in the interval of 3000.
Total records were 9886.
3000 1st split
3000 2nd split
3000 3rd split
674 4th split
====
9674
9886 - 9674 = 212 difference.
i.e. I am getting short of 212 records when I did the additions of 4 split files.
It seems somewhere is error is happening that is why it is not generating the last file completely.
Try to remove the exception section of the above code temporarily.
Then run the program again, you will get the error if something is wrong.
Sure, I will try that option. In fact I checked the data which is not getting picked up in the last split file. But to my surprise I did not see anything suspicious that could prevent it from picking up in the last split file.
Hi,
I would like to confirm that now I am able to get the exact count as expected. i.e. Record Count from the Main file and the sum of record counts from various split files are exactly matching.
It was my bad. I was calling split_file function immediately after "END LOOP".
It should have been called after the "UTL_FILE.fclose (v_filehnd); -- CLOSE FILE" as shown below.
END LOOP;
UTL_FILE.fclose (v_filehnd); -- CLOSE FILE
split_file ('FTP_OUTBOX', '*******.csv');
No need of removing exception from the split_file function.
Thanks a lot for your kind support and patience.
You are welcome.
And that is great you find the solution.
Thanks. Have a great day ahead!!!!