python - import Full file pathname and attributes in to mariadb safely - Stack Overflow

admin2025-04-16  2

I am needing to get full path and file name, file hash and several other properties of files in to a database. I was hoping some one had some good reading and suggestions on database creation that would do so safely. the import script will be python3.

I have to deal with all kinds of file types some with language specific characters as well as spaces and glyphs? in the names.

example /home/test/Music/RÜFÜS DU SOL ●● Solace (Lastlings Remix) [Official Audio] -.webm or /home/test/Music/超好聽的中國古典音樂 笛子名曲 古箏 - Hermosa Música de Flauta- Música Para Meditación.mp4

I am sure there are some rules to creating the database that would improve the experience as well as I am sure there are some practices in python3 that would need to be used to not make a giant mess of things. At present copying all non duplicate files to a specific location preserving file structure would be the most difficult task the python script would need to do or possibly making a file link back to the initial duplicate file scanned.

I have not been able to find any thing specifically related when searching. I have usually found how to make a table with a strange name but not much about the table contents having strange characters.

Side note: one folder has over 4M files across 20k+ sub folders hence the need for indexing in a database to start sorting out the folder/subfolder mess in a systematic way as ls can take 60-90 seconds to return results there.

I am needing to get full path and file name, file hash and several other properties of files in to a database. I was hoping some one had some good reading and suggestions on database creation that would do so safely. the import script will be python3.

I have to deal with all kinds of file types some with language specific characters as well as spaces and glyphs? in the names.

example /home/test/Music/RÜFÜS DU SOL ●● Solace (Lastlings Remix) [Official Audio] -.webm or /home/test/Music/超好聽的中國古典音樂 笛子名曲 古箏 - Hermosa Música de Flauta- Música Para Meditación.mp4

I am sure there are some rules to creating the database that would improve the experience as well as I am sure there are some practices in python3 that would need to be used to not make a giant mess of things. At present copying all non duplicate files to a specific location preserving file structure would be the most difficult task the python script would need to do or possibly making a file link back to the initial duplicate file scanned.

I have not been able to find any thing specifically related when searching. I have usually found how to make a table with a strange name but not much about the table contents having strange characters.

Side note: one folder has over 4M files across 20k+ sub folders hence the need for indexing in a database to start sorting out the folder/subfolder mess in a systematic way as ls can take 60-90 seconds to return results there.

Share Improve this question edited Feb 5 at 11:21 globglogabgalab 5883 silver badges19 bronze badges asked Feb 3 at 16:42 KendrickKendrick 1732 silver badges10 bronze badges 1
  • Suggest leaving off the common prefix (/home/test/Music/) from the file path. This will help if (when) you move things around on disk. (And shorten the path a little.) – Rick James Commented Feb 18 at 2:01
Add a comment  | 

1 Answer 1

Reset to default 1

I use filename varchar(4096) collate utf8mb4_unicode_ci for this. If case of letters are important (for e.g. searching or sorting), then use collate utf8mb4_bin. Number 4096 is the value of linux PATH_MAX, the maximum supported chars in a filename.

When you must save space, then use filename varchar(4096) collate binary. In this case, the app must interpret the data as UTF-8. A char of binary needs 1 byte, a char of utf8mb4_* 4 bytes space.

Instead of varchar(4096) collate binary you can use varbinary(4096).

If you don't want to store the path, but only the base name (last part of the path), then varchar(255) is good. This value is set by linux NAME_MAX.

转载请注明原文地址:http://www.anycun.com/QandA/1744760352a87230.html