py-workdocs-prep 0.5.1

Creator: bradpython12

py_workdocs_prep
A bulk directory and file renaming utility to prepare files for migration to AWS WorkDocs
If you run the script, it will start to traverse the current directory and will do one of the following with each file and directory:

Keep as is
Rename
Delete

All actions taken will be written out to STDOUT after all operations are completed.
WARNING The actions will make changes to your directories and/or files. It is HIGHLY RECOMMENDED you first do a full backup of your data.
This project was a result of me migrating from Dropbox to AWS WorkDocs and finding a lot of issues due to the names of files and/or directories that were invalid in AWS WorkDocs.
For details of this potential problem, refer to the AWS WorkDocs Administration Guide.
Here are the most important limitations as of 2019-10-26:

Amazon WorkDocs Drive displays only files with a full directory path of 260 characters or fewer
Invalid characters in names:

Trailing spaces
Periods at the beginning or end–For example: .file, .file.ppt, ., .., or file.
Tildes at the beginning or end–For example: file.doc~, ~file.doc, or ~$file.doc
File names ending in .tmp–For example: file.tmp
File names exactly matching these case-sensitive terms: Microsoft User Data, Outlook files, Thumbs.db, or Thumbnails
File names containing any of these characters – * (asterisk), / (forward slash), \ (back slash), : (colon), < (less than), > (greater than), ? (question mark), | (vertical bar/pipe), " (double quotes), or \202E (character code 202E)
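The rules above can be expressed as a small check. This is a minimal sketch and not the code the tool itself uses; the function name `find_violations` and the wording of the returned messages are made up for illustration.

```python
# Terms and characters copied from the list above.
RESERVED_NAMES = {"Microsoft User Data", "Outlook files", "Thumbs.db", "Thumbnails"}
INVALID_CHARS = set('*/\\:<>?|"\u202e')


def find_violations(name):
    """Return a list of rule violations for a single file or directory name."""
    problems = []
    if name != name.rstrip(" "):
        problems.append("trailing space")
    if name.startswith(".") or name.endswith("."):
        problems.append("leading or trailing period")
    if name.startswith("~") or name.endswith("~"):
        problems.append("leading or trailing tilde")
    if name.endswith(".tmp"):
        problems.append(".tmp extension")
    if name in RESERVED_NAMES:  # exact, case-sensitive match
        problems.append("reserved name")
    if any(ch in INVALID_CHARS for ch in name):
        problems.append("invalid character")
    return problems


for candidate in ("report.docx", "~$file.doc", "file.", "Thumbs.db"):
    print(candidate, find_violations(candidate))
```

An empty list means the name passes every rule in the list; the total path length limit is a separate check because it depends on the full path rather than a single name.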



Quick Start
The following examples assume an MS Windows system, as the intent is to prepare a directory for AWS WorkDocs, which typically only has clients for Windows (unless you are on mobile).
From PyPi
Prerequisites:

Python 3.7+

The example below will show how to get started very quickly using the most current version. The example will demonstrate a dry-run operation that will allow you to inspect the log file and review changes before committing.
Assuming you are on the Windows command line:
> pip install py-workdocs-prep
> cd <the directory you wish to prepare for migration>
> wdp --dry-run

A log file called py_workdocs_prep.log will be generated. If it already exists, new entries will be appended.
NOTE It is highly recommended that you inspect the log file and understand how the application will change your files - and delete certain directories and files. Also take special note of any warnings, especially those about the total path length that may be too long (search for the string TOTAL LENGTH EXCEEDED THRESHOLD). Read here why this is important.
To commit all changes, but first back up all files and directories, you can run the following (assuming the application is already installed):
> wdp -b
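If you prefer to make your own backup before running the tool, a tar.gz archive can be created with Python's standard `tarfile` module. This is only a sketch of the general idea and says nothing about how `wdp -b` builds its archive; the function name and paths are placeholders.

```python
import tarfile
from pathlib import Path


def backup_directory(source_dir, archive_path):
    """Write a gzip-compressed tar archive of source_dir to archive_path."""
    source = Path(source_dir).resolve()
    with tarfile.open(archive_path, "w:gz") as archive:
        # arcname stores entries relative to the directory name
        # instead of the full absolute path.
        archive.add(str(source), arcname=source.name)
    return Path(archive_path)
```

Write the archive to a location outside the directory being archived, otherwise the archive would try to include itself.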

Command Line Arguments



| Option | Description | Example |
| --- | --- | --- |
| -b or --backup | Create a backup of all current files and directories. A tar.gz file will be created. | > wdp -b |
| --dry-run | The application will not perform any file or directory modifications, but only log what would be done. | > wdp --dry-run |
| --delete-dirs | Define a comma separated list of directories to be deleted. Don't include any spaces, but rather use proper Python regex expressions. | > wdp --delete-dirs="test1,ven*,node_mod*" |
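Because the --delete-dirs values are regular expressions and not shell wildcards, it is worth checking what a pattern matches before committing. The sketch below assumes the patterns are applied with `re.match` (matching from the start of the directory name); how the tool actually applies them is not stated here, and the helper names are hypothetical.

```python
import re


def parse_patterns(option_value):
    """Split a comma-separated option value into compiled regular expressions."""
    return [re.compile(part) for part in option_value.split(",") if part]


def matches_any(dir_name, patterns):
    """True if any pattern matches at the start of dir_name (assumed semantics)."""
    return any(pattern.match(dir_name) for pattern in patterns)


patterns = parse_patterns("test1,ven*,node_mod*")
print(matches_any("venv", patterns))          # True
print(matches_any("node_modules", patterns))  # True
print(matches_any("vessel", patterns))        # True: "ven*" means "ve" plus zero or more "n"
print(matches_any("docs", patterns))          # False
```

Under these assumptions a pattern such as ven* is broader than it looks, which is another reason to review the output of --dry-run first.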



From Source
Prerequisites:

Python 3.7+
git

Assuming your target directory is something like D:\Dropbox, and you want to back up first, you can run the following commands:
> git clone https://github.com/nicc777/py_workdocs_prep.git
> cd py_workdocs_prep
> python setup.py sdist
> pip install dist\*
> d:
> cd Dropbox
> wdp -b

Strategy
I had a very large number of files (600,000+) and it turned out a lot of them violated the mentioned restrictions. I had to make a plan...
Here is how the script works:
Long path names
The default Windows starting folder is W:\My Documents\, which is 16 characters long.
Therefore, any other directory and/or file name combined in my Dropbox root folder had to come in under 244 characters.
I decided that after the transformation, I would just print WARNINGS for each item, showing the number of characters by which it exceeds the limit. I would then make a decision later on to either rename some part of the directory and/or file name or sometimes completely reorganize the directory structure. This would remain a manual operation.
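The arithmetic behind the 244 character budget, and the kind of warning described above, can be sketched as follows. The constant and function names are made up for this example, and the warning text is not the tool's actual log format.

```python
MAX_TOTAL_LENGTH = 260                 # WorkDocs Drive limit on the full path
WINDOWS_PREFIX = "W:\\My Documents\\"  # 16 characters
BUDGET = MAX_TOTAL_LENGTH - len(WINDOWS_PREFIX)  # 244


def chars_over_limit(relative_path):
    """Return how many characters a path is over the budget (0 if it fits)."""
    return max(0, len(relative_path) - BUDGET)


example = "projects\\" + "x" * 240  # a deliberately long relative path
over = chars_over_limit(example)
if over:
    print(f"WARNING: path is {over} characters too long: {example[:30]}...")
```

The relative path here is measured from the root folder being migrated, since that part is what ends up behind the W:\My Documents\ prefix.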
Getting rid of redundant files
As I used Dropbox as a "working" documents directory, I ended up with a large number of .git, venv and node_modules directories (to name a few examples). So the obvious first step for me was to delete all these directories. (DONE)
Files that will also be deleted include files starting or ending with the tilde (~) character. (PENDING)
Files ending in .tmp will also be deleted. (PENDING)
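To get a feel for how much would be removed, the deletion rules above can be applied in a read-only walk. This is a sketch under the assumption that the directory names .git, venv and node_modules are matched exactly; the function names are hypothetical and nothing is deleted.

```python
import os

REDUNDANT_DIRS = {".git", "venv", "node_modules"}


def is_redundant_file(file_name):
    """Files starting or ending with a tilde, or ending in .tmp."""
    return (file_name.startswith("~")
            or file_name.endswith("~")
            or file_name.endswith(".tmp"))


def find_delete_candidates(root):
    """Return (directories, files) that the rules would delete. Read-only."""
    dir_hits, file_hits = [], []
    for current, dirs, files in os.walk(root):
        for name in list(dirs):
            if name in REDUNDANT_DIRS:
                dir_hits.append(os.path.join(current, name))
                dirs.remove(name)  # do not descend into a directory slated for deletion
        file_hits.extend(
            os.path.join(current, name) for name in files if is_redundant_file(name)
        )
    return sorted(dir_hits), sorted(file_hits)
```

Pruning `dirs` in place stops `os.walk` from descending into directories that would be deleted anyway, which matters when there are many node_modules trees.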
Directory and file renaming strategy
Any directory names and files containing any of the listed invalid characters (including any whitespace) will be renamed, replacing the violating characters with an underscore (_) character. Repeating underscore characters will be replaced with just a single underscore character.
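The renaming rule can be sketched with two regular expressions. This is an illustration of the rule as described, not the tool's own code; `sanitize_name` is a made-up name, and the sketch does not handle leading or trailing periods.

```python
import re

# Invalid characters from the list earlier in this document, plus any whitespace.
INVALID = re.compile(r'[*/\\:<>?|"\u202e\s]')
REPEATED_UNDERSCORES = re.compile(r"_{2,}")


def sanitize_name(name):
    """Replace each violating character with "_" and collapse repeated underscores."""
    replaced = INVALID.sub("_", name)
    return REPEATED_UNDERSCORES.sub("_", replaced)


print(sanitize_name("My  report: final?.docx"))  # My_report_final_.docx
```

Note that two different names can end up identical after sanitizing (for example "a b" and "a_b"), so a real run needs to watch for collisions in the same directory.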
Processing Methodology
Processing follows this order:

First, all directories will be traversed and file names will be checked:

If it is identified as a file to be deleted, write out a delete command
Process illegal characters and issue a rename command if required


Now traverse all directories and identify all directories to be renamed

After the list is determined: order the list by path length (from longest to shortest)
Loop through the list and commit rename commands


Now, assuming we have a list of final directory and file names, determine which items are over the total length limit and print warnings for these
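The reason for ordering directories from longest to shortest is that a child path is always longer than its parent, so children get renamed while the parent path is still valid. A minimal sketch of that planning step follows; the function name is hypothetical and the `sanitize` argument stands in for whatever renaming rule is applied to a single name.

```python
import os


def plan_directory_renames(dir_paths, sanitize):
    """Return (old_path, new_path) pairs, longest path first.

    Only the last path component is changed in each step, so parents
    are still reachable under their old names when children are renamed.
    """
    plan = []
    for path in sorted(dir_paths, key=len, reverse=True):
        parent, name = os.path.split(path)
        new_name = sanitize(name)
        if new_name != name:
            plan.append((path, os.path.join(parent, new_name)))
    return plan


dirs = [
    "a b",
    os.path.join("a b", "c d"),
    os.path.join("a b", "c d", "e"),
]
for old, new in plan_directory_renames(dirs, lambda n: n.replace(" ", "_")):
    print(old, "->", new)
```

In this example the inner directory "c d" is planned before its parent "a b"; doing it the other way round would leave the planned path for "c d" pointing at a location that no longer exists.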

Acknowledgements
Thanks to NanoDano for the examples I used to walk through the directories.
Geek Food
Manual Testing
To inspect the project and prepare for migrating to AWS WorkDocs, clone the project, cd into the project directory, and run the following in a Python shell:
>>> from py_workdocs_prep.py_workdocs_prep import start
>>> start()

Memory Profiling
You can try the following:
> pip install -U memory_profiler

Then:
>>> from py_workdocs_prep.py_workdocs_prep import start
>>> from memory_profiler import memory_usage
>>> memory_usage((start, ('D:\\Dropbox',)))
Starting in "D:\Dropbox"
[15.54296875, 15.54296875, 15.54296875,..., 178.421875]

This means the script started scanning the directory D:\Dropbox and the application grew from a starting 15.5 MiB to 178.4 MiB (early testing).
My machine has plenty of RAM, so this was acceptable for me.
References:

memory_profiler

License

For personal and professional use. You cannot resell or redistribute these repositories in their original state.
