Linux Commands

Latest revision as of 05:09, 6 March 2026

Introduction

This is a loose collection of commands that we should remember. Just in case, they are listed and explained here for our convenience.

STORAGE & INODES

df -h

Check disk space (Is the 2TB NVMe or 800GB SSD full?).

df -h <path> 

Show Human-readable Free space on the disk partition containing that directory.

df -h /mnt/fast_scratch 

Show how much room is left on the fast_scratch volume.

df -i /mnt/fast_scratch 

Check Inode usage (Crucial for the Gutenberg 850k+ file storm).

du -sh /mnt/fast_scratch/zim_extraction_scratch/dump 

Check current expanded size of the HTML files.

du -sh <path> 

Show the total space used by a directory or file (-s: Summarize into one total, -h: Human-readable units).

du -ah <path>

Show space used by All individual files and directories within a path.
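
For scripts, the df checks above can be wrapped in small helpers. This is only a sketch: the names disk_pct and inode_pct are made up for illustration, and both assume the df -P layout, where the fifth column holds the use percentage.

```shell
# disk_pct PATH: print the use percentage (digits only) of the filesystem holding PATH.
# Hypothetical helper; relies on df -P printing one line per filesystem.
disk_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# inode_pct PATH: the same idea for inodes (df -i).
# Some filesystems do not report inode usage and print "-" here instead of a number.
inode_pct() {
    df -Pi "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Example (the 90 is an assumed threshold):
#   [ "$(disk_pct /mnt/fast_scratch)" -ge 90 ] && echo "fast_scratch is nearly full"
```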

FILE PROGRESS & FLOW

ls -1 /mnt/fast_scratch/zim_extraction_scratch/dump | wc -l 

Count raw HTML files.

ls -1 /mnt/fast_scratch/zim_extraction_scratch/md_temp | wc -l

Count finished Markdown files.
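
The two counts can be turned into a rough progress figure. The sketch below assumes one Markdown file is produced per HTML file, which may not hold exactly for this pipeline; count_entries and progress_pct are hypothetical names. It uses find instead of ls, which also counts dotfiles and skips the sorting that ls does on huge directories.

```shell
# count_entries DIR: number of entries directly inside DIR (like ls -1 | wc -l).
count_entries() {
    find "$1" -mindepth 1 -maxdepth 1 | wc -l | tr -d ' '
}

# progress_pct DONE TOTAL: integer percentage; prints 0 when TOTAL is 0.
progress_pct() {
    if [ "$2" -eq 0 ]; then
        echo 0
    else
        echo $(( $1 * 100 / $2 ))
    fi
}

# Example:
#   html=$(count_entries /mnt/fast_scratch/zim_extraction_scratch/dump)
#   md=$(count_entries /mnt/fast_scratch/zim_extraction_scratch/md_temp)
#   echo "converted: $(progress_pct "$md" "$html")%"
```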

tail -f /mnt/archive_data/ai_training/zim_markdown/audit.csv 

Watch the final "PASS/FAIL" log live.

watch -n 5 "df -h" 

Run df -h, wait 5 seconds, then run it again, and keep repeating until you press CTRL+C.

watch -d -n 5 "df -h" 

The -d flag highlights the Differences: the numbers that changed between the previous run and the current one.

watch -d -n 5 "df -h | grep -E 'sd[b-d]'" 

Runs df -h every 5 seconds, pipes the output to grep so that only the lines for sdb, sdc and sdd are displayed, and highlights what changed between the previous run and the current one.

watch -d -n 5 "iostat -m -p sdc" 

Runs iostat every 5 seconds: -m forces the units to megabytes, -p sdc limits the report to device sdc and its partitions, and -d (a watch flag) highlights the differences. Other commands can be used with watch too. A command that contains a pipe must be wrapped in quotes, because otherwise the shell interprets the pipe first: it sends the output of watch itself into the next command, and since watch never ends on its own, that command just keeps waiting for the stream from the pipe to end.
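
To see why the quotes matter, here is a tiny stand-in for watch. It is not how watch is implemented; repeat_cmd is a hypothetical helper that only copies one detail, namely that the quoted string is handed to sh -c, so the pipe inside it is parsed again on every run.

```shell
# repeat_cmd COUNT INTERVAL 'COMMAND STRING'
# Hypothetical stand-in for watch: runs the quoted string through sh -c, COUNT times.
repeat_cmd() {
    count=$1
    interval=$2
    cmd=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        sh -c "$cmd"            # the whole pipeline inside the string runs here
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}

# Example: three filtered df reports, 5 seconds apart.
#   repeat_cmd 3 5 "df -h | grep -E 'sd[b-d]'"
```

Without the quotes, the shell would split the line at the pipe before repeat_cmd (or watch) ever saw it.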

DELETING DIRECTORIES

  1. rmdir [dir] : Removes an Empty directory only. Safe, but picky.
  2. rm -rf [dir] : The "Nuclear Option". Deletes the folder and every single file inside it immediately. Use with caution.
  3. Wildcard usage
    1. ls -d partialfilename* : Use this first to confirm exactly what the shell sees before you delete it.
    2. rm -i partialfilename* : The "Interactive" mode. It will ask "Are you sure?" for every file. (Not recommended for 1.5M files, or you'll be there until 2036!)
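
With a very large number of matches, rm partialfilename* can fail with "Argument list too long", because the shell expands the wildcard before rm starts. One possible way around that is to let find do the matching. The helper below is a sketch with a made-up name (delete_by_prefix); it only looks at regular files directly inside the given folder.

```shell
# delete_by_prefix DIR PREFIX [delete]
# Hypothetical helper: without the third argument it only lists the matches
# (the "look first" step); with "delete" it removes them.
delete_by_prefix() {
    if [ "${3:-preview}" = "delete" ]; then
        find "$1" -maxdepth 1 -type f -name "$2*" -delete
    else
        find "$1" -maxdepth 1 -type f -name "$2*" -print
    fi
}

# Example:
#   delete_by_prefix /some/dir partialfilename          # preview only
#   delete_by_prefix /some/dir partialfilename delete   # really delete
```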

find

When you run find /path -type f | wc -l, you are combining three distinct pieces to solve one puzzle:

find /mnt/fast_scratch/...
  1. This is your "Scout." It goes into the directory and looks at every single object it finds.
-type f
  1. This is the "Filter." By default, find sees everything: folders, files, symbolic links, etc. -type f tells it: "Ignore the folders, only count the actual FILES." This is crucial for your Gutenberg run because it contains thousands of sub-folders you don't want to count.
wc -l
  1. This is the "Counter." wc stands for Word Count, and the -l flag tells it to count Lines.
  2. The Pipe (|): The find command outputs a long list of filenames (one per line). The pipe feeds that list into wc, which simply counts how many lines there are and gives you the total.

FILE COUNTING

  1. find . -type f | wc -l : Count all Files (ignoring folders).
  2. find . -type d | wc -l : Count all Directories (ignoring files). The total includes the starting folder (.) itself.
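
The same two counts as reusable helpers, as a minimal sketch. count_type and count_subdirs are hypothetical names; count_subdirs adds -mindepth 1 so that the starting folder is not counted as one of its own sub-folders.

```shell
# count_type DIR TYPE: count entries of one kind under DIR (f = files, d = directories).
count_type() {
    find "$1" -type "$2" | wc -l | tr -d ' '
}

# count_subdirs DIR: directories below DIR, not counting DIR itself.
count_subdirs() {
    find "$1" -mindepth 1 -type d | wc -l | tr -d ' '
}

# Example:
#   count_type /mnt/fast_scratch/zim_extraction_scratch/dump f
#   count_subdirs /mnt/fast_scratch/zim_extraction_scratch/dump
```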


more control

find /mnt/fast_scratch/zim_extraction_scratch/dump -maxdepth 2 -type f | wc -l

Counts only the files within 2 levels of the starting folder; anything deeper is left out.

  1. find [path] -maxdepth 1 : Stay in the top-level folder only.
  2. find [path] -mindepth 2 : Skip the top folder and the items directly in it; only look at things inside sub-folders.
  3. find [path] -size +1M : Find only files larger than 1 Megabyte.
  4. find [path] -size -10k : Find only files smaller than 10 Kilobytes (The "Tiny File" filter).
  5. find [path] -mmin -5 : Find files modified in the last 5 minutes. (Great for seeing if the script is still actually writing data!)
find /mnt/fast_scratch/zim_extraction_scratch/dump -type f -mmin -1 | wc -l 
  1. -mmin -1: Tells find to only "Scout" for files whose Modification time was less than 1 minute ago.
  2. -type f : only return files
  3. | wc -l : Pipe the list to wc, which counts the lines.
find /mnt/fast_scratch/zim_extraction_scratch/dump -type f -mmin -60 | wc -l
  1. If this returns 0, the process that was supposed to be adding files to that directory has probably stalled (nothing has been added or changed for an hour).
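
The "is it still writing?" check can be scripted along these lines. It is a sketch, not a monitoring tool: recent_files and is_stalled are hypothetical names, and the window in minutes is whatever you consider suspicious for your job.

```shell
# recent_files DIR MINUTES: how many files under DIR were modified in the last MINUTES minutes.
recent_files() {
    find "$1" -type f -mmin "-$2" | wc -l | tr -d ' '
}

# is_stalled DIR MINUTES: succeeds (exit status 0) when nothing was written in that window.
is_stalled() {
    [ "$(recent_files "$1" "$2")" -eq 0 ]
}

# Example:
#   if is_stalled /mnt/fast_scratch/zim_extraction_scratch/dump 60; then
#       echo "no new files for an hour, check the extraction job"
#   fi
```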


THE EXPAND AND FLOW PIPELINE

Starting Rsync

  1. Start the transfer with the "Trailing Slash" on the source, so that the contents of the source folder (not the folder itself) land in the right "chest":
rsync -avP /source/path/ /destination/path/
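    1. -a: Archive (recurses into folders and preserves permissions, timestamps, symlinks and, where allowed, ownership).
    2. -vP: Verbose output plus -P, which shows progress and keeps partially transferred files so an interrupted copy can pick up again.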
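
Because a missing trailing slash on the source changes where the files land, a wrapper can add the slash for you. This is a sketch with hypothetical names (with_slash, sync_contents); it assumes you always want "copy what is inside the source folder", which is what the command above does.

```shell
# with_slash PATH: print PATH with exactly one trailing slash. Fails on an empty PATH.
with_slash() {
    [ -n "$1" ] || return 1
    trailing=${1##*[!/]}              # the run of slashes at the end of PATH (may be empty)
    printf '%s/\n' "${1%"$trailing"}"
}

# sync_contents SRC DST: copy what is inside SRC into DST (hypothetical wrapper).
sync_contents() {
    src=$(with_slash "$1") || return 1
    dst=$(with_slash "$2") || return 1
    rsync -avP "$src" "$dst"
}

# Example:
#   sync_contents /source/path /destination/path
```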

The "Quiet Mode" (Managing IO Priority)

Immediately reduce the "noise" so your SSH terminal stays responsive:

ionice -c 3 -p $(pgrep rsync)
  1. -c 3: Sets the process to "Idle" priority. It only moves data when the disk isn't busy doing anything else (like your typing!).

The "Freeze Ray" (Pausing when Full)

If the destination disk is hitting 80-90%, freeze the flow:

sudo pkill -STOP rsync

Note: The process stays in RAM, but it issues no new Disk IO until it is resumed.

The Hardware Upgrade (Proxmox/Cloud)

Go to your Host (Proxmox) and increase the disk size (e.g., add 400GB). Tell the VM to "Look at the Hardware" again:

echo 1 | sudo tee /sys/class/block/sdX/device/rescan

(Replace X with your disk letter, e.g., d)

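The "Stretch" (Growing the Filesystem)

Choose the command based on your filesystem type:

  1. For EXT4 (Your current setup):
sudo resize2fs /dev/sdX
  2. For XFS (Common on RedHat/Enterprise):
sudo xfs_growfs /mnt/your_mount_point

Note: XFS uses the mount point, not the device path. If the filesystem sits on a partition (e.g. /dev/sdX1) instead of the whole disk, the partition has to be enlarged before either command can use the new space.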
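
If you are not sure which of the two applies, the choice can be written down as a small lookup. The sketch below only prints the command instead of running it; grow_command is a hypothetical name, and the filesystem type is assumed to come from something like findmnt.

```shell
# grow_command FSTYPE DEVICE MOUNTPOINT: print (do not run) the command that grows the filesystem.
grow_command() {
    case $1 in
        ext2|ext3|ext4) echo "sudo resize2fs $2" ;;
        xfs)            echo "sudo xfs_growfs $3" ;;
        *)              echo "unsupported filesystem type: $1" >&2
                        return 1 ;;
    esac
}

# Example: look up the type first, then ask for the matching command.
#   fstype=$(findmnt -n -o FSTYPE --target /mnt/your_mount_point)
#   grow_command "$fstype" /dev/sdX /mnt/your_mount_point
```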

The "Thaw" (Resuming the Flow)

Once you see the extra space in df -h, restart the flow:

sudo pkill -CONT rsync
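
The freeze and thaw steps can be tied to the disk level, so the flow pauses on its own when the destination fills up. This is a rough sketch under assumptions: the thresholds 85 and 70 and the 30 second interval are made up, flow_action and apply_flow are hypothetical names, and it acts on every rsync process on the machine.

```shell
# flow_action PCT HIGH LOW: decide what to do given the disk use percentage.
# Prints "pause" at or above HIGH, "resume" at or below LOW, "hold" in between.
flow_action() {
    if [ "$1" -ge "$2" ]; then
        echo pause
    elif [ "$1" -le "$3" ]; then
        echo resume
    else
        echo hold
    fi
}

# apply_flow ACTION: send the matching signal to every rsync process.
apply_flow() {
    case $1 in
        pause)  sudo pkill -STOP rsync ;;
        resume) sudo pkill -CONT rsync ;;
    esac
}

# Example loop (assumed thresholds 85 and 70, checking every 30 seconds):
#   while sleep 30; do
#       pct=$(df -P /destination/path | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
#       apply_flow "$(flow_action "$pct" 85 70)"
#   done
```

The gap between the two thresholds keeps the transfer from flapping between paused and running when the disk sits right at the limit.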

The "Audit" (Terminating a Dud Job)

If you realize the path was wrong (like a newdestinationdir error), stop all data movement immediately to clear the lag and hold your place:

sudo pkill -STOP rsync

Effect: All rsync jobs are now "Paused" (Status T). They are still in RAM, but using 0% Disk IO. With the disk quiet, your terminal should be responsive.

Now, find the "Bad" actor:

ps aux | grep rsync
  1. Identify the PID: Look at the second column for the number.
  2. Identify the Path: Look at the end of the line for the wrong directory (e.g., wrongDestinationDir).
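
Picking the PID out by eye works, but the two "Identify" steps can also be done by a filter. The sketch below reads ps aux output and assumes its usual layout (PID in column 2, command name in column 11); rsync_pids_for is a hypothetical name. Matching on the command column also keeps the grep or awk process itself out of the result.

```shell
# rsync_pids_for PATTERN: read `ps aux` output on stdin and print the PIDs of rsync
# processes whose command line mentions PATTERN.
rsync_pids_for() {
    awk -v pat="$1" '$11 ~ /(^|\/)rsync$/ && index($0, pat) { print $2 }'
}

# Example:
#   ps aux | rsync_pids_for wrongDestinationDir
```

A single rsync job may show up as more than one process, so expect the filter to print several PIDs for one bad transfer.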

Kill only the specific process ID (let’s say it’s PID 8742) that had the wrong path:

sudo kill -9 8742

Effect: The "Dud" job is removed from the process list. (kill -9 is used here because a paused process does not act on a plain kill until it is resumed.)

Now, tell the remaining "Good" rsync jobs to start moving again:

sudo pkill -CONT rsync

Effect: Any rsync that wasn't killed in the kill step above picks up exactly where it left off.

The Cleanup & Flow Control

Finally, delete the mistake:

sudo rm -rf /mnt/zim_turbo/zim/wrongDestinationDir

Ensure Responsiveness:

ionice -c 3 -p $(pgrep rsync)

Why ionice -c 3 is your best friend during lag: Think of your Disk IO like a narrow hallway. rsync is a giant guy carrying a sofa; he blocks everyone else. ionice -c 3 tells that guy: "If you see anyone else (like Nigel's SSH typing) trying to get through the hallway, step into a doorway and let them pass first."

PERFORMANCE & BOTTLENECKS

htop 

The main dashboard. Look for 14 cores hitting 100% (Conversion start).

iostat -x 1 5  

Watch disk %util. If it's 100%, the disk is the bottleneck.

free -h 

Check RAM. Look at the "available" column: it estimates how much memory new work can use without swapping, so a value near zero means the kernel is choking.

vmstat 1 

Quick view of system health. Look at the 'wa' (IO Wait) column.

iostat -p <device> 

Shows I/O statistics for a specific device (like sdb) and all its partitions in one report.
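
For scripts, the "available" figure that free -h shows can be read straight from /proc/meminfo, where it is stored in kB as MemAvailable. A minimal sketch follows; mem_available_mb is a hypothetical name, and the optional file argument exists only so the function can be pointed at a saved copy.

```shell
# mem_available_mb [FILE]: print MemAvailable in MiB, read from /proc/meminfo by default.
mem_available_mb() {
    awk '/^MemAvailable:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# Example (the 1024 is an assumed threshold):
#   [ "$(mem_available_mb)" -lt 1024 ] && echo "less than 1 GiB available"
```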

SCREEN MANAGEMENT

  • screen -ls : List your active "work" sessions.
  • screen -r [name] : Re-attach to your extraction script cockpit.
  • CTRL+A then D : Detach safely (leaves the script running in the background).
  • sudo ionice -c 3 -p $(pgrep rsync) : Drop every running rsync to "Idle" IO priority.
  • sudo ionice -c 3 -p $(pgrep zimdump) : Do the same for zimdump.