For viewing tabular data (plain text only) directly on the command line. Input is generally expected to be tab-delimited.
sheet.py is an interactive, terminal-based spreadsheet viewer for tab-delimited files, somewhat like a bare-bones, read-only version of Excel.
sheet.pl pre-processes input data into neatly aligned columns that you can then pipe into ‘less -S’. It is similar to the UNIX ‘column’ command.
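The basic idea behind this kind of column alignment can be sketched in Python. This is an illustration, not sheet.pl's actual Perl source: every cell is padded to the width of the widest cell in its column. The function name align_columns and the two-space gap between columns are assumptions.

```python
def align_columns(lines, sep="\t", gap=2):
    """Pad each delimited field to the maximum width of its column (illustrative sketch)."""
    rows = [line.rstrip("\n").split(sep) for line in lines]
    ncols = max((len(r) for r in rows), default=0)

    # First pass: find the widest cell in every column.
    widths = [0] * ncols
    for r in rows:
        for i, cell in enumerate(r):
            widths[i] = max(widths[i], len(cell))

    # Second pass: left-justify each cell and join with a fixed gap.
    out = []
    for r in rows:
        padded = [cell.ljust(widths[i]) for i, cell in enumerate(r)]
        out.append((" " * gap).join(padded).rstrip())
    return out
```

For example, align_columns(["name\tcount", "apple\t3", "kiwi\t12"]) lines up the second column at the same offset on every row; printing the result and piping it into ‘less -S’ gives a scrollable, aligned view.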
trash.pl is a “safer rm” that moves files into a new temporary directory in /tmp/ instead of deleting them immediately. Beware: because the files are only moved, deleting extremely large files may fill up your /tmp partition.
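The “move instead of delete” approach can be sketched in a few lines of Python. This is not trash.pl itself; the function name trash, the "trash_" directory prefix, and the base parameter are hypothetical choices for illustration.

```python
import os
import shutil
import tempfile


def trash(paths, base="/tmp"):
    """Move each path into a fresh directory under `base` instead of deleting it.

    Returns the new directory so the files can be recovered from there.
    Note: two inputs with the same basename would collide in this sketch.
    """
    dest = tempfile.mkdtemp(prefix="trash_", dir=base)  # unique, newly created directory
    for p in paths:
        # rstrip(os.sep) so "somedir/" keeps its real name instead of becoming ""
        shutil.move(p, os.path.join(dest, os.path.basename(p.rstrip(os.sep))))
    return dest
```

Because the data is only moved, it still occupies space on the /tmp filesystem until that directory is cleaned out, which is exactly why very large deletions can fill the partition.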
ditto_mark.pl marks duplicated cells in a tab-delimited file, just like ditto marks in an old ledger. This makes duplicates easy to spot visually.
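A minimal Python sketch of ledger-style ditto marking, assuming that a cell is marked when it equals the cell directly above it in the same column. That rule, the double-quote marker, and the name ditto_mark are assumptions; the real script may define duplicates or the marker differently.

```python
DITTO = '"'  # assumed marker symbol; ditto_mark.pl may use something else


def ditto_mark(lines, sep="\t", mark=DITTO):
    """Replace each cell that equals the cell directly above it with `mark`."""
    out = []
    prev = []
    for line in lines:
        cells = line.rstrip("\n").split(sep)
        marked = [
            mark if i < len(prev) and cell == prev[i] else cell
            for i, cell in enumerate(cells)
        ]
        out.append(sep.join(marked))
        prev = cells  # compare against the original values, not the marks
    return out
```

Comparing against the previous row's original values (rather than its marked output) means a run of three identical cells produces one value followed by two ditto marks.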
cut.pl is a version of “cut” that can output columns in an arbitrary order. For example, cut.pl -f 2,1,3- swaps columns 1 and 2 and leaves columns 3 onward in their original order.
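The key difference from standard ‘cut’ is that the field list is applied in the order written. A hypothetical Python sketch of that expansion follows (parse_fields and reorder are illustrative names, not cut.pl's code). It supports the N, N-M, N- and -M forms.

```python
def parse_fields(spec, ncols):
    """Expand a cut-style field list such as "2,1,3-" into 0-based indices, keeping order."""
    indices = []
    for part in spec.split(","):
        if "-" in part:
            start_s, end_s = part.split("-", 1)
            start = int(start_s) if start_s else 1    # "-M" means 1..M
            end = int(end_s) if end_s else ncols      # "N-" means N..last column
            indices.extend(range(start - 1, min(end, ncols)))
        else:
            indices.append(int(part) - 1)
    return indices


def reorder(line, spec, sep="\t"):
    """Return the line with its fields emitted in the order given by `spec`."""
    cells = line.rstrip("\n").split(sep)
    return sep.join(cells[i] for i in parse_fields(spec, len(cells)) if i < len(cells))
```

With this sketch, reorder("a\tb\tc\td", "2,1,3-") yields "b\ta\tc\td", matching the -f 2,1,3- example. Standard ‘cut’ would instead print the selected fields in their original left-to-right order.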
A modified version of UNIX ‘join’. Unlike the standard ‘join’, it handles unsorted input; it also supports case-insensitive joins and can accept multiple input files at once.
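One common way to join unsorted input is a hash join: index one file by its key column, then look up each row of the other file. The Python sketch below shows that technique. It is an assumption about how such a tool might work, not this script's actual implementation, and hash_join and its parameters are hypothetical names.

```python
from collections import defaultdict


def hash_join(left, right, key_left=0, key_right=0, ignore_case=False, sep="\t"):
    """Inner-join two lists of delimited lines on one key column each.

    Builds a dict from the right-hand input, so neither input needs to be sorted.
    """
    def norm(k):
        return k.lower() if ignore_case else k

    index = defaultdict(list)
    for line in right:
        cells = line.rstrip("\n").split(sep)
        index[norm(cells[key_right])].append(cells)

    out = []
    for line in left:
        cells = line.rstrip("\n").split(sep)
        # .get() avoids inserting empty entries into the defaultdict
        for match in index.get(norm(cells[key_left]), []):
            rest = [c for i, c in enumerate(match) if i != key_right]
            out.append(sep.join(cells + rest))
    return out
```

The trade-off is memory: the indexed file must fit in RAM, whereas standard ‘join’ streams both sorted inputs. Joining more than two files could be done by applying the function repeatedly, feeding each result in as the next left-hand input.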
Can sort compressed (gzip/bzip2) files and can accept header line(s). It uses the fast UNIX sort internally. Frequency-of-use rating: 9/10.
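A rough Python sketch of the same idea: decompress by file extension, keep the header line(s) in place, and hand only the body to UNIX ‘sort’. The function names and parameters are hypothetical, and this is not the script's actual code. It also assumes ‘sort’ is on the PATH.

```python
import bz2
import gzip
import os
import subprocess


def open_text(path):
    """Open plain, .gz, or .bz2 files as text, chosen by file extension."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    return open(path)


def sort_with_header(path, header_lines=1, sort_args=()):
    """Return the header lines unchanged, followed by the body sorted by UNIX sort."""
    with open_text(path) as fh:
        lines = fh.readlines()
    header, body = lines[:header_lines], lines[header_lines:]
    result = subprocess.run(
        ["sort", *sort_args],               # e.g. sort_args=["-k2,2n"]
        input="".join(body),
        capture_output=True,
        text=True,
        check=True,
        env={**os.environ, "LC_ALL": "C"},  # byte-wise ordering, independent of locale
    )
    return "".join(header) + result.stdout
```

For simplicity, this sketch reads the whole file into memory. Streaming the body straight into sort's standard input would let ‘sort’ use its own external merge on files larger than RAM.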
mdverify.pl verifies a set of files against their MD5 checksums. It runs on both Mac and Linux and, unlike the standard ‘md5sum’, can read several formats of MD5 checksum file.
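As an illustration of accepting more than one checksum format, here is a Python sketch that reads both the GNU ‘md5sum’ layout ("HASH  file" or "HASH *file") and the BSD/macOS ‘md5’ layout ("MD5 (file) = HASH"). Which formats mdverify.pl actually supports is not stated here, so these two are assumptions. The function names are hypothetical.

```python
import hashlib
import re

BSD_RE = re.compile(r"^MD5 \((.+)\) = ([0-9a-fA-F]{32})$")  # macOS `md5` output
GNU_RE = re.compile(r"^([0-9a-fA-F]{32}) [ *](.+)$")        # GNU `md5sum` output


def parse_md5_line(line):
    """Return (filename, hexdigest) for a GNU- or BSD-style line, or None."""
    line = line.rstrip("\n")
    m = BSD_RE.match(line)
    if m:
        return m.group(1), m.group(2).lower()
    m = GNU_RE.match(line)
    if m:
        return m.group(2), m.group(1).lower()
    return None


def md5_of(path, chunk=1 << 20):
    """Hash a file in 1 MiB chunks so large files are not loaded into memory."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def verify(checksum_lines):
    """Yield (filename, ok) for every recognized line; unrecognized lines are skipped."""
    for line in checksum_lines:
        parsed = parse_md5_line(line)
        if parsed is None:
            continue
        name, expected = parsed
        yield name, md5_of(name) == expected
```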
convert_SAM_or_BAM_for_Genome_Browser.pl converts SAM/BAM files into tracks for the UCSC (UC Santa Cruz) Genome Browser and writes an accompanying track description file.
fasta2gtf.pl (a bioinformatics-specific tool) takes a FASTA file and produces a GTF file whose entries span each chromosome from start to end.
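A minimal Python sketch of the idea: measure each FASTA sequence, then emit one GTF line covering positions 1 through its length (GTF coordinates are 1-based and inclusive). The source label, the "exon" feature type, the "+" strand, and using the sequence name as gene_id and transcript_id are all assumptions. fasta2gtf.pl's actual field choices may differ.

```python
def fasta_lengths(lines):
    """Return [(name, length)] for each FASTA record, in file order."""
    records = []
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, length))
            name, length = line[1:].split()[0], 0  # name = first word of the header
        else:
            length += len(line)
    if name is not None:
        records.append((name, length))
    return records


def fasta_to_gtf(lines, source="fasta2gtf", feature="exon"):
    """One GTF line per sequence, spanning 1..length (assumed field values)."""
    out = []
    for name, length in fasta_lengths(lines):
        attrs = f'gene_id "{name}"; transcript_id "{name}";'
        # seqname, source, feature, start, end, score, strand, frame, attributes
        out.append("\t".join([name, source, feature, "1", str(length), ".", "+", ".", attrs]))
    return out
```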
qplz.pl (“queue please”) submits jobs to a PBS Pro queue in a user-friendly fashion. It was tested with PBS Pro version 13 (August 2016) and may also work with TORQUE.
Randomly chooses a given number of lines from a file, sampling with or without replacement. It can also extract multi-line records (for example, each record in a FASTQ file spans 4 lines). It becomes very slow on files with more than 1 million lines.
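Sampling multi-line records amounts to grouping consecutive lines into fixed-size records and sampling the records instead of individual lines. The Python sketch below uses random.sample for sampling without replacement and random.choices for sampling with replacement. The function name and parameters are hypothetical, and this is not the tool's own code.

```python
import random


def sample_records(lines, n, lines_per_record=1, replace=False, seed=None):
    """Randomly pick n records, where each record is `lines_per_record` consecutive lines."""
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    k = lines_per_record
    records = [lines[i:i + k] for i in range(0, len(lines), k)]
    if replace:
        chosen = rng.choices(records, k=n)  # the same record may appear more than once
    else:
        chosen = rng.sample(records, n)     # raises ValueError if n > len(records)
    return [line for rec in chosen for line in rec]
```

For a FASTQ file, lines_per_record=4 keeps each header, sequence, separator, and quality line together. This sketch holds the whole file in memory. For inputs too large for that, single-pass reservoir sampling is a common alternative when sampling without replacement.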
matrix_from_edge_list.pl converts a 2- or 3-column edge list into a matrix. A 2-column input yields an adjacency matrix; a 3-column input yields a matrix filled with each edge's value.
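A small Python sketch of the conversion, assuming directed edges (rows are the first column, columns are the second), "1" for an edge in 2-column input, "0" for missing edges, and nodes ordered by first appearance. These conventions are assumptions, not necessarily what matrix_from_edge_list.pl does, and edges_to_matrix is a hypothetical name.

```python
def edges_to_matrix(lines, sep="\t", missing="0"):
    """Build a delimited matrix (with header row/column) from 2- or 3-column edge lines."""
    nodes, seen, cells = [], set(), {}
    for line in lines:
        parts = line.rstrip("\n").split(sep)
        src, dst = parts[0], parts[1]
        # 2-column input -> adjacency matrix ("1" = edge present); 3-column -> edge value
        value = parts[2] if len(parts) >= 3 else "1"
        for node in (src, dst):
            if node not in seen:  # keep nodes in order of first appearance
                seen.add(node)
                nodes.append(node)
        cells[(src, dst)] = value
    rows = [sep.join([""] + nodes)]
    for r in nodes:
        rows.append(sep.join([r] + [cells.get((r, c), missing) for c in nodes]))
    return rows
```

For an undirected graph, one could also store cells[(dst, src)] so the matrix comes out symmetric.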
select_best_item.pl picks the best N items (rows) with a given key (in a user-specified column).
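A Python sketch of per-key selection, assuming “best” means the highest numeric value in a user-chosen score column, that selection happens separately for each key, and that ties go to the earlier row. These are illustrative assumptions; select_best_item.pl's own criteria and options may differ, and select_best is a hypothetical name.

```python
import heapq
from collections import defaultdict


def select_best(lines, key_col, score_col, n=1, sep="\t"):
    """Keep the n rows with the highest numeric score for each key, in input order."""
    groups = defaultdict(list)
    for idx, line in enumerate(lines):
        cells = line.rstrip("\n").split(sep)
        groups[cells[key_col]].append((float(cells[score_col]), idx))

    keep = set()
    for rows in groups.values():
        # Highest score first; on a tie, the earlier row (smaller idx) wins.
        best = heapq.nlargest(n, rows, key=lambda t: (t[0], -t[1]))
        keep.update(idx for _, idx in best)

    return [line.rstrip("\n") for idx, line in enumerate(lines) if idx in keep]
```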
There are plenty of additional programs in the TimeForScience GitHub repository, some of which are even properly documented.