Free Software

Contents

TimeForScience Repo on GitHub

Summary

Spreadsheet Viewers

These tools display tabular data (plain text format only) directly on the command line. They generally expect tab-delimited input files.

sheet.py

sheet.py is an interactive terminal-based spreadsheet viewer for tab-delimited files. It is sort of like a bare-bones, read-only version of Excel.

sheet.pl

sheet.pl pre-processes input data into reasonably formatted, column-aligned tabular output that you can then pipe into less -S. It is similar to the UNIX ‘column’ command.
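The column-alignment idea can be sketched in a few lines of Python. This is only an illustration of the general technique, not how sheet.pl is actually implemented; the function name align_columns and the gap parameter are made up for this example.

```python
def align_columns(lines, sep="\t", gap=2):
    """Pad every field so that columns line up (hypothetical helper)."""
    rows = [line.rstrip("\n").split(sep) for line in lines]
    ncols = max((len(row) for row in rows), default=0)

    # Find the widest cell in each column.
    widths = [0] * ncols
    for row in rows:
        for i, cell in enumerate(row):
            widths[i] = max(widths[i], len(cell))

    # Left-justify each cell to its column width plus a small gap.
    out = []
    for row in rows:
        padded = [cell.ljust(widths[i] + gap) for i, cell in enumerate(row)]
        out.append("".join(padded).rstrip())
    return out


if __name__ == "__main__":
    for text in align_columns(["name\tcount", "chromosome_1\t7"]):
        print(text)
```

Output like this stays readable in less -S because each row remains on a single line.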

Recommended Tools

trash.pl

trash.pl is a “safer rm” that moves files to a new temporary directory in /tmp/ instead of immediately removing them. Beware: it may fill up your /tmp partition if you delete extremely large files.
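A minimal Python sketch of the “move instead of delete” idea follows. It is not trash.pl's actual code; the function name trash, the trash_root parameter, and the "trash_" directory prefix are assumptions made for illustration.

```python
import os
import shutil
import tempfile


def trash(paths, trash_root=None):
    """Move each path into a fresh temporary directory (hypothetical helper)."""
    # Assumption: default to the system temp directory (usually /tmp on Linux).
    trash_root = trash_root or tempfile.gettempdir()
    dest_dir = tempfile.mkdtemp(prefix="trash_", dir=trash_root)

    moved = []
    for path in paths:
        # Note: two inputs with the same basename would collide in this sketch.
        name = os.path.basename(os.path.normpath(path))
        dest = os.path.join(dest_dir, name)
        shutil.move(path, dest)
        moved.append(dest)
    return moved
```

The files stay recoverable until the temporary directory is cleaned up, which is also why very large files can fill the partition.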

ditto_mark.pl

ditto_mark.pl marks duplicated cells in a tab-delimited file (just like ditto marks in an old ledger). Good for finding duplicates in a visually obvious fashion.
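Here is a rough Python sketch of ledger-style ditto marking. It assumes that “duplicated” means a cell equal to the cell directly above it in the same column; ditto_mark.pl may define duplicates differently, and the function name and mark character are made up for this example.

```python
def ditto_mark(rows, mark='"'):
    """Replace cells that repeat the cell directly above with a ditto mark."""
    out = []
    prev = None
    for row in rows:
        if prev is None:
            out.append(list(row))  # The first row has nothing above it.
        else:
            # Compare against the ORIGINAL previous row, not the marked one.
            out.append([
                mark if i < len(prev) and cell == prev[i] else cell
                for i, cell in enumerate(row)
            ])
        prev = row
    return out
```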

cut.pl

cut.pl is a version of “cut” that allows you to output columns in an arbitrary order. For example, cut.pl -f 2,1,3- would print column 2 first, then column 1, and then columns 3 and beyond in their original order.
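To make the 2,1,3- example concrete, here is a small Python sketch of order-preserving field selection. It only illustrates the idea and is not cut.pl's implementation; parse_fields and cut_line are hypothetical names.

```python
def parse_fields(spec, ncols):
    """Turn a spec such as '2,1,3-' into 0-based indices, in the order given."""
    indices = []
    for part in spec.split(","):
        if "-" in part:
            start, end = part.split("-", 1)
            first = int(start) if start else 1   # '-3' means columns 1..3
            last = int(end) if end else ncols    # '3-' means column 3..last
            indices.extend(range(first - 1, last))
        else:
            indices.append(int(part) - 1)
    return indices


def cut_line(line, spec, sep="\t"):
    """Emit the requested columns of one line, skipping any that do not exist."""
    cells = line.rstrip("\n").split(sep)
    wanted = parse_fields(spec, len(cells))
    return sep.join(cells[i] for i in wanted if i < len(cells))
```

Standard cut always prints fields in file order regardless of how they are listed, which is the behavior this reordering works around.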

join.pl

join.pl is a modified version of UNIX join. It can handle unsorted input and perform case-insensitive joins. It can also accept multiple input files at once.
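One common way to join unsorted input is a hash join: index one table by key, then look up each row of the other. The Python sketch below shows that technique for two tables keyed on their first column. Whether join.pl works this way internally is an assumption, and join_rows is a hypothetical name.

```python
def join_rows(left, right, ignore_case=False):
    """Join two lists of rows on their first column without sorting them."""
    def norm(key):
        return key.lower() if ignore_case else key

    # Index the right-hand table by (optionally lower-cased) key.
    index = {}
    for row in right:
        index.setdefault(norm(row[0]), []).append(row[1:])

    # Emit one output row per matching pair, in left-hand order.
    out = []
    for row in left:
        for rest in index.get(norm(row[0]), []):
            out.append(row + rest)
    return out
```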

sort.pl

sort.pl can sort compressed (gzip/bzip2) files and can accept header line(s). It uses the fast UNIX sort internally. Frequency-of-use rating: 9/10.
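The header-aware idea looks roughly like this in Python. Note the differences from sort.pl: this sketch sorts in memory with Python's sorted() rather than calling UNIX sort, and it picks the decompressor from the file extension, which is an assumption. Both function names are hypothetical.

```python
import bz2
import gzip


def open_maybe_compressed(path):
    """Open plain, gzip, or bzip2 text files based on the file extension."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    return open(path, "r")


def sort_with_header(path, header_lines=1):
    """Keep the first header_lines in place and sort the remaining lines."""
    with open_maybe_compressed(path) as handle:
        lines = handle.read().splitlines()
    # Plain lexicographic sort on the whole line.
    return lines[:header_lines] + sorted(lines[header_lines:])
```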

mdverify.pl

mdverify.pl is a script for easily verifying a bunch of files against md5 checksums. It runs on both Mac and Linux and, unlike normal md5sum, can handle several md5 file formats.
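As an illustration of what handling more than one md5 file format can mean, the Python sketch below accepts both the GNU md5sum line style ("HASH  filename") and the BSD/macOS md5 style ("MD5 (filename) = HASH"). Which formats mdverify.pl actually accepts is not stated here, so treat this pair as an assumption; all function names are hypothetical.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Compute a file's md5 hex digest, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5_line(line):
    """Return (hex_digest, filename) for either supported line style."""
    line = line.strip()
    if line.startswith("MD5 (") and ") = " in line:
        name, digest = line[len("MD5 ("):].rsplit(") = ", 1)
        return digest.lower(), name
    digest, name = line.split(None, 1)
    if name.startswith("*"):  # md5sum marks binary-mode entries with '*'
        name = name[1:]
    return digest.lower(), name


def verify(md5_lines, base_dir="."):
    """Map each listed filename to True if its checksum matches."""
    results = {}
    for line in md5_lines:
        if not line.strip():
            continue
        expected, name = parse_md5_line(line)
        results[name] = md5_of_file(os.path.join(base_dir, name)) == expected
    return results
```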

Bioinformatics

SAM/BAM → UCSC Browser (.pl)

convert_SAM_or_BAM_for_Genome_Browser.pl converts input BAM/SAM files into tracks for the UC Santa Cruz Genome Browser (UCSC Genome Browser), and provides a track description file.

fasta2gtf.pl

fasta2gtf.pl (a bioinformatics-specific tool) takes a FASTA file and makes a GTF file that spans each chromosome.
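The idea can be sketched in Python as follows: measure each sequence in the FASTA input, then emit one 9-column GTF line running from position 1 to the sequence length. The feature type ("exon"), the source label, the "+" strand, and the gene_id/transcript_id attributes are placeholders chosen for this example; the real fasta2gtf.pl output may differ.

```python
def fasta_lengths(lines):
    """Return {sequence_name: length}, in the order sequences appear."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # Assumption: the name is the first word after '>'.
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def gtf_lines(lengths, source="fasta2gtf_sketch", feature="exon"):
    """Build one GTF line per sequence, spanning its full length."""
    out = []
    for name, length in lengths.items():
        attributes = f'gene_id "{name}"; transcript_id "{name}";'
        fields = [name, source, feature, "1", str(length), ".", "+", ".", attributes]
        out.append("\t".join(fields))
    return out
```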

Scientific / Data Processing

qplz.pl (qplease.pl)

qplz.pl (“queue please”) can submit jobs to a PBS Pro queue in a user-friendly fashion. Tested with PBS Pro version 13 (August 2016). May also work with TORQUE.

rand_lines.pl

rand_lines.pl randomly chooses a certain number of lines from a file. It can sample with or without replacement. It can also pull out multi-line records (for example, in a FASTQ file, each record is actually 4 lines). It becomes very slow if files have more than 1 million lines.
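Record-wise sampling can be sketched in Python like this. It is a simple in-memory illustration, not rand_lines.pl's algorithm; the function name and parameters are hypothetical.

```python
import random


def sample_records(lines, n, lines_per_record=1, replace=False, rng=None):
    """Randomly pick n records, each made of lines_per_record lines."""
    rng = rng or random.Random()

    # Group consecutive lines into records (4 lines each for FASTQ).
    records = [
        lines[i:i + lines_per_record]
        for i in range(0, len(lines), lines_per_record)
    ]

    if replace:
        chosen = [rng.choice(records) for _ in range(n)]
    else:
        # Raises ValueError if n exceeds the number of records.
        chosen = rng.sample(records, n)

    # Flatten the chosen records back into a list of lines.
    return [line for record in chosen for line in record]
```

For a FASTQ file you would pass lines_per_record=4 so that a read's header, sequence, separator, and quality lines always stay together.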

matrix_from_edge_list.pl

matrix_from_edge_list.pl can turn a 2- or 3-column file into a matrix. The matrix will either be an adjacency matrix (2-column input) or will hold the value of each edge (3-column input).
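Here is a small Python sketch of the edge-list-to-matrix conversion. It assumes edges are directed (column 1 is the row label, column 2 is the column label) and that missing edges are filled with 0; the real script may make different choices, and the function name is hypothetical.

```python
def matrix_from_edges(edges):
    """Build a nested-dict matrix from 2- or 3-item edges."""
    row_labels = sorted({edge[0] for edge in edges})
    col_labels = sorted({edge[1] for edge in edges})

    # Assumption: cells without an edge are filled with 0.
    matrix = {r: {c: 0 for c in col_labels} for r in row_labels}
    for edge in edges:
        # 2-item edges mark presence with 1; 3-item edges carry their value.
        matrix[edge[0]][edge[1]] = edge[2] if len(edge) > 2 else 1
    return row_labels, col_labels, matrix
```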

select_best_item.pl

select_best_item.pl picks the best N items (rows) with a given key (in a user-specified column).
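One reading of this is “for each distinct key, keep the N rows with the best score”. The Python sketch below implements that reading, assuming the score lives in a second numeric column and that “best” means highest by default. Both are assumptions for illustration, as are the function and parameter names.

```python
import heapq
from collections import defaultdict


def select_best(rows, key_col, score_col, n=1, highest=True):
    """Return {key: best n rows}, ranked by the numeric score column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_col]].append(row)

    pick = heapq.nlargest if highest else heapq.nsmallest
    return {
        key: pick(n, group, key=lambda row: float(row[score_col]))
        for key, group in groups.items()
    }
```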

Other

Other programs

There are a ton of additional programs on the TimeForScience GitHub repository, some of which have even been properly documented.

Programs that aren’t on GitHub

hue.pl (Philips Hue lights)