Another complication comes from the fact that Bowtie can't read data from stdin, which prevents me from doing something like
zcat my_compressed_fastq.fastq.gz | bowtie.
The simple solution to this problem is to use named pipes, also known as FIFOs (First In First Out). Conceptually, they are the same as queues in real life. Let's say I'm waiting to pay for groceries at the store. If I am the first in line, I will be the first out of the line, unless someone cuts in line, but bytes never do that because they are respectful little creatures.
Concretely, this is done by using the
mkfifo name_of_the_queue.fifo command on Linux or Mac OS. This command creates a queue on the filesystem that acts much like a regular file: we can write to it and read from it, with the constraint that we cannot seek to arbitrary positions in it. In our case, we will create a queue, write the decompressed Fastq to it, and tell the second process in the pipeline (Bowtie) to read its input from the previously created file-like-thingy (
file-like-thingy == queue == FIFO).
    mkfifo my_queue.fifo
    zcat my_compressed_file.fastq.gz > my_queue.fifo &
    bowtie reference.fasta my_queue.fifo
    rm my_queue.fifo
The ampersand on line 2 starts zcat in a background process, so it does not block the terminal. This way, Bowtie can start streaming from the queue while it is still being written.
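To try the same pattern without Bowtie installed, here is a minimal sketch. Everything in it is an assumption made for illustration: the file names, the two made-up reads, and awk standing in for Bowtie as "a program that takes a file name and reads it from start to end". It also uses gzip -dc, which should behave like zcat for .gz files.

```shell
# Work in a throwaway directory so nothing is left behind.
workdir=$(mktemp -d)
reads="$workdir/reads.fastq.gz"     # hypothetical input file
queue="$workdir/my_queue.fifo"      # hypothetical queue name

# Build a tiny compressed Fastq file with two made-up reads (8 lines).
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' | gzip -c > "$reads"

mkfifo "$queue"

# Decompress into the queue in the background. The writer blocks until
# a reader opens the queue, which is why the ampersand is needed.
gzip -dc "$reads" > "$queue" &

# Stand-in for Bowtie: awk is handed the queue as a file name and
# counts the lines it streams through.
line_count=$(awk 'END { print NR }' "$queue")

# Make sure the background decompression has finished.
wait

# A Fastq record is 4 lines long.
echo "lines read: $line_count, reads: $((line_count / 4))"

rm -r "$workdir"
```

The decompressed data never lands on disk here: it only passes through the queue, which is the whole point of the trick.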
To conclude, this method is efficient for two reasons. The first is that there is no need to recompress the file with gzip afterwards, as would have been necessary if gunzip had been used. The second is that the decompression occurs concurrently with the mapping.
It is also important to note that since we're using a queue, it is impossible to "go back" in the file. This will only work if the file is read from start to end without ever looking back (you should never look back).
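The one-pass nature of a queue can be seen in a small sketch. The queue name and the three lines of text are made up for the demonstration. The second read attaches a writer that sends nothing, purely so that cat returns instead of waiting forever for data that is already gone.

```shell
# Work in a throwaway directory so nothing is left behind.
workdir=$(mktemp -d)
fifo="$workdir/demo.fifo"   # hypothetical queue name
mkfifo "$fifo"

# First pass: writer in the background, reader in the foreground.
# The bytes come out in the order they went in.
printf 'first\nsecond\nthird\n' > "$fifo" &
first_pass=$(cat "$fifo")
wait

# Second pass: open the queue for writing without sending anything,
# so the reader sees end-of-file right away. Nothing from the first
# pass is left to read.
: > "$fifo" &
second_pass=$(cat "$fifo")
wait

echo "first pass:"
echo "$first_pass"
echo "second pass: '$second_pass'"

rm -r "$workdir"
```

The second pass should come back empty, so a tool that needs to read its input twice, or jump around in it, will not work with this approach.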
This was a simple article, but I hope it helps some of you save some time and space.