The Computer Language
24.11 Benchmarks Game

regex-redux description

Background

Background reading: the HN discussion, and a comparison of regex engines on a curated set of tasks.

Variance

Some language implementations have regex built-in; some provide a regex library; some use a third-party regex library.

Different libraries are very likely to implement different regex algorithms.

The work

The work is to use the same simple regex patterns and actions to manipulate FASTA format data. Don't optimize away the work.
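As a rough illustration of that kind of work, here is a minimal Python sketch. It is not a reference implementation. The function name `regex_redux_sketch`, the cleanup pattern, the variant pattern, and the replacement pairs are all assumptions chosen for illustration; the task's real patterns are not listed in this description. The sketch only shows the general shape: strip FASTA description lines and linefeeds with a regex, count matches of a pattern, and apply replacements one pattern at a time.

```python
import re

# Hypothetical patterns, chosen only for illustration.
VARIANT_PATTERN = "agggtaaa|tttaccct"
REPLACEMENTS = [("tt", "<2>"), ("ccc", "<3>")]


def regex_redux_sketch(fasta_text):
    """Return the lengths and the match count for a FASTA-format string."""
    original_length = len(fasta_text)

    # Remove description lines (those starting with '>') and all linefeeds.
    sequence = re.sub(r">.*\n|\n", "", fasta_text)
    cleaned_length = len(sequence)

    # Count the matches of one pattern in the cleaned sequence.
    variant_count = len(re.findall(VARIANT_PATTERN, sequence))

    # Apply the replacements one pattern at a time, not as a single
    # combined pattern.
    replaced = sequence
    for pattern, substitute in REPLACEMENTS:
        replaced = re.sub(pattern, substitute, replaced)

    return {
        "original_length": original_length,
        "cleaned_length": cleaned_length,
        "variant_count": variant_count,
        "replaced_length": len(replaced),
    }
```

A real program would read the whole input from stdin (for example with `sys.stdin.read()`) and print its results in the format of the expected output file.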

How to implement

We ask that contributed programs not only give the correct result, but also use the same algorithm to calculate that result.

Each program should:

Before you contribute your program, diff its output for this 10KB input file (generated with the fasta program, N = 1000) against this output file to check that the output has the correct format.
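If a command-line `diff` is not to hand, the same check can be sketched in Python with the standard `difflib` module. The helper name `output_diff` is hypothetical. It compares two strings, so the expected output file and the program's captured output would need to be read into strings first.

```python
import difflib


def output_diff(expected, actual):
    """Return unified-diff lines; an empty list means the outputs match."""
    return list(
        difflib.unified_diff(
            expected.splitlines(),
            actual.splitlines(),
            fromfile="expected",
            tofile="actual",
            lineterm="",
        )
    )
```

An empty result suggests the output format matches the expected file; any returned lines point at the first places where the two differ.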

To check program performance, generate a larger input file using one of the fasta programs with command line arguments: 5000000 > input5000000.txt

Thanks to Jeremy Zerfas for insisting that the programs follow the "one pattern at a time" guideline, and for developing the magic regex patterns. Thanks to Matt Brubeck for the good-enough magic regex pattern.