source code
# The Computer Language Benchmarks Game
# https://salsa.debian.org/benchmarksgame-team/benchmarksgame/
#
# regex-dna program contributed by jose fco. gonzalez
# corrected to use regex instead of string substitution
# array not dictionary by Isaac Gouy
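# Read the whole FASTA input from STDIN as a single string
seq = STDIN.readlines.join
ilen = seq.size
# Strip FASTA header lines (">...") and all newlines
seq.gsub!(/>.*\n|\n/,"")
clen = seq.length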
# Count case-insensitive matches of each 8-mer variant or its reverse complement
[
/agggtaaa|tttaccct/i,
/[cgt]gggtaaa|tttaccc[acg]/i,
/a[act]ggtaaa|tttacc[agt]t/i,
/ag[act]gtaaa|tttac[agt]ct/i,
/agg[act]taaa|ttta[agt]cct/i,
/aggg[acg]aaa|ttt[cgt]ccct/i,
/agggt[cgt]aa|tt[acg]accct/i,
/agggta[cgt]a|t[acg]taccct/i,
/agggtaa[cgt]|[acg]ttaccct/i
].each {|f| puts "#{f.source} #{seq.scan(f).size}" }
# Apply the substitutions in order; an array of pairs is used because
# hashes in ruby 1.8.7 do not preserve insertion order
[
[/tHa[Nt]/, '<4>'], [/aND|caN|Ha[DS]|WaS/, '<3>'], [/a[NSt]|BY/, '<2>'],
[/<[^>]*>/, '|'], [/\|[^|][^|]*\|/, '-']
].each { |f,r| seq.gsub!(f,r) }
puts
puts ilen
puts clen
puts seq.length
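The counting step above can be tried on a small string. This is a minimal sketch, not part of the benchmark program: the helper name count_variants and the sample sequence are made up for illustration, and only two of the nine patterns are used.

```ruby
# Hypothetical helper (not in the benchmark program): returns
# [pattern source, match count] pairs, mirroring the
# "#{f.source} #{seq.scan(f).size}" line of the program.
def count_variants(seq, patterns)
  patterns.map { |re| [re.source, seq.scan(re).size] }
end

# Made-up sample: two exact 8-mers (one per strand) and one variant
sample = "AGGGTAAAccTTTACCCTagggtaat"

counts = count_variants(sample, [
  /agggtaaa|tttaccct/i,
  /agggtaa[cgt]|[acg]ttaccct/i
])

counts.each { |source, n| puts "#{source} #{n}" }
```

String#scan returns non-overlapping matches, so each count is the number of non-overlapping occurrences found while scanning left to right.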
notes, command-line, and program output
NOTES:
64-bit Ubuntu quad core
ruby 3.3.5 (2024-09-03 revision ef084cc8f4) +YJIT [x86_64-linux]
Wed, 04 Sep 2024 21:00:38 GMT
COMMAND LINE:
/opt/src/ruby-3.3.5/bin/ruby --yjit -W0 regexredux.ruby-9.ruby 0 < regexredux-input5000000.txt
PROGRAM OUTPUT:
agggtaaa|tttaccct 356
[cgt]gggtaaa|tttaccc[acg] 1250
a[act]ggtaaa|tttacc[agt]t 4252
ag[act]gtaaa|tttac[agt]ct 2894
agg[act]taaa|ttta[agt]cct 5435
aggg[acg]aaa|ttt[cgt]ccct 1537
agggt[cgt]aa|tt[acg]accct 1431
agggta[cgt]a|t[acg]taccct 1608
agggtaa[cgt]|[acg]ttaccct 2178
50833411
50000000
27388361
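The last three numbers of the output are the input length, the length after FASTA headers and newlines are removed, and the length after the ordered substitutions. The sketch below reproduces those three steps on a tiny made-up input. The method name redux_lengths and the sample text are assumptions for illustration, and the method uses non-mutating gsub rather than the program's gsub!.

```ruby
# Hypothetical helper (not in the benchmark program): returns
# [input length, cleaned length, length after substitutions].
def redux_lengths(text)
  ilen = text.size
  # Remove FASTA header lines and all newlines
  seq = text.gsub(/>.*\n|\n/, "")
  clen = seq.length
  # Same substitution pairs as the program, applied in order
  [
    [/tHa[Nt]/, '<4>'], [/aND|caN|Ha[DS]|WaS/, '<3>'], [/a[NSt]|BY/, '<2>'],
    [/<[^>]*>/, '|'], [/\|[^|][^|]*\|/, '-']
  ].each { |re, rep| seq = seq.gsub(re, rep) }
  [ilen, clen, seq.length]
end

# Made-up FASTA-like input: one header line and two sequence lines
lengths = redux_lengths(">ONE demo\ntHaNgg\ncaNaNBY\n")
puts lengths
```

Order matters here: the last two patterns operate on the `<4>`, `<3>` and `<2>` markers produced by the first three, so reordering the pairs would change the final length.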