The Computer Language Benchmarks Game

regex-redux Matz's Ruby #3 program

source code

# The Computer Language Benchmarks Game
# https://salsa.debian.org/benchmarksgame-team/benchmarksgame/
#
# Rewrite for regex-redux by Aaron Tavistock

class RegexRedux

  MATCHERS = [
    /agggtaaa|tttaccct/,
    /[cgt]gggtaaa|tttaccc[acg]/,
    /a[act]ggtaaa|tttacc[agt]t/,
    /ag[act]gtaaa|tttac[agt]ct/,
    /agg[act]taaa|ttta[agt]cct/,
    /aggg[acg]aaa|ttt[cgt]ccct/,
    /agggt[cgt]aa|tt[acg]accct/,
    /agggta[cgt]a|t[acg]taccct/,
    /agggtaa[cgt]|[acg]ttaccct/
  ]

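  # The substitutions are order-dependent. Iterating a Hash in insertion
  # order is only guaranteed from Ruby 1.9 onward.
  FINAL_TRANSFORM = {
    /tHa[Nt]/ => '<4>',
    /aND|caN|Ha[DS]|WaS/ => '<3>',
    /a[NSt]|BY/ => '<2>',
    /<[^>]*>/ => '|',
    /\|[^|][^|]*\|/ => '-'
  }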

  def initialize(io)
    @seq = io.readlines.join
    @original_size = @seq.size
    @clean_size = remove_breaks!
    @match_results = match_results
    @final_size = final_transform!
  end

  def to_s
    "%s\n\n%d\n%d\n%d" % [
      @match_results.join("\n"),
      @original_size,
      @clean_size,
      @final_size
    ]
  end

  # Returns "<pattern source> <match count>" for one pattern.
  def pattern_count(regex)
    count = 0
    @seq.scan(regex) { count += 1 }
    "#{regex.source} #{count}"
  end

  # Runs the count in a forked child and reads the result back over a pipe.
  def forked_pattern_count(regex)
    reader, writer = IO.pipe
    Process.fork do
      reader.close
      writer.write(original_pattern_count(regex))
    end

    writer.close
    results = reader.read
    reader.close

    results
  end

  # JRuby does not support fork, so only swap in the forked version elsewhere.
  if RUBY_PLATFORM != 'java'
    alias_method :original_pattern_count, :pattern_count
    alias_method :pattern_count, :forked_pattern_count
  end

  def remove_breaks!
    @seq.gsub!(/>.*\n|\n/, '')
    @seq.size
  end

  def match_results
    threads = MATCHERS.map do |matcher|
      Thread.new do
        Thread.current[:result] = pattern_count(matcher)
      end
    end
    threads.each(&:join)
    threads.map { |t| t[:result] }
  end

  def final_transform!
    FINAL_TRANSFORM.each { |f,r| @seq.gsub!(f,r) }
    @seq.size
  end

end

regex_redux = RegexRedux.new(STDIN)
puts regex_redux.to_s
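
The fork-and-pipe counting used in the listing above can be tried in isolation. The following is a minimal sketch, not part of the submitted program: the helper name `count_in_child` is hypothetical, and the `Process.wait` call is an addition that the listing does not make. It assumes a platform where `fork` is available.

```ruby
# Hypothetical helper, not part of the program above: count the matches of
# one pattern in a forked child and return the count through a pipe.
def count_in_child(text, regex)
  reader, writer = IO.pipe
  pid = Process.fork do
    reader.close
    count = 0
    text.scan(regex) { count += 1 }
    writer.write(count.to_s)
    writer.close
  end
  writer.close          # the parent closes its write end so read sees EOF
  result = reader.read
  reader.close
  Process.wait(pid)     # reap the child (an addition; the listing skips this)
  result.to_i
end

total = count_in_child('agggtaaatttaccct', /agggtaaa|tttaccct/)
puts total
```

The sample string contains each alternative of the first matcher exactly once, so the child reports two matches.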


notes, command-line, and program output

NOTES:
64-bit Ubuntu quad core
ruby 1.8.7 (2008-08-11 patchlevel 72) [x86_64-linux]
 
This Ruby is so old that I haven't been able to get RubyGems to work.
No backport, no gmp.


Sat, 04 Jan 2020 18:22:34 GMT

COMMAND LINE:
/usr/bin/ruby regexredux.mri-3.mri 0 < regexredux-input50000.txt

UNEXPECTED OUTPUT 

13c13
< 535239
---
> 273927

PROGRAM OUTPUT:
agggtaaa|tttaccct 3
[cgt]gggtaaa|tttaccc[acg] 12
a[act]ggtaaa|tttacc[agt]t 43
ag[act]gtaaa|tttac[agt]ct 27
agg[act]taaa|ttta[agt]cct 58
aggg[acg]aaa|ttt[cgt]ccct 16
agggt[cgt]aa|tt[acg]accct 15
agggta[cgt]a|t[acg]taccct 18
agggtaa[cgt]|[acg]ttaccct 20

508411
500000
535239
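
The UNEXPECTED OUTPUT diff flags only the final length (535239 reported, 273927 expected). One plausible cause, offered as an assumption rather than a confirmed diagnosis: FINAL_TRANSFORM is a Hash, Ruby 1.8 does not guarantee that a Hash iterates in insertion order, and the substitutions give a different result when applied in a different order. A minimal sketch that makes the order explicit with an array of pairs follows; the names `ORDERED_TRANSFORM` and `apply_transforms` are hypothetical and do not appear in the program.

```ruby
# Hypothetical variant: an Array of [pattern, replacement] pairs keeps the
# substitution order explicit on every Ruby version.
ORDERED_TRANSFORM = [
  [/tHa[Nt]/, '<4>'],
  [/aND|caN|Ha[DS]|WaS/, '<3>'],
  [/a[NSt]|BY/, '<2>'],
  [/<[^>]*>/, '|'],
  [/\|[^|][^|]*\|/, '-']
]

# Applies each substitution in turn to a copy of seq and returns the copy.
def apply_transforms(seq, transforms = ORDERED_TRANSFORM)
  result = seq.dup
  transforms.each { |pattern, replacement| result.gsub!(pattern, replacement) }
  result
end

in_order = apply_transforms('tHaNaND')
reversed = apply_transforms('tHaNaND', ORDERED_TRANSFORM.reverse)
puts in_order   # prints "||"
puts reversed   # prints "tH<2><2>D"
```

The same input ends up 2 characters long in the written order and 9 characters long in reverse order, which illustrates how an unordered traversal could inflate the final size.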