The Computer Language 24.11 Benchmarks Game

k-nucleotide Ruby yjit #8 program

source code

# The Computer Language Benchmarks Game
#   https://salsa.debian.org/benchmarksgame-team/benchmarksgame/
#
#   Naive transliteration from bearophile's program
#   contributed by Isaac Gouy 

def seq_lines
   lines = Array.new
   loop do
      line = $stdin.gets()
      break if line.nil? || line.start_with?(">THREE")
   end 
   loop do
      line = $stdin.gets()
      break if line.nil? || line.start_with?(">")
      lines << line.chomp   
   end
   return lines
end

def base_counts(bases, seq)  
   counts = {}
   size = seq.length + 1 - bases
   for i in 0 ... size  
      nucleo = seq[i...i+bases]      
      if counts.key?(nucleo)  
         counts[nucleo] += 1  
      else
         counts[nucleo] = 1
      end      
   end
   return counts   
end

def sorted_freq(bases, seq)
   kv_ = base_counts(bases, seq)
   size = seq.length + 1 - bases  
   sorted_ = kv_.to_a.sort {|a,b| b.last <=> a.last} 
   return sorted_.collect {|a| [a.first, 100.0 * a.last  / size]}   
end

def specific_count(code, seq)  
    # fetch with a default so an absent code is reported as 0 rather than nil
    return base_counts(code.length, seq).fetch(code, 0)
end

def main
   lines = seq_lines
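   # upcase, not upcase!: the bang form returns nil for a line that is
   # already upper case, and join would then silently drop that line
   seq = lines.map {|s| s.upcase}.join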
   
   for base in 1..2
      for kv in sorted_freq(base, seq)
         puts "%s %.3f" % [kv[0], kv[1]]
      end
      puts
   end
   
   for code in ["GGT", "GGTA", "GGTATT",
         "GGTATTTTAATT", "GGTATTTTAATTTATAGT"]    
      puts "#{specific_count(code, seq).to_s}\t#{code}"
   end
end

if __FILE__ == $0
   main
end
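
The counting step in base_counts can also be written with a default-valued hash, which removes the key? branch. The following is only a minimal sketch under that assumption: kmer_counts is a hypothetical helper name, it is not part of the measured program, and no claim is made about how it would perform under YJIT.

```ruby
# Hypothetical helper, not part of the benchmark program above.
# Counts every overlapping substring of length k in seq.
def kmer_counts(seq, k)
  counts = Hash.new(0)                 # missing keys read as 0
  (0..seq.length - k).each do |i|      # empty range when seq is shorter than k
    counts[seq[i, k]] += 1
  end
  counts
end

# Windows of "GGTATTGGT" for k = 3: GGT GTA TAT ATT TTG TGG GGT
counts = kmer_counts("GGTATTGGT", 3)
puts counts["GGT"]   # 2
puts counts["CCC"]   # 0, because reading a missing key returns the default
```

Because the default is 0, a lookup for a code that never occurs yields 0 instead of nil, so the caller needs no special case when printing.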
    

notes, command-line, and program output

NOTES:
64-bit Ubuntu quad core
ruby 3.3.5 (2024-09-03 revision ef084cc8f4) +YJIT [x86_64-linux]


 Fri, 18 Oct 2024 23:37:54 GMT

COMMAND LINE:
 /opt/src/ruby-3.3.5/bin/ruby --yjit -W0 knucleotide.ruby-8.ruby 0 < knucleotide-input25000000.txt

PROGRAM OUTPUT:
A 30.295
T 30.151
C 19.800
G 19.754

AA 9.177
TA 9.132
AT 9.131
TT 9.091
CA 6.002
AC 6.001
AG 5.987
GA 5.984
CT 5.971
TC 5.971
GT 5.957
TG 5.956
CC 3.917
GC 3.911
CG 3.909
GG 3.902

1471758	GGT
446535	GGTA
47336	GGTATT
893	GGTATTTTAATT
893	GGTATTTTAATTTATAGT