Commit d904a9e

updates
1 parent 3e6eed0 commit d904a9e

2 files changed: 8 additions, 8 deletions
README.md

Lines changed: 1 addition & 1 deletion
@@ -8,8 +8,8 @@ Text source: 79.4MB in 30 files
 - Rust 1.51.0 with sorting: 7s, without sorting: 5s (no parallelism)
 - Go 1.16.4 (parallel) with sorting: 7.32s, without sorting: 4.06s
 - Python 3.9.5 with sorting: 10s, without sorting 8.32s (no multiprocessing)
+- Crystal 1.0.0 with sorting: 17s, without sorting: 7s (non optimized sort, no parallelism)
 - Go 1.16.4 with sorting: 21s, without sorting: 11s (no parallelism)
-- Crystal 1.0.0 with sorting: 35s, without sorting: 7s (non optimized sort, no parallelism)
 - Elixir 1.12 (parallel) with sorting: 33s (without release build)
 
 

words_extractor_cr/src/fast_words_cr.cr

Lines changed: 7 additions & 7 deletions
@@ -19,17 +19,17 @@ module FastWordsCr
   def self.worker(path, outpath)
     text = File.read(path.gsub(".yml", ".txt")).gsub("\n", " ").downcase
 
-    # 35sec
-    # words_json = (text.split(/[^\p{L}]+/).to_set - Set{""}).to_json.downcase
-    # words = Array(String).from_json(words_json).sort { |x, y| self.word_cmp(x, y) }
+    # 17sec
+    words_json = (text.split(/[^\p{L}]+/).to_set - Set{""}).to_json.downcase
+    words = Array(String).from_json(words_json).sort { |x, y| self.word_cmp(x, y) }
 
-    # 35s
+    # 19s
     # words = (text.split(/[^\p{L}]+/).to_set - Set{""}).to_a.sort do |x, y|
     # self.word_cmp(x, y)
     # end
 
     # 7s (no sort)
-    words = (text.split(/[^\p{L}]+/).to_set - Set{""}).to_a
+    # words = (text.split(/[^\p{L}]+/).to_set - Set{""}).to_a
 
     meta = File.open(path) { |file| YAML.parse(file) }
     filepath = %Q(#{outpath}/słowa - #{meta["label"]}.txt)
@@ -45,8 +45,8 @@ module FastWordsCr
   end
 
   def self.word_cmp(str1 : String, str2 : String, charset = "aąbcćdeęfghijklłmnńoópqrsśtuvwxyzźż")
-    tokens1 = str1.downcase.split("")
-    tokens2 = str2.downcase.split("")
+    tokens1 = str1.downcase.chars
+    tokens2 = str2.downcase.chars
     tokens1.each_with_index do |s1, i|
       return 1 unless tokens2[i]?
       idx1 = charset.index(s1) || -1
