
I want to convert a .wav file to text on an Intel Edison board. I followed this thread and used the pocketsphinx_continuous -infile command, as suggested there. It produces a long CLI output, and I'm not sure how to extract the recognized text from it. Can anyone help?

root@edison:/# pocketsphinx_continuous -infile /usr/share/sounds/alsa/Front_Right.wav
INFO: cmd_ln.c(691): Parsing command line:
pocketsphinx_continuous \
        -infile /usr/share/sounds/alsa/Front_Right.wav

Current configuration:
[NAME]          [DEFLT]         [VALUE]
-adcdev
-agc            none            none
-agcthresh      2.0             2.000000e+00
-alpha          0.97            9.700000e-01
-argfile
-ascale         20.0            2.000000e+01
-aw             1               1
-backtrace      no              no
-beam           1e-48           1.000000e-48
-bestpath       yes             yes
-bestpathlw     9.5             9.500000e+00
-bghist         no              no
-ceplen         13              13
-cmn            current         current
-cmninit        8.0             8.0
-compallsen     no              no
-debug                          0
-dict
-dictcase       no              no
-dither         no              no
-doublebw       no              no
-ds             1               1
-fdict
-feat           1s_c_d_dd       1s_c_d_dd
-featparams
-fillprob       1e-8            1.000000e-08
-frate          100             100
-fsg
-fsgusealtpron  yes             yes
-fsgusefiller   yes             yes
-fwdflat        yes             yes
-fwdflatbeam    1e-64           1.000000e-64
-fwdflatefwid   4               4
-fwdflatlw      8.5             8.500000e+00
-fwdflatsfwin   25              25
-fwdflatwbeam   7e-29           7.000000e-29
-fwdtree        yes             yes
-hmm
-infile                         /usr/share/sounds/alsa/Front_Right.wav
-input_endian   little          little
-jsgf
-kdmaxbbi       -1              -1
-kdmaxdepth     0               0
-kdtree
-latsize        5000            5000
-lda
-ldadim         0               0
-lextreedump    0               0
-lifter         0               0
-lm
-lmctl
-lmname         default         default
-logbase        1.0001          1.000100e+00
-logfn
-logspec        no              no
-lowerf         133.33334       1.333333e+02
-lpbeam         1e-40           1.000000e-40
-lponlybeam     7e-29           7.000000e-29
-lw             6.5             6.500000e+00
-maxhmmpf       -1              -1
-maxnewoov      20              20
-maxwpf         -1              -1
-mdef
-mean
-mfclogdir
-min_endfr      0               0
-mixw
-mixwfloor      0.0000001       1.000000e-07
-mllr
-mmap           yes             yes
-ncep           13              13
-nfft           512             512
-nfilt          40              40
-nwpen          1.0             1.000000e+00
-pbeam          1e-48           1.000000e-48
-pip            1.0             1.000000e+00
-pl_beam        1e-10           1.000000e-10
-pl_pbeam       1e-5            1.000000e-05
-pl_window      0               0
-rawlogdir
-remove_dc      no              no
-round_filters  yes             yes
-samprate       16000           1.600000e+04
-seed           -1              -1
-sendump
-senlogdir
-senmgau
-silprob        0.005           5.000000e-03
-smoothspec     no              no
-svspec
-time           no              no
-tmat
-tmatfloor      0.0001          1.000000e-04
-topn           4               4
-topn_beam      0               0
-toprule
-transform      legacy          legacy
-unit_area      yes             yes
-upperf         6855.4976       6.855498e+03
-usewdphones    no              no
-uw             1.0             1.000000e+00
-var
-varfloor       0.0001          1.000000e-04
-varnorm        no              no
-verbose        no              no
-warp_params
-warp_type      inverse_linear  inverse_linear
-wbeam          7e-29           7.000000e-29
-wip            0.65            6.500000e-01
-wlen           0.025625        2.562500e-02

INFO: cmd_ln.c(691): Parsing command line:
\
        -nfilt 20 \
        -lowerf 1 \
        -upperf 4000 \
        -wlen 0.025 \
        -transform dct \
        -round_filters no \
        -remove_dc yes \
        -svspec 0-12/13-25/26-38 \
        -feat 1s_c_d_dd \
        -agc none \
        -cmn current \
        -cmninit 56,-3,1 \
        -varnorm no

Current configuration:
[NAME]          [DEFLT]         [VALUE]
-agc            none            none
-agcthresh      2.0             2.000000e+00
-alpha          0.97            9.700000e-01
-ceplen         13              13
-cmn            current         current
-cmninit        8.0             56,-3,1
-dither         no              no
-doublebw       no              no
-feat           1s_c_d_dd       1s_c_d_dd
-frate          100             100
-input_endian   little          little
-lda
-ldadim         0               0
-lifter         0               0
-logspec        no              no
-lowerf         133.33334       1.000000e+00
-ncep           13              13
-nfft           512             512
-nfilt          40              20
-remove_dc      no              yes
-round_filters  yes             no
-samprate       16000           1.600000e+04
-seed           -1              -1
-smoothspec     no              no
-svspec                         0-12/13-25/26-38
-transform      legacy          dct
-unit_area      yes             yes
-upperf         6855.4976       4.000000e+03
-varnorm        no              no
-verbose        no              no
-warp_params
-warp_type      inverse_linear  inverse_linear
-wlen           0.025625        2.500000e-02

INFO: acmod.c(246): Parsed model-specific feature parameters from /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/feat.params
INFO: feat.c(713): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(142): mean[0]= 12.00, mean[1..12]= 0.0
INFO: acmod.c(167): Using subvector specification 0-12/13-25/26-38
INFO: mdef.c(517): Reading model definition: /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/mdef
INFO: mdef.c(528): Found byte-order mark BMDF, assuming this is a binary mdef file
INFO: bin_mdef.c(336): Reading binary model definition: /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/mdef
INFO: bin_mdef.c(513): 50 CI-phone, 143047 CD-phone, 3 emitstate/phone, 150 CI-sen, 5150 Sen, 27135 Sen-Seq
INFO: tmat.c(205): Reading HMM transition probability matrices: /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/transition_matrices
INFO: acmod.c(121): Attempting to use SCHMM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/means
INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
INFO: ms_gauden.c(294):  256x13
INFO: ms_gauden.c(294):  256x13
INFO: ms_gauden.c(294):  256x13
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/variances
INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
INFO: ms_gauden.c(294):  256x13
INFO: ms_gauden.c(294):  256x13
INFO: ms_gauden.c(294):  256x13
INFO: ms_gauden.c(354): 0 variance values floored
INFO: s2_semi_mgau.c(903): Loading senones from dump file /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/sendump
INFO: s2_semi_mgau.c(927): BEGIN FILE FORMAT DESCRIPTION
INFO: s2_semi_mgau.c(1022): Using memory-mapped I/O for senones
INFO: s2_semi_mgau.c(1296): Maximum top-N: 4 Top-N beams: 0 0 0
INFO: dict.c(317): Allocating 137543 * 20 bytes (2686 KiB) for word entries
INFO: dict.c(332): Reading main dictionary: /usr/local/share/pocketsphinx/model/lm/en_US/cmu07a.dic
INFO: dict.c(211): Allocated 1010 KiB for strings, 1664 KiB for phones
INFO: dict.c(335): 133436 words read
INFO: dict.c(341): Reading filler dictionary: /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/noisedict
INFO: dict.c(211): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(344): 11 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(404): Allocating 50^3 * 2 bytes (244 KiB) for word-initial triphones
INFO: dict2pid.c(131): Allocated 30200 bytes (29 KiB) for word-final triphones
INFO: dict2pid.c(195): Allocated 30200 bytes (29 KiB) for single-phone word triphones
INFO: ngram_model_arpa.c(77): No \data\ mark in LM file
INFO: ngram_model_dmp.c(142): Will use memory-mapped I/O for LM file
INFO: ngram_model_dmp.c(196): ngrams 1=5001, 2=436879, 3=418286
INFO: ngram_model_dmp.c(242):     5001 = LM.unigrams(+trailer) read
INFO: ngram_model_dmp.c(288):   436879 = LM.bigrams(+trailer) read
INFO: ngram_model_dmp.c(314):   418286 = LM.trigrams read
INFO: ngram_model_dmp.c(339):    37293 = LM.prob2 entries read
INFO: ngram_model_dmp.c(359):    14370 = LM.bo_wt2 entries read
INFO: ngram_model_dmp.c(379):    36094 = LM.prob3 entries read
INFO: ngram_model_dmp.c(407):      854 = LM.tseg_base entries read
INFO: ngram_model_dmp.c(463):     5001 = ascii word strings read
INFO: ngram_search_fwdtree.c(99): 788 unique initial diphones
INFO: ngram_search_fwdtree.c(147): 0 root, 0 non-root channels, 60 single-phone words
INFO: ngram_search_fwdtree.c(186): Creating search tree
INFO: ngram_search_fwdtree.c(191): before: 0 root, 0 non-root channels, 60 single-phone words
INFO: ngram_search_fwdtree.c(326): after: max nonroot chan increased to 13428
INFO: ngram_search_fwdtree.c(338): after: 457 root, 13300 non-root channels, 26 single-phone words
INFO: ngram_search_fwdflat.c(156): fwdflat: min_ef_width = 4, max_sf_win = 25
INFO: continuous.c(371): pocketsphinx_continuous COMPILED ON: May 11 2016, AT: 01:08:03

INFO: ngram_search.c(474): Resized backpointer table to 10000 entries
INFO: ngram_search.c(482): Resized score stack to 200000 entries
INFO: cmn_prior.c(121): cmn_prior_update: from < 56.00 -3.00  1.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00 >
INFO: cmn_prior.c(139): cmn_prior_update: to   < 40.70  3.65  2.47 -0.18  1.26  0.52  0.85  0.40 -0.07  0.56  0.30  0.10  0.59 >
INFO: ngram_search_fwdtree.c(1549):     6629 words recognized (25/fr)
INFO: ngram_search_fwdtree.c(1551):   960065 senones evaluated (3609/fr)
INFO: ngram_search_fwdtree.c(1553):  1491379 channels searched (5606/fr), 119734 1st, 172330 last
INFO: ngram_search_fwdtree.c(1557):    12770 words for which last channels evaluated (48/fr)
INFO: ngram_search_fwdtree.c(1560):   165129 candidate words for entering last phone (620/fr)
INFO: ngram_search_fwdtree.c(1562): fwdtree 4.05 CPU 1.523 xRT
INFO: ngram_search_fwdtree.c(1565): fwdtree 4.10 wall 1.541 xRT
INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 146 words
INFO: ngram_search_fwdflat.c(937):     3683 words recognized (14/fr)
INFO: ngram_search_fwdflat.c(939):   249390 senones evaluated (938/fr)
INFO: ngram_search_fwdflat.c(941):   324546 channels searched (1220/fr)
INFO: ngram_search_fwdflat.c(943):    16896 words searched (63/fr)
INFO: ngram_search_fwdflat.c(945):     9422 word transitions (35/fr)
INFO: ngram_search_fwdflat.c(948): fwdflat 0.55 CPU 0.207 xRT
INFO: ngram_search_fwdflat.c(951): fwdflat 0.56 wall 0.211 xRT
INFO: ngram_search.c(1214): </s> not found in last frame, using <sil>.264 instead
INFO: ngram_search.c(1266): lattice start node <s>.0 end node <sil>.236
INFO: ngram_search.c(1294): Eliminated 17 nodes before end node
INFO: ngram_search.c(1399): Lattice has 317 nodes, 715 links
INFO: ps_lattice.c(1365): Normalizer P(O) = alpha(<sil>:236:264) = -1833242
INFO: ps_lattice.c(1403): Joint P(O,S) = -1847205 P(S|O) = -13963
INFO: ngram_search.c(888): bestpath 0.05 CPU 0.019 xRT
INFO: ngram_search.c(891): bestpath 0.06 wall 0.021 xRT
000000000: who do
INFO: ngram_search_fwdtree.c(430): TOTAL fwdtree 4.05 CPU 1.528 xRT
INFO: ngram_search_fwdtree.c(433): TOTAL fwdtree 4.10 wall 1.547 xRT
INFO: ngram_search_fwdflat.c(174): TOTAL fwdflat 0.55 CPU 0.208 xRT
INFO: ngram_search_fwdflat.c(177): TOTAL fwdflat 0.56 wall 0.212 xRT
INFO: ngram_search.c(317): TOTAL bestpath 0.05 CPU 0.019 xRT
INFO: ngram_search.c(320): TOTAL bestpath 0.06 wall 0.021 xRT

1 Answer

To turn off pocketsphinx's debug output, add the option -logfn /dev/null. pocketsphinx will then print only the decoded text. In your case it will print:

 000000000: who do
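If you also want to drop the leading utterance ID and keep only the words, you can filter the output. The sketch below assumes the hypothesis line has the form "<digits>: <text>", as in the output above. Older pocketsphinx builds print it this way, but other versions may use a different format. The function name extract_hyp is hypothetical. The function keeps only lines that start with digits followed by ": ", so the INFO log lines are discarded. Merging stderr into stdout with 2>&1 makes the result the same whether your build writes its log to stderr or to stdout.

```shell
#!/bin/sh
# Hypothetical helper: print only the recognized text.
# Assumes hypothesis lines look like "000000000: who do" (format taken
# from the output above; other pocketsphinx versions may differ).
extract_hyp() {
    # -n suppresses default printing; the s///p command prints only lines
    # that begin with digits + ": ", with that prefix removed.
    sed -n 's/^[0-9][0-9]*: //p'
}

# Usage (requires pocketsphinx_continuous on PATH):
# pocketsphinx_continuous -infile /usr/share/sounds/alsa/Front_Right.wav \
#     -logfn /dev/null 2>&1 | extract_hyp
```

With the output shown in the question, this prints just "who do".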
