VoxPopuLin

Third Meeting - Open Source Speech Recognition Initiative
Place: OSSRI Listserv

The Third Directors meeting of the Open-Source Speech Recognition
Initiative came to order at 10:00 AM GMT.

The Clerk apologizes for the lateness of the start of the meeting and
has extended the time to respond by 12 hours, to Friday, April 9th at
12:00 noon GMT.

The Clerk reminded OSSRI Members (those who have asked to become members
and who have submitted names and other information) and Friends
(LISTSERV members) that they may comment on the voted items but are not
permitted to vote.

ATTENDING

DIRECTORS:
Susan Cragin
Jessica Hekman
Dustin Wish
John Dowding
Volker Kuhlmann
Ivan Uemlianin

ABSENT: Ed Suominen

MEMBERS:
Turner Rentz III
Christophe Gerard
Tristram Metcalf

OPENING REMARKS:
Ivan: Hi everybody, where's the coffee? ...

AGENDA:

1. Nomination of Directors and change in the number of Directors.

To add two new directors, Tristram Metcalfe and Christophe Gerard, to
the board, bringing the total to nine. To amend the OSSRI Charter,
By-Laws, and other information to reflect this.

VOTE:
YES-6 NO-0 ABS-0

DISCUSSION:
Turner: Should add Turner Rentz to this list.
Volker: Welcome!
Ivan: Welcome. Glad to have them both on board.

2. Making WinDictator an official project for OSSRI.

VOTE:
YES-6 NO-0 ABS-0

DISCUSSION:
John: I'd like to see OSSRI be a "big tent" supporting lots of
activities. I think our charter can include support for
command-and-control, large-vocabulary dictation, and other
specialized applications. That doesn't mean, though, that when it
comes time to decide how to best apply OSSRI's resources (when we have
some), that I think WinDictator should be the top priority.
Is it agreed that large-vocabulary dictation is our long-term highest
priority?

Turner: WinDictator does not represent the subset of code we need
to properly execute OSSRI Initiative.

Volker: It is a worthwhile project and would provide a (one of several
possible) speech recognition solutions in the short to medium term. As
it's not exactly taking up financial OSSRI resources we can only win
from adopting it.

3. Welcome of new members.
Eric Johansson
Arthur Chan

DISCUSSION:
Susan: The listserv now has 46 names on it. Welcome new friends.
Volker: Welcome!
Ivan: Welcome.

4. Status Report on Incorporation, given by Susan Cragin
Our pro bono law firm, Goodwin Procter of Boston, has submitted to us
the information that will be filed with the Massachusetts Secretary of
State. The information is somewhat generic and geared toward getting us
favorable not-for-profit US Federal tax status.
The form is not of general interest; however, I will forward it to
anyone who requests it, and the directors have seen a copy.
Currently, they are evaluating our by-laws, hence the necessity to have
them altered and formalized as soon as possible.
Our attorneys at Goodwin Procter are Keith Ranta and Matthew Terry.

DISCUSSION:
John: I've been expecting to receive some paperwork to sign and send
back, but it has not arrived. Is that still pending, or has it been
overtaken by events?
Susan: The law firm is slow, that's all. I'll call them Monday.

Volker: I assume the forms previously sent need to be amended for new
directors and bylaws, and were for information only. If something needs
to be signed that would be no problem, no problem with printing out some
form and mailing it back either (format would need to be pdf, open
office, or something which the MSWin98 wordviewer can handle, or
anything else portable).

NEW ITEMS FOR DISCUSSION ONLY:

Susan: OSSRI is now discussing the technical direction OSSRI will take.
I believe this discussion should be open to all, and that most items not
strictly about incorporation, or not confidential, should be forwarded
to the OSSRI list for general discussion, and not just to the directors
list.

Turner: Think we should formally adopt FESTIVAL as our codebase for TTS.
We should begin synthesis of our TTS, and we should look into working
forward from an ASR engine. Any continuous speech should be done from
the perspective of limited training, and directed dialogue. Going for
the full continuity of speech processing right now is premature and will
require end user training. Recommend we start from CMU Sphinx.

Jessica: I'm not sure what this means. We want to begin synthesis of our
TTS, but we want to use Festival? So what are we synthesizing? And
I'm not sure what "full continuity of speech processing" means. I do
think that we absolutely need to be thinking about a dictionary-type
speech recognition engine, and that this is not something to put off.

Turner: You synthesize the voice itself mon.
Festival - is this formant? The voice talent itself is
under copyright. So you can't, for example, use speechify "mara"
coz it's copyright ScanSoft.
Festival itself has voices. If it's not formant, we simply build
our own using its prosody engine. If we're going for OSSRI the prosody
work is valuable in addition to having an exclusive TTS that we can
call home. That way we can extend Festival to be usable for our own
software, and according to fair use, end up with a package.
Plus TTS is a great way to get started.

Dustin: Thanks Turner, I didn't know about the copyright issue with
Festival. I do think it brings a lot to the table. I have used various
TTS programs and voices and think it should be an easy issue to
overcome. Have you used Festival much? How hard is its prosody engine to
work with? I do like the idea of having an OSSRI engine and/or
voice all our own. I think that could benefit us greatly.

Tris: I would like to place under the "big tent" a future project
potential just for the record now;
The Deaf (& HOH) will very likely some day be able to communicate
fully with the rest of the world through use of voiced I/O. They will
utilize at that point in time, the state of the art in;
1. ASR that recognizes and transmits the actual voiced phonetic sounds
in speech. (including Pitch, Volume & Timbre).
2. User-friendly visual Displays. (Wearable systems continue to slowly
march ever onward) The Voiced sounds recorded very importantly include
recognizing their own actual sounds produced (or NASA's virtual
soundless?) for the required feedback in correcting an effective voice
creation.
*A* solution to connecting all deaf and hard of hearing speech to
all of humanity, is through use of actual voiced sounds using a
potential Phonetic Text Display concept. The PTD would not replace any
other ASR systems but would be an add on to them as an end points
interface, or it could stand alone as just itself; (th i s r ee k w ah
er s l er n ee ng th u f oe r t ee f oe r p l u s f oe n ee m s).
The graphic display potentials are fairly unlimited and the user choice
to read phonemes or instantly later correctly translated words, also
becomes part of the world system.

Monday, April 12, 2004
