Department of Computer Science and Statistics

Seminars

Old Seminars: Spring '07

Computer Science Research Seminar Series: Introduction to Graduate Research Projects

Dept. of Computer Science and Statistics
University of Rhode Island

Date and Time: Friday, September 28, 2007 @ 2:00 PM
Place: 126 Tyler Hall
Abstract:

2:00 Dr. Lamagna
2:12 Dr. Peckham
2:24 Dr. Hamel
2:36 Dr. DiPippo
2:48 Dr. Fay-Wolfe

Language Design, Communication, and Programming

Chris Fry
Co-Founder and Chief Scientist, Clear Methods
Cambridge, MA

Date and Time: Friday, October 12, 2007 @ 2:00 PM
Place: 126 Tyler Hall
Abstract:

In case it wasn't already crystal clear, this millennium's favorite tools (email, cell phones, text messaging, and Facebook) clinch the conclusion: humans are communications machines. If we delve deeper than our peripheral senses and the hardware and software we use for communicating, we observe that a language, or some combination of languages, underlies complex, abstract communications. Take, for instance, the communications task of teaching a computer how to become smart enough to communicate well with another human in some limited domain. This usually entails more UI than AI and generally goes by a simpler name: programming. You program in a language, usually one that's unnecessarily difficult to learn and not well suited to your domain. A good way to reduce the complexity of programming is to write a language suited to your domain, THEN write the program in that new language. But wait, isn't writing a language even harder than just coding an application ad hoc? Not if you start with a language designed to write languages in and have the "language designer" mindset when you begin. This talk will start you down that path.

Bio: In high school, Fry was mediocre in French. As an undergrad he was only fair at Spanish, and he later became a C programmer who couldn't debug his own code. Then he discovered Lisp at the MIT AI Lab and realized the problem was not so much his brain but the convoluted conventions of languages designed for inhuman machines. The rest of his life has been spent working on tools that make programming easier. Fry is a founder of Clear Methods and the first guy to work on the Water language.

Learning Fuzzy Systems from Data: A Review and Proposal

Marcos Campos
Oracle Technology Development Manager for the Data Mining Technologies Group

Date and Time: Friday, October 26, 2007 @ 2:00 PM
Place: 126 Tyler Hall
Abstract:

Fuzzy systems are widely used in many domains due to their ability to approximate complex relationships with relatively compact and transparent models. However, when learning fuzzy systems from data, one is usually presented with a trade-off between model accuracy and transparency. Accurate models can be quickly learned at the expense of transparency or the time required to train the model.

This talk will review some of the approaches used for learning fuzzy systems from data, highlighting their strengths and weaknesses. It will also present some ideas on how to quickly learn fuzzy systems from data for classification and regression tasks that are easy to interpret and have good accuracy.

Bio: Marcos M. Campos is Technology Development Manager for the Data Mining Technologies group at Oracle. Previously he was a Senior Scientist with Thinking Machines. Over the years he has been working on transforming databases into easy-to-use analytical servers. He has created, led, and participated in the design and implementation of multiple in-database analytics projects, including Oracle Personalization, a distributed real-time recommendation system, and Oracle Data Mining. He is an active contributor to the machine learning community through technical papers and participation in industry standards: PMML (XML) and JDM (Java Data Mining specification).

Computer Vision Determination of 3-D Geometric Parameters of LDL Particles via Cryogenic Transmission Electron Microscopy

Lewis Collier
President, Capstone Visual Product Development

Date and Time: Friday, November 16, 2007 @ 2:00 PM
Place: 126 Tyler Hall
Abstract:

Previous research has shown that the size of LDL particles can have an effect on cardiovascular health and that LDL macromolecules may be non-spherical in shape. Some of these studies, however, used methods that are not conducive to automatic determination of the 3-D parameters of the particles. In particular, the prior methods used for determination of geometric parameters were either centrifugal separations or manual determination of parameters from cryogenic transmission electron micrographs. An application of computer vision techniques to automatically determine the 3-D parameters from cryogenic transmission electron microscopy (Cryo-EM) images will be described. Correlation of computer-generated geometric models to the projection Cryo-EM imagery was investigated to assess whether it can recover the pertinent geometric parameters of the expected discoid shape of the LDL particles. The processing shows that the discoid shape can be verified using small angle rotations that are more amenable to the limitations of Cryo-EM imaging.

Bio: Lewis is President of Capstone Visual Product Development, a small business located in Exeter, RI that specializes in development of video and image processing systems. Before founding Capstone Visual in 2001, he was a Corporate Engineer at Anteon Corp in Mystic, CT, where he led development of large-scale efforts for signal processing, control systems, and information display for the US Navy. He received his MSEE from URI and a BS in Physics and a BA in Math from the University of North Carolina at Chapel Hill. He is currently working towards his PhD in Computer Science at URI.

Text Mining in a Nutshell

Tonia Durfee
Appalachian State University

Date and Time: Friday, November 30, 2007 @ 2:00 PM
Place: 126 Tyler Hall
Abstract:

Web 2.0, with its wikis and blogs, allows many users to produce, store, share, manipulate and retrieve many types, shapes and sizes of data. Previous generations of digitalization and the read-only Internet allowed fast access to mostly structured numeric or alphanumeric data types. Structured data came from many transaction-based or measurement systems aimed at creating and providing accurate access to facts. As opposed to structured data, which typically resides in tightly controlled applications, unstructured data such as text and video either does not have a specific structure or has a structure that is not easily readable and interpretable by a computer. Unstructured data is an appealing and natural way to convey messages among people. Storage systems and computer processing power enable creation and storage of massive quantities of both structured and unstructured data. This data holds a tremendous potential for analysis and knowledge sharing. Computer scientists and statisticians have promoted this as a computer's ability to discover previously unknown pattern(s) that can be useful for particular purpose(s). As long as a business or a scientific unit accumulates enough data, they are promised the ability to discover "golden nuggets" of information to provide sustainable competitive advantage and answers to challenging problems. The discovery became branded as knowledge discovery from databases, data mining or predictive analytics processes. Data discovery builds on pattern recognition, association, classification, and prediction techniques applied to various data types: structured data streams such as stock market time series or biological sequences, multirelational data such as graphs and social networks, and spatial and multimedia data such as text, audio and video.

Text is the most popular vehicle of modern communication, and its semantic structure is open to a multitude of interpretations. The meaning of any textual document depends on the context and the comprehension ability of the reader. Structural principles exist in the formation of words, in the creation of grammatical sentences, and in the representation of meaning. The authors and readers of a text often represent the same semantics using different words, or describe different meanings using words that have various meanings. Morphology and syntax form a foundation for modern information retrieval systems such as document management systems, automatic thesauruses, and search engines based on keywords, co-occurrences, indexes or meta text properties, such as author, subject, type, word count, printed page count, and time last written. Keyword matching approaches, while powerful, remain bound to word counts, dictionary composition or word choice. Automatic tagging, keyword-based association analysis, and clustering should be employed for multi-level analysis of the complexity of syntactic construction and the multi-variance of the interpretations discussed above. Text mining technologies aim at discovery of previously unknown golden nuggets from large volumes of text that reside freely on the Internet, in corporate intranets, online libraries and emails. The user of text mining (TM) technology is furnished with an ability to automatically categorize, prioritize, and compare documents, and to understand and utilize the meaning of any particular document without browsing, reading and analyzing an entire document collection. Text mining pattern recognition is built on the recognition of specific word choice, grammar constructions and other stylistic characteristics that are inherent to any given author. As a result, text mining is widely used for authorship attribution, deception and plagiarism detection, summarization, and content comparison and visualization. Text mining has its roots in computational linguistics, natural language processing, content analysis, cognitive psychology, information retrieval, machine learning, statistics, and information and library sciences.
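
The remark that keyword matching remains bound to word choice can be illustrated with a small sketch. The example below is not taken from the talk; the sample phrases and the function names word_counts and cosine_similarity are assumptions chosen for illustration. It compares documents purely by their word counts, so a document that uses synonyms of the query terms scores zero.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count each word (a bag-of-words vector)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical query and documents, invented for this illustration.
query = word_counts("car repair")
same_words = word_counts("car repair shop")
synonyms = word_counts("automobile maintenance garage")

score_same = cosine_similarity(query, same_words)    # shares two words with the query
score_synonyms = cosine_similarity(query, synonyms)  # shares none, despite similar meaning

print(score_same, score_synonyms)
```

The second document is about the same topic as the query, yet pure word-count matching cannot see that, which is the kind of limitation that tagging, association analysis, and clustering are meant to address.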

The focus of my talk will be an overview of the main techniques and cutting edge approaches that have been developed for text mining, a presentation of sample text mining applications and a discussion of the limitations of text mining.