Department of Computer Science and Statistics

Seminars

The Internet Democracy: Election Prediction Through Mining Internet Data

Scott Pion
Dept. of Computer Science and Statistics
University of Rhode Island

Date and Time: Friday, February 9, 2007 @ 2:00 PM
Place: 126 Tyler Hall
Abstract:

This presentation describes a system that predicts election results by mining internet data. These predictions were achieved by counting the number of results that were returned by an internet search engine when searching for exact phrases. Each result from the search query was considered to be a vote for an individual candidate.

The idea that counting internet results could predict an election is based on the wisdom of crowds hypothesis. This hypothesis states that, under the proper conditions, the aggregated opinion of a large number of non-experts is more accurate than that of a smaller number of experts.

The results indicated that in some cases when the error was lowest, the system predicted the elections correctly 74% of the time.
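The vote-counting idea in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the speaker's system: the hit counts are made-up numbers, the candidate names are placeholders, and no real search engine is queried.

```python
def predict_winner(hit_counts):
    """Treat each search hit as one vote; return the winner and all vote shares."""
    total = sum(hit_counts.values())
    if total == 0:
        raise ValueError("no search results to count")
    shares = {name: count / total for name, count in hit_counts.items()}
    winner = max(shares, key=shares.get)
    return winner, shares


# Hypothetical result counts for exact-phrase searches, one entry per candidate.
example_counts = {"Candidate A": 12400, "Candidate B": 9800, "Candidate C": 2300}
winner, shares = predict_winner(example_counts)
print(winner, round(shares[winner], 3))
```

In a real setting the counts would come from whatever search service is used, and the choice of exact phrase would likely matter a great deal.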

Clustering the Short Stories of Edgar Allan Poe: A Text Mining Application

Dr. Roger Bilisoly
Department of Mathematical Sciences
Central Connecticut State University

Date and Time: Friday, February 16, 2007 @ 2:00 PM
Place: 126 Tyler Hall
Abstract:

Edgar Allan Poe wrote seventy short stories in his lifetime, and literary critics have categorized these stories in many ways, e.g., by genres such as horror, detective or proto-science fiction. This talk discusses how a computer can group stories by using families of words related by a theme, e.g., words denoting colors.

This approach combines two different techniques. First, we use term-document matrices, which were originally developed for document searches in the field of information retrieval. Second, we use formal concept theory, which defines concepts in a way that forms a Galois lattice. These lattices have a well-developed mathematical basis and have been used in applications beyond the computer sciences, e.g., social networks in mathematical sociology.

Finally, we will discuss how meaningful these groups of stories are to a human reader.
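As a rough illustration of the first technique mentioned in this abstract, the sketch below builds a small term-document matrix for a family of color words. The sentences are placeholders rather than Poe's text, the function name is hypothetical, and the formal concept step is not shown.

```python
import re


def term_document_matrix(documents, terms):
    """Return one row per term and one column per document, holding word counts."""
    tokenized = [re.findall(r"[a-z]+", text.lower()) for text in documents]
    return [[tokens.count(term) for tokens in tokenized] for term in terms]


# Placeholder sentences standing in for two stories.
docs = [
    "The red door opened onto a black hall.",
    "A white bird crossed the black sky, then a white cloud.",
]
color_terms = ["red", "black", "white"]
tdm = term_document_matrix(docs, color_terms)
for term, row in zip(color_terms, tdm):
    print(term, row)
```

Stories whose columns are similar for a given word family would then be candidates for the same group.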

Bipartition Visualization Using Self Organizing Maps

Neha Nahar
Dept. of Computer Science and Statistics
University of Rhode Island

Date and Time: Friday, February 23, 2007 @ 2:00 PM
Place: 126 Tyler Hall
Abstract:

Studies have shown that early life on Earth left a variety of traces, such as fossil and geological records and, most importantly, information retained in living organisms. These traces can be used to reconstruct the history of life. In particular, evolutionary traces can be obtained from molecular records, i.e., information about the history of life that is retained in the structure and sequence of macromolecules found in extant organisms. Phylogenetics is the taxonomical classification of organisms based on how closely they are related in terms of evolutionary differences. Bipartitions are one way to represent phylogenetic information.

This seminar describes an envisioned tool that will enable scientists and researchers to trace the history of different parts of the cellular and metabolic machinery through time, thereby contributing to a better understanding of the early history of life on Earth. It will help researchers to identify genes that share a common evolutionary history and those that do not. Given the bipartition matrix, the tool will be useful for the comparative genomic analysis of any living organisms, especially prokaryotes (bacteria and archaea). It will generate a consensus tree for the given organisms and also report the strongly supported and conflicting bipartitions. It uses self-organizing maps (SOM) and focuses on visual clustering to detect the structure of the input data based on the similarity between points in high-dimensional space.
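The notions of supported and conflicting bipartitions can be made concrete with a small sketch. It assumes each gene tree is given as a list of bipartitions, each written as one side of the split; the taxa, the trees, and the function names are hypothetical, and the SOM visualization itself is not shown.

```python
from collections import Counter


def canonical(side, taxa):
    """Represent a bipartition by the side that excludes the first taxon in sorted order."""
    side = frozenset(side)
    return frozenset(taxa) - side if min(taxa) in side else side


def conflicts(a, b, taxa):
    """Two bipartitions conflict when all four pairwise intersections are non-empty."""
    all_taxa = frozenset(taxa)
    a_rest, b_rest = all_taxa - a, all_taxa - b
    return all([a & b, a & b_rest, a_rest & b, a_rest & b_rest])


def support(gene_trees, taxa):
    """Fraction of gene trees that contain each bipartition."""
    counts = Counter(canonical(side, taxa) for tree in gene_trees for side in tree)
    return {split: n / len(gene_trees) for split, n in counts.items()}


taxa = ["A", "B", "C", "D"]
gene_trees = [
    [{"A", "B"}],  # split AB|CD
    [{"C", "D"}],  # the same split, written from the other side
    [{"A", "C"}],  # split AC|BD, which conflicts with AB|CD
]
scores = support(gene_trees, taxa)
```

Here the split AB|CD is supported by two of the three trees, and it conflicts with AC|BD because no single tree can contain both.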

An XQuery Servlet for RESTful Data Services

Jonathan Robie
DataDirect Technologies

Date and Time: Friday, March 2, 2007 @ 2:00 PM
Place: 126 Tyler Hall
Abstract:

Many servlets do nothing more than integrate data from multiple sources to create an XML or HTML result; in other servlets, this is a significant portion of the code. As an XML-oriented data integration language, XQuery is a particularly simple, productive, and efficient way to do this task. In this presentation, I show a servlet that provides a REST interface to any XQuery that a developer places in a secure deployment directory on an application server. I then demonstrate the development of data services by writing XQueries that access XML, relational, and flat file formats such as EDI to create complex XML and HTML results, and then copying them to the deployment directory.

In this environment, each XQuery inherently defines a REST interface. I will develop queries by dragging and dropping from relational, XML, and EDI (or other flat file formats) into the text of an XQuery, using a standard XQuery GUI environment, then copy queries into a deployment directory, and invoke them using a web browser. The servlet will run under Apache Tomcat.

XML data is queried directly using XQuery, with an implementation that uses document projection and streaming so that large XML files can be handled efficiently. Relational data is queried by converting XQuery to efficient SQL, executing the SQL, and returning the results as XML. EDI and flat file formats are queried by converting them physically to XML and querying them using the same document projection and streaming techniques used for querying XML documents.

The REST interface to a query consists of the name of the servlet and URI parameters that identify the name of the query and the query’s external variables. A query may also use the variable $content, which is bound to the content of an HTTP request if present.

Preparing queries dramatically improves performance. Each time a query is invoked by a client, the servlet does the following:

  • If this query has been prepared and is up-to-date, the prepared query is used. If the query has not been prepared, or a more recent version of the query exists in the deployment directory, the query is prepared.
  • All URI parameters are bound to the query as external variables.
  • The query is executed, and query results are returned to the client.
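The three steps above can be sketched as a small cache keyed by query name. This is a language-neutral illustration written in Python rather than the servlet's actual code: the class and function names are hypothetical, and a Python string template stands in for a prepared XQuery.

```python
import os
import tempfile


class QueryCache:
    """Keeps prepared queries and re-prepares one when its file is newer."""

    def __init__(self, deploy_dir, prepare):
        self.deploy_dir = deploy_dir
        self.prepare = prepare   # hypothetical: turns query text into a callable
        self._prepared = {}      # query name -> (file mtime, prepared query)

    def get(self, name):
        path = os.path.join(self.deploy_dir, name)
        mtime = os.stat(path).st_mtime_ns
        cached = self._prepared.get(name)
        if cached is None or cached[0] < mtime:
            # Not prepared yet, or a more recent version exists on disk.
            with open(path, encoding="utf-8") as handle:
                cached = (mtime, self.prepare(handle.read()))
            self._prepared[name] = cached
        return cached[1]


def handle_request(cache, name, uri_params):
    """Look up the prepared query, bind URI parameters, and execute it."""
    prepared = cache.get(name)
    return prepared(dict(uri_params))


prepare_calls = []


def fake_prepare(text):
    """Stand-in for query preparation: records the call, returns a template filler."""
    prepare_calls.append(text)
    return lambda variables: text.format(**variables)


with tempfile.TemporaryDirectory() as deploy_dir:
    with open(os.path.join(deploy_dir, "hello.xq"), "w", encoding="utf-8") as f:
        f.write("<greeting>{name}</greeting>")
    cache = QueryCache(deploy_dir, fake_prepare)
    first = handle_request(cache, "hello.xq", {"name": "world"})
    second = handle_request(cache, "hello.xq", {"name": "again"})
```

The second request reuses the prepared query, so preparation happens only once unless the file in the deployment directory changes.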

Jonathan Robie is the XQuery Technology Lead at DataDirect Technologies, and was recognized as an InfoWorld Innovator of the Year for his work on XML query languages. He is a lead designer of XQuery, the W3C XML Query language, and an editor of many of the specifications which define the XQuery language. He is also a co-inventor of Quilt, a predecessor of XQuery, and XQL, a predecessor of XPath. Jonathan has been significantly involved in several other W3C Working Groups, acting as an editor for documents produced by the XML Schema and Document Object Model Working Groups, and has also participated in the W3C XML Information Set and XML Stylesheet Language (XSL) Working Groups. He is well known in the XML world, both as an innovator and as a speaker.

Prior to joining DataDirect, Jonathan worked as an XML Research Specialist at Software AG, where he helped design architectures for XML servers and represented Software AG on the XML Query and XML Schema Working Groups. He has been on the architecture team for three XML databases or repositories, at Software AG, Texcel Research, and POET Software. He has a total of 18 years of experience with advanced database systems and complex database applications, especially object-oriented databases, multimedia databases, workgroup database applications, and XML/SGML databases.

Hacking the World with Microcontrollers

Brian Jepson
O'Reilly Publishers

Date and Time: Friday, March 9, 2007 @ 2:00 PM
Place: 126 Tyler Hall
Abstract:

Microcontrollers are small computer-on-a-chip devices. They can be wired up to a breadboard or printed circuit board to make a cheap, low-power, quick and dirty control system for lots of different projects. They've been used to make game systems such as the Mignon Game Kit (http://www.olafval.de/mignon/english/index.htm) and the XGameStation Pico (http://makezine.com/xgamestation/). They're also used to power general application boards such as the Make Controller (http://www.makezine.com/controller/) and Arduino (http://www.arduino.cc/), both of which can be programmed using open source toolkits.

In this session, I'll provide a brief overview of the available microcontrollers and microcontroller kits out there, show how easily they can be programmed, and demonstrate a few cool things.

Brian Jepson is an Editor and Hacker for Make:. He's also one of the founders of Providence Geeks (http://www.providencegeeks.org) and helps keep the bits flowing over at AS220 (http://www.as220.org).

An Overview and Example of Data Mining

Dr. Daniel Larose
Dept. of Mathematical Sciences
Central Connecticut State University

Date and Time: Friday, March 30, 2007 @ 2:00 PM
Place: 126 Tyler Hall
Abstract:

We begin by asking what data mining is, and why it is needed. We discuss guarding against data dredging, and focus on the need for human direction of data mining. We emphasize that “data mining is easy to do badly”, because of the powerful black-box point-and-click analytical software that exists, and that therefore an understanding of the underlying algorithmic and statistical structures is crucial. Finally, we stress that, in order to avoid financial losses incurred by naïve application of data mining software, corporations should ensure that human analysts are involved at every phase of the process. In other words, much of data mining is just the application of the best practices of statistical analysis to very large data sets.

We then examine a sample application of data mining. We develop models for classifying which customers are most likely to respond to direct mail advertising. K-means clustering, classification and regression trees, C4.5 decision trees, and neural networks are applied. We then ask the question, “Why does our ‘best’ model have the highest overall error rate?”

Background:
Dr. Larose is Professor of Statistics and Director of Data Mining @CCSU, Department of Mathematical Sciences, Central Connecticut State University. He is the author of Discovering Knowledge from Data: An Introduction to Data Mining (Wiley, 2005), Data Mining Methods and Models (Wiley, 2006), and the co-author (with Dr. Zdravko Markov) of Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage (Wiley, 2007, to appear). He is the author of Discovering Statistics, an undergraduate statistics textbook to be published by W.H. Freeman in 2009. His consulting work includes a $750,000 Phase II grant from the Air Force Office of Research, Storage Efficient Data Mining of High Speed Data Streams. He designed, developed, and directs the world’s first online Master of Science and Graduate Certificate programs in data mining. He is the Series Editor for the new Wiley Series on Methods and Applications in Data Mining.

Towards Optimal TDMA Frame Size in Wireless Sensor Networks

Tim Ren
Dept. of Computer Science and Statistics
University of Rhode Island

Date and Time: Friday, April 6, 2007 @ 2:00 PM
Place: 126 Tyler Hall
Abstract:

This talk presents a set of TDMA MAC protocols for wireless sensor networks that can achieve near-optimal throughput and good latency for regular periodic data delivery. The protocols are based on a novel graph coloring technique called the Color Constraint Heuristic (CCH). The CCH creates a near-optimal reduction in the number of colors, which in turn produces near-optimal periodic data throughput, as measured by the reduction of the number of TDMA slots. The talk also describes a centralized TDMA slot assignment algorithm, Centralized Slot Assignment (CSA-CCH), which uses CCH and assumes knowledge of the entire network topology. Then, it presents a distributed version of the algorithm, Distributed Slot Assignment (DSA-CCH), which does not assume any prior knowledge of the network. A further refinement of DSA that is designed for query tree aggregation applications (DSA-AGGR) is also presented. Simulations show that our algorithm performs closer to the optimal bound on data throughput than several prominent TDMA slot assignment protocols for wireless sensor networks. In addition, the CCH-based algorithms carefully order the coloring to provide good latency for data delivery.
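The link between graph coloring and TDMA frame size can be illustrated with a plain greedy coloring. This is not the Color Constraint Heuristic described in the abstract, only a generic baseline: the conflict graph is hypothetical, an edge means two nodes must not transmit in the same slot, and each color corresponds to one slot.

```python
def greedy_slot_assignment(conflict_graph):
    """Assign each node the smallest slot not used by any conflicting node."""
    slots = {}
    # Visit the most constrained nodes first, breaking ties by name.
    order = sorted(conflict_graph, key=lambda n: (-len(conflict_graph[n]), n))
    for node in order:
        taken = {slots[other] for other in conflict_graph[node] if other in slots}
        slot = 0
        while slot in taken:
            slot += 1
        slots[node] = slot
    return slots


# Hypothetical conflict graph for four sensor nodes.
conflict_graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
slots = greedy_slot_assignment(conflict_graph)
frame_size = max(slots.values()) + 1
print(slots, frame_size)
```

Fewer colors mean a shorter frame, so each node gets to transmit more often; that is the sense in which reducing the number of colors improves periodic throughput.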

From Language Interaction to Domain Action: A Cognitive Systems Approach to Dialogue

Dr. Brad Miller
Raytheon

Date and Time: Friday, April 13, 2007 @ 2:00 PM
Place: 126 Tyler Hall
Abstract:

After a brief overview mapping cognitive taxonomy to traditional software architecture, I will address a particular cognitive application for HCI: an agent-based approach to dialogue processing. I will wrap up with a quick scan of why cognitive systems (should) matter to defense agencies (aka applications), and some barriers to widespread adoption (aka open research problems).

No Seminar on April 20, 2007

Knowledge Representation for Interoperable, Intelligent Systems

Dr. Chung Hee Hwang
Raytheon

Date and Time: Friday, April 27, 2007 @ 2:00 PM
Place: 126 Tyler Hall
Abstract:

How to represent knowledge is one of the fundamental problems in building cooperative, intelligent systems. Being able to make use of, and reason with, knowledge automatically and effortlessly is the key to intelligent behavior, and knowledge representation should support such processes. In the first half of this talk, we will briefly consider what knowledge is, what we mean by representation, certain samples of tough nuts in knowledge representation, and how to facilitate creating a cooperative environment where multiple systems of diverse nature work together toward common goals. In the second half, I will focus on knowledge representation issues from a DoD perspective, addressing interoperability among systems of systems.

Chung Hee Hwang is a Senior Principal Software Systems Engineer at Raytheon, where she is currently working on R&D to apply A.I. techniques to Raytheon business areas while acting as the OSA Lab manager. Before joining Raytheon, she was a Distinguished Member of Technical Staff in the Intelligent Systems Laboratory, Motorola Labs, designing knowledge representation and a reasoning system for the in-vehicle driver task management system Driver’s Advocate™. Prior to that, she had been involved in developing various intelligent systems, including a knowledge management system KnowMan (Motorola), the agent-based information retrieval and fusion system InfoSleuth (MCC), a spoken natural language interface for an interactive planning system TRAINS (University of Rochester) and the reasoning system EPILOG (University of Alberta and Boeing).