Wednesday, March 30, 2011

Action classification defined for the prototype


Movements
Entities: Left hand
Attributes: position (x, y), direction (left, right), and speed (none, slow, fast).
Events Decided
Left hand moves fast to the right, left hand moves slow to the right, left hand moves fast to the left, left hand moves slow to the left
Behaviors Defined
Hit the object on the left, hit the object on the right, touch the object on the left, touch the object on the right
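
For clarity, here is a minimal sketch (in Java, the language chosen for the project) of how these entities, attributes, events, and behaviors could be represented in the prototype. All type and constant names here are ours, purely illustrative, not taken from the actual code.

    // Illustrative data model for the action classification above.
    public class LeftHandState {
        enum Direction { LEFT, RIGHT }
        enum Speed { NONE, SLOW, FAST }

        // Events decided: a (direction, speed) pair, e.g. "move fast right".
        enum Event { MOVE_FAST_RIGHT, MOVE_SLOW_RIGHT, MOVE_FAST_LEFT, MOVE_SLOW_LEFT }

        // Behaviors defined: events interpreted against a nearby object.
        enum Behavior { HIT_OBJECT_LEFT, HIT_OBJECT_RIGHT, TOUCH_OBJECT_LEFT, TOUCH_OBJECT_RIGHT }

        double x, y;          // position attribute
        Direction direction;  // direction attribute
        Speed speed;          // speed attribute
    }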

Thursday, March 10, 2011

Logical view


The implementation of this project has three main components, which should work independently. This independence lets each implementer work without conflicting with the other components. The entire process is illustrated in Figure 1, and the components can be classified as follows.
• 2D to 3D modeling using a marker-based protocol
• Artificial Intelligence
• Environment
The 2D-to-3D component considers the images captured from the user in a sequential manner. It tries to identify the markers on the user and build a model from those details. Two web cameras capture images of the user, and the system merges each time-matched pair of images to create a 3D model of the user. This process continues over the sequence of images and generates a sequence of 3D models, which are forwarded to the next stage.
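
A hedged sketch of that capture-and-merge loop is below. Every type and helper here (Camera, Frame, findMarkers, merge) is a placeholder standing in for the real camera and vision code, not the project's actual API.

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical sketch of the 2D-to-3D component's main loop.
    public class ModelBuilder {
        interface Camera { Frame capture(); }                 // one web camera
        static class Frame { long timestamp; /* pixels */ }   // a captured image
        static class Marker2D { int id; double x, y; }        // a detected marker
        static class Model3D { /* joint positions */ }        // one 3D model of the user

        List<Model3D> run(Camera leftCam, Camera rightCam, int frames) {
            List<Model3D> sequence = new ArrayList<Model3D>();
            for (int i = 0; i < frames; i++) {
                Frame left = leftCam.capture();               // both frames share a time base
                Frame right = rightCam.capture();
                List<Marker2D> lm = findMarkers(left);        // markers worn by the user
                List<Marker2D> rm = findMarkers(right);
                sequence.add(merge(lm, rm));                  // merge the two 2D views into a 3D model
            }
            return sequence;                                  // forwarded to the AI component
        }

        List<Marker2D> findMarkers(Frame f) { return new ArrayList<Marker2D>(); } // stub
        Model3D merge(List<Marker2D> l, List<Marker2D> r) { return new Model3D(); } // stub
    }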

The Artificial Intelligence system will be developed as a separate component. It takes the 3D model sequence as input and delivers a prediction as output, in a form compatible with the inputs of the environment. This unit examines the 3D model sequence precisely and tries to predict the user's movement. It also has the ability to grow with experience through its inherent learning ability, which sharpens the probability of a correct prediction.
The virtual environment is the third, and one of the most important, parts of this project. This unit is responsible for making the user feel that the virtual world is realistic. It will consist of components such as environment talk-back, sequential activities, and distinct activities. It is the interface the user interacts with, so it should be a well-defined system in order to preserve the quality of the entire project. Its inputs come from the AI, and its output is the experience the user gets. According to the user's input, the environment must be able to change artifacts of the virtual world in a real-world manner.
A detailed analysis of the logical view is given in Figure 2, which shows the layers of the project BubbleBoy.

Logic Tier

- This layer comprises the software application part of BubbleBoy. (One way to read these sub-layers as code is sketched after the list.)
• Image Acquisition - This sub-layer prepares the images taken from the user and feeds them to image processing.
• Image Processing - This layer consists of three parts: a 2D image processor, a 3D image generator, and a head-direction capture module. The image arrives from the image acquisition sub-layer with the markers still in it.
• Gesture Recognition - This recognizes the pose of the user at one time instant; the details are compared with the Knowledge Base data, and the results proceed to the action detection layer.
• Action Detection - This layer maps the sequence of images processed by the layer above against the details of human actions in the Knowledge Base and detects the action of the user.
• Knowledge Base Access Handling - This layer is responsible for handling communication between the Knowledge Base and the Logic Tier components. It controls access requests from the application and manages the data flow into and out of the Knowledge Base.
• Environment Generation - This layer takes inputs from the action detection layer and generates the appropriate environment change according to the detected action of the user.
• Environment Modification - This layer makes the changes immediately after they are identified by the environment generation layer.
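
As promised above, one way to read these sub-layers is as a chain of small interfaces, each consuming the previous layer's output. This is only an illustration; all the type names are ours, not from the design document.

    import java.util.List;

    // Placeholder data types; the real ones would carry images, joints, etc.
    class Frame {}
    class Model3D {}
    class Gesture {}
    class Action {}
    class SceneChange {}

    // The Logic Tier read as a chain of interfaces, each feeding the next.
    interface ImageAcquisition      { Frame nextFrame(); }                   // prepared camera image
    interface ImageProcessing       { Model3D process(Frame f); }            // 2D processing, 3D generation, head direction
    interface GestureRecognition    { Gesture recognize(Model3D pose); }     // one pose, matched against the Knowledge Base
    interface ActionDetection       { Action detect(List<Gesture> seq); }    // pose sequence mapped to an action
    interface EnvironmentGeneration { SceneChange generate(Action action); } // scene change for the detected action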

Data Tier


• BubbleBoy Database - A database will be required for the Logic Tier components to store data apart from the Knowledge Base. Since the Knowledge Base keeps specialized data for gesture recognition and action detection, a separate database is necessary for the other components of the Logic Tier.
• Knowledge Base - The Knowledge Base will basically consist of two types of data: gesture model details and action detection details. A sketch of how access to it might look follows.
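
The Knowledge Base Access Handling layer described under the Logic Tier might expose these two kinds of data through a narrow facade like the one below. This is a hypothetical sketch; none of these names come from the actual design.

    import java.util.List;

    class Model3D {}
    class Gesture {}
    class GestureModel {}   // gesture model details stored in the Knowledge Base
    class ActionModel {}    // action detection details stored in the Knowledge Base

    // Hypothetical facade for the Knowledge Base Access Handling layer.
    interface KnowledgeBaseAccess {
        GestureModel matchGesture(Model3D pose);        // lookup for the gesture recognition layer
        ActionModel matchAction(List<Gesture> poses);   // lookup for the action detection layer
        void store(GestureModel learned);               // learning writes back through the same layer
    }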

Presentation Tier


• Graphical Environment - This shows the virtual environment to the user according to the action he/she presented.

1. Human Interaction with Natural Environment


People sense through five different organs: the eyes, ears, nose, mouth, and body. These organs sense pictures, sounds, smells, taste, and touch respectively. Our main target is to capture the interaction of three of them: the eyes, ears, and body of the user. The exact point responsible for the interaction between an individual and the environment can be identified as the interface [9, 13]. In this project the eye, ear, and body of the user are the main interfaces, and communication takes place through them.
The human eye can identify three wavelength sections: short, medium, and long. Its view is conical because of the structure of the eye: the color-sensitive cones sit in high density at the center of the retina, so the eye concentrates mainly on the middle of its conical view [4]. While the eyes are open, the brain receives and processes a continuous stream of images. Since humans have two identical eyes, we can judge the distance to an object, which makes us capable of recognizing three dimensions (3D): the height, length, and depth of an object we see.
The human body also acts as an interface with the natural environment. It mainly receives inputs such as touch and heat from the environment and reacts according to them. Physical movements of a person mainly occur as a result of a previous input: the body reacts according to the vision one gets or the sounds one hears. Since the body makes up the majority of a person's volume, it is involved in most human actions.
As identified in this project, the next main interface for human-environment interaction is sound. The human ear can detect sounds in the range 20-20,000 Hz; it takes sounds as input, and the brain processes them. Although a person can hear all of these, only a few are consciously recognized and the rest are filtered out as noise. Sound has a huge impact on human behavior and reactions.
The human brain is capable of recognizing any item seen before; this is named experience. The brain gathers information about items, sounds, tastes, smells, and feelings. Old experience makes things easy to recognize, while new items keep the brain in conflict for a while.
After an input passes through these interfaces it reaches the brain, which decides how to act on it. According to the output signal generated by the brain, the organs react and the interaction with that particular object, sense, or feeling takes place.

2. How to Build the Virtual Environment?


In this project we need to capture the user's concentration on our graphical virtual environment. To satisfy this, we need to look at the facts described in section 1. The project consists of a sound environment which gives the experience of the real environment. It aims to lead the user to believe the virtual world is realistic and to demonstrate a high level of interaction with it. One of the main influences of the natural environment is that it talks back to the person.
To make the user feel immersed, the environment around the user needs to talk back to him as the real environment does, and it should consist of several continuous and distinct activities. Consider a user walking through a jungle who stops where a banana leaf hangs ahead. His vision is now limited by that leaf. The user tries to lift the leaf and look forward; with a particular gesture he must be able to move it upward and get a precise view ahead. After he releases the leaf, it should swing back in the opposite direction and oscillate until it becomes stable. That is the level of talk-back expected from the environment.
The human eye receives a constant visual stream, but we are providing 60 Hz output from the monitor. The eye cannot detect this and perceives the output as a continuous flow. The natural feel of the video stream, however, depends highly on its quality: if the stream consists of dull colors and faded images, the person will detect that the vision is fake. The resolution of the graphical model should therefore be high, to give the stream enough detail.
This needs high graphics processing unit (GPU) and CPU power. We need to refresh the graphical frame every 1/30 seconds and simultaneously draw the new frame. High-resolution frames consume much processing power, but this constraint can be solved by using a capable GPU in the PC.
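
To illustrate the arithmetic, a minimal fixed-rate loop that redraws every 1/30 s might look like the sketch below. This is not the project's actual render loop, and drawFrame is a stand-in for the real rendering call.

    // Minimal fixed-timestep loop: redraw once every 1/30 s.
    public class RenderLoop {
        static final long FRAME_NANOS = 1_000_000_000L / 30; // one frame each 1/30 s

        public static void main(String[] args) throws InterruptedException {
            long next = System.nanoTime();
            while (true) {
                drawFrame();                                  // the GPU does the heavy lifting here
                next += FRAME_NANOS;
                long sleep = next - System.nanoTime();
                if (sleep > 0) Thread.sleep(sleep / 1_000_000, (int) (sleep % 1_000_000));
                // if sleep <= 0 the frame took too long: more GPU/CPU power is needed
            }
        }

        static void drawFrame() { /* render the current scene */ }
    }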

The human eye has a conical view and concentrates mainly on its middle, and the eye is the person's main interface when interacting with the graphical interface we provide. If the distance to the graphical interface is large, the user will feel it is an artificially generated view on a monitor; it would be best if it were very close to the eye. But because of constraints such as camera placement, we have to put the monitor at a medium distance [fig. 1], and it should be placed in the middle of the conical view, since the eye focuses mainly there.
Since the human eye is capable of identifying 3D objects, we have to design the virtual environment in 3D; otherwise the brain will figure out that it is a 2D generated environment. So it is essential to draw 3D graphical models in the virtual environment and provide the necessary depth cues.
We are going to embed suitable sounds for objects; otherwise the user won't get the full experience from the environment. Each object needs the sounds that relate to it. The collaboration of sounds and visuals makes the user much more engaged with the virtual environment.
This virtual environment also embeds another essential property of the natural environment: its reaction time to a person's actions. Most of the time the reaction is instant, so the graphical environment should be implemented with that efficiency.
The next main consideration is the continuity of frame generation. When the user makes gestures, the AI identifies them and sends data via a 2D object array [section 3]. According to this input, new frames must be drawn, and this process should ensure the continuity of the scene; otherwise it would look unnatural and the user would be confused and disturbed.
Energy emitted or reflected from an object is what allows a person to see it [1]. In the virtual environment we need to decide the level of energy emission from each object in order to give a real-world feeling. The main effect related to this requirement is the brightness level of the objects, which helps increase how natural the virtual objects look.
When considering activities carried out by the user, we need to consider the speed of the action [5]. The user should be able to control the system through the speed, angle, rotation, and plane of the hand movement. For example, consider a user sitting alone by a riverside; the environment should change, but the area remains the same. The user picks up a small stone and throws it into the river. If he moves his hand in a vertical plane, the stone should sink at once; but if he moves his hand in a horizontal plane at speed, the stone should skip two to four times and then sink.
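
That decision could be driven directly from the hand-movement parameters listed above. The sketch below illustrates the mapping only; the thresholds and units are invented, not measured.

    // Sketch: deriving the stone's behavior from the throwing motion (all thresholds invented).
    public class StoneThrow {
        enum Plane { HORIZONTAL, VERTICAL }

        static int skipCount(Plane plane, double speed) {
            if (plane == Plane.VERTICAL || speed < 2.0) return 0;  // sinks at once
            // fast, horizontal throw: skips roughly two to four times before sinking
            return Math.min(4, 2 + (int) (speed / 5.0));
        }

        public static void main(String[] args) {
            System.out.println(skipCount(Plane.VERTICAL, 6.0));    // 0 -> sinks immediately
            System.out.println(skipCount(Plane.HORIZONTAL, 6.0));  // 3 -> skips, then sinks
        }
    }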

We normally see video games with a graphical interface that separates the user and the game. Our target is to give the user a real-world experience. In video games on the commercial market, we can see the user embedded in the game as in fig. 2.b. This scenario is fine for such games, since the user operates the keyboard or mouse of that particular PC. But in the real world, when a user acts, he is able to see body parts such as his hands and legs.
In our project the user acts in the real world, so the embedded-user scenario may not work. We therefore move to the independent user demonstrated in fig. 2.a. This assures the user that he is in the real world and joins the virtual world in a real, natural manner.

3. Input Array


The artificial neural network will send an array of data to the virtual environment. The AI will predict actions from the gestures of the person acting in front of the camera. This array will contain several pieces of information about the user's actions, which will help the virtual environment change according to them [1].
This will be a 2D array containing data objects about the user's movements, such as actions and coordinates. The actions carry data about the movements of the chest and head, providing information about the direction the user watches or moves. The coordinates describe the positions of the user's hands and legs; these can be used to determine object movement in the environment by enabling interaction with items at those coordinates.
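
As an illustration only, the data objects in that array might be shaped like this. The field names are ours, and the exact layout of the 2D array is still open at this stage.

    // Hypothetical shape of the AI-to-environment input array.
    public class InputFrame {
        static class ActionData {           // chest and head movement
            double headYaw, headPitch;      // direction the user watches
            double chestDirection;          // direction the user moves
        }
        static class LimbCoordinate {       // hands and legs
            String limb;                    // e.g. "leftHand" (illustrative label)
            double x, y, z;                 // used to pick objects to interact with
        }

        // The "2D array": one row per tracked item, columns for its data objects.
        ActionData[] actions;
        LimbCoordinate[] coordinates;
    }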

4. Body Model


In this project we are going to design a 3D model similar to a human pose. This seems to contradict the facts provided previously, namely the argument for the independent user in fig. 2.a: we cannot display the human model in the virtual environment. So we keep it invisible to the user but use it for our purpose [3, 10].
We use the model to interact with objects at the given coordinates. We operate the invisible user model using the data in the input array, so it is straightforward to take data from the input array and assign it to the user model. We can then use the model to identify which objects the user is about to interact with, which makes the interaction with the virtual environment more accurate.
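
A sketch of how the invisible model might be driven by the input array (reusing the hypothetical InputFrame above) and queried for touched objects is below. Scene, SceneObject, and near are placeholders, not real engine calls; only Vector3f is jME's actual vector type.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    import com.jme.math.Vector3f;  // jME 2's vector type

    // Sketch: an invisible body model driven by the input array (hypothetical API).
    public class BodyModel {
        interface Scene { List<SceneObject> near(Vector3f point); }  // placeholder scene query
        static class SceneObject {}

        private final Map<String, Vector3f> joints = new HashMap<String, Vector3f>();

        // Copy the limb coordinates from the input array into the model.
        void update(InputFrame frame) {
            for (InputFrame.LimbCoordinate c : frame.coordinates) {
                joints.put(c.limb, new Vector3f((float) c.x, (float) c.y, (float) c.z));
            }
        }

        // The model is never rendered; it only tells us which objects the user touches.
        List<SceneObject> touchedObjects(Scene scene) {
            List<SceneObject> hits = new ArrayList<SceneObject>();
            for (Vector3f joint : joints.values()) {
                hits.addAll(scene.near(joint));
            }
            return hits;
        }
    }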

5. Languages we are going to use


A graphical environment was designed for the project while going through the research. The project's main target is to interact with the movement of the user in real time, and as a project requirement this interaction should be reflected through a rich graphical interface [7]. So several candidates were tried, with some testing to ensure the capability of the available resources [14, 15]. We considered the following choices for developing the user-interactive interface.
• XNA
• CryEngine
• JMonkey

XNA

XNA [16] is a framework based on a native implementation of the .NET Framework 2.0 on Windows. It runs on a version of the Common Language Runtime that is optimized for gaming to provide a managed execution environment. The runtime is available for Windows XP, Windows Vista, Windows 7, and Xbox 360. Since XNA games are written for the runtime, they can run on any platform that supports the XNA Framework with minimal or no modification. Games that run on the framework can technically be written in any .NET-compliant language, but only C# in the XNA Game Studio Express IDE and all versions of Visual Studio 2008 are officially supported.

JMonkey

jMonkeyEngine (jME) [17] is a high-performance 3D game framework written entirely in Java. OpenGL is supported via LWJGL, with JOGL support in development. For sound, OpenAL is supported, and input via the keyboard, mouse, and other controllers is also supported. Most importantly, it runs on Windows, Mac OS, and Linux; it is truly cross-platform thanks to the Java VM.
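
For reference, a roughly minimal jME 2 application looks like the sketch below, based on the jME 2 tutorials [15, 17]; treat it as a sketch rather than project code. SimpleGame handles the window, input, and render loop, so only simpleInitGame has to be filled in.

    import com.jme.app.SimpleGame;
    import com.jme.math.Vector3f;
    import com.jme.scene.shape.Box;

    // Roughly the smallest jME 2 application: opens a window and renders one box.
    public class HelloJme extends SimpleGame {
        public static void main(String[] args) {
            HelloJme app = new HelloJme();
            app.start();  // shows the settings dialog, then runs the game loop
        }

        @Override
        protected void simpleInitGame() {
            // A 2x2x2 box centered at the origin, attached to the scene graph root.
            Box box = new Box("box", new Vector3f(0, 0, 0), 1, 1, 1);
            rootNode.attachChild(box);
        }
    }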

CryEngine

CryEngine [18] is a high-performance 3D game engine developed by Crytek, which recently released CryEngine 3. The new engine is being developed for use on Microsoft Windows, PlayStation 3, and Xbox 360; on the PC platform, the engine is said to support development in DirectX 9, 10, and 11. It contains rich graphical features such as water reflection, a flow graph, a real-time soft particle system with an integrated FX editor, road and river tools, a vehicle creator, real-time dynamic global illumination, deferred lighting, and natural lighting with dynamic soft shadows.

The technologies above were compared with each other, and we came up with jME, which is platform independent. XNA supports only Windows and Xbox, and when comparing the maturity of XNA and jME, jME shows the more efficient and effective performance. CryEngine is one of the greatest 3D game engines in terms of detail, but it demands a certain level of hardware and sits in the commercial market. jME stood out in both cases.

Bibliography


[1] D. Thalmann, "Using Virtual Reality Techniques in the Animation Process," in Virtual Reality Systems. Academic Press, 1993, pp. 143-159.
[2] S. E. Chen, "QuickTime VR – An Image-Based Approach to Virtual Environment Navigation," in Commun. ACM, 1995.
[3] T. Rodden, J. Pycock, C. Greenhalgh, and S. Benford, "Collaborative Virtual Environments," Communications of the ACM, vol. 44, no. 7, pp. 79-85, 2001.
[4] A. Roorda and D. Williams, "The arrangement of the three cone classes in the living human eye," Nature, vol. 397, pp. 520-522, 1999.
[5] M. Fraser, C. Heath, S. Benford, C. Greenhalgh, and J. Hindmarsh, "Establishing mutual orientation in virtual environments," in Proceedings of CSCW '96, Boston, 1996, pp. 67-76.
[6] R. Waters, D. Anderson, and J. Barrus, "Supporting large multiuser virtual environments," IEEE Comput. Graph. Appl., pp. 50-57, 1997.
[7] D. Brutzman, "The Virtual Reality Modeling Language and Java," Communications of the ACM, vol. 41, no. 6, pp. 57-64, 1998.
[8] C. Cruz-Neira, D. Sandin, and T. DeFanti, "Surround-Screen Projection-Based Virtual Reality: The Design and Implementation of the CAVE," in SIGGRAPH '93 Proceedings, 1993, pp. 132-142.
[9] C. Zastrow and K. Kirst-Ashman, Understanding Human Behavior and the Social Environment. Cengage Learning, 2009.
[10] S. Benford, J. Bowers, L. E. Fahlén, C. Greenhalgh, and D. Snowdon, "User Embodiment in Collaborative Virtual Environments," in ACM Conf. Human Factors in Computing Systems, 1995, pp. 242-249.
[11] A. Shields, F. Tavera, L. Elford, H. Scullin, and A. Reed, "Virtual Reality and Parallel System Performance Analysis," vol. 28, no. 11, pp. 57-67, 1995.
[12] D. Bowman and C. Wingrave, "Design and Evaluation of Menu Systems for Immersive Virtual Environments," in Proceedings of the Virtual Reality 2001 Conference, 2001, pp. 149-156.
[13] A. J. Weigert, Self, Interaction, and Natural Environment: Refocusing Our Eyesight. SUNY Press, 1997.
[14] "Riemer's 2D & 3D XNA Tutorials" [Online]. Available: http://www.riemers.net/eng/Tutorials/XNA
[15] "Setting Up NetBeans IDE for jME 2.0.1" [Online]. Available: http://jmonkeyengine.org/wiki
[16] "Microsoft XNA" [Online]. Available: http://en.wikipedia.org/wiki/Microsoft_XNA
[17] "jMonkeyEngine" [Online]. Available: http://en.wikipedia.org/wiki/JMonkey_Engine
[18] "Welcome to our new CryENGINE® 2 website" [Online]. Available: http://www.cryengine2.com

Saturday, December 11, 2010

 Layered Architecture of BubbleBoy

            The layered architecture (see figure) of BubbleBoy is a detailed interpretation of the main components of the system. The layers of the system are designed as a three-tier architecture; the main tiers can be identified as the Presentation Tier, Logic Tier, and Data Tier. This clarifies the system functionality in the abstract and makes clear the main layers and their sub-layers.

Logic Tier
This layer comprises the software application part of BubbleBoy.
·         Image Acquisition
            This sub-layer prepares the images taken from the user and feeds them to image processing.
·         Image Processing
            This layer consists of three parts: a 2D image processor, a 3D image generator, and a head-direction capture module. The image arrives from the image acquisition sub-layer with the markers still in it.
·         Gesture Recognition
            This recognizes the pose of the user at one time instant; the details are compared with the Knowledge Base data, and the results proceed to the action detection layer.

·         Action Detection
            This layer maps the sequence of images processed by the layer above against the details of human actions in the Knowledge Base and detects the action of the user.
·         Knowledge Base Access Handling
            This layer is responsible for handling communication between the Knowledge Base and the Logic Tier components. It controls access requests from the application and manages the data flow into and out of the Knowledge Base.
·         Environment Generation
            This layer takes inputs from the action detection layer and generates the appropriate environment change according to the detected action of the user.
·         Environment Modification
            This layer makes the changes immediately after they are identified by the environment generation layer.

Data Tier
·         BubbleBoy Database
            A database will be required for the Logic Tier components to store data apart from the Knowledge Base. Since the Knowledge Base keeps specialized data for gesture recognition and action detection, a separate database is necessary for the other components of the Logic Tier.
·         Knowledge Base
            The Knowledge Base will basically consist of two types of data: gesture model details and action detection details.

Presentation Tier
·         Graphical Environment
            This shows the virtual environment to the user according to the action he/she presented.

Monday, December 6, 2010

Four Major Phases of BubbleBoy

The system BubbleBoy can be divided into four major phases in its development process. They are:

I. Human Computer Interaction phase
II. Image Analysis phase
III. Action Recognition phase
IV. Environment phase

Phase I
            This step initiates the complete operation. It captures images, with passive markers on the user, and feeds them to phase II. Two cameras are used for the capturing operation.
Phase II
            The image arrives from the first phase of the system with the markers in it and progresses to the next step, where the limbs are recognized with the help of the markers. The most critical part of this phase comes next: the 3D model is generated from the recognized limbs, with angles and sizes as the main information for the conversion. A simplified illustration of this conversion follows.
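
For parallel, calibrated cameras, the 2D-to-3D step can be illustrated by simple stereo triangulation: the disparity between the two views of a marker gives its depth. The focal length and baseline below are example values, not measurements from our setup.

    // Simplified stereo triangulation for one marker seen by both cameras
    // (assumes parallel cameras; FOCAL and BASELINE are invented example values).
    public class Triangulate {
        static final double FOCAL = 700.0;    // focal length in pixels (assumed)
        static final double BASELINE = 0.12;  // camera separation in meters (assumed)

        // (xL, yL) and xR: marker pixel coordinates relative to each image center.
        static double[] toWorld(double xL, double yL, double xR) {
            double disparity = xL - xR;               // shift between the two views
            double z = FOCAL * BASELINE / disparity;  // nearer markers shift more
            double x = z * xL / FOCAL;
            double y = z * yL / FOCAL;
            return new double[] { x, y, z };
        }

        public static void main(String[] args) {
            double[] p = toWorld(40, 10, 12);  // disparity 28 px -> z = 3.0 m
            System.out.printf("x=%.2f y=%.2f z=%.2f%n", p[0], p[1], p[2]);
        }
    }
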
Phase III
            Since this must run over a series of images fed by the video cameras, all incoming images need to be evaluated efficiently. The series of generated 3D models is supplied to the next phase, whose main task is to match the action against the trained set of actions.
Phase IV

            This is the user feedback phase: the user can watch the results of the action he/she presented.

Thursday, December 2, 2010

Project meeting minute - 09

Date & time, duration
The meeting was held on 2010-12-02 for one hour
Venue
Office room of Dr. Chandana Gamage
Members present
All the members were present.
Topic of the discussion
Feedback on the second iteration of the Software Architecture Document (SAD)
Brief summary of the discussion
Today we again discussed the content of the Software Architecture Document. This time we presented our second iteration of the document, and sir explained to us the deficiencies we had made there. Those are:
• Maintain the consistency of the document control.
• Come up with diagrams that look fine.
• Use correct terms such as gesture recognition, image acquisition, and action detection.
• The design document should consist of two components: the final design of our proposed system, and assurance of our design against the work carried out by other people (taken from trusted sources).
• In the image acquisition phase we sometimes have to exclude some of the gestures depending on the context we are in at that particular moment (attention on gesture inclusion and gesture exclusion).
• Explain the design for the 3D model creator and its state of the art (while we have a design for the overall system, we must have designs for all the sub-components under it).
• Every design should carry the rationale behind it, historical work, current trends, and future directions.
• We were also asked to look at the web resource http://www.cs.sfu.ca/~mori/research/
Key topics discussed
1. Compare what we did with the people in history who have contributed to the things we are referring to.
2. Maintain the standard of the referencing process.
3. Maintain a log book for the things we carry out related to our project.
Comments and agreements
Chandana sir made us understand the important points we have to improve, asked for those corrections to be made in the next iteration of the document, and asked us to send him that version tomorrow. Since the references were not in a standard form, having been created in different ways, sir suggested we move to LaTeX, which uses an approach called BibTeX to keep references in a standard way.

Thursday, November 25, 2010

1 Design Doc Introduction

In the present emergent market it is easy to locate and experience new advances in Human Computer Interaction, but with some constraints. This is a field not yet fully unveiled to society, yet one that could speak to the majority of the population who do not have much computer literacy at the moment. This emergent, innovative field is still, to a certain degree, at the research level. At a time of rapid advances in the field of Human Computer Interaction, we have decided to give it another interpretation through the fulfillment of our project.
The project "BubbleBoy" is an innovative, interactive 3D interface using full body motion. Obviously, the interaction between humans and computers happens at the user interface, which includes both software and hardware. One may wonder whether this is something still to be invented, as some applications of the same nature already exist, and whether in that case there is room for a research component.

HCI was one of the first examples of cognitive engineering, an approach that crosses the boundary between academic disciplines and modern needs; professions have emerged around making human interaction effective. At the same time, with the emergence of Software Engineering, more attention was focused on non-functional requirements such as usability and maintainability.

During the project BubbleBoy, a certain software development model must be followed in order to achieve a high-quality output. Since the requirements and technologies are changing rapidly, it is preferable to use a dynamic model that welcomes any deviation without conflict. In this project we can identify a project life cycle with five phases: identification, planning, implementation, testing, and completion. As planned, new technologies are welcome during any phase, so the implementation eventually becomes highly flexible and up to date; new ideas can even arise during implementation. To create a more competitive product, it should stay up to date. This project can basically be defined as the identification of an opportunity in the market. To meet these requirements, an evolutionary/iterative software development model is preferred, and at the same time the team will make sure to adhere to coding paradigms and standards.
1.1 BubbleBoy the Innovative Interface
BubbleBoy is a project which facilitates the presence of the hundreds of people who do not have much computer literacy. This interface consists of several components which make life easy for the user.
• An environment which talks back to the user
• Reduced processing cost due to the reduction in image processing
• Fewer sophisticated hardware parts, with those tasks done in software
• A growing artificial intelligence that facilitates higher flexibility

The project consists of a sound environment which gives the experience of the real environment. It aims to lead the user to believe the virtual world is realistic and to demonstrate a high level of interaction with it. To make the user feel immersed, the environment around the user needs to talk back to him as the real environment does, and it should consist of several continuous and distinct activities. Consider a user walking through a jungle who stops where a banana leaf hangs ahead. His vision is now limited by that leaf. The user tries to lift the leaf and look forward; with a particular gesture he must be able to move it upward and get a precise view ahead. After he releases the leaf, it should swing back in the opposite direction and oscillate until it becomes stable. That is the level of talk-back expected from the environment.

This project uses markers to identify the movements rather than pure image processing techniques. That was done with great foresight: pure image processing could also identify user movements, but it consumes more resources and much more CPU processing power. This would be a high risk, since the product is modeled for the public market, where computers with enormous processing power are scarce. It could also cause the team to deviate from the core activities of the project.

One of the most important facts about this project is its independence from sophisticated hardware. There are a few products on the current market, but they require expensive hardware; this project works with the aid of cheap web cameras that are affordable to any user.
This project includes an Artificial Intelligence component able to grow with experience. It is used in the user action recognition area and facilitates a system that matures as time elapses.

1.2 BubbleBoy Implementation and Integration
The implementation of this project has three main components, which should work independently. This independence lets each implementer work without conflicting with the other components. The entire process is illustrated in Figure 1, and the components can be classified as follows.
• 2D to 3D modeling using a marker-based protocol
• Artificial Intelligence
• Environment

The 2D-to-3D component considers the images captured from the user in a sequential manner. It tries to identify the markers on the user and build a model from those details. Two web cameras capture images of the user, and the system merges each time-matched pair of images to create a 3D model of the user. This process continues over the sequence of images and generates a sequence of 3D models, which are forwarded to the next stage.

The Artificial Intelligence system will be developed as a separate component. It takes the 3D model sequence as input and delivers a prediction as output, in a form compatible with the inputs of the environment. This unit examines the 3D model sequence precisely and tries to predict the user's movement. It also has the ability to grow with experience, which sharpens the probability of a correct prediction.

The virtual environment is the third, and one of the most important, parts of this project. This unit is responsible for making the user feel that the virtual world is realistic. It will consist of components such as environment talk-back, sequential activities, and distinct activities. It is the interface the user interacts with, so it should be a well-defined system in order to preserve the quality of the entire project. Its inputs come from the AI, and its output is the experience the user gets. According to the user's input, the environment must be able to change artifacts of the virtual world in a real-world manner.

1.3 Summary
This chapter gives a brief introduction to the project BubbleBoy, an innovative HCI effort. It mainly discusses a high-level overview of the project from different perspectives, while looking at a project life cycle suitable for the entire project. Under the subsection "BubbleBoy the Innovative Interface" it goes into more depth by describing several features which make life easy for the user. The section "BubbleBoy Implementation and Integration" deals with the implementation aspects of the different components and their integration without imposing constraints on integration.