Prototypes

Prototypes - Prototypes are representations of objects and their relationships to other objects. In computer science, a structure of objects connected by relationships is called a graph.

A prototype can represent any type of object, for example:

  • C#
  • Data
  • Language
  • Concepts

C#

    int i = 0;

Prototype:

    CSharp.VariableDeclaration
        .Type = CSharp.Type
            .TypeName = System.String[int]
            .ElementTypes = System.Collections.Generic.List`1[CSharp.Type]
            .IsNullable = System.Boolean[False]
            .IsArray = System.Boolean[False]
        .VariableName = System.String[i]
        .IsConst = System.Boolean[False]
        .Initializer = CSharp.Expression
            .Terms = System.Collections.Generic.List`1[CSharp.Expression]
                [0] = CSharp.IntegerLiteral
                    CSharp.Literal.Field.Value = System.String[0]
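The dump above is a tree of typed nodes with named fields. As a rough illustration, here is a minimal Python sketch, assuming a prototype can be modeled as a type name plus an ordered map of fields whose values are either plain values or other nodes. `Node` and `dump` are hypothetical names for this sketch, not part of Buffaly, and the output is simplified (no `System.String[...]` wrappers, and the initializer is trimmed to a single literal).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a prototype: a type name plus named fields."""
    type_name: str
    props: dict = field(default_factory=dict)


def dump(node, depth=0):
    """Render a node as indented lines, one field per line, recursing into child nodes."""
    lines = [node.type_name] if depth == 0 else []
    pad = "    " * (depth + 1)
    for name, value in node.props.items():
        if isinstance(value, Node):
            lines.append(f"{pad}.{name} = {value.type_name}")
            lines.extend(dump(value, depth + 1))  # child fields go one level deeper
        else:
            lines.append(f"{pad}.{name} = {value}")
    return lines


# The graph for `int i = 0;`, trimmed to a few fields.
decl = Node("CSharp.VariableDeclaration", {
    "Type": Node("CSharp.Type", {"TypeName": "int"}),
    "VariableName": "i",
    "IsConst": False,
    "Initializer": Node("CSharp.IntegerLiteral", {"Value": "0"}),
})
print("\n".join(dump(decl)))
```

Each nested node's fields are printed one indentation level deeper than the field that points to it. This is the same visual convention the Prototype view uses.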

SQL Query

    select top 10 * from Prototypes order by 1 desc

Prototype:

    SQL.Select (5732)
        .Columns = System.Collections.Generic.List`1[SQL.Expression] (13740)
            [0] = SQL.Expression (5686)
                .Terms = System.Collections.Generic.List`1[SQL.Expression] (13740)
                    [0] = SQL.Identifier (5691)
                        .Value = System.String[*] (5606)
        .Table = SQL.Table (5698)
            .TableName = System.String[Prototypes] (2917)
        .Limit = System.String[10] (3355)
        .Joins = System.Collections.Generic.List`1[SQL.JoinClause] (13744)
        .OrderBys = System.Collections.Generic.List`1[SQL.OrderByClause] (13748)
            [0] = SQL.OrderByClause (5729)
                .Expression = SQL.Expression (5686)
                    .Terms = System.Collections.Generic.List`1[SQL.Expression] (13740)
                        [0] = SQL.NumberLiteral (5712)
                            SQL.Literal.Field.Value = System.String[1] (746)
                .SortDirection = System.Int32[1] (1937)

Language

I need to buy some covid-19 test kits

Prototype:

    Need#2852
        InfinitiveLinkable.Field.LinkedInfinitive = ToBuy
            NakedInfinitive.Field.Infinitive = ToBuy
            Action.Field.Object = COVID_TestKit
                QuantitySpecifiable.Field.Quantity = Some
                Object.Field.Plurality = Plural
            Action.Field.Subject = I
        NakedInfinitive.Field.Infinitive = ToNeed
        Action.Field.Subject = I

Concepts

If a person has cancer they may die

    IfPersonHasCancer
        StateStateCausalitySememe.Field.TargetState = StateSememe
            .Qualifier = Dead
            DoableActionSememe.Field.SourceActor = Person
        StateStateCausalitySememe.Field.SourceState = StateSememe
            .Qualifier = Cancerous
            DoableActionSememe.Field.SourceActor = Person

If an animal is hungry then it eats

    IfAnimalHungryThenEats
        StateActionCausalitySememe.Field.TargetAction = EatSememe
            DoableActionSememe.Field.SourceActor = Animal
        StateActionCausalitySememe.Field.SourceState = StateSememe
            .Qualifier = Hungry
            DoableActionSememe.Field.SourceActor = Animal

ProtoScript

ProtoScript is a C#-based programming language designed to make working with graphs simple. It:

  1. Simplifies the process of creating graphs.
  2. Allows graphs to contain functions.

Some highlights:

  1. Multiple Inheritance - Prototypes require multiple inheritance to reflect language, where one thing often belongs to several categories at once.
  2. Type Mutability - Understanding language means inferring the types of things while we operate on them. Sometimes we don't know the type of a prototype when we create it and need to add it later.
  3. Native Graph Operations - Working with large graphs quickly becomes cumbersome, so certain graph operations are built into the language to simplify 1) learning graphs and 2) manipulating graphs.
  4. Serialization First - Prototypes are a representation of knowledge, so they need to be serialized for long-term storage or for communication between systems.

Let's look at some examples to understand how it all works.

Example 1: Simple Prototype

We create a single prototype with no relationships, no properties, and no functionality.

    prototype Buffaly;

Example 2: Adding Is-A Relationships

Now let's add simple relationships.

    Buffalo is a City
    Buffalo is an Animal
    Buffalo is an Action that means "to bully"

If we want to store these as triples, we break them out:

    Buffalo -is-> City
    Buffalo -is-> Animal
    Buffalo -is-> Action
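A triple store is simply a set of (subject, predicate, object) facts. The Python sketch below, a hypothetical illustration rather than anything Buffaly provides, stores the three facts and queries them.

```python
# Hypothetical triple store: each fact is a (subject, predicate, object) tuple.
triples = {
    ("Buffalo", "is", "City"),
    ("Buffalo", "is", "Animal"),
    ("Buffalo", "is", "Action"),
}


def objects(subject, predicate):
    """Return every object linked to subject by predicate, sorted for stable output."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


print(objects("Buffalo", "is"))  # ['Action', 'Animal', 'City']
```

Nothing in these triples says which "Buffalo" is the City and which is the Animal, because all three facts hang off a single node. Splitting Buffalo into one prototype per meaning removes that ambiguity.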

In ProtoScript:

    prototype City;
    prototype Animal;
    prototype Action;

    prototype Buffalo_City : City;
    prototype Buffalo_Animal : Animal;
    prototype Buffalo_Action : Action;

Introducing the "typeof" operator

  • typeof - a Boolean operator that checks whether a prototype has a given prototype as a parent, either directly or through its ancestors.

For example:

    if (Buffalo_City typeof City)
    {
        // reached: City is a parent of Buffalo_City
    }

    if (Buffalo_City typeof Animal)
    {
        // not reached: Animal is not a parent of Buffalo_City
    }
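One way to picture `typeof` is as a reachability check over parent links. The Python sketch below assumes `typeof` is transitive (a grandparent counts), which is consistent with Example 5 further on. It stores parents in a plain dictionary. The names `parents` and `typeof` are illustrative only, not ProtoScript internals.

```python
from collections import deque

# Hypothetical parent table for Example 2: prototype name -> direct parents.
# Each entry is a list because prototypes may have several parents.
parents = {
    "City": [], "Animal": [], "Action": [],
    "Buffalo_City": ["City"],
    "Buffalo_Animal": ["Animal"],
    "Buffalo_Action": ["Action"],
}


def typeof(child, ancestor):
    """Breadth-first walk up the parent links; True if ancestor is reached."""
    queue, seen = deque(parents.get(child, [])), set()
    while queue:
        current = queue.popleft()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            queue.extend(parents.get(current, []))
    return False


print(typeof("Buffalo_City", "City"))    # True
print(typeof("Buffalo_City", "Animal"))  # False
```

The `seen` set keeps the walk finite even if parent links were ever to form a loop.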

Example 3: Add Properties

Up to this point we've been building a simple ontology: a knowledge graph about our world. Let's extend it with a bit more information.

Not all properties of a prototype are best represented by an "is-a" (typeof) relationship. A city is not a kind of state; it is located in one. For example:

    prototype State;
    prototype City
    {
        State Location = new State();
    }

    prototype NewYork_State : State;
    prototype NewYork_City : City;

    NewYork_City.Location = NewYork_State;

If we evaluate:

    NewYork_City;

Result:

    NewYork_City
        City.Field.Location = NewYork_State

Example 4: Adding Cycles to Graphs

Sometimes graphs contain cycles: connections that lead back to a node we started from.

New York (City) is in New York (State)

New York (State) has a city New York (City)

    prototype State
    {
        Collection Cities = new Collection();
    }

    prototype City
    {
        State Location = new State();
    }

    prototype NewYork_State : State;
    prototype NewYork_City : City;

    NewYork_City.Location = NewYork_State;
    NewYork_State.Cities = [NewYork_City];

Evaluate:

    NewYork_City

Result:

    NewYork_City
        City.Field.Location = NewYork_State
            State.Field.Cities = Ontology.Collection
                [0] = NewYork_City
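Notice that the result stops at the second `NewYork_City` instead of expanding it again. A naive recursive printer would loop forever on this cycle. The Python sketch below shows one way to get the same behavior, assuming the printer refuses to expand any node already on the current path. The `graph` dictionary and `render` function are hypothetical.

```python
# Hypothetical graph for Example 4: prototype -> {field: target or list of targets}.
graph = {
    "NewYork_City": {"Location": "NewYork_State"},
    "NewYork_State": {"Cities": ["NewYork_City"]},
}


def render(name, path=(), depth=0):
    """List a prototype's fields recursively, but never expand a node already on the
    current path. Without that check, City -> State -> City would recurse forever."""
    lines = []
    on_path = path + (name,)
    for field_name, value in graph.get(name, {}).items():
        targets = value if isinstance(value, list) else [value]
        for i, target in enumerate(targets):
            label = f"{field_name}[{i}]" if isinstance(value, list) else field_name
            lines.append("    " * (depth + 1) + f"{label} = {target}")
            if target not in on_path:
                lines.extend(render(target, on_path, depth + 1))
    return lines


print("\n".join(["NewYork_City"] + render("NewYork_City")))
```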

Example 5: Runtime modifications - Modifying the Prototype's Parents

When processing language, we don't always know the type of an object. ProtoScript lets us work with unknowns.

    prototype Buffalo;    // We don't know which type of Buffalo yet.

Evaluate:

    Buffalo typeof Animal

Result:

    false

Now suppose we encounter a constraint that lets us infer that Buffalo is the animal:

    TypeOfs.Insert(Buffalo, BuffaloAnimal)

Evaluate:

    Buffalo typeof Animal

Result:

    true
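The key point is that `typeof` is answered from the prototype's current parents, so adding a parent at runtime changes the answer immediately. A minimal Python sketch of that idea follows. It assumes a mutable parent table. `insert_typeof` is a hypothetical stand-in for `TypeOfs.Insert`, whose exact semantics beyond this example are not described here.

```python
from collections import deque

# Hypothetical mutable parent table. "Buffalo" starts with no known parents.
parents = {"Animal": [], "BuffaloAnimal": ["Animal"], "Buffalo": []}


def typeof(child, ancestor):
    """Transitive parent check (breadth-first), re-evaluated on every call."""
    queue, seen = deque(parents.get(child, [])), set()
    while queue:
        current = queue.popleft()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            queue.extend(parents.get(current, []))
    return False


def insert_typeof(child, parent):
    """Hypothetical analogue of TypeOfs.Insert: attach a new parent at runtime."""
    if parent not in parents.setdefault(child, []):
        parents[child].append(parent)


print(typeof("Buffalo", "Animal"))  # False
insert_typeof("Buffalo", "BuffaloAnimal")
print(typeof("Buffalo", "Animal"))  # True: Buffalo -> BuffaloAnimal -> Animal
```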

Example 6: Runtime modifications - Modifying the Prototype's Properties

We can add properties to a prototype without declaring them in the prototype's definition.

    prototype Buffalo;
    prototype Color;
    prototype Red : Color;

    Buffalo.Properties[Color] = Red;

Evaluate:

    Buffalo

Result:

    Buffalo
        Color = Red
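Conceptually, each prototype carries an open-ended property map that is keyed by other prototypes. The tiny Python sketch below illustrates this under that assumption. It uses plain strings for prototype names, and `properties` and `show` are hypothetical.

```python
# Hypothetical prototype store: every prototype carries an open-ended property map,
# keyed by another prototype's name, so nothing has to be declared in advance.
properties = {"Buffalo": {}, "Color": {}, "Red": {}}

properties["Buffalo"]["Color"] = "Red"  # mirrors Buffalo.Properties[Color] = Red


def show(name):
    """Return a prototype's name followed by its properties, one per indented line."""
    lines = [name] + [f"    {key} = {value}" for key, value in properties[name].items()]
    return "\n".join(lines)


print(show("Buffalo"))
```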

Example 7: SubTypes

We've already examined supertypes (Animal is a supertype of Buffalo), but we can also define subtypes. A subtype is defined by:

  1. The parent type upon which they operate.
  2. The categorization function that defines the members of this group.

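Example:

    prototype NewYorkState : State;
    prototype FloridaState : State;

    prototype BuffaloCity : City;
    prototype NewYorkCity : City;
    prototype OrlandoCity : City;

    BuffaloCity.Location = NewYorkState;
    NewYorkCity.Location = NewYorkState;
    OrlandoCity.Location = FloridaState;

    [SubType]
    prototype CityInNewYork : City
    {
        function IsCategorized(City city) : bool
        {
            return city -> { Location == NewYorkState };
        }
    }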

Buffalo (City) and New York (City) are cities in New York (State). Orlando is not:

    BuffaloCity typeof CityInNewYork    // true
    NewYorkCity typeof CityInNewYork    // true
    OrlandoCity typeof CityInNewYork    // false

This code introduces three new concepts:

  1. Annotations
    1. [SubType] - This is an annotation. It works similarly to .NET attributes, except that it is handled by the runtime rather than the compiler.
  2. Member functions
    1. function - Prototypes in ProtoScript can have member functions that extend their capabilities beyond simple data.
  3. Categorization operator
    1. The operator 'city -> { Location == NewYorkState }' checks whether city has a Location of NewYorkState and returns true if so.
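The essential idea is that subtype membership is computed rather than stored. A candidate qualifies if it descends from the subtype's parent type and its categorization function returns true. The Python sketch below illustrates this under that assumption. `Proto`, `SubType`, and `typeof` are hypothetical, and the states (including `FloridaState`) are illustrative data.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(eq=False)
class Proto:
    """Hypothetical prototype: a name, direct parents, and named properties."""
    name: str
    parents: list = field(default_factory=list)
    props: dict = field(default_factory=dict)


@dataclass(eq=False)
class SubType:
    """Hypothetical subtype: a parent type plus a categorization function."""
    name: str
    parent: Proto
    is_categorized: Callable[[Proto], bool]


def typeof(p, t):
    if isinstance(t, SubType):
        # Computed membership: p must descend from the subtype's parent
        # AND satisfy the categorization function.
        return typeof(p, t.parent) and t.is_categorized(p)
    stack, seen = list(p.parents), set()
    while stack:
        current = stack.pop()
        if current is t:
            return True
        if current not in seen:  # eq=False gives identity-based hashing
            seen.add(current)
            stack.extend(current.parents)
    return False


city, state = Proto("City"), Proto("State")
ny_state = Proto("NewYorkState", [state])
fl_state = Proto("FloridaState", [state])
buffalo = Proto("BuffaloCity", [city], {"Location": ny_state})
nyc = Proto("NewYorkCity", [city], {"Location": ny_state})
orlando = Proto("OrlandoCity", [city], {"Location": fl_state})

city_in_ny = SubType("CityInNewYork", city,
                     lambda c: c.props.get("Location") is ny_state)

for c in (buffalo, nyc, orlando):
    print(c.name, typeof(c, city_in_ny))
```

Because the subtype is evaluated on demand, moving a city to a different state changes its membership without any further bookkeeping.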

Language Model

The Buffaly Language Model consists of:

  • Lexical Model
  • Semantic Model
  • Part of Speech Model
  • Tagger

Lexemes

"a basic lexical unit of a language, consisting of one word or several words, considered as an abstract unit, and applied to a family of words related by form or meaning."

For now, assume that any string value we want to work with is called a lexeme.

  • "Buffalo" is a lexeme
  • "City" is a lexeme
  • "Bully" is a lexeme

Sometimes a lexeme is a single word. Sometimes it can span multiple words:

  • New York
  • United States of America
  • Pick Up (the trash)
  • Turn Off (the light)

Sometimes a lexeme is interrupted by other tokens:

  • Please [pick] it [up]
  • Please [pick] the freaking trash [up]
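Here is a minimal Python sketch of matching a two-part lexeme whose parts may be separated by other tokens. It assumes a simple fixed window limits how far apart the parts can be. The `max_gap` parameter is a hypothetical choice for this sketch, and a real model would likely use richer constraints.

```python
def find_split_lexeme(tokens, first, second, max_gap=3):
    """Locate a two-part lexeme such as "pick ... up".

    Returns the (first, second) token indices, or None. Up to max_gap tokens may
    sit between the parts; max_gap=0 requires the parts to be adjacent.
    """
    lowered = [t.lower() for t in tokens]
    for i, token in enumerate(lowered):
        if token != first:
            continue
        # Scan forward, allowing at most max_gap tokens between the two parts.
        for j in range(i + 1, min(len(lowered), i + 2 + max_gap)):
            if lowered[j] == second:
                return i, j
    return None


print(find_split_lexeme("Please pick it up".split(), "pick", "up"))                 # (1, 3)
print(find_split_lexeme("Please pick the freaking trash up".split(), "pick", "up"))  # (1, 5)
print(find_split_lexeme("New York".split(), "new", "york", max_gap=0))               # (0, 1)
```

The same function covers contiguous multi-word lexemes such as "New York" (gap of zero) and interrupted ones such as "pick ... up".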

Sememes

"a semantic language unit of meaning"

The word "buffalo" has three meanings:

  • Buffalo (the City)
  • Buffalo (the Animal)
  • Buffalo (the Action)

Each of these meanings is a semantic unit, or sememe. We represent each one in ProtoScript as a discrete object.

Lexeme to Sememe Mapping

Let's set up the lexeme-to-sememe mappings for "buffalo" using ProtoScript.

    Buffalo (lexeme) -> Buffalo (the City)
                     -> Buffalo (the Animal)
                     -> Buffalo (the Action)

We use annotations to map the string value to the various semantic interpretations:

    [Lexeme.SingularPlural("buffalo", "buffaloes")]
    prototype BuffaloAnimal : Animal;

    [Lexeme.Singular("buffalo")]
    prototype BuffaloCity : City;

    [Lexeme.Singular("buffalo")]
    prototype BuffaloAction : Action;
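In effect, the annotations build a lookup from surface forms to candidate sememes. The Python sketch below shows such a lexicon under that assumption. The helpers `singular_plural` and `singular` are hypothetical analogues of the annotations, not their actual implementation.

```python
from collections import defaultdict

# Hypothetical lexicon: surface form -> candidate sememes, in registration order.
lexicon = defaultdict(list)


def singular_plural(singular_form, plural_form, sememe):
    """Rough analogue of [Lexeme.SingularPlural(...)]: register both forms."""
    lexicon[singular_form].append(sememe)
    lexicon[plural_form].append(sememe)


def singular(form, sememe):
    """Rough analogue of [Lexeme.Singular(...)]: register one form."""
    lexicon[form].append(sememe)


def sememes_for(token):
    """Case-insensitive lookup, so "Buffalo" and "buffalo" share candidates."""
    return lexicon.get(token.lower(), [])


singular_plural("buffalo", "buffaloes", "BuffaloAnimal")
singular("buffalo", "BuffaloCity")
singular("buffalo", "BuffaloAction")

print(sememes_for("Buffalo"))    # ['BuffaloAnimal', 'BuffaloCity', 'BuffaloAction']
print(sememes_for("buffaloes"))  # ['BuffaloAnimal']
```

One lexeme maps to several sememes, which is exactly the ambiguity the tagger later has to resolve.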

Grammar

Next, we tell the system what is possible. English offers certain patterns:

  • Unary Actions

    I eat
    She painted
  • Binary Actions

    I eat food
    She painted the house
  • Ternary Actions

    She painted the house red

We also have rules for how different objects combine. It is OK to say:

    brown buffalo

It's not OK to say:

    buffalo brown

In other languages, such as Spanish, the adjective usually comes after the noun.

For our example, we allow the pattern:

    City Animal

As in:

    Chicago Cubs
    Phoenix Coyotes

Or:

    The Orlando buffalo called the zoo their home.

We don't allow the opposite. The following sentence sounds strange:

    The buffalo Orlando called the zoo their home.

For our example, we need only a few patterns:

  • City Animal
  • Animal Action
  • Animal Action Animal

Let's set up those sequences in ProtoScript.

Chomsky Grammars and the Chomsky Hierarchy

In formal language theory, computer science, and linguistics, the Chomsky hierarchy is a containment hierarchy of classes of formal grammars. A formal grammar describes how to form strings from a language's vocabulary (or alphabet) that are valid according to the language's syntax. The linguist Noam Chomsky theorized that there are four classes of formal grammars, each able to generate increasingly complex languages.

This theoretical framework tells us the limits of our approach. On their own, the sequences

  • City Animal
  • Animal Action
  • Animal Action Animal

are not capable of generating (or parsing) any realistic language, especially English. But sequences are not limited to pattern matching: each sequence match can run code and access context or memory. Because that code can perform arbitrary computation, this approach can, in principle, parse any recursively enumerable language, the most general class in Chomsky's hierarchy.
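The following Python sketch illustrates the difference between a bare pattern and a sequence that runs code. It assumes a sequence is a type pattern plus a check function that receives the matched words and a context dictionary. `Sequence`, `accept`, and the `transitive` context entry are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Sequence:
    """Hypothetical sequence rule: a type pattern plus code that runs on every match."""
    types: tuple
    check: Callable[[list, dict], bool]


def accept(seq, typed_words, ctx):
    """typed_words is a list of (word, type) pairs. The rule fires only if the
    types match the pattern exactly AND the attached check returns True."""
    if tuple(t for _, t in typed_words) != seq.types:
        return False
    return seq.check([w for w, _ in typed_words], ctx)


# The check consults context: only verbs listed as transitive may take an object.
animal_action_animal = Sequence(
    ("Animal", "Action", "Animal"),
    lambda words, ctx: words[1] in ctx["transitive"],
)

ctx = {"transitive": {"buffalo", "chase"}}
chase = [("bison", "Animal"), ("chase", "Action"), ("deer", "Animal")]
sleep = [("bison", "Animal"), ("sleep", "Action"), ("deer", "Animal")]
print(accept(animal_action_animal, chase, ctx))  # True
print(accept(animal_action_animal, sleep, ctx))  # False
```

Because `check` is ordinary code, it could just as well query memory or earlier parts of the sentence. That is what takes sequences beyond fixed pattern matching.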

In practice, this approach can parse a wide variety of English sentences.

Deterministic Tagger

The Deterministic Tagger is a set of components that turns unstructured text into graph structures. It uses a reinforcement learning approach to build and test valid graphs.

It combines exploration with exploitation to find the best way to assemble the pieces of a sentence under known rules.

  • It's deterministic. There is no explicit randomness. It will follow the same highest probability path each time it sees the same inputs.
  • It learns from experience. Feedback from correct interpretations gets fed back into the system to allow for quicker interpretation next time.
  • It is inspectable. We can see why it made a choice at each point in time and adjust the functions that led to that choice.

Consider the input:

    Buffalo buffalo buffalo

Tokenization

Tokenization splits the original string into finer-grained units (tokens). The model is agnostic to the type of tokenizer; by default, we use word tokenization:

    ["Buffalo", "buffalo", "buffalo"]

However, the model can access subword or character-level token representations as needed. It can also combine multiple tokens into a single unit, even when they are separated by intervening tokens (as in "pick it up").
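A simple word tokenizer with a character-level fallback can be sketched in a few lines of Python. The regular expression below is an assumption made for this illustration. The document does not specify which tokenizer Buffaly uses.

```python
import re


def word_tokens(text):
    """Default word tokenization: runs of letters, digits, apostrophes, or hyphens."""
    return re.findall(r"[A-Za-z0-9'-]+", text)


def char_tokens(token):
    """Character-level view of one token, for when finer granularity is needed."""
    return list(token)


print(word_tokens("Buffalo buffalo buffalo"))  # ['Buffalo', 'buffalo', 'buffalo']
print(word_tokens("I need to buy some covid-19 test kits"))
print(char_tokens("buffalo"))
```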

Hypothesize a sememe

The tagger uses a heuristic and previous training data to pick a starting token (lexeme) and decide which semantic interpretation to use.

    [BuffaloAnimal, "buffalo", "buffalo"]

Find the expectations for that sememe

  • City Animal
  • Animal Action
  • Animal Action Animal

The first sequence does not fit, because it starts with a City rather than an Animal. The second and third could fit, so the tagger tests:

    Animal          Action
    BuffaloAnimal   "buffalo"

There is a match, so it "collapses" the second lexeme to its Action interpretation:

    BuffaloAnimal BuffaloAction "buffalo"

At this point, only one valid sequence remains:

    Animal Action Animal

So we compare again:

    Animal          Action          Animal
    BuffaloAnimal   BuffaloAction   "buffalo"
                                 

The third lexeme can collapse to BuffaloAnimal, which completes the interpretation:

    Animal          Action          Animal
    BuffaloAnimal   BuffaloAction   BuffaloAnimal
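The walkthrough above can be approximated by a plain backtracking search. This Python sketch is deliberately simplified. Candidate order in the hypothetical `LEXICON` stands in for the tagger's learned preferences, and the sketch leaves out the feedback and the exploration/exploitation balance described earlier. Like the tagger, it has no randomness, so the same input always yields the same result.

```python
# Hypothetical lexicon, type table, and patterns for the walkthrough above.
# Candidate order stands in for learned preferences: earlier entries are tried first.
LEXICON = {"buffalo": ["BuffaloAnimal", "BuffaloCity", "BuffaloAction"]}
TYPE_OF = {"BuffaloAnimal": "Animal", "BuffaloCity": "City", "BuffaloAction": "Action"}
PATTERNS = [("City", "Animal"), ("Animal", "Action"), ("Animal", "Action", "Animal")]


def is_prefix(types):
    """True if the partial type sequence could still grow into some pattern."""
    return any(p[: len(types)] == tuple(types) for p in PATTERNS)


def tag(tokens):
    """Hypothesize one sememe per token, left to right, in a fixed order. Abandon a
    branch as soon as no pattern expects its types, and return the first full match."""

    def search(i, chosen):
        types = [TYPE_OF[s] for s in chosen]
        if not is_prefix(types):
            return None  # no pattern expects this sequence: backtrack
        if i == len(tokens):
            return chosen if tuple(types) in PATTERNS else None
        for sememe in LEXICON.get(tokens[i].lower(), []):
            result = search(i + 1, chosen + [sememe])
            if result is not None:
                return result  # collapse: keep the first interpretation that works
        return None

    return search(0, [])


print(tag(["Buffalo", "buffalo", "buffalo"]))
# ['BuffaloAnimal', 'BuffaloAction', 'BuffaloAnimal']
```

The `is_prefix` check plays the role of "finding the expectations": an interpretation is kept only while some sequence still expects it.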