Democratizing Machine Learning With C# - LINQ goes ML
(There are almost certainly transcription errors here; these are entirely my fault. - Bill S.)
Problem: Should I play golf or not?
Streaming, real-time weather information
Want to find Func<Weather, Play>
Once I have this, I can create a real-time LINQ query using IObservable<Weather> as an input.
But I don’t know how to write that function.
What I have is a history of the weather and my calendar history of golf-playing.
I can create a Dictionary<Weather, Play> of examples (test cases).
ML is nothing more than a computer doing test-driven development.
A dictionary is a representation of a finite function.
You want a function that works for all arguments.
People have data and want to write a query to extract data from it, typically.
But how do I group the data? How do I order it? How do I filter it?
They could circle the data points on a whiteboard, but may not be able to write the query.
From example data, you’re generating a function.
If you thought the impedance mismatch between (relational) queries and programs was large, then the mismatch between ML and programming is astronomical.
A query has data and a function, and gives you an answer.
ML has data and answers, and gives you a function.
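A rough sketch of that contrast, in Python rather than the talk's C#. The sample history and the `learn` helper are made up for illustration, and the "learner" is a deliberately naive majority vote, not a real ML algorithm.

```python
from collections import Counter, defaultdict

# Hypothetical history: (outlook, did I play golf?)
history = [
    ("sunny", True), ("sunny", True), ("sunny", False),
    ("rainy", False), ("rainy", False),
    ("overcast", True), ("overcast", True),
]

# Query: data + function -> answer.
def is_playable(outlook):
    return outlook != "rainy"

answer = [outlook for outlook, _ in history if is_playable(outlook)]

# ML: data + answers -> function.
def learn(examples):
    votes = defaultdict(Counter)
    for outlook, played in examples:
        votes[outlook][played] += 1
    overall = Counter(played for _, played in examples).most_common(1)[0][0]

    def predict(outlook):
        if outlook in votes:
            return votes[outlook].most_common(1)[0][0]
        return overall  # unseen input: fall back to the overall majority

    return predict

should_play = learn(history)
```

The dictionary of examples only covers the inputs it has seen; `should_play` is the generalized function that also answers for inputs it has not seen.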
BigML - super-easy way to do ML (commercial - ka-ching!)
Supports Actionable Model download - gives you the function you’re looking for in your choice of languages.
Writing a decision tree algorithm isn’t that hard.
If you write your algorithms using queries, then you can parallelize them.
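As a sketch of what "writing the algorithm using queries" might look like, here is the attribute-selection step of a decision tree expressed as group-by and order-by style comprehensions. It is in Python rather than C#, the rows are invented, and a full tree would recurse on each group; none of this is taken from the talk's own code.

```python
from collections import Counter
from math import log2

# Hypothetical training rows: (features, label)
rows = [
    ({"outlook": "sunny", "windy": False}, "play"),
    ({"outlook": "sunny", "windy": True}, "play"),
    ({"outlook": "rainy", "windy": False}, "stay"),
    ({"outlook": "rainy", "windy": True}, "stay"),
    ({"outlook": "overcast", "windy": True}, "play"),
]

def entropy(labels):
    counts = Counter(labels)
    total = len(labels)
    return -sum(c / total * log2(c / total) for c in counts.values())

def split_entropy(rows, attr):
    # "GROUP BY attr", then weight each group's entropy by its size.
    values = {features[attr] for features, _ in rows}
    groups = [
        [label for features, label in rows if features[attr] == v]
        for v in values
    ]
    return sum(len(g) / len(rows) * entropy(g) for g in groups)

def best_attribute(rows, attrs):
    # "ORDER BY split_entropy LIMIT 1"
    return min(attrs, key=lambda a: split_entropy(rows, a))
```

Each attribute's score is computed independently of the others, which is what would make a parallel query engine applicable here.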
Mathematicians are extremely sloppy.
Bayes is just monads.
When they talk about vectors, they really mean classes.
Play -> Distribution<Outlook> is a function in the probability monad.
P(Weather|Play) is the way mathematicians represent this.
A monad is a type constructor with some functions.
Distribution<T> = List<Pair<T, double>>
Return :: T -> Distribution<T>
Return t = [(t,1.0)]
Bind :: Distribution<S> x (S -> Distribution<T>) -> Distribution<T>
Bind ds f = [ (t, ps*pt) | (s,ps) <- ds, (t,pt) <- f(s) ]
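The two definitions above translate almost directly into Python list comprehensions. This is only a sketch: the name `unit` stands in for Return (a reserved word in Python), and the probabilities are invented for illustration.

```python
def unit(t):
    # Return :: T -> Distribution<T>
    return [(t, 1.0)]

def bind(ds, f):
    # Bind :: Distribution<S> x (S -> Distribution<T>) -> Distribution<T>
    return [(t, ps * pt) for (s, ps) in ds for (t, pt) in f(s)]

# Hypothetical numbers, for illustration only.
play = [("yes", 0.6), ("no", 0.4)]            # P(Play)

def outlook_given(p):                          # P(Outlook|Play)
    if p == "yes":
        return [("sunny", 0.5), ("rainy", 0.5)]
    return [("sunny", 0.25), ("rainy", 0.75)]

# One (outlook, probability) pair per path through the two distributions.
outlook = bind(play, outlook_given)
```

Note that this naive version does not merge duplicate outcomes: `outlook` contains two entries for "sunny" and two for "rainy", whose probabilities would need to be summed to get P(Outlook).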
The trick of functional programming is to let the types do the work for you.
I do what the types tell me.
From our data we can compute P(Outlook|Play), P(Temperature|Play), etc.
But we want to know P(Play|Weather).
Bayes’ Theorem gives us a way to invert monadic functions.
It holds for any monad.
There’s a type error in it, though.
a :: A, f :: A -> B
g :: A -> A x B
g = a -> (a, f(a))
Using n for intersection, we can type-correct Bayes:
P(BnA|A) = P(B) * P(BnA|B) / P(A)
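To make the inversion concrete, here is a small Python sketch that turns P(Play) and P(Outlook|Play) into P(Play|Outlook). It uses dictionaries instead of lists of pairs for brevity, the numbers are invented, and the `invert` helper is my own name, not something from the talk. The intermediate `joint` table holds the paired (A, B) values, which is roughly where the "n" in the formula above comes from.

```python
from collections import defaultdict

def invert(prior, likelihood):
    """Given P(B) and b -> P(A|B), return P(B|A) as posterior[a][b]."""
    joint = defaultdict(dict)  # joint[a][b] = P(B=b) * P(A=a|B=b)
    for b, pb in prior.items():
        for a, pa_given_b in likelihood(b).items():
            joint[a][b] = pb * pa_given_b
    posterior = {}
    for a, row in joint.items():
        pa = sum(row.values())  # P(A=a), the normalizing constant
        posterior[a] = {b: p / pa for b, p in row.items()}
    return posterior

# Hypothetical numbers, for illustration only.
prior = {"yes": 0.6, "no": 0.4}                # P(Play)

def outlook_given(play):                        # P(Outlook|Play)
    if play == "yes":
        return {"sunny": 0.5, "rainy": 0.5}
    return {"sunny": 0.25, "rainy": 0.75}

play_given = invert(prior, outlook_given)       # P(Play|Outlook)
```

With these numbers, `play_given["sunny"]["yes"]` works out to 0.3 / 0.4, i.e. about 0.75.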
Keynote: Open Systems - Actors and Cloud
(His new company is Applied Duality - watch for great things!)
To hack, to help other developers, to solve real problems, to fix the world.
Duality - things have opposite aspects, but you should look at them as one whole. They’re not good or bad.
Nothing’s more different from SQL than Rx.
Question the relational database.
Does your data really look like this? It’s tied to the physical format.
Duality between Modelers (DB designers) & Developers. Plants vs. Zombies.
Modelers always talk about Nouns. Customers, Orders, Line Items.
There’s no reuse in SQL. There’s no difference between a table and a type.
There’s no abstraction.
It’s statically typed. It’s baked into the layout of your files.
In order to make your database run fast, you have to have intimate knowledge of your data storage (to do joins, etc.).
It’s not really declarative, because you have to know what keys to join on. It doesn’t allow nested results, so you have to have the GROUP BY column in the SELECT. WITH RECURSIVE is another example - you have to explicitly write it out.
Declarative is relative - C is declarative to an assembly programmer.
Optimization - query plan - consultants tweak the SQL to get the plan you wanted - but why can’t you tweak the query plan directly? It’s just a function pipeline.
ACID is also a lie - you set your transaction isolation level to get performance. Why so many hints?
The closed world assumption is the presumption that what is not currently known to be true is false.
The fallacies of Distributed Computing - well known
The fallacies of Declarative Computing
1. Exceptions do not exist.
2. Statistics are precise.
3. Memory is infinite.
4. There are no side-effects.
5. Schema doesn’t change.
6. There is one developer.
7. Compilation time is free.
8. The language is homogeneous.
Once you go outside of SQL, the optimizer doesn’t know anything about this.
It’s really a B-tree. Just give me that and I’ll be happy! Leaky abstractions are a good thing! Then you know the performance characteristics. We don’t need all these layers of crust. Sometimes you need to go under the hood.
Task<T> is a co-monad in C#. See ContinueWith(Func<Task<T>, TResult>).
Avoids the need for multiple futures (one for failure case, etc.)
Comonads are gumball machines.
.Result - remove a gumball
Once out, never in. (It’s not hygienic.)
.Select (should be called map) - turns a gumball machine into a beer machine WITHOUT taking the gumballs out of the machine!
Comonad - .Franchise() - can create a machine of gumball machines
Can compose Franchise with Select using ContinueWith to create a beer machine franchise.
ContinueWith is the dual of SelectMany (aka flatmap).
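The gumball-machine operations can be sketched with a toy class. This is Python, not C#, and `Machine` is a synchronous, already-completed stand-in for Task<T> with no real asynchrony; the method names mirror the notes above and are otherwise my own.

```python
class Machine:
    """Toy stand-in for a completed Task<T>."""

    def __init__(self, value):
        self._value = value

    def result(self):
        # .Result: take the gumball out.
        return self._value

    def continue_with(self, f):
        # ContinueWith: f sees the whole machine; its answer goes in a new machine.
        return Machine(f(self))

    def select(self, f):
        # Select/map: transform the contents without the caller removing them.
        return self.continue_with(lambda m: f(m.result()))

    def franchise(self):
        # Franchise: a machine of machines.
        return self.continue_with(lambda m: m)

gumballs = Machine("gumball")
beer = gumballs.select(lambda g: g.replace("gumball", "beer"))
beer_franchise = beer.franchise()
```

Both `select` and `franchise` are written in terms of `continue_with`, which is the sense in which ContinueWith is the primitive operation here.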
Developers like Verbs.
We want to perform operations on things, while hiding how they’re accomplished.
We want to swap out implementations of interfaces. You can’t do that in SQL.
ConcurrentDictionary<K,V> implements the same interfaces as Dictionary<K,V>. If single-core code was written against the original interface, it would still work with multiple cores.
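The same idea in Python terms, as a sketch: `record_round` is written only against the standard `MutableMapping` interface, so a plain `dict` and a lock-protected mapping are interchangeable. `LockedDict` and `record_round` are hypothetical names invented for this example.

```python
import threading
from collections.abc import MutableMapping

class LockedDict(MutableMapping):
    """Hypothetical thread-safe mapping; every operation takes one lock."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def __getitem__(self, key):
        with self._lock:
            return self._data[key]

    def __setitem__(self, key, value):
        with self._lock:
            self._data[key] = value

    def __delitem__(self, key):
        with self._lock:
            del self._data[key]

    def __iter__(self):
        with self._lock:
            return iter(list(self._data))  # iterate over a snapshot

    def __len__(self):
        with self._lock:
            return len(self._data)

def record_round(scores: MutableMapping, player: str, strokes: int) -> None:
    # Depends only on the mapping interface, not on a concrete type.
    scores[player] = scores.get(player, 0) + strokes
```

One caveat: the read-then-write in `record_round` is two separate operations, so it is not atomic even with `LockedDict`. Swapping the implementation keeps the code working, but truly concurrent callers would still need an atomic update operation.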
A dictionary is nothing more than a database.
Dependency Injection is a work-around for a deficiency in our programming languages.
Moving to the cloud - collections as a service (data structures that live in the cloud) - access with async/await (dream)
Developers don’t want REST. They want to program against objects. The first thing you do is write a wrapper. With an API, you can change the underlying implementation; they’re more abstract.
Similar to database transactions and recovery - intercept all the mutations so you can replay the log (I think this is ActorFx, AKA Ax)
An Actor is something where I send it updates and out come a stream of changes. In Rx we call that a Subject. (The streams are IObservable.)
A UI is also a Subject. As is a web browser, as is a web server.
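A minimal Python sketch of "updates go in, a stream of changes comes out". This is not the Rx API: the `Subject` here has no error or completion signals and no way to unsubscribe, and `CounterActor` is a made-up example.

```python
class Subject:
    """Both an observer (on_next) and an observable (subscribe)."""

    def __init__(self):
        self._observers = []

    def subscribe(self, on_next):
        self._observers.append(on_next)

    def on_next(self, value):
        for observer in list(self._observers):
            observer(value)

class CounterActor:
    """Hypothetical actor: send it updates, observe its state changes."""

    def __init__(self):
        self.changes = Subject()
        self._total = 0

    def send(self, amount):
        self._total += amount
        self.changes.on_next(self._total)

seen = []
actor = CounterActor()
actor.changes.subscribe(seen.append)
actor.send(2)
actor.send(3)
```

After the two sends, `seen` holds the running totals that the actor published.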
Dream: Compose communicating stream processors to build applications. Like Erlang, only language-independent. Rx everywhere.
Like Yahoo Pipes but non-visual (developers like code).
Trying to move Rx to an Apache project.
Need to wrap existing services into these Subjects so we can have a consistent interface.
Reactive Message Queue (Qx)
Decouple producer & consumer in space & time.
Either can be offline.
We don’t care about type systems.
Comonads are used when you have an asynchronous communication that gives you one value; Rx is used when you … that gives you multiple values.
Pat Helland’s article on Condos & Clouds
You Are The Subject.
Get rid of all the noise and just give me essence.
One of the great things about the NoSQL movement is that developers are now empowered.
Closing Keynote: The App Universe After the Big Bang
We’re in the Silver Age. The Golden Age of apps is over.
The crazy Gold Rush is the anomaly.
In 2003, people were having their nephews write their web sites.
In 2013, we’re in the same boat for apps.
Find the intersection of Creativity and Commerce.
Pay attention to business.
See jury.me for a 5-part series on app pricing.
You need Marketing.
It’s like ___ with/for ___. It’s a Bad Idea.
It feels safe.
Don’t make games. It’s a Bad Idea.
You’re not good at it. It’s a hit-driven business.
Solve Boring Problems.
Make it cost me less money, or take up less time.
Narrow domain scope.
Perfection is unachievable.
Engineering’s about compromise. You need to ship.
Sacrifice is necessary.
You need help.
(And a lot of people have helped you.)
113 games ship to the App Store every single day.
What I wished I knew at the beginning of my career.
Software is Biological
It’s not a construction process. It’s organic, and often messy.
It’s hard to change decisions you made early in the process of growing a system.
We’re more like gardeners.
There is no Up or Down in Software
A layered architecture, etc. is a schema we impose - but it’s one of many.
Top-down, bottom-up - just ways of looking at a system.
Connected or not-connected (information flow) - another way of looking at it
Alistair Cockburn - hexagonal architecture
Everything is an Object
Another way we impose schema on things.
Smalltalk implements “if” as sending a message to a Boolean value. It mimics control structures with polymorphism.
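A rough Python imitation of that idea: two classes answer the same message differently, and the choice of branch is made by method dispatch rather than by an `if` statement. The class and method names are invented to echo Smalltalk's ifTrue:ifFalse: and are not a real library.

```python
class TrueValue:
    def if_true_if_false(self, then_block, else_block):
        # True runs only the first block.
        return then_block()

class FalseValue:
    def if_true_if_false(self, then_block, else_block):
        # False runs only the second block.
        return else_block()

TRUE, FALSE = TrueValue(), FalseValue()

raining = TRUE
advice = raining.if_true_if_false(lambda: "stay home", lambda: "play golf")
```

The blocks are passed as lambdas so that only the selected one is ever evaluated.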
A Type is a Set of Values
int is obvious.
An interface allows us to narrow our design, because it reduces the set within a given context (rather than access to 15 methods on the concrete type, you only get the 2 on the interface).
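In Python, a `typing.Protocol` can play the role of the narrowing interface. In this sketch, `File`, `Readable`, and `word_count` are all hypothetical; the point is only that `word_count` declares that it needs one method out of the four the concrete type offers.

```python
from typing import Protocol

class Readable(Protocol):
    def read(self) -> str: ...

class File:
    """Concrete type with more capabilities than most callers need."""

    def __init__(self, text: str) -> None:
        self._text = text

    def read(self) -> str:
        return self._text

    def write(self, text: str) -> None:
        self._text = text

    def delete(self) -> None:
        self._text = ""

def word_count(source: Readable) -> int:
    # Only read() is visible through the Readable type.
    return len(source.read().split())
```

Python does not enforce this at runtime; it is a static type checker that would flag a call to `source.write(...)` inside `word_count`.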
Types and Tests are the Same Thing
Test failures may be reported the same as compilation failures in the IDE.
Tests say that a system will act in a particular way, as do types.
Sweetness Is Painful
If we design mechanisms to do multiple things in our code, we risk violating SRP. Design elegance can bind things together. (Hairdryer that turns off when we hang it up in a hotel.)
You Can’t Make Software Stronger With Redundancy
We can get better uptime, but we can’t get more correctness.
N-version programming didn’t work well.
Protection is a Social Problem Not a Technical Problem
Sealed/final modifiers work poorly.
API design locks consumers in; they can’t extend things in the way they want.
Maybe the system isn’t testable.
We can’t foresee all the circumstances in which people will want to use our API.
On both sides of an interface we’ll have cruft.
There Are No Requirements (It’s All Design)
Have developers question the business. Are you sure you want to do this?
Names Come After Structure
He thinks about actors and notifications first. The names come later.
The Physical Shapes The Logical
MapReduce; Eventually Consistent
Consider the physical architecture first.
The Social Shapes Design
Tragedy of the Commons - no one feels ownership (happens in software, too)
Consider changing team structure to target the current goals.
You Can Design Away Errors
You can make certain errors impossible. Strive for that.
Capture errors in the input before proceeding (don’t have to keep checking)
Design-by-contract - if doing, try to make the contracts simpler and simpler
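One way to read "capture errors in the input before proceeding" is to validate once, at construction, so that later code can rely on the type. A small Python sketch, with `Probability` and `both` invented for the example:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Probability:
    value: float

    def __post_init__(self):
        # The only place the range is checked.
        if not 0.0 <= self.value <= 1.0:
            raise ValueError(f"probability out of range: {self.value}")

def both(a: Probability, b: Probability) -> Probability:
    # Assumes independent events. No range checks are needed here,
    # because an invalid Probability cannot be constructed.
    return Probability(a.value * b.value)
```

Because the class is frozen, a valid instance cannot later be mutated into an invalid one through ordinary attribute assignment.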
Logging is a Design Failure
A method that could be 15 lines is expanded out to 30 because of logging.
Can cripple systems.
Freeman & Pryce - Growing Object-Oriented Software, Guided by Tests (book)
We design our pieces to notify someone about their internal state.
Notifiers (similar to Observer Pattern)
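A sketch of what a notifier might look like in place of inline logging calls. This is my own Python illustration of the idea, not code from the talk or the book; `Account`, `TransferNotifier`, and `RecordingNotifier` are hypothetical names.

```python
class TransferNotifier:
    """Notification interface; this default implementation ignores events."""

    def transfer_rejected(self, amount, balance):
        pass

    def transfer_completed(self, amount):
        pass

class Account:
    def __init__(self, balance, notifier):
        self.balance = balance
        self._notifier = notifier

    def withdraw(self, amount):
        if amount > self.balance:
            self._notifier.transfer_rejected(amount, self.balance)
            return False
        self.balance -= amount
        self._notifier.transfer_completed(amount)
        return True

class RecordingNotifier(TransferNotifier):
    """Collects events; a logging notifier could write them out instead."""

    def __init__(self):
        self.events = []

    def transfer_rejected(self, amount, balance):
        self.events.append(("rejected", amount, balance))

    def transfer_completed(self, amount):
        self.events.append(("completed", amount))
```

The domain code states what happened in its own vocabulary, and whether that ends up in a log file, a metric, or a test assertion is decided by whichever notifier is passed in.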
Databases are a Design Tool
Try to do selective SELECTs.
Globals Hide Design Information
Sneaky action at a distance. Silent coupling.
Explicit constraint - this is a resource, and I’ll pass it to where it needs to be used
You learn that there are only these places that need this resource.
Better separation of concerns.
If people can touch something anyplace, often they will touch it anyplace.
Standards Dampen Design
Hard rules (cyclomatic complexity of X, conventions, etc.) may prevent people from making exceptions when it makes sense
Rules of thumb are better
Control can be counterproductive.
Context is King
If you’re talking about best practice without discussing the context, it’s just B.S.
(This goes back to Standards Dampen Design.)
Response to functional programming question from audience:
Exposure of data isn’t as big of a deal if it’s immutable.
Design can become compositional.
Patterns - it’s a language/vocabulary, so it has a name.
- it has a context.
We’re not trying to be productive, we’re trying to be effective.
In software, we want to produce the least amount of code/software that will solve the problem.
Fear -> Risk -> Process -> Hate
Risk is multidimensional. For instance, Likelihood and Impact.
Likelihood is probability - number from 0.0 to 1.0.
Impact goes to infinity (disaster).
Agile - what if we could minimize impact? Testing, CI, etc.
Another axis - Context/Stakeholder (security, regulatory, etc.)
We’re not following the Agile Manifesto.
BDD frameworks are tools.
Executable specifications are documentation.
We crave certainty! Faith becomes religion.
Complex questions become simplistic answers. In a Zen koan, the point is to struggle with the QUESTION.
Interpretation becomes dogma.
We would rather be wrong than uncertain.
We resist uncertainty of scope, technology, effort, & structure.
Proven methodologies, officially sanctioned frameworks, etc.
We’d rather have precision, even if it’s completely inaccurate.
The Hourglass: A model of change (Seth Thompson)
You are probably in Stage 2.
Three Ages: A model of growth
These are not phases; each may apply to an aspect of my project.
Agile methods optimize for the Second Age.
As such, they are systemically resisting discovery.
It does take us out of chaos into something that we can reason about.
Side note - you can’t get good at estimating.
Real Options (Matts, etc.)
Option - the right, but not the obligation, to trade something in the future.
Insurance (betting against yourself)
Apply this to every decision.
Options have value
“Never commit early unless you know why”
You will discover stuff as you go through a project (accidental discovery); the sound of that is “Oh crap.” These discoveries occur far too late for you to do anything about them.
Ignorance is your biggest constraint.
You are second-order ignorant (you don’t know that you don’t know).
Ignorance is multivariate and disjoint.
Some unexpected bad things will happen. (Non-zero.) (We plan like there will be zero.) (These are things that are not on the risk log.)
Embracing uncertainty of:
Scope - explore, do things deliberately
Technology - learn, try another language/paradigm - options
Effort - get it in front of people very early on, bring the risk forward
Structure - fluid teams/organizations (specialists/enablers who consult with teams and help them!)
Expect the unexpectable.
Anticipate ignorance. (Can you increase the likelihood of serendipity within your organization? What if someone on another team has already solved your issue? How can you find out about that?)
Development Central is the blog of Bill Sorensen, a professional software developer. Much of this will relate to C#, .NET, and OOP in general.