Glitch8
 ( 10.41% )
- ago
I'm starting this topic as a jumping-off point to discuss how many hidden layers and hidden neurons to use, in preparation for the release of NeuroLab. I'm documenting NeuroLab now and realize there's no hard-and-fast rule regarding these questions. Let's use this topic to collect and share experience around neural network architecture.
1
1,631
27 Replies

- ago
#1
How do you define hidden layers and neurons?
0
- ago
#2
From an information-theory point of view, the middle (hidden) layer shouldn't have more nodes than the number of inputs. To do so would only overfit (and destabilize) the model. I would think of it as one middle-layer node for each orthogonal input to maximize the "par" (i.e. R² fit) of the model. I would let this be the default setting. Knowledgeable users can manually override this at the risk of overfitting the model. Overfitting will significantly diminish the model's forecasting ability on unseen data.

Now you can apply as much training data as you want. What that does, in effect, is increase the precision of the weights going to and from the middle layer, which is a good thing and does not overfit the model as long as "proper model par" (i.e. the number of middle-layer nodes) is maintained.
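
As a purely illustrative sketch of that rule of thumb (the class and method names are made up here, and NeuroLab's real default may differ), in C#:
CODE:
// Purely illustrative: default the hidden layer to one node per input
// (ideally one per *orthogonal* input); anything above that is a deliberate
// overfitting risk the user accepts by overriding.
public static class HiddenLayerSizing
{
    public static int DefaultHiddenNodes(int inputCount) => inputCount;

    public static int Resolve(int inputCount, int? userOverride = null)
    {
        int nodes = userOverride ?? DefaultHiddenNodes(inputCount);
        if (nodes > inputCount)
            System.Console.WriteLine("Warning: more hidden nodes than inputs risks overfitting.");
        return nodes;
    }
}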
1
- ago
#3
QUOTE:
How do you define hidden layers and neurons?
Wikipedia has a crash-course description (about 15 pages) of NN design. The first 3-4 pages should cover your topology questions.
https://en.wikipedia.org/wiki/Artificial_neural_network
0
- ago
#4
QUOTE:
I would think of it as one middle-layer node for each orthogonal input to maximize the "par" (i.e. R² fit) of the model. I would let this be the default setting. Knowledgeable users can manually override this at the risk of overfitting the model.


I believe this is only generally true. However, I know that the code Glitch is using has L1/L2 regularization (and may also have dropout regularization [Glitch, please chime in here]), and may also permit noise injection into the hidden layers. Why do I mention all of this? If that is the case, the heuristic approach you suggest, @superticker, may not be valid. I do believe that under the right conditions, with the correct NN algorithm, the maximum number of neurons for the hidden layers is closer to "n-squared" inputs. This allows the network to encode some "second-order" non-linear characteristics contained in the data (non-linear cross-products).

The number of hidden layers is more a question of training speed. In the absence of GPU support, I doubt that anything more than 2-3 layers could be trained in a reasonable amount of time. Using a GPU, I have experimented with training complex systems with a dozen layers in a reasonable time period.
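
To make the terms above concrete, here is a minimal pure-C# sketch (illustrative only, not NeuroLab's actual internals) of inverted dropout, L2 weight decay, and the second-order cross products I mentioned:
CODE:
using System;

static class RegularizationSketch
{
    static readonly Random Rng = new Random();

    // Inverted dropout: randomly zero hidden activations so no single unit is
    // relied upon; survivors are rescaled so the expected activation is unchanged.
    public static double[] ApplyDropout(double[] activations, double dropProb)
    {
        var result = new double[activations.Length];
        for (int i = 0; i < activations.Length; i++)
            result[i] = Rng.NextDouble() < dropProb ? 0.0 : activations[i] / (1.0 - dropProb);
        return result;
    }

    // L2 weight decay: shrink every weight toward zero on each update, so only
    // connections that training keeps reinforcing survive.
    public static void ApplyL2Decay(double[] weights, double learningRate, double lambda)
    {
        for (int i = 0; i < weights.Length; i++)
            weights[i] -= learningRate * lambda * weights[i];
    }

    // Second-order cross products: with n inputs there are n*(n-1)/2 of these,
    // which is the capacity argument behind a hidden layer closer to n-squared units.
    public static double[] PairwiseProducts(double[] inputs)
    {
        int n = inputs.Length;
        var products = new double[n * (n - 1) / 2];
        int k = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                products[k++] = inputs[i] * inputs[j];
        return products;
    }
}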

Vince

1
- ago
#5
QUOTE:
maximum number of neurons for the hidden layers is closer to "n-squared" inputs
So you're saying that if one has six orthogonal inputs, the model has 36 degrees of freedom? I totally disagree with that. Sorry. Such an overfitted model would forecast poorly on unseen data.

You can't pull significant terms from thin air with some kind of transform (or indicator). It doesn't work that way. The raw orthogonal data defines the number of significant terms. You can reduce noise, and that will improve the precision of the fit (particular solution), but that won't change the degrees of freedom of the fit (form of the general solution).

----
But the point we are all making is that the choice of the number of middle-layer (hidden) nodes should be user adjustable. I think we all agree with this (for one reason or another). :-)
0
- ago
#6
If we were dealing with linear systems, I would be in total agreement with you, superticker. But orthogonality is a linear construct that I do not believe belongs in a discussion of modeling high-dimensional nonlinear systems.

In early discussions of NNs (30-40 years ago), Principal Component Analysis (PCA), a linear process, was touted as a great way to "reduce" the dimensionality of the input data used to train an NN. It has only been in the last 20 years that people have discovered the concept of "non-linear correlation", sometimes called "local correlation" (really an outgrowth of modern ML models, which have shown they can extract more information from the data than previously thought). As a side note, this is why "trees", particularly Leo Breiman's Random Forests, are so effective at modeling high-dimensional non-linear systems, as almost all of the Kaggle competitions have demonstrated.

I realize that, in the current absence of rigorous models underlying the foundations of ML, we must use a number of heuristics to build models effectively, but I think we risk carrying too many of our notional experiences from modeling linear systems into the process.

Vince
0
- ago
#7
QUOTE:
But the point we are making is that the choice of the number of middle-layer (hidden) nodes should be user adjustable. I think we all agree with this (for one reason or another). :-)


TOTAL AGREEMENT! :)

Vince
0
- ago
#8
Comment:
QUOTE:
Degrees of Freedom
- This again is a carryover from linear modeling which I feel is not particularly useful in most ML modeling systems that look to tease out and exploit "hidden" nonlinearities.

Vince
0
- ago
#9
QUOTE:
I realize that, in the current absence of rigorous models underlying the foundations of ML, we must use a number of heuristics to build models effectively,...
I follow your point. Perhaps I'm not willing to bet my money on subtle data behavior or computational heuristics. I can't estimate/control my risk when heuristics are involved.

And how to apply heuristics (or risk for that matter) to investing is a religious issue with the individual investor. More reason why the number of middle-layer nodes should be adjustable.
1
- ago
#10
Good discussion superticker!

Glitch, are you sorry now that you asked?? ;)

Vince
0
- ago
#11
In my strategy I use 8 indicators, each with a weighting of one. At each bar I take a poll of the pluses and minuses, add them up, and send a buy signal when the plus poll is greater than the minus poll. Is this something that would be used here, or do I not know what I'm talking about and this has nothing to do with hidden layers and nodes? From the discussion it would seem that I have 64 layers here.
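
A hypothetical C# sketch of that poll, with the eight indicator votes supplied as +1/-1 values (names are placeholders, not my actual strategy code):
CODE:
// Each of the 8 indicators casts a +1 (bullish) or -1 (bearish) vote on the
// current bar; a buy signal fires when positive votes outnumber negative ones.
public static bool BuySignal(int[] indicatorVotes)
{
    int tally = 0;
    foreach (int vote in indicatorVotes)
        tally += vote;
    return tally > 0;
}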
0
Glitch8
 ( 10.41% )
- ago
#12
Vince, no, this is exactly the kind of discussion I was hoping to ignite!
Pesto, when I said layers and neurons I was talking more specifically about those things in relation to neural network architecture.
0
- ago
#13
QUOTE:
Vince, no, this is exactly the kind of discussion I was hoping to ignite!


Well then, you got what you wanted!! :)

Vince
0
- ago
#14
Aren't the number of layers and the number of neurons per layer options that are up to the user?

As usual, you can experiment with your architecture: changing layer types, neuron counts, and activation functions has some effect, but you will get the best results by downloading a state-of-the-art model, like transformers or something.

So having built-in NN templates (architectures) for some kind of SotA model would be good.
1
- ago
#15
(It took me a bit of time to remember where I had read this...)
QUOTE:
Empirically, greater depth does seem to result in better generalization for a wide variety of tasks. […] This suggests that using deep architectures does indeed express a useful prior over the space of functions the model learns.

Source: Deep Learning by Ian Goodfellow, Yoshua Bengio, Aaron Courville, page 201

Glitch, this further makes a case for adding GPU support to WL7 eventually.

Vince
1
- ago
#16
Is anyone still using fully connected layers for time-series and stock-data analysis, or anything else for that matter? I would think convnets on graphs, transformer networks or RNNs for time-series data, or other current architectures would be more up to date.
0
Glitch8
 ( 10.41% )
- ago
#17
MustPlayOptions, I would love to see some of your ideas developed, perhaps as a third-party extension? Hopefully NeuroLab will provide some value despite your complete dismissal. 🤷🏼‍♂️
0
- ago
#18
Modern NN architectures use many techniques to reduce the connection weights (to zero, in the case of Dropout Regularization) to help improve generalization performance, so "fully-connected layers" are only of historical interest.

It really depends on what you are trying to do with your model. If it is prediction, then architectures such as RNNs are most appropriate. If you are looking to do "stock analysis", such as ranking (regression), then that is much closer to classification; NNs and trees seem to work best in those cases.

The problem defines the approach.

Vince
0
Glitch8
 ( 10.41% )
- ago
#19
NeuroLab will try to predict an output time series based on fully connected layers. Maybe it’s out of fashion but I’ve seen encouraging results so far. Rather than guessing how indicators may or may not be predictive we can let NeuroLab figure it out. Plus the interface is solid and can serve as a good base to push forward with other techniques post release.
1
- ago
#20
Glitch,

While the NN will start fully connected, during the training process regularization will attempt to drive all of the weights toward zero, and only those weights that are continually reinforced will survive. I suspect that very few well-trained nets will be anything close to fully connected.

Vince
1
Glitch8
 ( 10.41% )
- ago
#21
Makes sense, thanks Vince!
0
- ago
#22
LOL Glitch. I'm not dismissing it, just saying there are much stronger things out there. And I wish I could program that stuff in C#, but it's all already available in Python, and I'm having a hard enough time getting things to work the way they did in WL6.

With or without dropout, the point about an FCN is not how sparse it is but how much potential it has for generalizable pattern recognition. FCNs pale in comparison to transformer networks, reinforcement-learning networks, etc.
0
- ago
#23
MPO,
QUOTE:
there are much stronger things out there

Agreed, but not in retail-market trading software. In the professional market much of what you describe is commonplace, but for most people the cost of that software is out of reach. Integrating better ML software into WL7 is probably not trivial, especially in a multiprocessing environment. (I am not a good C# programmer, so this is based on discussions with more knowledgeable folks). I hope that the suite of ML software increases with time, but Glitch has a LOT on his plate at the moment.

Vince
0
- ago
#24
I don't disagree and wasn't asking for that support at this time per se. There are a lot of other things I'd rather have first.

If I knew more about C# programming and dll's, I would know the answer to this question:

Is it possible to create a Python library that can be accessed from C# in Wealth-Lab? If so, then you don't really need specific ML support in WL.
0
- ago
#25
If you want to see more ML capabilities in WL7, there is an item on the WishList, "Additional Machine Learning Algorithms and GPU support", that anyone can vote for to improve its status.

Vince
1
- ago
#26
As someone trying out the NeuroLab extension for the first time, and new to neural networks, it's difficult to tackle the configuration of the hidden layers. I appreciate superticker's suggestion to keep the number of hidden neurons from exceeding the number of inputs, as it at least provides a starting point. Is there any similar rule of thumb when choosing the number of hidden layers?
0
mjj38
- ago
#27
Perhaps it would be worthwhile to have Python-to-.NET functionality so Dion doesn't have to reinvent the wheel for everything. Has anyone looked into using IronPython?

https://ironpython.net/
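
A minimal sketch of the C# hosting side, assuming the IronPython NuGet package is referenced (and noting that, as far as I know, IronPython cannot load CPython extension modules such as numpy, which limits it for heavy ML work):
CODE:
using System;
using IronPython.Hosting;
using Microsoft.Scripting.Hosting;

class IronPythonDemo
{
    static void Main()
    {
        // Create an embedded Python engine and a scope to hold its variables.
        ScriptEngine engine = Python.CreateEngine();
        ScriptScope scope = engine.CreateScope();

        // Define a Python function, then call it from C# via dynamic dispatch.
        engine.Execute("def score(a, b):\n    return (a + b) / 2.0", scope);
        dynamic score = scope.GetVariable("score");

        Console.WriteLine(score(4.0, 6.0));  // prints 5
    }
}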
0